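This example creates an agent that accepts audio input: it sends a WAV file to an audio-capable OpenAI model along with a text prompt and prints the model's text response.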

audio_input.py
from agno.agent import Agent
from agno.media import Audio
from agno.models.openai import OpenAIChat

# Publicly hosted sample WAV file used as the audio input
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"

# Audio-capable model; modalities=["text"] requests text-only output
agent = Agent(
    model=OpenAIChat(id="gpt-4o-audio-preview", modalities=["text"]),
    markdown=True,
)

if __name__ == "__main__":
    # Pass the audio next to the text prompt and stream the response
    agent.print_response(
        "What is in this audio?", audio=[Audio(url=url, format="wav")], stream=True
    )
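
The example above hardcodes format="wav" to match the URL. If the URL varies, the format string could be derived from the file extension instead. The following is a minimal sketch using only the standard library; infer_audio_format and SUPPORTED_FORMATS are hypothetical names that are not part of agno, and the set of accepted formats is an assumption that should be checked against the model's documentation.

```python
import os
from urllib.parse import urlparse

# Assumption: formats the model accepts as audio input; verify for your model.
SUPPORTED_FORMATS = {"wav", "mp3"}


def infer_audio_format(url: str) -> str:
    """Return the lowercase file extension of the URL path, e.g. 'wav'.

    Hypothetical helper, not part of agno. Raises ValueError when the
    extension is missing or not listed in SUPPORTED_FORMATS.
    """
    # urlparse separates the path from any query string or fragment
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported or missing audio format: {ext!r}")
    return ext


if __name__ == "__main__":
    url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
    print(infer_audio_format(url))
```

The returned string could then be passed as the format argument, for example Audio(url=url, format=infer_audio_format(url)).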

Usage

1. Install libraries

pip install -U agno openai

2. Export API keys

export OPENAI_API_KEY=***

3. Run the agent

python audio_input.py
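
A common failure when running the steps above is a missing API key. The following is a small, optional preflight sketch that reports which required environment variables are unset before the agent is run; missing_env_vars is a hypothetical helper and not part of agno or the openai package.

```python
import os
from typing import Mapping, Optional, Sequence


def missing_env_vars(
    names: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> list:
    """Return the names that are unset or empty in env (default: os.environ).

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print("Set these environment variables first: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```

Accepting the environment as a parameter keeps the check easy to exercise without modifying the real process environment.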