Agno supports audio as input to agents and teams. See the compatibility matrix for the models that accept audio input.

Let's create an agent that can understand audio input.
audio_agent.py
import requests
from agno.agent import Agent, RunOutput  # noqa
from agno.media import Audio
from agno.models.openai import OpenAIChat

# Fetch the audio file as raw bytes
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content

agent = Agent(
    model=OpenAIChat(id="gpt-5-mini-audio-preview", modalities=["text"]),
    markdown=True,
)
agent.print_response(
    "What is in this audio?", audio=[Audio(content=wav_data, format="wav")]
)
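Before handing downloaded bytes such as `wav_data` to the agent, it can help to confirm that they really are WAV audio and not, say, an HTML error page. The sketch below is not part of Agno; `describe_wav` and `make_silent_wav` are hypothetical helper names introduced here for illustration, built only on Python's standard `wave` module. It assumes uncompressed PCM WAV data, which is all the standard library reader is guaranteed to handle.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of PCM WAV bytes.

    Hypothetical helper (not an Agno API). Invalid or truncated input
    raises wave.Error or EOFError from the standard library reader.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def make_silent_wav(seconds: float = 0.5, rate: int = 8000) -> bytes:
    """Build a short mono 16-bit silent WAV in memory, for trying the helper offline."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```

In the agent script above, a call such as `describe_wav(wav_data)` could sit right after `response.raise_for_status()`, so that a bad download fails early instead of being sent to the model.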