Audio Input Output

This agent takes an audio input and responds with an audio output.

import httpx
from agno.agent import Agent
from agno.media import Audio
from agno.models.openai import OpenAIChat
from agno.utils.audio import write_audio_to_file

# Fetch the audio file and convert it to a base64 encoded string
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = httpx.get(url)
response.raise_for_status()
wav_data = response.content

agent = Agent(
    model=OpenAIChat(
        id="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "wav"},
    ),
    markdown=True,
)

agent.run(
    "What's in these recording?",
    audio=[Audio(content=wav_data, format="wav")],
)

if agent.run_response.response_audio is not None:
    write_audio_to_file(
        audio=agent.run_response.response_audio.content, filename="tmp/result.wav"
    )

Usage

1

Install libraries

pip install -U agno openai
2

Export API keys

export OPENAI_API_KEY=***
3

Run the agent

python audio_input_output.py