Multimodal
Image to Audio Agent
This example chains two agents: the first describes an image and writes a short story about it, and a second, audio-capable agent narrates that story and saves the narration as a WAV file.
Code
from pathlib import Path

from agno.agent import Agent, RunResponse
from agno.media import Image
from agno.models.openai import OpenAIChat
from agno.utils.audio import write_audio_to_file
from rich import print
from rich.text import Text

# Agent that looks at the image and writes a short story about it
image_agent = Agent(model=OpenAIChat(id="gpt-4o"))

image_path = Path(__file__).parent.joinpath("sample.jpg")

image_story: RunResponse = image_agent.run(
    "Write a 3 sentence fiction story about the image",
    images=[Image(filepath=image_path)],
)
formatted_text = Text.from_markup(
    f":sparkles: [bold magenta]Story:[/bold magenta] {image_story.content} :sparkles:"
)
print(formatted_text)

# Audio-capable agent that narrates the story as speech
audio_agent = Agent(
    model=OpenAIChat(
        id="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "wav"},
    ),
)

audio_story: RunResponse = audio_agent.run(
    f"Narrate the story with flair: {image_story.content}"
)

# Save the narrated audio to a WAV file
if audio_story.response_audio is not None:
    write_audio_to_file(
        audio=audio_story.response_audio.content, filename="tmp/sample_story.wav"
    )
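The script writes the narration to tmp/sample_story.wav. As a quick sanity check, here is a minimal sketch using only the Python standard library (the path below is the one used above) that reads the file header and reports its duration:

import wave
from pathlib import Path

wav_path = Path("tmp/sample_story.wav")
if wav_path.exists():
    # Read the WAV header and report length and sample rate
    with wave.open(str(wav_path), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
        print(f"{wav_path}: {duration:.1f}s at {wav.getframerate()} Hz")
else:
    print(f"{wav_path} not found; did the audio agent return audio?")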
Usage
1. Create a virtual environment
Open the Terminal and create a Python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
2. Set your API key
export OPENAI_API_KEY=xxx
3. Install libraries
pip install -U openai rich agno
4. Run the agent
python cookbook/agent_concepts/multimodal/image_to_audio.py