Agno agents accept text, image, audio, and video inputs and can generate text, image, audio, and video outputs. For a complete overview, see the compatibility matrix.
from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGoTools()],
    markdown=True,
)

agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=[
        Image(
            url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg"
        )
    ],
    stream=True,
)
Run the agent:
python image_agent.py
As with images, you can also pass audio and video as inputs.
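As a rough sketch of the audio case, the snippet below reads a local audio file into bytes and hands it to an agent. The `load_audio` and `describe_audio` helpers, the `sample.wav` path, and the prompt are hypothetical. The `Audio(content=..., format=...)` wrapper and the `audio=` argument are assumed to mirror the `Image` and `images=` usage shown earlier, so check them against the Agno reference for your version. Running the agent call requires an OpenAI API key and an audio-capable model.

```python
from pathlib import Path


def load_audio(path: str) -> tuple[bytes, str]:
    """Read an audio file and infer its format from the file extension."""
    file = Path(path)
    audio_format = file.suffix.lstrip(".").lower()
    if not audio_format:
        raise ValueError(f"Cannot infer audio format from {path!r}")
    return file.read_bytes(), audio_format


def describe_audio(path: str) -> None:
    """Send a local audio file to an agent (assumed API, see the note above)."""
    # Imported lazily so load_audio stays usable without agno installed.
    from agno.agent import Agent
    from agno.media import Audio
    from agno.models.openai import OpenAIChat

    content, audio_format = load_audio(path)
    agent = Agent(
        model=OpenAIChat(id="gpt-4o-audio-preview", modalities=["text"]),
        markdown=True,
    )
    agent.print_response(
        "What is in this audio?",
        audio=[Audio(content=content, format=audio_format)],
    )


# Example call (hypothetical local file; needs agno and an API key):
# describe_audio("sample.wav")
```

The file-reading step is kept separate from the agent call so it can be checked on its own without network access.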
The following example demonstrates how to generate an image using DALL-E with an agent.
image_agent.py
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.dalle import DalleTools

image_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DalleTools()],
    description="You are an AI agent that can generate images using DALL-E.",
    instructions="When the user asks you to create an image, use the `create_image` tool to create the image.",
    markdown=True,
    show_tool_calls=True,
)

image_agent.print_response("Generate an image of a white siamese cat")

images = image_agent.get_images()
if images and isinstance(images, list):
    for image_response in images:
        image_url = image_response.url
        print(image_url)
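If you want to keep a generated image rather than only print its URL, one option is to download it with the standard library. The sketch below is not part of Agno: `save_image`, the `generated_images` directory, and the `.png` extension are hypothetical choices. The only assumption about Agno is that each image object exposes a `url` attribute, as in the example above.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(url: str, destination: str) -> Path:
    """Download the file at `url` and write it to `destination`."""
    target = Path(destination)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response body to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage with the agent above (requires a completed run; the .png
# extension is an assumption about the generated file type):
# for index, image in enumerate(image_agent.get_images() or []):
#     save_image(image.url, f"generated_images/image_{index}.png")
```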
The following example demonstrates how to obtain both text and audio responses from an agent. The agent will respond with text and audio bytes that can be saved to a file.
audio_agent.py
from agno.agent import Agent, RunResponse
from agno.models.openai import OpenAIChat
from agno.utils.audio import write_audio_to_file

agent = Agent(
    model=OpenAIChat(
        id="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "wav"},
    ),
    markdown=True,
)

response: RunResponse = agent.run("Tell me a 5 second scary story")

# Save the response audio to a file
if response.response_audio is not None:
    write_audio_to_file(
        audio=response.response_audio.content, filename="tmp/scary_story.wav"
    )
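For intuition about what saving the audio involves, here is a standard-library sketch. It assumes the audio content arrives as a base64-encoded string, which is how OpenAI's audio-capable chat models return audio data. `save_base64_audio` is a hypothetical helper, not part of Agno, and `write_audio_to_file` may differ in its details.

```python
import base64
from pathlib import Path


def save_base64_audio(encoded_audio: str, filename: str) -> Path:
    """Decode base64 audio data and write the raw bytes to `filename`."""
    target = Path(filename)
    # Make sure the output directory (for example tmp/) exists.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(base64.b64decode(encoded_audio))
    return target


# Usage with the response above (assumes `content` is base64 text):
# save_base64_audio(response.response_audio.content, "tmp/scary_story.wav")
```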
You can also create agents that take multimodal inputs and return multimodal outputs. The following example demonstrates how to provide a combination of audio and text inputs to an agent and obtain both text and audio outputs.