Multimodal - Agno

Example	Description
Audio Input Output	Send audio to an agent and receive audio back. The example fetches an audio file and converts it to a base64-encoded string.
Audio Sentiment Analysis	Analyze the sentiment of an audio conversation, identifying speakers as speaker A and speaker B.
Audio Streaming	Stream audio output from an agent. The example assumes mono audio (change the channel count to 2 for stereo).
Audio To Text	Transcribe an audio conversation, identifying speakers as speaker A and speaker B.
Image To Audio	Convert image descriptions to audio output.
Image To Image	Transform images using agent-driven processing.
Image To Structured Output	Extract structured data from images.
Image To Text	Generate text from an image.
Media Input For Tool	Example showing how tools can access media (images, videos, audio, files) passed to the agent.
Video Caption	Generate captions from video content.
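
The Audio Input Output example is described as fetching an audio file and converting it to a base64-encoded string. A minimal sketch of that step, using only the Python standard library, might look like the following. The helper names `fetch_bytes`, `to_base64`, and `from_base64` are hypothetical and not part of Agno, and how the encoded string is then handed to an agent is not shown here.

```python
import base64
import urllib.request


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes at `url` (hypothetical helper, not an Agno API)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def to_base64(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def from_base64(encoded: str) -> bytes:
    """Decode a base64 string back into the original bytes."""
    return base64.b64decode(encoded)
```

In use, the two steps would be chained as `to_base64(fetch_bytes(url))`, with `url` pointing at whatever audio file is being sent. Keeping the download and the encoding in separate functions makes the encoding step easy to check without network access.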
