# Multimodal

> Examples for image/audio/video processing patterns.

| Example                                                                              | Description                                                                                          |
| ------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------- |
| [Audio Input Output](/examples/agents/multimodal/audio-input-output)                 | Fetch an audio file, base64-encode it, and pass it to an agent that returns audio.                   |
| [Audio Sentiment Analysis](/examples/agents/multimodal/audio-sentiment-analysis)     | Analyze the sentiment of an audio conversation, labeling the speakers as A and B.                    |
| [Audio Streaming](/examples/agents/multimodal/audio-streaming)                       | Stream an agent's audio output; the example assumes mono (set channels to 2 for stereo).             |
| [Audio To Text](/examples/agents/multimodal/audio-to-text)                           | Transcribe an audio conversation, labeling the speakers as A and B.                                  |
| [Image To Audio](/examples/agents/multimodal/image-to-audio)                         | Convert image descriptions to audio output.                                                          |
| [Image To Image](/examples/agents/multimodal/image-to-image)                         | Transform images using agent-driven processing.                                                      |
| [Image To Structured Output](/examples/agents/multimodal/image-to-structured-output) | Extract structured data from images.                                                                 |
| [Image To Text](/examples/agents/multimodal/image-to-text)                           | Generate text from an image input.                                                                   |
| [Media Input For Tool](/examples/agents/multimodal/media-input-for-tool)             | Let tools access media (images, videos, audio, files) passed to the agent.                           |
| [Video Caption](/examples/agents/multimodal/video-caption)                           | Generate captions from video content.                                                                |
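
The Audio Input Output entry above involves converting audio to a base64-encoded string. The following is a minimal standard-library sketch of that encoding step only, not the linked example itself. It assumes a 16-bit PCM WAV payload and uses an in-memory silent clip as a stand-in for a fetched file; `make_silent_wav`, `encode_audio`, and `decode_audio` are hypothetical helper names and are not part of Agno.

```python
import base64
import io
import wave


def make_silent_wav(seconds: float = 0.1, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory (stand-in for a fetched audio file)."""
    n_frames = int(seconds * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)  # 1 = mono, 2 = stereo
        wav.setsampwidth(2)         # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames * channels)
    return buf.getvalue()


def encode_audio(raw: bytes) -> str:
    """Return the audio bytes as a base64-encoded text string."""
    return base64.b64encode(raw).decode("utf-8")


def decode_audio(encoded: str) -> bytes:
    """Reverse encode_audio, recovering the original bytes."""
    return base64.b64decode(encoded)


wav_bytes = make_silent_wav()
encoded = encode_audio(wav_bytes)
```

Encoding to text makes the audio safe to embed in a JSON request body, at the cost of roughly a one-third increase in size. How the encoded string is then handed to an agent depends on the individual example page.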