Agno agents support text, image, audio, and video inputs and can generate text, image, audio, and video outputs. For a complete overview, please check out the compatibility matrix.
To get started, check out the multimodal examples.

Multimodal inputs to an agent

Let’s create an agent that can understand images and make tool calls as needed.

Agent with Image Understanding

image_agent.py
from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools

agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    tools=[DuckDuckGoTools()],
    markdown=True,
)

agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=[
        Image(
            url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg"
        )
    ],
    stream=True,
)
Run the agent:
python image_agent.py
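The example above passes the image by URL. If the image lives on disk, here is a minimal sketch of the same call with a local file or raw bytes, assuming the `Image` class also accepts `filepath` and `content` parameters (the file name here is hypothetical):

from pathlib import Path

from agno.media import Image

# Pass a local file instead of a URL (assumes Image accepts `filepath`)
local_image = Image(filepath=Path("golden_gate.jpg"))

# Or pass the raw bytes directly (assumes Image accepts `content`)
raw_image = Image(content=Path("golden_gate.jpg").read_bytes())

agent.print_response("Describe this image.", images=[local_image])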
Similar to images, you can also use audio and video as inputs.

Agent with Audio Understanding

audio_agent.py
import requests
from agno.agent import Agent, RunOutput  # noqa
from agno.media import Audio
from agno.models.openai import OpenAIChat

# Fetch the audio file and use its raw bytes as input
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content

agent = Agent(
    model=OpenAIChat(id="gpt-5-mini-audio-preview", modalities=["text"]),
    markdown=True,
)
agent.print_response(
    "What is in this audio?", audio=[Audio(content=wav_data, format="wav")]
)

Agent with Video Understanding

video_agent.py
from pathlib import Path

from agno.agent import Agent
from agno.media import Video
from agno.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    markdown=True,
)

# Please download "GreatRedSpot.mp4" using
# wget https://storage.googleapis.com/generativeai-downloads/images/GreatRedSpot.mp4
video_path = Path(__file__).parent.joinpath("GreatRedSpot.mp4")

agent.print_response("Tell me about this video", videos=[Video(filepath=video_path)])

Multimodal outputs from an agent

Similar to providing multimodal inputs, you can also get multimodal outputs from an agent. You can either use tools to generate images, audio, or video, or use the agent’s model to generate them directly (if the model supports that capability).

Image Generation using a tool

The following example demonstrates how to generate an image using an OpenAI tool with an agent.
image_agent.py
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.openai import OpenAITools
from agno.utils.media import save_base64_data

agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    tools=[OpenAITools(image_model="gpt-image-1")],
    markdown=True,
)

response = agent.run(
    "Generate a photorealistic image of a cozy coffee shop interior",
)

if response.images and response.images[0].content:
    save_base64_data(str(response.images[0].content), "tmp/coffee_shop.png")
When a tool generates media, its output is also passed back to the model as a message, so the model has access to that media (image, audio, or video) and can use it in its response. For example, if you say “Generate an image of a dog and tell me its color.”, the model will have access to the generated image and can describe the dog’s color in the same run.
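A minimal sketch of that flow, reusing the agent defined above:

# The tool-generated image is fed back to the model in the same run,
# so the model can describe what it just created.
response = agent.run("Generate an image of a dog and tell me its color.")
print(response.content)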

Image Model Response

The following example demonstrates how some models can directly generate images as part of their response.
image_agent.py
from io import BytesIO

from agno.agent import Agent, RunOutput  # noqa
from agno.models.google import Gemini
from PIL import Image

# No system message should be provided
agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash-exp-image-generation",
        response_modalities=["Text", "Image"], # This means to generate both images and text
    )
)

# Run the agent and collect the response
run_response = agent.run("Make me an image of a cat in a tree.")

if run_response and isinstance(run_response, RunOutput) and run_response.images:
    for image_response in run_response.images:
        image_bytes = image_response.content
        if image_bytes:
            image = Image.open(BytesIO(image_bytes))
            image.show()
            # Save the image to a file
            # image.save("generated_image.png")
else:
    print("No images found in run response")
You can find all generated images in the RunOutput.images list.
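For example, to write every generated image to disk (a minimal sketch, assuming each image artifact carries raw bytes in its `content` field, as above):

# Save each image in RunOutput.images under an indexed file name
if run_response and run_response.images:
    for i, image_response in enumerate(run_response.images):
        if image_response.content:
            with open(f"generated_image_{i}.png", "wb") as f:
                f.write(image_response.content)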

Audio Generation using a tool

The following example demonstrates how to generate audio using the ElevenLabs tool with an agent. See ElevenLabs for more details.
audio_agent.py
import base64

from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.eleven_labs import ElevenLabsTools
from agno.utils.media import save_base64_data

audio_agent = Agent(
    model=Gemini(id="gemini-2.5-pro"),
    tools=[
        ElevenLabsTools(
            voice_id="21m00Tcm4TlvDq8ikWAM",
            model_id="eleven_multilingual_v2",
            target_directory="audio_generations",
        )
    ],
    description="You are an AI agent that can generate audio using the ElevenLabs API.",
    instructions=[
        "When the user asks you to generate audio, use the `generate_audio` tool to generate the audio.",
        "You'll generate the appropriate prompt to send to the tool to generate audio.",
        "You don't need to find the appropriate voice first, I already specified the voice to user."
        "Return the audio file name in your response. Don't convert it to markdown.",
        "The audio should be long and detailed.",
    ],
    markdown=True,
)

response = audio_agent.run(
    "Generate a very long audio of history of french revolution and tell me which subject it belongs to.",
    debug_mode=True,
)

if response.audio:
    print("Agent response:", response.content)
    base64_audio = base64.b64encode(response.audio[0].content).decode("utf-8")
    save_base64_data(base64_audio, "tmp/french_revolution.mp3")
    print("Successfully saved generated speech to tmp/french_revolution.mp3")


audio_agent.print_response("Generate a kick sound effect")

Audio Model Response

The following example demonstrates how some models can directly generate audio as part of their response.
audio_agent.py
from agno.agent import Agent, RunOutput
from agno.models.openai import OpenAIChat
from agno.utils.audio import write_audio_to_file

agent = Agent(
    model=OpenAIChat(
        id="gpt-5-mini-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "wav"},
    ),
    markdown=True,
)
response: RunOutput = agent.run("Tell me a 5 second scary story")

# Save the response audio to a file
if response.response_audio is not None:
    write_audio_to_file(
        audio=response.response_audio.content, filename="tmp/scary_story.wav"
    )

Video Generation using a tool

The following example demonstrates how to generate a video using FalTools with an agent. See FAL for more details.
video_agent.py
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.fal import FalTools

fal_agent = Agent(
    name="Fal Video Generator Agent",
    model=OpenAIChat(id="gpt-5-mini"),
    tools=[
        FalTools(
            model="fal-ai/hunyuan-video",
            enable_generate_media=True,
        )
    ],
    description="You are an AI agent that can generate videos using the Fal API.",
    instructions=[
        "When the user asks you to create a video, use the `generate_media` tool to create the video.",
        "Return the URL as raw to the user.",
        "Don't convert video URL to markdown or anything else.",
    ],
    markdown=True,
)

fal_agent.print_response("Generate video of balloon in the ocean")

Multimodal inputs and outputs together

You can create agents that can take multimodal inputs and return multimodal outputs. The following example demonstrates how to provide a combination of audio and text inputs to an agent and obtain both text and audio outputs.

Audio input and Audio output

audio_agent.py
import requests
from agno.agent import Agent
from agno.media import Audio
from agno.models.openai import OpenAIChat
from agno.utils.audio import write_audio_to_file
from rich.pretty import pprint

# Fetch the audio file and use its raw bytes as input
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content

agent = Agent(
    model=OpenAIChat(
        id="gpt-5-mini-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "sage", "format": "wav"},
    ),
    markdown=True,
)

run_response = agent.run(
    "What's in these recording?",
    audio=[Audio(content=wav_data, format="wav")],
)

if run_response.response_audio is not None:
    pprint(run_response.content)
    write_audio_to_file(
        audio=run_response.response_audio.content, filename="tmp/result.wav"
    )

Developer Resources