Introduction
Welcome to Agno's Hackathon Documentation!
Here you'll find setup guides, examples, and resources to bring your multimodal agents to life. For dedicated hackathon support, join our Discord, where our team is ready to help you.
Quick Start Guide
Setup
Get your environment ready for building Agents
Ready-to-Use Code
Copy-paste solutions for common use cases
Replit Template
Fork our pre-built Replit template
Text Examples
Simple Text Agent
Agent with text input and output
Agent with Tools
Agent with tools to search the web
Agent with Knowledge
Agent with a knowledge base it can search
Agent with Structured Outputs
Agent with a structured output (pydantic object)
Research Agent
Agent that searches Exa and generates a report in a consistent format
YouTube Agent
Analyze YouTube videos and provide detailed summaries, timestamps, and key points
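The "Agent with Structured Outputs" example above returns a typed object instead of free text. As a rough, library-independent sketch of that idea (this is not Agno's API), the snippet below parses a JSON reply into a typed Python object and rejects replies that do not match the expected shape. The `MovieScript` schema and the `parse_structured_output` helper are hypothetical names chosen for illustration, and the standard-library `dataclasses` module stands in for pydantic so the example runs without extra installs.

```python
import json
from dataclasses import dataclass


@dataclass
class MovieScript:
    """Hypothetical schema, used only to illustrate a structured output."""
    title: str
    genre: str
    characters: list[str]


def parse_structured_output(raw: str) -> MovieScript:
    """Parse a JSON string (such as a model reply) into a MovieScript.

    Raises ValueError if the reply is not a JSON object with the
    expected fields.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = {"title", "genre", "characters"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["characters"], list):
        raise ValueError("'characters' must be a list")

    return MovieScript(
        title=str(data["title"]),
        genre=str(data["genre"]),
        characters=[str(name) for name in data["characters"]],
    )


if __name__ == "__main__":
    # Stand-in for a model reply; no model is called in this sketch.
    reply = '{"title": "Night Train", "genre": "thriller", "characters": ["Ana", "Leo"]}'
    script = parse_structured_output(reply)
    print(script.title, script.characters)
```

With pydantic, the hand-written checks would typically be replaced by a model class that validates the fields for you; the linked example shows how the agent itself is configured to produce such an object.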
Image Processing & Generation
Image Input + Tools
Agent that takes image input and makes tool calls to search the web
Generate Image
Agent that generates images using DALL-E
Image to Structured Output
Agent that extracts structured data from images
Generate Audio from Image
Agent that generates audio from an image
Image Input + Output
Agent that takes an image input and outputs an image
Image Transcription
Agent that transcribes images using Mistral
Image Search using Giphy
Agent that searches for GIFs using Giphy
Audio Processing & Generation
Audio Input
Agent that takes audio input
Audio Input + Output
Agent that takes audio input and outputs audio
Audio Sentiment
Agent that takes audio input and outputs a sentiment analysis
Audio Transcript
Agent that takes audio input and outputs a transcript of the audio
Audio Multi Turn
Agent that carries on a multi-turn conversation with audio input and audio output
Audio Generate Podcast
Agent that generates a podcast from an audio input
Video Processing & Generation
Video Input
Agent that takes video input
Video Generation with Models Lab
Agent that generates videos using Models Lab
Video Generation with Replicate
Agent that generates videos using Replicate
Video Captions
Agent that generates captions for videos
Video to Shorts
Agent that converts videos to shorts
Streamlit Applications
A list of Streamlit applications built with Agno that you can use as a starting point for your hackathon project.
Answer Engine
Answer engine that combines web search and Exa search to answer questions
Chess Game
A simple chess game built with Agno
Geobuddy
Geography agent that analyzes images to predict locations based on visible cues like landmarks, architecture, and cultural symbols.
Medical Imaging
Medical imaging analysis agent that analyzes medical images and provides detailed findings using models and external tools.
Game Generator
Game generator agent that generates games based on user input
SQL Agent
SQL agent that can query a database and return the results
Podcast Generator
Agent that generates a podcast from an audio input
Tic Tac Toe
Tic Tac Toe game
Models
Choose the right model for your project. Each has unique capabilities and strengths.
Gemini
Multimodal examples using Google Gemini
OpenAI
Multimodal examples using OpenAI
Ollama
Multimodal examples using Ollama models locally
Anthropic
Multimodal examples using Anthropic models like Claude
Groq
Multimodal examples using Groq's fast inference
Mistral
Multimodal examples using Mistral models
More Examples
Model Capabilities at a Glance
Here's a quick comparison of multimodal support across models:
Model | Image | Audio | Video | Text |
---|---|---|---|---|
Gemini | β | β | β | β |
OpenAI | β | β | β | β |
Anthropic | β | β | β | β |
Groq | β | β | β | β |
Mistral | β | β | β | β |
For more details, see our compatibility matrix.