Code
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.google import Gemini
from agno.os.app import AgentOS
from agno.os.interfaces.telegram import Telegram
from agno.tools.dalle import DalleTools
from agno.tools.eleven_labs import ElevenLabsTools

# SQLite database used for session storage
agent_db = SqliteDb(
    session_table="telegram_media_sessions", db_file="tmp/telegram_media.db"
)

media_agent = Agent(
    name="Media Agent",
    model=Gemini(id="gemini-2.5-pro"),
    db=agent_db,
    tools=[
        # Image generation
        DalleTools(model="dall-e-3", size="1024x1024", quality="standard"),
        # Text-to-speech and sound effects; voice listing is disabled
        ElevenLabsTools(
            enable_text_to_speech=True,
            enable_generate_sound_effect=True,
            enable_get_voices=False,
        ),
    ],
    instructions=[
        "You are a helpful multimedia assistant on Telegram.",
        "When asked to generate, create, or draw an image, use the DALL-E tool.",
        "When asked to speak, read aloud, or convert text to speech, use the ElevenLabs text_to_speech tool.",
        "When asked for a sound effect, use the ElevenLabs generate_sound_effect tool.",
        "Keep text responses concise and friendly.",
        "You can also analyze images, audio, and video that users send you.",
    ],
    # Include the last 3 runs of the session in the model context
    add_history_to_context=True,
    num_history_runs=3,
    add_datetime_to_context=True,
    markdown=True,
)

agent_os = AgentOS(
    agents=[media_agent],
    interfaces=[
        Telegram(
            agent=media_agent,
            reply_to_mentions_only=True,
        )
    ],
)
app = agent_os.get_app()

if __name__ == "__main__":
    # "agent_with_media" must match this file's module name (agent_with_media.py)
    agent_os.serve(app="agent_with_media:app", reload=True)
Usage
Set up your virtual environment
uv venv --python 3.12
source .venv/bin/activate
Set environment variables
export TELEGRAM_TOKEN=your-bot-token-from-botfather
export GOOGLE_API_KEY=your-google-api-key
export OPENAI_API_KEY=your-openai-api-key
export ELEVENLABS_API_KEY=your-elevenlabs-api-key
export APP_ENV=development
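Because the example depends on several API keys, it can help to confirm they are set before starting the server. The following is a minimal sketch, not part of Agno: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here, and the variable names are copied from the export lines above. `APP_ENV` is left out on the assumption that it selects a mode rather than holding a credential.

```python
import os

# Variable names copied from the export lines above.
# REQUIRED_VARS and missing_env_vars are illustrative helpers, not Agno APIs.
REQUIRED_VARS = [
    "TELEGRAM_TOKEN",
    "GOOGLE_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

One way to use it is to call `missing_env_vars()` at the top of the script and exit with a message listing the returned names if the list is not empty.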
Install dependencies
uv pip install -U "agno[telegram]" openai elevenlabs
Run the example
python agent_with_media.py
Key Features
- Image Generation: DALL-E 3 generates images from text prompts, sent as native Telegram photos
- Text-to-Speech: ElevenLabs converts text to audio, sent as Telegram voice messages
- Sound Effects: ElevenLabs generates sound effects from text descriptions
- Inbound Media Analysis: Analyze images, audio, video, and documents sent by users
- Persistent Sessions: Sessions are stored in a SQLite database (`tmp/telegram_media.db`), and the last 3 runs are added to the agent's context