Multimodal Team

Code

multimodal_team.py

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.os.app import AgentOS
from agno.os.interfaces.slack import Slack
from agno.team import Team
from agno.tools.dalle import DalleTools
from agno.tools.websearch import WebSearchTools

vision_analyst = Agent(
    name="Vision Analyst",
    model=OpenAIChat(id="gpt-4o"),
    role="Analyzes images, files, and visual content in detail.",
    instructions=[
        "You are an expert visual analyst.",
        "When given an image, describe it thoroughly: subjects, colors, composition, text, mood.",
        "When given files (CSV, code, text), analyze their content and provide insights.",
        "Always format with markdown: bold, italics, bullet points.",
    ],
    markdown=True,
)

creative_agent = Agent(
    name="Creative Agent",
    model=OpenAIChat(id="gpt-4o"),
    role="Generates images with DALL-E and searches the web.",
    tools=[DalleTools(), WebSearchTools()],
    instructions=[
        "You are a creative assistant with image generation abilities.",
        "Use DALL-E to generate images when asked.",
        "Use web search when you need reference information.",
        "Describe generated images briefly after creation.",
    ],
    markdown=True,
)

multimodal_team = Team(
    name="Multimodal Team",
    mode="coordinate",
    model=OpenAIChat(id="gpt-4o"),
    members=[vision_analyst, creative_agent],
    instructions=[
        "Route image analysis and file analysis tasks to Vision Analyst.",
        "Route image generation and web search tasks to Creative Agent.",
        "If the user sends an image and asks to recreate/modify it, first ask Vision Analyst to describe it, then ask Creative Agent to generate a new version.",
    ],
    show_members_responses=False,
    markdown=True,
)

agent_os = AgentOS(
    teams=[multimodal_team],
    interfaces=[
        Slack(
            team=multimodal_team,
            streaming=True,
            reply_to_mentions_only=True,
            suggested_prompts=[
                {
                    "title": "Analyze",
                    "message": "Send me an image and I'll analyze it in detail",
                },
                {
                    "title": "Generate",
                    "message": "Generate an image of a sunset over mountains",
                },
                {"title": "Search", "message": "Search for the latest AI art trends"},
            ],
        )
    ],
)
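# Build the ASGI application that serves the team and the Slack interface.
app = agent_os.get_app()


if __name__ == "__main__":
    # "multimodal_team:app" is "<module>:<variable>", so the file must be
    # saved as multimodal_team.py for this import string to resolve.
    agent_os.serve(app="multimodal_team:app", reload=True)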
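Each entry in the suggested_prompts list above is a plain dictionary with a "title" and a "message". As a minimal sketch, assuming you want to catch malformed entries before handing the list to the Slack interface, a small check like the following could be used. The helper name validate_prompts is hypothetical and not part of Agno; it only inspects the dictionaries and does not call Slack or Agno.

```python
def validate_prompts(prompts):
    """Return a list of problems found in a suggested_prompts list.

    Hypothetical helper for illustration only; not an Agno API.
    """
    problems = []
    for index, prompt in enumerate(prompts):
        for key in ("title", "message"):
            value = prompt.get(key)
            # Each key must be present and hold a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"prompt {index}: missing or empty '{key}'")
    return problems


# The same prompts used in the example above.
suggested_prompts = [
    {"title": "Analyze", "message": "Send me an image and I'll analyze it in detail"},
    {"title": "Generate", "message": "Generate an image of a sunset over mountains"},
    {"title": "Search", "message": "Search for the latest AI art trends"},
]

print(validate_prompts(suggested_prompts))  # prints []
```

An empty list means every prompt has both fields; any other result names the offending entry by position.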

Usage

Set Up Your Virtual Environment

uv venv --python 3.12
source .venv/bin/activate

Set Environment Variables

export SLACK_TOKEN=xoxb-your-bot-user-token
export SLACK_SIGNING_SECRET=your-signing-secret
export OPENAI_API_KEY=your-openai-api-key
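Starting the server with one of these variables unset usually surfaces as an authentication error later on. A minimal sketch, assuming you want to fail early instead: the snippet below checks the three variable names listed above. The helper missing_env_vars is a hypothetical name introduced here, not something Agno provides.

```python
import os

# Variable names taken from the exports above.
REQUIRED_ENV_VARS = ("SLACK_TOKEN", "SLACK_SIGNING_SECRET", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```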

Install Dependencies

uv pip install -U "agno[slack]" openai

Run Example

python multimodal_team.py

Key Features

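Coordinated Multi-Agent Team: A coordinator model (gpt-4o) routes each task to the specialized member that matches the request type, and chains both members when an image must first be described and then recreated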
Vision Analysis: The Vision Analyst processes images and files shared in Slack using GPT-4o multimodal capabilities
Image Generation: The Creative Agent generates images with DALL-E and searches the web for reference material
Suggested Prompts: Pre-configured prompt buttons appear in the Slack assistant drawer for quick access
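The routing rules in the team instructions can be restated as a small lookup table. This is only an illustrative sketch: in the real team the coordinator model decides whom to delegate to from the natural-language instructions, and the task-type labels and the plan_delegation helper below are hypothetical names introduced here, not part of Agno.

```python
# Delegation order per task type, restating the team instructions.
# The task-type labels are hypothetical; the member names come from the example.
DELEGATION_ORDER = {
    "analyze": ["Vision Analyst"],
    "generate": ["Creative Agent"],
    "search": ["Creative Agent"],
    # Recreate or modify an image: describe it first, then generate a new version.
    "recreate": ["Vision Analyst", "Creative Agent"],
}


def plan_delegation(task_type):
    """Return the ordered list of members that would handle a task type."""
    if task_type not in DELEGATION_ORDER:
        raise ValueError(f"unknown task type: {task_type!r}")
    # Return a copy so callers cannot mutate the table.
    return list(DELEGATION_ORDER[task_type])


print(plan_delegation("recreate"))  # prints ['Vision Analyst', 'Creative Agent']
```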
