Code

from pathlib import Path

from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    agent_id="image-to-text",
    name="Image to Text Agent",
    markdown=True,
    debug_mode=True,
    show_tool_calls=True,
    instructions=[
        "You are an AI agent that can generate text descriptions based on an image.",
        "You have to return a text response describing the image.",
    ],
)
image_path = Path(__file__).parent.joinpath("sample.jpg")
agent.print_response(
    "Write a 3 sentence fiction story about the image",
    images=[Image(filepath=image_path)],
    stream=True,
)

Usage

1

Create a virtual environment

Open the Terminal and create a python virtual environment.

python3 -m venv .venv
source .venv/bin/activate
2

Set your API key

export OPENAI_API_KEY=xxx
3

Install libraries

pip install -U openai agno
4

Run Agent

python cookbook/agent_concepts/multimodal/image_to_text_agent.py