Skip to main content
This example demonstrates how to analyze images and generate structured output using Pydantic models, creating movie scripts based on image content.

Code

image_to_structured_output.py
from typing import List

from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIResponses
from pydantic import BaseModel, Field
from rich.pretty import pprint


class MovieScript(BaseModel):
    name: str = Field(..., description="Give a name to this movie")
    setting: str = Field(
        ..., description="Provide a nice setting for a blockbuster movie."
    )
    characters: List[str] = Field(..., description="Name of characters for this movie.")
    storyline: str = Field(
        ..., description="3 sentence storyline for the movie. Make it exciting!"
    )


agent = Agent(model=OpenAIResponses(id="gpt-5.2"), output_schema=MovieScript)

response = agent.run(
    "Write a movie about this image",
    images=[
        Image(
            url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg"
        )
    ],
    stream=True,
)

for event in response:
    pprint(event.content)

Usage

1

Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate
2

Install dependencies

uv pip install -U agno openai pydantic rich
3

Export your OpenAI API key

  export OPENAI_API_KEY="your_openai_api_key_here"
4

Run Agent

python image_to_structured_output.py