Image to Structured Output

This example demonstrates how to analyze images and generate structured output using Pydantic models, creating movie scripts based on image content.

Code

image_to_structured_output.py

from typing import List

from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIChat
from pydantic import BaseModel, Field
from rich.pretty import pprint


class MovieScript(BaseModel):
    name: str = Field(..., description="Give a name to this movie")
    setting: str = Field(
        ..., description="Provide a nice setting for a blockbuster movie."
    )
    characters: List[str] = Field(..., description="Name of characters for this movie.")
    storyline: str = Field(
        ..., description="3 sentence storyline for the movie. Make it exciting!"
    )


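# Agent whose replies are structured according to the MovieScript schema
agent = Agent(model=OpenAIChat(id="gpt-5-mini"), output_schema=MovieScript)

# Pass the image alongside the prompt; stream=True yields events as the run progresses
response = agent.run(
    "Write a movie about this image",
    images=[
        Image(
            url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg"
        )
    ],
    stream=True,
)

# Print the content carried by each streamed event
for event in response:
    pprint(event.content)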
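
To see what the schema enforces without calling the model, the following is a minimal sketch that validates a hand-written JSON reply against the same MovieScript model. It assumes Pydantic v2 (`model_validate_json`); the sample JSON is hypothetical and written only for illustration, not real model output.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class MovieScript(BaseModel):
    name: str = Field(..., description="Give a name to this movie")
    setting: str = Field(
        ..., description="Provide a nice setting for a blockbuster movie."
    )
    characters: List[str] = Field(..., description="Name of characters for this movie.")
    storyline: str = Field(
        ..., description="3 sentence storyline for the movie. Make it exciting!"
    )


# Hypothetical reply, hand-written for illustration (not real model output).
sample_json = """
{
  "name": "Fog Over the Gate",
  "setting": "San Francisco at dawn",
  "characters": ["Maya", "Inspector Cole"],
  "storyline": "An engineer finds a hidden message. She races to decode it. The city depends on her."
}
"""

# A well-formed reply parses into a typed MovieScript instance.
script = MovieScript.model_validate_json(sample_json)
print(script.name)
print(script.characters)

# A reply that omits required fields raises ValidationError.
try:
    MovieScript.model_validate_json('{"name": "Untitled"}')
    missing = []
except ValidationError as exc:
    # Collect the names of the fields reported as missing.
    missing = sorted(str(err["loc"][0]) for err in exc.errors())
    print(missing)
```

Every field is declared with `...` as its default, so all four are required; a reply lacking any of them is rejected rather than silently accepted.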

Usage

Create a virtual environment

Open a terminal and create a Python virtual environment.

python3 -m venv .venv
source .venv/bin/activate

Install libraries

pip install -U agno openai pydantic rich

Export your OpenAI API key

export OPENAI_API_KEY="your_openai_api_key_here"
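
If you want the script to fail early with a clear message when the key is missing, a small check like the sketch below can be placed before the agent is created. The helper name `has_openai_key` is hypothetical and not part of Agno or the OpenAI client; it only inspects the environment.

```python
import os


def has_openai_key(env=None) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    # Allow a mapping to be passed in for testing; default to the process environment.
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before running the agent.")
```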

Create a Python file

Create a Python file and add the above code.

touch image_to_structured_output.py

Run Agent

python image_to_structured_output.py

Find All Cookbooks

Explore all the available cookbooks in the Agno repository. Click the link below to view the code on GitHub: Agno Cookbooks on GitHub
