This example demonstrates how to create an Agno agent that can process videos to generate and embed captions using MoviePy and OpenAI tools.

Code

from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.moviepy_video import MoviePyVideoTools
from agno.tools.openai import OpenAITools

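# Video toolkit: audio extraction, SRT creation, and caption embedding
video_tools = MoviePyVideoTools(
    process_video=True, generate_captions=True, embed_captions=True
)

# OpenAI toolkit: used here for audio transcription
openai_tools = OpenAITools()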

video_caption_agent = Agent(
    name="Video Caption Generator Agent",
    model=OpenAIResponses(
        id="gpt-5.2",
    ),
    tools=[video_tools, openai_tools],
    description="You are an AI agent that can generate and embed captions for videos.",
    instructions=[
        "When a user provides a video, process it to generate captions.",
        "Use the video processing tools in this sequence:",
        "1. Extract audio from the video using extract_audio",
        "2. Transcribe the audio using transcribe_audio",
        "3. Generate SRT captions using create_srt",
        "4. Embed captions into the video using embed_captions",
    ],
    markdown=True,
)

# Replace {video with location} with the path to your video file before running
video_caption_agent.print_response(
    "Generate captions for {video with location} and embed them in the video"
)
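
Step 3 of the agent's instructions produces captions in SRT format. As a rough illustration of what that format looks like, the sketch below turns timed text segments into SRT text. It is not the implementation behind `create_srt`; the `Segment` class and both helper functions are hypothetical names introduced only for this example, and it assumes the transcription yields start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed caption (hypothetical stand-in for a transcription segment)."""

    start: float  # start time in seconds
    end: float  # end time in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Build SRT text: a 1-based index, a time range, then the caption text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between caption blocks
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Hello and welcome."),
        Segment(2.5, 6.0, "Let's add captions."),
    ]
    print(segments_to_srt(demo))
```

Each caption block is separated from the next by a blank line, which is what lets a player tell where one caption ends and the next begins.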

Usage

1. Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate

2. Set your API key

export OPENAI_API_KEY=xxx

3. Install dependencies

uv pip install -U openai moviepy ffmpeg agno

4. Run the agent

python cookbook/agent_basics/multimodal/video_caption_agent.py
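
Before running the agent, it can help to confirm that the setup steps above took effect. The following is a minimal sketch of such a check, not part of Agno: `preflight` is a hypothetical helper that reports a missing API key or missing Python packages using only the standard library.

```python
import importlib.util
import os


def preflight(
    required_modules=("agno", "openai", "moviepy"),
    env_var="OPENAI_API_KEY",
):
    """Return a list of setup problems; an empty list means nothing was found."""
    problems = []
    # The API key must be exported in the shell that launches the script
    if not os.environ.get(env_var):
        problems.append(f"{env_var} is not set")
    # find_spec returns None when a top-level package cannot be imported
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = preflight()
    if issues:
        for issue in issues:
            print(f"Setup problem: {issue}")
    else:
        print("Environment looks ready.")
```

This only checks that the key is present and the packages are importable; it does not verify that the key is valid or that video encoding works on your machine.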