Video Caption Agent

This example shows how to build an Agno agent that generates captions for a video and embeds them into it, using the MoviePy and OpenAI tools.

Code

from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.moviepy_video import MoviePyVideoTools
from agno.tools.openai import OpenAITools

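# MoviePy-based tools for the video steps (audio extraction, SRT creation, caption embedding)
video_tools = MoviePyVideoTools(
    process_video=True, generate_captions=True, embed_captions=True
)

# OpenAI tools, used here for the audio transcription step
openai_tools = OpenAITools()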

video_caption_agent = Agent(
    name="Video Caption Generator Agent",
    model=OpenAIResponses(
        id="gpt-5.2",
    ),
    tools=[video_tools, openai_tools],
    description="You are an AI agent that can generate and embed captions for videos.",
    instructions=[
        "When a user provides a video, process it to generate captions.",
        "Use the video processing tools in this sequence:",
        "1. Extract audio from the video using extract_audio",
        "2. Transcribe the audio using transcribe_audio",
        "3. Generate SRT captions using create_srt",
        "4. Embed captions into the video using embed_captions",
    ],
    markdown=True,
)

# Replace {video with location} with the path to your video file before running
video_caption_agent.print_response(
    "Generate captions for {video with location} and embed them in the video"
)
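
To make step 3 of the instructions more concrete, the following is a minimal sketch of what turning timed transcript segments into SRT text can look like. It is not the toolkit's implementation: the helper names `format_timestamp` and `segments_to_srt`, and the `(start, end, text)` segment shape, are assumptions made for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, caption_text)
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT: a 1-based index, a time range, then the text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # Caption blocks are separated by a blank line
    return "\n".join(blocks)
```

Calling `segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")])` yields two numbered caption blocks, which is the kind of text that the embedding step then burns into the video.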

Usage

Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate

Set your API key

export OPENAI_API_KEY=xxx
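
If you want the script to fail early with a clear message when the key is not set, a small check along these lines could be placed at the top of the example. This is only a sketch; `missing_env_vars` is a hypothetical helper, not part of Agno or the OpenAI SDK.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names from `names` that are unset or empty in `env`.

    `env` defaults to the current process environment (os.environ).
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

For example, `missing_env_vars(["OPENAI_API_KEY"])` returns an empty list when the key is exported, and `["OPENAI_API_KEY"]` otherwise.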

Install dependencies

uv pip install -U openai moviepy ffmpeg agno

Run Agent

python cookbook/agent_basics/multimodal/video_caption_agent.py
