The Docling Reader processes multiple document formats using IBM’s Docling library. It handles PDFs, documents, presentations, spreadsheets, images, audio, video and markup files.

Code

from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.docling_reader import DoclingReader
from agno.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Create a knowledge base with docling reader
knowledge = Knowledge(
    vector_db=PgVector(
        table_name="docling_documents",
        db_url=db_url,
    )
)

# Add documents using DoclingReader
knowledge.insert(
    path="documents/report.pdf",
    reader=DoclingReader(),
)

# Create an agent with the knowledge base
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

# Query the knowledge base
agent.print_response(
    "Summarize the key findings from the report",
    markdown=True,
)
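The db_url in the example follows the SQLAlchemy URL form dialect+driver://user:password@host:port/database, and its values match the PgVector container started in the Usage steps (user, password and database "ai", host port 5532). As a minimal sketch, the helper below assembles the same string from its parts. The name build_db_url is hypothetical and not part of agno; it is only meant to illustrate how the pieces fit together when credentials come from configuration rather than being hard-coded.

```python
from urllib.parse import quote


def build_db_url(user="ai", password="ai", host="localhost", port=5532, database="ai"):
    """Assemble a SQLAlchemy-style URL for the psycopg driver (hypothetical helper)."""
    # Percent-encode credentials so characters such as "@" or "/" do not break the URL
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


db_url = build_db_url()
print(db_url)  # postgresql+psycopg://ai:ai@localhost:5532/ai
```

The defaults reproduce the connection string used above, so the result can be passed to PgVector(db_url=...) unchanged.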

Usage

1. Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate

2. Install dependencies

# Base dependencies
uv pip install -U docling sqlalchemy psycopg pgvector agno openai

# For audio/video processing
uv pip install -U openai-whisper
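Install ffmpeg as well (required for audio/video processing), for example with brew install ffmpeg on macOS or sudo apt install ffmpeg on Debian/Ubuntu.

3. Set environment variables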

export OPENAI_API_KEY=xxx

4. Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16

5. Run Agent

python examples/basics/knowledge/concepts/readers/overview/docling_reader_sync.py
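If the agent fails at startup, the cause is often a skipped step above. The sketch below is a small pre-flight check that uses only the Python standard library; the function name preflight and its parameters are hypothetical and not part of agno or Docling. It checks that OPENAI_API_KEY is exported, optionally that ffmpeg is on PATH, and that something is listening on the host port published by the PgVector container.

```python
import os
import shutil
import socket


def preflight(env=None, host="localhost", port=5532, need_media=False):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    # Step 3: the usage steps export OPENAI_API_KEY before running the agent
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # Step 2: ffmpeg is only required for audio/video files
    if need_media and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Step 4: the PgVector container publishes Postgres on host port 5532
    try:
        with socket.create_connection((host, port), timeout=1):
            pass
    except OSError:
        problems.append(f"cannot reach Postgres at {host}:{port}")

    return problems


if __name__ == "__main__":
    for problem in preflight(need_media=True):
        print("WARNING:", problem)
```

The port check only confirms that a TCP connection can be opened; it does not verify credentials or that the pgvector extension is available.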

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output_format | str | "markdown" | Export format ("markdown", "text", "json", "yaml", "html", "html_split_page", "doctags", "vtt") |
| converter | Optional[DocumentConverter] | None | Custom Docling converter configuration |
| format_options | Optional[dict] | None | Format options dictionary for DocumentConverter |
| chunking_strategy | Optional[ChunkingStrategy] | DocumentChunking() | Strategy for chunking the document |
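Because output_format is a plain string, a typo may only surface once a document is being converted. The following sketch checks the value against the formats listed above before the reader is constructed. The helper reader_kwargs and the constant SUPPORTED_OUTPUT_FORMATS are hypothetical names introduced for illustration; whether DoclingReader performs its own validation is not stated here.

```python
# Values copied from the output_format row of the parameter table
SUPPORTED_OUTPUT_FORMATS = (
    "markdown", "text", "json", "yaml",
    "html", "html_split_page", "doctags", "vtt",
)


def reader_kwargs(output_format="markdown"):
    """Build keyword arguments for DoclingReader, rejecting unknown formats early."""
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(
            f"Unsupported output_format {output_format!r}; "
            f"expected one of {', '.join(SUPPORTED_OUTPUT_FORMATS)}"
        )
    return {"output_format": output_format}


print(reader_kwargs("text"))
# With agno and docling installed, the result could be passed on as:
#   reader = DoclingReader(**reader_kwargs("text"))
```

Keeping the check in a separate helper keeps the sketch runnable without agno installed.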