Docling Reader

The Docling Reader uses IBM’s Docling library to process multiple document formats: PDFs, word-processing documents, presentations, spreadsheets, images, audio, video, and markup files.

Code

from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.docling_reader import DoclingReader
from agno.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Create a knowledge base backed by PgVector
knowledge = Knowledge(
    vector_db=PgVector(
        table_name="docling_documents",
        db_url=db_url,
    )
)

# Add documents using DoclingReader
knowledge.insert(
    path="documents/report.pdf",
    reader=DoclingReader(),
)

# Create an agent with the knowledge base
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

# Query the knowledge base
agent.print_response(
    "Summarize the key findings from the report",
    markdown=True,
)

Usage

Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate

Install dependencies

# Base dependencies
uv pip install -U docling sqlalchemy psycopg pgvector agno openai

# For audio/video processing
uv pip install -U openai-whisper

Install ffmpeg (required for audio/video processing):

macOS: brew install ffmpeg
Ubuntu: sudo apt-get install ffmpeg
Windows: Download from https://ffmpeg.org/download.html
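
Before processing audio or video, it can help to confirm that ffmpeg is actually available on the PATH. The following is a minimal sketch using only the Python standard library; `find_missing_tools` is a hypothetical helper name chosen for illustration and is not part of Agno or Docling.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def find_missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that cannot be found on the PATH.

    `which` is injectable so the lookup can be replaced in tests;
    by default it is the standard library's shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Hypothetical preflight check before running the audio/video examples.
    missing = find_missing_tools(["ffmpeg"])
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("ffmpeg found")
```

Running this check before inserting audio or video files surfaces a missing ffmpeg install early rather than partway through a conversion.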

Set environment variables

export OPENAI_API_KEY=xxx

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16
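
The `db_url` used in the Code section has to match the values passed to `docker run` here: the database name, user, password, and the host port `5532` that maps to the container's `5432`. As a sketch of how the two line up, the hypothetical helper below assembles the URL from those settings. The function name `build_db_url` and its defaults are assumptions for illustration, taken from the command above; it is not part of Agno.

```python
from urllib.parse import quote


def build_db_url(
    user: str = "ai",
    password: str = "ai",
    host: str = "localhost",
    port: int = 5532,  # host port published by `-p 5532:5432`
    database: str = "ai",
) -> str:
    """Build a SQLAlchemy-style URL for the psycopg driver.

    User and password are percent-encoded so that characters such as
    '@' or ':' do not break the URL structure.
    """
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql+psycopg://{user_enc}:{password_enc}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(build_db_url())
```

If the container is started with different credentials or a different published port, the same values would need to be reflected in `db_url`.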

Run Agent

python examples/basics/knowledge/concepts/readers/overview/docling_reader_sync.py

Params

Parameter	Type	Default	Description
`output_format`	`str`	`"markdown"`	Export format (`"markdown"`, `"text"`, `"json"`, `"yaml"`, `"html"`, `"html_split_page"`, `"doctags"`, `"vtt"`)
`converter`	`Optional[DocumentConverter]`	`None`	Custom Docling converter configuration
`format_options`	`Optional[dict]`	`None`	Format options dictionary for DocumentConverter
`chunking_strategy`	`Optional[ChunkingStrategy]`	`DocumentChunking()`	Strategy for chunking the document
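
Since `output_format` is a plain string, a typo would only surface when the reader runs. The sketch below checks a value against the formats listed in the table before it is used. `SUPPORTED_OUTPUT_FORMATS` and `validate_output_format` are hypothetical names introduced here for illustration; they are not part of Agno or Docling, and the reader may perform its own validation.

```python
# Formats copied from the `output_format` row of the parameter table.
SUPPORTED_OUTPUT_FORMATS = (
    "markdown",
    "text",
    "json",
    "yaml",
    "html",
    "html_split_page",
    "doctags",
    "vtt",
)


def validate_output_format(output_format: str = "markdown") -> str:
    """Return the format unchanged if it is listed, else raise ValueError."""
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        allowed = ", ".join(SUPPORTED_OUTPUT_FORMATS)
        raise ValueError(
            f"Unsupported output_format {output_format!r}; expected one of: {allowed}"
        )
    return output_format


if __name__ == "__main__":
    print(validate_output_format("json"))
```

The validated value could then be passed along as `DoclingReader(output_format=...)`, using the parameter name from the table above.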
