Document chunking is a method of splitting documents into smaller chunks based on document structure like paragraphs and sections. It analyzes natural document boundaries rather than splitting at fixed character counts. This is useful when you want to process large documents while preserving semantic meaning and context.
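Conceptually, a structure-based chunker walks the document's natural units (paragraphs, sections) and packs them into chunks up to a size limit, optionally carrying over a few trailing characters for context. The sketch below is an illustration of that idea only, not Agno's DocumentChunking implementation; the chunk_size and overlap names simply mirror the parameters listed at the end of this page.

# Illustration only: a simplified paragraph-based chunker, not Agno's DocumentChunking
def chunk_by_structure(text: str, chunk_size: int = 5000, overlap: int = 0) -> list[str]:
    # Treat blank lines as paragraph boundaries
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        # Close the current chunk when adding this paragraph would exceed chunk_size
        if current and len(current) + len(paragraph) + 2 > chunk_size:
            chunks.append(current)
            # Carry over the last `overlap` characters to preserve context
            current = current[-overlap:] if overlap else ""
        current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks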
1. Create a Python file

touch document_chunking.py
2. Add the following code to your Python file

document_chunking.py
import asyncio
from agno.agent import Agent
from agno.knowledge.chunking.document import DocumentChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(table_name="recipes_document_chunking", db_url=db_url),
)

asyncio.run(knowledge.add_content_async(
    url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=PDFReader(
        name="Document Chunking Reader",
        chunking_strategy=DocumentChunking(),
    ),
))

agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)
3. Create a virtual environment

Open the Terminal and create a Python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
4. Install libraries

pip install -U sqlalchemy psycopg pgvector agno
5. Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16
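Optionally, confirm the database is accepting connections before running the script. This check is not part of the original steps; it is a small sketch that assumes the sqlalchemy and psycopg packages installed in step 4 and the credentials and port from the docker command above.

# Optional: verify the PgVector container is reachable (not part of the original steps)
from sqlalchemy import create_engine, text

# Same credentials and port as the docker run command above
engine = create_engine("postgresql+psycopg://ai:ai@localhost:5532/ai")
with engine.connect() as conn:
    # Prints the Postgres server version if the connection succeeds
    print(conn.execute(text("SELECT version()")).scalar())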
6. Run the script

python document_chunking.py

Document Chunking Params

| Parameter  | Type | Default | Description                                          |
| ---------- | ---- | ------- | ---------------------------------------------------- |
| chunk_size | int  | 5000    | The maximum size of each chunk.                      |
| overlap    | int  | 0       | The number of characters to overlap between chunks.  |
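Both parameters can be set when constructing the chunker; the example below just makes the defaults from the table explicit. Pass the resulting chunking_strategy to PDFReader exactly as in step 2.

from agno.knowledge.chunking.document import DocumentChunking

# Explicitly set the defaults listed in the table above
chunking_strategy = DocumentChunking(chunk_size=5000, overlap=0)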