Document chunking is a method of splitting documents into smaller chunks based on document structure like paragraphs and sections. It analyzes natural document boundaries rather than splitting at fixed character counts. This is useful when you want to process large documents while preserving semantic meaning and context.
Usage
from agno.agent import Agent
from agno.document.chunking.document import DocumentChunking
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.vectordb.pgvector import PgVector
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"
knowledge_base = PDFUrlKnowledgeBase(
urls=["https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
vector_db=PgVector(table_name="recipes_document_chunking", db_url=db_url),
chunking_strategy=DocumentChunking(),
)
knowledge_base.load(recreate=False) # Comment out after first run
agent = Agent(
knowledge_base=knowledge_base,
search_knowledge=True,
)
agent.print_response("How to make Thai curry?", markdown=True)
Document Chunking Params
Parameter | Type | Default | Description |
---|
chunk_size | int | 5000 | The maximum size of each chunk. |
overlap | int | 0 | The number of characters to overlap between chunks. |