Chunking is the process of dividing content into manageable pieces before converting them into embeddings and storing them in a vector database. The chunking strategy you choose directly affects search quality and retrieval accuracy, because each strategy draws chunk boundaries differently. For example, when processing a recipe book:
  • Fixed Size: Splits text every 500 characters (which may break recipes mid-instruction)
  • Semantic: Keeps complete recipes together based on meaning
  • Document: Each page becomes a chunk
The strategy affects whether you get complete, relevant results or fragmented pieces.
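
The contrast between the first two strategies can be illustrated with a minimal, library-independent sketch. The helpers `fixed_size_chunks` and `paragraph_chunks` are hypothetical and not part of Agno. Splitting on blank lines is only a simple stand-in for semantic chunking, which in practice relies on embeddings rather than whitespace. The sample text and the 40-character size are chosen just to keep the example small.

```python
# Illustrative sketch only: these helpers are assumptions, not Agno APIs.
RECIPES = (
    "Pancakes\n"
    "Mix flour, milk, and eggs. Fry the batter in a hot pan until golden.\n"
    "\n"
    "Tomato Soup\n"
    "Simmer tomatoes with onion and garlic. Blend until smooth and season."
)


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut the text every `size` characters, ignoring its structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so that each recipe stays in a single chunk."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


fixed = fixed_size_chunks(RECIPES, size=40)
by_recipe = paragraph_chunks(RECIPES)

# The first fixed-size chunk stops partway through an instruction,
# while each paragraph-based chunk holds one whole recipe.
print(repr(fixed[0]))
print(by_recipe)
```

With fixed-size splitting, a query about pancakes may retrieve a chunk that contains only part of the method. The boundary-aware split returns the whole recipe at the cost of uneven chunk sizes.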

Using Chunking Strategies

Chunking strategies are configured when setting up readers for your knowledge base:
from agno.knowledge.chunking.semantic import SemanticChunking
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector
from agno.db.postgres import PostgresDb

# Configure chunking strategy with a reader
reader = PDFReader(
    chunking_strategy=SemanticChunking(similarity_threshold=0.7)
)

# Set up ContentsDB - tracks content metadata
contents_db = PostgresDb(
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
    knowledge_table="knowledge_contents"
)

# Set up vector database - stores embeddings
vector_db = PgVector(
    table_name="documents",
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai"
)

# Create Knowledge with both databases
knowledge = Knowledge(
    name="Chunking Knowledge Base",
    vector_db=vector_db,
    contents_db=contents_db
)

# Add content with chunking applied
knowledge.add_content(
    path="documents/cookbook.pdf",
    reader=reader,
)

Choosing a Strategy

The choice of chunking strategy depends on your content type and use case:
  • Text documents: Semantic chunking maintains context and meaning
  • Structured documents: Document or Markdown chunking preserves hierarchy
  • Tabular data: CSV Row chunking treats each row as a separate entity
  • Mixed content: Recursive chunking provides flexibility with multiple separators
  • Uniform processing: Fixed Size chunking ensures consistent chunk dimensions
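
To make the "multiple separators" idea behind recursive chunking concrete, here is a rough sketch in plain Python. It is not Agno's implementation: the function `recursive_chunks`, its separator order, and the 60-character limit are all assumptions made for illustration. The sketch tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
# Illustrative sketch only: not Agno's recursive chunking implementation.
def recursive_chunks(
    text: str,
    max_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split `text` into chunks of at most `max_size` characters.

    Pieces are merged greedily while they fit. A piece that is too long on
    its own is split again with the next, finer separator. Separators that
    fall on a chunk boundary are dropped.
    """
    if len(text) <= max_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]

    head, *rest = separators
    chunks: list[str] = []
    current = ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= max_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, max_size, tuple(rest)))
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Intro paragraph that is short.\n\n"
    "A much longer paragraph. It has several sentences. Each one is brief. "
    "Together they exceed the limit.\n\n"
    "Closing line."
)
chunks = recursive_chunks(doc, max_size=60)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Short paragraphs stay whole, and only the oversized middle paragraph is broken at sentence boundaries. This is why a recursive approach suits mixed content where no single separator fits every part of the document.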
Each reader has a default chunking strategy that works well for its content type, but you can override it by specifying a chunking_strategy parameter when configuring the reader.
Strategies also differ in processing time and memory usage, so weigh your performance requirements alongside retrieval quality when choosing one.