Code chunking splits code based on its structure, leveraging Abstract Syntax Trees (ASTs) to create contextually relevant segments. It uses the Chonkie library to identify natural code boundaries such as functions, classes, and blocks. This preserves code semantics better than fixed-size chunking: related code stays together in the same chunk, and splits occur only at meaningful structural boundaries. Code chunking supports several built-in tokenizers as well as a custom Tokenizer instance.
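
To see this boundary-aware splitting in isolation, here is a minimal sketch that calls Chonkie's CodeChunker directly. It is a sketch only: the sample source and the deliberately small chunk_size are illustrative, and it assumes chonkie is installed with the [code] extra, as in the install step below.

from chonkie import CodeChunker

# Tiny sample source; in practice this would be a full file.
code = '''
def add(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return f"Hello, {name}!"
'''

# A deliberately small chunk_size forces splits, which land on the
# function/class boundaries rather than mid-statement.
chunker = CodeChunker(language="python", chunk_size=80)
for chunk in chunker.chunk(code):
    print(chunk.token_count, repr(chunk.text))
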
1. Create a Python file named code_chunking.py

from agno.agent import Agent
from agno.knowledge.chunking.code import CodeChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.text_reader import TextReader
from agno.vectordb.pgvector import PgVector

# Connection string for the PgVector container started in the "Run PgVector" step
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge = Knowledge(
    vector_db=PgVector(table_name="python_code_chunking", db_url=db_url),
)

# Load a Python source file and chunk it along AST boundaries
knowledge.insert(
    url="https://raw.githubusercontent.com/agno-agi/agno/main/libs/agno/agno/session/workflow.py",
    reader=TextReader(
        chunking_strategy=CodeChunking(
            tokenizer="gpt2",   # measure chunk sizes in GPT-2 tokens
            chunk_size=500,     # maximum tokens per chunk
            language="python",  # parse with the Python grammar
        ),
    ),
)

agent = Agent(knowledge=knowledge, search_knowledge=True)
agent.print_response("How does the Workflow class work?", markdown=True)
2. Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate
3. Install dependencies

uv pip install -U agno sqlalchemy psycopg pgvector "chonkie[code]" openai
4. Set OpenAI Key

Set your OPENAI_API_KEY as an environment variable. You can get one from OpenAI.

Mac
export OPENAI_API_KEY=sk-***
Windows
setx OPENAI_API_KEY sk-***
5. Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16
6. Run the script

python code_chunking.py

Code Chunking Params

| Parameter | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| tokenizer | Union[str, TokenizerProtocol] | "character" | The tokenizer used to measure chunk sizes. Supports several built-in tokenizers or a custom Tokenizer instance. |
| chunk_size | int | 2048 | Maximum size of each chunk in tokens, as measured by the selected tokenizer. |
| language | Union[Literal["auto"], Any] | "auto" | The programming language to parse. Use "auto" for automatic detection, or specify a tree-sitter language name (e.g., "python", "javascript", "go", "rust"). |
| include_nodes | bool | False | Whether to include AST nodes. Note: Chonkie's base Chunk type does not store node information. |
| chunker_params | Optional[Dict[str, Any]] | None | Additional parameters passed directly to Chonkie's CodeChunker. |
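
As an illustration of these parameters, the sketch below wires up a hypothetical non-default configuration. It assumes a Hugging Face tokenizers.Tokenizer satisfies the TokenizerProtocol accepted by the tokenizer parameter; the model name and the other values are placeholders, not recommendations.

from tokenizers import Tokenizer
from agno.knowledge.chunking.code import CodeChunking

# Hypothetical settings; every value below is illustrative.
chunking = CodeChunking(
    tokenizer=Tokenizer.from_pretrained("bert-base-uncased"),  # a custom Tokenizer instance
    chunk_size=1024,        # max tokens per chunk, measured by the tokenizer above
    language="javascript",  # explicit tree-sitter language instead of "auto"
    include_nodes=False,    # Chonkie's base Chunk type does not store AST nodes
    chunker_params={},      # extra kwargs forwarded to Chonkie's CodeChunker
)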