Code chunking splits code along its structure, using Abstract Syntax Trees (ASTs) to create contextually relevant segments. It uses the Chonkie library to identify natural code boundaries such as functions, classes, and blocks. This preserves code semantics better than fixed-size chunking: related code stays together in the same chunk, and splits occur at meaningful structural boundaries. Code chunking supports several built-in tokenizers or a custom Tokenizer instance.
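
To make the idea of splitting at structural boundaries concrete, here is a minimal sketch that uses Python's standard `ast` module instead of Chonkie. It is not how Chonkie or Agno implement code chunking; the helper name `split_top_level` and the sample source are hypothetical and exist only for illustration.

```python
import ast


def split_top_level(source: str) -> list[str]:
    """Return one source segment per top-level statement (import, function, class, ...)."""
    tree = ast.parse(source)
    segments = []
    for node in tree.body:
        # get_source_segment recovers the exact source text covered by the node.
        segment = ast.get_source_segment(source, node)
        if segment is not None:
            segments.append(segment)
    return segments


# Hypothetical sample input, used only to demonstrate the split.
SAMPLE = '''import math

def area(r):
    return math.pi * r ** 2

class Circle:
    def __init__(self, r):
        self.r = r
'''

segments = split_top_level(SAMPLE)
for segment in segments:
    print(segment)
    print("---")
```

Each segment is a complete syntactic unit, so a function body is never cut in half the way it could be with fixed-size splitting. This simplified version drops comments and blank lines that sit between top-level statements.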
Set your `OPENAI_API_KEY` as an environment variable. You can get one from OpenAI.

Code Chunking Params

| Parameter | Type | Default | Description |
|---|---|---|---|
| `tokenizer` | `Union[str, TokenizerProtocol]` | `"character"` | The tokenizer for measuring chunk sizes. Supports several built-in tokenizers or a custom Tokenizer instance. |
| `chunk_size` | `int` | `2048` | Maximum size of each chunk in tokens (based on the selected tokenizer). |
| `language` | `Union[Literal["auto"], Any]` | `"auto"` | The programming language to parse. Use `"auto"` for automatic detection or specify a tree-sitter language name (e.g., `"python"`, `"javascript"`, `"go"`, `"rust"`). |
| `include_nodes` | `bool` | `False` | Whether to include AST nodes. Note: Chonkie's base Chunk type does not store node information. |
| `chunker_params` | `Optional[Dict[str, Any]]` | `None` | Additional parameters to pass directly to Chonkie's CodeChunker. |
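
The interaction between `tokenizer` and `chunk_size` can be illustrated with a small standalone sketch. It assumes a simple greedy strategy and is not Chonkie's actual algorithm; `pack_units` and `count_tokens` are hypothetical names, with `count_tokens` standing in for whichever tokenizer is selected (the default `len` mimics a character tokenizer).

```python
from typing import Callable, Iterable


def pack_units(
    units: Iterable[str],
    chunk_size: int,
    count_tokens: Callable[[str], int] = len,
) -> list[str]:
    """Greedily group code units so each chunk stays within chunk_size tokens."""
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for unit in units:
        size = count_tokens(unit)
        # Start a new chunk when adding this unit would exceed the budget.
        if current and current_size + size > chunk_size:
            chunks.append("\n".join(current))
            current, current_size = [], 0
        current.append(unit)
        current_size += size
    if current:
        chunks.append("\n".join(current))
    return chunks


units = ["def a(): pass", "def b(): pass", "def c(): pass"]

# Character counting: each unit is 13 characters, so only two fit in 30.
by_chars = pack_units(units, chunk_size=30)

# Whitespace word counting: each unit is 3 words, so all three fit in 30.
by_words = pack_units(units, chunk_size=30, count_tokens=lambda s: len(s.split()))

print(len(by_chars), len(by_words))
```

The same `chunk_size` produces different chunk boundaries depending on how tokens are counted, which is why the limit should be chosen with the tokenizer in mind. In this sketch a single unit larger than `chunk_size` is kept whole in its own chunk, and the joining newlines are not counted toward the budget.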