Code chunking splits source code along its structure, using Abstract Syntax Trees (ASTs) to produce contextually relevant segments. It uses the Chonkie library to identify natural code boundaries such as functions, classes, and blocks. Because splits fall on these structural boundaries rather than at arbitrary offsets, related code stays in the same chunk, which preserves code semantics better than fixed-size chunking.
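To make the idea concrete, here is a minimal sketch of structure-aware chunking. It is not Chonkie's implementation and does not use tree-sitter. It assumes Python input only and relies on the standard-library `ast` module. The function name `chunk_python_source` is hypothetical, and size is measured in characters.

```python
import ast


def chunk_python_source(source: str, chunk_size: int = 2048) -> list[str]:
    """Illustrative only: greedily group top-level AST nodes into chunks
    of at most chunk_size characters. Not Chonkie's algorithm."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)

    # Slice out the source text of each top-level statement
    # (import, function, class, ...).
    segments = []
    for node in tree.body:
        # Decorators sit above node.lineno, so start at the first decorator if any.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        segments.append("".join(lines[start - 1 : node.end_lineno]))

    chunks, current = [], ""
    for segment in segments:
        # Close the current chunk when adding this segment would exceed the limit.
        if current and len(current) + len(segment) > chunk_size:
            chunks.append(current)
            current = ""
        current += segment
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def add(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return "Hello, " + name
'''

chunks = chunk_python_source(SAMPLE, chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

With a small limit, each top-level definition lands in its own chunk and no function or class is cut in half. This sketch makes two simplifications: blank lines and comments between top-level statements are dropped, and a single definition larger than `chunk_size` is emitted whole rather than split at nested boundaries.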
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `tokenizer` | `Union[str, TokenizerProtocol]` | `"character"` | The tokenizer for measuring chunk sizes. Supports several built-in tokenizers or a custom Tokenizer instance. |
| `chunk_size` | `int` | `2048` | Maximum size of each chunk in tokens (based on the selected tokenizer). |
| `language` | `Union[Literal["auto"], Any]` | `"auto"` | The programming language to parse. Use `"auto"` for automatic detection or specify a tree-sitter language name (e.g., `"python"`, `"javascript"`, `"go"`, `"rust"`). |
| `include_nodes` | `bool` | `False` | Whether to include AST nodes. Note: Chonkie's base Chunk type does not store node information. |
| `chunker_params` | `Optional[Dict[str, Any]]` | `None` | Additional parameters to pass directly to Chonkie's CodeChunker. |