Tokenizer instance.
Set your OPENAI_API_KEY as an environment variable. You can get one from OpenAI.
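If you prefer to set the variable from Python rather than your shell, a minimal sketch (the `sk-...` value is a placeholder, not a real key):

```python
import os

# Export OPENAI_API_KEY in your shell when possible; setting it in-process
# before any OpenAI client is created also works.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # placeholder; never commit real keys
```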
## Code Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| tokenizer | Union[str, TokenizerProtocol] | "character" | The tokenizer used to measure chunk sizes. Supports several built-in tokenizers or a custom Tokenizer instance. |
| chunk_size | int | 2048 | Maximum size of each chunk in tokens (as measured by the selected tokenizer). |
| language | Union[Literal["auto"], Any] | "auto" | The programming language to parse. Use "auto" for automatic detection or specify a tree-sitter language name (e.g., "python", "javascript", "go", "rust"). |
| include_nodes | bool | False | Whether to include AST nodes. Note: Chonkie's base Chunk type does not store node information. |
| chunker_params | Optional[Dict[str, Any]] | None | Additional parameters passed directly to Chonkie's CodeChunker. |
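These parameters are ultimately forwarded to Chonkie's CodeChunker, so a minimal sketch of the equivalent direct call looks like the following. It assumes the chonkie package is installed and that the keyword names below (in particular `tokenizer_or_token_counter` for the tokenizer argument) match your installed chonkie version; names may vary across releases.

```python
# Minimal sketch: constructing Chonkie's CodeChunker with the parameters
# documented in the table above. Keyword names are assumptions based on
# recent chonkie releases and may differ in your version.
from chonkie import CodeChunker

chunker = CodeChunker(
    tokenizer_or_token_counter="character",  # corresponds to `tokenizer`
    chunk_size=2048,                         # maximum tokens per chunk
    language="python",                       # or "auto" for automatic detection
    include_nodes=False,                     # base Chunk objects carry no AST nodes
)

source = '''\
def greet(name: str) -> str:
    return f"Hello, {name}!"

class Greeter:
    def __call__(self, name: str) -> str:
        return greet(name)
'''

# Each returned chunk exposes the chunked text and its token count.
for chunk in chunker.chunk(source):
    print(chunk.token_count, repr(chunk.text[:40]))
```

Any keys supplied via chunker_params would presumably be merged into this constructor call in the same way.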