1. Create a Python file
2. Set up your virtual environment
3. Install dependencies
4. Set OpenAI Key: set your OPENAI_API_KEY as an environment variable (the exact command differs between Mac and Windows). You can get one from OpenAI.
5. Run PgVector
6. Run the script (a minimal sketch of such a script follows below)
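
For orientation, here is a minimal sketch of the kind of script the steps above run: it loads a document into a PgVector-backed knowledge base using semantic chunking and then asks an agent a question about it. The import paths, the knowledge-base class, the placeholder PDF URL, the table name, and the connection string are assumptions modeled on Agno's typical examples; substitute the values from your own setup and Agno version.

```python
# Minimal sketch only: import paths, the knowledge-base class, the placeholder URL,
# and the PgVector connection string are assumptions; adapt them to your Agno version.
from agno.agent import Agent
from agno.document.chunking.semantic import SemanticChunking
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.vectordb.pgvector import PgVector

# Assumes the PgVector container from the steps above is reachable locally.
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://example.com/sample.pdf"],  # placeholder document URL
    vector_db=PgVector(table_name="semantic_chunking_documents", db_url=db_url),
    chunking_strategy=SemanticChunking(),  # defaults listed in the params table below
)
knowledge_base.load(recreate=False)  # chunk, embed, and store the document

agent = Agent(knowledge=knowledge_base, search_knowledge=True)
agent.print_response("What does the document cover?", markdown=True)
```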
Semantic Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| embedder | Union[str, Embedder, BaseEmbeddings] | OpenAIEmbedder | The embedder configuration. Can be an Agno Embedder (e.g., OpenAIEmbedder, GeminiEmbedder), a Chonkie BaseEmbeddings instance (e.g., OpenAIEmbeddings), or a string model identifier (e.g., "text-embedding-3-small") for Chonkie's AutoEmbeddings. |
| chunk_size | int | 5000 | Maximum tokens allowed per chunk. |
| similarity_threshold | float | 0.5 | Similarity threshold for grouping sentences (0-1). Lower values create larger groups (fewer chunks). |
| similarity_window | int | 3 | Number of sentences to consider for similarity calculation. |
| min_sentences_per_chunk | int | 1 | Minimum number of sentences per chunk. |
| min_characters_per_sentence | int | 24 | Minimum number of characters per sentence. |
| delimiters | List[str] | [". ", "! ", "? ", "\n"] | Delimiters to split sentences on. |
| include_delimiters | Literal["prev", "next", None] | "prev" | Include delimiters in the chunk text. Specify whether to include with the previous or next sentence. |
| skip_window | int | 0 | Number of groups to skip when looking for similar content to merge. 0 (default) uses standard semantic grouping; higher values enable merging of non-consecutive semantically similar groups. |
| filter_window | int | 5 | Window length for the Savitzky-Golay filter used in boundary detection. |
| filter_polyorder | int | 3 | Polynomial order for the Savitzky-Golay filter. |
| filter_tolerance | float | 0.2 | Tolerance for the Savitzky-Golay filter boundary detection. |
| chunker_params | Dict[str, Any] | None | Additional parameters to pass directly to Chonkie's SemanticChunker. |
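
As a quick reference, the sketch below shows how these parameters might be passed when constructing the chunking strategy. All values shown are the defaults from the table; the import path for SemanticChunking is an assumption and may differ across Agno versions.

```python
# A sketch of configuring SemanticChunking with the parameters documented above.
# The import path below is an assumption about your Agno version; adjust if needed.
from agno.document.chunking.semantic import SemanticChunking

chunking_strategy = SemanticChunking(
    embedder="text-embedding-3-small",  # string id resolved via Chonkie's AutoEmbeddings
    chunk_size=5000,                    # maximum tokens per chunk
    similarity_threshold=0.5,           # lower values -> larger groups, fewer chunks
    similarity_window=3,                # sentences considered per similarity calculation
    min_sentences_per_chunk=1,
    min_characters_per_sentence=24,
    delimiters=[". ", "! ", "? ", "\n"],
    include_delimiters="prev",          # attach each delimiter to the previous sentence
    skip_window=0,                      # >0 also merges non-consecutive similar groups
    filter_window=5,                    # Savitzky-Golay filter window for boundary detection
    filter_polyorder=3,
    filter_tolerance=0.2,
    chunker_params=None,                # extra kwargs forwarded to Chonkie's SemanticChunker
)
```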