1. Create a Python file.
2. Add the following code to your Python file.
3. Create a virtual environment: open the Terminal and create a Python virtual environment.
4. Install libraries.
5. Set OpenAI key: set your `OPENAI_API_KEY` as an environment variable (the steps differ between Mac and Windows). You can get one from OpenAI.
6. Run PgVector.
7. Run the script.
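The setup steps above can be sketched as shell commands. This is a sketch only: the package set, the PgVector image tag, and the script filename (`semantic_chunking.py`) are assumptions, so check the install instructions for your version before copying.

```shell
# 3. Create and activate a virtual environment (Mac/Linux shown)
python3 -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

# 4. Install libraries (assumed package set for semantic chunking with Chonkie)
pip install -U agno chonkie openai sqlalchemy psycopg pgvector

# 5. Set your OpenAI key (Windows PowerShell: $Env:OPENAI_API_KEY = "sk-...")
export OPENAI_API_KEY="sk-..."

# 6. Run PgVector (image, credentials, and port mapping are assumptions
#    based on common Agno examples)
docker run -d \
  -e POSTGRES_DB=ai -e POSTGRES_USER=ai -e POSTGRES_PASSWORD=ai \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 --name pgvector agnohq/pgvector:16

# 7. Run the script (hypothetical filename)
python semantic_chunking.py
```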
Semantic Chunking Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| `embedder` | `Union[str, Embedder, BaseEmbeddings]` | `OpenAIEmbedder` | The embedder configuration. Can be an Agno `Embedder` (e.g., `OpenAIEmbedder`, `GeminiEmbedder`), a Chonkie `BaseEmbeddings` instance (e.g., `OpenAIEmbeddings`), or a string model identifier (e.g., `"text-embedding-3-small"`) for Chonkie's `AutoEmbeddings`. |
| `chunk_size` | `int` | `5000` | Maximum number of tokens allowed per chunk. |
| `similarity_threshold` | `float` | `0.5` | Similarity threshold for grouping sentences (0-1). Lower values create larger groups (fewer chunks). |
| `similarity_window` | `int` | `3` | Number of sentences to consider when computing similarity. |
| `min_sentences_per_chunk` | `int` | `1` | Minimum number of sentences per chunk. |
| `min_characters_per_sentence` | `int` | `24` | Minimum number of characters per sentence. |
| `delimiters` | `List[str]` | `[". ", "! ", "? ", "\n"]` | Delimiters to split sentences on. |
| `include_delimiters` | `Literal["prev", "next", None]` | `"prev"` | Whether to include each delimiter in the chunk text, attached to the previous or the next sentence. |
| `skip_window` | `int` | `0` | Number of groups to skip when looking for similar content to merge. `0` (the default) uses standard semantic grouping; higher values enable merging of non-consecutive but semantically similar groups. |
| `filter_window` | `int` | `5` | Window length for the Savitzky-Golay filter used in boundary detection. |
| `filter_polyorder` | `int` | `3` | Polynomial order for the Savitzky-Golay filter. |
| `filter_tolerance` | `float` | `0.2` | Tolerance for Savitzky-Golay boundary detection. |
| `chunker_params` | `Dict[str, Any]` | `None` | Additional parameters passed directly to Chonkie's `SemanticChunker`. |
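The effect of `similarity_threshold` can be illustrated with a toy sketch. This is not Chonkie's implementation: the "embedder" here is a hypothetical bag-of-words stand-in for a real embedding model, and the sketch simplifies `similarity_window` to 1 (each sentence is compared only to the previous one). It only shows the grouping rule the table describes: sentences that stay above the threshold share a group, and lower thresholds produce larger groups.

```python
# Toy illustration of threshold-based sentence grouping, NOT Chonkie's code.
from math import sqrt


def embed(sentence: str) -> dict:
    """Hypothetical stand-in embedder: a bag-of-words frequency vector."""
    vec: dict = {}
    for word in sentence.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_sentences(sentences: list, similarity_threshold: float = 0.5) -> list:
    """Start a new group whenever similarity to the previous sentence
    falls below the threshold (similarity_window simplified to 1)."""
    groups: list = []
    for sent in sentences:
        if groups and cosine(embed(groups[-1][-1]), embed(sent)) >= similarity_threshold:
            groups[-1].append(sent)  # similar enough: stay in the current group
        else:
            groups.append([sent])    # dissimilar: open a new group
    return groups


sentences = [
    "Cats are small domestic animals.",
    "Cats are popular pets.",
    "Quantum computing uses qubits.",
]
# The two cat sentences group together; the quantum sentence starts a new group.
print(group_sentences(sentences, similarity_threshold=0.2))
```

Raising the threshold toward 1 splits even related sentences apart, while lowering it toward 0 merges everything, which is the "lower values create larger groups (fewer chunks)" behavior described above.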