Document chunking splits documents into smaller chunks along their structure, such as paragraphs and sections.
Instead of cutting text at fixed character counts, it looks for natural boundaries in the document. This makes it useful for processing large documents while preserving semantic meaning and context within each chunk.
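The idea of splitting on natural boundaries can be illustrated with a minimal sketch. It assumes paragraphs are separated by blank lines and packs whole paragraphs greedily into chunks of at most `chunk_size` characters. The function name `chunk_by_paragraphs` and the hard-split fallback for oversized paragraphs are hypothetical illustrations, not the library's actual implementation.

```python
import re


def chunk_by_paragraphs(text: str, chunk_size: int = 5000) -> list[str]:
    """Hypothetical sketch: greedily pack blank-line-separated paragraphs
    into chunks of at most chunk_size characters."""
    # Treat runs of blank lines as paragraph boundaries.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Assumed fallback: hard-split a paragraph that alone exceeds chunk_size.
        pieces = [para[i:i + chunk_size] for i in range(0, len(para), chunk_size)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= chunk_size:
                # The piece still fits: keep growing the current chunk.
                current = candidate
            else:
                # Close the current chunk and start a new one at this boundary.
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


text = "Intro paragraph.\n\nSecond paragraph here.\n\nThird one."
print(chunk_by_paragraphs(text, chunk_size=40))
# ['Intro paragraph.\n\nSecond paragraph here.', 'Third one.']
```

The first two paragraphs fit together within 40 characters, so they share a chunk; adding the third would exceed the limit, so the split happens at the paragraph boundary rather than mid-sentence.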
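| Parameter  | Type | Default | Description                                                    |
|------------|------|---------|----------------------------------------------------------------|
| chunk_size | int  | 5000    | The maximum size of each chunk, in characters.                 |
| overlap    | int  | 0       | The number of characters to overlap between consecutive chunks. |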
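To show what the `overlap` parameter means, here is one plausible way to apply it, assuming overlap is implemented by prefixing each chunk with the last `overlap` characters of the chunk before it. The function `add_overlap` is a hypothetical illustration; the library may apply overlap differently.

```python
def add_overlap(chunks: list[str], overlap: int = 0) -> list[str]:
    """Hypothetical sketch: prefix each chunk (after the first) with the
    last `overlap` characters of the preceding chunk."""
    if overlap <= 0:
        # With the default of 0, chunks are returned unchanged.
        return list(chunks)
    result = chunks[:1]
    for prev, chunk in zip(chunks, chunks[1:]):
        # Carry trailing context from the previous chunk into this one.
        result.append(prev[-overlap:] + chunk)
    return result


print(add_overlap(["abcdef", "ghij", "kl"], overlap=2))
# ['abcdef', 'efghij', 'ijkl']
```

Note that under this approach a chunk can grow beyond `chunk_size` by up to `overlap` characters. The repeated context helps retrieval when a relevant passage straddles a chunk boundary, at the cost of some duplicated text.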