PDF Bytes Knowledge Base
Learn how to use in-memory PDF bytes in your knowledge base.
The PDFBytesKnowledgeBase reads PDF content from bytes or IO streams, converts it into vector embeddings, and loads those embeddings into a vector database. This is useful when working with dynamically generated PDFs, API responses, or file uploads, because the files never need to be saved to disk.
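To make the "bytes or IO streams" input concrete, here is a minimal, standard-library-only sketch of how in-memory PDF content might be gathered before it is handed to the knowledge base as the pdfs parameter. The helper names to_pdf_bytes and collect_pdfs and the placeholder payload are hypothetical and are not part of the library; the header check is only a simple sanity test, not full PDF validation.

```python
import io
from typing import IO, List, Union

# Every PDF file begins with this marker, e.g. b"%PDF-1.4".
PDF_MAGIC = b"%PDF-"


def to_pdf_bytes(source: Union[bytes, IO[bytes]]) -> bytes:
    """Return raw bytes from either a bytes object or a binary stream.

    Hypothetical helper for illustration; not part of the library.
    """
    if isinstance(source, (bytes, bytearray)):
        data = bytes(source)
    else:
        # Binary file-like object, e.g. io.BytesIO or an uploaded file handle.
        data = source.read()
    # Simple sanity check on the header only; this does not parse the document.
    if not data.startswith(PDF_MAGIC):
        raise ValueError("content does not look like a PDF (missing %PDF- header)")
    return data


def collect_pdfs(sources: List[Union[bytes, IO[bytes]]]) -> List[bytes]:
    """Normalize a mixed list of bytes and streams into a list of bytes."""
    return [to_pdf_bytes(s) for s in sources]


# Placeholder payload standing in for an upload or API response body.
# It carries a PDF header but is not a complete, renderable document.
fake_pdf = b"%PDF-1.4\n% placeholder content for illustration only\n"

# One item arrives as raw bytes, the other as an in-memory stream.
pdfs = collect_pdfs([fake_pdf, io.BytesIO(fake_pdf)])
```

Normalizing to bytes is a design choice made for this sketch; according to the Params table, the pdfs parameter also accepts IO streams directly, so the conversion is optional.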
Usage
We are using a local LanceDB database for this example. Make sure it is set up before you run the code.
knowledge_base.py
Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| pdfs | Union[List[bytes], List[IO]] | - | List of PDF content as bytes or IO streams. |
| exclude_files | List[str] | [] | List of file patterns to exclude (inherited from base class). |
| reader | Union[PDFReader, PDFImageReader] | PDFReader() | A PDFReader or PDFImageReader that converts the PDFs into Documents for the vector database. |
PDFBytesKnowledgeBase is a subclass of the AgentKnowledge class and has access to the same params.
Developer Resources
- View Sync loading Cookbook
- View Async loading Cookbook