The `PDFBytesKnowledgeBase` reads PDF content from bytes or IO streams, converts it into vector embeddings, and loads those embeddings into a vector database. This is useful for dynamically generated PDFs, API responses, or file uploads, because the content never has to be saved to disk first.
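
When the PDF content comes from uploads or API responses, it can help to read every payload into memory and reject anything that is obviously not a PDF before indexing. The sketch below uses only the Python standard library. `to_pdf_bytes`, `PdfSource` and `PDF_MAGIC` are hypothetical names for illustration, not part of agno.

```python
import io
from typing import BinaryIO, List, Union

# Hypothetical helper, not part of agno: reads in-memory PDF sources
# (raw bytes or binary file-like objects) into a plain list of bytes.
PdfSource = Union[bytes, bytearray, memoryview, BinaryIO]

# PDF files normally begin with a "%PDF-" header line (e.g. "%PDF-1.7").
PDF_MAGIC = b"%PDF-"


def to_pdf_bytes(sources: List[PdfSource]) -> List[bytes]:
    """Read each source fully and reject anything without a PDF header."""
    payloads: List[bytes] = []
    for index, source in enumerate(sources):
        if isinstance(source, (bytes, bytearray, memoryview)):
            data = bytes(source)
        else:
            # File-like object, e.g. an upload stream or an HTTP response body
            data = source.read()
        if not data.startswith(PDF_MAGIC):
            raise ValueError(f"source {index} does not start with {PDF_MAGIC!r}")
        payloads.append(data)
    return payloads


# Usage with placeholder content (a header only, not a complete PDF):
placeholder = b"%PDF-1.4\n%placeholder\n"
pdfs = to_pdf_bytes([placeholder, io.BytesIO(placeholder)])
```

The result has the `List[bytes]` form that the `pdfs` parameter accepts. The header check is only a cheap sanity check; it does not prove that the rest of the file is a valid PDF.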

Usage

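This example uses a local LanceDB database. LanceDB runs embedded in your Python process and stores its data in the directory passed as `uri` (here `tmp/lancedb`), so there is no separate server to start.

Install the dependencies:

```shell
pip install agno lancedb pypdf
```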
knowledge_base.py

```python
from agno.agent import Agent
from agno.knowledge.pdf import PDFBytesKnowledgeBase
from agno.vectordb.lancedb import LanceDb

# Embedded LanceDB table stored on local disk under tmp/lancedb
vector_db = LanceDb(
    table_name="recipes_async",
    uri="tmp/lancedb",
)

# The file is read from disk here only to get some PDF bytes to work with;
# bytes from an upload or an API response can be used the same way.
with open("data/pdfs/ThaiRecipes.pdf", "rb") as f:
    pdf_bytes = f.read()

knowledge_base = PDFBytesKnowledgeBase(
    pdfs=[pdf_bytes],
    vector_db=vector_db,
)
knowledge_base.load(recreate=False)  # Comment out after first run

agent = Agent(
    knowledge=knowledge_base,
    search_knowledge=True,  # Let the agent search the knowledge base
)

agent.print_response("How to make Tom Kha Gai?", markdown=True)
```
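
The example passes raw bytes, but the `pdfs` parameter also accepts IO streams such as `io.BytesIO`. This page does not say whether the reader rewinds a stream before reading it, so a cautious approach is to keep each stream open until `load()` has run and to rewind it if your own code has already read from it. The standard-library sketch below shows why: a stream that has been read once returns nothing on a second read until it is rewound. `rewind_all` is a hypothetical helper, not part of agno.

```python
import io
from typing import IO, List


def rewind_all(streams: List[IO[bytes]]) -> List[IO[bytes]]:
    """Hypothetical helper: move every stream back to its first byte."""
    for stream in streams:
        stream.seek(0)
    return streams


payload = b"%PDF-1.4\n%placeholder\n"  # placeholder, not a complete PDF
stream = io.BytesIO(payload)

first_read = stream.read()   # consumes the whole stream
second_read = stream.read()  # already at the end, so this returns b""

rewind_all([stream])
third_read = stream.read()   # full content again after seek(0)

# With agno installed, a rewound, still-open stream could be passed instead of bytes:
# PDFBytesKnowledgeBase(pdfs=[stream], vector_db=vector_db)
```

Reading the bytes inside a `with` block, as the main example does, avoids both issues, because the resulting `bytes` object does not depend on the file staying open.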

Params

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdfs` | `Union[List[bytes], List[IO]]` | - | List of PDF content as bytes or IO streams. |
| `exclude_files` | `List[str]` | `[]` | List of file patterns to exclude (inherited from the base class). |
| `reader` | `Union[PDFReader, PDFImageReader]` | `PDFReader()` | A `PDFReader` or `PDFImageReader` that converts the PDFs into `Document`s for the vector database. |

`PDFBytesKnowledgeBase` is a subclass of `AgentKnowledge`, so it also accepts all of the `AgentKnowledge` parameters.

Developer Resources