Agno Knowledge uses content as the building block of any piece of knowledge. Content can be added to knowledge from different sources.
| Content Origin | Description |
| --- | --- |
| Path | Local files or directories containing files |
| Url | Direct links to files or other sites |
| Text | Raw text content |
| Topic | Search topics from repositories like Arxiv or Wikipedia |
| Remote Content | Content stored in remote repositories like S3 or Google Cloud Storage |
Content must be read and chunked before it can be passed to a VectorDB for embedding, storage, and ultimately retrieval. When content is added to Knowledge, a default reader is selected. A reader parses the content from its origin and splits it into smaller chunks, which are then embedded by the VectorDB.

You can pass a custom reader, or override the default reader or its settings, when adding content. In the example below, an instance of the standard PDFReader class is created with a custom chunk_size. The chunking_strategy and other parameters that influence how content is ingested and processed can be updated in the same way.
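import asyncio

from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

# Assumption: a local PgVector instance is used here for illustration.
# The table name and connection string are placeholders; any supported
# VectorDB can be passed instead.
vector_db = PgVector(
    table_name="pdf_documents",
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
)

# Override the default PDF reader settings with a custom chunk size.
reader = PDFReader(
    chunk_size=1000,
)

knowledge_base = Knowledge(
    vector_db=vector_db,
)

# Read, chunk, and store every PDF found under data/pdf.
asyncio.run(
    knowledge_base.add_content_async(
        path="data/pdf",
        reader=reader,
    )
)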
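To make the effect of chunk_size more concrete, here is a minimal sketch of fixed-size chunking in plain Python. It is not Agno's implementation: the `chunk_text` helper is hypothetical, and it assumes that chunk_size is measured in characters, which may differ from how a given reader or chunking strategy counts.

```python
def chunk_text(text: str, chunk_size: int = 1000) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    Hypothetical helper for illustration only; real readers may split on
    sentence, paragraph, or semantic boundaries instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size; the last slice may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


document = "a" * 2500
chunks = chunk_text(document, chunk_size=1000)
print([len(chunk) for chunk in chunks])  # [1000, 1000, 500]
```

A smaller chunk_size yields more, shorter chunks for the VectorDB to embed, while a larger one yields fewer, longer chunks. Which is preferable depends on the content and on the embedding model in use.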
For more information about the different readers and their capabilities, check out the Readers page.