Readers transform raw content from various sources into structured Document objects that can be embedded, chunked, and stored in vector databases.
What are Readers?
A Reader is a specialized component that knows how to parse and extract content from specific data sources or file formats. Think of readers as translators that convert different content formats into a standardized format that Agno can work with.
Every piece of content that enters your knowledge base must pass through a reader first. The reader’s job is to:
- Parse the raw content from its original format
- Extract the meaningful text and metadata
- Structure the content into Document objects
- Apply chunking strategies to break large content into manageable pieces
How Readers Work
All readers inherit from the base Reader class and follow the same processing pattern, described in the steps below.
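To illustrate the inheritance pattern, here is a minimal sketch. `BaseReader`, `StringReader`, `chunk_document`, and the `chunk`/`chunk_size` parameters are hypothetical stand-ins, not Agno's actual classes or signatures. The idea is that shared behaviour such as chunking lives in the base class, and each subclass implements only its format-specific `read()`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Simplified stand-in for a document: text plus metadata."""
    content: str
    name: str = ""
    meta_data: Dict[str, Any] = field(default_factory=dict)


class BaseReader(ABC):
    """Hypothetical base class: subclasses implement only read()."""

    def __init__(self, chunk: bool = True, chunk_size: int = 500) -> None:
        self.chunk = chunk
        self.chunk_size = chunk_size

    @abstractmethod
    def read(self, source: str) -> List[Document]:
        """Parse a source and return a list of documents."""

    def chunk_document(self, doc: Document) -> List[Document]:
        """Shared helper: split a document into fixed-size character pieces."""
        if not self.chunk or len(doc.content) <= self.chunk_size:
            return [doc]
        pieces = []
        starts = range(0, len(doc.content), self.chunk_size)
        for number, start in enumerate(starts, start=1):
            pieces.append(
                Document(
                    content=doc.content[start:start + self.chunk_size],
                    name=doc.name,
                    # Copy the parent's metadata and record the chunk number.
                    meta_data={**doc.meta_data, "chunk": number},
                )
            )
        return pieces


class StringReader(BaseReader):
    """Hypothetical subclass: treats the source string itself as plain text."""

    def read(self, source: str) -> List[Document]:
        doc = Document(content=source, name="string", meta_data={"type": "text"})
        return self.chunk_document(doc)
```

For example, `StringReader(chunk_size=4).read("abcdefghij")` yields three documents ("abcd", "efgh", "ij"). With `chunk=False`, the same input stays a single document.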
The Reading Process
When a reader processes content, it follows these steps:
- Content Ingestion: The reader receives raw content (file, URL, text, etc.)
- Parsing: Extract text and metadata using format-specific logic
- Document Creation: Convert parsed content into Document objects
- Chunking: Apply chunking strategies to break content into smaller pieces
- Return: Provide a list of processed documents ready for embedding
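The five steps can be sketched as a chain of small functions. This is an illustration of the flow only. The function names, the blank-line stripping used as "parsing", and the word-based chunk size are all assumptions, not Agno's internals.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class Document:
    content: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def ingest(source: Union[str, Path]) -> str:
    """Step 1: receive raw content (a file path or raw text)."""
    if isinstance(source, Path):
        return source.read_text(encoding="utf-8")
    return source


def parse(raw: str) -> Dict[str, Any]:
    """Step 2: 'format-specific' parsing; here, drop blank lines and count words."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    text = "\n".join(lines)
    return {"text": text, "meta": {"line_count": len(lines), "word_count": len(text.split())}}


def create_documents(parsed: Dict[str, Any]) -> List[Document]:
    """Step 3: wrap the parsed text and metadata in Document objects."""
    return [Document(content=parsed["text"], meta_data=parsed["meta"])]


def chunk(docs: List[Document], max_words: int) -> List[Document]:
    """Step 4: split each document into pieces of at most max_words words."""
    out: List[Document] = []
    for doc in docs:
        words = doc.content.split()
        for i in range(0, len(words), max_words):
            out.append(
                Document(
                    content=" ".join(words[i:i + max_words]),
                    meta_data={**doc.meta_data, "chunk_index": i // max_words},
                )
            )
    return out


def read(source: Union[str, Path], max_words: int = 100) -> List[Document]:
    """Step 5: return a list of documents ready for embedding."""
    return chunk(create_documents(parse(ingest(source))), max_words)
```

Calling `read("one two three\n\nfour five", max_words=2)` returns three documents: "one two", "three four", and "five". Each one carries the parent's line and word counts.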
Content Types and Specialization
Each reader specializes in a particular content type. Within that scope, a reader can:
- Use format-specific parsing libraries
- Extract relevant metadata
- Handle format-specific challenges (encryption, encoding, etc.)
- Optimize processing for that content type
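Here is a small example of specialization, assuming a hypothetical `read_json_bytes` function that is not Agno's JSONReader. It uses Python's `json` module as the format-specific library. It handles a format-specific quirk by decoding with `utf-8-sig`, which tolerates a byte-order mark, and it records per-record metadata.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    content: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def read_json_bytes(raw: bytes, name: str) -> List[Document]:
    """Hypothetical JSON reader: one Document per top-level record."""
    # Format-specific challenge: tolerate a UTF-8 byte-order mark.
    data = json.loads(raw.decode("utf-8-sig"))
    # A single top-level object is treated as one record.
    records = data if isinstance(data, list) else [data]
    docs = []
    for index, record in enumerate(records):
        docs.append(
            Document(
                # Serialize each record back to stable, key-sorted text.
                content=json.dumps(record, sort_keys=True),
                # Format-specific metadata: source name, position, and keys.
                meta_data={
                    "name": name,
                    "record_index": index,
                    "keys": sorted(record) if isinstance(record, dict) else [],
                },
            )
        )
    return docs
```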
Reader Configuration
Readers are highly configurable to meet different processing needs.
Chunking Control
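One way to picture chunking control is a reader that takes an on/off flag plus a pluggable strategy object. The sketch below is hypothetical. `ConfigurableReader`, `FixedSizeChunking`, and `ParagraphChunking` are illustrative names and do not claim to match Agno's chunking classes or parameters.

```python
from typing import List, Optional, Protocol


class ChunkingStrategy(Protocol):
    def chunk(self, text: str) -> List[str]: ...


class FixedSizeChunking:
    """Split text into windows of at most `size` characters."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size

    def chunk(self, text: str) -> List[str]:
        return [text[i:i + self.size] for i in range(0, len(text), self.size)]


class ParagraphChunking:
    """Split text on blank lines, keeping each paragraph intact."""

    def chunk(self, text: str) -> List[str]:
        return [p.strip() for p in text.split("\n\n") if p.strip()]


class ConfigurableReader:
    """Hypothetical reader whose chunking behaviour is fixed at construction."""

    def __init__(self, chunk: bool = True, strategy: Optional[ChunkingStrategy] = None) -> None:
        self.chunk = chunk
        # Default to paragraph-based chunking when no strategy is supplied.
        self.strategy = strategy if strategy is not None else ParagraphChunking()

    def read(self, text: str) -> List[str]:
        # With chunking disabled, the whole text is returned as one piece.
        return self.strategy.chunk(text) if self.chunk else [text]
```

Keeping the strategy separate from the reader lets the same reader switch between paragraph-sized and fixed-size pieces without any change to its parsing code.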
Content Processing Options
Encoding Control
For text-based readers, you can override the file encoding.
Metadata and Naming
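The sketch below covers the encoding override together with document naming and metadata. `EncodingAwareTextReader` and its `encoding`, `name`, and `metadata` parameters are hypothetical and are not Agno's TextReader signature. The demo writes a Latin-1 file. Decoding that file as UTF-8 would raise `UnicodeDecodeError`, so the encoding is overridden explicitly.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    content: str
    name: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


class EncodingAwareTextReader:
    """Hypothetical text reader with an encoding override and caller-supplied metadata."""

    def __init__(self, encoding: Optional[str] = None) -> None:
        self.encoding = encoding  # None means "use UTF-8"

    def read(self, path: Path, name: Optional[str] = None,
             metadata: Optional[Dict[str, Any]] = None) -> List[Document]:
        encoding = self.encoding or "utf-8"
        text = path.read_text(encoding=encoding)
        return [Document(
            content=text,
            name=name or path.stem,  # fall back to the file name without extension
            meta_data={**(metadata or {}), "encoding": encoding},
        )]


# Demo: a Latin-1 file that is not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_bytes("café".encode("latin-1"))
    docs = EncodingAwareTextReader(encoding="latin-1").read(path, metadata={"team": "docs"})
```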
The Document Output
Readers convert raw content into Document objects, each of which holds the extracted text together with its metadata.
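The dataclass below is a simplified, hypothetical stand-in for such a Document. Its field names (`content`, `id`, `name`, `meta_data`, `embedding`) are assumptions chosen for illustration. The real Agno class may carry additional or differently named fields.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Simplified stand-in for a reader's output unit."""
    content: str                                               # extracted text
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier
    name: Optional[str] = None                                 # e.g. the source file name
    meta_data: Dict[str, Any] = field(default_factory=dict)    # page, chunk index, URL, ...
    embedding: Optional[List[float]] = None                    # filled in later by an embedder


doc = Document(content="Hello", name="greeting.txt", meta_data={"page": 1})
```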
Chunking Integration
One of the most important features of readers is their integration with chunking strategies.
Automatic Chunking
When chunk=True, readers automatically apply chunking strategies to break large documents into smaller, more manageable pieces.
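As an illustration of what automatic chunking can look like, here is a minimal sliding-window chunker with overlap, assuming character-based windows. The `read` function and its `chunk_size` and `overlap` parameters are hypothetical, not Agno's defaults. Each chunk inherits the parent's metadata plus a chunk number, so retrieved pieces can be traced back to their source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    content: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def chunk_document(doc: Document, chunk_size: int, overlap: int) -> List[Document]:
    """Split one document into overlapping character windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[Document] = []
    start = 0
    while True:
        piece = doc.content[start:start + chunk_size]
        chunks.append(Document(piece, {**doc.meta_data, "chunk": len(chunks) + 1}))
        if start + chunk_size >= len(doc.content):
            break  # this window reached the end of the text
        start += step
    return chunks


def read(text: str, chunk: bool = True, chunk_size: int = 1000, overlap: int = 100) -> List[Document]:
    """Hypothetical reader entry point: chunk only when chunk=True."""
    doc = Document(text, {"source": "inline"})
    return chunk_document(doc, chunk_size, overlap) if chunk else [doc]
```

With `chunk_size=4` and `overlap=1`, the text "abcdefghij" becomes "abcd", "defg", and "ghij". Each window repeats the last character of the previous one, which keeps some context across chunk boundaries.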
Chunking Strategy Support
Different readers support different chunking strategies, depending on their content type.
Reader Factory and Auto-Selection
Agno can select a reader automatically through the ReaderFactory, which matches a suitable reader to the content being loaded.
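Auto-selection can be modelled as a lookup from file extension to reader class, with one cached instance per format. The sketch below is hypothetical and does not describe how Agno's ReaderFactory is implemented. The reader classes, the extension mapping, and `reader_for` are placeholders, and unknown extensions fall back to a plain-text reader.

```python
from functools import lru_cache
from pathlib import PurePath


class BaseReader:
    """Hypothetical placeholder; a real reader would implement read()."""
    kind = "text"


class PdfReaderSketch(BaseReader):
    kind = "pdf"


class CsvReaderSketch(BaseReader):
    kind = "csv"


class MarkdownReaderSketch(BaseReader):
    kind = "markdown"


# Assumed extension-to-reader mapping, for illustration only.
_READERS_BY_EXTENSION = {
    ".pdf": PdfReaderSketch,
    ".csv": CsvReaderSketch,
    ".md": MarkdownReaderSketch,
}


@lru_cache(maxsize=None)
def get_reader(extension: str) -> BaseReader:
    """Return one shared reader instance per extension (plain text as fallback)."""
    reader_cls = _READERS_BY_EXTENSION.get(extension, BaseReader)
    return reader_cls()


def reader_for(path: str) -> BaseReader:
    """Pick a reader from a file path's extension, case-insensitively."""
    return get_reader(PurePath(path).suffix.lower())
```

Caching means a batch of 1,000 CSV files reuses a single reader object instead of constructing 1,000.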
Supported Readers
The following readers are currently supported:

| Reader Name | Description |
|---|---|
| ArxivReader | Fetches and processes academic papers from arXiv |
| CSVReader | Parses CSV files and converts rows to documents |
| FieldLabeledCSVReader | Converts CSV rows to field-labeled text documents |
| FirecrawlReader | Uses Firecrawl API to scrape and crawl web content |
| JSONReader | Processes JSON files and converts them into documents |
| MarkdownReader | Reads and parses Markdown files |
| PDFReader | Reads and extracts text from PDF files |
| PPTXReader | Reads and extracts text from PowerPoint (.pptx) files |
| TextReader | Handles plain text files |
| WebsiteReader | Crawls entire websites following links recursively |
| WebSearchReader | Searches and reads web search results |
| WikipediaReader | Searches and reads Wikipedia articles |
| YouTubeReader | Extracts transcripts and metadata from YouTube videos |
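To illustrate what "field-labeled" means for the FieldLabeledCSVReader row in the table, here is a sketch using Python's standard `csv` module. The "field: value" line format and the `csv_to_field_labeled_text` name are assumptions. The exact text layout Agno produces may differ.

```python
import csv
import io
from typing import List


def csv_to_field_labeled_text(csv_text: str) -> List[str]:
    """Turn each CSV row into 'field: value' lines, one string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # Label every value with its column header so each row reads on its own.
        lines = [f"{field_name}: {value}" for field_name, value in row.items()]
        documents.append("\n".join(lines))
    return documents
```

A plain CSVReader would keep "Ada,engineer" as a bare row. The labeled form ("name: Ada" / "role: engineer") keeps each chunk meaningful even after it is separated from the header row.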
Async Processing
All readers support asynchronous processing, which lets I/O-bound work such as fetching web pages or reading many files run concurrently.
Usage in Knowledge
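Readers integrate directly with Agno Knowledge. Every piece of content added to a knowledge base is processed by a reader before it is embedded and stored.
Best Practices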
Choose the Right Reader
- Use specialized readers for better extraction quality
- Consider format-specific features (PDF encryption, CSV delimiters, etc.)
Configure Chunking Appropriately
- Smaller chunks for precise retrieval
- Larger chunks for maintaining context
- Use semantic chunking for structured documents
Optimize for Performance
- Use async readers for I/O-heavy operations
- Batch process multiple files when possible
- Cache readers through ReaderFactory when processing many files
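The async and batching advice can be sketched with `asyncio.gather` and a semaphore that caps concurrency. In this sketch, `async_read` is a hypothetical reader that simulates I/O latency with `asyncio.sleep`, and `max_concurrency` is an illustrative knob, not an Agno setting.

```python
import asyncio
from typing import Dict, List


async def async_read(source: str) -> List[str]:
    """Hypothetical async reader: simulates I/O, then returns one 'document'."""
    await asyncio.sleep(0.01)  # stand-in for network or disk latency
    return [f"content of {source}"]


async def read_all(sources: List[str], max_concurrency: int = 4) -> Dict[str, List[str]]:
    """Read many sources concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def read_one(source: str) -> List[str]:
        async with semaphore:
            return await async_read(source)

    # gather preserves input order, so results line up with sources.
    results = await asyncio.gather(*(read_one(s) for s in sources))
    return dict(zip(sources, results))


results = asyncio.run(read_all(["a.pdf", "b.pdf", "c.pdf"]))
```

The semaphore keeps a large batch from opening hundreds of connections or file handles at once, while still overlapping the waits.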
Handle Errors Gracefully
- Readers return empty lists for failed processing
- Check reader logs for debugging information
- Provide fallback readers for unknown formats
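The sketch below applies these three error-handling practices together. It picks a reader by extension, falls back to plain text for unknown formats, logs failures, and returns an empty list instead of raising. `safe_read`, the `READERS` table, and both reader functions are hypothetical illustrations.

```python
import logging
import os
from typing import Callable, Dict, List

logger = logging.getLogger("reader_sketch")

Reader = Callable[[str, str], List[str]]  # (name, raw_text) -> documents


def read_csv_rows(name: str, raw: str) -> List[str]:
    """Toy CSV reader that rejects input without any commas."""
    if "," not in raw:
        raise ValueError(f"{name} does not look like CSV")
    return [line for line in raw.splitlines() if line]


def read_plain_text(name: str, raw: str) -> List[str]:
    """Fallback reader: the whole text becomes one document."""
    return [raw] if raw.strip() else []


READERS: Dict[str, Reader] = {".csv": read_csv_rows, ".txt": read_plain_text}


def safe_read(name: str, raw: str) -> List[str]:
    """Pick a reader by extension, fall back to plain text, and never raise."""
    extension = os.path.splitext(name)[1].lower()
    reader = READERS.get(extension, read_plain_text)  # fallback for unknown formats
    try:
        return reader(name, raw)
    except Exception:
        logger.exception("Failed to read %s", name)  # traceback goes to the log
        return []  # follow the "empty list on failure" convention
```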