Learn how to use readers to convert raw data into searchable knowledge for your Agents.
Readers turn raw data from different sources into `Document` objects that can be embedded, chunked, and stored in vector databases.
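All readers inherit from the base `Reader` class and follow a consistent pattern: they take a source, such as a file path or URL, and return a list of `Document` objects.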
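The shared pattern can be sketched in plain Python. This is a simplified illustration, not the library's actual base class: the `Document` fields, the `read` signature, and the `SimpleTextReader` subclass are hypothetical stand-ins that only show the shape of the contract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


@dataclass
class Document:
    # Hypothetical, simplified stand-in for the library's Document type.
    content: str
    name: Optional[str] = None
    meta_data: Dict[str, Any] = field(default_factory=dict)


class Reader(ABC):
    """Hypothetical base class: every reader turns a source into a list of Documents."""

    @abstractmethod
    def read(self, source: Union[str, Path]) -> List[Document]:
        ...


class SimpleTextReader(Reader):
    """Hypothetical reader that loads a whole text file as a single Document."""

    def read(self, source: Union[str, Path]) -> List[Document]:
        path = Path(source)
        text = path.read_text(encoding="utf-8")
        # Keep the file stem as the name and record where the text came from.
        return [Document(content=text, name=path.stem, meta_data={"source": str(path)})]
```

Because callers only depend on `read` returning documents, a PDF, CSV, or web reader can be swapped in without changing the code that embeds and stores the results.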
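Each returned `Document` holds the extracted text content together with optional identifying fields, such as a name and an ID, and a dictionary of metadata.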
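A minimal model of that structure, written as a Python dataclass, is sketched below. The field names (`content`, `id`, `name`, `meta_data`, `embedding`) are assumptions modeled on a typical document type; check them against the library's actual `Document` class before relying on them.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical, simplified model of a Document returned by a reader."""

    content: str                                   # extracted text
    id: Optional[str] = None                       # optional unique identifier
    name: Optional[str] = None                     # e.g. a file name or page title
    meta_data: Dict[str, Any] = field(default_factory=dict)  # source, page number, etc.
    embedding: Optional[List[float]] = None        # filled in later by an embedder


doc = Document(
    content="Readers convert raw data into documents.",
    name="readers-intro",
    meta_data={"source": "docs/readers.md", "page": 1},
)
record = asdict(doc)  # plain dict, convenient for inspection or storage
```

The embedding starts out empty: a reader only extracts text, and vectors are attached later when the documents are embedded.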
When `chunk=True` is set, readers automatically apply a chunking strategy to split large documents into smaller, more manageable pieces.
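Chunking strategies vary; fixed-size splitting with overlap is one common approach. The sketch below shows it on a minimal, hypothetical `Document` type. The `fixed_size_chunks` function, its `chunk_size` and `overlap` parameters, and the `chunk` metadata key are assumptions for illustration, not the readers' built-in strategy or defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    # Hypothetical minimal Document used only for this sketch.
    content: str
    name: str = ""
    meta_data: Dict[str, Any] = field(default_factory=dict)


def fixed_size_chunks(doc: Document, chunk_size: int = 500, overlap: int = 50) -> List[Document]:
    """Split a Document into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[Document] = []
    for number, start in enumerate(range(0, len(doc.content), step), start=1):
        text = doc.content[start:start + chunk_size]
        # Copy the parent's metadata and record which chunk this is.
        meta = {**doc.meta_data, "chunk": number, "chunk_size": len(text)}
        chunks.append(Document(content=text, name=doc.name, meta_data=meta))
        if start + chunk_size >= len(doc.content):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `fixed_size_chunks(Document(content="abcdefghij"), chunk_size=4, overlap=1)` yields `"abcd"`, `"defg"`, and `"ghij"`: each window starts three characters after the previous one, so neighboring chunks share one character. The overlap keeps text that straddles a boundary partly visible in both chunks.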
You can also create readers through the `ReaderFactory`.
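A factory lets calling code request a reader by file type instead of importing each reader class directly. The sketch below illustrates that pattern with an extension-keyed registry. It is not the library's `ReaderFactory`: the `SimpleReaderFactory` class, its `register` and `create` methods, and the placeholder reader classes are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List


class TextReader:
    # Hypothetical placeholder reader that returns the raw file text.
    def read(self, path: str) -> List[str]:
        return [Path(path).read_text(encoding="utf-8")]


class MarkdownReader(TextReader):
    # Hypothetical placeholder; a real Markdown reader would parse structure.
    pass


class SimpleReaderFactory:
    """Hypothetical factory that picks a reader class from a file extension."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[], object]] = {}

    def register(self, extension: str, builder: Callable[[], object]) -> None:
        # Store extensions in lowercase so lookups are case-insensitive.
        self._builders[extension.lower()] = builder

    def create(self, path: str) -> object:
        extension = Path(path).suffix.lower()
        try:
            return self._builders[extension]()
        except KeyError:
            raise ValueError(f"no reader registered for {extension!r}") from None


factory = SimpleReaderFactory()
factory.register(".txt", TextReader)
factory.register(".md", MarkdownReader)
```

Registering builders rather than instances means each call to `create` returns a fresh reader, so per-reader settings are not shared between callers.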
The following readers are available:

Reader Name | Description |
---|---|
ArxivReader | Fetches and processes academic papers from arXiv |
CSVReader | Parses CSV files and converts rows to documents |
FirecrawlReader | Uses the Firecrawl API to scrape and crawl web content |
JSONReader | Processes JSON files and converts them into documents |
MarkdownReader | Reads and parses Markdown files |
PDFReader | Reads and extracts text from PDF files |
TextReader | Handles plain text files |
WebPageReader | Scrapes and processes content from web pages |
WebsiteReader | Crawls entire websites following links recursively |
WikipediaReader | Searches and reads Wikipedia articles |
YouTubeReader | Extracts transcripts and metadata from YouTube videos |
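The table describes what each reader does but not how. As one illustration, a reader like `CSVReader` that "converts rows to documents" could work along the lines of the sketch below, which uses only Python's standard `csv` module. It is a hypothetical simplification, not the library's implementation: the `csv_rows_to_documents` function, the one-document-per-row layout, the `column: value` text format, and the `row` metadata key are assumptions.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Dict, List, TextIO


@dataclass
class Document:
    # Hypothetical minimal Document for this sketch.
    content: str
    name: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def csv_rows_to_documents(stream: TextIO, name: str) -> List[Document]:
    """Turn each CSV data row into a Document whose text lists 'column: value' pairs."""
    reader = csv.DictReader(stream)  # the first line supplies the column names
    documents: List[Document] = []
    for row_number, row in enumerate(reader, start=1):
        text = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(Document(content=text, name=name, meta_data={"row": row_number}))
    return documents


sample = io.StringIO("product,price\napple,1.20\npear,0.90\n")
docs = csv_rows_to_documents(sample, name="prices")
```

Emitting one document per row keeps each record independently retrievable; for small files, joining all rows into a single document and letting chunking split it is an equally reasonable design.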