Example

The following agent can extract and analyze web content:
from agno.agent import Agent
from agno.tools.trafilatura import TrafilaturaTools

agent = Agent(
    instructions=[
        "You are a web content extraction specialist",
        "Extract clean text and structured data from web pages",
        "Provide detailed analysis of web content and metadata",
        "Help with content research and web data collection",
    ],
    tools=[TrafilaturaTools()],
)

agent.print_response("Extract the main content from https://example.com/article", stream=True)

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output_format | str | "txt" | Default output format (txt, json, xml, markdown, csv, html). |
| include_comments | bool | False | Whether to extract comments along with main text. |
| include_tables | bool | False | Whether to include table content. |
| include_images | bool | False | Whether to include image information (experimental). |
| include_formatting | bool | False | Whether to preserve text formatting. |
| include_links | bool | False | Whether to preserve links (experimental). |
| with_metadata | bool | False | Whether to include metadata in extractions. |
| favor_precision | bool | False | Whether to prefer precision over recall. |
| favor_recall | bool | False | Whether to prefer recall over precision. |
| target_language | Optional[str] | None | Target language filter (ISO 639-1 format). |
| deduplicate | bool | True | Whether to remove duplicate segments. |
| max_crawl_urls | int | 100 | Maximum number of URLs to crawl per website. |
| max_known_urls | int | 1000 | Maximum number of known URLs during crawling. |
| enable_extract_text | bool | True | Whether to extract text content. |
| enable_extract_metadata | bool | True | Whether to extract metadata information. |
| enable_html_to_text | bool | True | Whether to convert HTML content to clean text. |
| enable_batch_extract | bool | True | Whether to extract content from multiple URLs in batch. |
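
The parameters above are passed as keyword arguments when the toolkit is created. The sketch below shows one way to assemble them and catch obvious mistakes first. The helpers `build_toolkit_kwargs` and `build_agent` are hypothetical names introduced for illustration, not part of the toolkit. Rejecting `favor_precision` together with `favor_recall` is an assumption made here as a precaution, since the table does not say how the toolkit treats that combination.

```python
from typing import Any, Dict

# Formats listed for output_format in the parameter table.
SUPPORTED_FORMATS = {"txt", "json", "xml", "markdown", "csv", "html"}


def build_toolkit_kwargs(
    output_format: str = "txt",
    favor_precision: bool = False,
    favor_recall: bool = False,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: check a few options before creating the toolkit."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    # Assumption: asking for both precision and recall is treated as a mistake.
    if favor_precision and favor_recall:
        raise ValueError("Choose favor_precision or favor_recall, not both")
    return {
        "output_format": output_format,
        "favor_precision": favor_precision,
        "favor_recall": favor_recall,
        **extra,  # any other parameter from the table, passed through unchanged
    }


def build_agent(**kwargs: Any):
    """Create an agent with a configured toolkit (needs agno and trafilatura)."""
    # Imported lazily so the validation helper also works without agno installed.
    from agno.agent import Agent
    from agno.tools.trafilatura import TrafilaturaTools

    return Agent(tools=[TrafilaturaTools(**build_toolkit_kwargs(**kwargs))])


# JSON output with metadata and tables, tuned for precision, English pages only.
toolkit_kwargs = build_toolkit_kwargs(
    output_format="json",
    favor_precision=True,
    with_metadata=True,
    include_tables=True,
    target_language="en",
)
```

Calling `build_agent(output_format="json", with_metadata=True)` would then return an agent whose toolkit uses those settings, provided the installed version accepts the parameter names listed in the table.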

Toolkit Functions

| Function | Description |
| --- | --- |
| extract_text | Extract clean text content from a URL or HTML. |
| extract_metadata | Extract metadata information from web pages. |
| html_to_text | Convert HTML content to clean text. |
| crawl_website | Crawl a website and extract content from multiple pages. |
| batch_extract | Extract content from multiple URLs in batch. |
| get_page_info | Get comprehensive page information including metadata. |
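
The agent chooses which of these functions to call based on the prompt it receives. As a rough sketch, the mapping below pairs each function with a prompt that would plausibly lead to it. The prompts, the example.com URLs, and the `run_examples` helper are illustrative assumptions, and the agent is not guaranteed to pick the function shown.

```python
from typing import Dict

# Function names from the table, each paired with an illustrative prompt.
# All URLs are placeholders.
EXAMPLE_PROMPTS: Dict[str, str] = {
    "extract_text": "Extract the main content from https://example.com/article",
    "extract_metadata": "Who wrote https://example.com/article and when was it published?",
    "html_to_text": "Convert the HTML snippet I paste next into clean text",
    "crawl_website": "Crawl https://example.com and summarize the pages you find",
    "batch_extract": "Extract the text of https://example.com/a and https://example.com/b",
    "get_page_info": "Give me the full page information for https://example.com/article",
}


def run_examples() -> None:
    """Send each prompt to an agent (needs agno, trafilatura, and model credentials)."""
    # Imported lazily so the mapping can be inspected without agno installed.
    from agno.agent import Agent
    from agno.tools.trafilatura import TrafilaturaTools

    agent = Agent(tools=[TrafilaturaTools()])
    for function_name, prompt in EXAMPLE_PROMPTS.items():
        print(f"Prompt aimed at {function_name}:")
        agent.print_response(prompt, stream=True)
```

Running `run_examples()` sends six requests in sequence, so it is worth trimming the mapping to the functions of interest before trying it.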

Developer Resources