Trafilatura

Example

The following agent can extract and analyze web content:

from agno.agent import Agent
from agno.tools.trafilatura import TrafilaturaTools

agent = Agent(
    instructions=[
        "You are a web content extraction specialist",
        "Extract clean text and structured data from web pages",
        "Provide detailed analysis of web content and metadata",
        "Help with content research and web data collection",
    ],
    tools=[TrafilaturaTools()],
)

agent.print_response("Extract the main content from https://example.com/article", stream=True)

Toolkit Params

Parameter	Type	Default	Description
`output_format`	`str`	`"txt"`	Default output format (txt, json, xml, markdown, csv, html).
`include_comments`	`bool`	`False`	Whether to extract comments along with main text.
`include_tables`	`bool`	`False`	Whether to include table content.
`include_images`	`bool`	`False`	Whether to include image information (experimental).
`include_formatting`	`bool`	`False`	Whether to preserve text formatting.
`include_links`	`bool`	`False`	Whether to preserve links (experimental).
`with_metadata`	`bool`	`False`	Whether to include metadata in extractions.
`favor_precision`	`bool`	`False`	Whether to prefer precision over recall.
`favor_recall`	`bool`	`False`	Whether to prefer recall over precision.
`target_language`	`Optional[str]`	`None`	Target language filter (ISO 639-1 format).
`deduplicate`	`bool`	`True`	Whether to remove duplicate segments.
`max_crawl_urls`	`int`	`100`	Maximum number of URLs to crawl per website.
`max_known_urls`	`int`	`1000`	Maximum number of known URLs during crawling.
`enable_extract_text`	`bool`	`True`	Whether to extract text content.
`enable_extract_metadata`	`bool`	`True`	Whether to extract metadata information.
`enable_html_to_text`	`bool`	`True`	Whether to convert HTML content to clean text.
`enable_batch_extract`	`bool`	`True`	Whether to extract content from multiple URLs in batch.
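
The parameters above are passed as keyword arguments to the `TrafilaturaTools` constructor. The following is a minimal sketch, not a recommended configuration: the chosen values, the `TOOLKIT_KWARGS` name, and the `build_agent` helper are illustrative assumptions, and only the parameter names come from the table. It assumes that setting an `enable_*` flag to `False` turns off the corresponding function, which is how the table describes those flags.

```python
# Keyword arguments named in the Toolkit Params table; the values are illustrative.
TOOLKIT_KWARGS = {
    "output_format": "markdown",    # instead of the default "txt"
    "include_tables": True,         # keep table content in the output
    "with_metadata": True,          # include metadata in extractions
    "favor_precision": True,        # favor_recall stays at its default (False)
    "target_language": "en",        # ISO 639-1 language code
    "enable_html_to_text": False,   # assumption: turns off html_to_text
    "enable_batch_extract": False,  # assumption: turns off batch_extract
}


def build_agent():
    """Create an agent whose Trafilatura toolkit uses the settings above.

    build_agent is a hypothetical helper, not part of agno.
    """
    # Imported inside the function so the settings can be inspected
    # without agno installed.
    from agno.agent import Agent
    from agno.tools.trafilatura import TrafilaturaTools

    return Agent(
        instructions=["Extract clean Markdown, including tables, from web pages"],
        tools=[TrafilaturaTools(**TOOLKIT_KWARGS)],
    )
```

Calling `build_agent().print_response("Extract the main content from https://example.com/article", stream=True)` would then run the same request as the example at the top of this page, with these settings applied. Setting only one of `favor_precision` and `favor_recall` avoids asking for two opposing trade-offs at once.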

Toolkit Functions

Function	Description
`extract_text`	Extract clean text content from a URL or HTML.
`extract_metadata`	Extract metadata information from web pages.
`html_to_text`	Convert HTML content to clean text.
`crawl_website`	Crawl a website and extract content from multiple pages.
`batch_extract`	Extract content from multiple URLs in batch.
`get_page_info`	Get comprehensive page information including metadata.
