crawl4ai library.
Parameter | Type | Default | Description |
---|---|---|---|
max_length | int | 1000 | Maximum length of the text returned from the webpage. |
timeout | int | 60 | Timeout in seconds for web crawling operations. |
use_pruning | bool | False | Enable content pruning to remove less relevant content. |
pruning_threshold | float | 0.48 | Threshold for content pruning relevance scoring. |
bm25_threshold | float | 1.0 | BM25 scoring threshold for content relevance. |
headless | bool | True | Run browser in headless mode. |
wait_until | str | "domcontentloaded" | Browser wait condition before crawling (e.g., "domcontentloaded", "load", "networkidle"). |
enable_crawl | bool | True | Enable the web crawling functionality. |
all | bool | False | Enable all available functions. When True, all enable flags are ignored. |
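
The defaults in the table above, and the way `all` overrides the individual enable flags, can be sketched as follows. This is only an illustration: `CrawlSettings` and `enabled_functions` are hypothetical names that are not part of the toolkit, and the real constructor may resolve its flags differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawlSettings:
    """Hypothetical stand-in that mirrors the parameter table; not the toolkit's class."""

    max_length: int = 1000
    timeout: int = 60  # seconds
    use_pruning: bool = False
    pruning_threshold: float = 0.48
    bm25_threshold: float = 1.0
    headless: bool = True
    wait_until: str = "domcontentloaded"
    enable_crawl: bool = True
    all: bool = False

    def enabled_functions(self) -> List[str]:
        # Assumption: all=True wins over every individual enable flag.
        if self.all or self.enable_crawl:
            return ["web_crawler"]
        return []


# Disabling the crawler has no effect once all=True is set.
settings = CrawlSettings(enable_crawl=False, all=True)
print(settings.enabled_functions())
```
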

Function | Description |
---|---|
web_crawler | Crawls a website using crawl4ai's WebCrawler. Takes 'url', the URL to crawl, and an optional 'max_length' (default 1000) that limits the length of the extracted content. |
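
To make the effect of 'max_length' concrete, here is a minimal sketch of how a length cap on extracted content could work. `truncate_content` is a hypothetical helper, not a function of the toolkit or of crawl4ai, and it assumes the limit is counted in characters and applied as a plain cut; the actual implementation may differ.

```python
from typing import Optional


def truncate_content(text: str, max_length: Optional[int] = 1000) -> str:
    """Hypothetical helper: cap extracted text at max_length characters."""
    # Assumption: None means "no limit"; otherwise cut at max_length characters.
    if max_length is None or len(text) <= max_length:
        return text
    return text[:max_length]


page_text = "a" * 1500  # stand-in for content extracted from a crawled page
print(len(truncate_content(page_text)))
print(len(truncate_content(page_text, max_length=200)))
```
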