BrightData
BrightDataTools enable an Agent to scrape web pages, query search engines, capture screenshots, and extract structured data through BrightData’s API.
Capabilities include Markdown conversion of scraped pages, screenshots, search engine results, and structured data feeds from platforms such as LinkedIn, Amazon, and Instagram.
Prerequisites
The following examples require the requests library.
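You’ll also need a BrightData API key, supplied through the BRIGHT_DATA_API_KEY environment variable.
Optionally, you can configure zone settings through the BRIGHT_DATA_SERP_ZONE and BRIGHT_DATA_WEB_UNLOCKER_ZONE environment variables.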
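The environment-based configuration described here can be sketched in Python. The helper name resolve_brightdata_settings is hypothetical and not part of the toolkit. The variable names and default zone values come from the Toolkit Parameters table, while the exact way the toolkit resolves them internally is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_brightdata_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect BrightData settings from environment variables.

    Hypothetical helper for illustration only: it mirrors the documented
    variable names and default zones, not the toolkit's internal logic.
    """
    env = os.environ if env is None else env

    api_key = env.get("BRIGHT_DATA_API_KEY")
    if not api_key:
        raise ValueError("BRIGHT_DATA_API_KEY is not set")

    return {
        "api_key": api_key,
        # Zone variables are optional; fall back to the documented defaults.
        "serp_zone": env.get("BRIGHT_DATA_SERP_ZONE", "serp_api"),
        "web_unlocker_zone": env.get("BRIGHT_DATA_WEB_UNLOCKER_ZONE", "web_unlocker1"),
    }
```

Calling resolve_brightdata_settings() with no argument reads the real process environment, so it fails fast with a clear error when the API key is missing.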
Examples
Basic Web Scraping
Beyond scraping a page as Markdown, an Agent can use the web_data_feed tool to extract structured data from platforms such as LinkedIn and Amazon.
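A minimal sketch of both usages, basic scraping and structured data extraction, might look like the following. It assumes the toolkit is provided by the Agno framework at the import paths shown, which the text here does not state. The prompts and example.com URLs are placeholders, and the names build_agent and run_examples are hypothetical.

```python
# Placeholder prompts: one for page scraping, one for a structured data feed.
PROMPTS = [
    "Scrape this page as markdown: https://example.com",
    "Use the amazon_product data feed on this URL: https://example.com/product-page",
]


def build_agent():
    """Create an agent equipped with BrightDataTools.

    Assumption: the toolkit ships with the Agno framework at the import
    paths below; adjust them if your installation differs.
    """
    from agno.agent import Agent
    from agno.tools.brightdata import BrightDataTools

    # get_screenshot defaults to False in the parameter table, so enable it explicitly.
    return Agent(tools=[BrightDataTools(get_screenshot=True)], markdown=True)


def run_examples():
    """Send each placeholder prompt to the agent and print its response."""
    agent = build_agent()
    for prompt in PROMPTS:
        agent.print_response(prompt)
```

Running run_examples() requires the framework to be installed, BRIGHT_DATA_API_KEY to be set, and credentials for whichever model the agent uses.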
Toolkit Parameters
These parameters are passed to the BrightDataTools constructor.
Parameter | Type | Default | Description |
---|---|---|---|
api_key | Optional[str] | None | BrightData API key. If not provided, uses BRIGHT_DATA_API_KEY environment variable. |
serp_zone | str | "serp_api" | Zone for search engine requests. Can be overridden with BRIGHT_DATA_SERP_ZONE environment variable. |
web_unlocker_zone | str | "web_unlocker1" | Zone for web scraping requests. Can be overridden with BRIGHT_DATA_WEB_UNLOCKER_ZONE environment variable. |
scrape_as_markdown | bool | True | Enable the scrape_as_markdown tool. |
get_screenshot | bool | False | Enable the get_screenshot tool. |
search_engine | bool | True | Enable the search_engine tool. |
web_data_feed | bool | True | Enable the web_data_feed tool. |
verbose | bool | False | Enable verbose logging. |
timeout | int | 600 | Timeout in seconds for web data feed requests. |
Toolkit Functions
Function | Description |
---|---|
scrape_as_markdown | Scrapes a webpage and returns content in Markdown format. Parameters: url (str) - URL to scrape. |
get_screenshot | Captures a screenshot of a webpage and adds it as an image artifact. Parameters: url (str) - URL to screenshot, output_path (str, optional) - Output path (default: “screenshot.png”). |
search_engine | Searches using Google, Bing, or Yandex and returns results in Markdown. Parameters: query (str), engine (str, default: “google”), num_results (int, default: 10), language (Optional[str]), country_code (Optional[str]). |
web_data_feed | Retrieves structured data from various sources like LinkedIn, Amazon, Instagram, etc. Parameters: source_type (str), url (str), num_of_reviews (Optional[int]). |
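To make the search_engine parameters concrete, the sketch below assembles and validates an argument set using the defaults listed in the table. The helper build_search_args is hypothetical. The toolkit itself may validate differently, and the minimum of one result is an assumption.

```python
from typing import Optional

# Engines named in the search_engine description.
SUPPORTED_ENGINES = ("google", "bing", "yandex")


def build_search_args(
    query: str,
    engine: str = "google",
    num_results: int = 10,
    language: Optional[str] = None,
    country_code: Optional[str] = None,
) -> dict:
    """Return a validated argument dict for a search_engine call.

    Hypothetical helper: it only mirrors the documented parameters
    and defaults.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {SUPPORTED_ENGINES}")
    if num_results < 1:
        # Assumed lower bound; not stated in the function table.
        raise ValueError("num_results must be at least 1")

    args = {"query": query, "engine": engine, "num_results": num_results}
    # Optional parameters are included only when provided.
    if language is not None:
        args["language"] = language
    if country_code is not None:
        args["country_code"] = country_code
    return args
```

For example, build_search_args("web scraping", engine="bing", country_code="us") yields a dict with the default num_results of 10 and no language key.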
Supported Data Sources
The web_data_feed function supports the following source types:
E-commerce
- amazon_product: Amazon product details
- amazon_product_reviews: Amazon product reviews
- amazon_product_search: Amazon product search results
- walmart_product: Walmart product details
- walmart_seller: Walmart seller information
- ebay_product: eBay product details
- homedepot_products: Home Depot products
- zara_products: Zara products
- etsy_products: Etsy products
- bestbuy_products: Best Buy products
Professional Networks
- linkedin_person_profile: LinkedIn person profiles
- linkedin_company_profile: LinkedIn company profiles
- linkedin_job_listings: LinkedIn job listings
- linkedin_posts: LinkedIn posts
- linkedin_people_search: LinkedIn people search results
Social Media
- instagram_profiles: Instagram profiles
- instagram_posts: Instagram posts
- instagram_reels: Instagram reels
- instagram_comments: Instagram comments
- facebook_posts: Facebook posts
- facebook_marketplace_listings: Facebook Marketplace listings
- facebook_company_reviews: Facebook company reviews
- facebook_events: Facebook events
- tiktok_profiles: TikTok profiles
- tiktok_posts: TikTok posts
- tiktok_shop: TikTok shop
- tiktok_comments: TikTok comments
- x_posts: X (Twitter) posts
Other Platforms
- google_maps_reviews: Google Maps reviews
- google_shopping: Google Shopping results
- google_play_store: Google Play Store apps
- apple_app_store: Apple App Store apps
- youtube_profiles: YouTube profiles
- youtube_videos: YouTube videos
- youtube_comments: YouTube comments
- reddit_posts: Reddit posts
- zillow_properties_listing: Zillow property listings
- booking_hotel_listings: Booking.com hotel listings
- crunchbase_company: Crunchbase company data
- zoominfo_company_profile: ZoomInfo company profiles
- reuter_news: Reuters news
- github_repository_file: GitHub repository files
- yahoo_finance_business: Yahoo Finance business data
Developer Resources
- View Tools Source
- View Cookbook Example