This tool connects to Bright Data to enable your agent to crawl websites, search the web, and access structured data from platforms like LinkedIn, Amazon, and social media.
Bright Data's tools provide robust web scraping capabilities with built-in CAPTCHA solving and bot detection avoidance, allowing you to reliably extract data from the web.
```bash
pip install llama-index llama-index-core llama-index-tools-brightdata
```
Sign up at Bright Data and retrieve your API key from your account settings. Replace "your-api-key" in the examples below with your actual API key.
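To keep the key out of source code, you can read it from an environment variable instead of hard-coding it. This is a minimal sketch: the helper `load_brightdata_api_key` and the variable name `BRIGHTDATA_API_KEY` are hypothetical choices for illustration, not part of the package.

```python
import os


def load_brightdata_api_key(var_name: str = "BRIGHTDATA_API_KEY") -> str:
    """Return the Bright Data API key stored in an environment variable.

    BRIGHTDATA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return api_key
```

You could then construct the tool with `BrightDataToolSpec(api_key=load_brightdata_api_key(), zone="unlocker")`.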
Here's an example of how to use the BrightDataToolSpec with LlamaIndex:
```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.brightdata import BrightDataToolSpec

brightdata_tool = BrightDataToolSpec(api_key="your-api-key", zone="unlocker")

tool_list = brightdata_tool.to_tool_list()

# Save each tool's full description, then replace it with a short generic one
for tool in tool_list:
    tool.original_description = tool.metadata.description
    tool.metadata.description = "Bright Data web scraping tool"

agent = FunctionAgent(
    tools=tool_list,
    llm=OpenAI(model="gpt-4.1"),
)

query = (
    "Find and summarize the latest news about AI from major tech news sites"
)

# Pass the full tool descriptions to the agent as part of the query
tool_descriptions = "\n\n".join(
    [
        f"Tool Name: {tool.metadata.name}\nTool Description: {tool.original_description}"
        for tool in tool_list
    ]
)

query_with_descriptions = f"{tool_descriptions}\n\nQuery: {query}"


async def main():
    # agent.run(...) must be awaited
    response = await agent.run(query_with_descriptions)
    print(response)


# In a script, start the event loop here; in a notebook, `await main()` instead
asyncio.run(main())
```
The Bright Data tool provides the following capabilities:
`scrape_as_markdown`: Scrape a webpage and convert the content to Markdown format. This tool can bypass CAPTCHA and bot detection.

```python
result = brightdata_tool.scrape_as_markdown("https://example.com")
print(result.text)
```
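To scrape several pages, a simple loop over URLs is enough. The helper below is a hypothetical sketch, not part of the package: it takes the scraping function as an argument (for example `brightdata_tool.scrape_as_markdown`), assumes each call returns an object with a `.text` attribute as in the example above, and records failures instead of stopping at the first one.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def scrape_many(
    urls: Iterable[str],
    scrape: Callable[[str], Any],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Scrape each URL and return (markdown_by_url, errors_by_url).

    `scrape` is expected to behave like scrape_as_markdown: it takes a URL
    and returns an object exposing the Markdown in a `.text` attribute.
    """
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            pages[url] = scrape(url).text
        except Exception as exc:  # keep going if one page fails
            errors[url] = str(exc)
    return pages, errors
```

For example: `pages, errors = scrape_many(["https://example.com"], brightdata_tool.scrape_as_markdown)`.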
`get_screenshot`: Take a screenshot of a webpage and save it to a file.

```python
screenshot_path = brightdata_tool.get_screenshot(
    "https://example.com", output_path="example_screenshot.png"
)
```
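When capturing many pages, deriving the file name from each URL keeps screenshots from overwriting each other. This is a hypothetical helper using only the standard library; the naming scheme and the `screenshots` folder are assumptions, and the result is meant to be passed as `output_path`.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path_for(url: str, directory: str = "screenshots") -> Path:
    """Build a .png path from a URL's host and path, e.g. example.com_docs.png."""
    parsed = urlparse(url)
    # Join host and path, then replace characters that are unsafe in file names
    raw = f"{parsed.netloc}{parsed.path}".rstrip("/")
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw)
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    return out_dir / f"{safe}.png"
```

For example: `brightdata_tool.get_screenshot(url, output_path=str(screenshot_path_for(url)))`.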
`search_engine`: Search Google, Bing, or Yandex and get structured search results as JSON or Markdown. Supports advanced parameters for more specific searches.

```python
search_results = brightdata_tool.search_engine(
    query="climate change solutions",
    engine="google",
    language="en",
    country_code="us",
    num_results=20,
)
print(search_results.text)
```
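If your code builds search requests from user input, a small check can catch a misspelled engine before any request is made. This is a hypothetical helper, not part of the package: the allowed engines come from the description above, and the keyword names mirror the example call.

```python
from typing import Any, Dict

# Engines named in the search_engine description above
SUPPORTED_ENGINES = {"google", "bing", "yandex"}


def build_search_kwargs(query: str, engine: str = "google", **extra: Any) -> Dict[str, Any]:
    """Return keyword arguments for search_engine after basic validation."""
    engine = engine.strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {sorted(SUPPORTED_ENGINES)}, got {engine!r}")
    if not query.strip():
        raise ValueError("query must not be empty")
    # Drop optional parameters the caller left as None
    options = {key: value for key, value in extra.items() if value is not None}
    return {"query": query, "engine": engine, **options}
```

For example: `brightdata_tool.search_engine(**build_search_kwargs("climate change solutions", engine="Google", country_code="us"))`.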
`web_data_feed`: Retrieve structured data from various platforms including LinkedIn, Amazon, Instagram, Facebook, X (Twitter), Zillow, and more.

```python
linkedin_profile = brightdata_tool.web_data_feed(
    source_type="linkedin_person_profile",
    url="https://www.linkedin.com/in/username/",
)
print(linkedin_profile)

amazon_product = brightdata_tool.web_data_feed(
    source_type="amazon_product", url="https://www.amazon.com/dp/B08N5KWB9H"
)
print(amazon_product)
```
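When URLs arrive from mixed sources, you may want to choose `source_type` automatically. This is a hypothetical sketch that only knows the two `source_type` values used in the examples above; other platforms need their own values, which this sketch does not attempt to guess.

```python
from urllib.parse import urlparse

# (host, path marker, source_type) rules for the two examples above.
# Hypothetical: extend with other source types from the Bright Data docs.
SOURCE_TYPE_RULES = [
    ("linkedin.com", "/in/", "linkedin_person_profile"),
    ("amazon.com", "/dp/", "amazon_product"),
]


def guess_source_type(url: str) -> str:
    """Return a web_data_feed source_type for a URL, based on host and path."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    for rule_host, path_marker, source_type in SOURCE_TYPE_RULES:
        if host == rule_host and path_marker in parsed.path:
            return source_type
    raise ValueError(f"No source_type rule matches {url!r}")
```

For example: `brightdata_tool.web_data_feed(source_type=guess_source_type(url), url=url)`.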
The Bright Data tool offers various configuration options for specialized use cases:
The search_engine function supports advanced parameters, including `language`, `country_code`, `search_type`, `device`, `hotel_dates`, and `hotel_occupancy`:

```python
results = brightdata_tool.search_engine(
    query="best hotels in paris",
    engine="google",
    language="fr",
    country_code="fr",
    search_type="shopping",
    device="mobile",
    hotel_dates="2025-06-01,2025-06-05",
    hotel_occupancy=2,
)
```
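The `hotel_dates` value in the example is a check-in date and a check-out date joined by a comma. Assuming that format (inferred from the example, not from a specification), a small hypothetical helper can build it from `datetime.date` objects and reject reversed ranges.

```python
from datetime import date


def format_hotel_dates(check_in: date, check_out: date) -> str:
    """Return 'YYYY-MM-DD,YYYY-MM-DD', the format used for hotel_dates above."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    # date.isoformat() yields a zero-padded YYYY-MM-DD string
    return f"{check_in.isoformat()},{check_out.isoformat()}"
```

For example, `format_hotel_dates(date(2025, 6, 1), date(2025, 6, 5))` returns `"2025-06-01,2025-06-05"`.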
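The web_data_feed function selects the platform and data type through its `source_type` argument, as in the `linkedin_person_profile` and `amazon_product` examples above.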
For more information, visit the Bright Data documentation.