llama-index-integrations/tools/llama-index-tools-scrapegraph/README.md
This tool integrates Scrapegraph with LlamaIndex, providing intelligent web scraping capabilities with structured data extraction.
pip install llama-index-tools-scrapegraph
First, import and initialize the ScrapegraphToolSpec:
from llama_index.tools.scrapegraph import ScrapegraphToolSpec
scrapegraph_tool = ScrapegraphToolSpec()
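The tool provides the following capabilities. Extract structured data from a webpage using a natural-language prompt and an optional Pydantic schema: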
from pydantic import BaseModel


# Define your schema (optional)
class ProductSchema(BaseModel):
    name: str
    price: float
    description: str


schema = [ProductSchema]

# Perform the scraping
result = scrapegraph_tool.scrapegraph_smartscraper(
    prompt="Extract product information",
    url="https://example.com/product",
    api_key="your-api-key",
    schema=schema,  # Optional
)
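The calls in this README pass the API key as a literal string. A minimal sketch of reading it from the environment instead is shown here. The variable name `SCRAPEGRAPH_API_KEY` and the helper `load_api_key` are hypothetical names chosen for illustration, not something the tool is documented to provide or read:

```python
import os


def load_api_key(var_name: str = "SCRAPEGRAPH_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this example.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set {var_name} before calling the Scrapegraph tools")
    return key
```

The returned value can then be passed as `api_key=load_api_key()` in place of the hard-coded string.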
Convert webpage content to markdown format:
markdown_content = scrapegraph_tool.scrapegraph_markdownify(
    url="https://example.com", api_key="your-api-key"
)
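This README does not state what `scrapegraph_markdownify` returns. Assuming it yields the markdown as a plain string, a sketch of saving it to disk might look like the following. The helper `save_markdown` and the output path are hypothetical:

```python
from pathlib import Path


def save_markdown(content: str, path: str) -> Path:
    """Write markdown text to `path` as UTF-8, creating parent folders."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

If the return value turns out to be a dictionary or another structure, the relevant field would need to be extracted before calling a helper like this.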
Extract structured data from raw text:
text = """
Your raw text content here...
"""
structured_data = scrapegraph_tool.scrapegraph_local_scrape(
    text=text, api_key="your-api-key"
)
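Triple-quoted strings like the one above often carry leading indentation and surrounding blank lines. A small sketch of tidying the text before passing it as `text=`, using only the standard library, could be (`prepare_text` is a hypothetical helper, not part of the tool):

```python
import textwrap


def prepare_text(raw: str) -> str:
    """Remove common leading indentation and surrounding blank lines."""
    return textwrap.dedent(raw).strip()
```

Whether this cleanup affects extraction quality is not documented here; it mainly keeps the submitted text compact.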
Requires the scrapegraph-py package.