Firecrawl Extract

<!-- MANUAL: file_description -->

Blocks for extracting structured data from web pages using Firecrawl's AI extraction.

<!-- END MANUAL -->

Firecrawl Extract

What it is

This block uses Firecrawl to crawl websites and extract structured data from them. Firecrawl bypasses blockers along the way.

How it works

<!-- MANUAL: how_it_works -->

This block uses Firecrawl's extraction API to pull structured data from web pages based on a prompt or schema. It crawls the specified URLs and uses AI to extract information matching your requirements.

Define the data structure you want using a JSON schema for precise extraction, or use natural language prompts for flexible extraction. Wildcards in URLs allow extracting data from multiple pages matching a pattern.
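To make the two extraction modes concrete, here is a minimal sketch of a schema, a prompt, and a wildcard URL. The field names (`name`, `price`, `in_stock`), the `example.com` URL, and the `missing_required` helper are assumptions made for illustration; they are not part of Firecrawl or of this block.

```python
from typing import Any, Dict, List

# Hypothetical JSON Schema for a product page. The field names are
# illustrative assumptions, not something Firecrawl prescribes.
product_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

# The flexible alternative: describe what you want in natural language.
product_prompt: str = "Extract the product name, price, and availability."

# A trailing /* asks for every page matching the pattern (placeholder domain).
urls: List[str] = ["https://example.com/products/*"]


def missing_required(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return the schema's required top-level keys that are absent from record."""
    return [key for key in schema.get("required", []) if key not in record]
```

A helper like `missing_required` only checks top-level keys; it can serve as a quick sanity check on extracted records, but it is not a full JSON Schema validator.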

<!-- END MANUAL -->

Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| urls | The URLs to crawl. At least one is required. Wildcards (/*) are supported. | List[str] | Yes |
| prompt | The prompt to use for the crawl. | str | No |
| output_schema | A JSON Schema describing the output structure, if a more rigid structure is desired. | Dict[str, Any] | No |
| enable_web_search | When true, extraction can follow links outside the specified domain. | bool | No |
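
The inputs above could be assembled and checked before being passed to the block, roughly as follows. This is a sketch under assumptions: `build_extract_inputs` is a hypothetical helper, not part of AutoGPT or Firecrawl, and the `False` default for `enable_web_search` is assumed rather than documented here. Only the four input names come from the table.

```python
from typing import Any, Dict, List, Optional


def build_extract_inputs(
    urls: List[str],
    prompt: Optional[str] = None,
    output_schema: Optional[Dict[str, Any]] = None,
    enable_web_search: bool = False,  # assumed default, not documented here
) -> Dict[str, Any]:
    """Assemble a dict of block inputs (hypothetical helper for illustration)."""
    # urls is the only required input, and it must contain at least one entry.
    if not urls:
        raise ValueError("at least one URL is required")

    inputs: Dict[str, Any] = {
        "urls": list(urls),
        "enable_web_search": enable_web_search,
    }
    # Optional inputs are included only when the caller supplies them.
    if prompt is not None:
        inputs["prompt"] = prompt
    if output_schema is not None:
        inputs["output_schema"] = output_schema
    return inputs
```

Omitting unset optional inputs keeps the resulting dict minimal, so the block's own defaults apply wherever the caller has no preference.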

Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the extraction failed | str |
| data | The result of the crawl | Dict[str, Any] |
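
Downstream code typically needs to tell a failed extraction from a successful one. The sketch below assumes the block's outputs have been collected into a plain dict keyed by output name (`error`, `data`); that collection step and the `unwrap_result` helper are hypothetical and shown only for illustration.

```python
from typing import Any, Dict


def unwrap_result(outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Return the extracted data, or raise if the block reported an error.

    Assumes `outputs` maps output names to values, for example
    {"data": {...}} on success or {"error": "..."} on failure.
    """
    error = outputs.get("error")
    if error:
        raise RuntimeError(f"Firecrawl extraction failed: {error}")
    # Fall back to an empty dict if no data was produced.
    return outputs.get("data", {})
```

Raising on `error` keeps failure handling in one place, so later steps in a pipeline can assume they always receive a dict.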

Possible use case

<!-- MANUAL: use_case -->

Product Data Extraction: Extract structured product information (prices, specs, reviews) from e-commerce sites.

Contact Scraping: Pull business contact information from company websites in a structured format.

Data Pipeline Input: Automatically extract and structure web data for analysis or database population.

<!-- END MANUAL -->