
Firecrawl Scrape

docs/integrations/block-integrations/firecrawl/scrape.md


<!-- MANUAL: file_description -->

Blocks for scraping individual web pages and extracting content using Firecrawl.

<!-- END MANUAL -->

Firecrawl Scrape

What it is

Firecrawl scrapes a single web page to extract comprehensive data while bypassing anti-bot blockers.

How it works

<!-- MANUAL: how_it_works -->

This block uses Firecrawl's scraping API to extract content from a single URL. It handles JavaScript rendering, bypasses anti-bot measures, and can return content in multiple formats including markdown, HTML, and screenshots.

Configure output formats, filter to main content only, and set wait times for dynamic pages. The block returns comprehensive results including extracted content, links found on the page, and optional change tracking data.

<!-- END MANUAL -->
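To illustrate how these options fit together, here is a minimal sketch of building the request body for a single-page scrape. The endpoint URL, field names (`onlyMainContent`, `waitFor`), and defaults are assumptions based on the inputs described below, not a definitive Firecrawl client:

```python
import json

# Assumed endpoint for single-page scrapes; check the Firecrawl docs for the current URL.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"

def build_scrape_payload(url, formats=("markdown",), only_main_content=True, wait_for=0):
    """Build the JSON body for a single-page scrape request (hedged sketch)."""
    payload = {
        "url": url,
        "formats": list(formats),          # e.g. ["markdown", "html", "links"]
        "onlyMainContent": only_main_content,
    }
    if wait_for:
        payload["waitFor"] = wait_for      # milliseconds to wait before capturing content
    return payload

payload = build_scrape_payload(
    "https://example.com", formats=("markdown", "links"), wait_for=2000
)
print(json.dumps(payload))
```

The body would then be POSTed to the scrape endpoint with an `Authorization: Bearer <api key>` header; the block handles that exchange for you.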

Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| url | The URL to scrape | str | Yes |
| limit | The number of pages to crawl | int | No |
| only_main_content | Only return the main content of the page, excluding headers, navs, footers, etc. | bool | No |
| max_age | The maximum age of the cached page in milliseconds (default is 1 hour) | int | No |
| wait_for | Delay in milliseconds before fetching the content, allowing the page sufficient time to load | int | No |
| formats | The formats to return | List["markdown" \| "html" \| "rawHtml" \| "links" \| "screenshot" \| "screenshot@fullPage" \| "json" \| "changeTracking"] | No |

Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the scrape failed | str |
| data | The full result of the scrape | Dict[str, Any] |
| markdown | The page content as markdown | str |
| html | The rendered HTML of the page | str |
| raw_html | The raw HTML of the page | str |
| links | The links found on the page | List[str] |
| screenshot | A screenshot of the page | str |
| screenshot_full_page | A full-page screenshot of the page | str |
| json_data | Structured JSON extracted from the page | Dict[str, Any] |
| change_tracking | Change tracking data for the page | Dict[str, Any] |
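Since only the requested formats appear in the result, downstream code typically fans the result dict out into the individual outputs above. A minimal sketch, assuming the response keys mirror the format names (`rawHtml`, `changeTracking`, etc. are assumptions):

```python
def split_scrape_result(data):
    """Map a scrape result dict onto the block's individual outputs.

    The response key names on the left are assumptions based on the
    formats listed in the Inputs table; only keys present in the
    result are emitted.
    """
    key_map = {
        "markdown": "markdown",
        "html": "html",
        "rawHtml": "raw_html",
        "links": "links",
        "screenshot": "screenshot",
        "json": "json_data",
        "changeTracking": "change_tracking",
    }
    return {out: data[key] for key, out in key_map.items() if key in data}

outputs = split_scrape_result({"markdown": "# Hello", "links": ["https://example.com/a"]})
# only the formats that were requested (and returned) show up as outputs
```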

Possible use case

<!-- MANUAL: use_case -->

Article Extraction: Scrape news articles or blog posts to extract clean, readable content.

Price Monitoring: Regularly scrape product pages to track price changes over time.

Content Backup: Create markdown backups of important web pages for offline reference.

<!-- END MANUAL -->