
# Firecrawl Crawl

docs/integrations/block-integrations/firecrawl/crawl.md



<!-- MANUAL: file_description -->

Blocks for crawling multiple pages of a website using Firecrawl.

<!-- END MANUAL -->


## What it is

Firecrawl crawls websites to extract comprehensive data while bypassing blockers.

## How it works

<!-- MANUAL: how_it_works -->

This block uses Firecrawl's API to crawl multiple pages of a website starting from a given URL. It navigates through links, handling JavaScript rendering and bypassing anti-bot measures to extract clean content from each page.

Set how many pages are crawled with the limit parameter, choose one or more output formats (markdown, HTML, raw HTML, links, screenshots, JSON, or change tracking), and optionally filter each page to its main content only. The block also supports cached results with a configurable maximum age, plus a wait time to let dynamic content load before scraping.

<!-- END MANUAL -->

## Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| url | The URL to crawl | str | Yes |
| limit | The number of pages to crawl | int | No |
| only_main_content | Only return the main content of the page, excluding headers, navs, footers, etc. | bool | No |
| max_age | The maximum age of the cached page in milliseconds (default is 1 hour) | int | No |
| wait_for | Delay in milliseconds before fetching the content, allowing the page sufficient time to load | int | No |
| formats | The output format(s) of the crawl | List["markdown" \| "html" \| "rawHtml" \| "links" \| "screenshot" \| "screenshot@fullPage" \| "json" \| "changeTracking"] | No |

## Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the crawl failed | str |
| data | The result of the crawl | List[Dict[str, Any]] |
| markdown | The markdown content of the crawl | str |
| html | The HTML content of the crawl | str |
| raw_html | The raw HTML content of the crawl | str |
| links | The links found by the crawl | List[str] |
| screenshot | The screenshot of the crawl | str |
| screenshot_full_page | The full-page screenshot of the crawl | str |
| json_data | The JSON data of the crawl | Dict[str, Any] |
| change_tracking | The change tracking data of the crawl | Dict[str, Any] |
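Since the data output is a list of per-page dicts keyed by the requested formats, downstream blocks typically iterate over it. A minimal sketch, assuming each page dict carries a "markdown" key when the markdown format was requested:

```python
def collect_markdown(data):
    """Concatenate the markdown of every crawled page, skipping any
    page that did not return markdown content."""
    return "\n\n".join(page["markdown"] for page in data if page.get("markdown"))

# Illustrative sample shaped like the block's `data` output (hypothetical values).
sample = [
    {"markdown": "# Page 1", "links": ["https://example.com/2"]},
    {"markdown": "# Page 2", "links": []},
]
combined = collect_markdown(sample)  # "# Page 1\n\n# Page 2"
```

A pattern like this is useful for feeding crawled pages into a single text input, such as a summarization or indexing step.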

## Possible use case

<!-- MANUAL: use_case -->

Documentation Indexing: Crawl entire documentation sites to build searchable knowledge bases or training data.

Competitor Research: Extract content from competitor websites for market analysis and comparison.

Content Archival: Systematically archive website content for backup or compliance purposes.

<!-- END MANUAL -->