
Cheerio crawler

website/versioned_docs/version-3.15/examples/cheerio_crawler.mdx


import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import ApiLink from '@site/src/components/ApiLink';
import CheerioCrawlerSource from '!!raw-loader!roa-loader!./cheerio_crawler.ts';

This example demonstrates how to use <ApiLink to="cheerio-crawler/class/CheerioCrawler">`CheerioCrawler`</ApiLink> to crawl a list of URLs from an external file, load each URL using a plain HTTP request, parse the HTML using the Cheerio library, and extract some data from it: the page title and all `h1` tags.

<RunnableCodeBlock className="language-js" type="cheerio">
	{CheerioCrawlerSource}
</RunnableCodeBlock>
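
As a minimal sketch of the flow described above (not necessarily identical to `cheerio_crawler.ts`), the extraction step and the crawler setup might look roughly like this. The helper names `extractPageData` and `runCrawl`, the `maxRequestsPerCrawl` value, and the assumption that the URL list has already been read into an array are all illustrative and do not come from the example source.

```javascript
// Extraction step: works with any Cheerio-style `$` bound to a parsed page.
// `extractPageData` is a hypothetical helper name, not part of Crawlee.
function extractPageData($, url) {
    const title = $('title').text();
    const h1texts = [];
    $('h1').each((index, el) => {
        h1texts.push($(el).text());
    });
    return { url, title, h1texts };
}

// Builds and runs the crawler. `startUrls` is assumed to be an array of URL
// strings that the caller has already loaded (for example, from a file).
async function runCrawl(startUrls) {
    // Imported lazily so the extraction helper can be reused on its own.
    const { CheerioCrawler, Dataset } = await import('crawlee');

    const crawler = new CheerioCrawler({
        // Illustrative safety limit; tune or remove for real crawls.
        maxRequestsPerCrawl: 10,

        // Called once per successfully downloaded and parsed page.
        async requestHandler({ request, $, log }) {
            log.debug(`Processing ${request.url}...`);
            await Dataset.pushData(extractPageData($, request.url));
        },

        // Called when a request has failed and will not be retried again.
        failedRequestHandler({ request, log }) {
            log.debug(`Request ${request.url} failed.`);
        },
    });

    await crawler.run(startUrls);
}
```

Calling `runCrawl(['https://crawlee.dev'])` would start the crawl and store one record per page in the default dataset. Keeping the extraction in a separate function is only a design choice for this sketch: it lets the title and `h1` logic be exercised without making any HTTP requests.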