
JSDOM crawler



import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import ApiLink from '@site/src/components/ApiLink';
import JSDOMCrawlerSource from '!!raw-loader!roa-loader!./jsdom_crawler.ts';
import JSDOMCrawlerRunScriptSource from '!!raw-loader!roa-loader!./jsdom_crawler_react.ts';

This example demonstrates how to use <ApiLink to="jsdom-crawler/class/JSDOMCrawler">JSDOMCrawler</ApiLink> to interact with a website using the jsdom DOM implementation. The script opens a calculator app from the React examples, presses the `1`, `+`, `1` and `=` buttons, and extracts the result.

<RunnableCodeBlock className="language-ts" type="cheerio"> {JSDOMCrawlerRunScriptSource} </RunnableCodeBlock>
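Because jsdom exposes the regular DOM API, pressing "1 + 1 =" comes down to finding `<button>` elements and calling `click()` on them. The following is a minimal sketch, not part of Crawlee, of one way to locate buttons by their visible label rather than by their position on the page. It assumes each calculator button's text content is its label. The helpers `findButtonByLabel` and `pressKeys` and the `ClickableButton`/`ButtonQueryable` interfaces are hypothetical simplifications of the DOM typings, so the snippet compiles without jsdom installed.

```ts
// Hypothetical, simplified view of a DOM button: only the members used below.
interface ClickableButton {
    textContent: string | null;
    click(): void;
}

// Hypothetical, simplified view of a document that can be queried for buttons.
interface ButtonQueryable {
    querySelectorAll(selectors: string): ArrayLike<ClickableButton>;
}

// Hypothetical helper: return the first <button> whose trimmed text equals `label`.
function findButtonByLabel(doc: ButtonQueryable, label: string): ClickableButton {
    const buttons = Array.from(doc.querySelectorAll('button'));
    const match = buttons.find((button) => button.textContent?.trim() === label);
    if (!match) {
        throw new Error(`No button labelled "${label}" found`);
    }
    return match;
}

// Hypothetical helper: click the button for each label, in order.
function pressKeys(doc: ButtonQueryable, labels: string[]): void {
    for (const label of labels) {
        findButtonByLabel(doc, label).click();
    }
}
```

Inside a request handler this could be called as `pressKeys(window.document, ['1', '+', '1', '='])`. Because the interfaces above only approximate the DOM typings, a type cast may be needed depending on your TypeScript configuration.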

In the following example, we use <ApiLink to="jsdom-crawler/class/JSDOMCrawler">JSDOMCrawler</ApiLink> to crawl a list of URLs from an external file, load each URL with a plain HTTP request, parse the HTML using the jsdom DOM implementation, and extract some data from it: the page title and all `h1` tags.

<RunnableCodeBlock className="language-ts" type="cheerio"> {JSDOMCrawlerSource} </RunnableCodeBlock>
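The extraction step itself is plain DOM code, so it can be isolated into a function and tested without making any HTTP requests. This is a minimal sketch under stated assumptions: `extractPageData`, `PageRecord`, `DocumentLike` and `TextNodeLike` are hypothetical names, and the interfaces are a simplified stand-in for the DOM typings rather than jsdom's own types.

```ts
// Hypothetical, simplified view of an element: only its text is needed.
interface TextNodeLike {
    textContent: string | null;
}

// Hypothetical, simplified view of a document: its title plus selector queries.
interface DocumentLike {
    title: string;
    querySelectorAll(selectors: string): ArrayLike<TextNodeLike>;
}

// Hypothetical shape of one extracted record.
interface PageRecord {
    url: string;
    title: string;
    h1Texts: string[];
}

// Collect the page title and the text of every <h1> element.
function extractPageData(url: string, doc: DocumentLike): PageRecord {
    const h1Texts = Array.from(
        doc.querySelectorAll('h1'),
        // textContent may be null in the DOM typings, so fall back to an empty string.
        (element) => element.textContent ?? '',
    );
    return { url, title: doc.title, h1Texts };
}
```

In a request handler, the result of `extractPageData(request.url, window.document)` could then be stored with Crawlee's `Dataset.pushData()`. As with any structural stand-in for DOM types, a cast may be needed when passing the real `window.document`.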