Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.
Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation - all in a few lines of Python. One library, zero compromises.
Blazing-fast crawls with real-time stats and streaming. Built by web scrapers for web scrapers and regular users alike, there's something for everyone.
```python
from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

StealthyFetcher.adaptive = True
page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)  # Fetch website under the radar!
products = page.css('.product', auto_save=True)  # Scrape data that survives website design changes!
products = page.css('.product', adaptive=True)  # Later, if the website structure changes, pass `adaptive=True` to find them!
```
Or scale up to full crawls:
```python
from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com/"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}

MySpider().start()
```
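Besides `start()`, results can be consumed as they arrive via `async for item in spider.stream()`. The shape of that consumption loop can be sketched with a plain `asyncio` async generator; `fake_stream` and its yielded items below are illustrative stand-ins, not Scrapling's actual API:

```python
import asyncio

async def fake_stream():
    # Stand-in for spider.stream(): yields scraped items as they arrive.
    for title in ("Widget A", "Widget B", "Widget C"):
        await asyncio.sleep(0)  # simulate waiting on the network between items
        yield {"title": title}

async def main():
    items = []
    async for item in fake_stream():  # consume items in real time
        items.append(item)
    return items

print(asyncio.run(main()))
# → [{'title': 'Widget A'}, {'title': 'Widget B'}, {'title': 'Widget C'}]
```

The same loop works for any async-iterable source, which is what makes streaming convenient for UIs and pipelines: each item can be processed the moment it is yielded instead of after the whole crawl finishes.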
- Spiders with `start_urls`, async `parse` callbacks, and `Request`/`Response` objects.
- Stream results with `async for item in spider.stream()` and real-time stats - ideal for UIs, pipelines, and long-running crawls.
- A `robots_txt_obey` flag that respects `Disallow`, `Crawl-delay`, and `Request-rate` directives, with per-domain caching.
- Iterate on your `parse()` logic without re-hitting the target servers.
- Export scraped items to JSON or JSONL with `result.items.to_json()` / `result.items.to_jsonl()` respectively.
- HTTP requests with the `Fetcher` class, which can impersonate browsers' TLS fingerprints and headers, and use HTTP/3.
- Browser automation with the `DynamicFetcher` class, supporting Playwright's Chromium and Google's Chrome.
- Stealthy fetching with `StealthyFetcher` and fingerprint spoofing, which can easily bypass all types of Cloudflare's Turnstile/Interstitial with automation.
- `FetcherSession`, `StealthySession`, and `DynamicSession` classes for cookie and state management across requests.
- `ProxyRotator` with cyclic or custom rotation strategies across all session types, plus per-request proxy overrides.

Scrapling's GitHub stars have grown steadily since its release (see chart below).
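The directives a `robots_txt_obey`-style flag has to honor are the same ones Python's standard-library parser already understands. A minimal stdlib-only sketch (no Scrapling involved) of checking `Disallow`, `Crawl-delay`, and `Request-rate` against a sample robots.txt:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt; in a real crawl this is fetched per domain and cached.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
Request-rate: 1/5
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public"))        # True
print(rp.crawl_delay("*"))                                    # 2
rate = rp.request_rate("*")
print(rate.requests, rate.seconds)                            # 1 5
```

A compliant crawler skips disallowed paths and spaces out its requests using the delay and rate values above.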
<div id="chartContainer"> <a href="https://github.com/D4Vinci/Scrapling"> <img id="chartImage" src="https://api.star-history.com/svg?repos=D4Vinci/Scrapling&type=Date" alt="Star History Chart"> </a> </div> <script> const observer = new MutationObserver((mutations) => { mutations.forEach((mutation) => { if (mutation.attributeName === 'data-md-color-media') { const colorMedia = document.body.getAttribute('data-md-color-media'); const isDarkScheme = document.body.getAttribute('data-md-color-scheme') === 'slate'; const chartImg = document.querySelector('#chartImage'); const baseUrl = 'https://api.star-history.com/svg?repos=D4Vinci/Scrapling&type=Date'; if (colorMedia === '(prefers-color-scheme)' ? isDarkScheme : colorMedia.includes('dark')) { chartImg.src = `${baseUrl}&theme=dark`; } else { chartImg.src = baseUrl; } } }); }); observer.observe(document.body, { attributes: true, attributeFilter: ['data-md-color-media', 'data-md-color-scheme'] }); </script>

Scrapling requires Python 3.10 or higher:
```shell
pip install scrapling
```
This installation only includes the parser engine and its dependencies, without any fetcher or command-line dependencies.

If you are going to use any of the extra features below, the fetchers, or their session classes, you will need to install the fetchers' dependencies and their browsers as follows:
```shell
pip install "scrapling[fetchers]"
scrapling install          # normal install
scrapling install --force  # force reinstall
```
This downloads all browsers, along with their system dependencies and fingerprint manipulation dependencies.
Or you can install them from Python code instead of running a shell command:
```python
from scrapling.cli import install

install([], standalone_mode=False)          # normal install
install(["--force"], standalone_mode=False) # force reinstall
```
Extra features:

- Install AI features:

```shell
pip install "scrapling[ai]"
```

- Install shell features (the interactive Web Scraping shell and the `extract` command):

```shell
pip install "scrapling[shell]"
```

- Install everything:

```shell
pip install "scrapling[all]"
```
Don't forget that you need to install the browser dependencies with `scrapling install` after installing any of these extras (if you haven't already).
You can also pull a Docker image with all extras and browsers preinstalled from DockerHub:
```shell
docker pull pyd4vinci/scrapling
```
Or download it from the GitHub registry:
```shell
docker pull ghcr.io/d4vinci/scrapling:latest
```
This image is automatically built and pushed using GitHub Actions and the repository's main branch.
Scrapling has extensive documentation that tries to follow the Diátaxis documentation framework.
If you like Scrapling and want to support its development:
This project is licensed under the BSD-3 License. See the LICENSE file for details.