agent-skill/Scrapling-Skill/references/fetching/choosing.md
Fetchers are classes that send requests or fetch pages in a single call, with many built-in features, and return a Response object. Every fetcher has a companion session class that keeps the session running across requests (e.g., a browser fetcher's session keeps the browser open until you finish all requests).
Fetchers are not wrappers built on top of other libraries. They use these libraries as an engine to request/fetch pages but add features the underlying engines don't have, while still fully leveraging and optimizing them for web scraping.
Scrapling provides three fetcher classes, each with its own session classes and each designed for a specific use case.
The following table compares them for quick guidance.
| Feature | Fetcher | DynamicFetcher | StealthyFetcher |
|---|---|---|---|
| Relative speed | 🐇🐇🐇🐇🐇 | 🐇🐇🐇 | 🐇🐇🐇 |
| Stealth | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Anti-Bot options | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| JavaScript loading | ❌ | ✅ | ✅ |
| Memory Usage | ⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Best used for | Basic scraping when HTTP requests alone can do it | - Dynamically loaded websites |
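
The comparison above can be condensed into a small decision rule. The sketch below is only an illustration: `choose_fetcher` and its parameters are hypothetical names, not part of Scrapling, and the rule simply mirrors the JavaScript and stealth rows of the table.

```python
def choose_fetcher(needs_javascript: bool, strong_protection: bool = False) -> str:
    """Return the name of the fetcher class the comparison table points to.

    Hypothetical helper for illustration; it is not part of Scrapling.
    """
    if strong_protection:
        # StealthyFetcher has the highest stealth and anti-bot ratings.
        return "StealthyFetcher"
    if needs_javascript:
        # DynamicFetcher loads JavaScript and has moderate stealth ratings.
        return "DynamicFetcher"
    # Plain HTTP requests are the fastest and lightest option.
    return "Fetcher"


print(choose_fetcher(needs_javascript=True))
```

Starting from the plain Fetcher and moving to a browser-based fetcher only when a page needs it keeps speed and memory usage at their best.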
All fetchers are imported the same way, as you will see in the upcoming pages:
from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
You can then use a fetcher right away, without instantiating it; it will use the default parser settings:
page = StealthyFetcher.fetch('https://example.com')
To configure the parser (the Selector class) that is applied to the response before it is returned to you, do this first:
from scrapling.fetchers import Fetcher
Fetcher.configure(adaptive=True, keep_comments=False, keep_cdata=False) # and the rest
or
from scrapling.fetchers import Fetcher
Fetcher.adaptive = True
Fetcher.keep_comments = False
Fetcher.keep_cdata = False  # and the rest
Then, continue your code as usual.
The available configuration arguments are: adaptive, adaptive_domain, huge_tree, keep_comments, keep_cdata, storage, and storage_args, which are the same ones you give to the Selector class. You can display the current configuration anytime by running <fetcher_class>.display_config().
Info: The adaptive argument is disabled by default; you must enable it to use that feature.
The parser configuration set this way applies globally to all requests/fetches made through that class; this is intended for simplicity.
If your use case requires a different configuration for each request/fetch, pass a dictionary to the request method (fetch/get/post/...) through the selector_config argument.
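A per-request configuration could look like the sketch below. The `selector_config` argument and the key names come from the text above; `build_selector_config` and `fetch_with_config` are hypothetical helpers added here for illustration, and the key check is an extra safeguard rather than something Scrapling requires.

```python
# Parser arguments listed in the configuration paragraph above.
ALLOWED_SELECTOR_KEYS = {
    "adaptive", "adaptive_domain", "huge_tree", "keep_comments",
    "keep_cdata", "storage", "storage_args",
}


def build_selector_config(**options):
    """Hypothetical helper: reject keys that are not documented parser arguments."""
    unknown = sorted(set(options) - ALLOWED_SELECTOR_KEYS)
    if unknown:
        raise ValueError(f"Unknown selector_config keys: {unknown}")
    return dict(options)


def fetch_with_config(url, **options):
    """Hypothetical helper: fetch one URL with its own parser settings."""
    # Imported lazily so build_selector_config works without Scrapling installed.
    from scrapling.fetchers import Fetcher

    return Fetcher.get(url, selector_config=build_selector_config(**options))
```

For example, `fetch_with_config('https://example.com', adaptive=True, keep_comments=False)` would apply those settings to that single request.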
The Response object offers everything the Selector class does, plus additional details about the response, such as the response headers, status, and cookies, as shown below:
from scrapling.fetchers import Fetcher
page = Fetcher.get('https://example.com')
page.status # HTTP status code
page.reason # Status message
page.cookies # Response cookies as a dictionary
page.headers # Response headers
page.request_headers # Request headers
page.history # Response history of redirections, if any
page.body # Raw response body as bytes
page.encoding # Response encoding
page.meta # Response metadata dictionary (e.g., proxy used). Mainly helpful with the spiders system.
page.captured_xhr # List of captured XHR/fetch responses (when capture_xhr is enabled on a browser session)
All fetchers return the Response object.
Note: Since v0.4, the Response class's body is always bytes, unlike the Selector class.
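Because the body is raw bytes, turning it into text means decoding it yourself. The following is a minimal sketch using only the Python standard library; `body_text` is a hypothetical helper, and falling back to UTF-8 with replacement characters is an assumption made for the example, not Scrapling's own behavior.

```python
def body_text(body: bytes, encoding=None) -> str:
    """Decode a raw response body, falling back to UTF-8 when no usable encoding is given."""
    try:
        return body.decode(encoding or "utf-8", errors="replace")
    except LookupError:
        # The encoding name is not a codec Python knows: fall back to UTF-8.
        return body.decode("utf-8", errors="replace")
```

With a fetched page, this would be called as `body_text(page.body, page.encoding)`.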