apps/api/src/scraper/WebScraper/utils/ENGINE_FORCING.md
This feature allows you to force specific scraping engines for certain domains based on URL patterns. This is useful when you know that certain websites work better with specific engines.
The engine forcing is configured via the FORCED_ENGINE_DOMAINS environment variable. This should be a JSON object mapping domain patterns to engines.
{
  "example.com": "playwright",
  "test.com": "fetch",
  "*.subdomain.com": "fire-engine;chrome-cdp",
  "google.com": ["fire-engine;chrome-cdp", "playwright"]
}
Domain patterns are matched as follows:
- "example.com" matches example.com and all its subdomains (www.example.com, api.example.com, etc.)
- "*.subdomain.com" matches only subdomains of subdomain.com (e.g., api.subdomain.com, www.subdomain.com) but NOT the base domain itself
Each value is either a single engine or a list of engines:
- "playwright" forces a single engine
- ["fire-engine;chrome-cdp", "playwright"] provides a fallback list of engines to try in order
Engines that can be specified include:
- fire-engine;chrome-cdp - Advanced browser with Chrome DevTools Protocol
- fire-engine;tlsclient - TLS fingerprinting for anti-bot bypass
- fire-engine;chrome-cdp;stealth - Chrome CDP with stealth mode
- fire-engine;tlsclient;stealth - TLS client with stealth mode
- playwright - Direct Playwright integration
- fetch - Simple HTTP requests
- pdf - PDF document parsing
- document - Office document handling
Engine forcing from FORCED_ENGINE_DOMAINS is applied only when forceEngine is not already set in the internal options.
export FORCED_ENGINE_DOMAINS='{"linkedin.com":"playwright","twitter.com":"playwright"}'
This forces Playwright for LinkedIn and Twitter URLs.
export FORCED_ENGINE_DOMAINS='{"example.com":"fetch","httpbin.org":"fetch"}'
This uses the simple fetch engine for example.com and httpbin.org.
export FORCED_ENGINE_DOMAINS='{
  "google.com": ["fire-engine;chrome-cdp", "playwright"],
  "*.cloudflare.com": "fire-engine;tlsclient;stealth",
  "wikipedia.org": "fetch"
}'
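This configuration:
- tries fire-engine;chrome-cdp first, then playwright, for google.com and its subdomains
- uses fire-engine;tlsclient;stealth for subdomains of cloudflare.com, but not for cloudflare.com itself
- uses fetch for wikipedia.org and its subdomains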
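The pattern rules in this document (a bare domain matches itself and its subdomains, a `*.` prefix matches subdomains only, and a value may be one engine or an ordered list) can be sketched roughly as follows. This is an illustrative sketch, not the code in engine-forcing.ts: the names `matchesPattern` and `enginesForUrl`, the first-match-wins ordering, and the lowercase comparison are all assumptions.

```typescript
// Hypothetical types and helper names; the real engine-forcing.ts may differ.
type EngineSpec = string | string[];
type DomainMappings = Record<string, EngineSpec>;

// True when `hostname` is covered by `pattern` under the documented rules.
function matchesPattern(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith("*.")) {
    // Wildcard: subdomains only, never the base domain itself.
    const base = pat.slice(2);
    return host.endsWith("." + base);
  }
  // Bare domain: the domain itself or any subdomain of it.
  return host === pat || host.endsWith("." + pat);
}

// Returns the engines to try, in order, for the first matching pattern
// (first-match-wins is an assumption of this sketch).
function enginesForUrl(url: string, mappings: DomainMappings): string[] | undefined {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return undefined; // not a parseable absolute URL
  }
  for (const [pattern, spec] of Object.entries(mappings)) {
    if (matchesPattern(hostname, pattern)) {
      // Normalize a single engine to a one-element fallback list.
      return Array.isArray(spec) ? spec : [spec];
    }
  }
  return undefined; // no forced engine for this URL
}

const mappings: DomainMappings = {
  "google.com": ["fire-engine;chrome-cdp", "playwright"],
  "*.cloudflare.com": "fire-engine;tlsclient;stealth",
  "wikipedia.org": "fetch",
};
```

Under these assumptions, `enginesForUrl("https://blog.cloudflare.com/", mappings)` yields the stealth TLS client, while `enginesForUrl("https://cloudflare.com/", mappings)` yields `undefined` because the wildcard does not cover the base domain.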
The engine forcing logic is implemented in:
- apps/api/src/scraper/WebScraper/utils/engine-forcing.ts - Core logic
- apps/api/src/scraper/scrapeURL/index.ts - Integration into scraping pipeline
The system is initialized at startup in:
- apps/api/src/index.ts - Main API server
- apps/api/src/services/queue-worker.ts - Queue worker
- apps/api/src/services/extract-worker.ts - Extract worker
- apps/api/src/services/worker/nuq-worker.ts - NuQ worker
The engine forcing has the following precedence:
- If forceEngine is already set in InternalOptions, it takes precedence (engine forcing is skipped)
Unit tests are available in apps/api/src/scraper/WebScraper/utils/__tests__/engine-forcing.test.ts.
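The precedence rule, where an existing forceEngine wins and the domain mappings are consulted only otherwise, might look something like the sketch below. Only the `forceEngine` field name comes from this document; `InternalOptionsSketch`, `applyForcedEngine`, and the `lookup` callback are hypothetical stand-ins for whatever scrapeURL/index.ts actually does.

```typescript
// Hypothetical shapes; only the `forceEngine` field name is taken from the text.
type ForcedEngine = string | string[];

interface InternalOptionsSketch {
  forceEngine?: ForcedEngine;
}

// `lookup` stands in for the domain-mapping lookup for the current URL.
function applyForcedEngine(
  options: InternalOptionsSketch,
  lookup: () => ForcedEngine | undefined,
): InternalOptionsSketch {
  if (options.forceEngine !== undefined) {
    // An explicitly set engine takes precedence; the mappings are not consulted.
    return options;
  }
  const forced = lookup();
  // Leave the options untouched when no pattern matched.
  return forced === undefined ? options : { ...options, forceEngine: forced };
}
```

Returning a copy rather than mutating `options` is a design choice of this sketch, made so that the caller's object is never changed as a side effect.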
If FORCED_ENGINE_DOMAINS is invalid, the system logs an error and continues with empty mappings.
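A minimal sketch of that fallback behavior, assuming "invalid" covers both malformed JSON and values that are neither a string nor a list of strings. The function name `parseForcedEngineDomains` and the exact validation rules are assumptions, and `console.error` stands in for the project's logger.

```typescript
// Hypothetical parser; the real startup code may validate differently.
type ParsedMappings = Record<string, string | string[]>;

function parseForcedEngineDomains(raw: string | undefined): ParsedMappings {
  // An unset or blank variable simply means "no forced engines".
  if (raw === undefined || raw.trim() === "") return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("expected a JSON object");
    }
    const result: ParsedMappings = {};
    for (const [pattern, value] of Object.entries(parsed)) {
      const ok =
        typeof value === "string" ||
        (Array.isArray(value) && value.every((v: unknown) => typeof v === "string"));
      if (!ok) throw new Error(`invalid engine value for "${pattern}"`);
      result[pattern] = value as string | string[];
    }
    return result;
  } catch (error) {
    // Log and continue with empty mappings instead of failing startup.
    console.error("Invalid FORCED_ENGINE_DOMAINS, using empty mappings:", error);
    return {};
  }
}
```

In a Node.js process this would typically be called once at startup with `process.env.FORCED_ENGINE_DOMAINS` as the argument.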