Undetected Browser Mode

docs/md_v2/advanced/undetected-browser.md
Overview

Crawl4AI offers two powerful anti-bot features to help you access websites with bot detection:

  1. Stealth Mode - Uses playwright-stealth to modify browser fingerprints and behaviors
  2. Undetected Browser Mode - Advanced browser adapter with deep-level patches for sophisticated bot detection

This guide covers both features and helps you choose the right approach for your needs.

Anti-Bot Features Comparison

| Feature | Regular Browser | Stealth Mode | Undetected Browser |
|---|---|---|---|
| WebDriver Detection | ❌ | ✅ | ✅ |
| Navigator Properties | ❌ | ✅ | ✅ |
| Plugin Emulation | ❌ | ✅ | ✅ |
| CDP Detection | ❌ | Partial | ✅ |
| Deep Browser Patches | ❌ | ❌ | ✅ |
| Performance Impact | None | Minimal | Moderate |
| Setup Complexity | None | None | Minimal |

When to Use Each Approach

Use Regular Browser + Stealth Mode When:

  • Sites have basic bot detection (checking navigator.webdriver, plugins, etc.)
  • You need good performance with basic protection
  • Sites check for common automation indicators

Use Undetected Browser When:

  • Sites employ sophisticated bot detection services (Cloudflare, DataDome, etc.)
  • Stealth mode alone isn't sufficient
  • You're willing to trade some performance for better evasion

Best Practice: Progressive Enhancement

  1. Start with: Regular browser + Stealth mode
  2. If blocked: Switch to Undetected browser
  3. If still blocked: Combine Undetected browser + Stealth mode
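The escalation ladder above can be encoded as a tiny helper. The tier names and the `next_tier` function are illustrative only, not part of the Crawl4AI API:

```python
from typing import Optional

# Illustrative encoding of the progressive-enhancement ladder. The tier
# names below are not Crawl4AI identifiers; they just label the order in
# which this guide recommends trying configurations.
TIERS = [
    "stealth",             # 1. regular browser + stealth mode
    "undetected",          # 2. undetected browser
    "undetected+stealth",  # 3. undetected browser + stealth mode
]

def next_tier(current: Optional[str]) -> Optional[str]:
    """Return the next configuration to try, or None when options run out."""
    if current is None:
        return TIERS[0]
    idx = TIERS.index(current)
    return TIERS[idx + 1] if idx + 1 < len(TIERS) else None
```

Starting from `None`, repeated calls walk stealth → undetected → undetected+stealth, then return `None` once every option has been tried.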

Stealth Mode

Stealth mode is the simpler anti-bot solution that works with both regular and undetected browsers:

python
from crawl4ai import AsyncWebCrawler, BrowserConfig

# Enable stealth mode with regular browser
browser_config = BrowserConfig(
    enable_stealth=True,  # Simple flag to enable
    headless=False       # Better for avoiding detection
)

async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun("https://example.com")

What Stealth Mode Does:

  • Removes navigator.webdriver flag
  • Modifies browser fingerprints
  • Emulates realistic plugin behavior
  • Adjusts navigator properties
  • Fixes common automation leaks
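To make these signals concrete, here is a hypothetical leak checker over a snapshot of `navigator` properties. It is illustrative only (real detectors probe far more properties, and this function is not part of Crawl4AI):

```python
# Hypothetical leak checker: given a snapshot of navigator properties as a
# dict, list the automation signals that stealth mode is meant to hide.
def automation_leaks(nav: dict) -> list:
    leaks = []
    if nav.get("webdriver"):                # navigator.webdriver flag set
        leaks.append("webdriver")
    if not nav.get("plugins"):              # an empty plugin list is suspicious
        leaks.append("no-plugins")
    if nav.get("languages") in (None, []):  # headless often reports no languages
        leaks.append("no-languages")
    return leaks
```

An unpatched headless browser might report all three leaks; after stealth patches, the same snapshot should come back clean.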

Undetected Browser Mode

For sites with sophisticated bot detection that stealth mode can't bypass, use the undetected browser adapter:

Key Features

  • Drop-in Replacement: Uses the same API as regular browser mode
  • Enhanced Stealth: Built-in patches to evade common detection methods
  • Browser Adapter Pattern: Seamlessly switch between regular and undetected modes
  • Automatic Installation: crawl4ai-setup installs all necessary browser dependencies

Quick Start

python
import asyncio
from crawl4ai import (
    AsyncWebCrawler, 
    BrowserConfig, 
    CrawlerRunConfig,
    UndetectedAdapter
)
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

async def main():
    # Create the undetected adapter
    undetected_adapter = UndetectedAdapter()
    
    # Create browser config
    browser_config = BrowserConfig(
        headless=False,  # Headless mode is more easily detected
        verbose=True,
    )
    
    # Create the crawler strategy with undetected adapter
    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=undetected_adapter
    )
    
    # Create the crawler with our custom strategy
    async with AsyncWebCrawler(
        crawler_strategy=crawler_strategy,
        config=browser_config
    ) as crawler:
        # Your crawling code here
        result = await crawler.arun(
            url="https://example.com",
            config=CrawlerRunConfig()
        )
        print(result.markdown[:500])

asyncio.run(main())

Combining Both Features

For maximum evasion, combine stealth mode with undetected browser:

python
from crawl4ai import AsyncWebCrawler, BrowserConfig, UndetectedAdapter
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

# Create browser config with stealth enabled
browser_config = BrowserConfig(
    enable_stealth=True,  # Enable stealth mode
    headless=False
)

# Create undetected adapter
adapter = UndetectedAdapter()

# Create strategy with both features
strategy = AsyncPlaywrightCrawlerStrategy(
    browser_config=browser_config,
    browser_adapter=adapter
)

async with AsyncWebCrawler(
    crawler_strategy=strategy,
    config=browser_config
) as crawler:
    result = await crawler.arun("https://protected-site.com")

Examples

Example 1: Basic Stealth Mode

python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def test_stealth_mode():
    # Simple stealth mode configuration
    browser_config = BrowserConfig(
        enable_stealth=True,
        headless=False
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://bot.sannysoft.com",
            config=CrawlerRunConfig(screenshot=True)
        )
        
        if result.success:
            print("✓ Successfully accessed bot detection test site")
            # Save screenshot to verify detection results
            if result.screenshot:
                import base64
                with open("stealth_test.png", "wb") as f:
                    f.write(base64.b64decode(result.screenshot))
                print("✓ Screenshot saved - check for green (passed) tests")

asyncio.run(test_stealth_mode())

Example 2: Undetected Browser Mode

python
import asyncio
from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CrawlerRunConfig,
    CrawlResult,
    DefaultMarkdownGenerator,
    PruningContentFilter,
    UndetectedAdapter
)
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy


async def main():
    # Create browser config
    browser_config = BrowserConfig(
        headless=False,
        verbose=True,
    )
    
    # Create the undetected adapter
    undetected_adapter = UndetectedAdapter()
    
    # Create the crawler strategy with the undetected adapter
    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=undetected_adapter
    )
    
    # Create the crawler with our custom strategy
    async with AsyncWebCrawler(
        crawler_strategy=crawler_strategy,
        config=browser_config
    ) as crawler:
        # Configure the crawl
        crawler_config = CrawlerRunConfig(
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter()
            ),
            capture_console_messages=True,  # Test adapter console capture
        )
        
        # Test on a site that typically detects bots
        print("Testing undetected adapter...")
        result: CrawlResult = await crawler.arun(
            url="https://www.helloworld.org", 
            config=crawler_config
        )
        
        print(f"Status: {result.status_code}")
        print(f"Success: {result.success}")
        print(f"Console messages captured: {len(result.console_messages or [])}")
        print(f"Markdown content (first 500 chars):\n{result.markdown.raw_markdown[:500]}")


if __name__ == "__main__":
    asyncio.run(main())

Browser Adapter Pattern

The undetected browser support is implemented using an adapter pattern, allowing seamless switching between different browser implementations:

python
# Regular browser adapter (default)
from crawl4ai import PlaywrightAdapter
regular_adapter = PlaywrightAdapter()

# Undetected browser adapter
from crawl4ai import UndetectedAdapter
undetected_adapter = UndetectedAdapter()

The adapter handles:

  • JavaScript execution
  • Console message capture
  • Error handling
  • Browser-specific optimizations
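The pattern itself can be shown with a minimal sketch. The `BrowserAdapter` protocol and the two stub adapters below are a simplification for illustration, not Crawl4AI's actual interface:

```python
from typing import Protocol, List

# Minimal sketch of the adapter pattern: the caller is written once against
# a small protocol, and concrete adapters can be swapped in freely.
class BrowserAdapter(Protocol):
    def launch_args(self) -> List[str]: ...

class RegularAdapter:
    def launch_args(self) -> List[str]:
        return ["--disable-gpu"]

class StealthyAdapter:
    def launch_args(self) -> List[str]:
        # Extra flag standing in for the undetected adapter's deeper patches
        return ["--disable-gpu", "--disable-blink-features=AutomationControlled"]

def launch(adapter: BrowserAdapter) -> List[str]:
    # This function never changes, whichever adapter is plugged in.
    return adapter.launch_args()
```

This is why switching from `PlaywrightAdapter` to `UndetectedAdapter` requires no changes to the rest of your crawling code.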

Best Practices

  1. Avoid Headless Mode: Detection is easier in headless mode

    python
    browser_config = BrowserConfig(headless=False)
    
  2. Use Reasonable Delays: Don't rush through pages

    python
    crawler_config = CrawlerRunConfig(
        wait_time=3.0,  # Wait 3 seconds after page load
        delay_before_return_html=2.0  # Additional delay
    )
    
  3. Rotate User Agents: You can customize user agents

    python
    browser_config = BrowserConfig(
        headers={"User-Agent": "your-user-agent"}
    )
    
  4. Handle Failures Gracefully: Some sites may still detect and block

    python
    if not result.success:
        print(f"Crawl failed: {result.error_message}")
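Tips 2 and 3 can be combined in a small stdlib helper. The function names, delay values, and user-agent strings below are examples, not a Crawl4AI API:

```python
import random

# Illustrative helpers for "use reasonable delays" and "rotate user agents".
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def pick_user_agent(rng: random.Random) -> str:
    """Rotate user agents by sampling; pass a seeded Random for reproducibility."""
    return rng.choice(USER_AGENTS)

def polite_delay(rng: random.Random, base: float = 3.0, jitter: float = 2.0) -> float:
    """A human-ish delay: base seconds plus up to `jitter` seconds of noise."""
    return base + rng.random() * jitter
```

The sampled user agent can be passed through `BrowserConfig(headers={"User-Agent": ...})` as shown in tip 3.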
    

Advanced Usage Tips

Progressive Detection Handling

python
async def crawl_with_progressive_evasion(url):
    # Step 1: Try regular browser with stealth
    browser_config = BrowserConfig(
        enable_stealth=True,
        headless=False
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url)
        if result.success and "Access Denied" not in result.html:
            return result
    
    # Step 2: If blocked, try undetected browser
    print("Regular + stealth blocked, trying undetected browser...")
    
    adapter = UndetectedAdapter()
    strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=adapter
    )
    
    async with AsyncWebCrawler(
        crawler_strategy=strategy,
        config=browser_config
    ) as crawler:
        result = await crawler.arun(url)
        return result

Installation

The undetected browser dependencies are automatically installed when you run:

bash
crawl4ai-setup

This command installs all necessary browser dependencies for both regular and undetected modes.

Limitations

  • Performance: Slightly slower than regular mode due to additional patches
  • Headless Detection: Some sites can still detect headless mode
  • Resource Usage: May use more resources than regular mode
  • Not 100% Guaranteed: Advanced anti-bot services are constantly evolving

Troubleshooting

Browser Not Found

Run the setup command:

bash
crawl4ai-setup

Detection Still Occurring

Try combining with other features:

python
crawler_config = CrawlerRunConfig(
    simulate_user=True,  # Add user simulation
    magic=True,  # Enable magic mode
    wait_time=5.0,  # Longer waits
)

Performance Issues

If experiencing slow performance:

python
# Use selective undetected mode only for protected sites
if is_protected_site(url):
    adapter = UndetectedAdapter()
else:
    adapter = PlaywrightAdapter()  # Default adapter
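`is_protected_site` is left undefined above; one simple way to implement it is a domain list check. The domains below are placeholders you would maintain yourself:

```python
from urllib.parse import urlparse

# One possible is_protected_site: match the URL's host against a set of
# domains known (to you) to run aggressive bot protection. The entries
# below are placeholders, not real recommendations.
PROTECTED_DOMAINS = {"example-protected.com", "another-protected.net"}

def is_protected_site(url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Match the domain itself and any subdomain of it
    return any(host == d or host.endswith("." + d) for d in PROTECTED_DOMAINS)
```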

Future Plans

Note: In future versions of Crawl4AI, we may enable stealth mode and undetected browser by default to provide better out-of-the-box success rates. For now, users should explicitly enable these features when needed.

Conclusion

Crawl4AI provides flexible anti-bot solutions:

  1. Start Simple: Use regular browser + stealth mode for most sites
  2. Escalate if Needed: Switch to undetected browser for sophisticated protection
  3. Combine for Maximum Effect: Use both features together when facing the toughest challenges

Remember:

  • Always respect robots.txt and website terms of service
  • Use appropriate delays to avoid overwhelming servers
  • Consider the performance trade-offs of each approach
  • Test progressively to find the minimum necessary evasion level
