WebScrapingStrategy Migration Guide

Overview

Crawl4AI has simplified its content scraping architecture. The BeautifulSoup-based WebScrapingStrategy has been deprecated in favor of the faster LXML-based implementation. However, no action is required: your existing code will continue to work.

What Changed?

- WebScrapingStrategy is now an alias for LXMLWebScrapingStrategy.
- The BeautifulSoup implementation has been removed (~1000 lines of redundant code).
- LXMLWebScrapingStrategy inherits directly from ContentScrapingStrategy.
- Performance is unaffected: LXML, the faster backend, is now the sole implementation.
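To make the alias concrete, here is a minimal sketch that uses hypothetical stand-in classes, not the real crawl4ai code. It assumes the alias is a plain module-level assignment, which is an assumption about how crawl4ai implements it. Under that assumption, both names refer to one class object, so constructors, isinstance checks, and subclasses behave the same whichever name you use.

```python
# Hypothetical stand-ins that mirror the hierarchy described above.
# These are NOT the real crawl4ai classes; they only illustrate alias semantics.


class ContentScrapingStrategy:
    """Stand-in for the abstract base class."""


class LXMLWebScrapingStrategy(ContentScrapingStrategy):
    """Stand-in for the LXML implementation; inherits directly from the base."""


# Assumed alias: a second name bound to the same class object (not a subclass).
WebScrapingStrategy = LXMLWebScrapingStrategy


def describe_alias() -> dict:
    """Collect the properties that make the alias backward compatible."""
    legacy = WebScrapingStrategy()  # constructed through the old name
    return {
        "same_class": WebScrapingStrategy is LXMLWebScrapingStrategy,
        "isinstance_lxml": isinstance(legacy, LXMLWebScrapingStrategy),
        "direct_base": LXMLWebScrapingStrategy.__bases__ == (ContentScrapingStrategy,),
        # The object reports the class's real name, not the alias.
        "reported_name": type(legacy).__name__,
    }


if __name__ == "__main__":
    print(describe_alias())
```

Under this assumption, anything that inspects class names as strings, such as log messages, sees `LXMLWebScrapingStrategy` even when the code was written against `WebScrapingStrategy`.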

Backward Compatibility

Your existing code continues to work without any changes:

```python
# This still works perfectly
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, WebScrapingStrategy

config = CrawlerRunConfig(
    scraping_strategy=WebScrapingStrategy()  # Works as before
)
```

Migration Options

You have three options:

Option 1: Do Nothing (Recommended)

Your code will continue to work. WebScrapingStrategy is permanently aliased to LXMLWebScrapingStrategy.
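Before you rely on this option, you may want to confirm that your installed release already includes the alias. The check below compares the two names directly. It is a sketch: the helper name `scraping_alias_in_effect` is hypothetical, and the helper returns None instead of failing when crawl4ai is not installed.

```python
import importlib.util
from typing import Optional


def scraping_alias_in_effect() -> Optional[bool]:
    """Hypothetical helper.

    Returns True if WebScrapingStrategy is the same object as
    LXMLWebScrapingStrategy in the installed crawl4ai, False if they are
    still distinct classes (an older release), and None if crawl4ai is
    not installed.
    """
    if importlib.util.find_spec("crawl4ai") is None:
        return None
    from crawl4ai import LXMLWebScrapingStrategy, WebScrapingStrategy

    # An alias means both names are bound to one class object.
    return WebScrapingStrategy is LXMLWebScrapingStrategy


if __name__ == "__main__":
    print(scraping_alias_in_effect())
```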

Option 2: Update Imports (Optional)

For clarity, you can update your imports:

```python
# Old (still works)
from crawl4ai import WebScrapingStrategy
strategy = WebScrapingStrategy()

# New (more explicit)
from crawl4ai import LXMLWebScrapingStrategy
strategy = LXMLWebScrapingStrategy()
```

Option 3: Use Default Configuration

Since LXMLWebScrapingStrategy is the default, you can omit the strategy parameter:

```python
from crawl4ai import CrawlerRunConfig

# Simplest approach: uses LXMLWebScrapingStrategy by default
config = CrawlerRunConfig()
```
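To see which strategy a default configuration actually uses, you can inspect the config object. This sketch assumes that `CrawlerRunConfig` stores the strategy in a `scraping_strategy` attribute that matches the constructor parameter of the same name. The helper name is hypothetical, and the helper returns None when crawl4ai is not installed.

```python
import importlib.util
from typing import Optional


def default_scraping_strategy_name() -> Optional[str]:
    """Hypothetical helper: class name of the strategy used by a bare
    CrawlerRunConfig(), or None if crawl4ai is not installed."""
    if importlib.util.find_spec("crawl4ai") is None:
        return None
    from crawl4ai import CrawlerRunConfig

    config = CrawlerRunConfig()
    # Assumption: the chosen strategy is exposed as config.scraping_strategy.
    strategy = getattr(config, "scraping_strategy", None)
    return type(strategy).__name__ if strategy is not None else None


if __name__ == "__main__":
    print(default_scraping_strategy_name())
```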

Type Hints

Because WebScrapingStrategy is an alias, either name works in type hints:

```python
from crawl4ai import WebScrapingStrategy, LXMLWebScrapingStrategy

def process_with_strategy(strategy: WebScrapingStrategy) -> None:
    # Works with both WebScrapingStrategy and LXMLWebScrapingStrategy
    pass

# Both are valid
process_with_strategy(WebScrapingStrategy())
process_with_strategy(LXMLWebScrapingStrategy())
```

Subclassing

If you've subclassed WebScrapingStrategy, it continues to work:

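```python
from crawl4ai import WebScrapingStrategy

# Through the alias, this class is now a subclass of LXMLWebScrapingStrategy
class MyCustomStrategy(WebScrapingStrategy):
    def __init__(self):
        super().__init__()
        # Your custom code
```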

Performance Benefits

By consolidating to LXML:

- 10-20x faster HTML parsing for large documents
- Lower memory usage
- Consistent behavior across all use cases
- Simplified maintenance and bug fixes
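To gauge the parsing speed-up on your own pages, you could use a rough harness like the one below, which times raw parsers on a synthetic document. It is a sketch with several assumptions. It measures only the parse step, not Crawl4AI's full scraping pipeline. It times lxml and BeautifulSoup only if they are installed. The synthetic page and the helper names are hypothetical. Results depend on hardware and document shape, so do not expect them to reproduce the figure above exactly.

```python
import importlib.util
import timeit
from html.parser import HTMLParser
from typing import Callable, Dict


def make_document(rows: int) -> str:
    """Build a synthetic HTML page with `rows` table rows containing links."""
    body = "".join(
        f"<tr><td>{i}</td><td><a href='/item/{i}'>item {i}</a></td></tr>"
        for i in range(rows)
    )
    return f"<html><body><table>{body}</table></body></html>"


def available_parsers() -> Dict[str, Callable[[str], object]]:
    """Parsers to compare; optional ones are included only if installed."""
    parsers: Dict[str, Callable[[str], object]] = {
        # Stdlib tokenizer baseline: always available, builds no tree.
        "html.parser": lambda html: HTMLParser().feed(html),
    }
    if importlib.util.find_spec("lxml") is not None:
        import lxml.html

        parsers["lxml.html"] = lxml.html.fromstring
    if importlib.util.find_spec("bs4") is not None:
        from bs4 import BeautifulSoup

        # BeautifulSoup tree built on the stdlib html.parser backend.
        parsers["BeautifulSoup"] = lambda html: BeautifulSoup(html, "html.parser")
    return parsers


def time_parsers(html: str, repeat: int = 5) -> Dict[str, float]:
    """Best-of-`repeat` wall time in seconds for a single parse per parser."""
    return {
        name: min(timeit.repeat(lambda: parse(html), number=1, repeat=repeat))
        for name, parse in available_parsers().items()
    }


if __name__ == "__main__":
    for name, seconds in time_parsers(make_document(5000)).items():
        print(f"{name:>15}: {seconds * 1000:.1f} ms")
```

Taking the minimum of several runs, rather than the mean, reduces noise from other processes. This is a common choice for micro-benchmarks of this kind.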

Summary

This change simplifies Crawl4AI's internals while maintaining 100% backward compatibility. Your existing code continues to work, and you get better performance automatically.