Back to Crawl4ai

WebScrapingStrategy Migration Guide

docs/md_v2/migration/webscraping-strategy-migration.md

0.8.62.6 KB
Original Source

WebScrapingStrategy Migration Guide

Overview

Crawl4AI has simplified its content scraping architecture. The BeautifulSoup-based WebScrapingStrategy has been deprecated in favor of the faster LXML-based implementation. However, no action is required - your existing code will continue to work.

What Changed?

  1. WebScrapingStrategy is now an alias for LXMLWebScrapingStrategy
  2. The BeautifulSoup implementation has been removed (~1000 lines of redundant code)
  3. LXMLWebScrapingStrategy inherits directly from ContentScrapingStrategy
  4. Performance remains optimal with LXML as the sole implementation

Backward Compatibility

Your existing code continues to work without any changes:

python
# This still works perfectly
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, WebScrapingStrategy

config = CrawlerRunConfig(
    scraping_strategy=WebScrapingStrategy()  # Works as before
)

Migration Options

You have three options:

Your code will continue to work. WebScrapingStrategy is permanently aliased to LXMLWebScrapingStrategy.

Option 2: Update Imports (Optional)

For clarity, you can update your imports:

python
# Old (still works)
from crawl4ai import WebScrapingStrategy
strategy = WebScrapingStrategy()

# New (more explicit)
from crawl4ai import LXMLWebScrapingStrategy
strategy = LXMLWebScrapingStrategy()

Option 3: Use Default Configuration

Since LXMLWebScrapingStrategy is the default, you can omit the strategy parameter:

python
# Simplest approach - uses LXMLWebScrapingStrategy by default
config = CrawlerRunConfig()

Type Hints

If you use type hints, both work:

python
from crawl4ai import WebScrapingStrategy, LXMLWebScrapingStrategy

def process_with_strategy(strategy: WebScrapingStrategy) -> None:
    # Works with both WebScrapingStrategy and LXMLWebScrapingStrategy
    pass

# Both are valid
process_with_strategy(WebScrapingStrategy())
process_with_strategy(LXMLWebScrapingStrategy())

Subclassing

If you've subclassed WebScrapingStrategy, it continues to work:

python
class MyCustomStrategy(WebScrapingStrategy):
    def __init__(self):
        super().__init__()
        # Your custom code

Performance Benefits

By consolidating to LXML:

  • 10-20x faster HTML parsing for large documents
  • Lower memory usage
  • Consistent behavior across all use cases
  • Simplified maintenance and bug fixes

Summary

This change simplifies Crawl4AI's internals while maintaining 100% backward compatibility. Your existing code continues to work, and you get better performance automatically.