docs/md_v2/migration/webscraping-strategy-migration.md
Crawl4AI has simplified its content scraping architecture. The BeautifulSoup-based WebScrapingStrategy has been deprecated in favor of the faster LXML-based implementation. However, no action is required - your existing code will continue to work.
WebScrapingStrategy is now an alias for LXMLWebScrapingStrategy.
LXMLWebScrapingStrategy inherits directly from ContentScrapingStrategy.
Your existing code continues to work without any changes:
# This still works perfectly
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, WebScrapingStrategy
config = CrawlerRunConfig(
    scraping_strategy=WebScrapingStrategy()  # Works as before
)
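The aliasing can be illustrated without installing Crawl4AI. The sketch below uses hypothetical stand-in classes that borrow only their names from this guide; the `scrape` method and its return value are assumptions made for illustration, not the library's real interface. It also assumes the alias is a plain name binding to the same class object.

```python
from abc import ABC, abstractmethod


class ContentScrapingStrategy(ABC):
    """Stand-in base class (hypothetical, not the real Crawl4AI class)."""

    @abstractmethod
    def scrape(self, html: str) -> dict:
        ...


class LXMLWebScrapingStrategy(ContentScrapingStrategy):
    """Stand-in for the LXML-based implementation."""

    def scrape(self, html: str) -> dict:
        # Placeholder logic; a real strategy would parse the HTML with lxml.
        return {"length": len(html)}


# The old name is bound to the same class object, so it is an alias,
# not a separate implementation.
WebScrapingStrategy = LXMLWebScrapingStrategy

old_style = WebScrapingStrategy()
alias_is_same_class = WebScrapingStrategy is LXMLWebScrapingStrategy
old_is_lxml_instance = isinstance(old_style, LXMLWebScrapingStrategy)
```

Under these assumptions, code that constructs `WebScrapingStrategy()` receives an `LXMLWebScrapingStrategy` instance, which is why no call sites need to change.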
You have three options:
Option 1: do nothing. Your code will continue to work, because WebScrapingStrategy is permanently aliased to LXMLWebScrapingStrategy.
Option 2: update your imports for clarity:
# Old (still works)
from crawl4ai import WebScrapingStrategy
strategy = WebScrapingStrategy()
# New (more explicit)
from crawl4ai import LXMLWebScrapingStrategy
strategy = LXMLWebScrapingStrategy()
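If you want to apply the rename across many files, a small helper along these lines might do it. This is a minimal sketch, not a tool shipped with Crawl4AI: the function name `migrate_source` is hypothetical, and it assumes a plain textual rename is sufficient for your codebase.

```python
import re

# Match the old name only as a whole identifier, so the new name
# LXMLWebScrapingStrategy (which contains the old one) is left untouched.
_OLD_NAME = re.compile(r"\bWebScrapingStrategy\b")


def migrate_source(source: str) -> str:
    """Return `source` with the old class name replaced by the explicit one."""
    return _OLD_NAME.sub("LXMLWebScrapingStrategy", source)


old_code = (
    "from crawl4ai import WebScrapingStrategy\n"
    "strategy = WebScrapingStrategy()\n"
)
new_code = migrate_source(old_code)
```

Because the pattern requires a word boundary before `WebScrapingStrategy`, running the helper on already-migrated code leaves it unchanged. It does rewrite matches inside comments and strings, and a line that imports both names ends up importing `LXMLWebScrapingStrategy` twice, so review the resulting diff before committing.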
Option 3: omit the strategy parameter. Since LXMLWebScrapingStrategy is the default, you do not need to pass it:
# Simplest approach - uses LXMLWebScrapingStrategy by default
from crawl4ai import CrawlerRunConfig
config = CrawlerRunConfig()
If you use type hints, both work:
from crawl4ai import WebScrapingStrategy, LXMLWebScrapingStrategy
def process_with_strategy(strategy: WebScrapingStrategy) -> None:
    # Works with both WebScrapingStrategy and LXMLWebScrapingStrategy
    pass
# Both are valid
process_with_strategy(WebScrapingStrategy())
process_with_strategy(LXMLWebScrapingStrategy())
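Both calls are accepted because an alias used in an annotation resolves to the class it points to. The sketch below shows this with a hypothetical stand-in class rather than the real Crawl4AI import, assuming the alias is a plain name binding.

```python
from typing import get_type_hints


class LXMLWebScrapingStrategy:
    """Stand-in class (hypothetical, not the real Crawl4AI class)."""


# Alias: the old name refers to the same class object.
WebScrapingStrategy = LXMLWebScrapingStrategy


def process_with_strategy(strategy: WebScrapingStrategy) -> None:
    pass


# Resolve the annotations at runtime and compare against the new name.
hints = get_type_hints(process_with_strategy)
annotation_is_lxml = hints["strategy"] is LXMLWebScrapingStrategy
```

Under that assumption, a static type checker treats the two names as the same type, so annotations written against either name need no changes.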
If you've subclassed WebScrapingStrategy, it continues to work:
from crawl4ai import WebScrapingStrategy

class MyCustomStrategy(WebScrapingStrategy):
    def __init__(self):
        super().__init__()
        # Your custom code
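To see what a subclass actually inherits from after the change, you can inspect its method resolution order. The following sketch uses hypothetical stand-in classes; the `parser` and `custom_flag` attributes are invented for illustration and are not part of Crawl4AI.

```python
class ContentScrapingStrategy:
    """Stand-in base class (hypothetical)."""


class LXMLWebScrapingStrategy(ContentScrapingStrategy):
    """Stand-in for the LXML-based implementation (hypothetical)."""

    def __init__(self):
        self.parser = "lxml"  # illustrative attribute only


# Alias: the old name refers to the same class object.
WebScrapingStrategy = LXMLWebScrapingStrategy


class MyCustomStrategy(WebScrapingStrategy):
    def __init__(self):
        super().__init__()  # runs LXMLWebScrapingStrategy.__init__
        self.custom_flag = True


custom = MyCustomStrategy()
# Class names in the order Python searches them for attributes.
base_names = [cls.__name__ for cls in MyCustomStrategy.__mro__]
```

In this sketch the subclass's direct parent reports its name as `LXMLWebScrapingStrategy`, even though the class statement uses the old name. If your subclass overrides methods or relies on behavior specific to the former BeautifulSoup-based implementation, check that those overrides still apply to the LXML-based parent.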
By consolidating to LXML:
This change simplifies Crawl4AI's internals while maintaining 100% backward compatibility. Your existing code continues to work, and you get better performance automatically.