# Testing Infrastructure
A comprehensive testing infrastructure has been implemented for ScrapeGraphAI with support for unit tests, integration tests, performance benchmarking, and automated CI/CD pipelines.
## Configuration

- `pytest.ini` - Pytest configuration
- `tests/conftest.py` - Shared fixtures

## Mock HTTP Server (`tests/fixtures/mock_server/`)

A fully functional HTTP server for consistent testing without external dependencies:
Features:
Endpoints:
- `/` - Home page
- `/products` - Product listings with prices and stock status
- `/projects` - Project listings with descriptions
- `/api/data.json` - JSON data endpoint
- `/api/data.xml` - XML data endpoint
- `/api/data.csv` - CSV data endpoint
- `/slow` - 2-second delay simulation
- `/error/404` - 404 error page
- `/error/500` - 500 error page
- `/rate-limited` - Rate limit testing (5 requests max)
- `/dynamic` - Dynamically generated content
- `/pagination?page=N` - Paginated content
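For example, a test can point an ordinary HTTP client at these endpoints instead of a live site. The sketch below assumes the `mock_server` fixture and its `get_url()` method as used in the examples later in this document, and that the `/products` page mentions products:

```python
from urllib.request import urlopen


def test_products_page_served_locally(mock_server):
    """Fetch the /products endpoint directly from the local mock server."""
    url = mock_server.get_url("/products")
    with urlopen(url) as response:
        assert response.status == 200  # served locally, no external dependency
        body = response.read().decode("utf-8")
    assert "product" in body.lower()  # assumes the fixture page mentions products
```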
## Benchmarking Framework (`tests/fixtures/benchmarking.py`)

Components:

- `BenchmarkResult` - Individual test result tracking
- `BenchmarkSummary` - Statistical analysis across multiple runs
- `BenchmarkTracker` - Result collection and reporting
- `benchmark()` - Decorator/function for benchmarking

Metrics Tracked:
Features:
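Typical use is to time an operation over several runs and aggregate the timings. The sketch below illustrates that mean/standard-deviation style of aggregation in plain Python; it is not the fixture's API, whose exact signatures live in `tests/fixtures/benchmarking.py`:

```python
import statistics
import time


def summarize_runs(func, runs=5):
    """Time several runs of a callable and aggregate the results (illustration only)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
    }
```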
## Test Helpers (`tests/fixtures/helpers.py`)

Assertion Helpers:
- `assert_valid_scrape_result()` - Validate scraping results
- `assert_execution_info_valid()` - Validate execution metadata
- `assert_response_time_acceptable()` - Performance assertions
- `assert_no_errors_in_result()` - Error detection

Mock Response Builders:
- `create_mock_llm_response()` - Generate mock LLM responses
- `create_mock_graph_result()` - Mock graph execution results

Data Generators:
- `generate_test_html()` - Customizable HTML generation
- `generate_test_json()` - Test JSON data
- `generate_test_csv()` - Test CSV data

Validation Utilities:
- `validate_schema_match()` - Pydantic schema validation
- `validate_extracted_fields()` - Field extraction validation

Additional Utilities:
- `RateLimitHelper` - Rate limiting testing
- `retry_with_backoff()` - Retry logic with exponential backoff
- `compare_results()` - Result comparison
- `fuzzy_match_strings()` - Fuzzy string matching
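As a rough illustration of combining the generators and matchers in a unit test (the keyword argument to `generate_test_html()` and the exact matching semantics of `fuzzy_match_strings()` are assumptions for this sketch, not the helpers' documented signatures):

```python
from tests.fixtures.helpers import fuzzy_match_strings, generate_test_html


def test_generated_html_mentions_title():
    # The `title` keyword and the fuzzy-match call pattern are assumptions
    # made for illustration; see helpers.py for the actual signatures.
    html = generate_test_html(title="Example Page")
    assert "Example Page" in html
    assert fuzzy_match_strings("Example Page", "example  page")
```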
## Integration Tests

- `tests/integration/test_smart_scraper_integration.py`
- `tests/integration/test_multi_graph_integration.py`
- `tests/integration/test_file_formats_integration.py`

## CI/CD Workflow (`.github/workflows/test-suite.yml`)

Jobs:

- Unit Tests
- Integration Tests
- Performance Benchmarks
- Code Quality
- Test Coverage Report
- Test Summary
Triggers:
## Documentation (`tests/README_TESTING.md`)

Comprehensive guide covering:
## LLM Provider Support

Tests are written to be compatible with all supported LLM providers.
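One way to exercise the same scenario against several providers is to parametrize over provider config fixtures. This is only a sketch; every fixture name other than `openai_config` and `mock_server` is an illustrative assumption:

```python
import pytest


@pytest.mark.integration
@pytest.mark.requires_api_key
@pytest.mark.parametrize("config_fixture", ["openai_config", "ollama_config"])
def test_providers_share_one_scenario(config_fixture, mock_server, request):
    # "ollama_config" is an assumed fixture name used purely for illustration.
    config = request.getfixturevalue(config_fixture)
    url = mock_server.get_url("/products")
    assert config and url.startswith("http")
```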
## Test Markers

Organized test categorization:
- `@pytest.mark.unit` - Fast unit tests
- `@pytest.mark.integration` - Integration tests
- `@pytest.mark.slow` - Long-running tests
- `@pytest.mark.benchmark` - Performance tests
- `@pytest.mark.requires_api_key` - Needs API credentials

## Running Tests

```bash
# Unit tests only
pytest -m "unit or not integration"

# Integration tests
pytest --integration

# Performance benchmarks
pytest --benchmark -m benchmark

# Slow tests
pytest --slow

# With coverage
pytest --cov=scrapegraphai --cov-report=html
```
## Writing Tests

Unit test with a mocked LLM:

```python
def test_with_mock(mock_llm_model):
    """Fast test with mocked LLM."""
    result = some_function(mock_llm_model)
    assert result is not None
```
Integration test against the mock server with a real LLM:

```python
import pytest

from scrapegraphai.graphs import SmartScraperGraph
from tests.fixtures.helpers import assert_valid_scrape_result


@pytest.mark.integration
@pytest.mark.requires_api_key
def test_real_scraping(openai_config, mock_server):
    """Test with real LLM and mock server."""
    url = mock_server.get_url("/products")
    scraper = SmartScraperGraph(
        prompt="Extract products",
        source=url,
        config=openai_config,
    )
    result = scraper.run()
    assert_valid_scrape_result(result)
```
Benchmark test using the tracker fixture:

```python
import time

import pytest

from tests.fixtures.benchmarking import BenchmarkResult


@pytest.mark.benchmark
def test_performance(benchmark_tracker, openai_config):
    """Benchmark scraping performance."""
    start = time.perf_counter()
    # Run the operation being measured here
    end = time.perf_counter()

    benchmark_tracker.record(BenchmarkResult(
        test_name="my_test",
        execution_time=end - start,
        success=True,
    ))
```
## Files Added

New Files:
- `pytest.ini` - Pytest configuration
- `tests/conftest.py` - Shared fixtures
- `tests/fixtures/mock_server/server.py` - Mock HTTP server
- `tests/fixtures/benchmarking.py` - Performance framework
- `tests/fixtures/helpers.py` - Test utilities
- `tests/integration/test_smart_scraper_integration.py`
- `tests/integration/test_multi_graph_integration.py`
- `tests/integration/test_file_formats_integration.py`
- `.github/workflows/test-suite.yml` - CI/CD workflow
- `tests/README_TESTING.md` - Testing documentation
- `TESTING_INFRASTRUCTURE.md` - This file

Directories Created:
- `tests/fixtures/`
- `tests/fixtures/mock_server/`
- `tests/integration/`
- `benchmark_results/` (auto-created when running benchmarks)

Follow the conventions above when adding new tests.
For questions or issues with the testing infrastructure, please open an issue on GitHub.