Smoke Test & Integration Test Standards

.agent-skills/test-review/standards/smoke-and-integration.md

Standards extracted from the DataHub repository. Every rule cites its source file.


1. Smoke Test Standards

1.1 Fixture Conventions

Session-scoped fixtures provide shared state across all tests in a session.

| Fixture | Scope | Source |
| --- | --- | --- |
| auth_session | session | smoke-test/conftest.py:42-46 |
| graph_client | session | smoke-test/conftest.py:60-62 |
| openapi_graph_client | session | smoke-test/conftest.py:65-67 |
| clear_graph_cache | function (autouse) | smoke-test/conftest.py:70-79 |

Module-scoped data fixtures must follow the _ingest_cleanup_data_impl pattern:

python
@pytest.fixture(scope="module", autouse=True)
def ingest_cleanup_data(auth_session, graph_client):
    yield from _ingest_cleanup_data_impl(
        auth_session, graph_client, "tests/<module>/data.json", "<module>"
    )

Source: smoke-test/conftest.py:82-117, used by 13+ test modules including tests/incidents/incidents_test.py:10-14, tests/domains/domains_test.py, tests/tags_and_terms/.

Rules:

  • BLOCKER: Data fixture must be scope="module" with autouse=True
  • BLOCKER: Must include both pre-delete (for idempotency) and post-delete (cleanup)
  • WARNING: Custom lifecycle fixtures should follow the same pre-delete/ingest/yield/cleanup pattern

1.2 Data Lifecycle

The canonical pattern is: pre-delete -> ingest -> wait -> yield -> cleanup -> wait.

Source: smoke-test/conftest.py:82-117

python
def _ingest_cleanup_data_impl(auth_session, graph_client, data_file, test_name, to_delete_urns=None):
    delete_urns_from_file(graph_client, data_file)   # Pre-delete for idempotency
    ingest_file_via_rest(auth_session, data_file)     # Ingest test data
    wait_for_writes_to_sync()                         # Wait for consistency
    yield                                              # Tests run here
    delete_urns_from_file(graph_client, data_file)   # Cleanup
    if to_delete_urns:
        delete_urns(graph_client, to_delete_urns)
    wait_for_writes_to_sync()                         # Wait after cleanup

Rules:

  • BLOCKER: Tests that create entities MUST clean them up
  • BLOCKER: Pre-delete step required for idempotent test runs
  • WARNING: wait_for_writes_to_sync() should follow both ingest and cleanup
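
The ordering of the lifecycle steps, and why the pre-delete step makes a run idempotent, can be sketched without a DataHub instance. This is a minimal illustration only: `store`, `delete_urns`, `ingest`, `wait_for_sync`, `lifecycle`, and `run_once` are hypothetical stand-ins defined here, not the helpers in conftest.py, and the in-memory store merges aspects on write as a simplifying assumption.

```python
from typing import Dict, Iterator, List

# Hypothetical in-memory stand-in for the metadata store; not a DataHub API.
store: Dict[str, dict] = {}
calls: List[str] = []


def delete_urns(urns: List[str]) -> None:
    """Delete each URN if present; deleting a missing URN is a no-op."""
    calls.append("delete")
    for urn in urns:
        store.pop(urn, None)


def ingest(records: Dict[str, dict]) -> None:
    """Merge aspects into the store, so stale aspects survive unless deleted first."""
    calls.append("ingest")
    for urn, aspects in records.items():
        store.setdefault(urn, {}).update(aspects)


def wait_for_sync() -> None:
    """Placeholder for the consistency wait; the stand-in store is synchronous."""
    calls.append("wait")


def lifecycle(records: Dict[str, dict]) -> Iterator[None]:
    """Generator following pre-delete -> ingest -> wait -> yield -> cleanup -> wait."""
    delete_urns(list(records))  # pre-delete: clears leftovers from earlier runs
    ingest(records)
    wait_for_sync()
    yield                       # the tests would run here
    delete_urns(list(records))  # cleanup
    wait_for_sync()


def run_once(records: Dict[str, dict]) -> Dict[str, dict]:
    """Drive the generator like a fixture runner; return what the tests would see."""
    gen = lifecycle(records)
    next(gen)                   # run setup up to the yield
    seen = dict(store)          # snapshot of the mid-test state
    next(gen, None)             # run teardown
    return seen
```

If the store already holds a leftover aspect for one of the URNs, `run_once` still observes only the freshly ingested aspects, and a second call observes the same state as the first.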

1.3 Authentication

Source: smoke-test/tests/utils.py:363-478

Authentication uses TestSessionWrapper, which:

  • Wraps requests.Session and injects an Authorization: Bearer token header
  • Auto-generates a GMS token, retrying with exponential backoff (10 attempts, 4-30s)
  • Clones header dicts to prevent cross-test pollution (line 388)
  • Auto-waits for sync on POST/PUT calls (lines 396-402)
  • Revokes the token when the session is destroyed

Rules:

  • BLOCKER: Never create auth tokens inline -- use the auth_session fixture
  • BLOCKER: Never hardcode credentials -- use get_admin_credentials() from utils.py
  • WARNING: Don't bypass TestSessionWrapper with raw requests.get/post
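
The header-cloning behaviour is easiest to see in isolation. The sketch below is an assumption-laden illustration, not the real TestSessionWrapper: `BearerSession` and `fake_send` are hypothetical names, and the transport is injected so the example runs without a network.

```python
from typing import Any, Callable, Dict


class BearerSession:
    """Hypothetical stand-in for a session wrapper; not the real TestSessionWrapper."""

    def __init__(self, token: str, send: Callable[..., Any]) -> None:
        self._token = token
        self._send = send  # injected transport, e.g. a requests.Session.request method

    def request(self, method: str, url: str, **kwargs: Any) -> Any:
        # Copy the caller's headers so the Authorization entry is never
        # written into a dict that another test may be reusing.
        headers: Dict[str, str] = dict(kwargs.get("headers") or {})
        headers["Authorization"] = f"Bearer {self._token}"
        kwargs["headers"] = headers
        return self._send(method, url, **kwargs)


def fake_send(method: str, url: str, **kwargs: Any) -> Dict[str, Any]:
    """Echo transport used only for this illustration."""
    return {"method": method, "url": url, "headers": kwargs["headers"]}
```

Passing a shared constant such as a RestLi header dict through `request` leaves the shared dict unchanged, while the outgoing request still carries the bearer token.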

1.4 Retry and Consistency Patterns

Six mechanisms exist, ordered from most precise to most general:

A. Trace API (write confirmation) -- confirms a specific async write completed

Source: smoke-test/tests/trace/test_api_trace.py

python
# After an async write, extract trace_id from response headers/system metadata
trace_id = response.headers.get("X-DataHub-Trace-Id")

# Query the trace endpoint to confirm processing
trace_resp = auth_session.post(
    f"{auth_session.gms_url()}/openapi/v1/trace/write/{trace_id}",
    params={"onlyIncludeErrors": "false", "detailed": "true"},
    json={urn: [aspect_name]},
)
trace_data = trace_resp.json()
assert trace_data["success"] is True
assert trace_data["primaryStorage"]["writeStatus"] == "ACTIVE_STATE"
assert trace_data["searchStorage"]["writeStatus"] == "ACTIVE_STATE"

B. wait_for_writes_to_sync() with consumer group targeting -- polls Kafka lag for specific consumers

Source: smoke-test/tests/consistency_utils.py:31-119

python
# Target only the MCP consumer for faster waits (primary storage only)
wait_for_writes_to_sync(mcp_only=True)

# Target only the MAE consumer (search indexing only)
wait_for_writes_to_sync(mae_only=True)

# Shorter timeout for scoped waits (default is 120s)
wait_for_writes_to_sync(max_timeout_in_sec=30)

# Target a specific consumer group
wait_for_writes_to_sync(consumer_group="my-consumer-group")

C. @with_test_retry() decorator -- for read-after-write assertions

Source: smoke-test/tests/utils.py:126-156

python
@with_test_retry()
def _ensure_entity_present(auth_session, urn):
    response = auth_session.get(...)
    assert response.json()["value"]

D. Assertion-scoped waits -- time-boxed retries for specific assertions

Source: smoke-test/tests/assertions/sdk/helpers.py, smoke-test/tests/schema_fields/schema_evolution.py

python
# Pattern 1: scoped wait helper with shorter timeout
def wait_for_assertion_sync():
    wait_for_writes_to_sync(max_timeout_in_sec=30)

# Pattern 2: tenacity stop_after_delay for time-boxed assertions
@retry(stop=stop_after_delay(30), wait=wait_fixed(2), reraise=True)
def _verify_schema_field():
    result = graph_client.get_aspect(...)
    assert result is not None

E. Integration service status polling -- for action/integration lifecycle tests

Source: smoke-test/tests/integrations_service_utils.py

python
# Wait until an action has processed at least one event
wait_until_action_has_processed_event(action_urn, integrations_url, event_time)

# Wait until an action finishes reloading after config change
wait_for_reload_completion(action_urn, integrations_url)

F. Direct tenacity.retry -- for custom retry requirements

Source: smoke-test/tests/search/test_lineage_search_index_fields.py:99-104

Choosing the right mechanism:

| Scenario | Mechanism | Why |
| --- | --- | --- |
| Confirming a single async write was processed | Trace API (A) | Most precise; confirms the exact write |
| Waiting for all pending writes after bulk ingest/cleanup | wait_for_writes_to_sync() (B) | Drains Kafka lag to zero across consumers |
| Waiting for only primary storage or only search | wait_for_writes_to_sync(mcp_only/mae_only) (B) | Faster; skips irrelevant consumers |
| Read-after-write assertion (entity should exist) | @with_test_retry() (C) | Retries the assertion until consistent |
| Time-boxed assertion (must pass within N seconds) | Assertion-scoped wait (D) | stop_after_delay(N) caps total wait time |
| Action/integration lifecycle completed | Service status polling (E) | Polls action-specific status endpoints |
| Custom retry logic not covered above | Direct tenacity.retry (F) | Full control over stop/wait/retry conditions |
| Cypress UI element should appear | cy.waitTextVisible / cy.intercept | Built-in Cypress retry; never use cy.wait(ms) |

Rules:

  • BLOCKER: Never use bare time.sleep() for eventual consistency. Use one of the mechanisms above.
  • WARNING: Prefer Trace API over blanket wait_for_writes_to_sync() when confirming a single known write
  • WARNING: Prefer @with_test_retry() over custom tenacity.retry for standard read-after-write patterns
  • SUGGESTION: Use max_timeout_in_sec parameter to scope wait_for_writes_to_sync() waits to the minimum needed
  • SUGGESTION: Use consumer group targeting (mcp_only, mae_only) when only one storage layer is relevant
  • SUGGESTION: Use max_attempts parameter when fewer retries are appropriate
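
The difference between a blind sleep and retrying the assertion itself can be shown with the standard library alone. This is a rough sketch under stated assumptions: `retry_assertion` is a hypothetical decorator written for this example (it is not the implementation of with_test_retry or tenacity), and `EventuallyVisible` is a fake index that only becomes consistent on the third read.

```python
import functools
import time
from typing import Any, Callable, Optional


def retry_assertion(max_attempts: int = 5, delay_sec: float = 0.01) -> Callable:
    """Hypothetical decorator: re-run a function while it raises AssertionError."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except AssertionError:
                    if attempt == max_attempts:
                        raise  # surface the last failure unchanged
                    time.sleep(delay_sec)  # short pause between attempts

        return wrapper

    return decorator


class EventuallyVisible:
    """Fake index that only returns the entity from the third read onward."""

    def __init__(self) -> None:
        self.reads = 0

    def get(self, urn: str) -> Optional[dict]:
        self.reads += 1
        return {"urn": urn} if self.reads >= 3 else None


index = EventuallyVisible()


@retry_assertion(max_attempts=5)
def ensure_present(urn: str) -> dict:
    value = index.get(urn)
    assert value is not None, f"{urn} not visible yet"
    return value
```

Unlike a fixed sleep, the retried assertion returns as soon as the read succeeds and fails with the real assertion message once the attempt budget is exhausted.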

1.5 GraphQL Testing

Source: smoke-test/tests/utils.py:188-224

The standard pattern:

python
res_data = execute_graphql(auth_session, query, variables)
assert res_data["data"]["entity"]["field"] == expected_value

execute_graphql() already asserts:

  • Response is not empty
  • res_data["data"] is not None
  • No errors key in response

Rules:

  • WARNING: Use execute_graphql() instead of manual auth_session.post() to GraphQL endpoint
  • WARNING: Assert specific field values, not just that the response exists
  • SUGGESTION: Inline GraphQL queries are acceptable but should be readable (use triple-quoted strings)
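
To make the three built-in checks concrete, here is a small stand-alone sketch. `check_graphql_response` is a hypothetical helper written for this example; it mirrors the checks listed above but is not the code of execute_graphql(), and the sample payloads are made up.

```python
from typing import Any, Dict


def check_graphql_response(res_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the three checks described for execute_graphql()."""
    assert res_data, "empty GraphQL response"
    assert res_data.get("data") is not None, "GraphQL response has no data"
    assert "errors" not in res_data, f"GraphQL errors: {res_data.get('errors')}"
    return res_data


# Made-up payloads: one healthy response, one that carries an errors key.
good = {"data": {"dataset": {"name": "orders"}}}
bad = {"data": {"dataset": None}, "errors": [{"message": "not found"}]}
```

A test would still assert a specific field afterwards, for example `check_graphql_response(good)["data"]["dataset"]["name"] == "orders"`, because passing the generic checks says nothing about the content.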

1.6 REST API Testing

Source: smoke-test/test_e2e.py:32-34, smoke-test/tests/utils.py:243-265

python
restli_default_headers = {"X-RestLi-Protocol-Version": "2.0.0"}

Ingestion via REST uses ingest_file_via_rest(auth_session, filename), which creates a Pipeline with a datahub-rest sink.

Rules:

  • WARNING: Use restli_default_headers constant for RestLi API calls
  • WARNING: Use ingest_file_via_rest() helper instead of manual Pipeline creation for test data ingestion
  • SUGGESTION: For OpenAPI v3 tests, use the concurrent_openapi.evaluate_test() JSON fixture pattern

1.7 Marker Conventions

Source: smoke-test/pyproject.toml:84-88

| Marker | Purpose | When to Use |
| --- | --- | --- |
| read_only | Tests that don't mutate data | Service health, search, analytics |
| no_cypress_suite1 | Module-level batch separation | Large test modules |
| dependency() | Test ordering | When test B depends on test A's side effects |

Rules:

  • WARNING: read_only tests must not create, modify, or delete any entities
  • WARNING: @pytest.mark.dependency() chains should be kept short (ideally <=3 levels)
  • SUGGESTION: New test modules should specify batch markers for CI parallelism

1.8 Environment Variable Discipline

Source: smoke-test/tests/utilities/env_vars.py (250 lines, 30+ variables)

Categories: Core DataHub config, admin credentials, database config, testing config, consistency testing, Cypress, integration testing, Slack.

Violations found:

  • tests/cypress/integration_test.py:279-280 -- direct os.getenv("BATCH_NUMBER")
  • tests/analytics/conftest.py:147 -- direct ELASTICSEARCH_URL read
  • Multiple files set os.environ["DATAHUB_TELEMETRY_ENABLED"] redundantly

Rules:

  • BLOCKER: New tests must use env_vars.py getters for all DataHub configuration
  • WARNING: Do not hardcode URLs, ports, or hostnames -- use env_vars registry
  • SUGGESTION: Consolidate telemetry suppression to conftest.py session scope
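
The getter approach amounts to reading each variable in exactly one place, with a typed default. The sketch below shows the idea only; the function bodies are assumptions written for this example and are not copied from env_vars.py.

```python
import os


def get_batch_number(default: int = 0) -> int:
    """Hypothetical getter: parse BATCH_NUMBER in one place, with a typed default."""
    return int(os.getenv("BATCH_NUMBER", str(default)))


def get_elasticsearch_url() -> str:
    """Hypothetical getter; the fallback assumes a local quickstart deployment."""
    return os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")


def describe_batch() -> str:
    """Test code calls the getters and never touches os.getenv directly."""
    return f"batch {get_batch_number()} against {get_elasticsearch_url()}"
```

With this shape, renaming a variable or changing its default touches one function rather than every test module that reads it.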

1.9 Idempotent Test Setup

Tests must create exactly what they need and be safely re-runnable. Two patterns ensure this:

A. Pre-delete before ingest (the data lifecycle pattern)

Source: smoke-test/conftest.py:107 -- delete_urns_from_file(graph_client, data_file) runs before ingest to clear stale data from previous runs.

B. UUID-based unique entity names for tests that create entities mid-test

Source: smoke-test/tests/cli/user_cmd/test_user_add.py:15-16

python
def generate_test_email():
    """Generate a unique email for testing to avoid conflicts."""
    return f"test-user-{uuid.uuid4()}@example.com"

Source: smoke-test/tests/assertions/assertions_test.py:369

python
assertion_urn = f"urn:li:assertion:{uuid.uuid4()}"
dataset_urn = make_dataset_urn(platform="postgres", name=f"assertion_patch_{uuid.uuid4()}")

Rules:

  • BLOCKER: Tests must be safely re-runnable (idempotent). Either pre-delete existing data or use UUID-based unique names.
  • WARNING: Do not assume a clean environment. Previous test runs, failed tests, or parallel batches may have left data behind.
  • SUGGESTION: Prefer UUID-based unique names for entities created mid-test; use pre-delete for fixture-managed bulk data.

1.10 Guaranteed Cleanup

Cleanup must be guaranteed even when tests fail. Two mechanisms exist:

A. Fixture teardown via yield (preferred for bulk test data)

Source: smoke-test/conftest.py:82-117 -- the _ingest_cleanup_data_impl pattern uses yield to separate setup from teardown. pytest guarantees the code after yield runs even if the test fails.

python
@pytest.fixture(scope="module", autouse=True)
def ingest_cleanup_data(auth_session, graph_client):
    yield from _ingest_cleanup_data_impl(...)
    # Cleanup runs even on test failure

B. try/finally blocks (required for entities created mid-test)

Source: smoke-test/tests/assertions/assertions_test.py:373-452

python
def test_assertion_info_patch_preserves_note(graph_client):
    assertion_urn = f"urn:li:assertion:{uuid.uuid4()}"
    dataset_urn = make_dataset_urn(platform="postgres", name=f"assertion_patch_{uuid.uuid4()}")
    try:
        graph_client.emit(MetadataChangeProposalWrapper(...))
        wait_for_writes_to_sync()
        # ... test logic and assertions ...
    finally:
        delete_urn(graph_client, assertion_urn)
        delete_urn(graph_client, dataset_urn)
        wait_for_writes_to_sync()

Source: smoke-test/test_authentication_e2e.py:381-384

python
    finally:
        # Cleanup
        try:
            if token_id:
                revoke_api_token(auth_session, token_id)
        # (excerpt ends here; the clause that closes this try is not shown)

Source: smoke-test/test_system_info.py:324-327 (same pattern for token cleanup)

Rules:

  • BLOCKER: Entities created mid-test (not via fixture) MUST be cleaned up in a try/finally block
  • BLOCKER: Fixture-managed data MUST use yield-based teardown (pytest guarantees execution)
  • WARNING: Cleanup finally blocks should themselves be wrapped in try/except to avoid masking the original test failure
  • SUGGESTION: Prefer fixture-based lifecycle over try/finally when the data setup/teardown is shared across tests
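
The warning about masking the original failure can be illustrated with a small stand-alone sketch. `run_with_cleanup` and the three demo functions are hypothetical names invented for this example; in a real test the body and cleanup steps would be the emit and delete calls shown above.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def run_with_cleanup(body: Callable[[], None], cleanups: List[Callable[[], None]]) -> None:
    """Run body, then every cleanup; cleanup errors are logged, not raised."""
    try:
        body()
    finally:
        for cleanup in cleanups:
            try:
                cleanup()
            except Exception:  # keep the original test failure visible
                logger.exception("cleanup step failed")


deleted: List[str] = []


def failing_body() -> None:
    assert 1 + 1 == 3, "original test failure"


def broken_cleanup() -> None:
    raise RuntimeError("delete endpoint unavailable")


def good_cleanup() -> None:
    deleted.append("urn:li:assertion:demo")
```

Calling `run_with_cleanup(failing_body, [broken_cleanup, good_cleanup])` still raises the original AssertionError, and the second cleanup step runs even though the first one failed.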

1.11 Test Isolation

Tests must be independently runnable without relying on side effects from other tests.

Source: Observed anti-pattern in smoke-test/tests/cli/datahub_cli.py:14-15

python
# ANTI-PATTERN: global mutable state
ingested_dataset_run_id = ""
ingested_editable_run_id = ""

Source of isolation mechanisms:

  1. Cache clearing -- smoke-test/conftest.py:70-79: The clear_graph_cache autouse function-scoped fixture clears the get_default_graph LRU cache before each test, preventing stale credentials from leaking between tests.

  2. Header dict cloning -- smoke-test/tests/utils.py:388: TestSessionWrapper clones header dicts before modification (kwargs["headers"] = dict(kwargs["headers"])) to prevent cross-test pollution.

  3. Unique entity names -- smoke-test/tests/cli/user_cmd/test_user_add.py:215: Uses f"testuser_{uuid.uuid4().hex[:8]}" to avoid collisions with other tests or parallel batches.

Rules:

  • BLOCKER: No global mutable state -- use fixture return values or request.config cache
  • BLOCKER: No cross-test dependencies via shared module-level variables
  • WARNING: Tests should be independently runnable (order-independent where possible)
  • WARNING: Use unique identifiers (UUID) when creating entities to avoid collisions with parallel test batches
  • SUGGESTION: If test ordering is truly required, use @pytest.mark.dependency() and keep chains short (<=3 levels)

1.12 Multi-Environment Configuration

Tests must be configurable to run against different environments (local quickstart, CI, remote DataHub instances) without code changes.

Source: smoke-test/tests/utilities/env_vars.py (full file, 250 lines)

The env_vars.py module provides environment-based configuration for all infrastructure endpoints:

| Getter | Env Var | Default | Purpose |
| --- | --- | --- | --- |
| get_gms_url() | DATAHUB_GMS_URL | None | GMS endpoint |
| get_frontend_url() | DATAHUB_FRONTEND_URL | None | Frontend endpoint |
| get_kafka_url() | DATAHUB_KAFKA_URL | None | Kafka broker |
| get_mysql_url() | DATAHUB_MYSQL_URL | localhost:3306 | MySQL database |
| get_postgres_url() | DATAHUB_POSTGRES_URL | localhost:5432 | PostgreSQL database |
| get_elasticsearch_url() | ELASTICSEARCH_URL | http://localhost:9200 | Elasticsearch |
| get_admin_username() | ADMIN_USERNAME | datahub | Auth credentials |
| get_admin_password() | ADMIN_PASSWORD | datahub | Auth credentials |

Consistency mode toggle:

Source: smoke-test/tests/consistency_utils.py:7, smoke-test/tests/utilities/env_vars.py:157-159

python
USE_STATIC_SLEEP: bool = env_vars.get_use_static_sleep()

When USE_STATIC_SLEEP=true, wait_for_writes_to_sync() falls back to a fixed sleep instead of polling Kafka consumer lag. This is required for environments where the test runner cannot access the Kafka broker container (e.g., remote DataHub instances, k8s clusters without docker exec).
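
The two modes can be pictured as one function with a switch. The sketch below is a simplified assumption of how such a fallback might be structured, not the code in consistency_utils.py: `wait_for_sync`, its parameters, and `fake_lag` are hypothetical, and the lag source is injected so the example runs without Kafka.

```python
import time
from typing import Callable, List


def wait_for_sync(
    get_lag: Callable[[], int],
    use_static_sleep: bool,
    static_sleep_sec: float = 0.01,
    timeout_sec: float = 1.0,
    poll_interval_sec: float = 0.01,
) -> str:
    """Hypothetical wait: poll lag when reachable, else fall back to a fixed sleep."""
    if use_static_sleep:
        time.sleep(static_sleep_sec)  # broker not reachable: best-effort pause
        return "static-sleep"
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if get_lag() == 0:
            return "lag-drained"
        time.sleep(poll_interval_sec)
    raise TimeoutError("consumer lag did not reach zero in time")


# Fake lag source for the illustration: two pending messages, then none.
remaining: List[int] = [2, 1, 0]


def fake_lag() -> int:
    return remaining.pop(0) if remaining else 0
```

The polling path returns as soon as lag reaches zero and fails loudly on timeout, whereas the static path can only wait and hope, which is why it is reserved for environments where lag cannot be measured.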

Environment-aware features:

Source: smoke-test/tests/utilities/env_vars.py:142-149

  • K8S_CLUSTER_ENABLED -- toggles Kubernetes-specific behavior
  • TEST_DATAHUB_VERSION -- allows version-specific test logic

Source: smoke-test/smoke.sh:15 -- RUN_QUICKSTART controls whether to launch DataHub locally or use an existing instance.

Source: metadata-ingestion/tests/test_helpers/docker_helpers.py:28-29 -- cleanup_image() skips image cleanup when not in CI (is_ci() check) to speed up local development.

Rules:

  • BLOCKER: Never hardcode localhost, port numbers, or URLs. Use env_vars.py getters.
  • WARNING: Tests that depend on Docker container access (e.g., docker exec for Kafka lag checks) must have a USE_STATIC_SLEEP fallback path.
  • WARNING: Credential defaults (datahub/datahub) are acceptable for local dev, but tests must support override via ADMIN_USERNAME/ADMIN_PASSWORD.
  • WARNING: When deploying a DataHub instance for testing, always use --local unless explicitly told otherwise. This ensures the instance runs locally and avoids remote deployment surprises.
  • SUGGESTION: Use env_vars.get_k8s_cluster_enabled() to skip Docker-dependent tests in Kubernetes environments.
  • SUGGESTION: Document which environment variables must be set for each test execution mode (local, CI, remote).

1.13 Concurrent Testing

Source: smoke-test/tests/utilities/concurrent_test_runner.py, concurrent_openapi.py

Worker pool pattern for parallel test execution within a single test function:

python
run_concurrent_tests(test_cases, test_fn, num_workers=5)

Rules:

  • SUGGESTION: Use run_concurrent_tests() for parametric API testing
  • WARNING: Concurrent tests must be thread-safe (no shared mutable state)
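
A worker pool of this kind can be sketched with the standard library. This is an illustration under assumptions: `run_cases_concurrently` is a hypothetical helper and does not reproduce the behaviour of run_concurrent_tests() beyond the general shape of its call, and the test cases are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple


def run_cases_concurrently(
    cases: List[Dict],
    check: Callable[[Dict], None],
    num_workers: int = 5,
) -> List[Tuple[Dict, str]]:
    """Hypothetical helper: run check(case) on a thread pool and collect failures."""

    def run_one(case: Dict) -> Optional[Tuple[Dict, str]]:
        try:
            check(case)  # each call works only on its own argument
            return None
        except AssertionError as exc:
            return (case, str(exc))

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(run_one, cases))  # map preserves input order
    return [r for r in results if r is not None]


def check_status(case: Dict) -> None:
    assert case["status"] == case["expected"], f"{case['name']}: got {case['status']}"


# Made-up cases: the second one is expected to fail.
cases = [
    {"name": "get-entity", "status": 200, "expected": 200},
    {"name": "missing-entity", "status": 500, "expected": 404},
]
```

Each worker returns its result instead of appending to a shared list, which keeps the helper free of shared mutable state; failures are gathered after the pool has finished.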

2. Integration Test Standards (Cypress)

Integration tests are Cypress UI tests located in smoke-test/tests/cypress/. They are launched by the Python wrapper integration_test.py and run actual browser-based tests against a running DataHub instance.

2.1 Cypress Test Launcher (Python)

Source: smoke-test/tests/cypress/integration_test.py

The Python pytest wrapper handles:

  • Data ingestion (lines 144-173): Ingests multiple JSON fixture files via REST
  • Fixture teardown (lines 176-202): Cleans up all ingested data
  • Batching (lines 214-242): Uses bin_pack_tasks with test_weights.json for CI parallelism
  • Filtered tests (lines 245-269): Supports FILTERED_TESTS env var for retry mode
  • Cypress execution (lines 272-343): Launches npx cypress run via subprocess

python
@pytest.fixture(scope="module", autouse=True)
def ingest_cleanup_data(auth_session, graph_client):
    ingest_data(auth_session, graph_client)  # Ingest multiple JSON fixtures
    yield
    # Cleanup: delete_urns_from_file for each fixture
    delete_urns_from_file(graph_client, f"{CYPRESS_TEST_DATA_DIR}/{TEST_DATA_FILENAME}")
    # ... more cleanup ...

Rules:

  • BLOCKER: Cypress launcher must clean up ALL ingested data in fixture teardown
  • WARNING: Use env_vars.py getters, not direct os.getenv() (violation at lines 279-280)
  • WARNING: Must support FILTERED_TESTS for CI retry workflows
  • WARNING: The CYPRESS_BASE_URL env var does NOT override Cypress e2e.baseUrl -- Cypress expects CYPRESS_baseUrl (case-sensitive) or --config baseUrl=<url>. On k3d where the frontend is not on localhost:9002, pass --config baseUrl=http://<k3d-host>:<port> to the npx cypress run command.
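
Passing the base URL through the --config form mentioned above might look like the following when the command is assembled in Python. This is only a sketch: `build_cypress_command` is a hypothetical helper, the host and port are placeholders, and the real launcher's argument list is not reproduced here.

```python
from typing import List, Optional


def build_cypress_command(base_url: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble an argv list for subprocess without running it."""
    command = ["npx", "cypress", "run"]
    if base_url:
        # An explicit --config override, instead of relying on an env var
        command += ["--config", f"baseUrl={base_url}"]
    return command


# Placeholder frontend address for a cluster that is not on localhost:9002.
k3d_command = build_cypress_command("http://k3d.example:30002")
```

Building the command as a list (rather than a shell string) keeps the URL a single argument when it is handed to subprocess.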

2.2 Cypress Spec Structure

Source: smoke-test/tests/cypress/cypress/e2e/mutations/domains.js:1-60

Cypress specs follow the describe/it pattern:

javascript
const test_domain_id = Math.floor(Math.random() * 100000);
const test_domain = `CypressDomainTest ${test_domain_id}`;

describe("add remove domain", () => {
  beforeEach(() => {
    cy.intercept("POST", "/api/v2/graphql", (req) => {
      aliasQuery(req, "appConfig");
    });
  });

  it("create domain", () => {
    cy.login();
    cy.goToDomainList();
    cy.clickOptionWithText("New Domain");
    cy.get('[data-testid="create-domain-name"]').click().type(test_domain);
    // ...
    cy.waitTextVisible(test_domain);
  });
});

Rules:

  • WARNING: Each describe block should use unique test data (randomized IDs) for isolation
  • WARNING: Use cy.login() per test or in beforeEach -- do not assume a logged-in state
  • SUGGESTION: Use data-testid selectors (not CSS classes or tag selectors) for stability

2.3 Cypress Test Data Management

Source: smoke-test/tests/cypress/integration_test.py:28-34, smoke-test/tests/cypress/data.json

Test data files:

  • data.json -- primary test entities (datasets, dashboards, users)
  • cypress_dbt_data.json -- dbt-specific test data
  • patch-data.json -- data for patch/update tests
  • incidents_test.json -- incident test data
  • onboarding.json -- generated dynamically for onboarding step states

Source: smoke-test/tests/cypress/integration_test.py:125-137 -- timestamp updater for fixture freshness:

python
def update_fixture_timestamps(cypress_test_data_dir):
    updater = TimestampUpdater(timestamp_config)
    updater.update_all_configured_files(cypress_test_data_dir)

Rules:

  • BLOCKER: Test data JSON files must be committed to the repository
  • WARNING: Use TimestampUpdater for files with time-sensitive data
  • WARNING: Dynamically generated data files (like onboarding.json) must be cleaned up in teardown

2.4 Cypress Batching and CI

Source: smoke-test/tests/cypress/integration_test.py:214-242

python
def _get_cypress_tests_batch():
    all_tests = _get_js_files("tests/cypress/cypress/e2e")
    # Load weights from test_weights.json
    test_batches = bin_pack_tasks(tests_with_weights, env_vars.get_batch_count())
    return test_batches[env_vars.get_batch_number()]

Uses the same bin_pack_tasks algorithm as smoke tests but with Cypress-specific test weights.

Rules:

  • WARNING: New Cypress specs should be added to test_weights.json after initial runs
  • SUGGESTION: Keep Cypress specs small (one feature per spec) for better batch distribution
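
Why small specs and up-to-date weights help can be seen with a simple greedy packer. The algorithm below (heaviest first, into the currently lightest batch) is a common heuristic chosen for this illustration; it is an assumption and may differ from what bin_pack_tasks actually does. `pack_by_weight` and the spec names and weights are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def pack_by_weight(weights: Dict[str, float], batch_count: int) -> List[List[str]]:
    """Greedy packing: heaviest spec first, always into the currently lightest batch."""
    batches: List[List[str]] = [[] for _ in range(batch_count)]
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(batch_count)]
    heapq.heapify(heap)  # entries are (current load, batch index)
    for spec, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        batches[index].append(spec)
        heapq.heappush(heap, (load + weight, index))
    return batches


# Made-up weights (e.g. seconds of runtime) for four hypothetical specs.
example_weights = {"a.js": 8, "b.js": 5, "c.js": 4, "d.js": 3}
example_batches = pack_by_weight(example_weights, batch_count=2)
```

With these weights the two batches end up with loads of 11 and 9. A single very heavy spec would dominate one batch no matter how the rest are placed, which is the motivation for keeping specs to one feature each.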

2.5 Cypress Assertions and Selectors

Source: smoke-test/tests/cypress/cypress/e2e/mutations/domains.js

Good patterns observed:

  • cy.waitTextVisible(text) -- waits for text to appear (handles async rendering)
  • cy.get('[data-testid="..."]') -- stable selectors via test IDs
  • cy.clickOptionWithText(text) -- custom command for text-based clicks
  • cy.intercept("POST", "/api/v2/graphql", ...) -- GraphQL request interception

Rules:

  • BLOCKER: Every Cypress it block must have at least one assertion (.should(), cy.waitTextVisible, or assert)
  • WARNING: Prefer data-testid selectors over CSS class selectors (classes change with styling)
  • WARNING: Use cy.intercept for GraphQL mocking/waiting, not arbitrary cy.wait(ms)
  • SUGGESTION: Use Cypress custom commands (cy.login(), cy.goToDomainList()) for common operations

3. Anti-Patterns (Automatic Blockers)

These patterns trigger automatic BLOCKER findings:

3.1 Empty or Trivial Tests

python
# ANTI-PATTERN
def test_basic():
    assert True

def test_defaults():
    config = MyConfig()
    assert config.platform == "myplatform"  # Testing defaults is trivial

3.2 Missing Cleanup

python
# ANTI-PATTERN: Creates data but never cleans up
def test_create_entity(auth_session):
    ingest_file_via_rest(auth_session, "data.json")
    # ... assertions ...
    # No cleanup! Data persists across tests.

Source of correct pattern: smoke-test/conftest.py:82-117

3.3 Hardcoded URLs and Ports

python
# ANTI-PATTERN
response = requests.get("http://localhost:8080/config")

# CORRECT
response = auth_session.get(f"{auth_session.gms_url()}/config")

Source: smoke-test/tests/utilities/env_vars.py

3.4 Inline Authentication

python
# ANTI-PATTERN
token = generate_token(username="datahub", password="datahub")
headers = {"Authorization": f"Bearer {token}"}

# CORRECT: Use auth_session fixture
def test_something(auth_session):
    response = auth_session.get(...)

Source: smoke-test/tests/utils.py:363-478

3.5 Bare Sleep for Consistency

python
# ANTI-PATTERN
time.sleep(5)  # Wait for indexing
assert search_results(query) == expected

# CORRECT: Use the most precise mechanism available

# Option 1 (best for single writes): Trace API
trace_resp = auth_session.post(
    f"{auth_session.gms_url()}/openapi/v1/trace/write/{trace_id}",
    params={"onlyIncludeErrors": "false", "detailed": "true"},
    json={urn: [aspect_name]},
)
assert trace_resp.json()["success"] is True

# Option 2: Targeted consumer wait with scoped timeout
wait_for_writes_to_sync(mae_only=True, max_timeout_in_sec=30)

# Option 3: Retry the assertion itself
@with_test_retry()
def _verify_search():
    assert search_results(query) == expected

Source of anti-pattern: smoke-test/tests/incidents/incidents_test.py:25-26 (60+ occurrences across suite)

Source of Trace API pattern: smoke-test/tests/trace/test_api_trace.py

Source of targeted wait: smoke-test/tests/consistency_utils.py:31-119

3.6 Cross-Test Dependencies via Global State

python
# ANTI-PATTERN
global shared_run_id
shared_run_id = pipeline.run_id()

Source of anti-pattern: smoke-test/tests/cli/datahub_cli.py:14-15

3.7 Overly Broad Assertions

python
# ANTI-PATTERN
assert response.status_code == 200  # Only checks HTTP status, not content

# CORRECT
response.raise_for_status()
data = response.json()
assert data["value"]["entityName"] == "expected_name"

3.8 Commented-Out Test Code

python
# ANTI-PATTERN
# breakpoint()
# TODO: Re-enable this test
# def test_important_feature():

Source of anti-pattern: smoke-test/tests/lineage/test_lineage.py:119,757,763,777,789,791


4. Quality Gates

These are the minimum requirements for test approval:

| Gate | Smoke Tests (Python) | Integration Tests (Cypress) |
| --- | --- | --- |
| Idempotent setup | REQUIRED (pre-delete or UUID names) | REQUIRED (random IDs per describe) |
| Guaranteed cleanup | REQUIRED (fixture yield or try/finally) | REQUIRED (launcher fixture teardown) |
| Data lifecycle pattern | REQUIRED (_ingest_cleanup_data_impl) | Launcher handles ingest/cleanup |
| Non-trivial assertions | >= 1 per test function | >= 1 per it block (.should(), etc.) |
| Test isolation | No global state, UUID entity names | Random IDs, cy.login() per test |
| Descriptive test names | REQUIRED (not test_1) | REQUIRED (descriptive it("...")) |
| Environment config | From env_vars.py (no hardcoded URLs) | Launcher uses env_vars.py |
| Retry/consistency | Trace API, wait_for_writes_to_sync (targeted), @with_test_retry, assertion-scoped waits, service polling | cy.waitTextVisible, cy.intercept |
| Markers | read_only, no_cypress_suite1, etc. | N/A (spec files, not pytest markers) |
| Multi-env support | URLs via env vars, USE_STATIC_SLEEP fallback | Launcher handles env config |
| Stable selectors | N/A | REQUIRED (data-testid, not CSS classes) |