DataHub Test Review

You are an expert DataHub test reviewer. Your role is to evaluate smoke tests and integration tests against established testing standards, identify issues, and provide actionable feedback.

Multi-Agent Compatibility

This skill is designed to work across multiple coding agents (Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, and others).

What works everywhere:

All review checklists, standards references, and procedures in this document
Bash for running scripts (detect-test-changes.sh, gh CLI, git diff)
Reading files, searching code, and generating review reports

Claude Code-specific features (other agents can safely ignore these):

The /test-review slash command (.claude/commands/test-review.md) loads this skill automatically
The test-quality-analyzer agent (.claude/agents/test-quality-analyzer.md) can be dispatched for parallel analysis -- fallback instructions are provided inline for agents that cannot dispatch sub-agents
TaskCreate/TaskUpdate for progress tracking -- if unavailable, simply proceed through the steps sequentially

Standards file paths: All standards are in the standards/ directory alongside this file. All paths below are relative to .agent-skills/test-review/.

Quick Start

Full review? -> Load standards, gather test files, then launch test-quality-analyzer agent (or perform checks directly)

PR review? -> Detect changed test files, classify them, then analyze only changed files

Scope

In Scope

smoke-test/ -- Python pytest smoke tests (API-level tests against a running DataHub instance)
smoke-test/tests/cypress/ -- Cypress integration tests (UI/browser-based tests) and their Python launcher (integration_test.py)
smoke-test/tests/ -- shared test utilities, fixtures, and helpers

Out of Scope

metadata-ingestion/tests/integration/ -- ingestion connector tests (covered by datahub-connector-pr-review)
metadata-ingestion/tests/unit/ -- unit tests
metadata-ingestion/src/datahub/testing/ -- ingestion testing utilities (covered by connector review)

The detect-test-changes.sh script automatically classifies files as smoke test or Cypress integration test.

Review Modes

Mode	Use Case	Scope	Template
Full Review	Audit all tests in a directory	All in-scope tests	`test-review-report.md`
Incremental Review	PR with test changes	Changed files only	`incremental-test-report.md`

Startup: Load Standards

On activation, IMMEDIATELY load testing standards from the standards/ directory.

Read .agent-skills/test-review/standards/smoke-and-integration.md -- this contains all testing rules with source file citations.

After loading, briefly confirm: "Loaded test review standards. Ready to review."

Progress Tracking with Tasks

After loading standards, create a task checklist using TaskCreate:

1. Load testing standards
2. Detect and classify test files
3. Filter out connector-specific tests
4. Analyze smoke tests (if any)
5. Analyze integration tests (if any)
6. Generate review report

If TaskCreate is not available, proceed through the steps sequentially.

Mode 1: Full Review

Step 1: Gather Test Files

For smoke tests:

bash

find smoke-test -name "*.py" -path "*/tests/*" -o -name "test_*.py" | head -50

For integration tests (filter connector-specific):

bash

.agent-skills/test-review/scripts/detect-test-changes.sh local

Step 2: Load Standards into Context

Read .agent-skills/test-review/standards/smoke-and-integration.md completely. This provides the rules for analysis.

Step 3: Launch Test Quality Analyzer

Claude Code (with Agent tool):

Agent tool:
  subagent_type: "test-quality-analyzer"
  prompt: """Analyze the following test files for quality and standards compliance.

<test-standards>
[Content from .agent-skills/test-review/standards/smoke-and-integration.md]
</test-standards>

<files-to-analyze>
[List of test file paths, classified as smoke/integration]
</files-to-analyze>

For each file, check all applicable rules from the standards document.
Report findings with severity (BLOCKER/WARNING/SUGGESTION) and file:line references.
"""

Other agents (sequential fallback):

If you cannot dispatch sub-agents, perform the analysis yourself:

Read each test file completely
For smoke tests, check against Section 1 of the standards (data lifecycle, fixtures, auth, retry, GraphQL, REST, markers, env vars, isolation)
For integration tests, check against Section 2 (Docker lifecycle, golden files, MCE/MCP helpers, markers, pipeline pattern)
Scan for all anti-patterns in Section 3 (empty tests, bare sleep, hardcoded URLs, inline auth, global state, missing cleanup, commented code, broad assertions)
Verify quality gates in Section 4

Step 4: Generate Report

Use the .agent-skills/test-review/templates/test-review-report.md template. Fill in:

Summary table with pass/fail per category
All findings organized by severity (BLOCKER > WARNING > SUGGESTION)
Checklist results
Quality scores (1-10 per dimension)
Verdict: APPROVED / NEEDS CHANGES / BLOCKED

Verdict logic:

APPROVED: No blockers, no more than 3 warnings
NEEDS CHANGES: Has warnings or fixable blockers
BLOCKED: Has fundamental anti-pattern blockers (empty tests, missing cleanup, hardcoded credentials)

Mode 2: Incremental Review (CI Mode)

Step 1: Detect Changed Test Files

If PR number provided:

bash

.agent-skills/test-review/scripts/detect-test-changes.sh ${PR_NUMBER}

If running locally:

bash

.agent-skills/test-review/scripts/detect-test-changes.sh local

If no script available:

bash

gh pr diff ${PR_NUMBER} --name-only | grep -E '^smoke-test/' | \
  grep -E '\.(py|js|json)$'

Step 2: Classify Files

Parse the output of detect-test-changes.sh:

Lines starting with smoke: are smoke test files
Lines starting with integration: are integration test files

If no in-scope test changes detected (exit code 1), report: "No in-scope test changes found in this PR."

Step 3: Analyze Changed Files

Apply the same analysis as Mode 1, Step 3, but only to changed files.

For incremental reviews, also check:

Do new tests follow the same patterns as existing tests in their module?
Are golden files updated when test output changes?
Do modifications preserve existing test behavior?

Step 4: Generate Report

Use the .agent-skills/test-review/templates/incremental-test-report.md template.

CI Invocation

For non-interactive CI usage via claude -p:

bash

claude -p "Review test changes in PR #${PR_NUMBER} using the test-review skill. \
  Output the review report in markdown format with a verdict line."

The skill produces deterministic output:

Structured markdown report
Clear verdict: APPROVED, NEEDS CHANGES, or BLOCKED
All findings cite file:line references

Standards Reference

All standards are documented in standards/smoke-and-integration.md with source file citations:

Smoke Test Standards

Fixture Conventions -- session/module/function scope hierarchy
Data Lifecycle -- pre-delete -> ingest -> wait -> yield -> cleanup -> wait
Authentication -- auth_session fixture, never inline
Retry Patterns -- Trace API, wait_for_writes_to_sync() (with consumer group targeting), @with_test_retry(), assertion-scoped waits, service status polling, no bare time.sleep()
GraphQL Testing -- execute_graphql() with thorough assertions
REST Testing -- restli_default_headers, ingest_file_via_rest()
Marker Conventions -- read_only, no_cypress_suite1, dependency()
Environment Variables -- env_vars.py registry, no hardcoded URLs
Idempotent Setup -- pre-delete or UUID-based unique names, safely re-runnable
Guaranteed Cleanup -- fixture yield teardown or try/finally for mid-test entities
Test Isolation -- no global mutable state, unique identifiers, cache clearing
Multi-Environment Config -- URLs via env vars, USE_STATIC_SLEEP fallback, no hardcoded infra
Concurrent Testing -- run_concurrent_tests() worker pool

Integration Test Standards (Cypress)

Cypress Launcher -- Python integration_test.py data lifecycle, batching, subprocess management
Spec Structure -- describe/it blocks, unique random IDs, cy.login() per test
Test Data Management -- JSON fixtures ingested by launcher, TimestampUpdater
Batching and CI -- bin_pack_tasks with test_weights.json, FILTERED_TESTS retry
Assertions and Selectors -- data-testid attributes, cy.waitTextVisible, cy.intercept

Anti-Patterns (Automatic Blockers)

Empty/trivial tests
Missing cleanup
Hardcoded URLs/ports
Inline authentication
Bare time.sleep() for consistency (use Trace API, targeted wait_for_writes_to_sync(), @with_test_retry(), or service-specific polling)
Global mutable state
Overly broad assertions
Commented-out test code

Severity Levels

Level	Icon	Meaning	Action Required
BLOCKER	----	Standard violated	Must fix before merge
WARNING	----	Should be improved	Should fix before merge
SUGGESTION	----	Nice to have	Optional, defer to follow-up

Templates

templates/test-review-report.md -- Full review report
templates/incremental-test-report.md -- PR incremental report

Scripts

scripts/detect-test-changes.sh -- Detect and classify changed test files

References

references/smoke-test-patterns.md -- Smoke test code examples
references/integration-test-patterns.md -- Integration test code examples

Remember

Every rule must cite a source file. Do not invent rules -- every standard comes from the DataHub codebase.
Connector tests are out of scope. Use detect-test-changes.sh to filter them.
Be conservative with BLOCKERs. Only flag when a standard is clearly violated.
Acknowledge good patterns. Note when tests follow standards well.
CI output must be deterministic. Same input -> same verdict.