skills/skills/compare-test-runs/SKILL.md
You'll typically receive two test run identifiers. Follow these steps:
1. Run `tuist test show <id> --json` for both base and head test runs.
2. Run `tuist test module list <test-run-id> --json` and `tuist test suite list <test-run-id> --json` to get module and suite breakdowns.
3. Run `tuist test case run list <identifier> --json` to get individual test case results.
4. Run `tuist test case run show <id> --json` to get detailed information for each new failure.

Fetch each directly:
tuist test show <base-id> --json
tuist test show <head-id> --json
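When these calls are scripted rather than typed by hand, a small wrapper can run the CLI and parse its JSON output. The following is a minimal sketch: the helper names `tuist_json` and `fetch_runs` and the injectable `runner` parameter are illustrative assumptions, not part of Tuist. Only the `test show` subcommand and the `--json` flag come from the steps above.

```python
import json
import subprocess


def tuist_json(args, runner=subprocess.run):
    """Run `tuist <args> --json` and return the parsed JSON output.

    `runner` is injectable so the helper can be exercised without the CLI.
    """
    completed = runner(
        ["tuist", *args, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


def fetch_runs(base_id, head_id, runner=subprocess.run):
    """Fetch the base and head test runs (hypothetical convenience wrapper)."""
    base = tuist_json(["test", "show", base_id], runner=runner)
    head = tuist_json(["test", "show", head_id], runner=runner)
    return base, head
```

Passing `check=True` means a failed CLI call surfaces as an exception instead of an empty or partial comparison.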
List recent test runs on each branch to identify test run IDs:
tuist test list --git-branch <base-branch> --json --page-size 5
tuist test list --git-branch <head-branch> --json --page-size 5
Pick the latest test run ID from each branch's results.
After fetching both test runs, compare:
| Metric | What to check |
|---|---|
| `status` | Flag if base passed but head failed |
| `duration` | Flag if head is >10% slower |
| `total_test_count` | Note if test count changed (new or removed tests) |
| `failed_test_count` | Compare failure counts |
| `flaky_test_count` | Compare flaky counts |
| `avg_test_duration` | Flag significant changes |
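The checks in the table can be sketched as a small function over the two parsed test runs. This is an illustration under assumptions: it treats each run as a flat dict carrying the field names from the table, and it assumes `status` takes the values `"success"` and `"failure"`. The function name `compare_run_metrics` is hypothetical. `avg_test_duration` is left out because the table gives no threshold for a "significant" change.

```python
def compare_run_metrics(base, head, slow_threshold=0.10):
    """Return a list of human-readable flags for two parsed test runs.

    Assumes each run is a dict with the fields from the table and that
    `status` uses the values "success" and "failure".
    """
    flags = []

    if base["status"] == "success" and head["status"] == "failure":
        flags.append("status regressed: success -> failure")

    # Flag the head run when it is more than `slow_threshold` slower.
    if base["duration"] > 0:
        change = (head["duration"] - base["duration"]) / base["duration"]
        if change > slow_threshold:
            flags.append(f"duration up {change:+.0%}")

    # Report raw deltas for the count fields; zero deltas are skipped.
    for field in ("total_test_count", "failed_test_count", "flaky_test_count"):
        delta = head[field] - base[field]
        if delta != 0:
            flags.append(f"{field} changed by {delta:+d}")

    return flags
```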
Fetch module and suite-level results for both test runs to understand which areas regressed:
tuist test module list <base-test-run-id> --json
tuist test module list <head-test-run-id> --json
tuist test suite list <base-test-run-id> --json
tuist test suite list <head-test-run-id> --json
Match modules and suites by name across both runs to identify areas with new failures or duration regressions.
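Matching by name can be sketched as a dictionary lookup. The entry shape used here (`name`, `status`, `duration` keys) is an assumption about the JSON output rather than something the steps above specify, and `diff_by_name` is a hypothetical helper. The 10% threshold mirrors the run-level duration check.

```python
def diff_by_name(base_items, head_items, slow_threshold=0.10):
    """Pair module or suite entries by name and report regressions.

    Assumes each entry is a dict with `name`, `status`, and `duration` keys.
    """
    base_by_name = {item["name"]: item for item in base_items}
    regressions = []
    for item in head_items:
        before = base_by_name.get(item["name"])
        if before is None:
            continue  # present only in head; nothing to compare against
        newly_failing = before["status"] != "failure" and item["status"] == "failure"
        slower = (
            before["duration"] > 0
            and item["duration"] > before["duration"] * (1 + slow_threshold)
        )
        if newly_failing or slower:
            regressions.append(
                {"name": item["name"], "newly_failing": newly_failing, "slower": slower}
            )
    return regressions
```

The same function works for both the module list and the suite list, as long as both expose the assumed keys.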
Fetch test case runs for both test runs:
tuist test case run list <identifier> --json --page-size 100
Match test cases by their name + module_name + suite_name across both runs.
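A composite key avoids false matches when the same test name appears in several modules or suites. The sketch below assumes each test case run has been parsed into a dict containing `name`, `module_name`, and `suite_name`. The helper names `case_key` and `pair_cases` are illustrative.

```python
def case_key(case):
    """Composite identity for a test case run: name + module + suite."""
    return (case["name"], case["module_name"], case["suite_name"])


def pair_cases(base_cases, head_cases):
    """Map each composite key to a (base, head) pair; a missing side is None."""
    base_index = {case_key(c): c for c in base_cases}
    head_index = {case_key(c): c for c in head_cases}
    # Keep first-seen order: base keys first, then keys that exist only in head.
    keys = list(dict.fromkeys([*base_index, *head_index]))
    return {key: (base_index.get(key), head_index.get(key)) for key in keys}
```

A pair with `None` on the base side is a test that exists only in head, and vice versa.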
Group test cases into categories:
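One possible grouping is sketched below. The category names are assumptions chosen for illustration (`new_failure` and `newly_flaky` echo the example summary later in this document; the rest are hypothetical), as are the `status` values and the `is_flaky` field.

```python
def categorize(base_case, head_case):
    """Classify one matched pair of test case runs.

    Either argument may be None when the test exists on only one side.
    The `status` values and the `is_flaky` flag are assumed field names.
    """
    if base_case is None:
        return "added"  # simplification: a failing new test is still "added"
    if head_case is None:
        return "removed"
    base_failed = base_case["status"] == "failure"
    head_failed = head_case["status"] == "failure"
    if head_failed and not base_failed:
        return "new_failure"
    if base_failed and not head_failed:
        return "fixed"
    if head_case.get("is_flaky") and not base_case.get("is_flaky"):
        return "newly_flaky"
    return "unchanged"
```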
For each new failure, get detailed information:
tuist test case run show <test-case-run-id> --json
Key fields to examine:
- `failures[].message` -- the assertion or error message
- `failures[].path` -- source file path
- `failures[].line_number` -- exact line of failure
- `failures[].issue_type` -- type of issue
- `repetitions` -- if present, shows retry behavior (flaky detection)
- `crash_report` -- crash data if test runner crashed

The `tuist test case run show` output includes attachment and crash report information. Review:
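Pulling the key failure fields out of a parsed `tuist test case run show` result could look like the sketch below. It assumes `failures` is a list of dicts carrying `message`, `path`, `line_number`, and `issue_type`, and that `crash_report` is empty or absent when no crash occurred. The function name `summarize_failures` is hypothetical.

```python
def summarize_failures(case_run):
    """Format each failure of a parsed test case run as a short text block.

    Assumes `failures` is a list of dicts carrying the key failure fields.
    """
    blocks = []
    for failure in case_run.get("failures", []):
        location = f'{failure.get("path", "?")}:{failure.get("line_number", "?")}'
        blocks.append(
            f'Message: "{failure.get("message", "")}"\n'
            f"File: {location}\n"
            f'Type: {failure.get("issue_type", "unknown")}'
        )
    if case_run.get("crash_report"):
        blocks.append("Crash report present: the test runner may have crashed")
    return blocks
```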
Produce a summary with:
Example:
Test Run Comparison: base (run-123 on main) vs head (run-456 on feature-x)
Status: success -> failure -- REGRESSION
Duration: 120.5s -> 145.2s (+20%)
Tests: 342 -> 345 (3 new tests)
Failures: 0 -> 2 (2 new failures)
Flaky: 1 -> 3 (2 newly flaky)
New Failures:
1. AuthModuleTests/LoginTests/test_login_with_expired_token
Message: "Expected status 401, got 500"
File: Tests/AuthModuleTests/LoginTests.swift:42
Likely cause: Server error handling changed for expired tokens
2. NetworkTests/RetryTests/test_retry_on_timeout
Message: "Timed out waiting for retry"
File: Tests/NetworkTests/RetryTests.swift:87
Likely cause: Timeout threshold too low after network layer refactor
Newly Flaky:
1. CacheTests/WriteCacheTests/test_concurrent_writes (flaky in 3/5 runs)
Recommendations:
- Fix expired token handling in AuthModule
- Increase timeout in RetryTests or mock the network layer
- Investigate concurrent write synchronization in CacheTests
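The `before -> after` lines in a summary like this can be generated with a small formatter. This is a sketch only; `format_change` is a hypothetical helper, and it omits the percentage when the base value is zero because the relative change is undefined there.

```python
def format_change(label, before, after, unit=""):
    """Render `label: before -> after`, with a percentage when it is defined."""
    line = f"{label}: {before}{unit} -> {after}{unit}"
    if before:  # skip the percentage when the base value is zero
        line += f" ({(after - before) / before:+.0%})"
    return line
```

Computing the percentage in code rather than by hand keeps the reported change consistent with the underlying values.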