Debugging and Managing Flaky Tests

Understanding Flakiness Types
Detection and Reproduction
Root Cause Analysis
Fixing Strategies by Type
CI-Specific Flakiness
Quarantine and Management
Prevention Strategies

Understanding Flakiness Types

Categories of Flakiness

Most flaky tests fall into distinct categories requiring different remediation:

Category	Symptoms	Common Causes
UI-driven	Element not found, click missed	Missing waits, animations, dynamic rendering
Environment-driven	CI-only failures	Slower CPU, memory limits, cold browser starts
Data/parallelism-driven	Fails with multiple workers	Shared backend data, reused accounts, state collisions
Test-suite-driven	Fails when run with other tests	Leaked state, shared fixtures, order dependencies

Flakiness Decision Tree

Test fails intermittently
├─ Fails locally too?
│  ├─ YES → Timing/async issue → Check waits and assertions
│  └─ NO → CI-specific → Check environment differences
│
├─ Fails only with multiple workers?
│  └─ YES → Parallelism issue → Check data isolation
│
├─ Fails only when run after specific tests?
│  └─ YES → State leak → Check fixtures and cleanup
│
└─ Fails randomly regardless of conditions?
   └─ External dependency → Check network/API stability

Detection and Reproduction

Confirming Flakiness

bash

# Run test multiple times to confirm instability
npx playwright test tests/checkout.spec.ts --repeat-each=20

# Run with single worker to isolate parallelism issues
npx playwright test --workers=1

# Run in CI-like conditions locally
CI=true npx playwright test --repeat-each=10

Reproduction Strategies

typescript

// playwright.config.ts - Enable artifacts for flaky test investigation
export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry', // Capture trace on retry
    video: 'retain-on-failure',
    screenshot: 'only-on-failure',
  },
})

Identify Flaky Tests Programmatically

typescript

// Track test results across runs
test.afterEach(async ({}, testInfo) => {
  if (testInfo.retry > 0 && testInfo.status === 'passed') {
    console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`)
    // Log to your tracking system
  }
})

Root Cause Analysis

Event Logging for Race Conditions

Add comprehensive event logging to expose timing issues:

typescript

test.beforeEach(async ({page}) => {
  page.on('console', (msg) => console.log(`CONSOLE [${msg.type()}]:`, msg.text()))
  page.on('pageerror', (err) => console.error('PAGE ERROR:', err.message))
  page.on('requestfailed', (req) => console.error(`REQUEST FAILED: ${req.url()}`))
})

For comprehensive console error handling (fail on errors, allowed patterns, fixtures), see console-errors.md.

Network Timing Analysis

typescript

// Capture slow or failed requests
test.beforeEach(async ({page}) => {
  const slowRequests: string[] = []

  page.on('requestfinished', (request) => {
    const timing = request.timing()
    const duration = timing.responseEnd - timing.requestStart
    if (duration > 2000) {
      slowRequests.push(`${request.url()} took ${duration}ms`)
    }
  })

  page.on('requestfailed', (request) => {
    console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`)
  })
})

Trace Analysis

bash

# View trace from failed CI run
npx playwright show-trace path/to/trace.zip

# Generate trace for specific test
npx playwright test tests/flaky.spec.ts --trace on

Fixing Strategies by Type

UI-Driven Flakiness

Problem: Element not ready when action executes

typescript

// ❌ BAD: No wait for element state
await page.click('#submit')
await page.fill('#username', 'test') // Element may not be ready

// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole('button', {name: 'Submit'}).click()
await expect(page.getByRole('heading', {name: 'Dashboard'})).toBeVisible()

Problem: Animations or transitions interfere

typescript

// ❌ BAD: Click during animation
await page.click('.menu-item')

// ✅ GOOD: Wait for animation to complete
await page.getByRole('menuitem', {name: 'Settings'}).click()
await expect(page.getByRole('dialog')).toBeVisible()
// Or disable animations entirely
await page.emulateMedia({reducedMotion: 'reduce'})

Problem: Brittle selectors

typescript

// ❌ BAD: Fragile CSS chain
await page.click('div.container > div:nth-child(2) > button.btn-primary')

// ✅ GOOD: Semantic selectors
await page.getByRole('button', {name: 'Continue'}).click()
await page.getByTestId('checkout-button').click()
await page.getByLabel('Email address').fill('[email protected]')

Async/Timing Flakiness

Problem: Race between test and application

typescript

// ❌ BAD: Arbitrary sleep
await page.click('#load-data')
await page.waitForTimeout(3000) // Hope data loads in 3s

// ✅ GOOD: Wait for specific condition
await page.click('#load-data')
await expect(page.locator('.data-row')).toHaveCount(10, {timeout: 10000})

// ✅ BETTER: Wait for network response, then assert
const responsePromise = page.waitForResponse(
  (r) => r.url().includes('/api/data') && r.request().method() === 'GET' && r.ok(),
)
await page.click('#load-data')
await responsePromise
await expect(page.locator('.data-row')).toHaveCount(10)

For comprehensive waiting strategies (navigation, element state, network, polling with toPass()), see assertions-waiting.md.

Problem: Complex async state

typescript

// Custom wait for application-specific conditions
await page.waitForFunction(() => {
  const app = (window as any).__APP_STATE__
  return app?.isReady && !app?.isLoading
})

// Wait for multiple conditions
await Promise.all([
  page.waitForResponse('**/api/user'),
  page.waitForResponse('**/api/settings'),
  page.getByRole('button', {name: 'Load'}).click(),
])

Data/Parallelism-Driven Flakiness

Problem: Tests share backend data

typescript

// ❌ BAD: All workers use same user
const testUser = {email: '[email protected]', password: 'pass123'}

// ✅ GOOD: Unique data per worker
import {test as base} from '@playwright/test'

export const test = base.extend<{}, {testUser: {email: string; id: string}}>({
  testUser: [
    async ({}, use, workerInfo) => {
      const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`
      const user = await createTestUser(email)
      await use(user)
      await deleteTestUser(user.id)
    },
    {scope: 'worker'},
  ],
})

Problem: Shared storageState across workers

typescript

// ❌ BAD: All workers share same auth state
use: {
  storageState: '.auth/user.json',
}

// ✅ GOOD: Per-worker auth state
export const test = base.extend<{}, { workerStorageState: string }>({
  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      const id = workerInfo.workerIndex;
      const fileName = `.auth/user-${id}.json`;

      if (!fs.existsSync(fileName)) {
        const page = await browser.newPage({ storageState: undefined });
        await authenticateUser(page, `worker${id}@test.com`);
        await page.context().storageState({ path: fileName });
        await page.close();
      }

      await use(fileName);
    },
    { scope: "worker" },
  ],
});

Test-Suite-Driven Flakiness (State Leaks)

Problem: Tests affect each other

typescript

// ❌ BAD: Module-level state persists across tests
let sharedPage: Page

test.beforeAll(async ({browser}) => {
  sharedPage = await browser.newPage() // Shared across tests!
})

// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test('first test', async ({page}) => {
  // Fresh page for this test
})

test('second test', async ({page}) => {
  // Fresh page for this test
})

Problem: Fixture cleanup not happening

typescript

// ✅ GOOD: Proper fixture with cleanup
export const test = base.extend<{tempFile: string}>({
  tempFile: async ({}, use) => {
    const file = `/tmp/test-${Date.now()}.json`
    fs.writeFileSync(file, '{}')

    await use(file)

    // Cleanup always runs, even on failure
    if (fs.existsSync(file)) {
      fs.unlinkSync(file)
    }
  },
})

CI-Specific Flakiness

Why Tests Fail Only in CI

CI Condition	Impact	Solution
Slower CPU	Actions complete later than expected	Use auto-waiting, not timeouts
Cold browser start	No cached assets, slower initial load	Add explicit waits for first navigation
Headless mode	Different rendering behavior	Test locally in headless mode
Shared runners	Resource contention	Reduce parallelism or use dedicated runners
Network latency	API calls slower	Mock external APIs, increase timeouts for real calls

Simulating CI Locally

bash

# Run headless with CI environment variable
CI=true npx playwright test

# Limit CPU (Linux/Mac)
cpulimit -l 50 -- npx playwright test

# Run in Docker matching CI environment
docker run -it --rm \
  -v $(pwd):/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test

Consistent Viewport and Scale

typescript

// playwright.config.ts - Match CI rendering exactly
export default defineConfig({
  use: {
    viewport: {width: 1280, height: 720},
    deviceScaleFactor: 1,
  },
})

Network Stubbing for External APIs

typescript

// Eliminate external API flakiness
test.beforeEach(async ({page}) => {
  // Stub unstable third-party APIs
  await page.route('**/api.analytics.com/**', (route) => route.fulfill({body: ''}))
  await page.route('**/api.payment-provider.com/**', (route) =>
    route.fulfill({json: {status: 'ok'}}),
  )
})

// Test-specific stub
test('checkout with payment', async ({page}) => {
  await page.route('**/api/payment', (route) =>
    route.fulfill({json: {success: true, transactionId: 'test-123'}}),
  )
  // Test proceeds with deterministic response
})

Quarantine and Management

Quarantine Pattern

typescript

// playwright.config.ts - Separate flaky tests
export default defineConfig({
  projects: [
    {
      name: 'stable',
      testIgnore: ['**/*.flaky.spec.ts'],
    },
    {
      name: 'quarantine',
      testMatch: ['**/*.flaky.spec.ts'],
      retries: 3,
    },
  ],
})

Annotation-Based Quarantine

typescript

// Mark flaky tests with annotations
test('intermittent checkout issue', async ({page}, testInfo) => {
  testInfo.annotations.push({
    type: 'flaky',
    description: 'Investigating payment API timing - JIRA-1234',
  })

  // Test implementation
})

// Skip flaky test conditionally
test('known CI flaky', async ({page}) => {
  test.skip(!!process.env.CI, 'Flaky in CI - investigating JIRA-5678')
  // Test implementation
})

Prevention Strategies

Test Burn-In

bash

# Run new tests many times before merging
npx playwright test tests/new-feature.spec.ts --repeat-each=50

# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4

Isolation Checklist

typescript

// ✅ Each test should be self-contained
test.describe('User profile', () => {
  test('can update name', async ({page, testUser}) => {
    // Uses unique testUser fixture
    // No dependency on other tests
    // Cleanup handled by fixture
  })

  test('can update email', async ({page, testUser}) => {
    // Independent of "can update name"
    // Own testUser, own state
  })
})

Defensive Assertions

typescript

// ❌ BAD: Single point of failure
await expect(page.locator('.items')).toHaveCount(5)

// ✅ GOOD: Progressive assertions that help diagnose
await expect(page.locator('.items-container')).toBeVisible()
await expect(page.locator('.loading')).not.toBeVisible()
await expect(page.locator('.items')).toHaveCount(5)

Retry Budget

typescript

// playwright.config.ts - Limit retries to avoid masking issues
export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Only retry in CI
  expect: {
    timeout: 10000, // Reasonable assertion timeout
  },
  timeout: 60000, // Test timeout
})

Anti-Patterns to Avoid

Anti-Pattern	Problem	Solution
`waitForTimeout()` as primary wait	Arbitrary, hides real timing issues	Use auto-waiting assertions
Increasing global timeout to "fix" flakes	Masks root cause, slows all tests	Find and fix actual timing issue
Retrying until pass	Hides systemic problems	Fix root cause, use retries for diagnosis only
Shared test data across workers	Race conditions, collisions	Isolate data per worker
Testing real external APIs	Network variability	Mock external dependencies
Module-level mutable state	Leaks between tests	Use fixtures with proper cleanup
Ignoring flaky tests	Problem compounds over time	Quarantine and track for fixing

Debugging: See debugging.md for trace viewer and inspector
Fixtures: See fixtures-hooks.md for worker-scoped isolation
Performance: See performance.md for parallel execution patterns
Assertions: See assertions-waiting.md for auto-waiting patterns
Global Setup: See global-setup.md for setup vs fixtures decision

Debugging and Managing Flaky Tests

Debugging and Managing Flaky Tests

Table of Contents

Understanding Flakiness Types

Categories of Flakiness

Flakiness Decision Tree

Detection and Reproduction

Confirming Flakiness

Reproduction Strategies

Identify Flaky Tests Programmatically

Root Cause Analysis

Event Logging for Race Conditions

Network Timing Analysis

Trace Analysis

Fixing Strategies by Type

UI-Driven Flakiness

Async/Timing Flakiness

Data/Parallelism-Driven Flakiness

Test-Suite-Driven Flakiness (State Leaks)

CI-Specific Flakiness

Why Tests Fail Only in CI

Simulating CI Locally

Consistent Viewport and Scale

Network Stubbing for External APIs

Quarantine and Management

Quarantine Pattern

Annotation-Based Quarantine

Prevention Strategies

Test Burn-In

Isolation Checklist

Defensive Assertions

Retry Budget

Anti-Patterns to Avoid

Related References