Most flaky tests fall into distinct categories requiring different remediation:
| Category | Symptoms | Common Causes |
|---|---|---|
| UI-driven | Element not found, click missed | Missing waits, animations, dynamic rendering |
| Environment-driven | CI-only failures | Slower CPU, memory limits, cold browser starts |
| Data/parallelism-driven | Fails with multiple workers | Shared backend data, reused accounts, state collisions |
| Test-suite-driven | Fails when run with other tests | Leaked state, shared fixtures, order dependencies |
Test fails intermittently
├─ Fails locally too?
│ ├─ YES → Timing/async issue → Check waits and assertions
│ └─ NO → CI-specific → Check environment differences
│
├─ Fails only with multiple workers?
│ └─ YES → Parallelism issue → Check data isolation
│
├─ Fails only when run after specific tests?
│ └─ YES → State leak → Check fixtures and cleanup
│
└─ Fails randomly regardless of conditions?
└─ External dependency → Check network/API stability
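The decision tree can also be written as a small function, which makes the order of the checks explicit. This is a sketch that assumes you record the observations as booleans. The `FlakeObservations` shape and the choice to test the more specific signals first are illustrative, not part of Playwright:

```typescript
// Observations collected while reproducing the flake (hypothetical shape)
interface FlakeObservations {
  failsLocally: boolean
  failsOnlyWithMultipleWorkers: boolean
  failsOnlyAfterSpecificTests: boolean
  failsRegardlessOfConditions: boolean
}

type FlakeDiagnosis =
  | 'parallelism: check data isolation'
  | 'state leak: check fixtures and cleanup'
  | 'external dependency: check network/API stability'
  | 'timing/async: check waits and assertions'
  | 'CI-specific: check environment differences'

// Walks the decision tree, checking the most specific signals first
function diagnoseFlake(o: FlakeObservations): FlakeDiagnosis {
  if (o.failsOnlyWithMultipleWorkers) return 'parallelism: check data isolation'
  if (o.failsOnlyAfterSpecificTests) return 'state leak: check fixtures and cleanup'
  if (o.failsRegardlessOfConditions) return 'external dependency: check network/API stability'
  // Otherwise, split on where the failure reproduces
  return o.failsLocally
    ? 'timing/async: check waits and assertions'
    : 'CI-specific: check environment differences'
}
```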
# Run test multiple times to confirm instability
npx playwright test tests/checkout.spec.ts --repeat-each=20
# Run with single worker to isolate parallelism issues
npx playwright test --workers=1
# Run in CI-like conditions locally
CI=true npx playwright test --repeat-each=10
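How many repetitions are enough? Suppose a test fails independently with probability p on each run. Then n repetitions surface at least one failure with probability 1 - (1 - p)^n. Solving that for n gives the repeat count needed for a target confidence. The helpers below are a back-of-the-envelope sketch. Repeated runs on one machine are not truly independent, so treat the numbers as a rough guide:

```typescript
// Probability that `runs` repetitions hit at least one failure, assuming each
// run fails independently with probability `failureRate`
function detectionProbability(failureRate: number, runs: number): number {
  return 1 - Math.pow(1 - failureRate, runs)
}

// Smallest repeat count that observes at least one failure with the given confidence
function repeatsNeeded(failureRate: number, confidence: number): number {
  if (failureRate <= 0 || failureRate >= 1) throw new RangeError('failureRate must be in (0, 1)')
  if (confidence <= 0 || confidence >= 1) throw new RangeError('confidence must be in (0, 1)')
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate))
}

console.log(detectionProbability(0.05, 20).toFixed(2)) // 0.64
console.log(repeatsNeeded(0.05, 0.95)) // 59
```

So `--repeat-each=20` catches a test that fails 5% of the time only about 64% of the time. Fifty repetitions raise that to roughly 92%.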
// playwright.config.ts - Enable artifacts for flaky test investigation
import {defineConfig} from '@playwright/test'
export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry', // Record a trace when a test is retried for the first time
    video: 'retain-on-failure',
    screenshot: 'only-on-failure',
  },
})
// Track test results across runs
test.afterEach(async ({}, testInfo) => {
if (testInfo.retry > 0 && testInfo.status === 'passed') {
console.warn(`FLAKY: ${testInfo.title} passed on retry ${testInfo.retry}`)
// Log to your tracking system
}
})
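The `afterEach` hook sees one attempt at a time. A custom reporter sees every attempt and can list the tests whose final outcome is `'flaky'`, meaning they passed only after a retry. The following is a minimal sketch. The two small interfaces stand in for Playwright's `TestCase` and `TestResult` so the snippet compiles on its own. In a real reporter file you would import those types from `@playwright/test/reporter` instead:

```typescript
// Structural stand-ins for the parts of TestCase/TestResult used here
interface TestCaseLike {
  titlePath(): string[]
  outcome(): 'skipped' | 'expected' | 'unexpected' | 'flaky'
}
interface TestResultLike {
  retry: number
  status: string
}

export default class FlakyReporter {
  private seen = new Set<TestCaseLike>()
  flaky: string[] = []

  // Called once per attempt, so a retried test arrives here more than once
  onTestEnd(test: TestCaseLike, _result: TestResultLike): void {
    this.seen.add(test)
  }

  // outcome() is only final after all retries, so classify at the end of the run
  onEnd(): void {
    this.flaky = Array.from(this.seen)
      .filter((t) => t.outcome() === 'flaky')
      .map((t) => t.titlePath().filter(Boolean).join(' > ')) // drop the empty root title
    for (const title of this.flaky) console.warn(`FLAKY: ${title}`)
  }
}
```

To use it, register it next to your normal reporter, for example `reporter: [['list'], ['./flaky-reporter.ts']]` in `playwright.config.ts`. The file name is hypothetical. Forwarding `this.flaky` to your tracking system would replace the `console.warn`.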
Add comprehensive event logging to expose timing issues:
test.beforeEach(async ({page}) => {
page.on('console', (msg) => console.log(`CONSOLE [${msg.type()}]:`, msg.text()))
page.on('pageerror', (err) => console.error('PAGE ERROR:', err.message))
page.on('requestfailed', (req) => console.error(`REQUEST FAILED: ${req.url()}`))
})
For comprehensive console error handling (fail on errors, allowed patterns, fixtures), see console-errors.md.
// Log slow or failed requests as they happen
test.beforeEach(async ({page}) => {
  page.on('requestfinished', (request) => {
    const timing = request.timing()
    // Timing fields are -1 when the browser could not measure them
    if (timing.requestStart < 0 || timing.responseEnd < 0) return
    const duration = timing.responseEnd - timing.requestStart
    if (duration > 2000) {
      console.warn(`SLOW: ${request.url()} took ${Math.round(duration)}ms`)
    }
  })
  page.on('requestfailed', (request) => {
    console.error(`Failed: ${request.url()} - ${request.failure()?.errorText}`)
  })
})
# View trace from failed CI run
npx playwright show-trace path/to/trace.zip
# Generate trace for specific test
npx playwright test tests/flaky.spec.ts --trace on
Problem: Element not ready when action executes
// ❌ BAD: Nothing confirms the click took effect before the next step
await page.click('#submit')
await page.fill('#username', 'test') // Auto-waits for #username, but the page may still be mid-transition
// ✅ GOOD: Actions + assertions pattern (auto-waiting built-in)
await page.getByRole('button', {name: 'Submit'}).click()
await expect(page.getByRole('heading', {name: 'Dashboard'})).toBeVisible()
Problem: Animations or transitions interfere
// ❌ BAD: Brittle class selector, and no check of what the animation revealed
await page.click('.menu-item')
// ✅ GOOD: Locator clicks wait for the element to be stable (not moving), then assert the result
await page.getByRole('menuitem', {name: 'Settings'}).click()
await expect(page.getByRole('dialog')).toBeVisible()
// Or request reduced motion (only helps if the app honors prefers-reduced-motion)
await page.emulateMedia({reducedMotion: 'reduce'})
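`reducedMotion` only helps when the app's CSS respects `prefers-reduced-motion`. A common alternative, offered here as a substitute technique rather than taken from the text above, is to inject a stylesheet with `page.addStyleTag()` that zeroes out CSS animation and transition timings. This is a sketch. The `StyleTagTarget` interface stands in for Playwright's `Page` so the helper can be exercised without a browser:

```typescript
// Zero out CSS animation/transition timings for every element.
// Assumption: the app animates with CSS; JavaScript-driven animation is unaffected.
const DISABLE_ANIMATIONS_CSS = `
*, *::before, *::after {
  animation-duration: 0s !important;
  animation-delay: 0s !important;
  transition-duration: 0s !important;
  transition-delay: 0s !important;
}`

// The one Page method this helper needs (page.addStyleTag)
interface StyleTagTarget {
  addStyleTag(options: {content: string}): Promise<unknown>
}

async function disableAnimations(page: StyleTagTarget): Promise<void> {
  await page.addStyleTag({content: DISABLE_ANIMATIONS_CSS})
}
```

Call `await disableAnimations(page)` after each `page.goto()`. The injected style tag belongs to the current document and is lost on navigation.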
Problem: Brittle selectors
// ❌ BAD: Fragile CSS chain
await page.click('div.container > div:nth-child(2) > button.btn-primary')
// ✅ GOOD: Semantic selectors
await page.getByRole('button', {name: 'Continue'}).click()
await page.getByTestId('checkout-button').click()
await page.getByLabel('Email address').fill('user@example.com')
Problem: Race between test and application
// ❌ BAD: Arbitrary sleep
await page.click('#load-data')
await page.waitForTimeout(3000) // Hope data loads in 3s
// ✅ GOOD: Wait for specific condition
await page.click('#load-data')
await expect(page.locator('.data-row')).toHaveCount(10, {timeout: 10000})
// ✅ BETTER: Wait for network response, then assert
const responsePromise = page.waitForResponse(
(r) => r.url().includes('/api/data') && r.request().method() === 'GET' && r.ok(),
)
await page.click('#load-data')
await responsePromise
await expect(page.locator('.data-row')).toHaveCount(10)
For comprehensive waiting strategies (navigation, element state, network, polling with
toPass()), see assertions-waiting.md.
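The BETTER example creates `responsePromise` before clicking because a waiter only sees events that happen after it is registered. The sketch below models that ordering with Node's `EventEmitter` instead of a browser, so the problem can be observed deterministically. `createFakeApp` and the `'response'` event are hypothetical stand-ins:

```typescript
import {EventEmitter, once} from 'node:events'

// Fake app: a "click" makes the "server" respond on the next microtask
function createFakeApp() {
  const events = new EventEmitter()
  const click = () => queueMicrotask(() => events.emit('response', {url: '/api/data', ok: true}))
  return {events, click}
}

// ✅ Register the waiter first, then trigger the action
async function registerThenClick(): Promise<boolean> {
  const app = createFakeApp()
  const responsePromise = once(app.events, 'response')
  app.click()
  const [response] = await responsePromise
  return response.ok
}

// ❌ Trigger first, register later: the response fires before anyone listens
async function clickThenRegister(timeoutMs: number): Promise<'caught' | 'missed'> {
  const app = createFakeApp()
  app.click()
  await new Promise((resolve) => setTimeout(resolve, 0)) // the queued response fires here, unheard
  const caught = once(app.events, 'response').then(() => 'caught' as const)
  const missed = new Promise<'missed'>((resolve) => setTimeout(() => resolve('missed'), timeoutMs))
  return Promise.race([caught, missed])
}
```

With a real page, a late waiter often still works because the network is slower than the test. That is exactly what makes this bug intermittent rather than consistent.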
Problem: Complex async state
// Custom wait for application-specific conditions
await page.waitForFunction(() => {
const app = (window as any).__APP_STATE__
return app?.isReady && !app?.isLoading
})
// Wait for multiple conditions
await Promise.all([
page.waitForResponse('**/api/user'),
page.waitForResponse('**/api/settings'),
page.getByRole('button', {name: 'Load'}).click(),
])
Problem: Tests share backend data
// ❌ BAD: All workers use same user
const testUser = {email: 'test@example.com', password: 'pass123'}
// ✅ GOOD: Unique data per worker
import {test as base} from '@playwright/test'
export const test = base.extend<{}, {testUser: {email: string; id: string}}>({
testUser: [
async ({}, use, workerInfo) => {
const email = `test-${workerInfo.workerIndex}-${Date.now()}@example.com`
const user = await createTestUser(email) // your own backend helper
await use(user)
await deleteTestUser(user.id) // teardown: runs when the worker shuts down
},
{scope: 'worker'},
],
})
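`workerIndex` starts from 0 in every `npx playwright test` process. When CI runs several shards (`--shard=1/4`, `--shard=2/4`, ...) against the same backend, each shard has its own worker 0, and `Date.now()` alone does not fully rule out a collision. This sketch adds a random suffix. The `uniqueTestEmail` name and the `example.com` domain are placeholders:

```typescript
import {randomUUID} from 'node:crypto'

// Email that stays unique across workers, worker restarts, and CI shards
function uniqueTestEmail(workerIndex: number, prefix = 'test'): string {
  // Random part: safe even if two shards start in the same millisecond
  const nonce = randomUUID().slice(0, 8)
  return `${prefix}-${workerIndex}-${Date.now()}-${nonce}@example.com`
}
```

Inside the fixture above, `const email = uniqueTestEmail(workerInfo.workerIndex)` would replace the template string.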
Problem: Shared storageState across workers
// ❌ BAD (when tests change the account's server-side data): all workers log in as one user
use: {
storageState: '.auth/user.json',
}
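// ✅ GOOD: Per-worker account and auth state
import * as fs from 'fs'
import {test as base} from '@playwright/test'
export const test = base.extend<{}, {workerStorageState: string}>({
  // Make every test in this worker start from the worker's auth state
  storageState: ({workerStorageState}, use) => use(workerStorageState),
  workerStorageState: [
    async ({browser}, use, workerInfo) => {
      // parallelIndex stays in 0..workers-1, even when a worker restarts after a failure
      const id = workerInfo.parallelIndex
      const fileName = `.auth/user-${id}.json`
      if (!fs.existsSync(fileName)) {
        // Authenticate in a clean context, not one preloaded with other state
        const page = await browser.newPage({storageState: undefined})
        await authenticateUser(page, `worker${id}@test.com`) // your own login helper
        await page.context().storageState({path: fileName})
        await page.close()
      }
      await use(fileName)
    },
    {scope: 'worker'},
  ],
})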
Problem: Tests affect each other
// ❌ BAD: Module-level state persists across tests
let sharedPage: Page
test.beforeAll(async ({browser}) => {
sharedPage = await browser.newPage() // Shared across tests!
})
// ✅ GOOD: Use Playwright's default isolation (fresh context per test)
test('first test', async ({page}) => {
// Fresh page for this test
})
test('second test', async ({page}) => {
// Fresh page for this test
})
Problem: Fixture cleanup not happening
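// ✅ GOOD: Proper fixture with cleanup
import * as fs from 'fs'
import {test as base} from '@playwright/test'
export const test = base.extend<{tempFile: string}>({
  tempFile: async ({}, use, testInfo) => {
    // outputPath() is unique per test, so parallel tests never share the file
    const file = testInfo.outputPath('temp.json')
    fs.writeFileSync(file, '{}')
    await use(file)
    // Teardown runs after the test finishes, whether it passed or failed
    if (fs.existsSync(file)) {
      fs.unlinkSync(file)
    }
  },
})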
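The guarantee that cleanup runs after a failure comes from how fixtures are driven. The runner awaits the test body inside `use`, records a failure instead of throwing it out of `use`, and only then lets the code after `await use(...)` continue. The sketch below is a simplified model of that contract for illustration, not Playwright's actual implementation:

```typescript
// A fixture: set up, hand a value to the test via `use`, then tear down
type Fixture<T> = (use: (value: T) => Promise<void>) => Promise<void>

async function runTestWithFixture<T>(
  fixture: Fixture<T>,
  body: (value: T) => Promise<void>,
): Promise<{passed: boolean}> {
  let passed = true
  await fixture(async (value) => {
    try {
      await body(value)
    } catch {
      // Record the failure but let `use` resolve, so the fixture's teardown still runs
      passed = false
    }
  })
  return {passed}
}
```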
| CI Condition | Impact | Solution |
|---|---|---|
| Slower CPU | Actions complete later than expected | Use auto-waiting, not timeouts |
| Cold browser start | No cached assets, slower initial load | Add explicit waits for first navigation |
| Headless mode | Different rendering behavior | Test locally in headless mode |
| Shared runners | Resource contention | Reduce parallelism or use dedicated runners |
| Network latency | API calls slower | Mock external APIs, increase timeouts for real calls |
# Run headless with CI environment variable
CI=true npx playwright test
# Limit CPU (Linux/macOS; -i also throttles child processes such as the browsers)
cpulimit -l 50 -i -- npx playwright test
# Run in Docker matching CI (the image tag must match your installed @playwright/test version)
docker run -it --rm --ipc=host \
  -v "$(pwd)":/work \
  -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test
// playwright.config.ts - Match CI rendering exactly
export default defineConfig({
use: {
viewport: {width: 1280, height: 720},
deviceScaleFactor: 1,
},
})
// Eliminate external API flakiness
test.beforeEach(async ({page}) => {
// Stub unstable third-party APIs
await page.route('**/api.analytics.com/**', (route) => route.fulfill({body: ''}))
await page.route('**/api.payment-provider.com/**', (route) =>
route.fulfill({json: {status: 'ok'}}),
)
})
// Test-specific stub
test('checkout with payment', async ({page}) => {
await page.route('**/api/payment', (route) =>
route.fulfill({json: {success: true, transactionId: 'test-123'}}),
)
// Test proceeds with deterministic response
})
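Stubbing vendors one by one, as in the `beforeEach` above, can miss newly added ones. An alternative some suites use is to treat every host the app does not own as third-party and abort those requests. This is a sketch of the predicate. `FIRST_PARTY_HOSTS` holds placeholder host names, and the commented wiring uses Playwright's `page.route()`, `route.abort()` and `route.continue()`:

```typescript
// Hosts the app under test owns (placeholders); anything else counts as third-party
const FIRST_PARTY_HOSTS = ['localhost', 'app.example.com']

function isThirdParty(url: string, firstPartyHosts: string[] = FIRST_PARTY_HOSTS): boolean {
  const {protocol, hostname} = new URL(url)
  // Defensive: inline resources have no host and never hit the network
  if (protocol === 'data:' || protocol === 'blob:') return false
  // Exact host or a subdomain of it ("cdn.app.example.com"), but not "notapp.example.com"
  return !firstPartyHosts.some((host) => hostname === host || hostname.endsWith(`.${host}`))
}

// Wiring inside a beforeEach, with `page` from Playwright:
// await page.route('**/*', (route) =>
//   isThirdParty(route.request().url()) ? route.abort() : route.continue(),
// )
```

Aborting can trigger the app's own error handling, for example a failed script load. For the services the page actually depends on, keep `route.fulfill()` stubs.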
// playwright.config.ts - Separate flaky tests
export default defineConfig({
projects: [
{
name: 'stable',
testIgnore: ['**/*.flaky.spec.ts'],
},
{
name: 'quarantine',
testMatch: ['**/*.flaky.spec.ts'],
retries: 3,
},
],
})
// Mark flaky tests with annotations
test('intermittent checkout issue', async ({page}, testInfo) => {
testInfo.annotations.push({
type: 'flaky',
description: 'Investigating payment API timing - JIRA-1234',
})
// Test implementation
})
// Skip flaky test conditionally
test('known CI flaky', async ({page}) => {
test.skip(!!process.env.CI, 'Flaky in CI - investigating JIRA-5678')
// Test implementation
})
# Run new tests many times before merging
npx playwright test tests/new-feature.spec.ts --repeat-each=50
# Run in parallel to expose race conditions
npx playwright test tests/new-feature.spec.ts --repeat-each=20 --workers=4
// ✅ Each test should be self-contained
test.describe('User profile', () => {
test('can update name', async ({page, testUser}) => {
// Uses unique testUser fixture
// No dependency on other tests
// Cleanup handled by fixture
})
test('can update email', async ({page, testUser}) => {
// Independent of "can update name"
// Own testUser, own state
})
})
// ❌ BAD: When this fails, the error doesn't show which stage went wrong
await expect(page.locator('.items')).toHaveCount(5)
// ✅ GOOD: Progressive assertions that help diagnose
await expect(page.locator('.items-container')).toBeVisible()
await expect(page.locator('.loading')).not.toBeVisible()
await expect(page.locator('.items')).toHaveCount(5)
// playwright.config.ts - Limit retries to avoid masking issues
export default defineConfig({
retries: process.env.CI ? 2 : 0, // Only retry in CI
expect: {
timeout: 10000, // Reasonable assertion timeout
},
timeout: 60000, // Test timeout
})
| Anti-Pattern | Problem | Solution |
|---|---|---|
| `waitForTimeout()` as primary wait | Arbitrary, hides real timing issues | Use auto-waiting assertions |
| Increasing global timeout to "fix" flakes | Masks root cause, slows all tests | Find and fix actual timing issue |
| Retrying until pass | Hides systemic problems | Fix root cause, use retries for diagnosis only |
| Shared test data across workers | Race conditions, collisions | Isolate data per worker |
| Testing real external APIs | Network variability | Mock external dependencies |
| Module-level mutable state | Leaks between tests | Use fixtures with proper cleanup |
| Ignoring flaky tests | Problem compounds over time | Quarantine and track for fixing |