# Playwright E2E Test Conventions

This document codifies the test-writing standards for the Opik E2E test suite. AI agents generating or fixing tests MUST follow these conventions.
## Imports

Always import `test` and `expect` from the appropriate fixture file, never from `@playwright/test` directly.

```typescript
// CORRECT - import from fixture
import { test, expect } from '../../fixtures/projects.fixture';

// WRONG - never import directly from playwright
import { test, expect } from '@playwright/test';
```

Choose the fixture file based on the feature being tested:
- `fixtures/projects.fixture`
- `fixtures/datasets.fixture`
- `fixtures/tracing.fixture`
- `fixtures/feedback-experiments-prompts.fixture`
- `fixtures/base.fixture`

## Test File Structure

```typescript
import { test, expect } from '../../fixtures/{feature}.fixture';
import { SomePage } from '../../page-objects/some.page';

test.describe('Feature Area Tests', () => {
  test.describe('with SDK-created resources', () => {
    test('Description of test @sanity @happypaths @fullregression @featuretag', async ({ page, helperClient, fixtureArg }) => {
      // test body
    });
  });

  test.describe('with UI-created resources', () => {
    test('Description of test @happypaths @fullregression @featuretag', async ({ page, helperClient, fixtureArg }) => {
      // test body
    });
  });
});
```
## Test Naming and Tags

Test names MUST be descriptive AND include tags at the end. Tags control which test suites include the test.

```typescript
// Format: 'Human-readable description @tag1 @tag2 @tag3'
test('Projects created via SDK are visible in both UI and SDK @sanity @happypaths @fullregression @projects', ...)
```

Available tags:

- `@sanity` - Critical smoke tests (run first, should be fast)
- `@happypaths` - Happy path scenarios
- `@fullregression` - Full regression suite (every test should have this)
- Feature tags: `@projects`, `@datasets`, `@experiments`, `@prompts`, `@playground`, `@tracing`, `@threads`, `@attachments`, `@feedbackscores`, `@onlinescores`

At minimum, every test carries `@fullregression` and its feature tag. Critical smoke tests additionally carry `@sanity` and `@happypaths` on top of `@fullregression` and the feature tag.

## Description Annotation

Every test MUST have a description annotation as its first line:
```typescript
test('Test name @tags', async ({ page }) => {
  test.info().annotations.push({
    type: 'description',
    description: `Tests that [what is being tested].
Steps:
1. [First step]
2. [Second step]
3. [Third step]
This test ensures [why this matters].`,
  });
  // actual test code follows
});
```
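Suite selection works by matching the `@tags` embedded in the test title string. As a minimal sketch of how tags can be pulled out of a title (`extractTags` is a hypothetical helper shown only for illustration, not part of the suite's tooling):

```typescript
// Hypothetical helper (illustration only): pull @tags out of a test title.
function extractTags(title: string): string[] {
  return title.match(/@[a-z0-9]+/gi) ?? [];
}

const title =
  'Projects created via SDK are visible in both UI and SDK @sanity @happypaths @fullregression @projects';
console.log(extractTags(title));
// ['@sanity', '@happypaths', '@fullregression', '@projects']
```

At run time, Playwright's own `--grep` flag does the equivalent matching, e.g. `npx playwright test --grep @sanity`.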
## Test Steps

Use `test.step()` to group related actions into logical steps:

```typescript
await test.step('Verify project is retrievable via SDK', async () => {
  await helperClient.waitForProjectVisible(projectName, 10);
  const projects = await helperClient.findProject(projectName);
  expect(projects.length).toBeGreaterThan(0);
});

await test.step('Verify project is visible in UI', async () => {
  const projectsPage = new ProjectsPage(page);
  await projectsPage.goto();
  await projectsPage.checkProjectExistsWithRetry(projectName, 5000);
});
```
## Page Objects

ALWAYS use page objects for UI interactions. NEVER use raw `page.click()` or `page.fill()` directly in test files.

```typescript
// CORRECT - use page object
const projectsPage = new ProjectsPage(page);
await projectsPage.goto();
await projectsPage.createNewProject(projectName);

// WRONG - raw playwright calls in test
await page.goto('/default/projects');
await page.getByRole('button', { name: 'Create new project' }).click();
```

If a page object method doesn't exist for what you need, consider adding it to the page object file rather than putting raw locators in the test.
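Adding such a method is usually small: page objects are plain classes wrapping Playwright's `Page`. The sketch below is hypothetical; it uses a stripped-down `MinimalPage` interface in place of the real `Page` type so the example stays self-contained, and its method bodies illustrate the convention rather than the actual `ProjectsPage` implementation.

```typescript
// Stand-in for the slice of Playwright's Page API used below, so the
// sketch is self-contained. A real page object would use
// `import type { Page } from '@playwright/test'` instead.
interface MinimalPage {
  goto(url: string): Promise<void>;
  getByRole(role: string, options: { name: string }): { click(): Promise<void> };
  getByPlaceholder(text: string): { fill(value: string): Promise<void> };
}

// Hypothetical page object following the suite's conventions:
// navigation and UI actions live here, never raw locators in tests.
class ProjectsPageSketch {
  constructor(private readonly page: MinimalPage) {}

  async goto(): Promise<void> {
    await this.page.goto('/default/projects');
  }

  async createNewProject(name: string): Promise<void> {
    await this.page.getByRole('button', { name: 'Create new project' }).click();
    await this.page.getByPlaceholder('Project name').fill(name);
    await this.page.getByRole('button', { name: 'Create project' }).click();
  }
}
```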
## Fixtures

Destructure fixtures in the test function arguments:

```typescript
test('Test name', async ({ page, helperClient, createProjectApi }) => {
  // createProjectApi is the project name (string)
  // The fixture created the project before the test runs
  // The fixture will clean it up after the test finishes
});
```
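Conceptually, a resource fixture wraps the test body in setup and teardown. The sketch below mimics that lifecycle in plain TypeScript; it is not the Playwright `test.extend()` API (which is how the real fixtures are declared), just an illustration of what the fixture does for you:

```typescript
// Conceptual lifecycle of a resource fixture - NOT the Playwright API.
// Real fixtures are declared with test.extend() in fixtures/*.fixture.
async function withProjectFixture(
  createProject: (name: string) => Promise<void>,
  deleteProject: (name: string) => Promise<void>,
  testBody: (projectName: string) => Promise<void>,
): Promise<void> {
  // Unique name, in the spirit of generateProjectName()
  const projectName = `e2e-project-${Date.now()}`;
  await createProject(projectName); // setup: before the test runs
  try {
    await testBody(projectName); // the test receives the name, like createProjectApi
  } finally {
    await deleteProject(projectName); // teardown: always runs afterwards
  }
}
```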
## Cleanup

Key rules: fixtures tear down the resources they create. When a test itself mutates or creates a resource (for example, renaming a project), track which operations succeeded and clean up in a `try/finally` block:

```typescript
let nameUpdated = false;
try {
  await helperClient.updateProject(originalName, newName);
  nameUpdated = true;
  // ... verification ...
} finally {
  if (nameUpdated) {
    await helperClient.deleteProject(newName);
  }
}
```
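When a test performs several mutations, the same bookkeeping can be factored into a small registry that records an undo action after each successful step and replays them in reverse. This `Cleanup` class is a hypothetical generalization, not an existing suite helper:

```typescript
// Hypothetical cleanup registry: register an undo action after each
// successful mutation; run() replays them in reverse order.
class Cleanup {
  private readonly undoActions: Array<() => Promise<void>> = [];

  register(undo: () => Promise<void>): void {
    this.undoActions.push(undo);
  }

  async run(): Promise<void> {
    // Reverse order: undo the most recent mutation first.
    for (const undo of this.undoActions.reverse()) {
      await undo();
    }
  }
}
```

A test would `register()` an undo after each mutation succeeds and call `await cleanup.run()` in its `finally` block.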
## Locator Strategy

Use locators in this priority order:

1. `getByRole()` - semantic, resilient to implementation changes
2. `getByTestId()` - explicit test identifiers
3. `getByText()` - visible text content
4. `getByPlaceholder()` - form inputs
5. `getByLabel()` - form labels

```typescript
// BEST - role-based
page.getByRole('button', { name: 'Create project' })
page.getByRole('cell', { name: projectName, exact: true })

// GOOD - test IDs
page.getByTestId('search-input')

// ACCEPTABLE - text/placeholder
page.getByText(projectName).first()
page.getByPlaceholder('Project name')

// AVOID - CSS selectors (fragile)
page.locator('.project-row .delete-btn')
```
## Adding `data-testid` to Frontend Code

When no reliable locator exists (no semantic role, no unique text, no existing test ID), you are allowed and encouraged to add `data-testid` attributes directly to the React frontend source code in `apps/opik-frontend/src/`.

When to add a data-testid:

- There is no semantic role to target with `getByRole()`
- There is no unique visible text
- There is no existing test ID

How to add it:

1. Locate the component in `apps/opik-frontend/src/`
2. Add `data-testid="descriptive-name"` to the element (for example, `data-testid="dataset-items-table"` or `data-testid="trace-detail-sidebar"`)
3. Reference it in the test with `page.getByTestId('dataset-items-table')`

Naming convention for test IDs:

- Use the `{feature}-{element}` pattern: `project-name-input`, `dataset-delete-button`
- Avoid generic names such as `button` or `container`

## Waiting and Assertions

Use Playwright's auto-waiting assertions:
```typescript
// CORRECT - auto-waits with timeout
await expect(page.getByText(name)).toBeVisible({ timeout: 5000 });
await expect(page.getByText(name)).not.toBeVisible();

// WRONG - manual timeout
await page.waitForTimeout(3000);
expect(await page.getByText(name).isVisible()).toBe(true);
```

Use the `helperClient` wait methods:

```typescript
await helperClient.waitForProjectVisible(projectName, 10); // retries
await helperClient.waitForDatasetVisible(datasetName, 10);
await helperClient.waitForTracesVisible(projectName, count, 30);
```
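Those helpers share a retry-until-true shape. A generic sketch of that polling loop (hypothetical; the real implementations live on `helperClient`, and the delay value is an assumption):

```typescript
// Hypothetical polling loop in the style of the helperClient waits:
// retry a check up to `retries` times, pausing between attempts.
async function waitForCondition(
  check: () => Promise<boolean>,
  retries: number,
  delayMs = 500,
): Promise<void> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    if (await check()) {
      return; // condition met
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Condition not met after ${retries} attempts`);
}
```

Something like `waitForCondition(async () => (await helperClient.findProject(name)).length > 0, 10)` would mirror `waitForProjectVisible(name, 10)`.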
## Anti-Patterns to Avoid

- `page.waitForLoadState('networkidle')` - unreliable, deprecated pattern
- `page.waitForTimeout(n)` in assertions - use `expect().toBeVisible({ timeout: n })` instead
- `page.$()` or `page.$$()` - legacy selectors, use locators instead

## Test Data Naming

Generate unique resource names with `generateProjectName()` and `generateDatasetName()` from `helpers/random.ts`, or rely on the `projectName` and `datasetName` fixtures.

## Directory Structure

```
tests/
├── projects/                 # Project CRUD tests
├── datasets/                 # Dataset CRUD and item tests
├── experiments/              # Experiment CRUD and item tests
├── prompts/                  # Prompt CRUD and versioning tests
├── tracing/                  # Trace, span, thread, attachment tests
├── feedback-scores/          # Feedback definition CRUD tests
├── playground/               # Playground interaction tests
├── online-scoring/           # Online scoring rule tests
└── seed-for-planner.spec.ts  # AI agent seed test
```
Place new tests in the appropriate feature directory. Create a new directory if testing a new feature area.