.agents/skills/writing-e2e-tests/SKILL.md
This skill is how we add an end-to-end test to the Opik E2E suite. You give it a feature, page, or branch. It runs a proven five-step loop and leaves you with a working, locally verified Playwright test.
Announce at start: "I'm using the writing-e2e-tests skill to add an E2E test for X."
The suite is at `tests_end_to_end/e2e/`. Inside it:

- `tests/<feature>/<name>.spec.ts`: one feature directory per page family (datasets, trace-explore, experiments, test-suites, online-evaluation, …).
- `pom/<name>.page.ts`: one class per page, with methods for the interactions a test needs.
- `fixtures/<name>.fixture.ts`: seed entities (project, dataset, trace, experiment, testSuite) and tear them down. Fixtures compose in a chain and are re-exported from `fixtures/index.ts`.
- `core/sdk/`: `sdkClient.python` (an HTTP wrapper over the bridge) and `sdkClient.typescript` (a direct `new Opik({...})`) for seeding. `core/backend/` holds the typed REST client for inspection and teardown.
- `services/opik-sdk-driver/`: a FastAPI app (run with uv) that wraps the Python SDK and exposes the routes the TS clients call. Playwright's `webServer` directive auto-spawns it during a test run, so you don't start it by hand.

Specs and POMs import through path aliases: `import { test, expect } from '@e2e/fixtures'` and `import { LogsPage } from '@e2e/pom/logs.page'`.
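How a fixture chain seeds and tears down can be modeled without Playwright. The block below is a dependency-free sketch, not the suite's code. Each fixture follows the `async (deps, use) => { seed; await use(value); teardown }` shape that Playwright's `test.extend` fixtures use. The names `projectFixture`, `datasetFixture`, and `runWithFixtures` are hypothetical, and the object literals stand in for real SDK seed calls. The sketch shows the two properties the chain relies on. First, a dependent fixture is seeded after the fixture it depends on and torn down before it. Second, teardown still runs when the test body fails.

```typescript
// A fixture in the shape Playwright uses: seed, hand the value to `use`, tear down.
type Use<T> = (value: T) => Promise<void>;
type Fixture<Deps, T> = (deps: Deps, use: Use<T>) => Promise<void>;

interface Project { name: string }
interface Dataset { name: string; project: string }

// Records seeds and teardowns so the ordering is visible.
const log: string[] = [];

// Hypothetical project fixture; the object literal stands in for an SDK seed call.
const projectFixture: Fixture<Record<string, never>, Project> = async (_deps, use) => {
  const project: Project = { name: "e2e-project" };
  log.push(`seed project ${project.name}`);
  await use(project);
  log.push(`teardown project ${project.name}`);
};

// Hypothetical dataset fixture that depends on the project fixture.
const datasetFixture: Fixture<{ project: Project }, Dataset> = async ({ project }, use) => {
  const dataset: Dataset = { name: "e2e-dataset", project: project.name };
  log.push(`seed dataset ${dataset.name}`);
  await use(dataset);
  log.push(`teardown dataset ${dataset.name}`);
};

type TestBody = (fixtures: { project: Project; dataset: Dataset }) => Promise<void>;

// Nests the chain by hand (Playwright's runner does this for real specs).
// A failing body is caught so every teardown still runs, then the failure is rethrown.
async function runWithFixtures(body: TestBody): Promise<void> {
  const outcome: { failure?: unknown } = {};
  await projectFixture({}, async (project) => {
    await datasetFixture({ project }, async (dataset) => {
      try {
        await body({ project, dataset });
      } catch (err) {
        outcome.failure = err;
      }
    });
  });
  if ("failure" in outcome) throw outcome.failure;
}
```

With a passing body, `log` ends up as: seed project, seed dataset, the test's own entry, teardown dataset, teardown project. In the real suite, the runner does the nesting for you. A spec presumably just names the fixtures it needs in its test signature.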
- Playwright MCP (live-UI exploration) and the playwright-test MCP (`browser_generate_locator`) are already configured in the repo's `.mcp.json`. No setup step.
- `tests_end_to_end/e2e/`. The `webServer` directive spawns the bridge automatically.

Read `conventions.md` before writing any POM or spec. It carries the rules that keep tests legible and stable: mandatory `test.step()` wrapping, UI-first assertions, selector preference, public-SDK-only seeding, fixture seed shapes, and the tag taxonomy. They aren't optional polish. Each one prevents a class of failure.
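The `test.step()` convention for POM methods can be sketched as follows. Under this convention, a method wraps its whole body in a step and returns its result through the step's callback. `StepFn` has the same shape as Playwright's `test.step(title, body)`, which resolves to whatever `body` returns. It is injected here only so the block runs without Playwright. `DatasetsPage`, `RowSource`, and the `dataset-row-<name>` test id are hypothetical. `conventions.md` remains the authority on the exact rules.

```typescript
// Same shape as Playwright's test.step(title, body): run body, resolve to its return value.
type StepFn = <T>(title: string, body: () => Promise<T>) => Promise<T>;

// Hypothetical slice of the page a POM drives; a real POM would hold a Playwright Page.
interface RowSource {
  rowNames(): Promise<string[]>;
  click(testId: string): Promise<void>;
}

class DatasetsPage {
  private readonly ui: RowSource;
  private readonly step: StepFn;

  constructor(ui: RowSource, step: StepFn) {
    this.ui = ui;
    this.step = step;
  }

  // The whole body sits inside the step, and the value is returned through the callback.
  async rowNames(): Promise<string[]> {
    return this.step("Read dataset row names", async () => {
      return this.ui.rowNames();
    });
  }

  async openDataset(name: string): Promise<void> {
    return this.step(`Open dataset ${name}`, async () => {
      await this.ui.click(`dataset-row-${name}`);
    });
  }
}

// Stand-ins so the sketch runs: a step that records titles, and a fake UI.
const stepTitles: string[] = [];
const recordingStep: StepFn = async (title, body) => {
  stepTitles.push(title);
  return body();
};

const clicked: string[] = [];
const fakeUi: RowSource = {
  rowNames: async () => ["alpha", "beta"],
  click: async (testId) => {
    clicked.push(testId);
  },
};
```

In a real POM, the step function would be Playwright's `test.step` rather than a constructor parameter. Returning from inside the callback keeps the entire interaction, including the read, inside the reported step.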
```dot
digraph writing_e2e {
  rankdir=TB;
  "1. Scope (GATE)" [shape=box];
  "2. Analyze feature + FE code" [shape=box];
  "3. Discover live UI (GATE)" [shape=box];
  "4. Write POM + spec" [shape=box];
  "5. Run until green" [shape=box];
  "Green?" [shape=diamond];
  "1. Scope (GATE)" -> "2. Analyze feature + FE code";
  "2. Analyze feature + FE code" -> "3. Discover live UI (GATE)";
  "3. Discover live UI (GATE)" -> "4. Write POM + spec";
  "4. Write POM + spec" -> "5. Run until green";
  "5. Run until green" -> "Green?";
  "Green?" -> "4. Write POM + spec" [label="no — fix"];
  "Green?" -> "done" [label="yes"];
}
```
**1. Scope (GATE).** Work out, from the request:

- Target: local dev at `http://localhost:5173` (`OPIK_DEPLOYMENT=oss`, workspace `default`). This is the natural target for "test the feature I just built." Only use another target if the dev asks.
- Tags: a tier tag (`@t1-smoke` / `@t2-cuj` / `@t3-nightly`) and a feature tag, per `conventions.md`.

Run the safety check (below) before any seeding. Then confirm the scope in one short message (feature, page, target, tags) and proceed. Don't write a formal spec document.
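The sketch below shows one quick way to sanity-check the scope before you confirm it. It assumes the rule is exactly one tier tag plus at least one `@`-prefixed feature tag. That rule, the example `@datasets` feature tag, and the `Scope` shape are illustrative assumptions; `conventions.md` has the real taxonomy.

```typescript
// Tier tags named in this skill; conventions.md holds the full taxonomy.
const TIER_TAGS: readonly string[] = ["@t1-smoke", "@t2-cuj", "@t3-nightly"];

// Hypothetical shape for the scope you confirm with the dev.
interface Scope {
  feature: string;
  page: string;
  target: string;
  tags: string[];
}

// Assumed rule: exactly one tier tag and at least one "@"-prefixed feature tag.
function checkTags(tags: string[]): string[] {
  const problems: string[] = [];
  const tiers = tags.filter((t) => TIER_TAGS.includes(t));
  const featureTags = tags.filter((t) => !TIER_TAGS.includes(t));
  if (tiers.length !== 1) problems.push(`expected exactly one tier tag, found ${tiers.length}`);
  if (featureTags.length === 0) problems.push("missing a feature tag");
  for (const t of tags) {
    if (!t.startsWith("@")) problems.push(`tag "${t}" must start with "@"`);
  }
  return problems;
}

// The one-line scope confirmation.
function scopeSummary(scope: Scope): string {
  return `Scope: ${scope.feature} on ${scope.page}, target ${scope.target}, tags ${scope.tags.join(" ")}`;
}
```

One way to attach the tags is the `tag` option on a describe block, which Playwright 1.42 and later accept: `test.describe('Datasets', { tag: ['@t2-cuj', '@datasets'] }, () => { ... })`. A run can then select by tag with `npx playwright test --grep @t2-cuj`.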
**2. Analyze feature + FE code.** Before touching the browser:

- Read the page under `apps/opik-frontend/src/v2/pages/<Page>/`: the route it renders at, the components it composes, and any `data-testid` attributes already present. Your POM's `goto()` will use the route shape.
- Check `fixtures/` for an existing fixture that already seeds the shape you need, and reuse it before writing a new one.

**3. Discover live UI (GATE).** Invoke the `playwright-pom-discovery` skill (via the Skill tool). It walks the live page with the Playwright MCP. It seeds state, navigates with an authenticated session, snapshots the accessibility tree, and enumerates `data-testid`s. It then picks the most stable selector for each element you'll target. It also flags any element that has no stable selector; such an element needs a FE `data-testid` added in this change.
When discovery is done, report a short summary and confirm before writing code. The summary should list the selectors you'll use per element and any missing testids you'll add. Don't write anything under `pom/` before this step.
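The discovery summary can be thought of as a ranking over what the snapshot shows for each element. The sketch below assumes a preference order of `data-testid` first, then ARIA role plus accessible name, then visible text. Anything weaker, such as a structural `tbody tr:nth-child(3)` path, gets flagged as needing a new `data-testid`. That order is an assumption; `conventions.md` defines the real one. The emitted strings use Playwright's `getByTestId`, `getByRole`, and `getByText` locator methods. `ObservedElement`, `pickSelector`, and the example test ids are hypothetical.

```typescript
// What discovery saw for one element (hypothetical shape, filled from the a11y snapshot).
interface ObservedElement {
  purpose: string; // what the test needs the element for
  testId?: string; // data-testid, if the FE already sets one
  role?: string; // ARIA role from the accessibility tree
  name?: string; // accessible name
  text?: string; // visible text
}

type Choice =
  | { purpose: string; locator: string }
  | { purpose: string; missingTestId: true };

// Assumed preference order: data-testid, then role + accessible name, then visible text.
// Nothing weaker (e.g. an nth-child CSS path) is accepted; the element is flagged instead.
function pickSelector(el: ObservedElement): Choice {
  if (el.testId) {
    return { purpose: el.purpose, locator: `page.getByTestId(${JSON.stringify(el.testId)})` };
  }
  if (el.role && el.name) {
    return {
      purpose: el.purpose,
      locator: `page.getByRole(${JSON.stringify(el.role)}, { name: ${JSON.stringify(el.name)} })`,
    };
  }
  if (el.text) {
    return { purpose: el.purpose, locator: `page.getByText(${JSON.stringify(el.text)})` };
  }
  return { purpose: el.purpose, missingTestId: true };
}

// The per-element summary to report before writing any POM code.
function discoverySummary(elements: ObservedElement[]): string {
  return elements
    .map(pickSelector)
    .map((c) =>
      "locator" in c
        ? `${c.purpose}: ${c.locator}`
        : `${c.purpose}: MISSING data-testid (add it in the FE component)`,
    )
    .join("\n");
}

console.log(
  discoverySummary([
    { purpose: "create button", testId: "create-dataset" },
    { purpose: "search box", role: "textbox", name: "Search" },
    { purpose: "third row" },
  ]),
);
// create button: page.getByTestId("create-dataset")
// search box: page.getByRole("textbox", { name: "Search" })
// third row: MISSING data-testid (add it in the FE component)
```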
**4. Write POM + spec.**

- Write `pom/<name>.page.ts` using the selectors from discovery. Each method wraps its body in `test.step()` and returns through the callback (see `conventions.md`).
- Write `tests/<feature>/<name>.spec.ts` with the tier + feature tag on the describe block, coarse `test.step()` phases, and UI-first assertions.
- For every element that discovery flagged, add the missing `data-testid` to the FE component in the same change.

**5. Run until green.** From `tests_end_to_end/e2e/`:
```bash
npx playwright test tests/<feature>/<name>.spec.ts --reporter=list
```
The bridge auto-spawns, and you'll see its startup line in the output. If a test fails, read the failure trace (`npx playwright show-trace`) instead of adjusting selectors blindly; see "verify the test render before blaming the backend" in `conventions.md`. Fix and re-run until green, then report the actual run output.
**Safety check.** The Python SDK behind the bridge reads `~/.opik.config`. If that file points at a cloud environment, seeding would create real data there. Before any seed against a local target, inspect it:
```bash
cat ~/.opik.config
```
If `url_override` is anything other than `http://localhost:5173/api`, back the file up and point it local:
```bash
cp ~/.opik.config ~/.opik.config.bak 2>/dev/null || true
cat > ~/.opik.config << 'EOF'
[opik]
url_override = http://localhost:5173/api
workspace = default
EOF
```
If the config already points local, skip the backup and rewrite. Otherwise, when the work is done, remind the dev to restore it with `cp ~/.opik.config.bak ~/.opik.config`.
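You can also express the safety check as code instead of eyeballing the `cat` output. The minimal sketch below reads the `url_override` key from the `[opik]` section of the config text. It treats anything other than an exact match on `http://localhost:5173/api` as unsafe to seed, including a missing key. The INI handling is deliberately simple (sections, `key = value`, `#`/`;` comments). It is an assumption about the file's shape, based on the config written above. The function names are hypothetical.

```typescript
// The only url_override this skill treats as safe for seeding.
const LOCAL_URL = "http://localhost:5173/api";

// Returns url_override from the [opik] section, or undefined if it isn't set there.
function readUrlOverride(configText: string): string | undefined {
  let inOpikSection = false;
  for (const raw of configText.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue; // blank or comment
    const header = /^\[(.+)\]$/.exec(line);
    if (header) {
      inOpikSection = (header[1] ?? "").trim() === "opik";
      continue;
    }
    if (!inOpikSection) continue;
    const eq = line.indexOf("=");
    if (eq > 0 && line.slice(0, eq).trim() === "url_override") {
      return line.slice(eq + 1).trim();
    }
  }
  return undefined;
}

// Seeding is safe only when the SDK is pointed at the local dev server.
// A missing key counts as unsafe, since you can't tell where the SDK would send data.
function isSafeToSeed(configText: string): boolean {
  return readUrlOverride(configText) === LOCAL_URL;
}
```

To check the real file from Node, pass in `readFileSync(join(homedir(), ".opik.config"), "utf8")`, using `node:fs`, `node:os`, and `node:path`. Handle a missing file as unsafe too.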
If you catch yourself thinking one of these, stop. Each one means you skipped a step.

| Symptom | What you skipped |
|---|---|
| "Let me read the FE source to find the selector" | Discovery — snapshot the rendered DOM. What renders is the only source of truth for selectors. |
| "I'll explore the empty page and figure out the rows later" | Seeding — an empty-state-only POM never exercises the row template or open-detail actions. |
| "I'll write the POM and find out if it works when the whole suite runs" | Run-until-green in isolation — iterate on the one spec, don't debug it inside a full suite run. |
| "page.locator('tbody tr:nth-child(3)') is fine" | Flagging the missing testid: brittle structural selectors are the top source of flake, so add a data-testid. |
| "I'll create the dataset through the UI so the page has data" | SDK/bridge seeding — UI-create is what the test exercises, not how you set up. |