rules/e2e-testing.md
Use E2E testing when you need to test a complete user flow for a feature.
If you would need to mock a lot of things to unit test a feature, prefer to write an E2E test instead.
Do NOT write lots of e2e test cases for one feature. Each e2e test case adds a significant amount of overhead, so instead prefer just one or two E2E test cases that each have broad coverage of the feature in question.
IMPORTANT: You MUST run npm run build before running E2E tests. E2E tests run against the built application binary, not the source code. If you make any changes to application code (anything outside of e2e-tests/), you MUST re-run npm run build before running E2E tests, otherwise you'll be testing the old version of the application.
npm run build
To run e2e tests without opening the HTML report (which blocks the terminal), use:
PLAYWRIGHT_HTML_OPEN=never npm run e2e
To get additional debug logs when a test is failing, use:
DEBUG=pw:browser PLAYWRIGHT_HTML_OPEN=never npm run e2e
The PageObject (aliased as po in tests) delegates most methods to sub-component page objects. Don't call methods directly on po unless they are explicitly defined on PageObject itself:
// Wrong: methods don't exist on po directly
await po.getTitleBarAppNameButton().click();
await po.getCurrentAppPath();
await po.goToChatTab();
// Correct: use the appropriate sub-component
await po.appManagement.getTitleBarAppNameButton().click();
await po.appManagement.getCurrentAppPath();
await po.navigation.goToChatTab();
Key sub-components: po.appManagement, po.navigation, po.chatActions, po.previewPanel, po.codeEditor, po.githubConnector, po.toastNotifications, po.settings, po.securityReview, po.modelPicker.
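The delegation described above can be pictured with a trimmed-down sketch. The class bodies here are hypothetical stand-ins, not the real page objects; only the names po, navigation, appManagement, goToChatTab and getCurrentAppPath come from this document.

```typescript
// Hypothetical, minimal shapes for illustration only.
class Navigation {
  readonly visited: string[] = [];
  async goToChatTab(): Promise<void> {
    this.visited.push("chat"); // the real method would click the tab link
  }
}

class AppManagement {
  async getCurrentAppPath(): Promise<string> {
    return "/path/to/app"; // placeholder value for the sketch
  }
}

class PageObject {
  // PageObject only exposes the sub-components; it has no goToChatTab itself.
  readonly navigation = new Navigation();
  readonly appManagement = new AppManagement();
}

const po = new PageObject();
// po.goToChatTab();               // would not compile: not defined on PageObject
void po.navigation.goToChatTab();  // defined on the sub-component
```

Because the methods live on the sub-components, the TypeScript compiler flags a direct call on po, which is usually the quickest way to find the right sub-component.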
Base UI Radio components render a hidden native <input type="radio"> with aria-hidden="true". Both getByRole('radio', { name: '...' }) and getByLabel('...') find this hidden input but can't click it (element is outside viewport). Use getByText to click the visible label text instead.
// Correct: click the visible label text
await page.getByText("Vue", { exact: true }).click();
// Won't work: finds hidden input, can't click
await page.getByRole("radio", { name: "Vue" }).click();
await page.getByLabel("Vue").click();
The chat input uses a Lexical editor (contenteditable). Standard Playwright methods don't always work:
- fill("") doesn't reliably clear Lexical. Use keyboard shortcuts instead: Meta+a then Backspace.
- Use toPass() with retries for resilient tests.
- Use po.clearChatInput() and po.openChatHistoryMenu() from test_helper.ts for reliable Lexical interactions.

// Wrong: may not clear Lexical editor
await chatInput.fill("");
// Correct: use helper with retry logic
await po.clearChatInput();
// For history menu (needs clear + ArrowUp with retries)
await po.openChatHistoryMenu();
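To make the retry idea concrete, here is a browser-free sketch. retryUntilPass is a hand-rolled stand-in for Playwright's expect(...).toPass(), and EditorLike is a hypothetical interface; neither is part of the real helpers, whose implementation may differ.

```typescript
// Stand-in for expect(fn).toPass(): call fn until it stops throwing or the
// overall budget is used up. Returns the number of attempts made.
async function retryUntilPass(
  fn: () => Promise<void>,
  budgetMs: number,
  intervalMs: number,
): Promise<number> {
  const deadline = Date.now() + budgetMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      await fn();
      return attempts;
    } catch (error) {
      // Give up when another wait would exceed the budget.
      if (Date.now() + intervalMs >= deadline) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Hypothetical minimal editor shape so the sketch runs without a browser.
interface EditorLike {
  selectAll(): Promise<void>; // stands in for pressing Meta+a
  backspace(): Promise<void>; // stands in for pressing Backspace
  text(): Promise<string>;
}

// Clear the editor, retrying the whole select-and-delete sequence.
async function clearEditor(editor: EditorLike): Promise<number> {
  return retryUntilPass(
    async () => {
      await editor.selectAll();
      await editor.backspace();
      if ((await editor.text()) !== "") throw new Error("editor not empty yet");
    },
    5000,
    100,
  );
}
```

The point of the sketch is that the whole sequence is retried, not just the final check, so a keypress that Lexical drops is simply sent again on the next attempt.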
NEVER update snapshot files (e.g. .txt, .yml) by hand. Always use --update-snapshots to regenerate them.
Snapshots must be deterministic and platform-agnostic. They must not contain non-deterministic or platform-specific content such as temporary directory paths (e.g. /tmp/..., /var/folders/...).
If the output under test contains non-deterministic or platform-specific content, add sanitization logic in the test helper (e.g. in test_helper.ts) to normalize it before snapshotting.
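A minimal sketch of such sanitization, assuming the only unstable content is temporary paths. The function name sanitizeForSnapshot and the <TMP_PATH> placeholder are hypothetical; the real helper in test_helper.ts may normalize more than this.

```typescript
// Hypothetical helper: replace temp-directory paths with a stable placeholder.
function sanitizeForSnapshot(text: string): string {
  return text
    // macOS temp dirs, e.g. /var/folders/ab/xyz/T/app (optionally under /private)
    .replace(/(?:\/private)?\/var\/folders\/[^\s"'`)]+/g, "<TMP_PATH>")
    // Linux-style temp dirs, e.g. /tmp/app-123
    .replace(/\/tmp\/[^\s"'`)]+/g, "<TMP_PATH>");
}

// Example: both platforms produce the same snapshot text.
const linuxOutput = sanitizeForSnapshot("cwd: /tmp/dyad-abc/app");
const macOutput = sanitizeForSnapshot("cwd: /var/folders/ab/c1/T/dyad-abc/app");
```

Note that the simple /tmp/ pattern also matches inside longer paths such as /home/tmp/x, so a real helper would probably need a stricter boundary.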
The Pro mode build settings (Web Access, Turbo Edits, Smart Context) are inside a collapsed <Accordion> in ProModeSelector. E2E test helpers must expand the accordion before interacting with elements inside it. The ProModesDialog class in e2e-tests/helpers/page-objects/dialogs/ProModesDialog.ts has an expandBuildModeSettings() method that handles this — call it before clicking any build mode setting buttons.
Each parallel Playwright worker gets its own fake LLM server on port FAKE_LLM_BASE_PORT + parallelIndex. The base port constant lives in e2e-tests/helpers/test-ports.ts (not in playwright.config.ts) to avoid importing the Playwright config from test code.
When adding new test server URLs, update both the test fixtures (e2e-tests/helpers/fixtures.ts) and the Electron app source that consumes them. The app reads process.env.FAKE_LLM_PORT to build its TEST_SERVER_BASE URL — if you hardcode a port in app source, parallel workers will all hit the same server.
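The port scheme above can be sketched as follows. The value 3500, the localhost host, and the fallback to the base port are assumptions for illustration; the real constant lives in e2e-tests/helpers/test-ports.ts and the real URL construction is in the app source.

```typescript
// Assumed value for illustration only.
const FAKE_LLM_BASE_PORT = 3500;

// Port of the fake LLM server for a given Playwright worker (0-based index).
function fakeLlmPort(parallelIndex: number): number {
  return FAKE_LLM_BASE_PORT + parallelIndex;
}

// App side: derive the base URL from the environment instead of hardcoding
// a port. The env object is passed in so the sketch stays self-contained.
function testServerBase(env: Record<string, string | undefined>): string {
  const port = env["FAKE_LLM_PORT"] ?? String(FAKE_LLM_BASE_PORT);
  return "http://localhost:" + port;
}

// Worker 2 talks to its own server.
const workerTwoBase = testServerBase({ FAKE_LLM_PORT: String(fakeLlmPort(2)) });
// workerTwoBase === "http://localhost:3502"
```

In a fixture, the index would typically come from Playwright's testInfo.parallelIndex, and the resulting port would be passed to the Electron app through its launch environment.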
For app features that fetch api.dyad.sh directly, add a test-only env override in app code and point it at the worker-specific fake server during E2E. Without that override, E2E tests cannot deterministically exercise both the remote-success and local-fallback paths.
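One possible shape for such an override, as a sketch only. The variable name DYAD_TEST_API_BASE_URL and the isTestBuild flag are hypothetical; use whatever test-build check the app already has.

```typescript
const DEFAULT_API_BASE = "https://api.dyad.sh";

// Return the API base URL, honoring the override only in test builds so
// production behavior is unchanged.
function resolveApiBase(
  env: Record<string, string | undefined>,
  isTestBuild: boolean,
): string {
  const override = env["DYAD_TEST_API_BASE_URL"]; // hypothetical variable name
  if (isTestBuild && override !== undefined && override.length > 0) {
    return override;
  }
  return DEFAULT_API_BASE;
}
```

With this in place, a test can point the override at its worker-specific fake server to exercise the remote-success path, and at a port with no listener (or a route that returns an error) to exercise the local-fallback path.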
If an E2E CI shard fails before Playwright starts with [ERR_PNPM_IGNORED_BUILDS] during cd scaffold && pnpm install or cd nextjs-template && pnpm install, check the workflow pnpm version first. pnpm@latest can change build-script policy between major versions; pin the workflow pnpm version or explicitly update the build-script policy instead of debugging test code.
Packaged Electron E2E runs may fail inside the Codex sandbox before any test logic executes, with Playwright reporting electron.launch: Process failed to launch! and the Electron process exiting with SIGABRT.
The same sandbox issue can appear earlier as a Playwright config.webServer startup failure, for example Error: listen EPERM: operation not permitted 0.0.0.0:3500 from the fake LLM server. Re-run the same E2E command outside the sandbox before treating it as a product regression.
If this happens, re-run the same npm run e2e -- e2e-tests/<spec> command outside the sandbox before treating it as an app regression.
If npm run build fails while rebuilding native modules with ImportError from Homebrew Python 3.14's pyexpat (for example Symbol not found: _XML_SetAllocTrackerActivationThreshold), rerun the build with the system Python: PYTHON=/usr/bin/python3 npm run build.
- After po.importApp(...): Some imports trigger an initial assistant turn (for example, importing minimal generates AI_RULES.md) that can leave a visible Retry button in the chat. If the test is about a later prompt, first wait for that import-time turn to finish, then start a new chat before calling sendPrompt(), or helper methods that wait on Retry visibility may return too early.
- After clicking Add for manual, auto-include, or exclude paths, wait for the new row text to appear before adding or removing another path. Likewise, after clicking a remove button, wait for the row count to drop before the next click. Chained clicks can race React state updates and only fail on later --repeat-each runs.
- After page.reload(): Always add await page.waitForLoadState("domcontentloaded") before interacting with elements. Without this, the page may not have re-rendered yet.
- Use await page.waitForTimeout(100) between sequential keyboard presses to let the UI state settle. Rapid keypresses can cause race conditions in menu navigation.
- Use await expect(link).toBeVisible({ timeout: Timeout.EXTRA_LONG }) before clicking tab links (especially in goToAppsTab()). Electron sidebar links can take time to render during app initialization.
- Use PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<spec> --repeat-each=10 to reproduce flaky tests. PLAYWRIGHT_RETRIES=0 is critical: CI defaults to 2 retries, which hides flakiness.
- Inside expect(...).toPass() wrappers: Give inner Playwright actions/assertions short explicit timeouts. Default 30s click/expect timeouts can consume the whole toPass() budget, so the retry wrapper never actually retries.
- Before retrying sendPrompt() flows, check for the prompt in messages-list or an empty input with Cancel generation; otherwise the retry can race into an active stream/proposal and leave the next prompt disabled.
- Changes to process.env persist within a worker, so tests that set fake provider keys (for example OPENAI_API_KEY) can affect later tests in the same worker.
When a fixture intentionally shows the setup screen, explicitly clear any env key that would make the provider appear configured.
- Verify the #model-id and #model-name values before clicking Add Model, and wrap the fill/click in expect(...).toPass(). Fast repeats can otherwise leave the name field empty or append text to the wrong field.
- Some model entries render the label and id together (for example Testollama testollama or lmstudio-model-1 lmstudio-model-1). Exact test locators should account for this duplicated label/id shape.
- After changing a setting through the UI, use expect.poll(() => po.settings.recordSettings().someKey) before sending the prompt. UI clicks can return before the main-process settings write is visible to the request path.
- After changing the apps folder, poll po.settings.recordSettings().customAppsFolder to match the expected value before creating, importing, or reopening apps. The folder picker IPC can return before later app-creation paths observe the persisted setting.
- [...] --repeat-each.
- Scope connected-repo assertions to getByTestId("github-connected-repo"); the same text can also appear in a toast, causing Playwright strict-mode failures.
- Dyad can write to .dyad/screenshot. If native-git banner tests still show one uncommitted change after a successful commit, inspect whether Dyad-managed .dyad/ files are being excluded from Git status before blaming query invalidation.
- Image URLs that fail to load (for example example.com/*.png) trigger dyad-image-load-error, remove the pending image change, and make "component modified" assertions time out.
- Poll get-cloud-sandbox-status and wait for syncRevision to advance before reading the iframe digest; the first full sync can be revision 1 even when it already includes the prompt change.
- When changing how commands are executed (for example spawn vs PTY execution), proactively run npm run e2e e2e-tests/socket_firewall.spec.ts after npm run build. Unit tests and package builds do not cover the real packaged-Electron Socket Firewall flow.
- When using the sfw binary in E2E, set fresh per-test npm_config_cache, npm_config_store_dir, and pnpm_config_store_dir in the launch hooks. Reused caches/stores can make Socket Firewall report that it did not detect package fetches, which turns blocked-package tests into false negatives.
- For blocked-package tests, prefer axois over lodahs.
lodahs can resolve to 0.0.1-security and install successfully under pnpm, so it does not reliably reach the blocked-package UI.

When clicking a button that triggers an async operation and changes its text/state (e.g., "Run Security Review" → "Running Security Review..."), wait for the loading state to appear and disappear rather than just waiting for the original button to be hidden:
// Wrong: waiting for original button to be hidden may race
const button = page.getByRole("button", { name: "Run Security Review" });
await button.click();
await button.waitFor({ state: "hidden" }); // Unreliable
// Correct: wait for loading state to appear then disappear
const button = page.getByRole("button", { name: "Run Security Review" });
await button.click();
const loadingButton = page.getByRole("button", {
name: "Running Security Review...",
});
await loadingButton.waitFor({ state: "visible" });
await loadingButton.waitFor({ state: "hidden" });
This pattern provides a more reliable signal that the async operation has completed than waiting for the original button to be hidden, which may race with the state change.
For streamed progress indicators that may complete quickly, allow the assertion to match either the transient in-progress text or the final completed text, then assert the final state after the operation completes.
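One way to express the "either text" assertion is a small pattern builder like the sketch below. The helper names are hypothetical, and the completed text in the usage comment is a made-up example, not a string from the app.

```typescript
// Escape characters that have a special meaning in regular expressions,
// so literal text such as "..." is matched as written.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches either the transient in-progress text or the
// final completed text.
function eitherText(inProgress: string, completed: string): RegExp {
  return new RegExp("(?:" + escapeRegExp(inProgress) + "|" + escapeRegExp(completed) + ")");
}

// Usage in a Playwright test (status is a Locator; "Review complete" is a
// hypothetical final label):
//   await expect(status).toHaveText(eitherText("Running Security Review...", "Review complete"));
//   ...wait for the operation to finish...
//   await expect(status).toHaveText("Review complete");
```

The first assertion tolerates a fast operation that has already finished, while the second still pins down the final state.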
When adding E2E test fixtures that need a .dyad directory for testing:
- The .dyad directory is git-ignored by default in test fixtures.
- Use git add -f path/to/.dyad/file to force-add files inside .dyad directories.
- If mkdir is blocked on .dyad paths due to security restrictions, use the Write tool to create files directly (which auto-creates parent directories).