rules/e2e-testing.md
Use E2E testing when you need to test a complete user flow for a feature.
If you would need to mock a lot of things to unit test a feature, prefer to write an E2E test instead.
Do NOT write many E2E test cases for one feature. Each E2E test case adds significant overhead, so prefer one or two E2E test cases that each cover the feature broadly.
IMPORTANT: You MUST run npm run build before running E2E tests. E2E tests run against the built application binary, not the source code. If you make any changes to application code (anything outside of e2e-tests/), you MUST re-run npm run build before running E2E tests, otherwise you'll be testing the old version of the application.
```sh
npm run build
```
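The rebuild rule comes down to one question: did anything outside `e2e-tests/` change? Below is a minimal TypeScript sketch of that check. `needsRebuild` is a hypothetical helper and is not part of the repo.

```typescript
// Hypothetical helper (not in the repo): does this set of changed files
// require re-running `npm run build` before E2E tests? Per the rule above,
// anything outside e2e-tests/ is application code and needs a fresh build.
function needsRebuild(changedPaths: string[]): boolean {
  return changedPaths.some((p) => {
    // Normalize Windows separators and a leading "./" so that
    // .\e2e-tests\chat.spec.ts and e2e-tests/chat.spec.ts compare equal.
    const normalized = p.replace(/\\/g, "/").replace(/^\.\//, "");
    return !normalized.startsWith("e2e-tests/");
  });
}

// Only test files changed: the existing build is still valid.
console.log(needsRebuild(["e2e-tests/chat.spec.ts"])); // false
// A change under src/ means the binary under test is stale.
console.log(needsRebuild(["e2e-tests/chat.spec.ts", "src/main.ts"])); // true
```

In practice, the list of changed paths could come from `git diff --name-only`.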
To run e2e tests without opening the HTML report (which blocks the terminal), use:
```sh
PLAYWRIGHT_HTML_OPEN=never npm run e2e
```
To get additional debug logs when a test is failing, use:
```sh
DEBUG=pw:browser PLAYWRIGHT_HTML_OPEN=never npm run e2e
```
The `PageObject` (aliased as `po` in tests) delegates most methods to sub-component page objects. Don't call methods directly on `po` unless they are explicitly defined on `PageObject` itself:

```ts
// Wrong: these methods don't exist on po directly
await po.getTitleBarAppNameButton().click();
await po.getCurrentAppPath();
await po.goToChatTab();

// Correct: use the appropriate sub-component
await po.appManagement.getTitleBarAppNameButton().click();
await po.appManagement.getCurrentAppPath();
await po.navigation.goToChatTab();
```
Key sub-components: po.appManagement, po.navigation, po.chatActions, po.previewPanel, po.codeEditor, po.githubConnector, po.toastNotifications, po.settings, po.securityReview, po.modelPicker.
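The delegation rule is plain object composition. `PageObject` holds its sub-component objects as fields and defines only a few methods itself. The sketch below uses hypothetical stub classes with no Playwright dependency. It shows why `po.goToChatTab()` fails to type-check while `po.navigation.goToChatTab()` works. The real page objects wrap locators and have many more methods.

```typescript
// Hypothetical, stripped-down model of the PageObject delegation pattern.
// The real classes wrap Playwright locators; these stubs only record calls.
class NavigationPage {
  readonly calls: string[] = [];
  async goToChatTab(): Promise<void> {
    this.calls.push("goToChatTab");
  }
}

class AppManagementPage {
  async getCurrentAppPath(): Promise<string> {
    return "/placeholder/app/path"; // placeholder value for the sketch
  }
}

class PageObject {
  // Sub-components are exposed as fields. PageObject itself only defines a
  // few top-level helpers (such as clearChatInput in the real test_helper.ts).
  readonly navigation = new NavigationPage();
  readonly appManagement = new AppManagementPage();
}

async function demo(): Promise<string[]> {
  const po = new PageObject();
  await po.navigation.goToChatTab(); // correct: method lives on the sub-component
  await po.appManagement.getCurrentAppPath();
  // await po.goToChatTab(); // type error: not defined on PageObject
  return po.navigation.calls;
}
```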
Base UI Radio components render a hidden native `<input type="radio">` with `aria-hidden="true"`. Both `getByRole('radio', { name: '...' })` and `getByLabel('...')` find this hidden input, but they can't click it because it is outside the viewport. Click the visible label text with `getByText` instead.
```ts
// Correct: click the visible label text
await page.getByText("Vue", { exact: true }).click();

// Won't work: finds the hidden input, which can't be clicked
await page.getByRole("radio", { name: "Vue" }).click();
await page.getByLabel("Vue").click();
```
The chat input uses a Lexical editor (contenteditable). Standard Playwright methods don't always work:
- `fill("")` doesn't reliably clear Lexical. Use keyboard shortcuts instead: `Meta+a`, then `Backspace`.
- Use `toPass()` with retries for resilient tests.
- Use `po.clearChatInput()` and `po.openChatHistoryMenu()` from `test_helper.ts` for reliable Lexical interactions.

```ts
// Wrong: may not clear the Lexical editor
await chatInput.fill("");

// Correct: use the helper with retry logic
await po.clearChatInput();

// For the history menu (needs clear + ArrowUp with retries)
await po.openChatHistoryMenu();
```
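`toPass()` re-runs an async block until the block stops throwing or a timeout expires. The sketch below restates that idea in plain TypeScript, purely to illustrate the behaviour. `retryUntilPass` and its default intervals are assumptions, not Playwright's implementation. In real tests, use Playwright's own form, for example `await expect(async () => { ... }).toPass({ timeout: 10_000 })`.

```typescript
// Minimal sketch of the retry-until-pass idea behind Playwright's
// expect(...).toPass(): re-run an async block until it stops throwing,
// or rethrow the last error once the time budget is spent.
// This is NOT Playwright's implementation; the intervals are illustrative.
async function retryUntilPass(
  block: () => Promise<void>,
  {
    timeoutMs = 5_000,
    intervalsMs = [100, 250, 500],
  }: { timeoutMs?: number; intervalsMs?: number[] } = {},
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let attempt = 0;
  for (;;) {
    attempt++;
    try {
      await block();
      return attempt; // number of attempts it took to pass
    } catch (err) {
      if (Date.now() >= deadline) throw err;
      // Back off between attempts, reusing the last interval once exhausted.
      const wait = intervalsMs[Math.min(attempt - 1, intervalsMs.length - 1)];
      await new Promise<void>((resolve) => setTimeout(resolve, wait));
    }
  }
}
```

Retrying the whole block helps with Lexical because a clear-then-check sequence may race the editor's own state updates. A later attempt usually sees a settled editor.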
NEVER update snapshot files (e.g. .txt, .yml) by hand. Always use --update-snapshots to regenerate them.
Snapshots must be deterministic and platform-agnostic. They must not contain:

- Temporary or machine-specific paths (e.g. `/tmp/...`, `/var/folders/...`)

If the output under test contains non-deterministic or platform-specific content, add sanitization logic in the test helper (e.g. in `test_helper.ts`) to normalize it before snapshotting.
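Below is a minimal sketch of such sanitization. It assumes temp paths and line endings are the only unstable content. `sanitizeForSnapshot` and the `<TMP>` placeholder are hypothetical, and the real helper in `test_helper.ts` may normalize more, or do it differently.

```typescript
// Hypothetical snapshot sanitizer; the real helper in test_helper.ts may
// differ. It rewrites machine-specific bits so the same output snapshots
// identically across macOS, Linux, and Windows.
function sanitizeForSnapshot(text: string): string {
  return (
    text
      // Normalize Windows line endings.
      .replace(/\r\n/g, "\n")
      // macOS per-user temp dirs, e.g. /var/folders/ab/cd/T/... (optionally
      // prefixed with /private when the path has been resolved).
      .replace(/(?:\/private)?\/var\/folders\/[^\s"']+/g, "<TMP>")
      // Unix temp paths, e.g. /tmp/dyad-123/app.
      .replace(/(?:\/private)?\/tmp\/[^\s"']+/g, "<TMP>")
  );
}
```

This sketch replaces the whole temp path with the placeholder. If the relative part of the path matters to the test, strip only the temp root instead.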
The Pro mode build settings (Web Access, Turbo Edits, Smart Context) are inside a collapsed `<Accordion>` in `ProModeSelector`. E2E test helpers must expand the accordion before interacting with elements inside it. The `ProModesDialog` class in `e2e-tests/helpers/page-objects/dialogs/ProModesDialog.ts` has an `expandBuildModeSettings()` method that handles this; call it before clicking any build mode setting buttons.
Each parallel Playwright worker gets its own fake LLM server on port FAKE_LLM_BASE_PORT + parallelIndex. The base port constant lives in e2e-tests/helpers/test-ports.ts (not in playwright.config.ts) to avoid importing the Playwright config from test code.
When adding new test server URLs, update both the test fixtures (`e2e-tests/helpers/fixtures.ts`) and the Electron app source that consumes them. The app reads `process.env.FAKE_LLM_PORT` to build its `TEST_SERVER_BASE` URL. If you hardcode a port in app source, parallel workers will all hit the same server.
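The port scheme from the two paragraphs above can be sketched as follows. The base port value of 3500 is only a placeholder; the real constant lives in `e2e-tests/helpers/test-ports.ts`. `fakeLlmPortFor` and `testServerBase` are hypothetical names, and `http://localhost` is an assumed host.

```typescript
// Placeholder value for illustration only; the real FAKE_LLM_BASE_PORT
// is defined in e2e-tests/helpers/test-ports.ts.
const FAKE_LLM_BASE_PORT = 3500;

// Test side: each Playwright worker derives its own fake LLM port from its
// parallel index (testInfo.parallelIndex in a real fixture).
function fakeLlmPortFor(parallelIndex: number): number {
  return FAKE_LLM_BASE_PORT + parallelIndex;
}

// App side (hypothetical helper): read the port from the environment instead
// of hardcoding it, so each worker's Electron instance talks to its own server.
function testServerBase(env: Record<string, string | undefined>): string {
  const raw = env.FAKE_LLM_PORT;
  const port = Number(raw);
  if (!raw || !Number.isInteger(port)) {
    throw new Error(`FAKE_LLM_PORT is missing or invalid: ${raw}`);
  }
  return `http://localhost:${port}`; // host is an assumption
}

// The fixture would pass the port to the launched app through its env.
const exampleEnv = { FAKE_LLM_PORT: String(fakeLlmPortFor(2)) };
console.log(testServerBase(exampleEnv)); // http://localhost:3502
```

Throwing on a missing port, rather than falling back to a default, keeps a misconfigured worker from silently sharing another worker's server.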
For app features that fetch api.dyad.sh directly, add a test-only env override in app code and point it at the worker-specific fake server during E2E. Without that override, E2E tests cannot deterministically exercise both the remote-success and local-fallback paths.
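Here is one way such an override could be structured, sketched under several assumptions. `DYAD_API_BASE_OVERRIDE` is an invented env var name, `https://api.dyad.sh` assumes the HTTPS scheme, and `loadWithFallback` is a hypothetical helper. The override only changes where the request goes. The worker's fake server then decides which branch a test exercises: a healthy response covers remote success, and an error response covers the local fallback.

```typescript
// Hypothetical sketch: DYAD_API_BASE_OVERRIDE is an invented env var name.
// Outside E2E it is unset, so requests still go to api.dyad.sh.
const DEFAULT_API_BASE = "https://api.dyad.sh"; // scheme is an assumption

function resolveApiBase(env: Record<string, string | undefined>): string {
  return env.DYAD_API_BASE_OVERRIDE ?? DEFAULT_API_BASE;
}

// Remote-first load with a local fallback. Pointing the override at the
// worker's fake server lets a test force either branch deterministically.
async function loadWithFallback<T>(
  fetchRemote: (base: string) => Promise<T>,
  localFallback: () => T,
  env: Record<string, string | undefined>,
): Promise<T> {
  try {
    return await fetchRemote(resolveApiBase(env));
  } catch {
    return localFallback();
  }
}
```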
Packaged Electron E2E runs may fail inside the Codex sandbox before any test logic executes, with Playwright reporting electron.launch: Process failed to launch! and the Electron process exiting with SIGABRT.
The same sandbox issue can appear earlier as a Playwright config.webServer startup failure, for example Error: listen EPERM: operation not permitted 0.0.0.0:3500 from the fake LLM server. Re-run the same E2E command outside the sandbox before treating it as a product regression.
If this happens:
- Re-run the same `npm run e2e -- e2e-tests/<spec>` command outside the sandbox before treating it as an app regression.

If `npm run build` fails while rebuilding native modules with an `ImportError` from Homebrew Python 3.14's `pyexpat` (for example, `Symbol not found: _XML_SetAllocTrackerActivationThreshold`), rerun the build with the system Python: `PYTHON=/usr/bin/python3 npm run build`.
To keep E2E tests from flaking:

- After `po.importApp(...)`: some imports trigger an initial assistant turn (for example, `minimal` generating `AI_RULES.md`) that can leave a visible Retry button in the chat. If the test is about a later prompt, first wait for that import-time turn to finish. Then start a new chat before calling `sendPrompt()`; otherwise, helper methods that wait on Retry visibility may return too early.
- After clicking Add for manual, auto-include, or exclude paths, wait for the new row text to appear before adding or removing another path. Likewise, after clicking a remove button, wait for the row count to drop before the next click. Chained clicks can race React state updates and may only fail on later `--repeat-each` runs.
- After `page.reload()`, always add `await page.waitForLoadState("domcontentloaded")` before interacting with elements. Without this, the page may not have re-rendered yet.
- Add `await page.waitForTimeout(100)` between sequential keyboard presses to let the UI state settle. Rapid keypresses can cause race conditions in menu navigation.
- Use `await expect(link).toBeVisible({ timeout: Timeout.EXTRA_LONG })` before clicking tab links (especially in `goToAppsTab()`). Electron sidebar links can take time to render during app initialization.
- Use `PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<spec> --repeat-each=10` to reproduce flaky tests. `PLAYWRIGHT_RETRIES=0` is critical: CI defaults to 2 retries, which hides flakiness.
- … `--repeat-each`.
- When changing how commands are executed (for example, `spawn` vs PTY execution), proactively run `npm run e2e -- e2e-tests/socket_firewall.spec.ts` after `npm run build`. Unit tests and package builds do not cover the real packaged-Electron Socket Firewall flow.
- When running the `sfw` binary in E2E, set fresh per-test `npm_config_cache`, `npm_config_store_dir`, and `pnpm_config_store_dir` in the launch hooks. Reused caches and stores can make Socket Firewall report that it did not detect package fetches, which turns blocked-package tests into false negatives.
- For blocked-package tests, prefer `axois` over `lodahs`. `lodahs` can resolve to `0.0.1-security` and install successfully under pnpm, so it does not reliably reach the blocked-package UI.

When clicking a button that triggers an async operation and changes its text or state (e.g., "Run Security Review" → "Running Security Review..."), wait for the loading state to appear and then disappear, rather than just waiting for the original button to be hidden:
```ts
// Wrong: waiting for the original button to be hidden may race
const button = page.getByRole("button", { name: "Run Security Review" });
await button.click();
await button.waitFor({ state: "hidden" }); // Unreliable
```

```ts
// Correct: wait for the loading state to appear, then disappear
const button = page.getByRole("button", { name: "Run Security Review" });
await button.click();
const loadingButton = page.getByRole("button", {
  name: "Running Security Review...",
});
await loadingButton.waitFor({ state: "visible" });
await loadingButton.waitFor({ state: "hidden" });
```
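This pattern gives a more reliable signal that the async operation has completed. The original button disappears as soon as its label changes, so waiting for it to be hidden only tells you the operation started. Seeing the loading state confirms that the operation actually began, and its disappearance confirms that it finished.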
For streamed progress indicators that may complete quickly, allow the assertion to match either the transient in-progress text or the final completed text, then assert the final state after the operation completes.
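Playwright's `toHaveText` accepts a `RegExp`, so a small alternation regex is one way to accept either text. The sketch below builds one from escaped labels. `anyOfTexts` and the example labels are hypothetical. Escaping matters because the `...` in a loading label would otherwise match any three characters.

```typescript
// Hypothetical helper: build an anchored RegExp that matches exactly one of
// several labels, escaping regex metacharacters in each label first.
function anyOfTexts(labels: string[]): RegExp {
  const escaped = labels.map((l) => l.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^(?:${escaped.join("|")})$`);
}
```

In a test, this might read `await expect(status).toHaveText(anyOfTexts(["Indexing...", "Indexed"]))`, followed by `await expect(status).toHaveText("Indexed")` once the operation completes. Both `status` and the labels are illustrative.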
When adding E2E test fixtures that need a .dyad directory for testing:
- The `.dyad` directory is git-ignored by default in test fixtures.
- Use `git add -f path/to/.dyad/file` to force-add files inside `.dyad` directories.
- If `mkdir` is blocked on `.dyad` paths due to security restrictions, use the Write tool to create files directly (it auto-creates parent directories).