Manual app performance suite

The app's high-volume performance diagnostics live under packages/app/e2e/performance and are excluded from normal local and CI Playwright discovery. The benchmark config builds the app and serves the production bundle before running scenarios serially.

Run the suite explicitly from packages/app:

```sh
bun run test:bench
```

PowerShell:

```powershell
$env:PLAYWRIGHT_WORKERS = "1"
bun run test:bench
```

The suite contains:

- cold and hot session-tab timing
- cached session repaint and mutation tracing
- streaming timeline throughput, RAF-gap, long-task, geometry, and remount diagnostics

All benchmarks import the shared benchmark fixture. Pages created through Playwright's page fixture automatically capture main-frame navigation history and, when OPENCODE_PERFORMANCE_TRACE_DIR is set, emit a Chrome trace. Benchmarks that need isolated browser contexts use withBenchmarkPage, which owns the context and applies the same diagnostics lifecycle.
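The ownership contract described above means the helper creates the context, runs the scenario, and always tears diagnostics down and closes the context. It can be sketched as a nested try/finally wrapper. This is a hypothetical, simplified illustration. `ObservedContext`, `withObservedContext`, and the `startDiagnostics`/`stopDiagnostics` hooks are invented names, and the real withBenchmarkPage signature is not documented here.

```typescript
// Hypothetical minimal interface; the real harness works with Playwright's
// BrowserContext and Page types instead.
interface ObservedContext {
  close(): Promise<void>;
}

// Hypothetical diagnostics hooks (e.g. navigation capture and tracing).
interface DiagnosticsHooks<C extends ObservedContext> {
  startDiagnostics(context: C): Promise<void>;
  // `failed` lets the hook attach extra evidence, such as navigation history.
  stopDiagnostics(context: C, failed: boolean): Promise<void>;
}

// Owns the whole lifecycle: diagnostics stop only if they started, and the
// context is closed even when the scenario or a hook throws.
async function withObservedContext<C extends ObservedContext, T>(
  createContext: () => Promise<C>,
  hooks: DiagnosticsHooks<C>,
  scenario: (context: C) => Promise<T>,
): Promise<T> {
  const context = await createContext();
  try {
    await hooks.startDiagnostics(context);
    let failed = true;
    try {
      const result = await scenario(context);
      failed = false;
      return result;
    } finally {
      await hooks.stopDiagnostics(context, failed);
    }
  } finally {
    await context.close();
  }
}
```

Keeping creation and teardown in one helper means individual benchmarks cannot forget to stop a trace or close a context on failure.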

New benchmarks should look like normal Playwright tests:

```typescript
import { benchmark, expect } from "../benchmark"

benchmark("measures one interaction", async ({ page, report }) => {
  // Only scenario-specific setup and interaction belong here.
  report({ durationMs: 42 })
})
```

The fixture requires every benchmark to call report(), automatically names and closes traces, captures navigation history, attaches that history when a test fails, and emits metrics as a consistent BENCHMARK JSON line.

```text
BENCHMARK {"name":"...","context":{"project":"chromium","platform":"darwin"},"metrics":{...}}
```
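The report() requirement and the single-line output can be illustrated without Playwright. This sketch is modeled on the example line above but omits the run ID and status fields. `runWithReport` and `BenchmarkRecord` are hypothetical names, not the fixture's actual types.

```typescript
// Metric values may be null, e.g. when a probe is disabled.
type Metrics = Record<string, number | null>;

// Hypothetical record shape, modeled on the BENCHMARK example line.
interface BenchmarkRecord {
  name: string;
  context: Record<string, string>;
  metrics: Metrics;
}

// Runs a benchmark body, fails if it never called report(), and returns the
// single BENCHMARK line that would be printed.
async function runWithReport(
  name: string,
  context: Record<string, string>,
  body: (report: (metrics: Metrics) => void) => Promise<void>,
): Promise<string> {
  // A holder object avoids TypeScript narrowing a closure-assigned `let` to undefined.
  const state: { metrics?: Metrics } = {};
  const report = (metrics: Metrics): void => {
    state.metrics = metrics;
  };
  await body(report);
  if (state.metrics === undefined) {
    throw new Error(`${name}: benchmark finished without calling report()`);
  }
  const record: BenchmarkRecord = { name, context, metrics: state.metrics };
  return `BENCHMARK ${JSON.stringify(record)}`;
}
```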

Every observed page also emits a BENCHMARK_PAGE record carrying the same run ID, its navigation history, and an optional trace path. These records precede the final BENCHMARK record, which carries the test status. Chrome traces are browser-wide diagnostics spanning the page's lifetime, whereas scenario metrics use narrower, explicitly named observation windows.
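A consumer of captured test output can separate the two record types by their leading tag. This sketch assumes BENCHMARK_PAGE uses the same `TAG {json}` layout as BENCHMARK, which this README does not state explicitly. `parseBenchmarkLine` and `collectBenchmarkLines` are hypothetical helpers. A bare `startsWith("BENCHMARK")` check would also match BENCHMARK_PAGE, so the tag is compared exactly.

```typescript
interface BenchmarkLine {
  kind: "BENCHMARK" | "BENCHMARK_PAGE";
  payload: unknown;
}

// Returns null for ordinary log lines. The tag is everything before the first
// space, so "BENCHMARK_PAGE {...}" is never mistaken for "BENCHMARK {...}".
function parseBenchmarkLine(line: string): BenchmarkLine | null {
  const space = line.indexOf(" ");
  if (space === -1) return null;
  const tag = line.slice(0, space);
  if (tag !== "BENCHMARK" && tag !== "BENCHMARK_PAGE") return null;
  // JSON.parse throws on a malformed payload, surfacing harness bugs early.
  return { kind: tag, payload: JSON.parse(line.slice(space + 1)) };
}

// Collects all records from captured output, preserving their order.
function collectBenchmarkLines(output: string): BenchmarkLine[] {
  return output
    .split(/\r?\n/)
    .map(parseBenchmarkLine)
    .filter((record): record is BenchmarkLine => record !== null);
}
```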

This split follows the guidance of each tool in the stack: Electron recommends measuring repeatedly with Chrome DevTools and Chrome Tracing, Chrome DevTools recommends Performance panel recordings for runtime work, and Playwright positions its traces as a test-debugging aid rather than a renderer profiler.

These Playwright benchmarks profile the shared app renderer in Chromium. A future packaged Electron benchmark that needs main-process and multi-process attribution should use Electron's official contentTracing API rather than extending this renderer harness with bespoke process instrumentation.

CPU and high-volume visual profiling are disabled by default. Set TIMELINE_CPU_PROFILE=1 to enable both; for CPU-only profiling, also set TIMELINE_VISUAL_PROFILE=0.
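One way to read those two switches is sketched below. It assumes visual profiling can only be enabled together with CPU profiling. This README does not say what TIMELINE_VISUAL_PROFILE does when TIMELINE_CPU_PROFILE is unset. `resolveProfiling` is a hypothetical helper, not the suite's code.

```typescript
interface ProfilingOptions {
  cpu: boolean;
  visual: boolean;
}

// Assumption: TIMELINE_CPU_PROFILE=1 enables both probes, and
// TIMELINE_VISUAL_PROFILE=0 then opts out of the visual one.
function resolveProfiling(env: Record<string, string | undefined>): ProfilingOptions {
  const cpu = env["TIMELINE_CPU_PROFILE"] === "1";
  const visual = cpu && env["TIMELINE_VISUAL_PROFILE"] !== "0";
  return { cpu, visual };
}
```

Under Bun or Node, the environment object to pass is `process.env`.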

The streaming scenario's 30x CPU throttle is a deterministic stress profile, not a simulated end-user device.
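In Chromium, a CPU slowdown factor is usually applied with the DevTools Protocol method Emulation.setCPUThrottlingRate, where a rate of 1 means no throttling. Puppeteer's page.emulateCPUThrottling uses this method. This README does not say how the streaming scenario applies its throttle, so treat this as an assumption. The hypothetical `withCpuThrottle` takes a minimal `send` function so it runs without a browser. With Playwright, `page.context().newCDPSession(page)` returns a Chromium session whose `send` can be adapted to this shape.

```typescript
// Minimal stand-in for a CDP session; only send() is needed here.
type CdpSend = (method: string, params?: Record<string, unknown>) => Promise<unknown>;

// Applies a CPU slowdown factor for the duration of `work`, then restores
// normal speed (rate 1) even if `work` throws.
async function withCpuThrottle<T>(send: CdpSend, rate: number, work: () => Promise<T>): Promise<T> {
  // Local sanity check: CDP documents 1 as "no throttle" and 2 as a 2x slowdown.
  if (!(rate >= 1)) throw new RangeError(`throttle rate must be >= 1, got ${rate}`);
  await send("Emulation.setCPUThrottlingRate", { rate });
  try {
    return await work();
  } finally {
    await send("Emulation.setCPUThrottlingRate", { rate: 1 });
  }
}
```

A fixed factor such as 30 slows every run by the same ratio relative to the host CPU. This is why the result is a reproducible stress profile rather than a model of any particular device.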

Benchmarks do not assert machine-dependent performance budgets; instead, assertions verify that the scenario and its metric collection completed.

The streaming scenario processes 160 deltas by default and reports renderer-observed completion time, throughput, RAF callback-gap distributions, frame-budget equivalents, and long tasks through final geometry settlement. The delta count and delivery batch are included in the result context when overridden. These are main-thread callback diagnostics, not compositor presentation or dropped-frame measurements. Visual-only and geometry metrics are null when their probes are disabled.

Tab metrics describe sampled DOM observations. Repeated repaint states are run-length grouped, but every original observation timestamp is retained alongside the raw mutation batches and layout shifts.
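Two of the derived metrics above can be illustrated with small pure functions. The sketch makes three assumptions: requestAnimationFrame timestamps are in milliseconds, a "frame-budget equivalent" is a gap divided by a 60 Hz budget (1000/60 ms), and percentiles use the nearest-rank method. The suite's actual probe, budget, and percentile method are not documented here. `summarizeRafGaps` and `groupRuns` are hypothetical names.

```typescript
interface GapSummary {
  count: number;
  p50: number;
  p95: number;
  max: number;
  // Worst gap expressed as a multiple of the frame budget.
  maxFrameBudgets: number;
}

// Nearest-rank percentile over a non-empty, ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(rank, 1)) - 1];
}

// Turns successive RAF callback timestamps into gaps and summarizes them.
// Returns null when there are too few samples to form a gap.
function summarizeRafGaps(timestamps: number[], frameBudgetMs = 1000 / 60): GapSummary | null {
  if (timestamps.length < 2) return null;
  const gaps = timestamps.slice(1).map((t, i) => t - timestamps[i]);
  const sorted = [...gaps].sort((a, b) => a - b);
  const max = sorted[sorted.length - 1];
  return {
    count: gaps.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max,
    maxFrameBudgets: max / frameBudgetMs,
  };
}

interface Run<T> {
  value: T;
  // Every original observation timestamp is kept, not just the first.
  timestamps: number[];
}

// Run-length groups consecutive identical states (compared with ===, so
// states are assumed to be primitives such as strings).
function groupRuns<T>(observations: { value: T; at: number }[]): Run<T>[] {
  const runs: Run<T>[] = [];
  for (const { value, at } of observations) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last.value === value) last.timestamps.push(at);
    else runs.push({ value, timestamps: [at] });
  }
  return runs;
}
```

For timestamps `[0, 16, 32, 82, 98]`, the gaps are 16, 16, 50, and 16 ms. The single 50 ms gap is about three 60 Hz frame budgets.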

Committed smoke and regression tests continue to own correctness coverage for pagination, tab paint, context resize, collapse state, and composer spacing.

Chrome traces

Set OPENCODE_PERFORMANCE_TRACE_DIR to emit a standard Chrome DevTools trace for every benchmark page automatically:

```sh
OPENCODE_PERFORMANCE_TRACE_DIR=/tmp/opencode-performance-traces \
bunx playwright test --config e2e/performance/playwright.config.ts \
  timeline/session-tab-switch-benchmark.spec.ts
```

The emitted file is standard Chrome trace JSON and can be loaded directly into the Chrome DevTools Performance panel. Trace capture mirrors Puppeteer's official tracing defaults and lifecycle: it uses Chrome's ReturnAsStream transfer mode and fails when Chromium reports trace data loss.

To inspect a trace from the command line without adding package scripts or dependencies, you can optionally run devtools-tracing:

```sh
bunx devtools-tracing stats <trace-path-from-BENCHMARK_PAGE>
```

INP analysis requires a trace with a supported navigation/interaction insight. Selector statistics require a trace captured with OPENCODE_PERFORMANCE_SELECTOR_TRACE=1.
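The ReturnAsStream capture described above maps onto DevTools Protocol calls. The sequence is Tracing.start with `transferMode: "ReturnAsStream"`, then Tracing.end, then a Tracing.tracingComplete event carrying `dataLossOccurred` and a stream handle, then IO.read until `eof`, and finally IO.close. The sketch below covers only the last step: it drains the stream and rejects on data loss. It uses a minimal injected `send` function so it runs without a browser. `readCompletedTrace` and `CdpSend` are hypothetical names, and the harness's own implementation is not shown in this README.

```typescript
// Minimal stand-in for a CDP session's send().
type CdpSend = (method: string, params?: Record<string, unknown>) => Promise<unknown>;

// Subset of the Tracing.tracingComplete event parameters.
interface TracingComplete {
  dataLossOccurred: boolean;
  stream?: string;
}

// Subset of the IO.read result.
interface IoReadResult {
  data: string;
  eof: boolean;
  base64Encoded?: boolean;
}

// Drains a ReturnAsStream trace and returns its text, failing on data loss.
async function readCompletedTrace(send: CdpSend, event: TracingComplete): Promise<string> {
  if (event.dataLossOccurred) throw new Error("Chromium reported trace data loss");
  if (event.stream === undefined) throw new Error("tracingComplete did not include a stream handle");
  const handle = event.stream;
  // A streaming decoder keeps multi-byte UTF-8 characters intact across chunks.
  const decoder = new TextDecoder();
  const chunks: string[] = [];
  try {
    for (;;) {
      const chunk = (await send("IO.read", { handle })) as IoReadResult;
      if (chunk.base64Encoded) {
        const bytes = Uint8Array.from(atob(chunk.data), (c) => c.charCodeAt(0));
        chunks.push(decoder.decode(bytes, { stream: true }));
      } else {
        chunks.push(chunk.data);
      }
      if (chunk.eof) break;
    }
  } finally {
    // Always release the browser-side stream, even if a read fails.
    await send("IO.close", { handle });
  }
  chunks.push(decoder.decode());
  return chunks.join("");
}
```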

The e2e/performance/playwright.uncapped.config.ts configuration disables Chromium frame-rate limiting for explicit uncapped diagnostics. Benchmarks of native, frame-rate-limited product behavior should use the default configuration, e2e/performance/playwright.config.ts.