docs/plans/2026-05-28-slate-browser-rich-text-replay-benchmark.md
Objective:
Add a browser-level rich-text replay coverage layer to the benchmarks/editor
Evidence Kit benchmark so Slate v2 and Slate are compared against the same
Chromium Playwright richtext, tables, inlines, and paste-html replay inventory.
Goal plan: docs/plans/2026-05-28-slate-browser-rich-text-replay-benchmark.md
Completion threshold:
The benchmark is complete when .tmp/slate-v2 can generate a row artifact from
the Slate v2 and Slate Playwright browser test corpus, benchmarks/editor
ingests it into the rich-text Evidence Kit result, rich-text.html exposes the
new rows without the old legacy-slate label, package checks pass, the served
route returns the regenerated data, and this plan passes the autogoal completion
check.
Verification surface:
- /Users/zbeyens/git/plate-2/.tmp/slate-v2/scripts/benchmarks/browser/rich-text-replay-coverage.mjs
- /Users/zbeyens/git/plate-2/.tmp/slate-v2/tmp/slate-browser-rich-text-replay-coverage-benchmark.json
- /Users/zbeyens/git/plate-2/benchmarks/editor/src/index.mjs
- /Users/zbeyens/git/plate-2/benchmarks/editor/benchmarks/render-rich-text-viewer.mjs
- http://127.0.0.1:8765/rich-text.html
- http://127.0.0.1:8765/rich-text-data.json

Constraints:
Boundaries:
- .tmp/slate-v2/playwright/integration/examples and /Users/zbeyens/git/slate/playwright/integration/examples.
- .tmp/slate-v2 benchmark scripts/package script, benchmarks/editor ingestion, viewer generation, generated benchmark data, generated perf docs, and this plan.
- rich-text.html and rich-text-data.json served on port 8765.

Blocked condition: Autonomous work would stop only if either local checkout could not list its Chromium Playwright tests, the Evidence Kit result could not ingest normalized rows, or the served route could not expose the regenerated data. None of those conditions occurred.
Major source:
.tmp/slate-v2 plus /Users/zbeyens/git/slate
Major lane:
.tmp/slate-v2 benchmark script and benchmarks/editor
Phase / pass table:
| Phase | Status | Evidence |
|---|---|---|
| Intake | complete | Existing Evidence Kit benchmark and Slate v2 browser test files inspected. |
| Artifact design | complete | Row artifact emits slate-v2:browser-replay and slate:browser-replay coverage rows. |
| Implementation | complete | Generator, package script, Evidence Kit ingestion, and viewer status mapping added. |
| Verification | complete | Generator, rich-text check, docs generation, docs check, package check, and served route smoke proof completed. |
| Closure | complete | This plan records evidence and passes check-complete. |
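
The coverage rows described under Artifact design can be pictured with a minimal sketch. It assumes one row per fixture and replay library over the union of both test inventories, marked ok when that library's corpus contains the fixture and coverage-gap otherwise. The function names, field names, and sample fixture ids are hypothetical and are not taken from the real generator.

```javascript
// Hypothetical sketch: the row shape and field names are assumptions,
// not the real generator's schema.
const LIBRARIES = ['slate-v2:browser-replay', 'slate:browser-replay'];

// Build one coverage row per (fixture, library) pair over the union of
// both test inventories.
function buildCoverageRows(slateV2Tests, slateTests) {
  const inventories = new Map([
    [LIBRARIES[0], new Set(slateV2Tests)],
    [LIBRARIES[1], new Set(slateTests)],
  ]);
  // Union of fixture ids, sorted so the output is stable between runs.
  const union = [...new Set([...slateV2Tests, ...slateTests])].sort();

  const rows = [];
  for (const fixture of union) {
    for (const [library, tests] of inventories) {
      rows.push({
        fixture,
        library,
        status: tests.has(fixture) ? 'ok' : 'coverage-gap',
      });
    }
  }
  return rows;
}

// Tally rows by their status field.
function countByStatus(rows) {
  const counts = {};
  for (const row of rows) {
    counts[row.status] = (counts[row.status] ?? 0) + 1;
  }
  return counts;
}

// Made-up fixture ids, for illustration only.
const rows = buildCoverageRows(
  ['richtext: bold', 'tables: tab', 'inlines: link'],
  ['richtext: bold', 'paste-html: paste']
);
console.log(rows.length, countByStatus(rows));
```

Under this reading, the recorded inventories (134 Slate v2 tests, 8 Slate tests, 136 union fixtures) would give 136 × 2 = 272 per-fixture rows, with 134 + 8 = 142 ok and 2 + 128 = 130 coverage-gap. That agrees with the recorded 272 replay coverage rows and 130 coverage-gap rows. The remaining 8 ok rows would then correspond to the 8 suite-coverage rows, though that last step is an inference.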
Start Gates:
| Gate | Applies | Evidence |
|---|---|---|
| major-task loaded | yes | Used for heavyweight benchmark comparison work. |
| Active goal checked or created | yes | Active autogoal created for the replay coverage benchmark objective. |
| Source of truth read before analysis | yes | Read Slate v2 and Slate Playwright example test corpus before shaping rows. |
| Major lane selected | yes | Benchmark implementation lane selected. |
| Decision criteria stated | yes | Completion threshold lists artifact, ingestion, viewer, check, and served-route criteria. |
| Existing repo patterns / prior decisions checked | yes | Reused benchmarks/editor Evidence Kit rows and generated perf docs flow. |
| Helper stack selected | yes | Local generator script plus Evidence Kit ingestion; no external research helper needed. |
| External research decision recorded | yes | N/A because local repo evidence was authoritative. |
| Implementation expectation recorded | yes | Implementation expected and completed. |
| Workspace authority selected | yes | plate-2 controls benchmark harness; .tmp/slate-v2 controls generator artifact. |
| Branch / PR expectation decided | yes | No commit, push, or PR requested. |
| Browser pack selected | yes | Browser route proof used for generated rich-text.html and data JSON. |
| Browser route / app surface identified | yes | http://127.0.0.1:8765/rich-text.html. |
| Browser tool decision recorded | yes | Browser MCP was not exposed by tool search; HTTP smoke proof used against the same served route. |
| Console/network caveat policy recorded | yes | Static page proof checks HTTP status and generated JSON content. |
Work Checklist:
Completion Gates:
| Gate | Applies | Evidence |
|---|---|---|
| Browser replay artifact generated | yes | Artifact has 280 rows: 150 ok, 130 coverage-gap. |
| Slate v2 and Slate source counts recorded | yes | Generator metadata recorded 134 Slate v2 tests, 8 Slate tests, 136 union fixtures. |
| Evidence Kit result regenerated | yes | rich-text-editors-latest.json has 852 rows. |
| Viewer data regenerated | yes | rich-text-data.json has replay coverage categories and no legacy-slate label. |
| Package checks green | yes | npm run check passed in benchmarks/editor. |
| Served route proof | yes | curl -I returned HTTP 200 for rich-text.html; JSON smoke returned row counts below. |
| Autogoal completion check | yes | check-complete run after this file update. |
Verification evidence:
- node --check .tmp/slate-v2/scripts/benchmarks/browser/rich-text-replay-coverage.mjs passed.
- bunx biome check package.json scripts/benchmarks/browser/rich-text-replay-coverage.mjs --fix passed in .tmp/slate-v2.
- bun run bench:browser:rich-text-replay-coverage:local generated .tmp/slate-v2/tmp/slate-browser-rich-text-replay-coverage-benchmark.json.
- Artifact status counts: { ok: 150, coverage-gap: 130 }.
- npm run bench:rich-text:check passed in benchmarks/editor and generated benchmarks/results/rich-text-editors-latest.json with 852 rows.
- npm run docs:perf passed.
- npm run docs:perf:check passed.
- npx biome check src/index.mjs benchmarks/render-rich-text-viewer.mjs --fix passed.
- npm run check passed in benchmarks/editor.
- curl -I --max-time 2 http://127.0.0.1:8765/rich-text.html returned HTTP 200.
- JSON smoke against the served rich-text-data.json returned rowCount: 852, slate-browser-rich-text-replay-coverage: 272, slate-browser-rich-text-replay-suite-coverage: 8, status counts { adapter-missing: 55, coverage-gap: 130, ok: 663, optional-missing-artifact: 2, over-budget: 2 }, replay libraries slate-v2:browser-replay and slate:browser-replay, and hasLegacySlateName: false.

Reboot status: Complete. The next useful layer is measured browser replay timing/trace rows for a selected shared subset, not more coverage inventory.
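
The JSON smoke figures recorded under Verification evidence could be reproduced with a small summary helper like the one below. This is a sketch only. It assumes the viewer data is an object with a rows array whose entries carry status, category, and library fields, which may not match the real rich-text-data.json schema. The helper names are hypothetical.

```javascript
// Hypothetical sketch: the `rows`, `status`, `category`, and `library`
// field names are assumptions about the viewer data shape.
function summarizeViewerData(data) {
  const statusCounts = {};
  const categoryCounts = {};
  const replayLibraries = new Set();
  for (const row of data.rows) {
    statusCounts[row.status] = (statusCounts[row.status] ?? 0) + 1;
    categoryCounts[row.category] = (categoryCounts[row.category] ?? 0) + 1;
    if (typeof row.library === 'string' && row.library.endsWith(':browser-replay')) {
      replayLibraries.add(row.library);
    }
  }
  return {
    rowCount: data.rows.length,
    statusCounts,
    categoryCounts,
    replayLibraries: [...replayLibraries].sort(),
    // True if the old label still appears anywhere in the serialized data.
    hasLegacySlateName: JSON.stringify(data).includes('legacy-slate'),
  };
}

// Fetch the served JSON and summarize it (global fetch needs Node 18+).
async function smoke(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return summarizeViewerData(await response.json());
}

// Example against inline sample data rather than the served route.
const sample = {
  rows: [
    { status: 'ok', category: 'slate-browser-rich-text-replay-coverage', library: 'slate-v2:browser-replay' },
    { status: 'coverage-gap', category: 'slate-browser-rich-text-replay-coverage', library: 'slate:browser-replay' },
    { status: 'ok', category: 'slate-browser-rich-text-replay-suite-coverage', library: 'slate:browser-replay' },
  ],
};
console.log(summarizeViewerData(sample));
```

Against the live route, the call would be smoke('http://127.0.0.1:8765/rich-text-data.json'), provided the server on port 8765 is running.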
Open risks: The replay artifact is meaningful as coverage and parity inventory only. It does not prove Slate v2 is faster in those browser scenarios until selected fixtures are executed with timing, trace, and repeat-count discipline.
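
As a rough illustration of the repeat-count discipline named in that risk, the sketch below runs warmup iterations that are discarded, then records a fixed number of timed repeats and reports the median, minimum, and maximum. Every name here is hypothetical. runFixture stands in for whatever would execute one browser replay, and nothing in this sketch exists in the current benchmark.

```javascript
// Hypothetical helpers: none of these names exist in the benchmark today.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Reduce raw millisecond samples to the figures a timing row might carry.
function summarizeSamples(samplesMs) {
  return {
    repeats: samplesMs.length,
    medianMs: median(samplesMs),
    minMs: Math.min(...samplesMs),
    maxMs: Math.max(...samplesMs),
  };
}

async function measureRepeated(runFixture, { warmup = 2, repeats = 10 } = {}) {
  // Warmup runs are executed but not recorded.
  for (let i = 0; i < warmup; i += 1) await runFixture();
  const samplesMs = [];
  for (let i = 0; i < repeats; i += 1) {
    const start = performance.now();
    await runFixture();
    samplesMs.push(performance.now() - start);
  }
  return summarizeSamples(samplesMs);
}

// Usage with a placeholder workload instead of a real browser replay.
measureRepeated(() => new Promise((resolve) => setTimeout(resolve, 5)), {
  warmup: 1,
  repeats: 5,
}).then((summary) => console.log(summary));
```

The median is used here because it is less sensitive than the mean to a single slow run. A real timing layer would still need trace capture and identical fixtures on both sides, which this sketch does not attempt.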
Current verdict: