slate browser rich text replay benchmark

docs/plans/2026-05-28-slate-browser-rich-text-replay-benchmark.md

Objective: Add a browser-level rich-text replay coverage layer to the benchmarks/editor Evidence Kit benchmark so Slate v2 and Slate are compared against the same Chromium Playwright richtext, tables, inlines, and paste-html replay inventory.

Goal plan: docs/plans/2026-05-28-slate-browser-rich-text-replay-benchmark.md

Completion threshold: The benchmark is complete when all of the following hold:

  • .tmp/slate-v2 can generate a row artifact from the Slate v2 and Slate Playwright browser test corpus.
  • benchmarks/editor ingests that artifact into the rich-text Evidence Kit result.
  • rich-text.html exposes the new rows without the old legacy-slate label.
  • Package checks pass.
  • The served route returns the regenerated data.
  • This plan passes the autogoal completion check.

Verification surface:

  • Generator: /Users/zbeyens/git/plate-2/.tmp/slate-v2/scripts/benchmarks/browser/rich-text-replay-coverage.mjs
  • Generated artifact: /Users/zbeyens/git/plate-2/.tmp/slate-v2/tmp/slate-browser-rich-text-replay-coverage-benchmark.json
  • Evidence Kit ingestion: /Users/zbeyens/git/plate-2/benchmarks/editor/src/index.mjs
  • Viewer generation: /Users/zbeyens/git/plate-2/benchmarks/editor/benchmarks/render-rich-text-viewer.mjs
  • Served page: http://127.0.0.1:8765/rich-text.html
  • Served data: http://127.0.0.1:8765/rich-text-data.json

Constraints:

  • Scope stays Slate v2 vs Slate for this layer.
  • The artifact records browser replay coverage and suite presence, not browser action timing.
  • Evidence comes from listed Chromium Playwright tests in local checkouts, not stale docs or manual fixture counts.
  • The existing Evidence Kit row contract stays the integration point.
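
The coverage-only constraint can be illustrated with a minimal sketch. It assumes each library contributes a list of fixture ids and that every fixture in the union yields one row per library, marked ok when that library lists the fixture and coverage-gap otherwise. The function name `buildCoverageRows`, the row fields, and the sample fixture ids are hypothetical and are not the actual Evidence Kit row contract.

```javascript
// Hypothetical sketch: field names and fixture ids are illustrative only.
function buildCoverageRows(inventories) {
  const libraries = Object.keys(inventories);
  // One Set per library for fast membership checks.
  const listed = Object.fromEntries(
    libraries.map((library) => [library, new Set(inventories[library])]),
  );
  // Union of every fixture id seen in any library, in stable order.
  const union = [
    ...new Set(libraries.flatMap((library) => inventories[library])),
  ].sort();

  const rows = [];
  for (const fixture of union) {
    for (const library of libraries) {
      rows.push({
        library: `${library}:browser-replay`,
        fixture,
        // Presence only: there is deliberately no duration or timing field.
        status: listed[library].has(fixture) ? 'ok' : 'coverage-gap',
      });
    }
  }
  return rows;
}

const rows = buildCoverageRows({
  'slate-v2': ['richtext/bold', 'tables/navigation', 'paste-html/basic'],
  slate: ['richtext/bold'],
});
console.log(rows.length, rows.filter((row) => row.status === 'ok').length);
```

Because the rows carry no timing field, a viewer that consumes them cannot accidentally present coverage as speed, which is the dominant risk this plan names.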

Boundaries:

  • Source of truth: .tmp/slate-v2/playwright/integration/examples and /Users/zbeyens/git/slate/playwright/integration/examples.
  • Allowed edit scope: .tmp/slate-v2 benchmark scripts/package script, benchmarks/editor ingestion, viewer generation, generated benchmark data, generated perf docs, and this plan.
  • External sources: N/A; local editor test corpora settle this step.
  • Browser surface: generated static rich-text.html and rich-text-data.json served on port 8765.
  • Tracker sync: N/A; no issue or PR requested.
  • Non-goals: ProseMirror, Lexical, Plate, and Tiptap runtime adapters; measured browser replay timing; full Playwright execution of every listed replay row.

Blocked condition: Autonomous work would stop only if either local checkout could not list its Chromium Playwright tests, the Evidence Kit result could not ingest normalized rows, or the served route could not expose the regenerated data. None of those conditions occurred.
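
The first blocked condition depends on each checkout being able to list its Chromium Playwright tests. One way to turn such a listing into an inventory is sketched below. It assumes the text output of `npx playwright test --list --project=chromium`, where each test line has the form `[chromium] › file:line:column › title`. The function name `parseListedTests` and the sample file names are hypothetical, the output format may differ between Playwright versions, and the real generator may collect its inventory differently.

```javascript
// Hypothetical parser for the plain-text output of `playwright test --list`.
// Assumption: test lines look like
//   "  [chromium] › richtext.test.ts:5:3 › rich text example › renders rich text"
function parseListedTests(listOutput, project = 'chromium') {
  const tests = [];
  for (const line of listOutput.split('\n')) {
    const parts = line.trim().split(' › ');
    // Skip the header, the "Total:" footer, and tests from other projects.
    if (parts.length < 3 || parts[0] !== `[${project}]`) continue;
    const file = parts[1].replace(/:\d+:\d+$/, ''); // drop ":line:column"
    tests.push({ file, title: parts.slice(2).join(' › ') });
  }
  return tests;
}

const sampleOutput = [
  'Listing tests:',
  '  [chromium] › richtext.test.ts:5:3 › rich text example › renders rich text',
  '  [chromium] › tables.test.ts:4:3 › table example › table tag rendered',
  '  [firefox] › richtext.test.ts:5:3 › rich text example › renders rich text',
  'Total: 3 tests in 2 files',
].join('\n');

const listed = parseListedTests(sampleOutput);
console.log(listed.length, listed.map((test) => test.file));
```

An empty result from a parser like this would correspond to the "could not list its Chromium Playwright tests" stop condition.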

Major source:

  • type: local benchmark and browser test corpus
  • id / link: .tmp/slate-v2 plus /Users/zbeyens/git/slate
  • title: Slate v2 vs Slate browser rich-text replay coverage
  • decision to make: whether the rich-text benchmark covers real browser editing scenarios beyond synthetic/core rows
  • decision criteria: shared row artifact, Slate v2 and Slate labels, coverage gaps explicit, generated viewer data present, check commands green

Major lane:

  • lane: benchmark implementation
  • output type: Evidence Kit rows plus static viewer data
  • implementation expected: yes
  • affected packages / surfaces: .tmp/slate-v2 benchmark script and benchmarks/editor
  • dominant risk: mistaking coverage inventory for measured runtime speed

Phase / pass table:

| Phase | Status | Evidence |
| --- | --- | --- |
| Intake | complete | Existing Evidence Kit benchmark and Slate v2 browser test files inspected. |
| Artifact design | complete | Row artifact emits slate-v2:browser-replay and slate:browser-replay coverage rows. |
| Implementation | complete | Generator, package script, Evidence Kit ingestion, and viewer status mapping added. |
| Verification | complete | Generator, rich-text check, docs generation, docs check, package check, and served route smoke proof completed. |
| Closure | complete | This plan records evidence and passes check-complete. |

Start Gates:

| Gate | Applies | Evidence |
| --- | --- | --- |
| major-task loaded | yes | Used for heavyweight benchmark comparison work. |
| Active goal checked or created | yes | Active autogoal created for the replay coverage benchmark objective. |
| Source of truth read before analysis | yes | Read Slate v2 and Slate Playwright example test corpus before shaping rows. |
| Major lane selected | yes | Benchmark implementation lane selected. |
| Decision criteria stated | yes | Completion threshold lists artifact, ingestion, viewer, check, and served-route criteria. |
| Existing repo patterns / prior decisions checked | yes | Reused benchmarks/editor Evidence Kit rows and generated perf docs flow. |
| Helper stack selected | yes | Local generator script plus Evidence Kit ingestion; no external research helper needed. |
| External research decision recorded | yes | N/A because local repo evidence was authoritative. |
| Implementation expectation recorded | yes | Implementation expected and completed. |
| Workspace authority selected | yes | plate-2 controls benchmark harness; .tmp/slate-v2 controls generator artifact. |
| Branch / PR expectation decided | yes | No commit, push, or PR requested. |
| Browser pack selected | yes | Browser route proof used for generated rich-text.html and data JSON. |
| Browser route / app surface identified | yes | http://127.0.0.1:8765/rich-text.html. |
| Browser tool decision recorded | yes | Browser MCP was not exposed by tool search; HTTP smoke proof used against the same served route. |
| Console/network caveat policy recorded | yes | Static page proof checks HTTP status and generated JSON content. |

Work Checklist:

  • Objective includes outcome, completion threshold, verification surface, constraints, boundaries, and blocked condition.
  • Major source records source type, id/link, title, decision type, expected outcome, decision criteria, likely files/packages/surfaces, browser surface, and highest-leverage owner.
  • Current state is mapped before proposing a new architecture, migration, benchmark, or plan.
  • Existing repo patterns, prior decisions, and nearby implementation constraints are recorded before external research.
  • External docs or source are used only where repo evidence does not settle the question, or N/A reason is recorded.
  • Options, recommendation, tradeoffs, blast radius, and rejection reasons are recorded.
  • Implementation touched only the benchmark generator, Evidence Kit ingestion, viewer generation, generated artifacts, and this plan.
  • Verification records commands and served-route output.
  • Remaining caveats are recorded without pretending coverage rows are timing rows.

Completion Gates:

| Gate | Applies | Evidence |
| --- | --- | --- |
| Browser replay artifact generated | yes | Artifact has 280 rows: 150 ok, 130 coverage-gap. |
| Slate v2 and Slate source counts recorded | yes | Generator metadata recorded 134 Slate v2 tests, 8 Slate tests, 136 union fixtures. |
| Evidence Kit result regenerated | yes | rich-text-editors-latest.json has 852 rows. |
| Viewer data regenerated | yes | rich-text-data.json has replay coverage categories and no legacy-slate label. |
| Package checks green | yes | npm run check passed in benchmarks/editor. |
| Served route proof | yes | curl -I returned HTTP 200 for rich-text.html; JSON smoke returned row counts below. |
| Autogoal completion check | yes | check-complete run after this file update. |
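
The gate counts can be cross-checked with simple arithmetic. The sketch below uses only numbers reported in this plan (134, 8, 136, the 272/8 split from the served JSON, and the 150/130 status counts). It assumes one fixture row per union fixture per library, one ok row per listed test, and that all 8 suite-coverage rows are ok; those assumptions are inferred from the totals and are not stated by the generator itself.

```javascript
// Reported in this plan.
const slateV2Tests = 134;
const slateTests = 8;
const unionFixtures = 136;
const suiteRows = 8;
const libraries = 2; // slate-v2:browser-replay and slate:browser-replay

// Derived under the stated assumptions.
const sharedFixtures = slateV2Tests + slateTests - unionFixtures; // 6
const fixtureRows = unionFixtures * libraries; // 272
const fixtureOk = slateV2Tests + slateTests; // 142
const coverageGap = fixtureRows - fixtureOk; // 130
const totalRows = fixtureRows + suiteRows; // 280
const totalOk = fixtureOk + suiteRows; // 150, assuming every suite row is ok

console.log({ sharedFixtures, fixtureRows, coverageGap, totalRows, totalOk });
```

Under these assumptions the derived values match the reported 280 rows, 150 ok, and 130 coverage-gap, and they imply that only 6 fixtures are shared between the two corpora.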

Verification evidence:

  • node --check .tmp/slate-v2/scripts/benchmarks/browser/rich-text-replay-coverage.mjs passed.
  • bunx biome check package.json scripts/benchmarks/browser/rich-text-replay-coverage.mjs --fix passed in .tmp/slate-v2.
  • bun run bench:browser:rich-text-replay-coverage:local generated .tmp/slate-v2/tmp/slate-browser-rich-text-replay-coverage-benchmark.json.
  • Artifact evidence: 280 rows, status counts { ok: 150, coverage-gap: 130 }.
  • Source-count evidence: 134 Slate v2 Chromium listed tests, 8 Slate Chromium listed tests, 136 union browser replay fixtures.
  • npm run bench:rich-text:check passed in benchmarks/editor and generated benchmarks/results/rich-text-editors-latest.json with 852 rows.
  • npm run docs:perf passed.
  • npm run docs:perf:check passed.
  • npx biome check src/index.mjs benchmarks/render-rich-text-viewer.mjs --fix passed.
  • npm run check passed in benchmarks/editor.
  • curl -I --max-time 2 http://127.0.0.1:8765/rich-text.html returned HTTP 200.
  • Served JSON smoke proof returned rowCount: 852, slate-browser-rich-text-replay-coverage: 272, slate-browser-rich-text-replay-suite-coverage: 8, status counts { adapter-missing: 55, coverage-gap: 130, ok: 663, optional-missing-artifact: 2, over-budget: 2 }, replay libraries slate-v2:browser-replay and slate:browser-replay, and hasLegacySlateName: false.
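
A served JSON smoke proof of this kind could look like the following sketch. It assumes the served data exposes a top-level `rows` array whose entries have `status` and `library` fields; that shape, and the names `summarizeRows` and `smokeServedData`, are hypothetical and are not taken from the actual viewer data. The global `fetch` assumes Node.js 18 or later.

```javascript
// Hypothetical summary of benchmark rows; the row shape is an assumption.
function summarizeRows(rows) {
  const statusCounts = {};
  for (const row of rows) {
    statusCounts[row.status] = (statusCounts[row.status] ?? 0) + 1;
  }
  return {
    rowCount: rows.length,
    statusCounts,
    // True if any row still carries the old label.
    hasLegacySlateName: rows.some((row) =>
      String(row.library).includes('legacy-slate'),
    ),
  };
}

// Fetches the served data and summarizes it. Not invoked here because it
// needs the static server from this plan to be running on port 8765.
async function smokeServedData(url = 'http://127.0.0.1:8765/rich-text-data.json') {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  const data = await response.json();
  return summarizeRows(data.rows); // assumes a top-level "rows" array
}

const summary = summarizeRows([
  { library: 'slate-v2:browser-replay', status: 'ok' },
  { library: 'slate:browser-replay', status: 'coverage-gap' },
  { library: 'slate-v2:browser-replay', status: 'ok' },
]);
console.log(summary);
```

Keeping the summary function separate from the fetch makes the counting logic checkable against a small in-memory sample before pointing it at the served route.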

Reboot status: Complete. The next useful layer is measured browser replay timing/trace rows for a selected shared subset, not more coverage inventory.

Open risks: The replay artifact is meaningful as coverage and parity inventory only. It does not prove Slate v2 is faster in those browser scenarios until selected fixtures are executed with timing, trace, and repeat-count discipline.

Current verdict:

  • verdict: complete
  • confidence: high
  • next owner: benchmark follow-up
  • reason: Slate v2 vs Slate now has explicit browser replay coverage in the generated Evidence Kit benchmark and ugly table viewer.