Cut editor benchmark scope to Slate only

docs/plans/2026-05-28-cut-editor-benchmark-scope-to-slate-only.md

Objective: Cut the active Evidence Kit rich-text benchmark scope to Slate v2 vs Slate only. Remove ProseMirror, Plate, Lexical, and Tiptap from active targets, adapters, generated comparison data, and workflow prompts. Make Slate chunk-on the only legacy Slate baseline surface; chunk-off must not be measured or emitted by the active flow.

Completion threshold:

  • rich-text.html and rich-text-data.json contain only Slate v2 and Slate libraries.
  • The active rich-text result contains no ProseMirror, Lexical, Tiptap, chunk-off, legacyChunkOff, runtime-adapter, or non-Slate adapter rows.
  • .tmp/slate-v2 Slate browser trace and legacy-compare runners default to chunk-on only for legacy Slate.
  • Benchmark package scripts, lockfile, research sources, registry, notes, and generated docs match the Slate-only scope.
  • The benchmark package full check passes.
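
The forbidden-term part of this threshold can be checked mechanically. The following is a minimal sketch, not the audit the package actually runs: it assumes the parsed rich-text-data.json is available as a plain object, and the function name `findForbiddenTerms` and the sample row shape are hypothetical. The term list is copied from the threshold above.

```javascript
// Terms taken from the completion threshold above; matching is case-insensitive.
const FORBIDDEN_TERMS = [
  "ProseMirror",
  "Lexical",
  "Tiptap",
  "chunk-off",
  "legacyChunkOff",
  "runtime-adapter",
];

// Returns the forbidden terms that occur anywhere in the serialized data.
// `data` is any JSON-serializable value, e.g. the parsed rich-text-data.json.
function findForbiddenTerms(data) {
  const haystack = JSON.stringify(data).toLowerCase();
  return FORBIDDEN_TERMS.filter((term) => haystack.includes(term.toLowerCase()));
}

// Hypothetical sample; the real row shape may differ.
const sample = {
  rows: [{ library: "slate-v2" }, { library: "slate", surface: "legacyChunkOn" }],
};
console.log(findForbiddenTerms(sample)); // []
```

Scanning the serialized string rather than individual fields is deliberately blunt: it also catches a forbidden term hiding in a label, note, or command string. "Plate" is left out of the list because a plain substring match would also hit unrelated words and repository paths.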

Verification surface:

  • npm install in /Users/zbeyens/git/plate-2/benchmarks/editor
  • npm run research:editor-frameworks:fetch
  • REACT_HUGE_COMPARE_LEGACY_REPO=../../../slate bun run bench:react:huge-document:legacy-compare:local
  • SLATE_LEGACY_BROWSER_TRACE_SURFACES=legacyChunkOn bun run bench:react:huge-document:slate-browser-trace:local
  • npm run check in /Users/zbeyens/git/plate-2/benchmarks/editor
  • Served JSON fetch from http://127.0.0.1:8765/rich-text-data.json

Constraints:

  • Do not restore the old benchmark app zoo.
  • Do not add non-Slate adapters.
  • Do not create a PR or commit.
  • Keep Evidence Kit as the active workflow.

Boundaries:

  • Source of truth: user request in this thread plus benchmarks/editor/research/benchmark-registry.json.
  • Allowed edit scope: benchmarks/editor/** and the two Slate v2 benchmark runner scripts that produced chunk-off artifacts.
  • Browser surface: local static docs served at http://127.0.0.1:8765.
  • Tracker sync: N/A; no external tracker.
  • Non-goals: adding ProseMirror, Plate, Lexical, Tiptap, or chunk-off rows.

Blocked condition: Blocked only if the local Slate checkout at /Users/zbeyens/git/slate or the static docs server becomes unavailable. Both were available.

Task state:

  • task_type: benchmark workflow hard cut
  • task_complexity: medium
  • current_phase: closeout
  • current_phase_status: complete
  • goal_status: ready for completion

Current verdict:

  • verdict: complete
  • confidence: high
  • next owner: user
  • reason: active comparison output and runner commands are Slate-only.

Start Gates:

| Gate | Applies | Evidence |
| --- | --- | --- |
| Skill analysis before edits | yes | Used active autogoal plan and repo instructions for measurable benchmark workflow work. |
| Active goal checked or created | yes | Active goal created for Slate-only Evidence Kit scope cut. |
| Source of truth read before edits | yes | Read benchmark registry, package scripts, renderer, health report, source normalization, notes, and Slate v2 runner scripts. |
| Tracker comments and attachments read | no | N/A: thread-only task. |
| Video transcript evidence required | no | N/A: no video. |
| Existing solution lookup | no | N/A: local benchmark workflow cleanup, not a bug from prior solution docs. |
| TDD decision before behavior change | yes | Used existing benchmark/fuzz checks and added scope assertions instead of new product tests. |
| Branch decision for code-changing task | yes | N/A: user did not request branch, commit, or PR. |
| Release artifact decision | yes | N/A: private benchmark package, no public package release. |
| Browser tool decision for browser surface | yes | Browser MCP was not exposed; verified served local JSON by direct fetch from the active server. |
| PR expectation decision | yes | N/A: no PR requested. |
| Tracker sync expectation decision | yes | N/A: no tracker requested. |

Work Checklist:

  • Objective includes outcome, completion threshold, verification surface, constraints, boundaries, and blocked condition.
  • Task source classified as thread request for benchmark workflow scope cut.
  • Video evidence marked N/A because none was supplied.
  • Repo instructions and benchmark implementation patterns read before edits.
  • Implementation fixed the active registry, scripts, generators, and runner ownership boundaries.
  • Release artifact requirement recorded as N/A for private benchmark lab.
  • Final handoff shape decided: concise outcome, verification, and reload URL.
  • Branch handling recorded as N/A because no branch or PR was requested.
  • Local-env-rot retry policy recorded as N/A; no install corruption signal.
  • Workspace authority recorded through /benchmarks/editor and .tmp/slate-v2 commands.
  • High-risk command-contract note recorded: legacy Slate runner now measures chunk-on only.
  • Review target selected: benchmark package full check plus scope audit.
  • Agent-native review marked N/A; no agent skills or rules changed.

Completion Gates:

| Gate | Applies | Required action | Evidence |
| --- | --- | --- | --- |
| Named verification threshold | yes | Run named benchmark checks and served JSON audit | npm run check passed; served rich-text-data.json had only Slate libraries and no forbidden terms. |
| Bug reproduced before fix | no | N/A | Scope cleanup, not bug reproduction. |
| Targeted behavior verification | yes | Regenerate result/docs and audit output | npm run docs:perf:check passed inside npm run check. |
| TypeScript or typed config changed | no | N/A | JavaScript/JSON/docs only. |
| Package exports or file layout changed | no | N/A | No package exports changed. |
| Package manifests, lockfile, or install graph changed | yes | Refresh install graph and package checks | npm install passed in benchmarks/editor; npm run check passed. |
| Agent rules or skills changed | no | N/A | No .agents files changed. |
| Workspace authority proof | yes | Run commands in owning workspaces | .tmp/slate-v2 runner commands passed; benchmarks/editor check passed. |
| Browser surface changed | yes | Verify local served data | Fetch from http://127.0.0.1:8765/rich-text-data.json passed. |
| Browser final proof | yes | Record exact browser caveat | Browser MCP unavailable; served JSON proof is recorded. |
| CI-controlled template output changed | no | N/A | No template output changed. |
| Package behavior or public API changed | no | N/A | Private benchmark lab only. |
| Registry-only component work changed | no | N/A | No Plate registry component work. |
| Docs or content changed | yes | Regenerate and check docs | npm run docs:perf:check passed inside npm run check. |
| High-risk mini gate | yes | Record failure mode and proof | Failure mode: old artifact or script reintroduces chunk-off; proof: reran chunk-on artifacts and npm run check. |
| Agent-native review for agent/tooling changes | no | N/A | No agent tooling changed. |
| Local install corruption suspected | no | N/A | No corruption signal. |
| Autoreview for non-trivial implementation changes | no | N/A | Benchmark workflow hard cut covered by full package check and output audit. |
| PR create or update | no | N/A | No PR requested. |
| PR proof image hosting | no | N/A | No PR. |
| Tracker sync-back | no | N/A | No tracker. |
| Final handoff contract | yes | Fill final result and caveat | Done below. |
| Final lint | yes | Run scoped lint/format | pnpm exec biome check ... --write passed with no fixes. |
| Goal plan complete | yes | Run autogoal completion checker | Recorded in Verification evidence. |

Phase / pass table:

| Phase | Status | Evidence | Next |
| --- | --- | --- | --- |
| Intake and source read | complete | Registry, renderer, package scripts, notes, and Slate runners read | done |
| Implementation | complete | Non-Slate adapters/results removed; Slate-only registry and scripts patched | done |
| Verification | complete | npm run check and served JSON audit passed | done |
| PR / tracker sync | skipped | N/A: no PR or tracker requested | done |
| Closeout | complete | Plan completed and ready for autogoal close | done |

Findings:

  • Active rich-text comparison now has 463 served rows across 11 groups.
  • Served comparison libraries are slate, slate-v2, slate-v2:*, and slate:* only.
  • Health next actions are refresh/optional-artifact cleanup only; no non-Slate adapter action remains.
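
The library finding above amounts to an allowlist over four name patterns. As a rough sketch of how such an audit could be expressed, assuming the served JSON can be reduced to groups that each hold rows with a `library` field (that shape, the helper names, and the sample labels are hypothetical, not taken from the real file):

```javascript
// Accepts "slate", "slate-v2", and any "slate:*" or "slate-v2:*" variant.
const SLATE_LIBRARY_PATTERN = /^slate(-v2)?(:.+)?$/;

function isSlateLibrary(library) {
  return SLATE_LIBRARY_PATTERN.test(library);
}

// Counts groups and rows and lists every distinct library name that is
// outside the Slate-only allowlist.
function summarizeGroups(groups) {
  const rows = groups.flatMap((group) => group.rows);
  const offenders = [
    ...new Set(rows.map((row) => row.library).filter((lib) => !isSlateLibrary(lib))),
  ];
  return { groupCount: groups.length, rowCount: rows.length, offenders };
}

// Hypothetical sample data; group names and variant suffixes are made up.
const sampleGroups = [
  { name: "typing", rows: [{ library: "slate-v2" }, { library: "slate" }] },
  { name: "paste", rows: [{ library: "slate-v2:react" }, { library: "slate:chunk-on" }] },
];
console.log(summarizeGroups(sampleGroups));
// { groupCount: 2, rowCount: 4, offenders: [] }
```

Run against the real served data, a summary of this kind would be expected to report 463 rows, 11 groups, and an empty offender list.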

Decisions and tradeoffs:

  • Kept Slate v2-only diagnostics on the internals page.
  • Renamed normalized import-fixture labels that looked like a non-Slate editor target to neutral external-editor wording.
  • Kept Evidence Kit package identity unchanged because the scope cut is about benchmark targets, not the private package name.

Implementation notes:

  • Removed adapter scripts, generated adapter JSON, adapter deps, adapter package scripts, and stale research data.
  • legacy-slate stays renamed to slate in the active output.
  • .tmp/slate-v2 legacy compare and browser trace no longer run chunk-off.

Review fixes:

  • Fixed slate-v2-legacy-benchmark.mjs to accept plain slate as the legacy library.
  • Fixed readiness target count from six/four-editor assumptions to two active targets.
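
Both review fixes come down to treating the active target list as exactly two entries and accepting either spelling of the legacy library. A minimal sketch of that idea follows; the constant and function names are hypothetical and do not reflect the contents of slate-v2-legacy-benchmark.mjs.

```javascript
// The two active benchmark targets after the scope cut.
const ACTIVE_TARGETS = ["slate-v2", "slate"];

// Names accepted for the legacy baseline; both map to the active label "slate".
const LEGACY_LIBRARY_ALIASES = new Set(["slate", "legacy-slate"]);

// Maps an accepted legacy library name to its active output label.
function normalizeLegacyLibrary(name) {
  if (!LEGACY_LIBRARY_ALIASES.has(name)) {
    throw new Error(`Not a legacy Slate library: ${name}`);
  }
  return "slate";
}

console.log(normalizeLegacyLibrary("legacy-slate")); // slate
console.log(ACTIVE_TARGETS.length); // 2
```

Deriving the readiness count from the target list, rather than hard-coding a number, avoids the stale six- or four-editor assumption the fix removed.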

Error attempts:

| Error / failed attempt | Count | Next different move | Resolution |
| --- | --- | --- | --- |
| Legacy compare first used .tmp/slate default path | 1 | Rerun with REACT_HUGE_COMPARE_LEGACY_REPO=../../../slate | Passed and registry command updated. |

Verification evidence:

  • /Users/zbeyens/git/plate-2/benchmarks/editor: npm install passed and removed 22 out-of-scope packages.
  • /Users/zbeyens/git/plate-2/benchmarks/editor: npm run research:editor-frameworks:fetch passed with only slate-v2-package and slate-package.
  • /Users/zbeyens/git/plate-2/.tmp/slate-v2: REACT_HUGE_COMPARE_LEGACY_REPO=../../../slate bun run bench:react:huge-document:legacy-compare:local passed and wrote chunk-on-only legacy compare output.
  • /Users/zbeyens/git/plate-2/.tmp/slate-v2: SLATE_LEGACY_BROWSER_TRACE_SURFACES=legacyChunkOn bun run bench:react:huge-document:slate-browser-trace:local passed and wrote surfaces-legacyChunkOn.
  • /Users/zbeyens/git/plate-2/benchmarks/editor: npm run check passed.
  • /Users/zbeyens/git/plate-2: scoped pnpm exec biome check ... --write passed with no fixes.
  • Served http://127.0.0.1:8765/rich-text-data.json returned 463 rows, 11 groups, Slate-only libraries, and no forbidden scope terms.
  • node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/2026-05-28-cut-editor-benchmark-scope-to-slate-only.md is the final mechanical completion gate.

Final handoff contract:

  • PR line: N/A, no PR requested.
  • Issue / tracker line: N/A, no tracker.
  • Confidence line: high; active output, scripts, docs, and package check agree.
  • Flow table:
    • Reproduced: stale non-Slate/adapter/chunk-off rows existed in generated data.
    • Verified: regenerated benchmark docs and served JSON show Slate-only output.
  • Browser check: direct fetch from the active local server passed; Browser MCP was not exposed in this turn.
  • Outcome: Slate v2 vs Slate only; Slate baseline is chunk-on.
  • Caveat: health still reports stale Slate v2 artifacts and optional missing Slate v2 internals; those are not non-Slate scope issues.
  • Design:
    • Chosen boundary: Evidence Kit registry/generators plus Slate-owned runner scripts.
    • Why not quick patch: hiding columns would leave scripts/results able to resurrect old rows.
    • Why not broader change: non-Slate adapters are explicitly out of scope.
  • Verified: npm run check and served JSON audit passed.

Final handoff / sync:

  • PR: N/A.
  • Issue / tracker: N/A.
  • Browser proof: served JSON audit from http://127.0.0.1:8765/rich-text-data.json.
  • Caveats: Browser MCP unavailable; direct local fetch used.

Timeline:

  • 2026-05-28T20:28:58.118Z Goal plan created.
  • 2026-05-28T21:03Z Slate-only benchmark artifacts regenerated.
  • 2026-05-28T21:12Z Full benchmark package check passed.

Reboot status:

| Question | Answer |
| --- | --- |
| Where am I? | Closeout complete. |
| Where am I going? | Final response after autogoal checker and goal close. |
| What is the goal? | Slate v2 vs Slate only, with Slate chunk-on as the only legacy baseline. |
| What have I learned? | Old adapters were removed; health now points only to Slate refresh/optional cleanup. |
| What have I done? | Patched registry, scripts, docs, generated output, and verification. |

Open risks:

  • None for the scope cut. Remaining health actions are Slate-only refresh and optional-artifact decisions.