Back to Plate

refresh react active typing benchmark

docs/plans/2026-05-28-refresh-react-active-typing-benchmark.md

53.0.916.8 KB
Original Source

refresh react active typing benchmark

Objective: Refresh the active Evidence Kit react-active-typing-breakdown artifact from .tmp/slate-v2, regenerate the rich-text benchmark health/docs, and verify the dashboard remains scoped to slate and slate-v2.

Goal plan: docs/plans/2026-05-28-refresh-react-active-typing-benchmark.md

Task source:

  • type: benchmark health next action
  • id / link: refresh-react-active-typing-breakdown
  • title: Refresh react-active-typing-breakdown
  • acceptance criteria: the artifact at .tmp/slate-v2/tmp/slate-react-active-typing-breakdown-benchmark.json is fresh, benchmarks/editor/benchmarks/results/benchmark-health-latest.json no longer lists the active-typing refresh action, generated rich-text docs check clean, and served rich-text-data.json contains no non-Slate runtimes.

Completion threshold:

  • The Slate v2 benchmark command bun run bench:react:active-typing:local succeeds from .tmp/slate-v2.
  • Evidence Kit npm run check succeeds from benchmarks/editor.
  • Health next actions no longer include refresh-react-active-typing-breakdown.
  • Served http://127.0.0.1:8765/rich-text-data.json contains only Slate/Slate-v2 libraries.
  • .tmp/slate-v2 owner checks are run or the unrelated blocker is recorded with exact evidence.
  • node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/2026-05-28-refresh-react-active-typing-benchmark.md passes.

Verification surface:

  • .tmp/slate-v2: bun run bench:react:active-typing:local
  • .tmp/slate-v2: bun check
  • benchmarks/editor: npm run check
  • repo root: health-next-action JSON audit
  • repo root: live rich-text-data.json Slate-only fetch audit
  • repo root: autogoal completion checker

Constraints:

  • Keep current rich-text scope to slate and slate-v2.
  • Do not add Plate, ProseMirror, Lexical, or TipTap back into active output.
  • Do not repair unrelated Slate v2 contract inventory failures in this benchmark refresh.
  • Do not create commits, PRs, pushes, or tracker comments.

Boundaries:

  • Source of truth: Evidence Kit registry and health output under benchmarks/editor, plus the Slate v2 benchmark artifact under .tmp/slate-v2/tmp.
  • Allowed edit scope: the Slate v2 active-typing benchmark command/script, generated Evidence Kit outputs, and this goal plan.
  • Browser surface: static benchmark docs served at http://127.0.0.1:8765/.
  • Tracker sync: not applicable, no external tracker requested.
  • Non-goals: no non-Slate runtime adapters, no chunk-off lane, no unrelated Slate v2 contract-test repairs.

Blocked condition:

  • The only blocker would be the Slate v2 active-typing benchmark itself failing after adapting to the current Slate v2 React/editor APIs, or Evidence Kit refusing the refreshed artifact. Neither blocker remains.

Task state:

  • task_type: benchmark-refresh
  • task_complexity: focused
  • current_phase: closeout
  • current_phase_status: complete
  • next_phase: final-response
  • goal_status: active-until-autogoal-close

Current verdict:

  • verdict: complete after autogoal checker
  • confidence: high
  • next owner: user
  • reason: refreshed artifact and generated docs are verified; only unrelated .tmp/slate-v2 contract inventory tests fail after lint/typecheck.

Start Gates:

GateAppliesEvidence
Skill analysis before editsyesUsed autogoal because the task had a measurable benchmark health threshold.
Active goal checked or createdyesActive goal objective tracks artifact refresh, health/docs regeneration, and Slate-only dashboard verification.
Source of truth read before editsyesRead benchmark registry references, health output, Evidence Kit package scripts, and active-typing benchmark script.
Tracker comments and attachments readnoNo tracker or external issue was part of this request.
Video transcript evidence requirednoNo video/screen recording supplied or needed.
docs/solutions checked for non-trivial existing-code worknoFocused benchmark command refresh; source of truth was local Evidence Kit and Slate v2 benchmark code.
TDD decision before behavior change or bug fixyesNo new test added; this is benchmark harness repair verified by the target benchmark and generated health/docs checks.
Branch decision for code-changing taskyesNo branch action; user asked for local execution only and repo rules forbid git actions without request.
Release artifact decisionyesNo package release artifact; changed benchmark script/package script only.
Browser tool decision for browser surfaceyesBrowser MCP was unavailable after tool discovery; live HTTP JSON fetch verified served dashboard data directly.
PR expectation decisionyesNo PR requested.
Tracker sync expectation decisionyesNo tracker sync requested.

Work Checklist:

  • Objective includes outcome, completion threshold, verification surface, constraints, boundaries, and blocked condition.
  • Task source classified with source type, id/link, title, task type, acceptance criteria, caveats, likely files/routes/packages, browser surface, and root-cause layer.
  • Required video or screen-recording evidence is cached/read as normalized XML, or marked not applicable with reason.
  • Nearby repo instructions and implementation patterns read before edits.
  • Implementation fixes the right ownership boundary, or the narrower choice is recorded with reason.
  • Release artifact requirement recorded: no changeset or registry changelog applies to benchmark harness refresh.
  • Final handoff shape decided: concise benchmark refresh summary with checks and caveat.
  • Branch handling recorded for code-changing work: no branch action because none was requested.
  • Local-env-rot retry policy recorded for surprising repo-wide failure: not local env rot; bun check failures are deterministic contract inventory/version assertions.
  • Workspace authority recorded: proof commands name .tmp/slate-v2, benchmarks/editor, or repo root as owner.
  • High-risk note recorded for command-contract/runtime changes: active benchmark script needed current Slate v2 React/editor API adaptation.
  • Review/autoreview target selected from actual diff state: not run; focused benchmark harness repair with target checks.
  • Agent-native review decision recorded: not applicable, no agent tooling changed.

Completion Gates:

GateAppliesRequired actionEvidence
Named verification thresholdyesRun benchmark, Evidence Kit, health, and Slate-only auditsbun run bench:react:active-typing:local, npm run check, health audit, and live JSON audit completed.
Bug reproduced before fixyesRecord failing command before fixInitial .tmp/slate-v2 run failed because bench:react:active-typing:local was missing. After adding it, the script failed on stale public Editor import and old React provider shape.
Targeted behavior verificationyesRun focused benchmark command.tmp/slate-v2: bun run bench:react:active-typing:local wrote fresh artifact with 5000 blocks, 3 iterations, 10 type ops.
TypeScript or typed config changedyesRun relevant owner check.tmp/slate-v2: bun check completed lint and typecheck before unrelated vitest contract failures.
Package exports or file layout changednoNo barrel actionNo exported package file layout changed.
Package manifests, lockfile, or install graph changedyesVerify manifest-only script edit.tmp/slate-v2/package.json script added; no dependency or lockfile change, so no install needed.
Agent rules or skills changednoNo skill syncNo .agents source edited.
Workspace authority proofyesRun commands in owning directoriesBenchmark ran in .tmp/slate-v2; Evidence Kit suite ran in benchmarks/editor; served JSON audit ran from repo root against 127.0.0.1:8765.
Browser surface changedyesVerify served data or browser caveatBrowser MCP unavailable; direct fetch of served rich-text-data.json verified row count 463, groups 11, no forbidden non-Slate terms.
Browser final proofyesRecord exact browser verification caveatHTTP data proof covers the generated dashboard data; no screenshot captured because Browser MCP tool was not exposed.
CI-controlled template output changednoNo template outputNo templates/** changed.
Package behavior or public API changednoNo changesetBenchmark harness only; no package behavior/API release change.
Registry-only component work changednoNo component changelogNot registry component work.
Docs or content changedyesVerify generated docsbenchmarks/editor: npm run check ran docs:perf, docs:rich-text:check, and docs:index:check successfully.
High-risk mini gateyesRecord failure mode and proofFailure mode was stale benchmark harness silently aging out. Proof is fresh artifact plus health next-action removal.
Agent-native review for agent/tooling changesnoNo agent tooling reviewNo agent/tooling files changed.
Local install corruption suspectednoNo reinstallThe failed .tmp/slate-v2 tests are explicit inventory/version assertion mismatches, not install corruption signals.
Autoreview for non-trivial implementation changesnoNo autoreviewFocused benchmark harness repair; target checks provide sufficient proof.
PR create or updatenoNo PRUser did not ask for PR.
PR proof image hostingnoNo PR bodyNot applicable.
Tracker sync-backnoNo trackerNot applicable.
Final handoff contractyesFill final handoff fieldsCompleted below.
Final lintyesRun owner lint or scoped equivalentRoot Biome ignored .tmp and plan files; .tmp/slate-v2 bun check ran bun lint successfully before later test failure.
Goal plan completeyesRun autogoal checkerTo be run after this file update.

Phase / pass table:

PhaseStatusEvidenceNext
Intake and source readcompleteLocated active-typing registry/health path and Slate v2 script/API mismatch.none
ImplementationcompleteAdded missing package script and adapted active-typing benchmark to current Slate v2 React/editor APIs.none
VerificationcompleteBenchmark, Evidence Kit check, health audit, served JSON audit, and .tmp/slate-v2 owner check caveat recorded.none
PR / tracker synccompleteNo PR or tracker requested.none
CloseoutcompletePlan evidence recorded; autogoal checker remains the final mechanical gate.final response

Findings:

  • Evidence Kit health was right: react-active-typing-breakdown pointed at an old active artifact.
  • The registry command could not run because .tmp/slate-v2/package.json no longer exposed bench:react:active-typing:local.
  • The benchmark script had drifted from current Slate v2 runtime shape: Editor is type-only from the public index, Editable is now provider-scoped through Slate, and mutations use transaction APIs.
  • Bun TSX source imports needed a local React global shim for current source component execution.

Decisions and tradeoffs:

  • Fixed the benchmark harness where it owns the drift instead of changing package public exports to satisfy an old benchmark import.
  • Kept the dashboard Slate-only and did not reintroduce non-Slate runtime adapters.
  • Did not repair unrelated Slate v2 contract inventory tests during this benchmark refresh.

Implementation notes:

  • Added .tmp/slate-v2 package script bench:react:active-typing:local.
  • Updated active-typing benchmark imports to use public types plus internal Editor helper.
  • Switched benchmark editor setup to createReactEditor() and <Slate editor={editor}>.
  • Replaced stale direct selection/text edits with editor.update((tx) => ...) transaction operations.
  • Wrote a fresh .tmp/slate-v2/tmp/slate-react-active-typing-breakdown-benchmark.json.

Review fixes:

  • None from external review.

Error attempts:

Error / failed attemptCountNext different moveResolution
Missing bench:react:active-typing:local script1Restore the script in .tmp/slate-v2/package.jsonFixed.
Stale public Editor runtime import and old Editable API1Match current Slate v2 benchmark patterns using internal Editor, createReactEditor, and Slate providerFixed.
React global missing for Bun TSX source component path1Add local benchmark-only React global shimFixed.
Root Biome ignored .tmp and plan files1Use .tmp/slate-v2 owner checkLint/typecheck reached through bun check; later unrelated tests failed.
.tmp/slate-v2 bun check vitest failures1Classify against touched files and command phaseRecorded as unrelated contract inventory/version failures after lint/typecheck succeeded.

Verification evidence:

  • .tmp/slate-v2: bun run bench:react:active-typing:local passed and wrote tmp/slate-react-active-typing-breakdown-benchmark.json.
  • Fresh artifact metadata: mtime 2026-05-28T21:51:41.114Z, lane slate-react-active-typing-breakdown, blocks 5000, iterations 3, typeOps 10, scenarios middleShelledModelOnly, middlePromoteThenType, middleSelectThenType, startActiveTyping, selectAll.
  • Artifact sample values: middleSelectThenType.totalActMs.p95 = 206.02, selectAll.selectAllMs.p95 = 40.31.
  • benchmarks/editor: npm run check passed. It regenerated benchmark results, docs, health, scope, and research list output.
  • Health audit after regeneration: status counts coverage-gap=130, ok=663, optional-missing-artifact=2, unsupported=6; next actions no longer include Refresh react-active-typing-breakdown.
  • Live served JSON audit: rowCount=463, groupCount=11, libraries slate, slate-v2, slate-v2:browser-replay, slate-v2:current, slate-v2:default-render-auto, slate-v2:dom-present, slate:baseline, slate:browser-replay; forbidden terms found [].
  • .tmp/slate-v2: bun check ran lint and typecheck successfully, then failed in packages/slate-react vitest contract tests: root selector inventory, mutation/repair inventory, peer dependency floor mismatch, and example value-replace inventory. These are outside the active-typing benchmark files.

Final handoff contract:

  • PR line: no PR requested.
  • Issue / tracker line: no tracker requested.
  • Confidence line: high for the benchmark refresh; medium for broader .tmp/slate-v2 repo health because unrelated contract inventory tests are red.
  • Flow table:
    • Reproduced: stale refresh action existed in health; target benchmark command initially failed.
    • Verified: target benchmark passed, Evidence Kit check passed, health action disappeared, served dashboard data stayed Slate-only.
  • Browser check: Browser MCP unavailable; direct HTTP fetch against the served local dashboard JSON passed.
  • Outcome: react-active-typing-breakdown is fresh and back in the active Evidence Kit data flow.
  • Caveat: .tmp/slate-v2 full bun check is still red from unrelated Slate React contract inventory/version assertions.
  • Design:
    • Chosen boundary: benchmark harness and package script.
    • Why not quick patch: changing generated health output directly would hide stale data instead of refreshing the artifact.
    • Why not broader change: package public exports and unrelated Slate React contract tests are outside this refresh.
  • Verified: bun run bench:react:active-typing:local, npm run check, health audit, live JSON audit, .tmp/slate-v2 owner check caveat.

Final handoff / sync:

  • PR: not requested.
  • Issue / tracker: not requested.
  • Browser proof: live HTTP JSON audit against 127.0.0.1:8765.
  • Caveats: unrelated .tmp/slate-v2 vitest contract failures remain.

Timeline:

  • 2026-05-28T21:47:39.309Z Task goal plan created.
  • 2026-05-28T21:51:41.114Z Fresh active-typing artifact written.
  • 2026-05-28T21:53:33Z Evidence Kit npm run check regenerated core/rich-text/startup/package/health/docs outputs and passed.
  • 2026-05-28T21:54Z Health and served JSON audits passed.
  • 2026-05-28T21:55Z .tmp/slate-v2 owner check recorded lint/typecheck pass plus unrelated vitest contract failures.

Reboot status:

QuestionAnswer
Where am I?Closeout.
Where am I going?Run autogoal checker, close goal, final response.
What is the goal?Refresh the active typing benchmark artifact and remove its stale health action.
What have I learned?The benchmark harness had drifted from current Slate v2 APIs, and the dashboard data remains Slate-only after regeneration.
What have I done?Restored the missing package script, updated the benchmark script, regenerated Evidence Kit output, and verified health/data.

Open risks:

  • .tmp/slate-v2 still has unrelated Slate React contract inventory/version test failures under bun check; they should be handled separately.