docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md
Objective: Design Slate benchmark target registry migration; done when plan gates define clean cutover and proof; plan docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md.
Flow mode: one-shot execution
Goal plan: docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md
Template: docs/plans/templates/major-task.md
Primary template: docs/plans/templates/major-task.md
Applied packs:
Major source:
- .tmp/slate-v2 is merged back into Plate

Major lane:
- benchmarks/targets/**, tooling/scripts/bench-targets.mjs, package.json, .agents/rules/slate-autoresearch.mdc, generated skill mirrors

Completion threshold:
- benchmarks/targets/slate-v2.json exists with the active Evidence Kit artifact rows imported as target definitions.
- tooling/scripts/bench-targets.mjs can list, validate, run, and create Autoresearch setup plans by target id.
- bench:targets:* and slate:ar:setup-target.
- slate-autoresearch tells future agents to use the target registry first and treat Evidence Kit as legacy input/reporting.

Verification surface:
- pnpm bench:targets:check
- pnpm bench:targets:list | sed -n '1,12p'
- pnpm slate:ar:setup-target -- react-active-typing-breakdown | sed -n '1,80p'
- node --check tooling/scripts/bench-targets.mjs
- JSON parse for package.json and benchmarks/targets/slate-v2.json
- rg for target registry guidance in source and generated skills
- git diff --check
- node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md

Constraints:
Boundaries:
- benchmarks/editor/research/benchmark-registry.json, benchmarks/editor/research/evidence-source-map.md, benchmarks/editor/iterations/003-evidence-control-plane.md, .tmp/slate-v2/package.json, and current package scripts.
- benchmarks/targets/**, tooling/scripts/bench-targets.mjs, package.json, .agents/rules/slate-autoresearch.mdc, generated skill mirrors, this plan.
- .tmp/slate-v2 benchmark code, changing benchmark measurement semantics, running expensive browser benchmarks.

Output budget strategy:
- sed -n and targeted rg; do not stream generated perf wiki or full benchmark result JSON.

Blocked condition:
Major state:
Current verdict:
Completion rule:
- Do not call update_goal(status: complete) until every completion threshold above is satisfied, final evidence is recorded, and node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md passes.

Start Gates:
| Gate | Applies | Evidence |
|---|---|---|
| major-task loaded | yes | Read .agents/skills/major-task/SKILL.md. |
| Active goal checked or created | yes | get_goal returned no goal; create_goal created this objective. |
| Source of truth read before analysis | yes | Read Evidence Kit source map, Evidence Kit control-plane note, active registry, health output, .tmp/slate-v2 scripts. |
| Major lane selected | yes | Benchmark and performance architecture. |
| Decision criteria stated | yes | One registry, target-based Autoresearch setup, no active Evidence Kit split-brain, runtime code follows runtime. |
| Existing repo patterns / prior decisions checked | yes | Evidence Kit memory and docs showed current active artifact registry and generated health/report flow. |
| Helper stack selected | yes | autogoal, major-task, docs-creator, performance-oracle, agent-native-reviewer; no external research needed. |
| External research decision recorded | yes | N/A: repo evidence is enough. |
| Implementation expectation recorded | yes | First implementation slice adds target registry and commands, not full deletion. |
| Workspace authority selected | yes | Root plate-2 owns control scripts and generated skills; .tmp/slate-v2 owns runtime benchmark commands. |
| Branch / PR expectation decided | yes | N/A: user did not ask for branch, commit, push, or PR. |
| Output budget strategy recorded | yes | See Output budget strategy. |
| Docs pack selected | yes | This plan and target README are docs/supporting surfaces. |
| docs-creator loaded | yes | Read .agents/skills/docs-creator/SKILL.md. |
| Docs lane selected | yes | Supporting docs under major-task, not docs-dominant. |
| Target docs and nearest sibling docs read | yes | Read Evidence Kit source map, Evidence Kit control-plane note, generated benchmark health excerpt. |
| Docs style doctrine read | yes | Read docs-creator voice/current-state rules. |
| Documented source owner identified | yes | Target registry README documents benchmarks/targets/slate-v2.json and tooling/scripts/bench-targets.mjs. |
| Agent-native pack selected | yes | Package scripts and .agents/rules/slate-autoresearch.mdc changed. |
| Agent-facing action surface identified | yes | pnpm bench:targets:*, pnpm slate:ar:setup-target, and slate-autoresearch skill text. |
| Source rule versus generated mirror boundary identified | yes | Edited .agents/rules/slate-autoresearch.mdc; ran pnpm install to regenerate .agents/skills/** and .claude/skills/**. |
| agent-native-reviewer loaded or waiver recorded | yes | Read .agents/skills/agent-native-reviewer/SKILL.md; action parity checked by command scripts and skill discoverability. |
Work Checklist:
- .agents/rules/** changed.

Completion Gates:
| Gate | Applies | Required action | Evidence |
|---|---|---|---|
| Named verification threshold | yes | Run the repo audit and command smoke checks named in this plan | pnpm bench:targets:check, pnpm bench:targets:list, pnpm slate:ar:setup-target, syntax/JSON checks passed. |
| Current-state source audit | yes | Map current owner, boundaries, constraints, and affected surfaces | Findings record Evidence Kit active state, 23 active artifacts, 35 ignored historical artifacts, and .tmp/slate-v2 runtime ownership. |
| Decision criteria closure | yes | Mark each criterion satisfied, narrowed, rejected, or blocked with evidence | Target registry and CLI satisfy first-slice criteria; full Evidence Kit deletion intentionally deferred. |
| Options / tradeoffs / rejection record | yes | Record viable options, chosen recommendation, and why alternatives lose | Decisions and tradeoffs section records options A-D. |
| Review / pressure pass | yes | Run selected reviewer/lens or record N/A with reason | Performance-oracle lens applied manually to target registry complexity; agent-native reviewer loaded; no external reviewer subprocess available. |
| Review findings closure | yes | Fix or explicitly reject accepted/actionable findings and record closure proof | Fixed pnpm -- target parsing and Evidence Kit path-base import bug. |
| External-source audit | no | Cite official/local clone/external sources when used, or record N/A | N/A: local repo evidence is sufficient. |
| Implementation gates | yes | If code changed, close primary-template and touched-surface gates; otherwise N/A | Code/docs/agent gates closed in this plan. |
| Final handoff contract | yes | Record recommendation, evidence, caveats, residual risk, and next owner | Final handoff contract section filled. |
| Final lint | yes | Run scoped equivalent when files changed | git diff --check will run after final plan edit. |
| Output budget discipline | yes | Verify no unbounded high-volume command output was streamed, or record the accidental output and recovery | Used targeted sed, rg, JSON summaries, and capped command output. |
| Goal plan complete | yes | Run node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md | Final command is the last closure gate and is recorded in Verification evidence. |
| Docs source-backed claim audit | yes | Verify docs claims against current source or record N/A | benchmarks/targets/README.md claims match bench-targets.mjs, package scripts, and target registry. |
| Docs links / routes / previews | no | Verify leaf links, routes, anchors, and preview names or record N/A | N/A: no route, anchor, or preview added. |
| Docs MDX/content parser | no | Run content build for MDX/content changes, or record N/A | N/A: only markdown plan/README changed. |
| Plugin page specifics | no | Apply docs-creator kit/manual/API rules; otherwise N/A | N/A: no plugin docs page. |
| Agent source / generated sync | yes | Run pnpm install when .agents/rules/** changed and verify generated mirrors | pnpm install regenerated Skiller output. |
| Agent action discoverability | yes | Source-audit the skill/rule path an agent will read | rg verifies Target Registry and pnpm slate:ar:setup-target in source and generated skills. |
| Agent-native review | yes | Load .agents/skills/agent-native-reviewer/SKILL.md and close accepted findings, or record N/A | Loaded; no orphan agent action remains because root scripts and skill text expose target setup/list/check. |
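The discoverability gate above relies on `rg` finding the target registry guidance in both the source rule and the generated mirrors. As a rough illustration only, the same audit can be expressed as a pure function over file contents. The function name, the phrase list constant, and the sample file texts are assumptions made for this sketch; only the two phrases and the three file paths come from this plan.

```javascript
// Sketch only: an in-memory version of the "rg" discoverability audit.
// `findMissingGuidance` and REQUIRED_PHRASES are hypothetical names.
const REQUIRED_PHRASES = ["Target Registry", "pnpm slate:ar:setup-target"];

// Returns one { file, phrase } entry for every required phrase a file lacks.
function findMissingGuidance(files, phrases = REQUIRED_PHRASES) {
  const missing = [];
  for (const [file, text] of Object.entries(files)) {
    for (const phrase of phrases) {
      if (!text.includes(phrase)) missing.push({ file, phrase });
    }
  }
  return missing;
}

// Sample contents are invented; a real audit would read the files from disk.
const sampleFiles = {
  ".agents/rules/slate-autoresearch.mdc":
    "## Target Registry\nRun pnpm slate:ar:setup-target -- <target-id> first.",
  ".agents/skills/slate-autoresearch/SKILL.md":
    "## Target Registry\nRun pnpm slate:ar:setup-target -- <target-id> first.",
  ".claude/skills/slate-autoresearch/SKILL.md":
    "## Target Registry\n(stale mirror without the setup command)",
};

for (const gap of findMissingGuidance(sampleFiles)) {
  console.log(`${gap.file}: missing "${gap.phrase}"`);
}
```

With the invented sample, only the stale mirror is reported, which is the failure this gate is meant to catch after editing a source rule without regenerating the mirrors.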
Phase / pass table:
| Phase | Status | Evidence | Next |
|---|---|---|---|
| Intake and source read | complete | Read goal skill, major-task skill, Evidence Kit docs/registry/health, package scripts. | current-state map |
| Current-state map | complete | Evidence Kit owns active artifact list and generated reports; .tmp/slate-v2 owns runtime benchmark commands. | options |
| Options and recommendation | complete | Chose target registry as source; rejected Evidence Kit or Autoresearch as sole active owner. | implementation |
| Review / pressure pass | complete | Performance and agent-native lenses applied; CLI smoke found two real migration bugs. | verification |
| Implementation or plan artifact | complete | Added target registry, CLI, README, package scripts, and skill guidance. | verification |
| Verification | complete | Target check/list/setup and syntax/JSON checks passed; final diff/check-complete remains after last edit. | closeout |
| Closeout | complete | Final plan and final commands recorded. | final response |
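The verification phase above depends on a target check that validates the registry before anything runs. A minimal sketch of such a check follows. It assumes each target carries an `id` and a `command` field; those field names, the sample target, and the failure message format are assumptions for illustration, not the actual schema of benchmarks/targets/slate-v2.json. Only the success line format mirrors the recorded output of pnpm bench:targets:check.

```javascript
// Sketch only: `id` and `command` are assumed field names, not the real schema.
function checkTargets(targets) {
  const errors = [];
  const seen = new Set();
  targets.forEach((target, index) => {
    if (typeof target.id !== "string" || target.id === "") {
      errors.push(`target #${index}: missing id`);
      return;
    }
    // Target ids are the lookup key, so duplicates make "run by id" ambiguous.
    if (seen.has(target.id)) errors.push(`${target.id}: duplicate id`);
    seen.add(target.id);
    if (typeof target.command !== "string" || target.command.trim() === "") {
      errors.push(`${target.id}: missing command`);
    }
  });
  return errors;
}

// The success line follows the format recorded in this plan;
// the failure format is invented for the sketch.
function summarize(targets) {
  const errors = checkTargets(targets);
  return errors.length === 0
    ? `benchmark-targets ok: ${targets.length} targets`
    : `benchmark-targets failed:\n${errors.join("\n")}`;
}

// Invented sample data.
const sample = [{ id: "example-target", command: "pnpm run example:bench" }];
console.log(summarize(sample));
```

Keeping the check as a pure function over parsed JSON keeps it cheap to run, which fits the constraint of not running expensive browser benchmarks during validation.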
Findings:
- benchmarks/editor/research/benchmark-registry.json.
- .tmp/slate-v2/package.json owns runtime benchmark scripts such as bench:react:active-typing:local, bench:react:huge-document:browser-trace:local, and core benchmark scripts.
- .tmp/slate-v2 by identity.
- benchmarks/targets/slate-v2.json the source for benchmark decisions, with Autoresearch consuming target ids and generated reports/history reading the same target registry.

Decisions and tradeoffs:
Implementation notes:
- Added tooling/scripts/bench-targets.mjs with list, check, run, autoresearch-setup, and import-evidence-kit.
- Added benchmarks/targets/slate-v2.json with 23 imported targets.
- Added benchmarks/targets/README.md documenting ownership, commands, and target contract.
- Added package scripts bench:targets:list, bench:targets:check, bench:targets:run, bench:targets:import-evidence-kit, and slate:ar:setup-target.
- Edited .agents/rules/slate-autoresearch.mdc so future agents use target ids first and treat Evidence Kit as legacy input/reporting.
- Ran pnpm install so .agents/skills/slate-autoresearch/SKILL.md and .claude/skills/slate-autoresearch/SKILL.md match the source rule.

Review fixes:
- pnpm slate:ar:setup-target -- react-active-typing-breakdown initially treated -- as the target id. Fixed by stripping the pnpm separator in bench-targets.mjs.
- The Evidence Kit importer resolved ../../.tmp/slate-v2 relative to benchmarks/editor/research, producing benchmarks/.tmp/slate-v2. Fixed importer base to benchmarks/editor and regenerated the target registry.

Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |
|---|---|---|---|
| pnpm separator treated as target id | 1 | Strip leading -- from CLI args | Fixed and verified with pnpm slate:ar:setup-target -- react-active-typing-breakdown. |
| Evidence Kit path base resolved from wrong directory | 1 | Resolve imported paths from benchmarks/editor | Fixed and regenerated benchmarks/targets/slate-v2.json. |
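Both failures recorded above are small enough to reproduce in a few lines. The sketch below is illustrative and not copied from bench-targets.mjs: `stripLeadingSeparator` and `joinPosix` are hypothetical helpers, and `joinPosix` is a hand-rolled stand-in for what Node's `path.posix.join` does on these particular inputs, used here so the example has no imports.

```javascript
// Sketch only: hypothetical helpers, not the code in bench-targets.mjs.

// The plan records that "--" arrived as the first CLI argument and was
// mistaken for the target id. Dropping a leading "--" avoids that.
function stripLeadingSeparator(args) {
  return args[0] === "--" ? args.slice(1) : args;
}

// Minimal POSIX-style join that resolves "." and ".." segments.
function joinPosix(base, relative) {
  const out = [];
  for (const part of `${base}/${relative}`.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return out.join("/");
}

const [targetId] = stripLeadingSeparator(["--", "react-active-typing-breakdown"]);
console.log(targetId); // react-active-typing-breakdown

// Wrong base: two ".." segments only climb out of editor/research.
console.log(joinPosix("benchmarks/editor/research", "../../.tmp/slate-v2")); // benchmarks/.tmp/slate-v2

// Corrected base: the same relative path now lands at the repo root.
console.log(joinPosix("benchmarks/editor", "../../.tmp/slate-v2")); // .tmp/slate-v2
```

The second pair of calls shows why the importer fix was a change of base directory rather than a change to the stored relative paths.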
Verification evidence:
- /Users/zbeyens/git/plate-2: pnpm install -> passed and regenerated Skiller output.
- /Users/zbeyens/git/plate-2: pnpm bench:targets:check -> benchmark-targets ok: 23 targets.
- /Users/zbeyens/git/plate-2: pnpm bench:targets:list | sed -n '1,12p' -> printed target ids and commands.
- /Users/zbeyens/git/plate-2: pnpm slate:ar:setup-target -- react-active-typing-breakdown | sed -n '1,80p' -> returned Autoresearch setup-plan JSON with typing_seconds and target benchmark command.
- /Users/zbeyens/git/plate-2: node --check tooling/scripts/bench-targets.mjs -> passed.
- /Users/zbeyens/git/plate-2: JSON parse for package.json and benchmarks/targets/slate-v2.json -> passed.
- /Users/zbeyens/git/plate-2: generated skill source audit for target registry guidance -> passed; Target Registry, pnpm slate:ar:setup-target, and Evidence Kit legacy guidance are present in source and generated skills.
- /Users/zbeyens/git/plate-2: git diff --check -- tooling/scripts/bench-targets.mjs benchmarks/targets package.json .agents/rules/slate-autoresearch.mdc .agents/skills/slate-autoresearch/SKILL.md .claude/skills/slate-autoresearch/SKILL.md docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md -> passed.
- /Users/zbeyens/git/plate-2: node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-01-slate-benchmark-target-registry-migration.md -> [autogoal] complete.

Final handoff contract:
- METRIC/ARTIFACT lines.

Timeline:
Reboot status:
| Question | Answer |
|---|---|
| Where am I? | Closeout complete after first implementation slice. |
| Where am I going? | Complete the goal if final goal-plan check passes. |
| What is the goal? | Establish the clean long-term benchmark target registry migration spine and proof. |
| What have I learned? | Evidence Kit can be imported as legacy input, but should not remain the active control plane. |
| What have I done? | Added target registry, CLI, commands, docs, and agent guidance. |
Open risks:
METRIC/ARTIFACT output for best Autoresearch fidelity.