docs/plans/2026-05-28-wire-evidence-kit-into-slate-skills.md
Objective:
Wire Evidence Kit into the Slate agent workflow by updating the source rules for
slate-plan and slate-patch so agents read benchmark registry/health state,
refresh the control plane when relevant, record benchmark gaps or next owners,
and keep .tmp/slate-v2 behavior proof separate from plate-2 benchmark
evidence.
Goal plan: docs/plans/2026-05-28-wire-evidence-kit-into-slate-skills.md
Template: docs/plans/templates/task.md
Primary template: docs/plans/templates/task.md
Applied packs:
Task source:
- slate-plan and slate-patch
- `pnpm install`, generated skill text verified, benchmark health command
  verified, and autogoal completion check passes.

Completion threshold:
- `.agents/rules/slate-plan.mdc` contains an Evidence Kit control-plane gate for
  planning, scoring, refresh, gap/candidate handling, and next-action ownership.
- `.agents/rules/slate-patch.mdc` contains a conditional Evidence Kit sync step
  after `.tmp/slate-v2` proof and before final handoff.
- `pnpm install` regenerates `.agents/skills/slate-plan/SKILL.md` and
  `.agents/skills/slate-patch/SKILL.md` with the same Evidence Kit language.
- `npm run bench:editor:health` proves the referenced benchmark health surface
  is runnable.
- `node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/2026-05-28-wire-evidence-kit-into-slate-skills.md` passes.

Verification surface:
- `pnpm install`
- `rg -n "Evidence Kit|Control-Plane|bench:editor:refresh|benchmark-registry|benchmark-health" .agents/skills/slate-plan/SKILL.md .agents/skills/slate-patch/SKILL.md`
- `npm run bench:editor:health`
- `pnpm exec biome check ... --fix` was attempted and ignored these docs/agent paths
  by repo config, which is acceptable for this markdown-only source sync.
- `node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/2026-05-28-wire-evidence-kit-into-slate-skills.md`

Constraints:
Boundaries:
- `.agents/rules/slate-plan.mdc` and
  `.agents/rules/slate-patch.mdc`; generated `SKILL.md` files are mirrors.

Blocked condition:
Blocked only if Skiller cannot regenerate generated skills or the generated
skills omit the source-rule Evidence Kit gates after pnpm install.
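
The second clause of the blocked condition (generated skills omitting the gates) lends itself to a mechanical check. The following is a minimal sketch, not part of the repo: the helper names `missing_gate_terms` and `audit` are hypothetical, and the term list is copied from the `rg` pattern in the verification surface. It is stricter than that `rg` audit, which matches any one term; whether each generated skill is expected to contain every term is an assumption.

```python
"""Hypothetical helper: report which Evidence Kit terms a generated skill omits."""
from __future__ import annotations

from pathlib import Path

# Terms copied from the rg pattern in the verification surface.
GATE_TERMS = (
    "Evidence Kit",
    "Control-Plane",
    "bench:editor:refresh",
    "benchmark-registry",
    "benchmark-health",
)

# Generated mirrors named in the completion threshold.
GENERATED_SKILLS = (
    ".agents/skills/slate-plan/SKILL.md",
    ".agents/skills/slate-patch/SKILL.md",
)


def missing_gate_terms(text: str, terms: tuple[str, ...] = GATE_TERMS) -> list[str]:
    """Return the terms that do not appear anywhere in the given text."""
    return [term for term in terms if term not in text]


def audit(repo_root: str = ".") -> dict[str, list[str]]:
    """Map each generated skill path to its missing terms.

    A file that does not exist is treated as empty, so every term is reported.
    """
    report: dict[str, list[str]] = {}
    for rel in GENERATED_SKILLS:
        path = Path(repo_root) / rel
        text = path.read_text(encoding="utf-8") if path.is_file() else ""
        report[rel] = missing_gate_terms(text)
    return report


if __name__ == "__main__":
    for rel, missing in audit().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{rel}: {status}")
```

Run from the repo root after `pnpm install`, a non-empty list for either file would indicate that the generated mirror lacks some of the audited wording and is worth inspecting against the source rule.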
Task state:
Current verdict:
Completion rule:
- Do not call `update_goal(status: complete)` while any required checklist item
  remains unchecked. If an item does not apply, check it and add `N/A: <reason>`.
- Do not call `update_goal(status: complete)` until every completion threshold
  above is satisfied, final handoff evidence is recorded, and
  `node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/2026-05-28-wire-evidence-kit-into-slate-skills.md` passes.

Start Gates:
| Gate | Applies | Evidence |
|---|---|---|
| Skill analysis before edits | yes | Read autogoal, slate-plan, slate-patch, and agent-native-reviewer skill guidance. |
| Active goal checked or created | yes | get_goal returned no active goal; created this Evidence Kit workflow goal. |
| Source of truth read before edits | yes | Read .agents/rules/slate-plan.mdc and .agents/rules/slate-patch.mdc. |
| Tracker comments and attachments read | N/A | Current request is local workflow-rule work with no tracker link. |
| Video transcript evidence required | N/A | No video or screen recording input. |
| docs/solutions checked for non-trivial existing-code work | N/A | Agent rule sync; live source rules are the source of truth. |
| TDD decision before behavior change or bug fix | N/A | No product behavior or bug fix changed. |
| Branch decision for code-changing task | N/A | User did not ask for branch/commit/PR. |
| Release artifact decision | N/A | No package/public API release artifact changed. |
| Browser tool decision for browser surface | N/A | No browser UI changed. |
| PR expectation decision | N/A | User asked to implement locally, not open a PR. |
| Tracker sync expectation decision | N/A | No issue tracker target. |
| Agent-native pack selected | yes | Used agent-native pack because .agents/** workflow behavior changed. |
| Agent-facing action surface identified | yes | Agent-facing surfaces are slate-plan and slate-patch generated skill text. |
| Source rule versus generated mirror boundary identified | yes | Edited .agents/rules/*.mdc; pnpm install regenerated .agents/skills/**/SKILL.md. |
| agent-native-reviewer loaded or waiver recorded | yes | Loaded reviewer and applied as a focused parity check for generated skill discoverability. |
Work Checklist:
- `<video-transcripts>` XML, or marked N/A with reason.
- `.agents/**`, `.claude/**`, `.codex/**`, skills, hooks, commands, prompts, or user-action tooling.
- `.agents/rules/**` changed, or N/A reason is recorded.

Completion Gates:
| Gate | Applies | Required action | Evidence |
|---|---|---|---|
| Named verification threshold | yes | Run source audit, generated-skill audit, benchmark health, and plan check | Source audit and generated skill greps passed; health command passed. |
| Bug reproduced before fix | N/A | Record failing test/repro or N/A with reason | No bug fix; workflow rule update only. |
| Targeted behavior verification | yes | Verify changed agent behavior is present in generated skill text | rg confirmed Evidence Kit gates in both generated Slate skills. |
| TypeScript or typed config changed | N/A | Run relevant typecheck | No TypeScript or typed config changed. |
| Package exports or file layout changed | N/A | Run pnpm brl before final verification and keep generated barrel updates | No package exports or file layout changed. |
| Package manifests, lockfile, or install graph changed | N/A | Run pnpm install and relevant package checks | pnpm install was required for skill sync, not package graph changes. |
| Agent rules or skills changed | yes | Run pnpm install and verify generated skill sync | pnpm install ran Skiller successfully; generated skills contain Evidence Kit text. |
| Workspace authority proof | yes | Run verification in the owning repo/package/app/route/tool and record cwd | Rule sync and health proof ran from /Users/zbeyens/git/plate-2, the owning workflow repo. |
| Browser surface changed | N/A | Capture Browser Use proof or record explicit waiver/blocker | No browser surface changed. |
| Browser final proof | N/A | Attach screenshot or exact browser verification caveat when browser proof applies | No browser proof applies. |
| CI-controlled template output changed | N/A | Restore generated template output or record why intentionally kept | No template output changed. |
| Package behavior or public API changed | N/A | Add a changeset or record why no changeset applies | Agent workflow docs only; no package release. |
| Registry-only component work changed | N/A | Update docs/components/changelog.mdx or record N/A | No registry component work. |
| Docs or content changed | yes | Verify source-backed claims and generated output | Plan and agent docs cite existing benchmark paths/scripts verified by rg and npm run bench:editor:health. |
| High-risk mini gate | yes | Record failure mode, proof plan, and chosen boundary | Failure mode was CLI-only benchmark workflow; fix is source-rule gate plus generated skill verification. |
| Agent-native review for agent/tooling changes | yes | Load reviewer and close accepted/actionable findings | Reviewer loaded; focused review found the changed agent action discoverable in generated skill text. |
| Local install corruption suspected | N/A | Run pnpm run reinstall once, rerun exact failure, or record N/A | No install-corruption signal. |
| Autoreview for non-trivial implementation changes | N/A | Load autoreview or record N/A | No runtime implementation patch; source-rule wording is covered by direct audit. |
| PR create or update | N/A | Run check before PR work and sync PR body to final handoff | User did not ask for PR. |
| PR proof image hosting | N/A | Replace local image paths with hosted GitHub URLs or record N/A | No PR/browser proof images. |
| Tracker sync-back | N/A | Post concise issue/Linear sync after PR exists, or record N/A/blocker | No tracker target. |
| Final handoff contract | yes | Fill exact outcome/caveats/verification content or N/A reason | Final handoff fields below filled. |
| Final lint | yes | Run pnpm lint:fix or scoped equivalent | Scoped Biome check attempted; repo config ignored these markdown/agent paths, so no formatting owner applies. |
| Goal plan complete | yes | Run node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/2026-05-28-wire-evidence-kit-into-slate-skills.md | To run after this file is closed. |
| Agent source / generated sync | yes | Run pnpm install when .agents/rules/** changed and verify generated mirrors | pnpm install completed Skiller apply; generated skills verified with rg. |
| Agent action discoverability | yes | Source-audit the skill/rule path an agent will read | Generated slate-plan and slate-patch skills expose the Evidence Kit gates. |
| Agent-native review | yes | Load reviewer and close accepted findings, or record N/A | Loaded reviewer; no actionable parity findings after generated skill audit. |
Phase / pass table:
| Phase | Status | Evidence | Next |
|---|---|---|---|
| Intake and source read | complete | Read skills, source rules, root benchmark scripts, and memory registry hit. | implementation complete |
| Implementation | complete | Patched .agents/rules/slate-plan.mdc and .agents/rules/slate-patch.mdc. | verification complete |
| Verification | complete | pnpm install, generated skill rg, and npm run bench:editor:health passed. | closeout complete |
| PR / tracker sync | skipped | No PR/tracker requested. | final response |
| Closeout | complete | This plan records final evidence and will pass check-complete. | final response |
Findings:
- slate-plan needed a planning/scoring gate that maps benchmark-sensitive
  claims to registered artifacts, gaps, candidates, or non-goals.
- slate-patch needed a post-proof conditional sync step so perf-sensitive
  bug fixes refresh or route benchmark evidence without taxing tiny fixes.

Decisions and tradeoffs:
- plate-2 control-plane evidence. `.tmp/slate-v2` remains
  the only proof source for Slate v2 runtime/browser/package behavior.
- slate-patch conditional, not unconditional, to avoid turning
  every small bug fix into benchmark ceremony.

Implementation notes:
- `pnpm install` so Skiller regenerated the generated skills.

Review fixes:
Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |
|---|---|---|---|
| None yet | 0 | | |
Verification evidence:
- `pnpm install` completed Skiller apply successfully.
- `rg -n "Evidence Kit|Control-Plane|bench:editor:refresh|benchmark-registry|benchmark-health" .agents/skills/slate-plan/SKILL.md .agents/skills/slate-patch/SKILL.md` found the generated workflow gates.
- `npm run bench:editor:health` wrote `benchmark-health-latest.json` and reported `active=23 rows=904 nextActions=10`.
- `pnpm exec biome check ... --fix` checked 0 files because repo Biome config ignores these markdown/agent paths; no scoped formatter owns them.

Final handoff contract:
- `SKILL.md` directly would be overwritten.
- `pnpm install`, generated skill grep, benchmark health command.

Final handoff / sync:
Timeline:
- `pnpm install` regenerated generated skills.

Reboot status:
| Question | Answer |
|---|---|
| Where am I? | Closeout complete |
| Where am I going? | Final response |
| What is the goal? | Make Evidence Kit an enforced Slate agent workflow gate |
| What have I learned? | Source rules lacked an Evidence Kit gate; generated skills now expose it |
| What have I done? | Updated source rules, regenerated skills, verified generated text and benchmark health |
Open risks: