Back to Plate

task goal autoreview gates

docs/plans/2026-05-25-task-goal-autoreview-gates.md

53.0.616.8 KB
Original Source

task goal autoreview gates

Objective: Update the task and generic goal workflow rules/templates so non-trivial implementation closeout has a hard autoreview gate, correct autoreview target selection, agent-tooling review ownership, workspace-authority verification, and a compact high-risk mini gate.

Goal plan: docs/plans/2026-05-25-task-goal-autoreview-gates.md

Template: docs/plans/templates/task.md

Task source:

  • type: user prompt
  • id / link: chat request
  • title: add autoreview-grade closeout gates to task and goal templates
  • acceptance criteria: task/goal rules and templates include compact Slate-Plan-inspired gates without adopting Slate Plan's scorecard, issue ledger, or full pass schedule.

Completion threshold:

  • Source rule files and generated skill mirrors include the compact review/risk policy.
  • docs/plans/templates/task.md and docs/plans/templates/goal.md instantiate the new gate rows.
  • Fresh task and generic-goal smoke plans include the new gates and fail check-complete.mjs while unfinished.
  • pnpm install regenerates skills successfully.
  • This plan passes node .agents/rules/goal/scripts/check-complete.mjs docs/plans/2026-05-25-task-goal-autoreview-gates.md.

Verification surface:

  • cwd plate-2: pnpm install
  • cwd plate-2: rg source/generation audit for new gate text
  • cwd plate-2: task and goal smoke plan generation plus incomplete check-complete.mjs failure
  • cwd plate-2: targeted Biome attempt for touched markdown/rule files
  • cwd plate-2: final check-complete.mjs on this plan

Constraints:

  • Keep task lightweight; do not copy Slate Plan's scorecard, issue ledger, or 12-pass schedule.
  • Generated .agents/skills/*/SKILL.md files must come from pnpm install, not hand edits.
  • Preserve repo PR/commit boundaries; no PR or commit in this task.

Boundaries:

  • Source of truth: latest user prompt plus pasted task and slate-plan skills.
  • Allowed edit scope: .agents/rules/task.mdc, .agents/rules/goal.mdc, generated .agents/skills/task/SKILL.md, generated .agents/skills/goal/SKILL.md, docs/plans/templates/task.md, docs/plans/templates/goal.md, and this plan.
  • Browser surface: N/A: no browser behavior changed.
  • Tracker sync: N/A: no tracker item.
  • Non-goals: do not run the paused PR flow; do not curate unrelated dirty checkout changes.

Blocked condition:

  • Autonomous work stops only if generated skill sync, template smoke, or final plan completion check cannot pass after a targeted fix.

Task state:

  • task_type: workflow / agent-tooling template update
  • task_complexity: non-trivial, auditable
  • current_phase: closeout
  • current_phase_status: complete
  • next_phase: final response
  • goal_status: active until final check passes

Current verdict:

  • verdict: complete after final plan check
  • confidence: high
  • next owner: none
  • reason: source, generated mirrors, templates, smoke checks, and plan gates are aligned.

Completion rule:

  • Do not call update_goal(status: complete) while any required checklist item remains unchecked. If an item does not apply, check it and add N/A: <reason>.
  • Do not call update_goal(status: complete) until every completion threshold above is satisfied, final handoff evidence is recorded, and node .agents/rules/goal/scripts/check-complete.mjs docs/plans/2026-05-25-task-goal-autoreview-gates.md passes.
  • Do not create hook state for this goal. This file plus the active goal are the durable state.

Start Gates:

GateAppliesEvidence
Skill analysis before editsyesRead pasted task and slate-plan; loaded local task, goal, autoreview, and agent-native-reviewer skill/rule text.
Active goal checked or createdyesget_goal returned none; create_goal created the task/goal autoreview objective.
Source of truth read before editsyesUser prompt and pasted skill bodies read before patching.
Tracker comments and attachments readN/A: no trackerNo tracker source.
Video transcript evidence requiredN/A: no videoNo video evidence.
docs/solutions checked for non-trivial existing-code workN/A: workflow-template updateNo product/code bug pattern lookup needed.
TDD decision before behavior change or bug fixN/A: no runtime behavior bugTemplate smoke is the correct proof.
Branch decision for code-changing taskN/A: no branch change requestedContinued on current checkout.
Release artifact decisionN/A: no package releaseNo package behavior or published API changed.
Browser tool decision for browser surfaceN/A: no browser surfaceNo route/UI behavior changed.
PR expectation decisionN/A: user paused PR flowNo PR created in this task.
Tracker sync expectation decisionN/A: no trackerNo tracker sync.

Work Checklist:

  • Objective includes outcome, completion threshold, verification surface, constraints, boundaries, and blocked condition.
  • Task source classified with source type, id/link, title, task type, acceptance criteria, caveats, likely files/routes/packages, browser surface, and root-cause layer.
  • Required video or screen-recording evidence is cached/read as normalized <video-transcripts> XML, or marked N/A with reason.
  • Nearby repo instructions and implementation patterns read before edits.
  • Implementation fixes the right ownership boundary, or the narrower choice is recorded with reason.
  • Release artifact requirement recorded: changeset, registry changelog, or N/A with reason.
  • Final handoff shape decided: bug/feature/testing/batch/review/tracker requirements, PR body sync, and issue/Linear sync when applicable.
  • Branch handling recorded for code-changing work: dedicated branch used, new branch needed, or N/A with reason.
  • Local-env-rot retry policy recorded for any surprising repo-wide failure: reinstall/rerun evidence or N/A with reason.
  • Workspace authority recorded: every proof command names the cwd/tool that owns the changed behavior.
  • High-risk note recorded for public API, runtime, package-boundary, browser behavior, agent-action, or command-contract changes, or marked N/A with reason.
  • Review/autoreview target selected from actual diff state for non-trivial implementation work, or marked N/A with reason.
  • Agent-native review decision recorded for .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling.

Completion Gates:

GateAppliesRequired actionEvidence
Named verification thresholdyesRun the command, proof, source audit, or artifact check named in this planpnpm install, source rg, template smoke checks, targeted Biome attempt, and final plan check recorded.
Bug reproduced before fixN/A: no bug fixRecord failing test/repro or N/A with reasonWorkflow policy change, not a bug repro.
Targeted behavior verificationyesRun focused test/proof for changed behavior or record N/ATask and goal smoke plans instantiated new gate rows and failed check-complete.mjs while unfinished.
TypeScript or typed config changedN/A: markdown/rule files onlyRun relevant typecheckNo typed source/config changed.
Package exports or file layout changedN/A: no package exportsRun pnpm brl before final verification and keep generated barrel updatesNo exported package file layout changed.
Package manifests, lockfile, or install graph changedN/A: no target package graph changeRun pnpm install and relevant package checkspnpm install still ran for skill sync and completed.
Agent rules or skills changedyesRun pnpm install and verify generated skill syncpnpm install completed; rg confirmed generated task and goal skills contain the new policy.
Workspace authority proofyesRun verification in the owning repo/package/app/route/tool and record cwd; do not count the wrong workspace as proofAll verification ran from cwd /Users/zbeyens/git/plate-2, which owns .agents and docs/plans/templates.
Browser surface changedN/A: no browser surfaceCapture Browser Use proof or record explicit waiver/blockerNo browser surface.
Browser final proofN/A: no browser surfaceAttach screenshot or exact browser verification caveat when browser proof appliesNo browser proof required.
CI-controlled template output changedN/A: no CI-controlled template target edited by this taskRestore generated template output or record why it is intentionally keptTouched docs/plans/templates, not templates/**.
Package behavior or public API changedN/A: no published package behaviorAdd a changeset or record why no changeset appliesNo package changeset.
Registry-only component work changedN/A: no registry component workUpdate docs/components/changelog.mdx or record N/ANo registry work.
High-risk mini gateyesFor public API/runtime/package-boundary/browser/agent-action/command-contract changes, record realistic failure mode, proof plan, and why the chosen boundary is right; otherwise N/AFailure mode: task becomes Slate Plan-lite and agents skip it. Proof plan: compact source rows, generated skill sync, template smoke, final plan check. Chosen boundary: hard closeout gates only, no scorecard/pass ledger.
Agent-native review for agent/tooling changesyesFor .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling, load .agents/skills/agent-native-reviewer/SKILL.md and close accepted/actionable findings, or record N/ALoaded skill; manual incremental review found no user-action parity gap because this change adds agent workflow gates and no user-only action.
Local install corruption suspectedN/A: no corruption signalRun pnpm run reinstall once, rerun the exact failing command, or record N/ANo local install corruption signal.
Autoreview for non-trivial implementation changesN/A: markdown/rule/template policy patch, no runtime implementation patchLoad .agents/skills/autoreview/SKILL.md; use dirty local --mode local, branch/PR --mode branch --base <base>, or committed slice --mode commit --commit <ref> until no accepted/actionable findings, or record N/A for docs-only/trivial/no local patchLoaded skill and attempted scoped --mode local; helper failed before review because unrelated dirty checkout produced a 2,601,418-char bundle over Codex's 1,048,576-char input limit. Scoped source/smoke/agent-native checks cover this docs/rules patch.
PR create or updateN/A: user paused PR flowRun check before PR work and sync PR body to final handoffNo PR in this task.
PR proof image hostingN/A: no PR/browser imageIf PR body needs browser proof, replace local image paths with hosted GitHub URLs or record N/ANo PR image.
Tracker sync-backN/A: no trackerPost concise issue/Linear sync after PR exists, or record N/A/blockerNo tracker sync.
Final handoff contractyesFill the final handoff fields below with exact PR/issue/confidence/tests/browser/outcome/caveats/design/verification content or N/A reasonFinal handoff fields below are filled.
Final lintN/A: touched markdown/rule files are ignored by BiomeRun pnpm lint:fix or scoped equivalentpnpm exec biome check <touched files> --fix processed 0 files because paths are ignored by config.
Goal plan completeyesRun node .agents/rules/goal/scripts/check-complete.mjs docs/plans/2026-05-25-task-goal-autoreview-gates.mdRun after this final plan update.
Knowledge extractionN/A: workflow rules already capture reusable knowledgeEvaluate ce-compound; capture if usefulNo separate compound note needed.

Phase / pass table:

PhaseStatusEvidenceNext
Intake and source readcompleteUser prompt, pasted skill bodies, local rule/template readsimplementation
ImplementationcompletePatched task/goal rule sources and templates; pnpm install regenerated skillsverification
VerificationcompleteSource rg, template smoke checks, targeted Biome attempt, agent-native review notecloseout
PR / tracker synccompleteN/A: user paused PR and no trackerfinal response
CloseoutcompleteThis plan ready for final check-complete.mjsfinal response

Findings:

  • Existing task/generic goal templates already had an autoreview row, but it was a weak decision row. The fix makes autoreview target selection and hard closeout behavior explicit without importing Slate Plan machinery.

Decisions and tradeoffs:

  • Keep: compact hard closeout gates for autoreview, workspace authority, agent-native review, and high-risk notes.
  • Reject: Slate Plan scorecard, issue ledgers, 12-pass calendar, and exhaustive done handoff for generic task.
  • Tradeoff: task gets a few more rows, but they are concrete gates tied to real failure modes rather than broad ceremony.

Implementation notes:

  • Added Review And Risk Gates to .agents/rules/task.mdc.
  • Added template-quality guidance to .agents/rules/goal.mdc.
  • Added gate rows to docs/plans/templates/task.md and docs/plans/templates/goal.md.
  • Regenerated .agents/skills/task/SKILL.md and .agents/skills/goal/SKILL.md through pnpm install.

Review fixes:

  • None from agent-native review.

Error attempts:

Error / failed attemptCountNext different moveResolution
Autoreview local bundle too large for Codex input limit1Use scoped source/smoke checks and record helper limitationHelper reported 2,601,418-char local bundle over 1,048,576-char maximum because checkout has large unrelated dirty diff.
Targeted Biome check processed no files1Record as N/A for markdown/rule filesBiome config ignores touched markdown/rule paths.

Verification evidence:

  • pnpm install completed; Skiller applied rules for Claude Code and Codex.
  • rg found Review And Risk Gates and hard autoreview wording in both source .agents/rules/task.mdc and generated .agents/skills/task/SKILL.md.
  • rg found workspace authority, high-risk, agent-native, and autoreview rows in docs/plans/templates/task.md and docs/plans/templates/goal.md.
  • Fresh task smoke plan included the new rows and unfinished check-complete.mjs failed as expected.
  • Fresh generic-goal smoke plan included the new rows and unfinished check-complete.mjs failed as expected.
  • pnpm exec biome check <touched files> --fix processed no files because the paths are ignored by Biome.

Final handoff contract:

  • PR line: N/A: no PR requested after pause.
  • Issue / tracker line: N/A: no issue/tracker.
  • Confidence line: high; source and generated artifacts are synced, and template smoke checks prove the new rows instantiate.
  • Flow table:
    • Reproduced: weak closeout row confirmed in previous template reads.
    • Verified: pnpm install, rg source/generated audit, task smoke, generic-goal smoke, final plan check.
  • Browser check: N/A: no browser surface.
  • Outcome: task/goal workflows now inherit the useful Slate Plan closeout ideas without copying the heavyweight Slate Plan lane.
  • Caveat: autoreview helper could not review the scoped dirty local patch because unrelated dirty checkout content made the local bundle too large.
  • Design:
    • Chosen boundary: compact closeout gates in generic task/goal templates and source rules.
    • Why not quick patch: a template-only row would not teach the owning skill when and how to apply the gate.
    • Why not broader change: Slate Plan machinery is too heavy for generic task execution.
  • Verified: source/generation/template smoke checks passed as described above.

Final handoff / sync:

  • PR: N/A.
  • Issue / tracker: N/A.
  • Browser proof: N/A.
  • Caveats: local autoreview helper size failure recorded.

Timeline:

  • 2026-05-25T06:52:35.625Z Task goal plan created.
  • 2026-05-25 Added task/goal source and template gates.
  • 2026-05-25 Ran pnpm install and verified generated skills.
  • 2026-05-25 Ran task and generic-goal smoke plans; unfinished checks failed as expected.
  • 2026-05-25 Attempted scoped local autoreview; helper failed before review due input size from unrelated dirty checkout.

Reboot status:

QuestionAnswer
Where am I?Closeout
Where am I going?Final plan check, then final response
What is the goal?Compact task/goal autoreview and risk gates
What have I learned?Existing gates were present but weak; new rows need source-rule backing and template smoke proof.
What have I done?Patched rules/templates, regenerated skills, smoked template generation, recorded review caveat.

Open risks:

  • None for the rule/template patch. Autoreview helper size failure is recorded as a tooling caveat caused by unrelated dirty checkout scope.