Autogoal

.agents/skills/autogoal/SKILL.md

Use this when the user asks for a durable objective, long-running autonomous work, goal setup, or when a governing repo skill requires goal setup before work starts.

This skill turns a vague "keep going" instruction into a thread-scoped completion contract: what should be true, how it is verified, what must not change, and when Codex should stop.

Core Take

A normal prompt says: do the next thing.

A goal says: keep working until this outcome is true, or until the evidence shows a real blocker.

Goals are for work where the next move depends on what Codex learns along the way: debugging, migrations, flaky tests, benchmark tuning, deep research, large refactors, prototypes, browser-proof loops, and pass-gated plans.

Goals are not a permission slip to wander. They are a scoped, evidence-checked contract.

No measurable outcome, no goal. A goal must have a verification surface and a completion threshold before create_goal is called. Prefer numbers: score, count, latency, coverage, pass count, failing-to-passing repro count, issue rows, or explicit command success. When a numeric target does not fit, use a binary artifact checklist that can be audited from files, commands, screenshots, browser proof, or source-backed citations.

Universal Boundary

autogoal is the goal lifecycle kernel. It owns:

  • objective shape
  • measurable completion thresholds
  • evidence standards
  • active goal conflict handling
  • durable plan state
  • blocker and completion rules
  • repair routing when a goal-backed workflow misses expectations

It does not own project policy. Keep repo commands, package managers, browser tools, release rules, PR policy, scorecards, issue ledgers, and lane-specific pass schedules in derived skills or docs/plans/templates/<template>.md.

Derived skills may be stricter than autogoal; they should not duplicate the goal lifecycle. autogoal says how work remains honest. The derived skill says what the lane actually requires.

Template Composition

Goal plans are composable, but only through static materialization.

The model is:

  1. one active goal
  2. one concrete docs/plans plan file
  3. one primary template
  4. optional materialized packs

The primary template is chosen by dominant risk: task for normal execution, docs for docs-dominant work, major-task for heavyweight architecture or proposal work, slate-plan for Slate plan lanes, and so on.

Packs are chosen by touched surface. They add recurring gates without becoming parents:

  • docs: docs are touched but not the dominant deliverable
  • agent-native: .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling changed
  • browser: real browser, route, UI, console, network, or interaction proof is required
  • package-api: package exports, public API, release artifacts, package boundaries, or package-level checks changed

Core execution and review gates belong in the primary template. Packs are only for optional touched surfaces that would otherwise be absent from that template.

Do not create runtime inheritance between templates. The helper copies pack rows into the generated plan's Start Gates, Work Checklist, and Completion Gates. After creation, the generated plan is the truth; the checker validates that materialized plan only.
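
A minimal sketch of static materialization, assuming a template or pack can be modeled as an object with three row lists. The function name `materializePlan`, the field names `startGates`, `workChecklist`, and `completionGates`, and the sample rows are all hypothetical; the real helper's internals are not shown in this document.

```javascript
// Hypothetical model: a template or pack is an object with three row lists.
// Illustrative only; this is not the real create-goal-scratchpad.mjs logic.
const SECTIONS = ["startGates", "workChecklist", "completionGates"];

function materializePlan(primary, packs) {
  // Start from a copy of the primary template's rows.
  const plan = { primaryTemplate: primary.name, appliedPacks: [] };
  for (const section of SECTIONS) {
    plan[section] = [...(primary[section] ?? [])];
  }
  // Copy each pack's rows into the same sections. Nothing links back to the
  // pack afterwards, so there is no runtime inheritance.
  for (const pack of packs) {
    plan.appliedPacks.push(pack.name);
    for (const section of SECTIONS) {
      plan[section].push(...(pack[section] ?? []));
    }
  }
  return plan;
}

// Sample rows are made up for illustration.
const taskTemplate = {
  name: "task",
  startGates: ["Read source"],
  workChecklist: ["Implement"],
  completionGates: ["Tests pass"],
};
const docsPack = { name: "docs", completionGates: ["Docs updated"] };
const plan = materializePlan(taskTemplate, [docsPack]);
console.log(plan.completionGates.join(", ")); // Tests pass, Docs updated
```

Because rows are copied rather than referenced, later changes to a pack do not alter a plan that was already generated, which is the property the checker relies on.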

The generated plan is the dedicated plan shell. Fill that exact file immediately after generation: replace placeholders, resolve every gate row, and mark non-applicable generated rows as N/A: <reason> with evidence. Do not delete, wholesale replace, or hand-narrow the generated plan into an ad hoc smaller plan after durable work has started. If the selected template is plainly wrong and no substantive work has started, regenerate once with the right template and record why. If work has already started, keep the generated plan and close it honestly.

Use packs like this:

bash
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
  --template task \
  --with docs \
  --with agent-native \
  --title "<short task title>"

Examples:

  • docs-only work: --template docs
  • normal code task that also changes docs: --template task --with docs
  • agent workflow task: --template task --with agent-native
  • browser behavior task: --template task --with browser
  • published package API task: --template task --with package-api
  • major architecture task: --template major-task
  • major architecture task that also changes docs and package API: --template major-task --with docs --with package-api

If two packs add related gates, keep both when they protect different failure modes. If they duplicate exactly the same proof, keep the more specific pack and record the other as N/A in the plan.

Proportionality Dial

Classify goal-backed work before creating or updating a plan:

  • micro: one narrow, auditable outcome; no cross-file state; no meaningful continuation loop. Use a tiny plan only when a repo rule requires it; otherwise record the audit surface directly in the final response.
  • normal: multi-step work with concrete evidence and likely continuation. Use the appropriate docs/plans template and close all relevant gates.
  • major: architecture, migrations, benchmarks, framework comparisons, broad refactors, pass-gated lanes, or public API/runtime risk. Use a derived skill or project template with phases, risk rows, review gates, and explicit closure criteria.

Do not inflate a micro work item into a ceremony pile. Do not shrink a major work item into a checklist that cannot catch real risk.

Goal Flow Modes

Every goal-backed workflow chooses exactly one flow mode before durable work starts. The mode controls the human review boundary; it does not weaken the evidence or completion rules.

1. One-Shot Execution

Use this for issue-like or work-item-like work where the agent is expected to read the source, derive the local plan, implement, verify, and hand off the result without stopping for plan approval.

Rules:

  • Create or continue a goal when the work is non-trivial and auditable.
  • Create a plan when durable state is useful or required by the caller.
  • The plan is an execution ledger, not a proposal waiting for acceptance.
  • Human review happens at the final handoff or explicit user interruption.
  • Do not pause merely because the plan has not been reviewed. Pause only for a real blocker, unsafe ambiguity, or a user decision that changes scope.

2. Agent-Led Plan Hardening

Use this when the requested output is a plan and the user wants the agent to drive toward the best plan with minimal human interruption.

Rules:

  • The agent owns the review loop: research, compare options, pressure-test, revise, and improve the plan until the confidence threshold is met.
  • Ask the user only for decisions that materially change intent, boundaries, risk tolerance, or acceptance criteria.
  • Record each self-review pass and plan delta as evidence.
  • Stop for one major user review when the plan reaches the stated readiness threshold.
  • Do not execute implementation under the planning goal unless the caller's governing workflow explicitly says planning and execution are the same goal.

3. Collaborative Planning

Use this when the user and agent are intentionally shaping the plan together before execution.

Rules:

  • The goal outcome is an accepted plan, not implementation.
  • Ask focused questions when user judgment changes the plan.
  • Keep options, tradeoffs, rejected alternatives, and open decisions visible in the plan.
  • Continue revising until the user accepts the plan or a blocker remains.
  • Execution starts only after explicit acceptance or a new instruction that changes the flow mode.

Flow-mode selection belongs in the derived skill or the instantiated plan when the caller knows it. If no caller specifies a mode, default to one-shot execution for implementation tasks, agent-led plan hardening for autonomous planning/review requests, and collaborative planning when the user is actively brainstorming or asking for plan acceptance before work.
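
The default selection above can be sketched as a small lookup. This assumes a hypothetical request object with `callerMode` and `kind` fields; the kind labels are invented for illustration and are not defined by this skill.

```javascript
// Hypothetical request kinds; the document does not define an enum for them.
function defaultFlowMode(request) {
  // A mode specified by the caller (derived skill or plan) always wins.
  if (request.callerMode) return request.callerMode;
  switch (request.kind) {
    case "implementation":
      return "one-shot execution";
    case "autonomous-planning":
      return "agent-led plan hardening";
    case "brainstorming":
    case "plan-acceptance":
      return "collaborative planning";
    default:
      // No default is stated for other kinds, so surface the gap.
      throw new Error(`No default flow mode for request kind: ${request.kind}`);
  }
}

console.log(defaultFlowMode({ kind: "implementation" })); // one-shot execution
```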

Use When

  • The user asks to set a goal or asks Codex to keep working until a verifiable end state.
  • A repo skill says to use create_goal or goal setup.
  • Work is long-running, iterative, and has an auditable success condition.
  • The path is uncertain but the finish line is auditable.
  • The user would otherwise keep saying: "continue", "try the next fix", "rerun the benchmark", "keep going until it works".
  • A pass-gated lane needs one durable objective with the pass schedule and closure gates inside it.
  • The user says autogoal repair <expectation> after any goal-backed workflow missed their expectation, and they want the owning rule/template repaired for future runs.

Do Not Use When

  • The user asks a one-off question or wants one short answer.
  • The edit is tiny and no continuation loop is useful.
  • The finish line is vague: "make it better", "improve performance", "clean this up" without a verification surface.
  • The user explicitly declined goal setup or asked not to use goal tools.
  • The only possible next move requires user input.
  • Creating a goal would hide uncertainty instead of naming it.
  • The user only wants the current artifact fixed once. Repair mode is for recurring workflow expectation misses, not every ordinary bug in a plan file.

Tool Contract

This is agent-native. Use the goal tools directly when available:

  • get_goal to inspect the current thread goal.
  • create_goal to start a new active goal.
  • update_goal(status: complete) only when the objective is genuinely met.
  • update_goal(status: blocked) only when no autonomous progress remains and the same blocker has recurred enough to satisfy the tool contract.

There can be only one active goal per thread. Repeated create_goal calls fail while a goal exists. Always call get_goal first; call create_goal only when it returns no goal; use update_goal to complete or block the active goal.
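
A rough sketch of the "inspect first, create only when empty" rule. The `tools` object is a hypothetical stand-in for the goal tools: the real tool signatures and return shapes may differ, and real calls are likely asynchronous. `ensureGoal` and `makeFakeTools` are illustrative names only.

```javascript
// `tools` is a hypothetical stand-in for the goal tools; real signatures and
// return shapes may differ.
function ensureGoal(tools, objective) {
  const existing = tools.get_goal();
  if (existing) {
    // One active goal per thread: never call create_goal while one exists.
    return { goal: existing, created: false };
  }
  return { goal: tools.create_goal({ objective }), created: true };
}

// In-memory fake used only to exercise the sketch.
function makeFakeTools() {
  let active = null;
  return {
    get_goal: () => active,
    create_goal: ({ objective }) => {
      if (active) throw new Error("a goal already exists");
      active = { objective, status: "active" };
      return active;
    },
  };
}

const fakeTools = makeFakeTools();
console.log(ensureGoal(fakeTools, "first objective").created); // true
console.log(ensureGoal(fakeTools, "second objective").created); // false
```

When `created` is false, the returned goal still has to be classified against the current request before any durable state is touched.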

Active Goal Conflict Protocol

When get_goal returns a goal, classify it before touching durable state:

  • same: the existing goal already describes the current requested end state. Continue under it and keep its plan current.
  • same but stale plan: the goal is right but the plan is stale. Repair the plan first, then continue.
  • newer user correction: the latest user message narrows, reverses, or corrects the goal. Record the correction in the plan, follow the newest instruction, and do not call the old objective complete unless it is actually true.
  • different objective: the active goal is unrelated. Do not hijack it. If no lifecycle tool can pause, resume, cancel, or replace it, say so briefly and proceed with degraded plan state only when the user explicitly says to go.
  • paused or externally controlled: do not fake completion or blocked status to escape the tool. Continue only if the latest user instruction clearly authorizes the new work, and record the mismatch in the plan.

Never mark a goal complete because the user changed their mind. Completion means the objective is true. A correction changes the work path; it does not retroactively prove the old objective.

Do not invent a goal state file when a goal tool is available. If goal tools are not available, record degraded control state in the active plan only when the repo workflow requires that fallback; otherwise state that goal tools are not available and continue with the nearest safe workflow.

Goal Anatomy

A strong goal defines eight things:

  1. Flow mode: one-shot execution, agent-led plan hardening, or collaborative planning.
  2. Outcome: what must be true when done.
  3. Completion threshold: the number, pass/fail command, artifact checklist, or explicit acceptance rows that prove done.
  4. Verification surface: tests, benchmarks, logs, browser proof, generated artifact, report, issue comment, or source-backed audit.
  5. Constraints: what must not regress.
  6. Boundaries: files, packages, repos, tools, data, routes, issue scope, or product surfaces Codex may or may not touch.
  7. Iteration policy: how to choose the next move after each attempt.
  8. Blocked stop condition: when to stop and report the blocker, evidence, and next input needed.

Use this objective shape:

txt
<desired end state>, complete only when <quantitative or auditable threshold>,
verified by <specific evidence>, and when the active goal plan passes
`node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path>`, while
preserving <constraints>. Use flow mode <one-shot execution | agent-led plan
hardening | collaborative planning> and <allowed inputs/tools/boundaries>.
Maintain goal plan <docs/plans/path>. Between iterations, <progress log and
next-move policy>. If blocked or no valid path remains, report <attempts,
evidence, blocker, and needed input>.
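
One way to keep objectives in this shape is to build them from named parts. The sketch below is illustrative: `buildObjective` and its field names are hypothetical labels for the placeholders above, not part of any tool.

```javascript
// Field names are hypothetical labels for the placeholders in the shape above.
const OBJECTIVE_FIELDS = [
  "endState", "threshold", "evidence", "planPath", "constraints",
  "flowMode", "boundaries", "iterationPolicy", "blockedReport",
];

function buildObjective(f) {
  // Refuse to build an objective with any placeholder left empty.
  const missing = OBJECTIVE_FIELDS.filter((k) => !f[k] || !String(f[k]).trim());
  if (missing.length) {
    throw new Error(`Missing objective fields: ${missing.join(", ")}`);
  }
  return [
    `${f.endState}, complete only when ${f.threshold},`,
    `verified by ${f.evidence}, and when the active goal plan passes`,
    `\`node .agents/rules/autogoal/scripts/check-complete.mjs ${f.planPath}\`, while`,
    `preserving ${f.constraints}. Use flow mode ${f.flowMode} and ${f.boundaries}.`,
    `Maintain goal plan ${f.planPath}. Between iterations, ${f.iterationPolicy}.`,
    `If blocked or no valid path remains, report ${f.blockedReport}.`,
  ].join(" ");
}
```

Failing fast on an empty field keeps a vague objective from reaching create_goal in the first place.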

Measurable Outcome Gate

Before calling create_goal, rewrite vague objectives into measurable ones.

Required:

  • a specific done state
  • a flow mode
  • a verification surface
  • a completion threshold
  • a constraint list, or an explicit "no extra constraints"
  • a blocked condition

Quantitative examples:

  • p95 < 120 ms
  • score >= 0.92 and no dimension below 0.85
  • 0 accepted review findings
  • all 12 pass rows complete or skipped with evidence
  • focused repro fails before fix and passes 5 consecutive runs after
  • no stale symbol matches from rg

Auditable non-numeric examples:

  • named file exists with required sections
  • named issue rows moved to fixed/improved/related/not-claimed
  • named browser route has screenshot proof and no console errors
  • named API examples compile and match the accepted public shape

Reject or rewrite:

  • "make better"
  • "clean up"
  • "finish"
  • "absolute best" without score rows, pass gates, or evidence
  • "review and decide" without an artifact and acceptance criteria
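
The required items can be checked mechanically before create_goal is called. This is a sketch under assumptions: the `goal` object shape and the name `measurableOutcomeGaps` are hypothetical, and the check only confirms that each item is present, not that it is actually measurable.

```javascript
const FLOW_MODES = [
  "one-shot execution",
  "agent-led plan hardening",
  "collaborative planning",
];

// Returns the names of required items that are missing from a draft goal.
// The `goal` field names are hypothetical.
function measurableOutcomeGaps(goal) {
  const gaps = [];
  if (!goal.doneState) gaps.push("done state");
  if (!FLOW_MODES.includes(goal.flowMode)) gaps.push("flow mode");
  if (!goal.verificationSurface) gaps.push("verification surface");
  if (!goal.completionThreshold) gaps.push("completion threshold");
  // Constraints are either a non-empty list or the explicit opt-out string.
  const c = goal.constraints;
  const hasConstraints =
    c === "no extra constraints" || (Array.isArray(c) && c.length > 0);
  if (!hasConstraints) gaps.push("constraints");
  if (!goal.blockedCondition) gaps.push("blocked condition");
  return gaps;
}

console.log(measurableOutcomeGaps({ doneState: "make better" }).length); // 5
```

A draft such as "make better" passes the done-state presence check but still fails on every other item, which is the signal to rewrite it rather than create the goal.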

Completion Gate Policy

Do not make check-complete.mjs the whole goal. That only proves the plan looks closed, not that the work is true.

Use the hybrid rule for every goal:

  1. The goal objective names the real outcome, threshold, verification surface, constraints, boundaries, and blocked condition.
  2. The docs/plans goal plan records the fresh evidence for that threshold.
  3. node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path> is the final mechanical gate before update_goal(status: complete).

The checker validates that the goal plan has no unchecked required checklist items, no unresolved gate rows, no open phase/pass rows, concrete verification evidence, current reboot status, and recorded risks. It does not replace tests, browser proof, source audits, benchmark output, or other named verification evidence.

Evidence Type Contract

Every completion proof should fit at least one evidence type:

  • command: exact command, cwd, and pass/fail result.
  • source-audit: exact files or search query proving a static property.
  • browser: route, interaction, screenshot or console/network caveat.
  • artifact: generated file, report, table, PR body, issue comment, or exported asset.
  • review: reviewer/tool used, accepted findings, fixes, and remaining rejected findings with reasons.
  • external-source: cited URL, issue, paper, docs page, or connected app result used as authority.
  • N/A: <reason>: why a recurring gate does not apply.

Evidence must name the owning workspace, package, app, route, or tool when that ownership matters. A root-level check cannot prove a sibling repo, app route, browser surface, or external tracker unless the plan explains why it is the owning surface.
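
As an illustration, the evidence types can be modeled as records with required fields. The field names below are hypothetical paraphrases of the list above, and `evidenceGaps` is an illustrative name; nothing here is enforced by the real checker as far as this document states.

```javascript
// Required fields per evidence type, paraphrased from the list above.
// All field names are hypothetical.
const EVIDENCE_FIELDS = {
  command: ["command", "cwd", "result"],
  "source-audit": ["filesOrQuery"],
  // `proof` stands for a screenshot or a console/network caveat.
  browser: ["route", "interaction", "proof"],
  artifact: ["artifact"],
  review: ["reviewer", "acceptedFindings", "fixes", "rejectedFindings"],
  "external-source": ["source"],
  "N/A": ["reason"],
};

// Returns the fields an evidence record is missing for its declared type.
function evidenceGaps(evidence) {
  const required = EVIDENCE_FIELDS[evidence.type];
  if (!required) return [`unknown evidence type: ${evidence.type}`];
  return required.filter(
    (field) => evidence[field] === undefined || evidence[field] === ""
  );
}

console.log(evidenceGaps({ type: "command", command: "node check.mjs" }).join(", "));
// cwd, result
```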

Repair Mode

Trigger this mode when the arguments start with:

txt
repair <expectation>

Repair mode is self-improvement with a leash. It converts a concrete expectation miss from a goal-backed run into the smallest durable change to the owning rule, template, helper, or active plan.

Use it for misses like:

  • the generated goal plan lacked a gate the user expected
  • a derived skill used the wrong template or completion rule
  • the skill completed too early or kept running past the intended boundary
  • the final handoff omitted evidence the user expects every time
  • the workflow forced too much ceremony or skipped a required review/proof step

Do not use it for:

  • one-off wording preferences in a single plan
  • a product/runtime bug that belongs in implementation code
  • broad "make all skills better" edits
  • rewriting generated .agents/skills/*/SKILL.md by hand

Target selection order:

  1. If the prompt names a plan path, read that plan first. Use its Template:, skill name, phase table, and completion gates to identify the owner.
  2. If the prompt names a skill, read .agents/skills/<skill>/SKILL.md first, then docs/plans/templates/<skill>.md when it exists.
  3. If there is an active goal, read its plan path from the objective or current plan before editing anything.
  4. If the miss belongs to every goal, target .agents/rules/autogoal.mdc and docs/plans/templates/goal.md.
  5. If ownership is still unclear after source reads, ask one short targeting question instead of patching multiple templates.
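
The selection order reads naturally as a first-match chain. In this sketch the context fields (`planPath`, `skill`, `activeGoalPlanPath`, `appliesToEveryGoal`), the return shape, and the wording of the fallback question are all hypothetical.

```javascript
// Context field names are hypothetical; each branch mirrors one step of the
// target selection order, checked top to bottom.
function pickRepairTarget(ctx) {
  if (ctx.planPath) {
    return { step: 1, read: [ctx.planPath] };
  }
  if (ctx.skill) {
    // The template is only read when it exists.
    return {
      step: 2,
      read: [
        `.agents/skills/${ctx.skill}/SKILL.md`,
        `docs/plans/templates/${ctx.skill}.md`,
      ],
    };
  }
  if (ctx.activeGoalPlanPath) {
    return { step: 3, read: [ctx.activeGoalPlanPath] };
  }
  if (ctx.appliesToEveryGoal) {
    return {
      step: 4,
      target: [".agents/rules/autogoal.mdc", "docs/plans/templates/goal.md"],
    };
  }
  // Ownership is still unclear: ask once instead of patching several owners.
  return { step: 5, ask: "Which rule, template, or plan owns this expectation?" };
}

console.log(pickRepairTarget({ skill: "task" }).read[0]);
// .agents/skills/task/SKILL.md
```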

Repair scope matrix:

| Miss | Primary repair owner |
| --- | --- |
| Current plan has wrong status, row, evidence, or handoff fields | active docs/plans/* plan |
| Future generated plans need a recurring section, gate, row, or placeholder | docs/plans/templates/<owner>.md |
| Agent chose the wrong workflow, target, proof standard, or completion rule | .agents/rules/<owner>.mdc |
| Prose keeps failing and the miss is mechanically checkable | .agents/rules/autogoal/scripts/* plus focused script proof |
| Derived skill adds lane-specific ceremony or policy | derived skill rule/template, not autogoal |
| Universal lifecycle rule is missing across goal-backed work | .agents/rules/autogoal.mdc |

Repair workflow:

  1. Restate the expectation in one sentence.

  2. Identify the miss with source evidence: plan row, final response shape, missing gate, bad status, wrong template, or stale generated skill.

  3. Pick exactly one primary owner. Patch secondary owners only when sync is required, such as source rule plus project template.

  4. Create a repair plan with:

    bash
    node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
      --template goal-repair \
      --title "<short repair title>"
    

    If a repair is truly trivial, record why no separate repair plan is needed.

  5. Patch source-of-truth files only. Never hand-edit generated .agents/skills/*/SKILL.md; after changing .agents/rules/**, run pnpm install.

  6. Prove the repair:

    • source audit with rg for the new rule/gate/wording
    • generated skill sync when .agents/rules/** changed
    • instantiate the repaired template or inspect it directly when a smoke plan would create noise
    • verify unfinished generated plans still fail check-complete.mjs
    • verify a completed plan can record the new expectation without editing the template again
  7. Final response says: expectation, repaired owner, verification, and any deliberate non-repair.

Safety rules:

  • One expectation should produce one narrow repair. Do not turn repair mode into a skill rewrite.
  • Do not weaken completion gates just because a past run was annoying. If the expectation conflicts with evidence safety, record the conflict and ask.
  • Prefer adding a missing row or decision rule over adding a new script. Add mechanical enforcement only when prose gates keep failing.
  • A derived skill may have stricter rules than autogoal. Repair the derived skill when the expectation is lane-specific; repair autogoal only when the expectation should apply across goal-backed work.
  • If an active goal is unrelated to the repair, do not hijack it. Ask whether to finish/block it first or run the repair after it is closed.

Derived Skill Contract

Any skill that requires or wraps autogoal should declare:

  • when it creates or continues a goal
  • which flow mode it uses by default, and how the user changes it
  • which docs/plans/templates/<template>.md it uses
  • which packs it applies by default, and which touched surfaces add more packs
  • extra start gates and completion gates it owns
  • evidence types it requires
  • final handoff shape
  • review or pressure lenses it adds
  • what remains delegated to autogoal
  • what it intentionally does not inherit from broader templates

Derived skills should route to autogoal for lifecycle mechanics instead of re-implementing plan creation, completion, blocked semantics, repair mode, or evidence closure.

Resume Protocol

After compaction, interruption, or a long pause:

  1. Read the latest user message first.
  2. Call get_goal when available.
  3. Re-read the active docs/plans path named by the goal, current workflow, or latest handoff.
  4. Find the latest verification evidence, open risk, and next owner.
  5. Continue from the newest user instruction, not from an older stale objective.
  6. Before final response, sanity-check that the answer matches the newest request and the current plan state.

If the active goal and newest request disagree, use the Active Goal Conflict Protocol before editing.

Start And Completion Gates

Project templates may define Start Gates: and Completion Gates: tables. These are template-owned audit surfaces for recurring project checks.

Keep this rule generic. Do not put project-specific commands, package-manager details, release rules, browser tooling, or repo policy in this file. Those rows belong in project-owned templates under docs/plans/templates/.

When present, gate tables must use markdown tables with these columns:

  • Gate
  • Applies
  • Evidence

They may include extra columns such as Required action. The checker treats any cell in a gate row as unresolved when it is blank, pending, TODO, or TBD.

Gate closure rules:

  • Applies must be resolved before completion.
  • yes means the evidence cell names the command, artifact, proof, source audit, or concrete result.
  • no or N/A: <reason> means the evidence cell explains why the gate does not apply.
  • A completion gate row should stay unresolved until the action or reason is recorded.
  • check-complete.mjs enforces gate-row closure mechanically, but it does not know what project-specific commands mean.
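
A minimal sketch of gate-row closure, assuming gate tables are pipe tables whose header row starts with a `Gate` cell. This is not the real check-complete.mjs; the function name is hypothetical, and the case-insensitive match on pending, TODO, and TBD is an assumption.

```javascript
// Cell values treated as unresolved (compared case-insensitively here,
// which is an assumption about the real checker).
const UNRESOLVED_CELLS = new Set(["", "pending", "todo", "tbd"]);

// Returns the Gate cell of every gate row that has an unresolved cell.
function unresolvedGateRows(markdown) {
  const unresolved = [];
  let inGateTable = false;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (!line.startsWith("|")) {
      inGateTable = false; // any non-table line ends the current table
      continue;
    }
    // Drop the outer pipes, then split into trimmed cells.
    const cells = line
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());
    if (cells[0] === "Gate") {
      inGateTable = true; // header row of a gate table
      continue;
    }
    if (!inGateTable) continue; // rows of other tables are ignored
    if (cells.every((cell) => /^:?-+:?$/.test(cell))) continue; // separator
    if (cells.some((cell) => UNRESOLVED_CELLS.has(cell.toLowerCase()))) {
      unresolved.push(cells[0]);
    }
  }
  return unresolved;
}
```

The sketch only reports that a cell is unresolved. Whether the recorded evidence is actually adequate remains a judgment outside the mechanical check.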

Start Workflow

  1. Read the user's request and any named plan, issue, logs, route, test, or source-of-truth file.
  2. Inspect the current goal with get_goal when available.
  3. Select the flow mode: one-shot execution, agent-led plan hardening, or collaborative planning.
  4. Rewrite the desired objective until it has a measurable or auditable completion threshold.
  5. If no active goal exists and the user or governing skill asked for a goal, create one with create_goal.
  6. If an active goal already matches the desired end state, continue under it.
  7. If an active goal exists but points at a different objective, do not overwrite it. Resolve the current goal honestly before starting another one. If the tool does not allow that transition, report the mismatch and ask for the smallest decision needed. A governing lane goal may proceed only when it can honestly complete or fit within the current active goal.
  8. Create the docs/plans goal plan from the checklist template before substantive work.
  9. Fill the generated plan itself before substantive work: write the objective, threshold, verification surface, constraints, boundaries, blocked condition, flow mode, and goal plan path; resolve generated gates as yes/no/N/A instead of deleting or replacing the template output.
  10. Use that exact path for check-complete.mjs.
  11. Do not start durable work until the goal is set, verified as already matching, or the user explicitly resolves the missing-goal path.

Set the goal before mutable lane state when the workflow depends on a goal. For pass-gated planning or accepted-plan execution lanes, the goal is the first durable action after the minimum read needed to derive the objective.

Goal Plan

Every active goal gets one durable goal plan. It is a single markdown file that absorbs the useful file-planning parts: phases, findings, progress, decisions, failed attempts, verification, and reboot status.

Path:

txt
docs/plans/YYYY-MM-DD-<short-goal-slug>.md
docs/plans/<ticket>-<short-goal-slug>.md

Use the ticket-prefixed form for issue-backed work. Do not create task_plan.md, findings.md, progress.md, .planning/**, docs/goals/**, .tmp/goals/**, or hook state for goal work. Hooks are overkill. The active goal plus the docs/plans file are the durable state.
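
The two path forms can be sketched as a small function. The slug rules here are an assumption (lowercase, non-alphanumeric runs collapsed to a hyphen); the real helper may normalize titles differently, and `goalPlanPath` and the sample ticket id are hypothetical.

```javascript
// Slug rules are an assumption; the real helper may normalize differently.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

function goalPlanPath({ title, ticket, date = new Date() }) {
  const slug = slugify(title);
  if (!slug) throw new Error("title must contain at least one letter or digit");
  // Issue-backed work uses the ticket prefix; otherwise use the date.
  // toISOString() yields the UTC calendar date.
  const prefix = ticket ?? date.toISOString().slice(0, 10);
  return `docs/plans/${prefix}-${slug}.md`;
}

console.log(goalPlanPath({ title: "Checkout Latency", ticket: "ABC-12" }));
// docs/plans/ABC-12-checkout-latency.md
```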

Create the goal plan with the source-owned helper whenever available:

bash
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
  --title "<short title>" \
  --template "<primary template name or path>" \
  --with "<optional pack name>"

The helper writes docs/plans/YYYY-MM-DD-<slug>.md or docs/plans/<ticket>-<slug>.md from a project-owned template. The helper lives under .agents/rules/autogoal/ because it is generic rule tooling; generated SKILL.md files are not edited by hand.

Do not pass objective, threshold, verification, constraints, boundaries, or blocked condition through CLI flags. The CLI only creates the static plan shell. After creation, edit the generated docs/plans file and write the active goal objective, completion threshold, verification surface, constraints, boundaries, blocked condition, and remaining goal-specific rows into the file.

Editing the generated file means filling and resolving that materialized shell, not replacing it with a hand-made mini-plan. Keep generated sections and rows unless the row is truly irrelevant, then mark it complete with N/A: <reason>. If a template choice is wrong before work starts, regenerate with the correct template and record the replacement. If any durable work has already started, do not swap the plan out from under the work; close the generated plan with honest evidence, N/A rows, or a blocker.

The default project template is generic:

txt
docs/plans/templates/goal.md

Project or skill-specific templates live beside it:

txt
docs/plans/templates/<template>.md

Reusable packs live under:

txt
docs/plans/templates/packs/<pack>.md

Use templates by passing the primary template name. Add packs for touched surfaces:

bash
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
  --template "<template-name>" \
  --with "<pack-name>" \
  --title "<short title>" \
  ...

Repeat --with for multiple packs, or pass a comma-separated list. The helper records Primary template: and Applied packs: in the generated plan and copies pack rows into the plan's existing gate/checklist sections.
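
Both ways of passing packs can be handled by one parser. This is an illustrative sketch, not the helper's actual argument handling; `parsePacks` is a hypothetical name, and dropping duplicate pack names is an assumption.

```javascript
// Collects pack names from repeated --with flags and comma-separated values.
// De-duplication is an assumption, not documented helper behavior.
function parsePacks(argv) {
  const packs = [];
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--with") continue;
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) {
      throw new Error("--with requires a pack name");
    }
    for (const name of value.split(",")) {
      const pack = name.trim();
      if (pack && !packs.includes(pack)) packs.push(pack);
    }
    i++; // skip the value that was just consumed
  }
  return packs;
}

const args = ["--template", "task", "--with", "docs", "--with", "agent-native,browser"];
console.log(parsePacks(args).join(", ")); // docs, agent-native, browser
```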

docs/plans/templates holds reusable project templates. Direct files under docs/plans are instantiated runtime goal plans. Do not store goal templates or active goal state under docs/goals.

Create a new project-owned template by copying the generic template:

bash
node .agents/rules/autogoal/scripts/create-goal-template.mjs \
  --skill "<skill-name>"

Then edit the new docs/plans/templates/<skill-name>.md to add that skill or project lane's mandatory sections, checklist rows, phase schedule, evidence rows, and closure gates. Keep the generic goal template project-agnostic.

Template creation is not skill creation. Do not generate .agents/rules/*, .agents/skills/*, aliases, execution handoffs, hook state, or compatibility bridges from this workflow. A project template is just a reusable static shell for a future docs/plans/* goal plan. The agent fills the real objective, threshold, verification surface, constraints, boundaries, and blocked condition inside the instantiated plan.

Before creating or updating a project template, define these inputs:

  • template name and owning skill or project lane
  • primary-template role and which packs should usually compose with it
  • display name and purpose
  • recurring failure mode the template prevents
  • use cases and non-use cases
  • allowed edit boundaries for plans created from it
  • required read-first sources and optional read-when-relevant sources
  • evidence sources and final verification surface
  • measurable score, count, pass/fail command, or artifact checklist threshold
  • required plan sections
  • required checklist rows, including skill analysis and final goal-plan check
  • phase or pass table, or an explicit reason the template needs no phases
  • completion gates and score caps when score is used
  • review or pressure lenses that must run before closeout
  • handoff, final response, and risk rows
  • blocked condition and what input would unblock it

If an input cannot be inferred from current project context, add a placeholder inside the template and label it as a generation gap. Ask the user only when the missing answer changes the template's purpose, safety model, or boundaries.

Template quality bar:

  • The template must be self-contained enough to create a useful goal plan from scratch. Do not require a sibling template to understand it.
  • Sibling templates may be used for sync review, not as hidden dependencies.
  • Packs may provide recurring touched-surface rows, but only after the helper materializes them into the generated plan. Do not rely on hidden pack state.
  • Domain facts must be placeholders or instructions unless live source proves them. Do not invent current-state, before/after, API, product, or workflow facts.
  • No template may let a goal finish from polished prose, score alone, or a completed phase table without fresh evidence.
  • Every required checklist item must map to evidence, an explicit N/A reason, or a blocker.
  • Every required section is either present in the template or omitted with a recorded reason.
  • Project templates that cover implementation work should include compact gates for review target selection, workspace-authority verification, specialized agent/tooling review when those surfaces change, and a high-risk note for public API, runtime, package-boundary, browser, agent-action, or command contract changes. Do not copy a major planning lane's scorecard, issue ledger, or full pass schedule into generic execution templates.
  • The template should prefer concrete commands, file paths, issue rows, browser routes, screenshots, benchmark names, or source-audit rows over vague "review" wording.
  • The generated plan remains the runtime truth. Do not put active goal state in docs/plans/templates.

Template sync review:

  • Instantiate the template once with create-goal-scratchpad.mjs or inspect the copied file directly when a smoke plan would create noise.
  • Verify the expected headings, checklist rows, phase/pass rows, completion gates, and blocker rows are present.
  • Verify a blank or unfinished instantiated plan fails check-complete.mjs.
  • Verify a completed plan can record the named evidence without editing the template itself.
  • After editing .agents/rules/autogoal.mdc, run pnpm install to regenerate generated skill files.

Create the plan before substantive edits. Update it after every meaningful decision, finding, tradeoff, failed attempt, review fix, verification run, or scope change. Re-read it before major decisions and after compaction or interruption.

Check the goal plan before completion:

bash
node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/<goal-plan>.md

This is the final mechanical gate, not a substitute for the named verification surface.

The goal-plan checklist is mandatory. Its first required item is skill analysis. Do not call update_goal(status: complete) while any required checklist item remains unchecked. If an item does not apply, check it and add N/A: <reason>.
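
The checklist rule can be sketched as a scan over markdown task items. This is a simplified illustration rather than the real checker: `checklistProblems` is a hypothetical name, and flagging a bare `N/A` with no reason is an assumption drawn from the rule above.

```javascript
// Flags unchecked checklist items and checked items whose N/A has no reason.
function checklistProblems(markdown) {
  const problems = [];
  for (const line of markdown.split("\n")) {
    const match = /^\s*- \[( |x|X)\] (.*)$/.exec(line);
    if (!match) continue; // not a checklist item
    const [, mark, text] = match;
    if (mark === " ") {
      problems.push(`unchecked: ${text}`);
    } else if (/N\/A:?\s*$/.test(text)) {
      // Checked as not applicable, but the reason is missing.
      problems.push(`N/A without reason: ${text}`);
    }
  }
  return problems;
}

const checklist = [
  "- [x] Skill analysis",
  "- [ ] Run focused tests",
  "- [x] Browser proof N/A:",
  "- [x] Docs update N/A: no docs touched",
].join("\n");
console.log(checklistProblems(checklist).length); // 2
```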

Required goal-plan sections:

md
# <Goal title>

Objective:
<exact active goal objective>

Flow mode:
<one-shot execution | agent-led plan hardening | collaborative planning>

Goal plan:
<docs/plans/path>

Primary template:
<docs/plans/templates/name.md>

Applied packs:
- <pack or none>

Completion threshold:
- <quantitative or auditable done row>

Verification surface:
- <tests/artifacts/browser proof/source audit>

Constraints:
- <must preserve / must not touch>

Boundaries:
- <allowed files/packages/tools>

Blocked condition:
- <condition that stops autonomous work>

Start Gates:
| Gate | Applies | Evidence |

Work Checklist:
- [ ] Actual work item or pass-specific requirement with evidence.
- [ ] ...

Completion Gates:
| Gate | Applies | Required action | Evidence |

Phase / pass table:
| Phase | Status | Evidence | Next |

Findings:
- <research, source reads, browser/visual findings as data>

Timeline:
- <timestamp> <action/evidence>

Decisions and tradeoffs:
- <decision> -> <reason> -> <risk>

Review fixes:
- <finding> -> <accepted/rejected> -> <change or reason>

Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |

Verification evidence:
- <command/artifact> -> <result>

Reboot status:
| Where am I? | Where am I going? | What is the goal? | What learned? | What done? |

Open risks:
- <risk or none>

Before update_goal(status: complete), the goal plan must include the final verification evidence, checked checklist, current reboot status, and any remaining risks.
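
A minimal sketch of a section-presence check for the required sections listed above. It assumes each section appears as a `Label:` line exactly as shown in the template. The helper name `missingSections` is hypothetical, and the real check-complete.mjs may verify the plan differently.

```javascript
// Section labels copied from the required goal-plan sections.
const REQUIRED_SECTIONS = [
  "Objective", "Flow mode", "Goal plan", "Primary template", "Applied packs",
  "Completion threshold", "Verification surface", "Constraints", "Boundaries",
  "Blocked condition", "Start Gates", "Work Checklist", "Completion Gates",
  "Phase / pass table", "Findings", "Timeline", "Decisions and tradeoffs",
  "Review fixes", "Error attempts", "Verification evidence", "Reboot status",
  "Open risks",
];

// Hypothetical helper: lists required labels that never appear as a
// standalone "Label:" line in the plan text.
function missingSections(planText) {
  const present = new Set(
    planText
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line.endsWith(":"))
      .map((line) => line.slice(0, -1))
  );
  return REQUIRED_SECTIONS.filter((label) => !present.has(label));
}

// A plan that only states its objective is missing every other section.
console.log(missingSections("Objective:\n<exact active goal objective>"));
```

Presence of a section says nothing about its content; a blank section still needs evidence before completion.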

Good Goal Examples

Performance:

txt
Reduce p95 checkout latency below 120 ms, complete only when the checkout
benchmark reports p95 < 120 ms and the correctness suite passes, while keeping
public API behavior unchanged. Use only checkout service code, benchmark
fixtures, and related tests. Maintain goal plan
`docs/plans/YYYY-MM-DD-checkout-latency.md`. After each iteration, record the
change, benchmark result, and next experiment. If the benchmark cannot run or no
valid path remains, stop with attempted paths, evidence, blocker, and needed
input.
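
The latency threshold in this example can be checked mechanically. The sketch below assumes raw latency samples in milliseconds and the nearest-rank percentile method; the real checkout benchmark may compute p95 differently, and the function names are hypothetical.

```javascript
// Nearest-rank percentile over raw samples (assumed method; a real
// benchmark may interpolate instead).
function percentile(samplesMs, p) {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// The goal's threshold is strict: p95 must be below the limit.
function meetsLatencyThreshold(samplesMs, limitMs = 120) {
  return percentile(samplesMs, 95) < limitMs;
}

// Made-up samples: 1 ms, 2 ms, ..., 100 ms.
const latencySamples = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(latencySamples, 95), meetsLatencyThreshold(latencySamples));
```

The goal in this example also requires the correctness suite to pass, so this check alone does not prove completion.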

Bug hunt:

txt
Fix the flaky checkout test on the current branch, complete only when a focused
repro fails before the fix and passes 5 consecutive runs after, while preserving
public API behavior. If the failure cannot be reproduced after the agreed
attempts, produce an evidence-backed blocker report.
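
The "fails before, passes 5 consecutive runs after" threshold can be expressed as a streak check. This is a sketch under assumptions: run outcomes are recorded as booleans in execution order, and the names `trailingPassStreak` and `flakyFixProven` are hypothetical.

```javascript
// Counts consecutive passes at the end of the run history (true = pass).
function trailingPassStreak(results) {
  let streak = 0;
  for (let i = results.length - 1; i >= 0 && results[i]; i--) streak++;
  return streak;
}

// The fix counts as proven only if the repro failed before the fix AND the
// most recent runs after the fix form an unbroken pass streak.
function flakyFixProven({ failedBeforeFix, runsAfterFix, required = 5 }) {
  return failedBeforeFix && trailingPassStreak(runsAfterFix) >= required;
}

console.log(
  flakyFixProven({
    failedBeforeFix: true,
    runsAfterFix: [true, false, true, true, true, true, true],
  })
);
```

Counting only the trailing streak means an earlier failure resets the count, which matches "consecutive".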

Research:

txt
Produce the strongest evidence-backed reproduction of the target paper
using available materials and local resources, complete only when every headline
claim has a status row: confirmed, approximate, proxy-supported, blocked, or
uncertain. Attempt every headline result where feasible and end with a report
separating confirmed mechanics, approximate reconstructions, blocked exact
replay, and remaining uncertainty.

Pass-gated planning:

txt
Close the layout plan for user review by running the scheduled passes
one activation at a time, complete only when score >= 0.92, no dimension is
below 0.85, every scheduled pass row is complete or skipped with evidence,
issue/reference sync rows are closed, closure gates pass, and final handoff is
emitted. Do not edit implementation code.
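
The numeric part of this threshold can be sketched as follows. The scorecard shape (`score` plus a `dimensions` map) and the dimension names are assumptions for illustration; the owning lane defines the real scorecard and how the overall score is computed.

```javascript
// Returns human-readable failures for the numeric threshold only.
// Pass rows, issue/reference sync, closure gates, and the final handoff
// are separate completion conditions and are not covered here.
function scoreGateFailures(scorecard, minScore = 0.92, minDimension = 0.85) {
  const failures = [];
  if (!(scorecard.score >= minScore)) {
    failures.push(`score ${scorecard.score} < ${minScore}`);
  }
  for (const [name, value] of Object.entries(scorecard.dimensions)) {
    if (value < minDimension) {
      failures.push(`${name} ${value} < ${minDimension}`);
    }
  }
  return failures;
}

// Hypothetical scorecard: overall score passes, one dimension does not.
console.log(scoreGateFailures({ score: 0.93, dimensions: { clarity: 0.9, risk: 0.84 } }));
```

A high overall score does not offset a weak dimension, so both checks run independently.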

Weak Goal Examples

txt
Improve performance
Make this better
Refactor the editor
Run all passes
Finish the project

These are weak because they lack a measurable outcome, verification surface, or scope boundary.

Pass-Gated Goals

For pass-gated lanes, prefer one lane goal when the goal tool can persist across turns. Put the pass schedule in the goal objective, run one pass per activation, and complete the goal only when closure gates prove no pass remains runnable.

Use this when a workflow has scheduled passes such as current-state read, issue discovery, intent boundary, research refresh, steelman, revision, verification sweep, or closure.

Rules:

  • The goal objective should describe the lane outcome, full pass schedule, one-pass-per-activation policy, proof gates, and closure condition.
  • Complete the current pass in the plan or progress ledger, not by closing the goal.
  • Complete the goal only when every required pass is complete or intentionally skipped with evidence.
  • Do not use separate per-pass goals; keep scheduled passes as rows in the active plan.
  • Keep pass status in the plan or progress ledger; keep goal status tied to the whole lane.

Progress fields for pass-gated lanes:

md
current_pass: current-state-read
current_pass_status: in_progress
next_pass: related-issue-discovery
goal_status: active

Allowed goal_status values:

  • active
  • complete
  • blocked
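
A minimal sketch of reading and validating the progress fields above. It assumes the fields are stored as plain `key: value` lines. The helper names are hypothetical, and only goal_status is checked against a fixed list because this skill does not enumerate the pass-level status values.

```javascript
const ALLOWED_GOAL_STATUS = ["active", "complete", "blocked"];
const PROGRESS_KEYS = ["current_pass", "current_pass_status", "next_pass", "goal_status"];

// Parses "key: value" lines into an object; other lines are ignored.
function parseProgressFields(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(/^([a-z_]+):\s*(.+)$/);
    if (match) fields[match[1]] = match[2].trim();
  }
  return fields;
}

// Reports missing fields and a goal_status outside the allowed values.
function validateProgress(fields) {
  const problems = [];
  for (const key of PROGRESS_KEYS) {
    if (!fields[key]) problems.push(`missing ${key}`);
  }
  if (fields.goal_status && !ALLOWED_GOAL_STATUS.includes(fields.goal_status)) {
    problems.push(`invalid goal_status: ${fields.goal_status}`);
  }
  return problems;
}

const progressText = [
  "current_pass: current-state-read",
  "current_pass_status: in_progress",
  "next_pass: related-issue-discovery",
  "goal_status: active",
].join("\n");

console.log(validateProgress(parseProgressFields(progressText)));
```

Pass status and goal status stay separate fields, so finishing a pass never flips goal_status by itself.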

Completion Rules

Mark a goal complete only when:

  • the outcome in the goal is actually achieved
  • the completion threshold is met as written, not approximately or nearly
  • the verification surface named by the goal was checked
  • the docs/plans goal plan is updated with final verification
  • every required goal-plan checklist item is checked or marked N/A with reason
  • node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path> passes after the final evidence is recorded
  • constraints and boundaries were respected, or deviations were explicitly accepted
  • required artifacts were created or updated
  • no required owner remains runnable
  • the final response reports the evidence, not just confidence

Do not mark complete because:

  • tests passed but the goal also required review, browser proof, docs, or a report
  • the budget is nearly exhausted
  • the current slice is done but later slices remain
  • a plan was written but execution or proof remains
  • the user says "nice" without accepting open risks

After calling update_goal(status: complete), include the tool-reported token/time usage in the user-facing closeout if the tool returns it.
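
The completion rules read as a conjunction: every condition must hold, and none of the "do not mark complete because" reasons counts toward it. A sketch with hypothetical key names, assuming the agent records each condition as an explicit boolean:

```javascript
// Keys are hypothetical; descriptions paraphrase the completion rules.
const COMPLETION_CONDITIONS = {
  outcomeAchieved: "outcome in the goal is achieved",
  thresholdMet: "completion threshold is met",
  verificationSurfaceChecked: "named verification surface was checked",
  planUpdated: "goal plan has final verification",
  checklistResolved: "every required checklist item is checked or N/A with reason",
  checkCompletePassed: "check-complete.mjs passed after final evidence",
  constraintsRespected: "constraints respected or deviations accepted",
  artifactsUpdated: "required artifacts created or updated",
  noRunnableOwner: "no required owner remains runnable",
  finalResponseReportsEvidence: "final response reports evidence",
};

// Anything not explicitly true counts as unmet. Budget pressure or passing
// tests alone are deliberately not inputs.
function unmetCompletionConditions(evidence) {
  return Object.keys(COMPLETION_CONDITIONS)
    .filter((key) => evidence[key] !== true)
    .map((key) => COMPLETION_CONDITIONS[key]);
}

// Made-up state: tests passed, but nothing else has been recorded.
console.log(unmetCompletionConditions({ outcomeAchieved: true, thresholdMet: true }));
```

Call update_goal(status: complete) only when the returned list is empty.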

Blocked Rules

Blocked is terminal for the goal, not a normal checkpoint.

Use blocked only when:

  • no autonomous next move remains
  • missing evidence, access, tooling, data, or a user decision prevents progress
  • repeated attempts show the same blocker, and the tool's blocked threshold is satisfied

Do not mark blocked when:

  • more investigation is possible
  • a different test, smaller repro, or narrower source read is available
  • the work is merely hard, slow, or broad
  • a review pass found issues that can be fixed
  • a gate failed and the failing owner is obvious

Blocked report shape:

md
Goal blocked.
Attempted:
- ...
Evidence:
- ...
Blocker:
- ...
Needed to continue:
- ...

Budget Handling

Budget exhaustion is not success.

If the system stops or warns because a goal budget is reached:

  • stop substantive work
  • summarize current evidence and remaining owners
  • name the next useful action
  • do not call the goal complete unless the original objective is already proven

Lifecycle Boundaries

Do not use update_goal for lifecycle transitions outside its contract.

The model may complete or block a goal only through update_goal when the tool contract is satisfied. Other lifecycle transitions are user/system-owned. If the user asks for a lifecycle transition and no direct tool is available, state that the current runtime does not expose that control instead of faking it with completion or blocked status.

Status Updates During Goals

Keep status short and evidence-based:

  • current checkpoint
  • what changed
  • what was verified
  • what remains
  • whether blocked
  • next concrete action

Avoid vague updates like "making progress" or "continuing investigation". If status gets vague, tighten the goal or checkpoint.

Research Goals

Research goals need stricter epistemic accounting.

Final reports should separate:

  • confirmed findings
  • approximate reconstructions
  • proxy/support-only evidence
  • blocked exact claims
  • remaining uncertainty

Do not flatten "approximate support" into "reproduced" or "fixed". A good research goal lets Codex keep working through uncertainty while preventing overclaiming.
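
One way to keep the categories from collapsing is to group claim rows by status before writing the report. The status names follow the research goal example (confirmed, approximate, proxy-supported, blocked, uncertain); the row shape and helper name are assumptions for illustration.

```javascript
const CLAIM_STATUSES = ["confirmed", "approximate", "proxy-supported", "blocked", "uncertain"];

// Groups claim rows by status. Rows with a missing or unknown status are
// returned separately so they cannot be silently reported as confirmed.
function groupClaimsByStatus(claims) {
  const groups = Object.fromEntries(CLAIM_STATUSES.map((status) => [status, []]));
  const unclassified = [];
  for (const row of claims) {
    if (CLAIM_STATUSES.includes(row.status)) {
      groups[row.status].push(row.claim);
    } else {
      unclassified.push(row.claim);
    }
  }
  return { groups, unclassified };
}

// Made-up claim rows.
const claimRows = [
  { claim: "Headline result A", status: "confirmed" },
  { claim: "Headline result B", status: "approximate" },
  { claim: "Headline result C", status: "reproduced" },
];

console.log(groupClaimsByStatus(claimRows));
```

A non-empty `unclassified` list means some headline claim still lacks a valid status row, so the report is not ready.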

Closeout Template

Use this shape when closing a goal:

md
Goal complete.
Evidence:
- <command/artifact/source>
What changed:
- <short list>
Constraints preserved:
- <short list>
Residual risk:
- <only if real>
Usage:
- <tool-reported tokens/time, when available>

For blocked:

md
Goal blocked.
Evidence:
- <what was tried>
Blocker:
- <why no autonomous progress remains>
Needed next:
- <specific user/tool/input>
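
The completion closeout shape can be generated from recorded evidence. The sketch below is a hypothetical helper, not part of the autogoal scripts; it omits "Residual risk" and "Usage" when there is nothing real to report, matching the template's "only if real" and "when available" notes.

```javascript
// Renders a titled bullet section as lines.
function bulletSection(title, items) {
  return [title, ...items.map((item) => `- ${item}`)];
}

// Hypothetical renderer for the "Goal complete." closeout shape.
function renderCloseout({ evidence, changed, constraints, residualRisk = [], usage = null }) {
  const lines = [
    "Goal complete.",
    ...bulletSection("Evidence:", evidence),
    ...bulletSection("What changed:", changed),
    ...bulletSection("Constraints preserved:", constraints),
  ];
  if (residualRisk.length > 0) lines.push(...bulletSection("Residual risk:", residualRisk));
  if (usage) lines.push(...bulletSection("Usage:", [usage]));
  return lines.join("\n");
}

// Made-up values for illustration.
console.log(
  renderCloseout({
    evidence: ["benchmark run -> p95 below threshold"],
    changed: ["checkout service query batching"],
    constraints: ["public API behavior unchanged"],
  })
);
```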