.agents/skills/autogoal/SKILL.md
Use this when the user asks for a durable objective, long-running autonomous work, goal setup, or when a governing repo skill requires goal setup before work starts.
This skill turns a vague "keep going" instruction into a thread-scoped completion contract: what should be true, how it is verified, what must not change, and when Codex should stop.
A normal prompt says: do the next thing.
A goal says: keep working until this outcome is true, or until the evidence shows a real blocker.
Goals are for work where the next move depends on what Codex learns along the way: debugging, migrations, flaky tests, benchmark tuning, deep research, large refactors, prototypes, browser-proof loops, and pass-gated plans.
Goals are not a permission slip to wander. They are a scoped, evidence-checked contract.
No measurable outcome, no goal. A goal must have a verification surface and a
completion threshold before create_goal is called. Prefer numbers: score,
count, latency, coverage, pass count, failing-to-passing repro count, issue
rows, or explicit command success. When a numeric target does not fit, use a
binary artifact checklist that can be audited from files, commands, screenshots,
browser proof, or source-backed citations.
autogoal is the goal lifecycle kernel. It owns:
It does not own project policy. Keep repo commands, package managers, browser
tools, release rules, PR policy, scorecards, issue ledgers, and lane-specific
pass schedules in derived skills or docs/plans/templates/<template>.md.
Derived skills may be stricter than autogoal; they should not duplicate the
goal lifecycle. autogoal says how work remains honest. The derived skill says
what the lane actually requires.
Goal plans are composable, but only through static materialization.
The model is:
- `docs/plans` plan file

The primary template is chosen by dominant risk: `task` for normal execution,
`docs` for docs-dominant work, `major-task` for heavyweight architecture or
proposal work, `slate-plan` for Slate plan lanes, and so on.
Packs are chosen by touched surface. They add recurring gates without becoming parents:
- `docs`: docs are touched but not the dominant deliverable
- `agent-native`: `.agents/**`, `.claude/**`, `.codex/**`, skills, hooks,
  commands, prompts, or user-action tooling changed
- `browser`: real browser, route, UI, console, network, or interaction proof
  is required
- `package-api`: package exports, public API, release artifacts, package
  boundaries, or package-level checks changed

Core execution and review gates belong in the primary template. Packs are only for optional touched surfaces that would otherwise be absent from that template.
Do not create runtime inheritance between templates. The helper copies pack rows
into the generated plan's Start Gates, Work Checklist, and
Completion Gates. After creation, the generated plan is the truth; the checker
validates that materialized plan only.
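As a rough illustration of static materialization (not the helper's actual implementation), the sketch below copies pack rows into the generated plan once, at creation time. The `Template` shape, the `materialize` function, and the sample rows are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

SECTIONS = ("Start Gates", "Work Checklist", "Completion Gates")


@dataclass
class Template:
    """Hypothetical in-memory shape for a primary template or a pack."""
    name: str
    rows: dict = field(default_factory=dict)  # section name -> list of row strings


def materialize(primary, packs):
    """Copy primary rows, then each pack's rows, into one flat plan."""
    plan = {section: list(primary.rows.get(section, [])) for section in SECTIONS}
    for pack in packs:
        for section in SECTIONS:
            plan[section].extend(pack.rows.get(section, []))
    plan["Primary template"] = primary.name
    plan["Applied packs"] = [pack.name for pack in packs]
    return plan


task = Template("task", {"Completion Gates": ["focused tests pass"]})
docs = Template("docs", {"Completion Gates": ["docs build passes"]})
plan = materialize(task, [docs])

# Editing a pack afterwards does not change the generated plan.
docs.rows["Completion Gates"].append("added later")
print(plan["Completion Gates"])  # ['focused tests pass', 'docs build passes']
```

Because the rows are copied rather than referenced, the generated plan stays the single source of truth after creation, which is the property the checker relies on.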
The generated plan is the dedicated plan shell. Fill that exact file
immediately after generation: replace placeholders, resolve every gate row, and
mark non-applicable generated rows as N/A: <reason> with evidence. Do not
delete, wholesale replace, or hand-narrow the generated plan into an ad hoc
smaller plan after durable work has started. If the selected template is plainly
wrong and no substantive work has started, regenerate once with the right
template and record why. If work has already started, keep the generated plan
and close it honestly.
Use packs like this:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--template task \
--with docs \
--with agent-native \
--title "<short task title>"
Examples:
- `--template docs`
- `--template task --with docs`
- `--template task --with agent-native`
- `--template task --with browser`
- `--template task --with package-api`
- `--template major-task`
- `--template major-task --with docs --with package-api`

If two packs add related gates, keep both when they protect different failure modes. If they duplicate exactly the same proof, keep the more specific pack and record the other as N/A in the plan.
Classify goal-backed work before creating or updating a plan:
- `micro`: one narrow, auditable outcome; no cross-file state; no meaningful
  continuation loop. Use a tiny plan only when a repo rule requires it, or
  record the audit surface directly in the final response.
- `normal`: multi-step work with concrete evidence and likely continuation.
  Use the appropriate `docs/plans` template and close all relevant gates.
- `major`: architecture, migrations, benchmarks, framework comparisons,
  broad refactors, pass-gated lanes, or public API/runtime risk. Use a derived
  skill or project template with phases, risk rows, review gates, and explicit
  closure criteria.

Do not inflate a micro work item into a ceremony pile. Do not shrink a major work item into a checklist that cannot catch real risk.
Every goal-backed workflow chooses exactly one flow mode before durable work starts. The mode controls the human review boundary; it does not weaken the evidence or completion rules.
Use one-shot execution for issue-like or work-item-like work where the agent is expected to read the source, derive the local plan, implement, verify, and hand off the result without stopping for plan approval.
Rules:
Use agent-led plan hardening when the requested output is a plan and the user wants the agent to drive toward the best plan with minimal human interruption.
Rules:
Use collaborative planning when the user and agent are intentionally shaping the plan together before execution.
Rules:
Flow-mode selection belongs in the derived skill or the instantiated plan when the caller knows it. If no caller specifies a mode, default to one-shot execution for implementation tasks, agent-led plan hardening for autonomous planning/review requests, and collaborative planning when the user is actively brainstorming or asking for plan acceptance before work.
- `create_goal` or goal setup.
- `autogoal repair <expectation>` after any goal-backed workflow
  missed their expectation, and they want the owning rule/template repaired for
  future runs.

This is agent-native. Use the goal tools directly when available:
- `get_goal` to inspect the current thread goal.
- `create_goal` to start a new active goal.
- `update_goal(status: complete)` only when the objective is genuinely met.
- `update_goal(status: blocked)` only when no autonomous progress remains and
  the same blocker has recurred enough to satisfy the tool contract.

There can be only one active goal per thread. Repeated `create_goal` calls fail
while a goal exists. Always call `get_goal` first; call `create_goal` only when
it returns no goal; use `update_goal` to complete or block the active goal.
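The call order above can be sketched with an in-memory stand-in. `FakeGoalTools`, `GoalExistsError`, and `ensure_goal` are hypothetical names for illustration; the real goal tools are runtime tools invoked by the agent, and their exact return shapes are an assumption here.

```python
class GoalExistsError(RuntimeError):
    """Raised by the stand-in when create_goal is called twice."""


class FakeGoalTools:
    """Hypothetical in-memory stand-in for the thread's goal tools."""

    def __init__(self):
        self._goal = None

    def get_goal(self):
        return self._goal

    def create_goal(self, objective):
        # Only one goal per thread: a second create call fails.
        if self._goal is not None:
            raise GoalExistsError("a goal already exists in this thread")
        self._goal = {"objective": objective, "status": "active"}
        return self._goal

    def update_goal(self, status):
        if self._goal is None:
            raise RuntimeError("no goal to update")
        if status not in ("complete", "blocked"):
            raise ValueError("only complete or blocked are allowed")
        self._goal["status"] = status
        return self._goal


def ensure_goal(tools, objective):
    """Inspect first; create only when the thread has no goal."""
    existing = tools.get_goal()
    if existing is not None:
        return existing, False  # caller must classify the existing goal
    return tools.create_goal(objective), True
```

When `ensure_goal` returns `False` as its second value, nothing was created, and the existing goal still has to be classified before any durable state is touched.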
When get_goal returns a goal, classify it before touching durable state:
- `same`: the existing goal already describes the current requested end state.
  Continue under it and keep its plan current.
- `same but stale plan`: the goal is right but the plan is stale. Repair the
  plan first, then continue.
- `newer user correction`: the latest user message narrows, reverses, or
  corrects the goal. Record the correction in the plan, follow the newest
  instruction, and do not call the old objective complete unless it is actually
  true.
- `different objective`: the active goal is unrelated. Do not hijack it. If no
  lifecycle tool can pause, resume, cancel, or replace it, say so briefly and
  proceed only with degraded plan state when the user explicitly says to go.
- `paused or externally controlled`: do not fake completion or blocked status
  to escape the tool. Continue only if the latest user instruction clearly
  authorizes the new work, and record the mismatch in the plan.

Never mark a goal complete because the user changed their mind. Completion means the objective is true. A correction changes the work path; it does not retroactively prove the old objective.
Do not invent a goal state file when a goal tool is available. If goal tools are not available, record degraded control state in the active plan only when the repo workflow requires that fallback; otherwise state that goal tools are not available and continue with the nearest safe workflow.
A strong goal defines eight things:
Use this objective shape:
<desired end state>, complete only when <quantitative or auditable threshold>,
verified by <specific evidence>, and when the active goal plan passes
`node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path>`, while
preserving <constraints>. Use flow mode <one-shot execution | agent-led plan
hardening | collaborative planning> and <allowed inputs/tools/boundaries>.
Maintain goal plan <docs/plans/path>. Between iterations, <progress log and
next-move policy>. If blocked or no valid path remains, report <attempts,
evidence, blocker, and needed input>.
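A minimal sketch of filling the objective shape above, assuming the fields are supplied as plain strings. `build_objective` and its field names are hypothetical; the only fixed parts are the three flow modes and the checker command named in this document.

```python
FLOW_MODES = (
    "one-shot execution",
    "agent-led plan hardening",
    "collaborative planning",
)
CHECKER = "node .agents/rules/autogoal/scripts/check-complete.mjs"
REQUIRED = (
    "end_state", "threshold", "evidence", "plan_path", "constraints",
    "flow_mode", "boundaries", "progress_policy", "blocked_report",
)
SHAPE = (
    "{end_state}, complete only when {threshold}, verified by {evidence}, "
    "and when the active goal plan passes `{checker} {plan_path}`, while "
    "preserving {constraints}. Use flow mode {flow_mode} and {boundaries}. "
    "Maintain goal plan {plan_path}. Between iterations, {progress_policy}. "
    "If blocked or no valid path remains, report {blocked_report}."
)


def build_objective(**fields):
    """Refuse to build an objective with a blank slot or unknown flow mode."""
    missing = [key for key in REQUIRED if not str(fields.get(key, "")).strip()]
    if missing:
        raise ValueError(f"objective is missing: {', '.join(missing)}")
    if fields["flow_mode"] not in FLOW_MODES:
        raise ValueError(f"unknown flow mode: {fields['flow_mode']!r}")
    values = {key: fields[key] for key in REQUIRED}
    return SHAPE.format(checker=CHECKER, **values)
```

Rejecting a blank threshold or evidence slot up front mirrors the rule that a goal without a measurable outcome should not be created at all.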
Before calling create_goal, rewrite vague objectives into measurable ones.
Required:
- `no extra constraints`

Quantitative examples:
- `p95 < 120 ms`
- `score >= 0.92 and no dimension below 0.85`
- `0 accepted review findings`
- `all 12 pass rows complete or skipped with evidence`
- `focused repro fails before fix and passes 5 consecutive runs after`
- `no stale symbol matches from rg`

Auditable non-numeric examples:
Reject or rewrite:
Do not make check-complete.mjs the whole goal. That only proves the plan looks
closed, not that the work is true.
Use the hybrid rule for every goal:
- `docs/plans` goal plan records the fresh evidence for that threshold.
- `node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path>` is
  the final mechanical gate before `update_goal(status: complete)`.

The checker validates that the goal plan has no unchecked required checklist items, no unresolved gate rows, no open phase/pass rows, concrete verification evidence, current reboot status, and recorded risks. It does not replace tests, browser proof, source audits, benchmark output, or other named verification evidence.
Every completion proof should fit at least one evidence type:
- `command`: exact command, cwd, and pass/fail result.
- `source-audit`: exact files or search query proving a static property.
- `browser`: route, interaction, screenshot or console/network caveat.
- `artifact`: generated file, report, table, PR body, issue comment, or
  exported asset.
- `review`: reviewer/tool used, accepted findings, fixes, and remaining
  rejected findings with reasons.
- `external-source`: cited URL, issue, paper, docs page, or connected app
  result used as authority.
- `N/A: <reason>`: why a recurring gate does not apply.

Evidence must name the owning workspace, package, app, route, or tool when that ownership matters. A root-level check cannot prove a sibling repo, app route, browser surface, or external tracker unless the plan explains why it is the owning surface.
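One way to keep evidence rows honest is to model them as small records. The sketch below is an assumption about shape only: the `Evidence` class, its fields, and the rendered line format are hypothetical and are not something the checker requires.

```python
from dataclasses import dataclass

EVIDENCE_KINDS = {
    "command", "source-audit", "browser", "artifact",
    "review", "external-source", "N/A",
}


@dataclass(frozen=True)
class Evidence:
    kind: str
    detail: str      # exact command and result, file or query, route, URL, or N/A reason
    owner: str = ""  # owning workspace, package, app, route, or tool

    def __post_init__(self):
        if self.kind not in EVIDENCE_KINDS:
            raise ValueError(f"unknown evidence kind: {self.kind!r}")
        if not self.detail.strip():
            raise ValueError("evidence needs a concrete detail or an N/A reason")

    def render(self):
        """Format one plan line; N/A rows carry only their reason."""
        if self.kind == "N/A":
            return f"N/A: {self.detail}"
        scope = f" [{self.owner}]" if self.owner else ""
        return f"{self.kind}{scope}: {self.detail}"
```

For example, `Evidence("command", "example-check exited 0", "packages/example").render()` yields a line that names both the proof and the surface it belongs to; `example-check` and `packages/example` are placeholder values.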
Trigger this mode when the arguments start with:
repair <expectation>
Repair mode is self-improvement with a leash. It converts a concrete expectation miss from a goal-backed run into the smallest durable change to the owning rule, template, helper, or active plan.
Use it for misses like:
Do not use it for:
- `.agents/skills/*/SKILL.md` by hand

Target selection order:
1. `Template:`, skill name, phase table, and completion gates to identify the owner.
2. `.agents/skills/<skill>/SKILL.md` first, then
   `docs/plans/templates/<skill>.md` when it exists.
3. `.agents/rules/autogoal.mdc` and `docs/plans/templates/goal.md`.

Repair scope matrix:
| Miss | Primary repair owner |
|---|---|
| Current plan has wrong status, row, evidence, or handoff fields | active docs/plans/* plan |
| Future generated plans need a recurring section, gate, row, or placeholder | docs/plans/templates/<owner>.md |
| Agent chose the wrong workflow, target, proof standard, or completion rule | .agents/rules/<owner>.mdc |
| Prose keeps failing and the miss is mechanically checkable | .agents/rules/autogoal/scripts/* plus focused script proof |
| Derived skill adds lane-specific ceremony or policy | derived skill rule/template, not autogoal |
| Universal lifecycle rule is missing across goal-backed work | .agents/rules/autogoal.mdc |
Repair workflow:
Restate the expectation in one sentence.
Identify the miss with source evidence: plan row, final response shape, missing gate, bad status, wrong template, or stale generated skill.
Pick exactly one primary owner. Patch secondary owners only when sync is required, such as source rule plus project template.
Create a repair plan with:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--template goal-repair \
--title "<short repair title>"
If a repair is truly trivial, record why no separate repair plan is needed.
Patch source-of-truth files only. Never hand-edit generated
.agents/skills/*/SKILL.md; after changing .agents/rules/**, run
pnpm install.
Prove the repair:
- `rg` for the new rule/gate/wording
- `.agents/rules/**` changed
- `check-complete.mjs`

Final response says: expectation, repaired owner, verification, and any deliberate non-repair.
Safety rules:
- `autogoal`. Repair the derived
  skill when the expectation is lane-specific; repair `autogoal` only when the
  expectation should apply across goal-backed work.

Any skill that requires or wraps `autogoal` should declare:
- `docs/plans/templates/<template>.md` it uses
- `autogoal`

Derived skills should route to `autogoal` for lifecycle mechanics instead of
re-implementing plan creation, completion, blocked semantics, repair mode, or
evidence closure.
After compaction, interruption, or a long pause:
- `get_goal` when available.
- `docs/plans` path named by the goal, current workflow, or
  latest handoff.

If the active goal and newest request disagree, use the Active Goal Conflict Protocol before editing.
Project templates may define Start Gates: and Completion Gates: tables.
These are template-owned audit surfaces for recurring project checks.
Keep this rule generic. Do not put project-specific commands, package-manager
details, release rules, browser tooling, or repo policy in this file. Those rows
belong in project-owned templates under docs/plans/templates/.
When present, gate tables must use markdown tables with these columns:
- `Gate`
- `Applies`
- `Evidence`

They may include extra columns such as `Required action`. The checker treats any
cell in a gate row as unresolved when it is blank, `pending`, `TODO`, or `TBD`.
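The unresolved-cell rule can be sketched as a small table scan. This is an illustration of the rule as stated, not the source of `check-complete.mjs`; the function names are hypothetical, and treating the markers case-insensitively is an assumption.

```python
UNRESOLVED = {"", "pending", "todo", "tbd"}


def split_row(line):
    """Split one markdown table row into trimmed cell strings."""
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return [cell.strip() for cell in text.split("|")]


def is_separator(cells):
    """True for the |---|---| row under the header."""
    return all(cell and set(cell) <= set("-: ") for cell in cells)


def unresolved_gate_rows(table_lines):
    """Return body rows that still contain a blank or placeholder cell."""
    rows = [split_row(line) for line in table_lines if line.strip().startswith("|")]
    body = [row for row in rows[1:] if not is_separator(row)]
    return [row for row in body if any(cell.lower() in UNRESOLVED for cell in row)]


table = """\
| Gate | Applies | Evidence |
|---|---|---|
| Typecheck | yes | command exited 0 |
| Browser proof | TBD | |
| Docs | N/A: no docs touched | no docs files in diff |
""".splitlines()

print(unresolved_gate_rows(table))  # [['Browser proof', 'TBD', '']]
```

A scan like this only proves that rows are filled in. Whether the named evidence is true still depends on the verification surface itself.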
Gate closure rules:
- `Applies` must be resolved before completion.
- `yes` means the evidence cell names the command, artifact, proof, source
  audit, or concrete result.
- `no` or `N/A: <reason>` means the evidence cell explains why the gate does
  not apply.
- `check-complete.mjs` enforces gate-row closure mechanically, but it does not
  know what project-specific commands mean.

1. `get_goal` when available.
2. `create_goal`.
3. `docs/plans` goal plan from the checklist template before
   substantive work.
4. `check-complete.mjs`.

Set the goal before mutable lane state when the workflow depends on a goal. For pass-gated planning or accepted-plan execution lanes, the goal is the first durable action after the minimum read needed to derive the objective.
Every active goal gets one durable goal plan. It is a single markdown file that absorbs the useful file-planning parts: phases, findings, progress, decisions, failed attempts, verification, and reboot status.
Path:
docs/plans/YYYY-MM-DD-<short-goal-slug>.md
docs/plans/<ticket>-<short-goal-slug>.md
Use the ticket-prefixed form for issue-backed work. Do not create
task_plan.md, findings.md, progress.md, .planning/**,
docs/goals/**, .tmp/goals/**, or hook state for goal work. Hooks are
overkill. The active goal plus the docs/plans file are the durable state.
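The two path forms can be produced by a small function. This is a sketch under assumptions: `goal_plan_path` and `slugify` are hypothetical, and the slug rule (lowercase, hyphen-separated) is a guess rather than the helper's documented behavior.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase, hyphen-separated slug (an assumed rule, not the helper's)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def goal_plan_path(title, ticket=None, today=None):
    """Use the ticket prefix for issue-backed work, else the ISO date."""
    prefix = ticket if ticket else (today or date.today()).isoformat()
    return f"docs/plans/{prefix}-{slugify(title)}.md"


print(goal_plan_path("Checkout latency", today=date(2025, 1, 2)))
# docs/plans/2025-01-02-checkout-latency.md
print(goal_plan_path("Fix flaky test", ticket="ABC-12"))
# docs/plans/ABC-12-fix-flaky-test.md
```

The ticket id `ABC-12` and the date are placeholder inputs chosen for the example.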
Create the goal plan with the source-owned helper whenever available:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--title "<short title>" \
--template "<primary template name or path>" \
--with "<optional pack name>"
The helper writes docs/plans/YYYY-MM-DD-<slug>.md or
docs/plans/<ticket>-<slug>.md from a project-owned template. The helper lives
under .agents/rules/autogoal/ because it is generic rule tooling; generated
SKILL.md files are not edited by hand.
Do not pass objective, threshold, verification, constraints, boundaries, or
blocked condition through CLI flags. The CLI only creates the static plan shell.
After creation, edit the generated docs/plans file and write the active goal
objective, completion threshold, verification surface, constraints, boundaries,
blocked condition, and remaining goal-specific rows into the file.
Editing the generated file means filling and resolving that materialized shell,
not replacing it with a hand-made mini-plan. Keep generated sections and rows
unless the row is truly irrelevant, then mark it complete with N/A: <reason>.
If a template choice is wrong before work starts, regenerate with the correct
template and record the replacement. If any durable work has already started,
do not swap the plan out from under the work; close the generated plan with
honest evidence, N/A rows, or a blocker.
The default project template is generic:
docs/plans/templates/goal.md
Project or skill-specific templates live beside it:
docs/plans/templates/<template>.md
Reusable packs live under:
docs/plans/templates/packs/<pack>.md
Use templates by passing the primary template name. Add packs for touched surfaces:
node .agents/rules/autogoal/scripts/create-goal-scratchpad.mjs \
--template "<template-name>" \
--with "<pack-name>" \
--title "<short title>" \
...
Repeat --with for multiple packs, or pass a comma-separated list. The helper
records Primary template: and Applied packs: in the generated plan and
copies pack rows into the plan's existing gate/checklist sections.
docs/plans/templates holds reusable project templates. Direct files under
docs/plans are instantiated runtime goal plans. Do not store goal templates or
active goal state under docs/goals.
Create a new project-owned template by copying the generic template:
node .agents/rules/autogoal/scripts/create-goal-template.mjs \
--skill "<skill-name>"
Then edit the new docs/plans/templates/<skill-name>.md to add that skill or
project lane's mandatory sections, checklist rows, phase schedule, evidence
rows, and closure gates. Keep the generic goal template project-agnostic.
Template creation is not skill creation. Do not generate .agents/rules/*,
.agents/skills/*, aliases, execution handoffs, hook state or compatibility
bridges from this workflow. A project template is just a reusable static shell
for a future docs/plans/* goal plan. The agent fills the real objective,
threshold, verification surface, constraints, boundaries, and blocked condition
inside the instantiated plan.
Before creating or updating a project template, define these inputs:
If an input cannot be inferred from current project context, add a placeholder inside the template and label it as a generation gap. Ask the user only when the missing answer changes the template's purpose, safety model, or boundaries.
Template quality bar:
- `docs/plans/templates`.

Template sync review:
- `create-goal-scratchpad.mjs` or inspect the
  copied file directly when a smoke plan would create noise.
- `check-complete.mjs`.
- `.agents/rules/autogoal.mdc`, run `pnpm install` to regenerate
  generated skill files.

Create the plan before substantive edits. Update it after every meaningful decision, finding, tradeoff, failed attempt, review fix, verification run, or scope change. Re-read it before major decisions and after compaction or interruption.
Check the goal plan before completion:
node .agents/rules/autogoal/scripts/check-complete.mjs docs/plans/<goal-plan>.md
This is the final mechanical gate, not a substitute for the named verification surface.
The goal-plan checklist is mandatory. Its first required item is skill analysis.
Do not call
update_goal(status: complete) while any required checklist item remains
unchecked. If an item does not apply, check it and add N/A: <reason>.
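The checklist rule can be illustrated with a short scan for open items. This is a sketch, not the logic of `check-complete.mjs`: `unchecked_items` and `may_complete` are hypothetical names, and the `- [ ]` pattern is assumed from the plan template.

```python
import re

UNCHECKED = re.compile(r"^\s*[-*]\s+\[ \]\s+(.*)$")


def unchecked_items(plan_text):
    """Return the text of every checklist item that is still open."""
    items = []
    for line in plan_text.splitlines():
        match = UNCHECKED.match(line)
        if match:
            items.append(match.group(1))
    return items


def may_complete(plan_text, threshold_met):
    """Both halves must hold: real evidence and a closed checklist."""
    return threshold_met and not unchecked_items(plan_text)


plan = """\
Work Checklist:
- [x] Skill analysis recorded.
- [x] Browser proof. N/A: no UI touched.
- [ ] Focused repro passes 5 consecutive runs.
"""

print(unchecked_items(plan))
# ['Focused repro passes 5 consecutive runs.']
```

`may_complete` takes the threshold result as a separate input on purpose: a closed checklist alone does not show that the objective is true.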
Required goal-plan sections:
# <Goal title>
Objective:
<exact active goal objective>
Flow mode:
<one-shot execution | agent-led plan hardening | collaborative planning>
Goal plan:
<docs/plans/path>
Primary template:
<docs/plans/templates/name.md>
Applied packs:
- <pack or none>
Completion threshold:
- <quantitative or auditable done row>
Verification surface:
- <tests/artifacts/browser proof/source audit>
Constraints:
- <must preserve / must not touch>
Boundaries:
- <allowed files/packages/tools>
Blocked condition:
- <condition that stops autonomous work>
Start Gates:
| Gate | Applies | Evidence |
Work Checklist:
- [ ] Actual work item or pass-specific requirement with evidence.
- [ ] ...
Completion Gates:
| Gate | Applies | Required action | Evidence |
Phase / pass table:
| Phase | Status | Evidence | Next |
Findings:
- <research, source reads, browser/visual findings as data>
Timeline:
- <timestamp> <action/evidence>
Decisions and tradeoffs:
- <decision> -> <reason> -> <risk>
Review fixes:
- <finding> -> <accepted/rejected> -> <change or reason>
Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |
Verification evidence:
- <command/artifact> -> <result>
Reboot status:
| Where am I? | Where am I going? | What is the goal? | What learned? | What done? |
Open risks:
- <risk or none>
Before update_goal(status: complete), the goal plan must include the final
verification evidence, checked checklist, current reboot status, and any
remaining risks.
Performance:
Reduce p95 checkout latency below 120 ms, complete only when the checkout
benchmark reports p95 < 120 ms and the correctness suite passes, while keeping
public API behavior unchanged. Use only checkout service code, benchmark
fixtures, and related tests. Maintain goal plan
`docs/plans/YYYY-MM-DD-checkout-latency.md`. After each iteration, record the
change, benchmark result, and next experiment. If the benchmark cannot run or no
valid path remains, stop with attempted paths, evidence, blocker, and needed
input.
Bug hunt:
Fix the flaky checkout test on the current branch, complete only when a focused
repro fails before the fix and passes 5 consecutive runs after, while preserving
public API behavior. If the failure cannot be reproduced after the agreed
attempts, produce an evidence-backed blocker report.
Research:
Produce the strongest evidence-backed reproduction of the target paper
using available materials and local resources, complete only when every headline
claim has a status row: confirmed, approximate, proxy-supported, blocked, or
uncertain. Attempt every headline result where feasible and end with a report
separating confirmed mechanics, approximate reconstructions, blocked exact
replay, and remaining uncertainty.
Pass-gated planning:
Close the layout plan for user review by running the scheduled passes
one activation at a time, complete only when score >= 0.92, no dimension is
below 0.85, every scheduled pass row is complete or skipped with evidence,
issue/reference sync rows are closed, closure gates pass, and final handoff is
emitted. Do not edit implementation code.
Improve performance
Make this better
Refactor the editor
Run all passes
Finish the project
These are weak because they lack a measurable outcome, verification surface, or scope boundary.
For pass-gated lanes, prefer one lane goal when the goal tool can persist across turns. Put the pass schedule in the goal objective, run one pass per activation, and complete the goal only when closure gates prove no pass remains runnable.
Use this when a workflow has scheduled passes such as current-state read, issue discovery, intent boundary, research refresh, steelman, revision, verification sweep, or closure.
Rules:
Progress fields for pass-gated lanes:
current_pass: current-state-read
current_pass_status: in_progress
next_pass: related-issue-discovery
goal_status: active
Allowed goal_status values:
- `active`
- `complete`
- `blocked`

Mark a goal complete only when:
- `docs/plans` goal plan is updated with final verification
- `node .agents/rules/autogoal/scripts/check-complete.mjs <docs/plans/path>` passes
  after the final evidence is recorded

Do not mark complete because:
When calling update_goal(status: complete), include the tool's final token/time
usage in the user-facing closeout when the tool returns it.
Blocked is terminal for the goal, not a normal checkpoint.
Use blocked only when:
Do not mark blocked when:
Blocked report shape:
Goal blocked.
Attempted:
- ...
Evidence:
- ...
Blocker:
- ...
Needed to continue:
- ...
Budget exhaustion is not success.
If the system stops or warns because a goal budget is reached:
Do not use update_goal for lifecycle transitions outside its contract.
The model may complete or block a goal only through update_goal when the tool
contract is satisfied. Other lifecycle transitions are user/system-owned. If
the user asks for a lifecycle transition and no direct tool is available, state
that the current runtime does not expose that control instead of faking it with
completion or blocked status.
Keep status short and evidence-based:
Avoid vague updates like "making progress" or "continuing investigation". If status gets vague, tighten the goal or checkpoint.
Research goals need stricter epistemic accounting.
Final reports should separate:
Do not flatten "approximate support" into "reproduced" or "fixed". A good research goal lets Codex keep working through uncertainty while preventing overclaiming.
Use this shape when closing a goal:
Goal complete.
Evidence:
- <command/artifact/source>
What changed:
- <short list>
Constraints preserved:
- <short list>
Residual risk:
- <only if real>
Usage:
- <tool-reported tokens/time, when available>
For blocked:
Goal blocked.
Evidence:
- <what was tried>
Blocker:
- <why no autonomous progress remains>
Needed next:
- <specific user/tool/input>