Work Task

Handle $ARGUMENTS. Start from the source of truth, load extra skills only when they earn their keep, and verify before calling the task done.

<task>#$ARGUMENTS</task>

Core Rules

Read the task source first.
Read local repo instructions and nearby implementation patterns before editing.
Search for existing patterns before inventing new ones.
Prefer the best durable ownership fix over the smallest local patch.
Treat public issue diagnoses and suggested fixes as claims to challenge, not instructions to implement.
Prefer targeted tests and checks during iteration.
Keep the user updated at milestones.
Verify the actual result before claiming done.
Do not default to research swarms, review swarms, browser proof, PRs, tracker comments, or compounding.
Before calling a task blocked on a repo-wide gate, rule out local install corruption once when the failure smells wrong for the diff.

Intake

Classify the input:
- Plain task text: the user prompt is the source of truth.
- File path or spec path: read it first.
- GitHub issue URL: fetch it with gh issue view first.
- GitHub PR URL: fetch it with gh pr view first.
- GitHub repository security advisory URL (github.com/<owner>/<repo>/security/advisories/GHSA-*): fetch it with gh api repos/<owner>/<repo>/security-advisories/<GHSA_ID> first.
- Public GitHub Advisory Database URL (github.com/advisories/GHSA-*): fetch it with gh api advisories/<GHSA_ID> first. Treat it as read-only unless you can identify a repository security advisory owned by the current repo/org.
- Bare GitHub issue like #555: resolve it against the current gh repo first, then fetch it with gh issue view.
- Linear issue link/id: fetch it with the Linear integration first.
Read the full source-of-truth context before doing anything else.
For tracker items, also read comments and attachments when available.
If tracker evidence includes video or screen recording, load video-transcripts, use or create the shared transcript cache through that skill, and require normalized <video-transcripts> XML before implementation. If the helper cannot produce it after a real attempt, stop and report the blocker.
Classify task shape:
- Testing or coverage work.
- Program or batch work.
- Ordinary one-shot work: bug, feature, refactor, docs, review, or investigation.
Classify heavyweight work:
- Heavyweight: architecture or public API redesign, breaking changes, major cross-package refactors, benchmarking, profiling strategy, scalability work, framework comparison, migration analysis, RFCs, proposals, or spec-first major changes.
- Non-heavyweight: ordinary bugs, one-package features, docs-only edits, routine test work, small refactors, or normal issue execution.
If heavyweight, load major-task immediately and let it own workflow.
If non-heavyweight, classify complexity:
- Non-trivial: multi-step, research-heavy, phased, or likely more than a few tool calls.
- Trivial: quick question, small edit, or work that does not need persistent working memory.
If non-trivial and measurable/auditable:
- load autogoal before implementation
- create or update one docs/plans goal plan from the dominant-risk primary template plus touched-surface packs:
  - docs-dominant work: --template docs
  - other normal work: --template task
  - supporting docs touched: add --with docs
  - .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling touched: add --with agent-native
  - browser/UI route or interaction touched: add --with browser
  - package exports, public API, release artifacts, or package boundary touched: add --with package-api
  - user-visible Plate registry UI, kit, example, metadata, install shape, or generated registry changelog touched: add --with registry-changelog
  - GitHub security advisory, GHSA, CVE, npm advisory, or private vulnerability disclosure touched: add --with security-advisory node .agents/skills/autogoal/scripts/create-goal-scratchpad.mjs --template <task|docs> --with <pack> --title "<short task title>"
- follow local repo overrides for where planning files live
If testing or coverage work, load testing before tdd and choose the smallest honest slice.
If program or batch work, restate the ordered scope and finish one slice at a time unless the user asked for a broader sweep.
For any tracker source, record source type/id/title, task type, acceptance criteria, caveats, likely files/routes/packages, browser surface, likely root-cause layer, and the pre-solution issue challenge verdict when applicable in the plan when a plan exists.
If code will change, decide branch handling before edits using repo policy; do not reuse an unrelated branch just because it is checked out.
If anything important is still ambiguous after the source and nearby code pass, ask the smallest useful clarifying question.

Tracker Rules

Apply only when the source is a tracker item.

Treat the tracker item as the source of truth.
Use the native tool for fetch and sync-back: gh for GitHub, Linear integration for Linear.
If useful, rename the thread to <issue-number> <issue-title>.
Prefer PR before tracker comment for verified code-changing work unless blocked or the user said not to.
Comment back only after a meaningful outcome, or when a blocker note helps the tracker owner.
Do not require PR creation, screenshots, or comments for analytical, blocked, or inconclusive work.

Public Issue Challenge Gate

Apply this before implementation when a public tracker item reports a bug, makes a user-visible behavior claim, includes a technical diagnosis, or proposes an implementation fix.

Load .agents/skills/autoreview/SKILL.md for its review contract and use that stance before code: adversarial, source-backed, and willing to reject weak claims. This is a pre-solution issue/design review, not a dirty-diff helper invocation. The structured autoreview helper remains the closeout gate after a real diff exists.
For bug reports and behavior claims, reproduce the user-visible behavior or the smallest honest failing surface before treating the issue as valid. Reproduce the bug, not the suggested fix.
Escalate reproduction from lowest/fastest to highest/slowest before declaring not reproduced:
1. focused unit/package/integration test or source-level executable repro
2. existing repo-owned Playwright regression/test harness when available and useful as executable coverage; do not use standalone Playwright, Puppeteer, or raw DevTools as a substitute for the repo Browser policy
3. [@Browser](plugin://browser@openai-bundled) against the real route or local app when tests or Playwright cannot reproduce or cannot model the user-visible surface honestly
4. Browser screenshot or explicit visual-proof waiver when layout, rendering, selection, clipboard prompts, native dialogs, or visual state matters
Mark a ladder level N/A only when that level cannot observe the reported behavior; record why. Do not call a bug not reproduced until every applicable lower-to-higher repro level has either failed to reproduce or is blocked with evidence.
For feature, docs, support, or cleanup requests with no bug or failing behavior claim, mark reproduction N/A with a reason. Still challenge any proposed technical fix against source ownership before coding.
Read the nearby ownership boundary before choosing a solution. If the durable fix is an API, abstraction, data model, or package-boundary change, prefer that over patching every caller or copying the issue's proposal.
Record a pre-solution verdict in the plan when one exists:
- valid: reproduced and worth fixing, or non-bug work whose source-backed acceptance criteria are valid with reproduction marked N/A.
- not reproduced: hard stop. Do not code. Report exact repro attempts and missing evidence. Use this only when a bug or behavior claim needed reproduction.
- invalid or wont-fix: hard stop. Do not code. Give harsh honest feedback about the bad premise or policy boundary.
- partially valid: pivot to the best long-term fix and explicitly record which part of the issue or suggested fix is wrong, narrow, or incomplete.
- platform limitation: hard stop unless a docs note, guardrail, or graceful fallback is the actual durable fix.
When the issue suggests a fix, compare it against source ownership and reject it if it only papers over symptoms, depends on stale assumptions, widens API debt, or would make a future maintainer pay for the reporter's guess.
Be blunt in tracker and final handoff language. If the premise is wrong, say so. If only half the issue is real, say which half. Nice ambiguity is how bad patches land.

Security Advisory Hotfixes

Apply this when the source or required closeout is a GitHub security advisory, GHSA, CVE request, npm advisory, private vulnerability report, or public package security hotfix.

Treat the advisory as the tracker source. Use the GitHub advisory API through gh api when the source is a GitHub repository advisory or public GHSA; do not rely on the web UI as the only proof. For npm-only advisories or private reports without a GitHub advisory, record the external advisory/report source and the external owner or blocker instead.
Public GitHub Advisory Database records from github.com/advisories/GHSA-* are read-only for normal repo maintainers. If the fix belongs to this repo, locate or create the repository security advisory before trying to update vulnerable ranges, publish, or request a CVE. If no repo advisory is owned by this repo/org, close with global GHSA readback plus external owner/blocker.
If the source is private, draft, embargoed, or not yet publicly disclosed, do not create a public PR, public issue comment, release note, or tracker sync that reveals exploit details before the fixed version is available and disclosure is approved. Use the repository advisory/private fork workflow when available. If a public PR is necessary before disclosure, keep the title, body, branch, commits, tests, and comments sanitized unless the user explicitly approves disclosure.
Use --with security-advisory in the goal plan. Also use --with package-api when a published package, changeset, or npm release is part of the fix.
Do not stop at a merged PR, merged Version Packages PR, or created GitHub Release while the advisory is still draft, lacks a patched version, or points at the wrong affected range.
Verify the patched package is actually published before publishing the advisory. For npm packages, read back npm view <package>@<version> and the GitHub release/tag when relevant.
When a repository advisory exists, update advisory vulnerabilities with the exact package, vulnerable range, and fixed version. The affected range should exclude the fixed version. Public/global GHSA records without a repo advisory are read-only; record readback and owner/blocker instead of mutating them.
Publish repository advisories after the fixed version is available. For external/npm/private advisories, record the external publication state or the owner/blocker instead.
If a repository advisory has empty cve_id and is eligible, request a CVE through the advisory API unless the user explicitly says not to; read back the advisory afterward and record that assignment may remain pending. For public/global GHSA records without a repo advisory and non-GitHub sources, record existing CVE, external CNA/request owner, GitHub/global owner, or N/A reason.
Final closeout must read back state, published timestamp, affected package, vulnerable range, patched version, CVE status, and the expected GitHub review/Dependabot propagation caveat.
Final handoff for private/draft work must state the disclosure-safe path used or the exact user approval that allowed public details.

Load Skills Only When Justified

Skill Diet

Default to task for normal work and major-task for heavyweight work. Load a niche skill only when it owns a hard domain gate, command, or proof surface that the active task would otherwise miss.

Do not keep repo-local skills for generic lifestyle, app-template, local git ops, stale command stubs, or broad CE ceremony when task, major-task, autogoal, autoreview, or a Plate-specific skill owns the workflow better.

If a generated skill is gone but skills-lock.json still references it, remove it through npx skills remove <skill> -y first. If the CLI removes the agent files but leaves stale lock entries, record that evidence before cleaning the lock.

autogoal: measurable or auditable non-trivial work. Use the dominant-risk primary template and touched-surface packs: docs-heavy work gets --template docs, normal work gets --template task, and supporting docs, browser, agent-native, security-advisory, or package/API surfaces add matching --with <pack> rows. Review expectations stay in the primary template. Do not use root planning files, hooks, .planning/**, or docs/goals/**.
major-task: heavyweight architecture, framework, migration, benchmark, or proposal work.
testing: tasks primarily about tests, coverage, regression gaps, or suite phases.
tdd: bugs and feature work where behavior-level automated coverage is sane.
learnings-researcher: non-trivial repeated domains with documented solutions.
debug: fuzzy failures after the first repro pass or failing test.
video-transcripts: tracker evidence contains a video or screen recording.
If requirements remain ambiguous after source and local context, ask the smallest clarifying question or switch to a planning goal when the user wants planning.
framework-docs-researcher: unfamiliar, version-sensitive, or unstable third-party APIs after local clones/docs are checked.
browser-use: real browser/UI surface needs verification.
agent-browser-issue: browser automation is blocked by a reusable tool-side issue.
changeset: published package work under packages/ needs release notes.
registry-changelog: user-visible Plate registry UI, kit, example, metadata, copied-code install shape, or generated registry changelog work needs a registry changelog entry or a concrete N/A reason.
plate-ui: authoring or refactoring Plate registry UI/components, static/live renderers, kits, registry wiring, or ownership/extraction decisions under apps/www/src/registry/**.
docs-creator: non-trivial docs/content work, new or rewritten pages, plugin/API/spec/serialization docs, route moves, example changes, or docs with source-backed ownership/API claims. Use --template docs when docs dominate; use --with docs when docs are a supporting touched surface under a normal or major task.
git-commit-push-pr: verified code should ship as a PR.
Review skills: load only for risky, large, user-facing, or architecture-sensitive changes.
agent-native-reviewer: changes touch .agents/**, .claude/**, AI/tooling surfaces, commands, or user actions an agent should perform.

Review And Risk Gates

Keep this lighter than Slate Plan. A normal task should not grow a scorecard, issue ledger, or pass calendar, but it still needs real closeout pressure when the patch is risky.

For public tracker implementation work, the pre-solution issue challenge gate is mandatory before writing code. It is separate from final autoreview: first decide whether the issue deserves a fix and what the long-term boundary is, then implement, then run structured autoreview on the actual diff.
Autoreview is a hard closeout gate for non-trivial implementation changes. Load .agents/skills/autoreview/SKILL.md, pick the target from the actual diff state, and keep going until there are no accepted/actionable findings. Use dirty local --mode local, branch/PR --mode branch --base <base>, and committed-slice --mode commit --commit <ref>.
agent-native-reviewer is required when the task changes .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling. Treat accepted findings like normal review findings: verify against source, fix the real issue, and rerun the relevant proof.
Source authority is workspace-local. A check run in the planning repo cannot prove behavior owned by a sibling repo, package, app, browser route, or tracker system. Record the cwd/tool that owns each proof.
For public API, runtime, package-boundary, browser behavior, agent-action, or command-contract changes, add a compact high-risk note before closeout: realistic failure mode, proof plan, and why the chosen boundary is still the right one.
Trivial docs, wording, and no-local-patch tasks may mark these gates N/A with a reason. Do not run review theater for a typo.

Execution Path

Bug

For tracker-backed bugs, complete the public issue challenge gate first.
Reproduce first when possible.
Add a behavior-level regression test when sane.
Fix the real ownership boundary, not every caller around it.
If the best fix requires an API change, make it unless task constraints rule it out.
Re-run targeted checks and browser flow only when the bug lives there.

Feature

Reduce the task to the smallest slice that satisfies acceptance criteria.
Add behavior coverage when sane.
Prefer the cleanest long-term design that fits the slice.
Verify the user-facing outcome.

Testing Or Coverage Work

Use the testing policy before choosing files or commands.
Pick the smallest honest hotspot or ordered slice.
Add or deepen focused tests instead of broad smoke coverage.
Verify with targeted commands first.

Program Or Batch Work

Respect explicit order.
Define done for the current slice before implementation.
Complete one slice cleanly unless the user asks for a broader sweep.

Refactor Or Chore

Preserve behavior.
Do not do fake TDD theater.
Improve bad APIs or abstractions when that is the real fix.
Run the narrowest regression checks plus relevant build, typecheck, or lint.

Docs Or Content

Skip engineering ceremony.
For non-trivial docs, load docs-creator and use the docs goal template.
If docs are only a supporting touched surface on another task, add the docs pack instead of switching the primary template.
For tiny copy edits, skip the docs goal and keep verification proportional.
Verify links, examples, formatting, source-backed claims, and rendered output as appropriate.

Review Or Investigation

Read relevant diff, files, and surrounding context first.
For reviews, report findings first, ordered by severity.
For investigations, identify failure mode, probable cause, and next action before changing code.
Only implement changes if the user asked for them.

Verification

Keep verification mandatory and proportional.

Run targeted tests for changed behavior.
Run package/app build and typecheck when relevant.
Run lint when code changed and repo policy expects it.
Run browser verification only for browser or UI tasks.
Run broader repo-wide gates only when repo instructions or change scope justify them.
Run verification in the workspace, package, app, route, or external system that owns the changed behavior; record the cwd when that is not obvious.
Close the high-risk note before final handoff when the task changes public API, runtime, package boundaries, browser behavior, agent actions, or command contracts.
For non-trivial implementation changes, run the autoreview gate selected from the actual diff state and close all accepted/actionable findings before final.
For agent/tooling changes, run the agent-native review gate or record why it does not apply.
If pnpm test, bun test, or pnpm check fails with local-corruption signals unrelated to the diff, run pnpm run reinstall once and rerun the exact failing command before declaring the task blocked.
If work changes published packages, satisfy the changeset gate.
If work changes user-visible registry output under apps/www/src/registry/**, satisfy the registry changelog gate.
If work changes or resolves a security advisory, satisfy the security-advisory gate through publish/readback or record the external blocker.
If work is registry-only under apps/www/src/registry/, the registry changelog gate replaces the package changeset.
If verified work changed code, create or update the PR before tracker sync-back unless the user said not to.
If the task came from a tracker item and reached a meaningful outcome, sync back unless the user said not to.

Final Handoff

Be extremely concise.
Report PR, issue/tracker, confidence, tests, browser proof, outcome, caveats, design choice, and verification only when applicable.
For non-trivial task goals, close every relevant task-template gate before the final response.
If a PR exists, keep the PR description synced to the final handoff.
For tracker comments, write for QA or the issue owner, not for internal implementation history.

Task-Style PR Body

When a task run creates or updates a PR, the PR description must mirror the task final handoff. Do not use a generic Summary / Verification PR body, an adaptive prose body from git-commit-push-pr, or a generated badge footer unless the caller or repo template explicitly asks for it.

Use the accepted task PR format from kitcn PR #270. The shape is not optional:

Preserve any existing  block. If a changeset is part of the diff and repo policy expects auto release, include that block.
Use an emoji-prefixed issue/tracker/fix line, for example 🐛 Fixes #123 or 🐛 Fixes ➖ N/A. Never include a line that links to the current PR itself; the current PR URL belongs in the final response, not in its own description.
Use an emoji confidence line, for example 🟢 95-100% confidence.
Use this exact table header: | Phase | 🧪 Tests | 🌐 Browser |
Use Reproduced and Verified rows. Mark passing proof with 🟢, repro or failing proof with 🔴, and non-applicable browser/test cells with ➖ N/A.
Use bold emoji section headings exactly in this family: **✅ Outcome**, **⚠️ Caveat**, **🏗️ Design**, and **🧪 Verified**.

The body should tell QA/reviewers what was fixed, how it was reproduced, how it was verified, and why the chosen ownership boundary is right. It must not use plain Fix:, plain Confidence:, ## Outcome, ## Verified, or a generic Summary / Verification shape for task-run PRs. After editing, verify it with gh pr view --json body before final handoff.

Success Criteria

Source-of-truth context was read first.
Relevant repo instructions and patterns were read before editing.
Tracker items were fetched and summarized correctly when provided.
Public issue claims and suggested fixes were challenged before solution; not reproduced, invalid, and won't-fix cases stopped before code.
Video evidence used video-transcripts before implementation when required.
Bare GitHub issues like #555 were resolved against the current repo.
The chosen fix addressed the highest-leverage ownership boundary available.
Non-trivial measurable work loaded autogoal and used the right goal template.
Non-trivial docs work loaded docs-creator and used --template docs.
Supporting docs, browser, agent-native, package/API, or extra-review surfaces used the matching --with <pack> rows when they were not the dominant risk.
Supporting user-visible registry surfaces used --with registry-changelog.
Security advisory hotfixes used --with security-advisory and closed release, advisory publication, CVE request, and readback evidence.
Testing work loaded the testing policy before implementation.
Only necessary skills were loaded.
Batch work did not sprawl without explicit instruction.
Verification matched change scope.
PR descriptions created by task runs used the kitcn PR #270 emoji task-style body and were verified with gh pr view --json body.
Final handoff matched the task type and any task-template gate evidence.