docs/plans/2026-06-15-task-repro-escalation-ladder.md
Objective: Add task repro escalation ladder; done when task rule/template/skill require tests -> Playwright -> Browser before not-reproduced and verification passes.
Goal plan: docs/plans/2026-06-15-task-repro-escalation-ladder.md
Template: docs/plans/templates/task.md
Primary template: docs/plans/templates/task.md
Applied packs:
Task source:
not reproduced; allow N/A only when a level
cannot observe the claim; verify sync/review.
Completion threshold:
.agents/rules/task.mdc, generated .agents/skills/task/SKILL.md, and
docs/plans/templates/task.md require the repro escalation ladder before
not reproduced.
pnpm install syncs generated skills after rule edits.
node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-15-task-repro-escalation-ladder.md passes.
Verification surface:
rg across task rule, generated task skill, and task
template.
pnpm install, pnpm lint:fix, local autoreview, and
node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-15-task-repro-escalation-ladder.md.
Constraints:
Boundaries:
.agents/rules/*.mdc owns
generated skill mirrors.
.agents/rules/task.mdc, generated
.agents/skills/task/SKILL.md via pnpm install,
docs/plans/templates/task.md, and this active plan.
Output budget strategy:
sed/rg reads with output caps; no broad repo scans.
Blocked condition:
Task state:
Current verdict:
Pre-solution issue challenge:
pnpm install.
Completion rule:
Do not call update_goal(status: complete) while any required checklist item
remains unchecked. If an item does not apply, check it and add N/A: <reason>.
Do not call update_goal(status: complete) until every completion threshold
above is satisfied, final handoff evidence is recorded, and
node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-15-task-repro-escalation-ladder.md passes.
Start Gates:
| Gate | Applies | Evidence |
|---|---|---|
| Skill analysis before edits | yes | Loaded task and autogoal; Browser is a workflow policy target, not a live browser run. |
| Active goal checked or created | yes | get_goal returned none; create_goal created this objective. |
| Source of truth read before edits | yes | User request, task skill, autogoal skill, task rule, and task template read. |
| Tracker comments and attachments read | N/A: no tracker item | User request is the source. |
| Video transcript evidence required | N/A: no video | No tracker video evidence. |
| Pre-solution issue challenge required | yes | This repair extends that gate. |
| Reproduction verdict before implementation | N/A: workflow repair | No product bug; source-backed workflow claim. |
| Repro escalation ladder selected | yes | tests/source -> Playwright -> Browser -> screenshot/visual waiver. |
| Suggested fix reviewed against durable boundary | yes | Best boundary is task rule/template, not ad hoc issue-by-issue behavior. |
| docs/solutions checked for non-trivial existing-code work | N/A: workflow rule/template edit | No product implementation domain. |
| TDD decision before behavior change or bug fix | N/A: no runtime behavior bug | Source audit/review is the proof. |
| Branch decision for code-changing task | N/A: user did not ask for commit/PR | Edit current checkout only. |
| Release artifact decision | N/A: no package/runtime release | No changeset or registry changelog. |
| Browser tool decision for browser surface | N/A: no live browser surface | The Browser plugin is referenced in policy text only. |
| PR expectation decision | no | User asked for workflow update, not PR. |
| Tracker sync expectation decision | N/A: no tracker | No issue/Linear sync. |
| Output budget strategy recorded | yes | Focused reads/searches with caps. |
| Agent-native pack selected | yes | Task changes .agents/** workflow rules. |
| Agent-facing action surface identified | yes | Agents read .agents/skills/task/SKILL.md; source is .agents/rules/task.mdc. |
| Source rule versus generated mirror boundary identified | yes | Edit .agents/rules/task.mdc, regenerate skill mirror with pnpm install. |
| agent-native-reviewer loaded or waiver recorded | N/A: no separate agent-native review needed | Source/generation audit covers the changed agent instruction surface. |
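The ladder selected above (tests/source -> Playwright -> Browser -> screenshot/visual waiver) can be read as a small decision procedure. The sketch below is hypothetical JavaScript, not code from the repo: the level keys, the result values (reproduced, not-reproduced, na, blocked), and reproVerdict are illustrative names. It assumes a not-reproduced verdict is allowed only after every level records a recognized outcome, and that N/A or blocker entries must carry a reason.

```javascript
// Hypothetical sketch of the repro escalation ladder:
// tests/source -> Playwright -> Browser -> screenshot/visual waiver.
// Level keys, result values, and reproVerdict are illustrative assumptions,
// not an API from the repo.
const LADDER = ["tests/source", "playwright", "browser", "screenshot"];
const KNOWN_RESULTS = new Set(["reproduced", "not-reproduced", "na", "blocked"]);

function reproVerdict(outcomes) {
  for (const level of LADDER) {
    const entry = outcomes[level];
    if (!entry || !KNOWN_RESULTS.has(entry.result)) {
      // No recognized outcome recorded: escalation stopped before this level.
      return { verdict: "incomplete", missing: level };
    }
    if (entry.result === "reproduced") {
      // Any level that reproduces the claim ends the ladder.
      return { verdict: "reproduced", at: level };
    }
    const skipped = entry.result === "na" || entry.result === "blocked";
    if (skipped && !entry.reason) {
      // N/A and blocker entries must record why the level was skipped.
      return { verdict: "incomplete", missing: `${level} reason` };
    }
  }
  // Every level either failed to reproduce or has a reasoned N/A/blocker.
  return { verdict: "not-reproduced" };
}

// Tests alone cannot justify "not reproduced" for a browser-only claim.
console.log(reproVerdict({ "tests/source": { result: "not-reproduced" } }));

console.log(
  reproVerdict({
    "tests/source": { result: "not-reproduced" },
    playwright: { result: "not-reproduced" },
    browser: { result: "not-reproduced" },
    screenshot: { result: "na", reason: "no visual or native state involved" },
  }),
);
```

Stopping after a green test run leaves the verdict incomplete, which is the failure mode the High-risk mini gate records: tests fail to reproduce a browser-only issue and the task ends too early.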
Work Checklist:
<video-transcripts> XML, or marked N/A with reason.
valid, not reproduced, invalid,
wont-fix, partially valid, or platform limitation. Feature, docs,
support, or cleanup requests with no bug claim may mark reproduction
N/A with reason.
[@Browser](plugin://browser@openai-bundled)
next when Playwright cannot reproduce or cannot model the surface
honestly; screenshot or explicit visual-proof waiver when visual/native
state matters.
.agents/**, .claude/**,
.codex/**, skills, hooks, commands, prompts, or user-action tooling.
.agents/rules/** changed, or N/A reason is recorded.
Completion Gates:
| Gate | Applies | Required action | Evidence |
|---|---|---|---|
| Named verification threshold | yes | Run the command, proof, source audit, or artifact check named in this plan | pnpm install, pnpm lint:fix, source audit, agent-native source audit, and autoreview pass/fix cycle completed. |
| Pre-solution issue challenge verdict | yes | Record reporter claim, suggested fix, repro verdict, validity verdict, durable boundary, and hard-stop/pivot decision before implementation | Verdict recorded above; this is workflow repair, not product bug. |
| Repro escalation ladder | yes | For bug/behavior claims, record test/source-level, Playwright, Browser, and screenshot/visual-proof outcomes or N/A/blocker reasons before not reproduced | Template and rule now require tests/source -> Playwright -> Browser -> screenshot/visual waiver before not reproduced. |
| Bug reproduced before fix | N/A: workflow repair, not product bug | Record failing test/repro or N/A with reason | Repro ladder recorded as N/A for this repair. |
| Targeted behavior verification | yes | Run focused test/proof for changed behavior or record N/A | Source audit proves source rule, generated skill, and task template contain the ladder. |
| TypeScript or typed config changed | N/A: markdown/rule text only | Run relevant typecheck | No TS or typed config files changed. |
| Package exports or file layout changed | N/A: no package exports/file layout | Run pnpm brl before final verification and keep generated barrel updates | No barrel or export surface changed. |
| Package manifests, lockfile, or install graph changed | N/A: no manifest/lockfile edit | Run pnpm install and relevant package checks | pnpm install still ran for skill sync; lockfile was up to date. |
| Agent rules or skills changed | yes | Run pnpm install and verify generated skill sync | pnpm install passed and skiller applied Codex rules. |
| Workspace authority proof | yes | Run verification in the owning repo/package/app/route/tool and record cwd; do not count the wrong workspace as proof | All commands ran in /Users/zbeyens/git/plate, which owns task rules/templates. |
| Browser surface changed | N/A: policy text only | Capture Browser Use proof or record explicit waiver/blocker | No live Browser route changed; policy now tells future tasks when to use Browser. |
| Browser final proof | N/A: no live browser surface | Attach screenshot or exact browser verification caveat when browser proof applies | No screenshot needed for docs/rule text change. |
| CI-controlled template output changed | N/A: no CI-controlled template output | Restore generated template output or record why it is intentionally kept | docs/plans/templates/task.md is source template. |
| Package behavior or public API changed | N/A: no package behavior/API | Add a changeset or record why no changeset applies | No changeset needed. |
| Registry-only component work changed | N/A: no registry component work | Update tooling/data/plate-ui-changelog.mdx, run node tooling/scripts/generate-ui-changelog-entries.mjs --write, or record N/A | No registry files changed. |
| Docs or content changed | yes | For docs-heavy work, use --template docs; for incidental docs, verify source-backed claims, links, examples, and rendered output or record N/A | Workflow template/rule docs changed; source-backed audit passed. |
| High-risk mini gate | yes | For public API/runtime/package-boundary/browser/agent-action/command-contract changes, record realistic failure mode, proof plan, and why the chosen boundary is right; otherwise N/A | Failure mode: tasks stop after tests fail to repro a browser-only issue; fixed by ladder requiring escalation to Playwright and Browser before not reproduced. |
| Agent-native review for agent/tooling changes | yes | For .agents/**, .claude/**, .codex/**, skills, hooks, commands, prompts, or user-action tooling, load .agents/skills/agent-native-reviewer/SKILL.md and close accepted/actionable findings, or record N/A | Agent-native source audit: generated .agents/skills/task/SKILL.md includes ladder and points to .agents/rules/task.mdc. |
| Local install corruption suspected | N/A: no suspicious failure | Run pnpm run reinstall once, rerun the exact failing command, or record N/A | No install-corruption signal. |
| Autoreview for non-trivial implementation changes | yes | Load .agents/skills/autoreview/SKILL.md; use dirty local --mode local, branch/PR --mode branch --base <base>, or committed slice --mode commit --commit <ref> until no accepted/actionable findings, or record N/A for docs-only/trivial/no local patch | First two runs found plan-state P2s; third run found generic Playwright could bypass Browser policy; all fixed before final rerun. Final rerun clean: no accepted/actionable findings. |
| PR create or update | N/A: no PR requested and no tracker source | Run check before PR work and sync PR body to the task-style final handoff | User asked for local workflow update, not PR. |
| Task-style PR body verified | N/A: no PR | Verify the PR body with gh pr view --json body; it must preserve auto-release blocks when applicable, must not include a current-PR self-link, and must use the kitcn PR #270 emoji format: ๐ Fixes ..., 🟢 95-100% confidence, Phase / 🧪 Tests / ๐ Browser table, and bold emoji Outcome/Caveat/Design/Verified sections | No PR body exists. |
| PR proof image hosting | N/A: no PR/browser proof | If PR body needs browser proof, replace local image paths with hosted GitHub URLs or record N/A | No images needed. |
| Tracker sync-back | N/A: no tracker source | Post concise issue/Linear sync after PR exists, or record N/A/blocker | No issue/Linear item. |
| Final handoff contract | yes | Fill the final handoff fields below with exact PR/issue/confidence/tests/browser/outcome/caveats/design/verification content or N/A reason | Final handoff fields filled below. |
| Final lint | yes | Run pnpm lint:fix or scoped equivalent | pnpm lint:fix passed; no fixes applied. |
| Output budget discipline | yes | Verify no unbounded high-volume command output was streamed, or record the accidental output and recovery | Used scoped sed/rg reads and command output caps. |
| Goal plan complete | yes | Run node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-15-task-repro-escalation-ladder.md | Final mechanical check passed. |
| Agent source / generated sync | yes | Run pnpm install when .agents/rules/** changed and verify generated mirrors | pnpm install ran; generated task skill includes ladder. |
| Agent action discoverability | yes | Source-audit the skill/rule path an agent will read | rg found ladder in .agents/skills/task/SKILL.md. |
| Agent-native review | yes | Load .agents/skills/agent-native-reviewer/SKILL.md and close accepted findings, or record N/A | Skill loaded; no action parity gap for instruction-only workflow change. |
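The completion rule (a gate that does not apply is still checked off with N/A: <reason>) is mechanical enough to sketch. This is hypothetical JavaScript and is not the repo's check-complete.mjs, whose behavior this plan does not describe; findGateProblems and its rules are assumptions based on the conventions in the gate tables: Applies is yes, no, or N/A: <reason>, and the last column holds evidence.

```javascript
// Hypothetical sketch, not the repo's check-complete.mjs: report gate rows
// that break the plan's completion rule. Assumed conventions: Applies is
// "yes", "no", or "N/A: <reason>", and the last column holds evidence.
function findGateProblems(markdownTable) {
  const problems = [];
  // Only lines that start with a pipe are treated as table rows.
  const rows = markdownTable
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"));

  // Skip the header row and the |---| delimiter row.
  for (const row of rows.slice(2)) {
    // Drop the empty strings produced by the leading and trailing pipes.
    const cells = row.split("|").slice(1, -1).map((cell) => cell.trim());
    const gate = cells[0] ?? "";
    const applies = cells[1] ?? "";
    const evidence = cells.length > 2 ? cells[cells.length - 1] : "";

    const validApplies =
      applies === "yes" || applies === "no" || /^N\/A: \S/.test(applies);
    if (!validApplies) {
      problems.push(`${gate}: Applies must be yes, no, or "N/A: <reason>"`);
    }
    if (evidence === "") {
      problems.push(`${gate}: missing evidence`);
    }
  }
  return problems;
}

const table = [
  "| Gate | Applies | Evidence |",
  "|---|---|---|",
  "| Active goal checked or created | yes | get_goal returned none |",
  "| Tracker comments and attachments read | N/A | |",
].join("\n");

// Flags the bare N/A and the empty evidence cell on the second gate row.
console.log(findGateProblems(table));
```

Because the sketch only reads lines that start with a pipe, a row that loses its leading | is skipped rather than reported; a stricter checker would flag it.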
Phase / pass table:
| Phase | Status | Evidence | Next |
|---|---|---|---|
| Intake and source read | complete | source rule, template, and skill docs read | implementation |
| Implementation | complete | task rule/template patched; generated skill synced | verification |
| Verification | complete | lint, source audits, agent-native source audit, autoreview finding accepted/fixed | closeout |
| PR / tracker sync | N/A: user did not ask for PR and no tracker applies | no PR/tracker owner | final response |
| Closeout | complete | plan filled after autoreview finding | final response |
Findings:
not reproduced.
Decisions and tradeoffs:
.agents/skills/task/SKILL.md directly would drift.
Implementation notes:
.agents/rules/task.mdc and docs/plans/templates/task.md.
pnpm install to regenerate .agents/skills/task/SKILL.md.
Review fixes:
Error attempts:
| Error / failed attempt | Count | Next different move | Resolution |
|---|---|---|---|
| Initial patch missed generated plan shape | 1 | Read exact plan and apply smaller patches | Fixed. |
| First autoreview found unfinished plan | 1 | Fill completion gates and evidence | Fixed. |
| Second autoreview found stale pending final-review text | 1 | Remove stale pending text and rerun review | Fixed. |
| Third autoreview found generic Playwright could bypass Browser policy | 1 | Limit Playwright to repo-owned regression harness and keep Browser proof path | Fixed. |
Verification evidence:
pnpm install in /Users/zbeyens/git/plate: passed; skiller applied Codex
rules and regenerated .agents/skills/task/SKILL.md.
rg -n "Escalate reproduction|Playwright|\\[@Browser\\]|screenshot|Repro escalation ladder"
across .agents/rules/task.mdc, .agents/skills/task/SKILL.md,
docs/plans/templates/task.md, and this plan: passed.
pnpm lint:fix in /Users/zbeyens/git/plate: passed; no fixes applied.
.agents/rules/task.mdc.
.agents/skills/autoreview/scripts/autoreview --mode local: first run found
unfinished-plan P2; second run found stale pending-review text; third run
found generic Playwright could bypass Browser policy; all fixed. Final rerun
clean with no accepted/actionable findings.
node .agents/skills/autogoal/scripts/check-complete.mjs docs/plans/2026-06-15-task-repro-escalation-ladder.md:
passed.
Final handoff contract:
not reproduced.
.agents/rules/task.mdc plus docs/plans/templates/task.md,
with generated .agents/skills/task/SKILL.md synced by pnpm install.
autogoal lifecycle is fine; this is task-specific
public issue repro behavior.
pnpm install, pnpm lint:fix, source audit, agent-native source
audit, autoreview pass/fix cycle, and goal checker.
Task-style PR body contract:
<!-- auto-release:start --> block. If a changeset is
part of the diff and repo policy expects auto release, include that block.
๐ Fixes #123 or ๐ Fixes โ N/A, then
an emoji confidence line like 🟢 95-100% confidence.
| Phase | 🧪 Tests | ๐ Browser |.
Reproduced and Verified rows. Mark passing proof with 🟢, repro or
failing proof with 🔴, and non-applicable cells with โ N/A.
**✅ Outcome**, **⚠️ Caveat**,
**๐๏ธ Design**, and **🧪 Verified**.
Summary / Verification PR body, an
adaptive prose body from a git helper skill, plain ## Outcome sections, or
an unrelated generated badge footer unless the caller or repo template
explicitly asks for it.
gh pr view --json body output or a concise source-backed summary
of that output.
Final handoff / sync:
Timeline:
pnpm install to sync generated task skill.
pnpm lint:fix; passed.
Reboot status:
| Question | Answer |
|---|---|
| Where am I? | Closeout |
| Where am I going? | Update goal complete, final response |
| What is the goal? | Add tests -> Playwright -> Browser repro escalation before not-reproduced |
| What have I learned? | The rule change was fine, but durable plan state must be closed before handoff |
| What have I done? | Updated task rule/template, regenerated skill, linted, audited, accepted review finding |
Open risks: