skills/diagnose-why-work-stopped/SKILL.md
A repeatable procedure for the recurring class of issues where the user (or a manager) points at a stalled / looping / over-recovered issue tree and asks "why did this stop / why is this looping / how do we make sure this doesn't happen again?"
This skill is diagnostic + product-design, not engineering. The output is a written root cause and an approved plan. No code changes leave this skill.
Canonical execution model: read doc/execution-semantics.md before diagnosing or proposing a new liveness/recovery rule. Use that document as the source of truth for status, action-path, post-run disposition, bounded continuation, productivity review, pause-hold, watchdog, and explicit recovery semantics. If the investigation finds a true product-rule gap, the plan should say whether doc/execution-semantics.md needs a matching update.
Trigger on an assignment whose title or body matches any of:
Also use when the user asks for forensics, root cause, or a write-up before any product change.
Every diagnosis and every proposed rule must hold these three invariants together. The user has restated them on at least four issues; treat them as load-bearing:
If a proposed rule violates any of the three, drop it or rework it. State explicitly in the plan how each invariant is held.
Before walking the tree, read doc/execution-semantics.md and keep its terms intact:
- run_liveness_continuation

Do not invent a new rule until you can state how it differs from the current execution semantics document.
Do this in the same heartbeat. Do not propose a rule until you have a concrete stop point.
- in_review with no typed execution participant, no active run, no pending interaction, and no recovery issue (PAP-2335, PAP-2674).
- in_progress after a successful run with no future action path queued (PAP-2674).
- cancelled / malformed / cross-company-inaccessible (PAP-2602).
- issue.continuation_recovery waking the same issue more than N times after successful runs (PAP-2602).

Respect the API boundary. If the linked issue is in another company and your agent token returns 403, do not bypass scoping. Either request a board-approved diagnostic path, or proceed from inferred PAP-side evidence and label that evidence as inferred.
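The stop-point patterns above can be checked mechanically once you have a snapshot of each issue. The sketch below is a minimal illustration. The `IssueSnapshot` shape, its field names, and the `maxWakes` threshold are hypothetical stand-ins, not the real API. It leaves out the cancelled / malformed / cross-company-inaccessible pattern because that one appears to depend on a linked issue rather than on the issue's own fields.

```typescript
// Hypothetical snapshot of one issue; field names are illustrative, not the real API.
type IssueStatus = "todo" | "in_progress" | "in_review" | "blocked" | "done" | "cancelled";

interface IssueSnapshot {
  id: string;
  status: IssueStatus;
  hasTypedExecutionParticipant: boolean;
  hasActiveRun: boolean;
  hasPendingInteraction: boolean;
  hasRecoveryIssue: boolean;
  lastRunSucceeded: boolean;
  hasQueuedActionPath: boolean;
  // Count of issue.continuation_recovery wakes that followed a successful run.
  continuationRecoveryWakesAfterSuccess: number;
}

type StopPoint =
  | "in_review_without_participant"
  | "in_progress_without_action_path"
  | "continuation_recovery_loop";

// Returns every stop-point pattern the snapshot matches (possibly none).
function classifyStopPoints(issue: IssueSnapshot, maxWakes: number): StopPoint[] {
  const found: StopPoint[] = [];
  // in_review with nobody and nothing that could move it forward.
  if (
    issue.status === "in_review" &&
    !issue.hasTypedExecutionParticipant &&
    !issue.hasActiveRun &&
    !issue.hasPendingInteraction &&
    !issue.hasRecoveryIssue
  ) {
    found.push("in_review_without_participant");
  }
  // in_progress after a successful run, but no future action path is queued.
  if (issue.status === "in_progress" && issue.lastRunSucceeded && !issue.hasQueuedActionPath) {
    found.push("in_progress_without_action_path");
  }
  // Continuation recovery keeps waking an issue whose runs already succeed.
  if (issue.continuationRecoveryWakesAfterSuccess > maxWakes) {
    found.push("continuation_recovery_loop");
  }
  return found;
}
```

A non-empty result is a candidate stop point to confirm against the run history, not a diagnosis on its own.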
Before proposing a new product rule, read what already shipped this week in the same area. The user has explicitly called this out: (PAP-2602) "review our recent work on liveness that we shipped in the last couple of days." A new rule that contradicts code merged 48 hours ago is rework, not improvement.
Quick survey:
State in the forensics: "I reviewed X, Y, Z. The new gap is …"
For every issue in the affected tree that is not done / cancelled / actively running, decide:
This is the table the user has asked for repeatedly (PAP-2335). Without it the plan is abstract.
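Building the table starts with the filter described above: drop issues that are done, cancelled, or actively running, and list the rest. The sketch below makes two assumptions. The `TreeIssue` shape is hypothetical. The Issue / Status / Decision columns are illustrative only, and the decision cell is left for the investigator to fill.

```typescript
// Hypothetical minimal view of one issue in the affected tree.
interface TreeIssue {
  identifier: string;
  status: string;
  activelyRunning: boolean;
}

// An issue needs an explicit decision unless it is terminal or currently running.
function needsDecision(issue: TreeIssue): boolean {
  return issue.status !== "done" && issue.status !== "cancelled" && !issue.activelyRunning;
}

// Renders a Markdown table with one row per issue that still needs a decision.
function renderDecisionTable(tree: TreeIssue[]): string {
  const header = ["| Issue | Status | Decision |", "| --- | --- | --- |"];
  const rows = tree
    .filter(needsDecision)
    .map((issue) => `| ${issue.identifier} | ${issue.status} | _to decide_ |`);
  return [...header, ...rows].join("\n");
}
```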
The user does not want a one-off patch on the named tree. They want the rule. Two checks:
doc/execution-semantics.md. Prefer citing and applying the existing contract; propose a document change only when the current doc is incomplete or contradicted by accepted/implemented behavior.

If the rule would have blocked a recent productive run from succeeding, drop or narrow it.
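The "would it have blocked a productive run" check amounts to replaying the proposed rule against recent runs. The minimal sketch below rests on several assumptions. The `RecentRun` shape is hypothetical. The `productive` flag stands in for whatever evidence of real progress the investigation uses. The rule is modelled as a predicate that returns true when it would have blocked a run.

```typescript
// Hypothetical record of a recent run in the affected area.
interface RecentRun {
  runId: string;
  issueId: string;
  succeeded: boolean;
  productive: boolean; // assumption: set from evidence that the run made real progress
}

// A proposed rule, modelled as "would this rule have blocked this run?"
type ProposedRule = (run: RecentRun) => boolean;

// Returns the successful, productive runs the rule would have blocked.
// A non-empty result means the rule should be dropped or narrowed.
function runsBlockedByRule(rule: ProposedRule, runs: RecentRun[]): RecentRun[] {
  return runs.filter((run) => run.succeeded && run.productive && rule(run));
}
```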
Write the plan into the issue's plan document. Cover:
- doc/execution-semantics.md contract already covers the case, or what exact documentation update is needed.
- Phase 0 resolves the named live tree (carefully, not destructively), Phase 1 codifies the contract in docs, and later implementation phases cover detection, recovery, UI surfacing, security review, QA, and CTO review.
- blockedByIssueIds, parallel branches identified.

Do not create the child issues yet. Do not push code.
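To identify parallel branches from the planned blockedByIssueIds edges, you can group the phases into waves. Each wave depends only on earlier waves, so everything inside one wave can run in parallel. The sketch below applies Kahn-style topological layering. The `PlannedIssue` shape and its local `key` field are hypothetical, since the child issues do not exist yet at plan time.

```typescript
// Hypothetical planned child issue; `key` is a local label, not a real issue id.
interface PlannedIssue {
  key: string;
  blockedByIssueIds: string[];
}

// Groups planned issues into waves via topological layering.
// Throws on unknown dependencies or on a dependency cycle.
function parallelWaves(issues: PlannedIssue[]): string[][] {
  const known = new Set(issues.map((issue) => issue.key));
  const remaining = new Map<string, Set<string>>();
  issues.forEach((issue) => {
    issue.blockedByIssueIds.forEach((dep) => {
      if (!known.has(dep)) throw new Error(`${issue.key} depends on unknown issue ${dep}`);
    });
    remaining.set(issue.key, new Set(issue.blockedByIssueIds));
  });

  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Everything with no unresolved blockers forms the next wave.
    const ready: string[] = [];
    remaining.forEach((deps, key) => {
      if (deps.size === 0) ready.push(key);
    });
    if (ready.length === 0) throw new Error("dependency cycle in blockedByIssueIds");
    // Remove the wave and release anything it was blocking.
    ready.forEach((key) => remaining.delete(key));
    remaining.forEach((deps) => ready.forEach((key) => deps.delete(key)));
    waves.push(ready.sort());
  }
  return waves;
}
```

For a chain of Phase 0, Phase 1, then detection and recovery both blocked only by Phase 1, detection and recovery land in the same wave. They are parallel branches.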
- request_confirmation interaction targeting the latest plan revision. Idempotency key: confirmation:{issueId}:plan:{revisionId}.

Phase 0 cleans up the live tree without papering over evidence:
- in_review leaves with no participant to todo with a precise next action and named owner (PAP-2335).
- done to clear backlog.

When the phase chain is complete, post a board-level summary comment on the parent issue: what changed, what the new contract is, what the rollout step is (e.g. "restart the control-plane to pick up the new response shape"), and the live state of the originally-named tree. Then close the parent.
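The Phase 0 step that moves participant-less in_review leaves to todo can be sketched as a planned update rather than a direct mutation. That way the change can be reviewed before it is applied, and the original status is recorded rather than overwritten. The `LeafIssue` and `PlannedUpdate` shapes and the comment wording are hypothetical. The requirement for a precise next action and a named owner comes from the step itself.

```typescript
// Hypothetical views; field names are illustrative, not the real API.
interface LeafIssue {
  id: string;
  status: string;
  hasTypedExecutionParticipant: boolean;
  childCount: number;
}

interface PlannedUpdate {
  issueId: string;
  fromStatus: string;
  toStatus: "todo";
  nextAction: string;
  owner: string;
  comment: string; // keeps the evidence of where the issue was stuck
}

// Returns null when the issue is not a participant-less in_review leaf.
function planLeafReset(issue: LeafIssue, nextAction: string, owner: string): PlannedUpdate | null {
  const isParticipantlessReviewLeaf =
    issue.status === "in_review" && issue.childCount === 0 && !issue.hasTypedExecutionParticipant;
  if (!isParticipantlessReviewLeaf) return null;
  if (nextAction.trim() === "" || owner.trim() === "") {
    throw new Error(`${issue.id}: a precise next action and a named owner are required`);
  }
  return {
    issueId: issue.id,
    fromStatus: issue.status,
    toStatus: "todo",
    nextAction,
    owner,
    comment: `Moved from ${issue.status} to todo: no execution participant. Next: ${nextAction} (owner: ${owner}).`,
  };
}
```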
- request_confirmation against the latest plan revision is open.
- blockedByIssueIds dependencies.
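The plan confirmation uses the idempotency key confirmation:{issueId}:plan:{revisionId}. That key lets a repeated request for the same plan revision be treated as a duplicate, while a new revision gets its own confirmation. The minimal sketch below rests on two assumptions. The key format comes from the approval step. The in-memory `ConfirmationRegistry` is a hypothetical stand-in for the real interaction store.

```typescript
// Builds the idempotency key for a plan confirmation.
function planConfirmationKey(issueId: string, revisionId: string): string {
  if (!issueId || !revisionId) throw new Error("issueId and revisionId are required");
  return `confirmation:${issueId}:plan:${revisionId}`;
}

// Hypothetical in-memory registry standing in for the interaction store.
class ConfirmationRegistry {
  private readonly open = new Map<string, { issueId: string; revisionId: string }>();

  // Opens a confirmation unless one with the same key already exists.
  request(issueId: string, revisionId: string): { key: string; created: boolean } {
    const key = planConfirmationKey(issueId, revisionId);
    if (this.open.has(key)) return { key, created: false };
    this.open.set(key, { issueId, revisionId });
    return { key, created: true };
  }
}
```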