doc/execution-semantics.md
- Status: Current implementation guide
- Date: 2026-04-26
- Audience: Product and engineering
This document explains how Paperclip interprets issue assignment, issue status, execution runs, wakeups, parent/sub-issue structure, and blocker relationships.
doc/SPEC-implementation.md remains the V1 contract. This document is the detailed execution model behind that contract.
Paperclip separates four concepts that are easy to blur together:
The system works best when those are kept separate.
An issue has at most one assignee.
- `assigneeAgentId` means the issue is owned by an agent
- `assigneeUserId` means the issue is owned by a human board user

This is a hard invariant. Paperclip is single-assignee by design.
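One way to make the single-assignee invariant hard to violate is to collapse the two nullable fields into one owner value at the boundary. The following is a minimal TypeScript sketch, assuming issues carry the `assigneeAgentId` and `assigneeUserId` fields named above. The `Assignee` union and `toAssignee` helper are hypothetical illustrations, not Paperclip's actual code.

```typescript
// Hypothetical sketch: only the two field names come from this document.
type Assignee =
  | { kind: "agent"; agentId: string }
  | { kind: "user"; userId: string }
  | { kind: "none" };

interface IssueAssignmentFields {
  assigneeAgentId: string | null;
  assigneeUserId: string | null;
}

// Converts the two nullable columns into a single owner and rejects any
// row that violates the single-assignee invariant.
function toAssignee(issue: IssueAssignmentFields): Assignee {
  const { assigneeAgentId, assigneeUserId } = issue;
  if (assigneeAgentId !== null && assigneeUserId !== null) {
    throw new Error("invariant violated: issue has both an agent and a user assignee");
  }
  if (assigneeAgentId !== null) return { kind: "agent", agentId: assigneeAgentId };
  if (assigneeUserId !== null) return { kind: "user", userId: assigneeUserId };
  return { kind: "none" };
}
```

Downstream code can then switch on `kind` instead of re-checking two columns everywhere.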
Paperclip issue statuses are not just UI labels. They imply different expectations about ownership and execution.
`backlog`: The issue is not ready for active work.
`todo`: The issue is actionable but not actively claimed.
`in_progress`: The issue is actively owned work.
For agent-owned issues, `in_progress` should not be allowed to become a silent dead state.
`blocked`: The issue cannot proceed until something external changes.
This is the right state for:
`in_review`: Execution work is paused because the next move belongs to a reviewer or approver, not the current executor.
`done`: The work is complete and terminal.
`cancelled`: The work will not continue and is terminal.
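The status set and its terminal subset can be captured in a few lines. This is a sketch under the assumption that statuses are stored as the plain strings listed above. `ISSUE_STATUSES`, `isTerminal`, and `parseStatus` are hypothetical names.

```typescript
// Hypothetical sketch: the status names come from this document.
const ISSUE_STATUSES = ["backlog", "todo", "in_progress", "blocked", "in_review", "done", "cancelled"] as const;
type IssueStatus = (typeof ISSUE_STATUSES)[number];

// done and cancelled are the only terminal states.
const TERMINAL_STATUSES: readonly IssueStatus[] = ["done", "cancelled"];

function isTerminal(status: IssueStatus): boolean {
  return TERMINAL_STATUSES.includes(status);
}

// Rejects strings that are not known statuses instead of treating them as UI labels.
function parseStatus(value: string): IssueStatus {
  const match = ISSUE_STATUSES.find((s) => s === value);
  if (match === undefined) throw new Error(`unknown issue status: ${value}`);
  return match;
}
```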
The execution model differs depending on assignee type.
Agent-owned issues are part of the control plane's execution loop.
User-owned issues are not executed by the heartbeat scheduler.
This is why in_progress can be strict for agents without forcing the same runtime rules onto human-held work.
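The split between agent-owned and user-owned work can be expressed as a scope check: only agent-owned issues are held to the "no silent `in_progress`" rule. The following is a rough sketch. It assumes, as a simplification, that every non-terminal agent-owned issue is in the execution loop. The function names are hypothetical.

```typescript
// Hypothetical sketch; field names follow this document, helpers are illustrative.
type Status = "backlog" | "todo" | "in_progress" | "blocked" | "in_review" | "done" | "cancelled";

interface IssueOwnership {
  status: Status;
  assigneeAgentId: string | null;
  assigneeUserId: string | null;
}

// Simplification: treat any non-terminal agent-owned issue as part of the execution loop.
function isInExecutionLoop(issue: IssueOwnership): boolean {
  const terminal = issue.status === "done" || issue.status === "cancelled";
  return issue.assigneeAgentId !== null && !terminal;
}

// The strict "must have a live path" rule applies only inside that loop.
// A human can hold an in_progress issue without any run existing.
function mustHaveLivePathWhenInProgress(issue: IssueOwnership): boolean {
  return issue.status === "in_progress" && isInExecutionLoop(issue);
}
```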
Checkout is the bridge from issue ownership to active agent execution.
- `in_progress`
- `checkoutRunId` represents the issue-ownership lock for the current agent run
- `executionRunId` represents the currently active execution path for the issue

These are related but not identical:

- `checkoutRunId` answers who currently owns execution rights for the issue
- `executionRunId` answers which run is actually live right now

Paperclip already clears stale execution locks and can adopt some stale checkout locks when the original run is gone.
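The difference between the two locks shows up in how each is reconciled after its run disappears. The following is a sketch under stated assumptions. `checkoutRunId` and `executionRunId` come from this document. The `Run` shape, `reconcileLocks`, and the adoption rule (only a live run of the same assignee may adopt) are hypothetical. The document says only that *some* stale checkout locks can be adopted, so the real rules may be narrower.

```typescript
// Hypothetical sketch; the adoption policy below is illustrative only.
interface IssueLocks {
  assigneeAgentId: string | null;
  checkoutRunId: string | null; // who owns execution rights for the issue
  executionRunId: string | null; // which run is actually live right now
}

interface Run {
  id: string;
  agentId: string;
  live: boolean;
}

function reconcileLocks(issue: IssueLocks, runs: Run[], candidate: Run | null): IssueLocks {
  const isLive = (runId: string | null): boolean =>
    runId !== null && runs.some((r) => r.id === runId && r.live);

  const next: IssueLocks = { ...issue };

  // A stale execution lock is simply cleared.
  if (!isLive(next.executionRunId)) next.executionRunId = null;

  // A stale checkout lock may be adopted, here only by a live run of the same assignee.
  if (
    next.checkoutRunId !== null &&
    !isLive(next.checkoutRunId) &&
    candidate !== null &&
    candidate.live &&
    candidate.agentId === next.assigneeAgentId
  ) {
    next.checkoutRunId = candidate.id;
  }
  return next;
}
```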
Paperclip uses two different relationships for different jobs.
`parentId`: This is structural.
Use it for:
Do not treat `parentId` as an execution dependency by itself.
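A readiness check that honors this rule consults blockers only and ignores the parent link entirely. The following is a minimal sketch assuming issues expose `parentId` and `blockedByIssueIds` as described. `IssueNode` and `unresolvedBlockers` are hypothetical names. As a conservative assumption, only `done` counts as resolved, and a blocker missing from the map stays unresolved.

```typescript
// Hypothetical sketch: readiness depends on blockers, never on parentId.
interface IssueNode {
  id: string;
  status: string;
  parentId: string | null; // structure only
  blockedByIssueIds: string[]; // dependency semantics
}

function unresolvedBlockers(issue: IssueNode, byId: Record<string, IssueNode>): string[] {
  // parentId is deliberately not consulted here.
  return issue.blockedByIssueIds.filter((id) => byId[id]?.status !== "done");
}
```

A parent that really waits on a child lists that child in `blockedByIssueIds`. Otherwise, an open child does not hold the parent back.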
`blockedByIssueIds`: This is dependency semantics.
Use it for:
Blocked issues should stay idle while blockers remain unresolved. Paperclip should not create a queued heartbeat run for that issue until the final blocker is done and the issue_blockers_resolved wake can start real work.
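The gating rule amounts to this: when an issue becomes `done`, wake only the dependents for which it was the last unresolved blocker. The following sketch assumes that shape. The wake reason `issue_blockers_resolved` comes from this document. `BlockedIssue`, `Wake`, and `wakesAfterDone` are hypothetical.

```typescript
// Hypothetical sketch of the blocker gate; data shapes are assumptions.
interface BlockedIssue {
  id: string;
  status: string;
  blockedByIssueIds: string[];
}

interface Wake {
  issueId: string;
  reason: "issue_blockers_resolved";
}

// Called when doneIssueId transitions to done. Dependents that still wait on
// another blocker get nothing queued and stay idle.
function wakesAfterDone(doneIssueId: string, issues: BlockedIssue[]): Wake[] {
  const statusOf = (id: string): string | undefined =>
    id === doneIssueId ? "done" : issues.find((i) => i.id === id)?.status;
  return issues
    .filter((i) => i.blockedByIssueIds.includes(doneIssueId))
    .filter((i) => i.blockedByIssueIds.every((id) => statusOf(id) === "done"))
    .map((i) => ({ issueId: i.id, reason: "issue_blockers_resolved" as const }));
}
```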
If a parent is truly waiting on a child, model that with blockers. Do not rely on the parent/child relationship alone.
For agent-owned, non-terminal issues, Paperclip should never leave work in a state where nobody is responsible for the next move and nothing will wake or surface it.
This is a visibility contract, not an auto-completion contract. If Paperclip cannot safely infer the next action, it should surface the ambiguity with a blocked state, a visible comment, or an explicit recovery issue. It must not silently mark work done from prose comments or guess that a dependency is complete.
An issue is healthy when the product can answer "what moves this forward next?" without requiring a human to reconstruct intent from the whole thread. An issue is stalled when it is non-terminal but has no live execution path, no explicit waiting path, and no recovery path.
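The healthy/stalled definition can be read as a three-way predicate over a non-terminal issue. The following sketch collapses each path into a boolean. The `ActionPaths` flags stand in for the richer per-status checks this document describes, and all names are hypothetical.

```typescript
// Hypothetical sketch of the healthy/stalled definition.
interface ActionPaths {
  liveExecution: boolean; // active run, queued wake, or queued continuation
  explicitWaiting: boolean; // e.g. blockers, typed participant, pending approval
  recovery: boolean; // an explicit recovery issue exists
}

type Health = "terminal" | "healthy" | "stalled";

function classify(status: string, paths: ActionPaths): Health {
  if (status === "done" || status === "cancelled") return "terminal";
  return paths.liveExecution || paths.explicitWaiting || paths.recovery ? "healthy" : "stalled";
}
```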
The valid action-path primitives are:
- `executionState.currentParticipant`
- `assigneeUserId`

`todo`: This is dispatch state, ready to start but not yet actively claimed.
A healthy dispatch state means at least one of these is true:
- `todo` after a completed agent heartbeat, with no interrupted dispatch evidence

An assigned `todo` issue is stalled when dispatch was interrupted, no wake remains queued or running, and no recovery path has been opened.
`in_progress`: This is active-work state.
A healthy active-work state means at least one of these is true:
An agent-owned in_progress issue is stalled when it has no active run, no queued continuation, and no explicit recovery surface. A still-running but silent process is not automatically stalled; it is handled by the active-run watchdog contract.
`in_review`: This is review/approval state. Execution is paused because the next move belongs to a reviewer, approver, board user, or recovery owner.
A healthy in_review issue has at least one valid action path:
- `assigneeUserId`

Agent-assigned `in_review` with no typed participant is healthy only when one of the other paths exists. Assignment to the same agent that produced the handoff is not, by itself, a review path.
An in_review issue is stalled when it has no typed participant, no pending interaction or approval, no user owner, no active run, no queued wake, and no explicit recovery issue. Paperclip should surface that state as recovery work rather than silently completing the issue or leaving blocker chains parked indefinitely.
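The stalled condition for `in_review` lists exactly which paths count. The following sketch reproduces that list. `assigneeUserId` and `executionState.currentParticipant` come from this document, and the participant is assumed to be a nullable string. The boolean flags are assumed summaries of state Paperclip would look up (interactions, approvals, runs, wakes, recovery issues). Agent assignment is carried in the shape but deliberately not counted.

```typescript
// Hypothetical sketch; flag names are assumptions.
interface InReviewIssue {
  assigneeAgentId: string | null; // deliberately NOT treated as a review path
  assigneeUserId: string | null;
  executionState: { currentParticipant: string | null };
  hasPendingInteractionOrApproval: boolean;
  hasActiveRun: boolean;
  hasQueuedWake: boolean;
  hasRecoveryIssue: boolean;
}

function isStalledInReview(issue: InReviewIssue): boolean {
  const actionPaths = [
    issue.executionState.currentParticipant !== null, // typed participant
    issue.hasPendingInteractionOrApproval,
    issue.assigneeUserId !== null, // user owner
    issue.hasActiveRun,
    issue.hasQueuedWake,
    issue.hasRecoveryIssue,
  ];
  return !actionPaths.some((present) => present);
}
```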
`blocked`: This is explicit waiting state.
A healthy blocked issue has an explicit waiting path:
A blocker chain is covered only when its unresolved leaf is live or explicitly waiting. An intermediate blocked issue does not make the chain healthy by itself.
A blocked issue is stalled when the unresolved blocker leaf has no active run, queued wake, typed participant, pending interaction or approval, user owner, external owner/action, or recovery issue. In that case the parent should show the first stalled leaf instead of presenting the dependency as calmly covered.
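Finding "the first stalled leaf" is a depth-first walk over unresolved blockers. The following sketch assumes each issue has a precomputed `hasActionPath` flag standing in for the checks above (active run, queued wake, participant, and so on). `ChainIssue` and `firstStalledLeaf` are hypothetical. It treats only `done` as resolved, and it reports an unknown blocker rather than assuming coverage.

```typescript
// Hypothetical sketch of locating the first stalled leaf in a blocker chain.
interface ChainIssue {
  id: string;
  status: string;
  blockedByIssueIds: string[];
  hasActionPath: boolean;
}

function firstStalledLeaf(rootId: string, byId: Record<string, ChainIssue | undefined>): string | null {
  const seen = new Set<string>();
  const visit = (id: string): string | null => {
    if (seen.has(id)) return null; // avoid looping on cyclic graphs
    seen.add(id);
    const issue = byId[id];
    if (issue === undefined) return id; // unknown blocker: surface it
    if (issue.status === "done") return null; // resolved, not part of the chain
    const open = issue.blockedByIssueIds.filter((b) => byId[b]?.status !== "done");
    if (open.length === 0) {
      // Unresolved leaf: the chain is covered only if this leaf has a path.
      return issue.hasActionPath ? null : id;
    }
    // Intermediate blocked issue: its own state does not make the chain healthy.
    for (const b of open) {
      const stalled = visit(b);
      if (stalled !== null) return stalled;
    }
    return null;
  };
  return visit(rootId);
}
```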
Paperclip now treats crash/restart recovery as a stranded-assigned-work problem, not just a stranded-run problem.
There are two distinct failure modes.
Stranded assigned `todo` issue. Example:

- `todo`

Recovery rule:

- `blocked` and posts a visible comment

This is a dispatch recovery, not a continuation recovery.
Stranded assigned `in_progress` issue. Example:

- `in_progress`

Recovery rule:

- `blocked` and posts a visible comment

This is an active-work continuity recovery.
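The two failure modes share a shape. If the owner is clear and nothing is queued or running, Paperclip restores the path for the same owner. Otherwise it ends in `blocked` with a visible comment. The following is a sketch under assumptions. The retry budget `maxRetries` and every name here are hypothetical, and the document does not state how many automatic attempts are made.

```typescript
// Hypothetical sketch; names and the retry budget are assumptions.
interface StrandedCandidate {
  status: "todo" | "in_progress";
  assigneeAgentId: string | null;
  hasQueuedOrRunningRun: boolean;
  recoveryAttempts: number;
}

type RecoveryAction =
  | { kind: "none" }
  | { kind: "requeue"; agentId: string; mode: "dispatch" | "continuation" }
  | { kind: "block"; comment: string };

function recoverStranded(issue: StrandedCandidate, maxRetries = 1): RecoveryAction {
  if (issue.assigneeAgentId === null || issue.hasQueuedOrRunningRun) return { kind: "none" };
  // todo -> dispatch recovery; in_progress -> active-work continuity recovery.
  const mode = issue.status === "todo" ? "dispatch" : "continuation";
  if (issue.recoveryAttempts < maxRetries) {
    // Same owner: recovery never picks a replacement agent.
    return { kind: "requeue", agentId: issue.assigneeAgentId, mode };
  }
  return { kind: "block", comment: `Automatic ${mode} recovery did not restore a live path.` };
}
```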
Startup recovery and periodic recovery are different from normal wakeup delivery.
On startup and on the periodic recovery loop, Paperclip now does four things in sequence:
- running runs
- queued runs

The stranded-work pass closes the gap where issue state survives a crash but the wake/run path does not. The silent-run scan covers the separate case where a live process exists but has stopped producing observable output.
An active run can still be unhealthy even when its process is running. Paperclip treats prolonged output silence as a watchdog signal, not as proof that the run is failed.
The recovery service owns this contract:
- `ok`, `suspicious`, `critical`, `snoozed`, or `not_applicable`
- `stale_active_run_evaluation` issue per run
- `outputSilence` summary shown by live-run and active-run API responses

Suspicious silence creates a medium-priority review issue for the selected recovery owner. Critical silence raises that review issue to high priority and blocks the source issue on the explicit evaluation task without cancelling the active process.
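The silence levels and their effects can be modelled as a small classifier plus an effect table. The level names and the medium/high priority behaviour come from this document. The two thresholds are invented purely for illustration, and all function names are hypothetical. Note that no level cancels the run.

```typescript
// Hypothetical sketch; thresholds are invented for illustration.
type SilenceLevel = "ok" | "suspicious" | "critical" | "snoozed" | "not_applicable";

interface SilenceInput {
  runActive: boolean;
  silentForMs: number;
  snoozedUntil: number | null; // epoch ms
  now: number; // epoch ms
}

const SUSPICIOUS_AFTER_MS = 30 * 60 * 1000; // assumed threshold
const CRITICAL_AFTER_MS = 2 * 60 * 60 * 1000; // assumed threshold

function silenceLevel(input: SilenceInput): SilenceLevel {
  if (!input.runActive) return "not_applicable";
  if (input.snoozedUntil !== null && input.now < input.snoozedUntil) return "snoozed";
  if (input.silentForMs >= CRITICAL_AFTER_MS) return "critical";
  if (input.silentForMs >= SUSPICIOUS_AFTER_MS) return "suspicious";
  return "ok";
}

interface WatchdogEffect {
  reviewPriority: "medium" | "high" | null;
  blockSourceOnEvaluation: boolean;
  cancelRun: false; // the watchdog never cancels the active process
}

function effectOf(level: SilenceLevel): WatchdogEffect {
  if (level === "suspicious") return { reviewPriority: "medium", blockSourceOnEvaluation: false, cancelRun: false };
  if (level === "critical") return { reviewPriority: "high", blockSourceOnEvaluation: true, cancelRun: false };
  return { reviewPriority: null, blockSourceOnEvaluation: false, cancelRun: false };
}
```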
Watchdog decisions are explicit operator/recovery-owner decisions:
- `snooze` records an operator-chosen future quiet-until time and suppresses scan-created review work during that window
- `continue` records that the current evidence is acceptable, does not cancel or mutate the active run, and sets a 30-minute default re-arm window before the watchdog evaluates the still-silent run again
- `dismissed_false_positive` records why the review was not actionable

Operators should prefer `snooze` for known time-bounded quiet periods. `continue` is only a short acknowledgement of the current evidence; if the run remains silent after the re-arm window, the periodic watchdog scan can create or update review work again.
The board can record watchdog decisions. The assigned owner of the watchdog evaluation issue can also record them. Other agents cannot.
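The decision rules and the permission rule fit in two small functions. The decision names and the 30-minute `continue` re-arm come from this document. The `Actor` shape, `canRecordDecision`, and `quietUntil` are hypothetical. For simplicity, this sketch models the evaluation issue's owner only as an agent. It also does not model what dismissal does to later scans, so it returns `null` there.

```typescript
// Hypothetical sketch; shapes and helper names are assumptions.
type WatchdogDecision =
  | { kind: "snooze"; quietUntil: number } // operator-chosen, epoch ms
  | { kind: "continue" }
  | { kind: "dismissed_false_positive"; reason: string };

type Actor = { kind: "board"; userId: string } | { kind: "agent"; agentId: string };

const CONTINUE_REARM_MS = 30 * 60 * 1000; // 30-minute default from this document

// The board, or the assigned owner of the evaluation issue, may decide. Other agents cannot.
function canRecordDecision(actor: Actor, evaluationOwnerAgentId: string | null): boolean {
  if (actor.kind === "board") return true;
  return evaluationOwnerAgentId !== null && actor.agentId === evaluationOwnerAgentId;
}

// Time before which scan-created review work stays quiet. No decision touches the run itself.
function quietUntil(decision: WatchdogDecision, now: number): number | null {
  if (decision.kind === "snooze") return decision.quietUntil;
  if (decision.kind === "continue") return now + CONTINUE_REARM_MS;
  return null; // dismissal only records a reason in this sketch
}
```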
Paperclip uses three different recovery outcomes, depending on how much it can safely infer.
Auto-recovery is allowed when ownership is clear and the control plane only lost execution continuity.
Examples:
- `todo` issue whose latest run failed, timed out, or was cancelled
- `in_progress` issue whose live execution path disappeared

Auto-recovery preserves the existing owner. It does not choose a replacement agent.
Paperclip creates an explicit recovery issue when the system can identify a problem but cannot safely complete the work itself.
Examples:
The source issue remains visible and blocked on the recovery issue when blocking is necessary for correctness. The recovery owner must restore a live path, resolve the source issue manually, or record the reason it is a false positive.
Instance-level issue-graph liveness auto-recovery is disabled by default. When enabled, its lookback window means "dependency paths updated within the last N hours"; older findings remain advisory and are counted as outside the configured lookback instead of creating recovery issues automatically. This is an operator noise control, not the older staleness delay for determining whether a chain is old enough to surface.
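The lookback setting acts as a filter on findings rather than as a staleness delay. The following sketch makes that concrete. Only "disabled by default" and the meaning of the lookback window come from this document. The config shape, the placeholder `lookbackHours` value, and `triageFindings` are hypothetical. This sketch also assumes that findings stay advisory while the feature is disabled.

```typescript
// Hypothetical sketch; names and the placeholder lookback are assumptions.
interface LivenessAutoRecoveryConfig {
  enabled: boolean;
  lookbackHours: number;
}

const DEFAULT_CONFIG: LivenessAutoRecoveryConfig = { enabled: false, lookbackHours: 24 }; // 24 is a placeholder

interface LivenessFinding {
  issueId: string;
  dependencyPathUpdatedAt: number; // epoch ms
}

interface Triage {
  createRecoveryFor: string[];
  advisoryOnly: string[];
  outsideLookbackCount: number;
}

function triageFindings(findings: LivenessFinding[], config: LivenessAutoRecoveryConfig, now: number): Triage {
  if (!config.enabled) {
    return { createRecoveryFor: [], advisoryOnly: findings.map((f) => f.issueId), outsideLookbackCount: 0 };
  }
  // "Dependency paths updated within the last N hours."
  const cutoff = now - config.lookbackHours * 60 * 60 * 1000;
  const recent = findings.filter((f) => f.dependencyPathUpdatedAt >= cutoff);
  const older = findings.filter((f) => f.dependencyPathUpdatedAt < cutoff);
  return {
    createRecoveryFor: recent.map((f) => f.issueId),
    advisoryOnly: older.map((f) => f.issueId),
    outsideLookbackCount: older.length,
  };
}
```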
Human escalation is required when the next safe action depends on board judgment, budget/approval policy, or information unavailable to the control plane.
Examples:
In these cases Paperclip should leave a visible issue/comment trail instead of silently retrying.
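The three outcomes can be summarized as a decision function. The following is a deliberately simplified sketch. The three boolean inputs are hypothetical stand-ins for the judgments described above, and the ordering (escalation first, then auto-recovery, else a recovery issue) is one reasonable reading, not a stated rule.

```typescript
// Hypothetical sketch of choosing among the three recovery outcomes.
interface RecoveryFinding {
  ownershipClear: boolean;
  onlyContinuityLost: boolean;
  needsBoardJudgment: boolean; // board judgment, budget/approval policy, or information unavailable to the control plane
}

type Outcome = "auto_recover" | "recovery_issue" | "human_escalation";

function chooseOutcome(f: RecoveryFinding): Outcome {
  if (f.needsBoardJudgment) return "human_escalation";
  if (f.ownershipClear && f.onlyContinuityLost) return "auto_recover";
  // A problem is identified, but the system cannot safely complete the work itself.
  return "recovery_issue";
}
```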
These semantics do not change V1 into an auto-reassignment system.
Paperclip still does not:
- `parentId` alone

The recovery model is intentionally conservative:
For a board operator, the intended meaning is:
- `in_progress` should mean "this is live work or clearly surfaced as a problem"
- `todo` should not stay assigned forever after a crash with no remaining wake path

That is the execution contract Paperclip should present to operators.