doc/execution-semantics.md
Status: Current implementation guide

Date: 2026-04-26

Audience: Product and engineering
This document explains how Paperclip interprets issue assignment, issue status, execution runs, wakeups, parent/sub-issue structure, and blocker relationships.
doc/SPEC-implementation.md remains the V1 contract. This document is the detailed execution model behind that contract.
Paperclip separates four concepts that are easy to blur together:
The system works best when those are kept separate.
An issue has at most one assignee.
- `assigneeAgentId` means the issue is owned by an agent
- `assigneeUserId` means the issue is owned by a human board user

This is a hard invariant. Paperclip is single-assignee by design.
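The single-assignee invariant can be illustrated with a small check. This is a minimal sketch, not Paperclip's implementation: only `assigneeAgentId` and `assigneeUserId` come from this document, while `IssueAssignment`, `IssueOwner`, and `resolveOwner` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape: only the two assignee fields come from this document.
interface IssueAssignment {
  assigneeAgentId: string | null;
  assigneeUserId: string | null;
}

type IssueOwner =
  | { kind: "agent"; id: string }
  | { kind: "user"; id: string }
  | { kind: "unassigned" };

// Returns the single owner, or throws when both assignee fields are set.
function resolveOwner(issue: IssueAssignment): IssueOwner {
  if (issue.assigneeAgentId !== null && issue.assigneeUserId !== null) {
    throw new Error("single-assignee invariant violated: both assignee fields are set");
  }
  if (issue.assigneeAgentId !== null) {
    return { kind: "agent", id: issue.assigneeAgentId };
  }
  if (issue.assigneeUserId !== null) {
    return { kind: "user", id: issue.assigneeUserId };
  }
  return { kind: "unassigned" };
}
```

Modelling the owner as a tagged union means later code can branch on `kind` instead of re-checking both fields.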
Paperclip issue statuses are not just UI labels. They imply different expectations about ownership and execution.
`backlog`

The issue is not ready for active work.

`todo`

The issue is actionable but not actively claimed.

`in_progress`

The issue is actively owned work.
For agent-owned issues, in_progress should not be allowed to become a silent dead state.
`blocked`

The issue cannot proceed until something external changes.
This is the right state for:
`in_review`

Execution work is paused because the next move belongs to a reviewer or approver, not the current executor.
An external review service can also be a valid review path when the issue keeps an agent assignee and has an active one-shot monitor that will wake that assignee to check the service later.
`done`

The work is complete and terminal.

`cancelled`

The work will not continue and is terminal.
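The seven statuses above can be written as a closed type so that terminal checks are explicit. This is a sketch only; the status strings come from this document, while `IssueStatus` and `isTerminalStatus` are hypothetical names.

```typescript
// The seven status values described in this document.
type IssueStatus =
  | "backlog"
  | "todo"
  | "in_progress"
  | "blocked"
  | "in_review"
  | "done"
  | "cancelled";

// done and cancelled are the only terminal statuses.
function isTerminalStatus(status: IssueStatus): boolean {
  return status === "done" || status === "cancelled";
}
```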
The execution model differs depending on assignee type.
Agent-owned issues are part of the control plane's execution loop.
User-owned issues are not executed by the heartbeat scheduler.
This is why in_progress can be strict for agents without forcing the same runtime rules onto human-held work.
Checkout is the bridge from issue ownership to active agent execution.
- `in_progress`
- `checkoutRunId` represents the issue-ownership lock for the current agent run
- `executionRunId` represents the currently active execution path for the issue

These are related but not identical:
- `checkoutRunId` answers who currently owns execution rights for the issue
- `executionRunId` answers which run is actually live right now

Paperclip already clears stale execution locks and can adopt some stale checkout locks when the original run is gone.
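The difference between the two lock fields can be sketched as follows. This assumes a caller can supply the set of run ids that are still live; `RunLocks`, `clearStaleExecutionLock`, and `isCheckoutStale` are hypothetical names. The conditions under which Paperclip adopts a stale checkout lock are not specified here, so the sketch only detects staleness.

```typescript
// Hypothetical holder for the two lock fields named in this document.
interface RunLocks {
  checkoutRunId: string | null; // who owns execution rights for the issue
  executionRunId: string | null; // which run is live right now
}

// Drops the execution lock when the run it points at is no longer live.
function clearStaleExecutionLock(
  locks: RunLocks,
  liveRunIds: ReadonlySet<string>
): RunLocks {
  if (locks.executionRunId !== null && !liveRunIds.has(locks.executionRunId)) {
    return { checkoutRunId: locks.checkoutRunId, executionRunId: null };
  }
  return locks;
}

// Reports whether the checkout lock points at a run that is gone.
// Whether such a lock may be adopted is a separate decision not modelled here.
function isCheckoutStale(locks: RunLocks, liveRunIds: ReadonlySet<string>): boolean {
  return locks.checkoutRunId !== null && !liveRunIds.has(locks.checkoutRunId);
}
```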
Paperclip uses two different relationships for different jobs.
Parent/sub-issue (`parentId`)

This is structural.
Use it for:
Do not treat parentId as execution dependency by itself.
Blockers (`blockedByIssueIds`)

This is dependency semantics.
Use it for:
Blocked issues should stay idle while blockers remain unresolved. Paperclip should not create a queued heartbeat run for that issue until the final blocker is done and the issue_blockers_resolved wake can start real work.
If a parent is truly waiting on a child, model that with blockers. Do not rely on the parent/child relationship alone.
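The distinction between structure and dependency can be sketched as a wake gate that consults `blockedByIssueIds` and ignores `parentId`. `DepIssue`, `unresolvedBlockers`, and `mayQueueHeartbeat` are hypothetical names, and treating an unknown blocker id as unresolved is an assumption of this sketch.

```typescript
// Hypothetical minimal issue shape; parentId and blockedByIssueIds are from this document.
interface DepIssue {
  id: string;
  status: string;
  parentId: string | null;
  blockedByIssueIds: string[];
}

// A blocker counts as resolved only when it is done.
// Assumption: a blocker id missing from the map is treated as unresolved.
function unresolvedBlockers(issue: DepIssue, byId: Map<string, DepIssue>): string[] {
  return issue.blockedByIssueIds.filter((blockerId) => {
    const blocker = byId.get(blockerId);
    return blocker === undefined || blocker.status !== "done";
  });
}

// parentId is deliberately ignored: structure alone never gates execution.
function mayQueueHeartbeat(issue: DepIssue, byId: Map<string, DepIssue>): boolean {
  return unresolvedBlockers(issue, byId).length === 0;
}
```

In this sketch a parent with an open child is still runnable unless the child is also listed as a blocker.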
For agent-owned, non-terminal issues, Paperclip should never leave work in a state where nobody is responsible for the next move and nothing will wake or surface it.
This is a visibility contract, not an auto-completion contract. If Paperclip cannot safely infer the next action, it should surface the ambiguity with a blocked state, a visible comment, or an explicit recovery issue. It must not silently mark work done from prose comments or guess that a dependency is complete.
An issue is healthy when the product can answer "what moves this forward next?" without requiring a human to reconstruct intent from the whole thread. An issue is stalled when it is non-terminal but has no live execution path, no explicit waiting path, and no recovery path.
The valid action-path primitives are:
- a typed participant (`executionState.currentParticipant`)
- an active one-shot monitor (`executionPolicy.monitor.nextCheckAt`) that will wake the assignee for a future check
- a human owner (`assigneeUserId`)

`todo`

This is dispatch state: ready to start, not yet actively claimed.
A healthy dispatch state means at least one of these is true:
- `todo` after a completed agent heartbeat, with no interrupted dispatch evidence

An assigned `todo` issue is stalled when dispatch was interrupted, no wake remains queued or running, and no recovery path has been opened.
`backlog`

This is parked state, not dispatch state.
Assigning an issue normally implies executable intent. When create APIs receive an assignee and no explicit status, Paperclip defaults the issue to todo so the assignee has a wake path instead of silently inheriting the unassigned backlog default.
An explicit assigned backlog issue remains valid when the creator is deliberately parking the work. It must not wake the assignee just because it has an assignee. Paperclip should make that choice visible in activity and UI so operators can distinguish intentional parking from a missed handoff.
An assigned backlog issue becomes a liveness problem when another issue is blocked on it and there is no explicit waiting path such as a human owner, active run, queued wake, pending interaction or approval, monitor, or open recovery issue. In that case the blocked parent should surface "blocked by parked work" rather than treating the dependency chain as healthy.
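The create-time defaulting rule described above can be sketched like this. `CreateIssueInput` and `resolveCreateStatus` are hypothetical names; only the rule itself (explicit status wins, assigned issues default to `todo`, unassigned issues default to `backlog`) comes from this document.

```typescript
// Hypothetical create payload; only the field names match this document.
interface CreateIssueInput {
  status?: string;
  assigneeAgentId?: string | null;
  assigneeUserId?: string | null;
}

// Explicit status always wins, including an explicit assigned backlog.
// Without one, an assignee implies executable intent, so the default is todo.
function resolveCreateStatus(input: CreateIssueInput): string {
  if (input.status !== undefined) {
    return input.status;
  }
  const hasAssignee = Boolean(input.assigneeAgentId) || Boolean(input.assigneeUserId);
  return hasAssignee ? "todo" : "backlog";
}
```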
`in_progress`

This is active-work state.
A healthy active-work state means at least one of these is true:
An agent-owned in_progress issue is stalled when it has no active run, no queued continuation, and no explicit recovery surface. A still-running but silent process is not automatically stalled; it is handled by the active-run watchdog contract.
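The stall rule for agent-owned `in_progress` work reduces to three facts. In this sketch, `ActiveWorkFacts` and `isInProgressStalled` are hypothetical names, and the booleans are assumed to be computed elsewhere.

```typescript
// Hypothetical summary of what the control plane knows about an in_progress issue.
interface ActiveWorkFacts {
  hasActiveRun: boolean; // a run exists, even if it is currently silent
  hasQueuedContinuation: boolean;
  hasRecoverySurface: boolean; // explicit recovery surface
}

// Stalled only when all three paths are missing.
// A silent but still-running process keeps hasActiveRun true, so it is not
// stalled here; the active-run watchdog handles that case separately.
function isInProgressStalled(facts: ActiveWorkFacts): boolean {
  return !facts.hasActiveRun && !facts.hasQueuedContinuation && !facts.hasRecoverySurface;
}
```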
`in_review`

This is review/approval state: execution is paused because the next move belongs to a reviewer, approver, board user, or recovery owner.
A healthy in_review issue has at least one valid action path:
- a human owner (`assigneeUserId`)

Agent-assigned `in_review` with no typed participant is only healthy when one of the other paths exists. Assignment to the same agent that produced the handoff is not, by itself, a review path.
An in_review issue is stalled when it has no typed participant, no pending interaction or approval, no user owner, no active monitor, no active run, no queued wake, and no explicit recovery issue. Paperclip should surface that state as recovery work rather than silently completing the issue or leaving blocker chains parked indefinitely.
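The `in_review` stall condition lists seven paths, any one of which keeps the issue healthy. A minimal sketch, assuming each path has already been reduced to a boolean; `ReviewFacts` and `isInReviewStalled` are hypothetical names.

```typescript
// Hypothetical summary of the seven action paths named in the stall rule.
interface ReviewFacts {
  hasTypedParticipant: boolean;
  hasPendingInteractionOrApproval: boolean;
  hasUserOwner: boolean;
  hasActiveMonitor: boolean;
  hasActiveRun: boolean;
  hasQueuedWake: boolean;
  hasRecoveryIssue: boolean;
}

// Stalled when every path is absent. An agent assignee alone is intentionally
// not one of the inputs, because it is not a review path by itself.
function isInReviewStalled(facts: ReviewFacts): boolean {
  const paths = [
    facts.hasTypedParticipant,
    facts.hasPendingInteractionOrApproval,
    facts.hasUserOwner,
    facts.hasActiveMonitor,
    facts.hasActiveRun,
    facts.hasQueuedWake,
    facts.hasRecoveryIssue,
  ];
  return paths.every((present) => !present);
}
```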
An issue monitor is a one-shot deferred action path for agent-owned issues in in_progress or in_review.
Use a monitor when the current assignee owns a future check against an async system or external service. Examples include Greptile review loops, GitHub checks, Vercel deployments, or provider jobs where the agent should come back later and decide what happens next.
Monitor policy lives under executionPolicy.monitor and includes:
- `nextCheckAt`: when Paperclip should wake the assignee
- `notes`: non-secret instructions for what the assignee should check
- `serviceName`: optional non-secret external-service context
- `externalRef`: optional external-service reference input; Paperclip treats it as secret-adjacent, redacts it before persistence/visibility, and omits it from activity and wake payloads
- `timeoutAt`, `maxAttempts`, and `recoveryPolicy`: optional recovery hints for bounded waits

Monitors are not recurring intervals. When a monitor fires, Paperclip clears the scheduled monitor and queues an `issue_monitor_due` wake for the assignee. If the external service is still pending, the assignee must explicitly re-arm the monitor with a new `nextCheckAt`. If the issue moves to `done`, `cancelled`, an invalid status, or a human/unassigned owner, the monitor is cleared.
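The one-shot behaviour can be sketched as a pure function: firing clears the monitor and produces a wake, and nothing is rescheduled unless the assignee re-arms it. `MonitorSketch`, `MonitoredIssue`, `MonitorWake`, and `fireDueMonitor` are hypothetical names; only the field names and the `issue_monitor_due` reason come from this document.

```typescript
// Hypothetical shapes for illustration only.
interface MonitorSketch {
  nextCheckAt: Date;
  notes?: string;
  serviceName?: string;
}

interface MonitoredIssue {
  id: string;
  assigneeAgentId: string;
  monitor: MonitorSketch | null;
}

interface MonitorWake {
  issueId: string;
  agentId: string;
  reason: "issue_monitor_due";
}

// Fires a due monitor exactly once: the returned issue has no monitor left.
function fireDueMonitor(
  issue: MonitoredIssue,
  now: Date
): { issue: MonitoredIssue; wake: MonitorWake | null } {
  if (issue.monitor === null || issue.monitor.nextCheckAt.getTime() > now.getTime()) {
    return { issue, wake: null };
  }
  return {
    issue: { id: issue.id, assigneeAgentId: issue.assigneeAgentId, monitor: null },
    wake: { issueId: issue.id, agentId: issue.assigneeAgentId, reason: "issue_monitor_due" },
  };
}
```

Because the fired issue comes back with `monitor: null`, a second scan at a later time yields no wake until the assignee sets a new `nextCheckAt`.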
Because serviceName and notes remain visible in issue activity and wake context, operators should keep them short and non-secret. Put enough context for the assignee to know what to inspect, but do not include signed URLs, bearer tokens, customer secrets, tenant-private identifiers, or provider links with embedded credentials.
Monitor bounds are enforced. Paperclip rejects attempts to re-arm a monitor whose timeoutAt or maxAttempts is already exhausted. When a scheduled monitor reaches an exhausted bound at trigger time, Paperclip clears it and follows recoveryPolicy: wake_owner queues a bounded recovery wake for the assignee, create_recovery_issue opens visible recovery work, and escalate_to_board records a board-visible escalation comment/activity.
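Bound enforcement might look like the following. This is a sketch under stated assumptions: `attemptsSoFar` is a hypothetical counter (this document does not say how attempts are tracked), and `MonitorBounds`, `isMonitorExhausted`, `assertCanRearm`, and `RECOVERY_ACTIONS` are illustrative names. The three policy values and their effects are taken from the paragraph above.

```typescript
type RecoveryPolicy = "wake_owner" | "create_recovery_issue" | "escalate_to_board";

// Hypothetical bounds record; attemptsSoFar is an assumed counter.
interface MonitorBounds {
  timeoutAt?: Date;
  maxAttempts?: number;
  attemptsSoFar: number;
}

// A bound is exhausted when the timeout has passed or the attempts are used up.
function isMonitorExhausted(bounds: MonitorBounds, now: Date): boolean {
  const timedOut =
    bounds.timeoutAt !== undefined && now.getTime() >= bounds.timeoutAt.getTime();
  const outOfAttempts =
    bounds.maxAttempts !== undefined && bounds.attemptsSoFar >= bounds.maxAttempts;
  return timedOut || outOfAttempts;
}

// Re-arming an exhausted monitor is rejected.
function assertCanRearm(bounds: MonitorBounds, now: Date): void {
  if (isMonitorExhausted(bounds, now)) {
    throw new Error("monitor bounds exhausted: re-arm rejected");
  }
}

// What each recovery policy does when a bound is exhausted at trigger time.
const RECOVERY_ACTIONS: Record<RecoveryPolicy, string> = {
  wake_owner: "queue a bounded recovery wake for the assignee",
  create_recovery_issue: "open visible recovery work",
  escalate_to_board: "record a board-visible escalation comment/activity",
};
```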
Use blocked instead of a monitor when no Paperclip assignee owns a responsible polling path. In that case, name the external owner/action or create first-class recovery/blocker work.
`blocked`

This is explicit waiting state.
A healthy blocked issue has an explicit waiting path:
A blocker chain is covered only when its unresolved leaf is live or explicitly waiting. An intermediate blocked issue does not make the chain healthy by itself.
A blocked issue is stalled when the unresolved blocker leaf has no active run, queued wake, typed participant, pending interaction or approval, user owner, external owner/action, or recovery issue. In that case the parent should show the first stalled leaf instead of presenting the dependency as calmly covered.
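The leaf rule can be sketched as a walk down the blocker graph that reports the first unresolved leaf with no waiting path. `ChainIssue` and `firstStalledLeaf` are hypothetical names; `hasWaitingPath` is an assumed precomputed flag standing in for the listed paths (active run, queued wake, typed participant, pending interaction or approval, user owner, external owner/action, or recovery issue). Blocker ids missing from the map are skipped in this sketch.

```typescript
// Hypothetical node shape for the blocker graph.
interface ChainIssue {
  id: string;
  status: string;
  blockedByIssueIds: string[];
  hasWaitingPath: boolean; // assumed summary of the paths listed above
}

// Depth-first search for the first unresolved leaf that nothing will move.
// An intermediate issue's own hasWaitingPath is ignored while it still has
// unresolved blockers, because only the leaf can cover the chain.
function firstStalledLeaf(
  id: string,
  byId: Map<string, ChainIssue>,
  visited: Set<string> = new Set<string>()
): string | null {
  if (visited.has(id)) return null; // guards against cycles
  visited.add(id);
  const issue = byId.get(id);
  if (issue === undefined) return null;
  const unresolved = issue.blockedByIssueIds.filter((blockerId) => {
    const blocker = byId.get(blockerId);
    return blocker !== undefined && blocker.status !== "done";
  });
  if (unresolved.length === 0) {
    return issue.hasWaitingPath ? null : issue.id;
  }
  for (const blockerId of unresolved) {
    const leaf = firstStalledLeaf(blockerId, byId, visited);
    if (leaf !== null) return leaf;
  }
  return null;
}
```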
Paperclip now treats crash/restart recovery as a stranded-assigned-work problem, not just a stranded-run problem.
There are two distinct failure modes.
`todo`

Example:

- `todo`

Recovery rule:

- `blocked` and posts a visible comment

This is a dispatch recovery, not a continuation recovery.
`in_progress`

Example:

- `in_progress`

Recovery rule:

- `blocked` and posts a visible comment

This is an active-work continuity recovery.
Startup recovery and periodic recovery are different from normal wakeup delivery.
On startup and on the periodic recovery loop, Paperclip now does four things in sequence:
- `running` runs
- `queued` runs

The stranded-work pass closes the gap where issue state survives a crash but the wake/run path does not. The silent-run scan covers the separate case where a live process exists but has stopped producing observable output.
An active run can still be unhealthy even when its process is running. Paperclip treats prolonged output silence as a watchdog signal, not as proof that the run is failed.
The recovery service owns this contract:
- `ok`, `suspicious`, `critical`, `snoozed`, or `not_applicable`
- `stale_active_run_evaluation` issue per run
- `outputSilence` summary shown by live-run and active-run API responses

Suspicious silence creates a medium-priority review issue for the selected recovery owner. Critical silence raises that review issue to high priority and blocks the source issue on the explicit evaluation task without cancelling the active process.
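The mapping from silence level to review action can be sketched as a lookup. `SilenceLevel` uses the five values listed above; `SilencePlan` and `planForSilence` are hypothetical names, and the thresholds that classify a run into a level are not covered here.

```typescript
type SilenceLevel = "ok" | "suspicious" | "critical" | "snoozed" | "not_applicable";

// Hypothetical description of what the scan should do for a run.
interface SilencePlan {
  reviewPriority: "medium" | "high" | null; // null means no review issue
  blockSourceIssue: boolean;
  cancelActiveRun: false; // the watchdog never cancels the process
}

function planForSilence(level: SilenceLevel): SilencePlan {
  if (level === "suspicious") {
    return { reviewPriority: "medium", blockSourceIssue: false, cancelActiveRun: false };
  }
  if (level === "critical") {
    return { reviewPriority: "high", blockSourceIssue: true, cancelActiveRun: false };
  }
  // ok, snoozed, and not_applicable create no review work in this sketch.
  return { reviewPriority: null, blockSourceIssue: false, cancelActiveRun: false };
}
```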
Watchdog decisions are explicit operator/recovery-owner decisions:
- `snooze` records an operator-chosen future quiet-until time and suppresses scan-created review work during that window
- `continue` records that the current evidence is acceptable, does not cancel or mutate the active run, and sets a 30-minute default re-arm window before the watchdog evaluates the still-silent run again
- `dismissed_false_positive` records why the review was not actionable

Operators should prefer `snooze` for known time-bounded quiet periods. `continue` is only a short acknowledgement of the current evidence; if the run remains silent after the re-arm window, the periodic watchdog scan can create or update review work again.
The board can record watchdog decisions. The assigned owner of the watchdog evaluation issue can also record them. Other agents cannot.
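The decision rules and the permission rule can be sketched together. `WatchdogDecision`, `WatchdogActor`, `canRecordDecision`, and `nextEvaluationAt` are hypothetical names. The 30-minute re-arm window for `continue` comes from this document; modelling the evaluation-issue owner as a single agent id is a simplifying assumption.

```typescript
type WatchdogDecision =
  | { kind: "snooze"; quietUntil: Date }
  | { kind: "continue" }
  | { kind: "dismissed_false_positive"; reason: string };

type WatchdogActor = { kind: "board" } | { kind: "agent"; id: string };

// Default re-arm window for continue: 30 minutes.
const CONTINUE_REARM_MS = 30 * 60 * 1000;

// The board may always record a decision; an agent only when it is the
// assigned owner of the watchdog evaluation issue.
function canRecordDecision(actor: WatchdogActor, evaluationOwnerAgentId: string | null): boolean {
  if (actor.kind === "board") return true;
  return evaluationOwnerAgentId !== null && actor.id === evaluationOwnerAgentId;
}

// When the watchdog may create or update review work for this run again.
// Returns null for a dismissal, which sets no quiet window in this sketch.
function nextEvaluationAt(decision: WatchdogDecision, now: Date): Date | null {
  if (decision.kind === "snooze") return decision.quietUntil;
  if (decision.kind === "continue") return new Date(now.getTime() + CONTINUE_REARM_MS);
  return null;
}
```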
Paperclip uses three different recovery outcomes, depending on how much it can safely infer.
Auto-recovery is allowed when ownership is clear and the control plane only lost execution continuity.
Examples:
- `todo` issue whose latest run failed, timed out, or was cancelled
- `in_progress` issue whose live execution path disappeared

Auto-recovery preserves the existing owner. It does not choose a replacement agent.
Paperclip creates an explicit recovery issue when the system can identify a problem but cannot safely complete the work itself.
Examples:
The source issue remains visible and blocked on the recovery issue when blocking is necessary for correctness. The recovery owner must restore a live path, resolve the source issue manually, or record the reason it is a false positive.
Instance-level issue-graph liveness auto-recovery is disabled by default. When enabled, its lookback window means "dependency paths updated within the last N hours"; older findings remain advisory and are counted as outside the configured lookback instead of creating recovery issues automatically. This is an operator noise control, not the older staleness delay for determining whether a chain is old enough to surface.
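The lookback rule can be sketched as a partition of findings. `LivenessFinding`, `dependencyPathUpdatedAt`, and `partitionByLookback` are hypothetical names; the behaviour (disabled by default, recent paths eligible for automatic recovery issues, older findings advisory only) follows the paragraph above.

```typescript
// Hypothetical finding record; dependencyPathUpdatedAt is an assumed field.
interface LivenessFinding {
  issueId: string;
  dependencyPathUpdatedAt: Date;
}

interface LookbackPartition {
  recover: LivenessFinding[]; // eligible for automatic recovery issues
  advisory: LivenessFinding[]; // counted as outside the configured lookback
}

function partitionByLookback(
  findings: LivenessFinding[],
  now: Date,
  lookbackHours: number,
  autoRecoveryEnabled: boolean = false // disabled by default
): LookbackPartition {
  const result: LookbackPartition = { recover: [], advisory: [] };
  const cutoff = now.getTime() - lookbackHours * 60 * 60 * 1000;
  for (const finding of findings) {
    const recent = finding.dependencyPathUpdatedAt.getTime() >= cutoff;
    if (autoRecoveryEnabled && recent) {
      result.recover.push(finding);
    } else {
      result.advisory.push(finding);
    }
  }
  return result;
}
```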
Human escalation is required when the next safe action depends on board judgment, budget/approval policy, or information unavailable to the control plane.
Examples:
In these cases Paperclip should leave a visible issue/comment trail instead of silently retrying.
These semantics do not change V1 into an auto-reassignment system.
Paperclip still does not:
- `parentId` alone

The recovery model is intentionally conservative:
For a board operator, the intended meaning is:
- `in_progress` should mean "this is live work or clearly surfaced as a problem"
- `todo` should not stay assigned forever after a crash with no remaining wake path

That is the execution contract Paperclip should present to operators.