`docs/design/convoy/convoy-lifecycle.md`

Making convoys actively converge on completion.
```mermaid
flowchart TB
    %% ---- Creation ----
    create_manual(["gt convoy create"])
    create_sling(["gt sling
    (auto-convoy)"])
    create_formula(["gt formula run
    (convoy type)"])
    create_manual --> convoy["Convoy bead created
    (hq-cv-*)"]
    create_sling --> convoy
    create_formula --> convoy
    convoy --> track["Track issues
    via dep add --type=tracks"]
    %% ---- Dispatch ----
    track --> dispatch["gt sling dispatches
    work to polecats"]
    dispatch --> working["Polecats execute
    tracked issues"]
    %% ---- Completion detection ----
    working --> issue_close["Issue closes
    (gt done / bd close)"]
    issue_close --> check{"All tracked
    issues closed?"}
    check -->|No| feed["Feed next ready issue
    to available polecat"]
    feed --> working
    check -->|Yes| close["Close convoy +
    send notifications"]
    close --> done(["Convoy landed"])
    %% ---- Daemon observer (event-driven + stranded scan) ----
    issue_close -.-> obs_daemon["Daemon ConvoyManager
    Event poll (5s) + CheckConvoysForIssue
    Stranded scan (30s) as safety net"]
    obs_daemon -.-> check
    %% ---- Manual overrides ----
    force_close(["gt convoy close --force"]) -.-> close
    land(["gt convoy land
    (owned convoys)"]) -.-> close
    classDef successNode fill:#27ae60,color:#fff,stroke:none
    classDef extNode fill:#34495e,color:#ecf0f1,stroke:none
    class done successNode
    class obs_daemon extNode
```
Three creation paths feed into the same lifecycle. Completion is event-driven via the daemon's ConvoyManager, which runs two goroutines:

- The event poll calls `GetAllEventsSince`, detects close events, and calls `convoy.CheckConvoysForIssue`, which both checks completion and feeds the next ready issue to a polecat.
- The stranded scan runs `gt convoy stranded --json` to catch convoys missed by the event-driven path (e.g. after crash/restart). It feeds ready work or auto-closes empty convoys.

Manual overrides (`close --force`, `land`) bypass the check entirely.
History: Witness and Refinery observers were originally planned as redundant observers but were removed (spec S-04, S-05). The daemon's multi-rig event poll + stranded scan provide sufficient coverage.
## What `gt sling` actually does

`gt sling` auto-creates a convoy for every bead it dispatches, unless `--no-convoy` is passed. The behavior differs significantly between single-bead and multi-bead (batch) sling.

Single bead:

```shell
gt sling sh-task-1 gastown
```

Unless `sh-task-1` is already tracked by an open convoy, this creates a convoy titled "Work: <issue-title>" tracking that single bead. Result: 1 bead, 1 convoy, 1 polecat.
Batch:

```shell
gt sling gt-task-1 gt-task-2 gt-task-3
```

The rig is auto-resolved from the beads' prefixes via `routes.jsonl` (`resolveRigFromBeadIDs` in `sling_batch.go`). All beads must resolve to the same rig. An explicit rig arg still works but prints a deprecation warning:

```shell
gt sling gt-task-1 gt-task-2 gt-task-3 gastown
# Deprecation: gt sling now auto-resolves the rig from bead prefixes.
# You no longer need to explicitly specify <gastown>.
```
Batch sling creates one convoy tracking all beads. Before spawning any polecats, `runBatchSling` (`sling_batch.go`) calls `createBatchConvoy` (`sling_convoy.go`), which creates a single convoy with title "Batch: N beads to <rig>" and adds `tracks` deps for all beads.

Result: 3 beads, 1 convoy, 3 polecats, all dispatched in parallel with 2-second delays between spawns.
The convoy ID and merge strategy are stored on each bead via `beadFieldUpdates`, so `gt done` can find the convoy via the fast path.

There is no upper limit on the number of beads: `gt sling <10 beads>` spawns 10 polecats sharing 1 convoy. The only throttle is `--max-concurrent` (default 0 = unlimited).
When the rig is explicit, the cross-rig guard checks each bead's prefix against the target rig. On mismatch, it errors with batch-specific suggested actions (remove the bead, sling it separately, or `--force`). When auto-resolving, `resolveRigFromBeadIDs` errors if the beads' prefixes don't all resolve to the same rig via `routes.jsonl` (including town-level `path="."` entries).

If any bead in the batch is already tracked by another convoy, batch sling errors before spawning any polecats.
Blocked beads are not filtered at dispatch: if `gt-task-2` has a `blocks` dep on `gt-task-1`, both get slung. The `isIssueBlocked` check only applies to daemon-driven convoy feeding (after a close event), not to the initial batch sling dispatch. The daemon checks `IsSlingableType` and `isIssueBlocked` before dispatching the next ready issue from the shared convoy.

Batch sling with `--no-convoy` skips convoy creation entirely:

```shell
gt sling gt-task-1 gt-task-2 gt-task-3 gastown --no-convoy
```
Convoys are passive trackers. They group work but don't drive it. The completion loop has a structural gap:

```
Create → Assign → Execute → Issues close → ??? → Convoy closes
```

The ??? is "Deacon patrol runs `gt convoy check`", a poll-based single point of failure. When Deacon is down, convoys don't close. Work completes but the loop never lands.

What exists today:

- `gt convoy status` shows progress
- `gt convoy stranded` finds unassigned work
- `gt convoy check` auto-closes completed convoys
- `gt convoy close --force` exists, but some docs still describe it as missing

Convoy completion should be event-driven. When an issue closes, check if it's tracked by a convoy:
```mermaid
flowchart TD
    close["Issue closes"] --> tracked{"Tracked by convoy?"}
    tracked -->|No| done1(["Done"])
    tracked -->|Yes| check["Run gt convoy check"]
    check --> all{"All tracked
    issues closed?"}
    all -->|No| done2(["Done"])
    all -->|Yes| notify["Close convoy +
    send notifications"]
```
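The decision step above can be sketched in Go. The `Convoy` struct and `checkConvoyForIssue` are illustrative stand-ins for the real `CheckConvoysForIssue` and bead schema, but the logic mirrors the flow: ignore untracked issues, close only when every tracked issue is closed, and make re-closing a no-op.

```go
package main

import "fmt"

// Convoy is a minimal model for the completion check; field names are
// illustrative, not the actual bead schema.
type Convoy struct {
	ID     string
	Closed bool
	Tracks []string // tracked issue IDs
}

// checkConvoyForIssue returns true if this call closed the convoy.
// Closing an already-closed convoy is a no-op, so the check is idempotent.
func checkConvoyForIssue(c *Convoy, closedIssue string, isClosed func(string) bool) bool {
	tracked := false
	for _, id := range c.Tracks {
		if id == closedIssue {
			tracked = true
			break
		}
	}
	if !tracked || c.Closed {
		return false
	}
	for _, id := range c.Tracks {
		if !isClosed(id) {
			return false // open work remains: feed next ready issue instead
		}
	}
	c.Closed = true // the real flow also sends notifications here
	return true
}

func main() {
	closed := map[string]bool{"gt-1": true, "gt-2": true}
	c := &Convoy{ID: "hq-cv-demo", Tracks: []string{"gt-1", "gt-2"}}
	fmt.Println(checkConvoyForIssue(c, "gt-2", func(id string) bool { return closed[id] }))
}
```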
Implementation: the daemon's ConvoyManager event poll detects close events via the SDK's `GetAllEventsSince` across all rig stores + hq. This catches all closes regardless of source (CLI, witness, refinery, manual).

The daemon's ConvoyManager is the sole convoy observer, running two independent goroutines:
| Loop | Trigger | What it does |
|---|---|---|
| Event poll | `GetAllEventsSince` every 5s (all rig stores + hq) | Detects close events, calls `CheckConvoysForIssue` |
| Stranded scan | `gt convoy stranded --json` every 30s | Feeds first ready issue via `gt sling`, auto-closes empty convoys |
Both loops are context-cancellable. The shared `CheckConvoysForIssue` function is idempotent — closing an already-closed convoy is a no-op.
History: The original design called for three redundant observers (Daemon, Witness, Refinery) per the "Redundant Monitoring Is Resilience" principle. Witness observers were removed (spec S-04) because convoy tracking is orthogonal to polecat lifecycle management. Refinery observers were removed (spec S-05) after S-17 found they were silently broken (wrong root path) with no visible impact, confirming single-observer coverage is sufficient.
Convoys are rig-agnostic. A convoy like `hq-cv-6vjz2` lives in the hq store and tracks issue IDs like `sh-pb6sa`, but it doesn't store which rig that issue belongs to. The rig association is resolved at dispatch time via two lookups:

1. `sh-pb6sa` → `sh-` (string parsing)
2. `sh-` looked up in `~/gt/.beads/routes.jsonl` → finds `{"prefix":"sh-","path":"gastown/.beads"}` → rig name is `gastown`

This happens in `feedFirstReady` (stranded scan path) and `feedNextReadyIssue` (event poll path) just before calling `gt sling`. Issues with prefixes that don't appear in `routes.jsonl` (or that map to `path="."` like `hq-*`) are skipped — see `isSlingableBead()`.
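A minimal sketch of the two lookups, under illustrative names (`Route`, `resolveRig`) rather than the real `resolveRigFromBeadIDs`:

```go
package main

import (
	"fmt"
	"strings"
)

// Route mirrors one line of routes.jsonl (field names assumed).
type Route struct {
	Prefix string
	Path   string
}

// resolveRig performs the two-step lookup: extract the bead's prefix, then
// match it against routes. Unknown prefixes and town-level path "." entries
// (e.g. hq-*) return ok=false, i.e. the issue is skipped as non-slingable.
func resolveRig(beadID string, routes []Route) (rig string, ok bool) {
	i := strings.Index(beadID, "-")
	if i < 0 {
		return "", false
	}
	prefix := beadID[:i+1] // "sh-pb6sa" -> "sh-"
	for _, r := range routes {
		if r.Prefix == prefix {
			if r.Path == "." {
				return "", false // town-level store: skip
			}
			// "gastown/.beads" -> rig name "gastown"
			return strings.SplitN(r.Path, "/", 2)[0], true
		}
	}
	return "", false // prefix not in routes: skip
}

func main() {
	routes := []Route{
		{Prefix: "sh-", Path: "gastown/.beads"},
		{Prefix: "hq-", Path: "."},
	}
	rig, ok := resolveRig("sh-pb6sa", routes)
	fmt.Println(rig, ok) // gastown true
}
```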
Both paths check `isRigParked` after resolving the rig name. Issues targeting parked rigs are logged and skipped rather than dispatched.
`gt convoy close` is implemented, including `--force` for abandoned convoys.

```shell
# Close a completed convoy
gt convoy close hq-cv-abc

# Force-close an abandoned convoy
gt convoy close hq-cv-xyz --reason="work done differently"

# Close with explicit notification
gt convoy close hq-cv-abc --notify mayor/
```
Use cases:
Track who requested the convoy for targeted notifications:

```shell
gt convoy create "Feature X" gt-abc --owner mayor/ --notify overseer
```
| Field | Purpose |
|---|---|
| `owner` | Who requested (gets completion notification) |
| `notify` | Additional subscribers |

If `owner` is not specified, it defaults to the creator (from `created_by`).
```
OPEN ──(all issues close)──► CLOSED
  │                             │
  │                             ▼
  │                       (add issues)
  │                             │
  └─────────────────────────────┘
         (auto-reopens)
```

Adding issues to a closed convoy reopens it automatically.
New state for abandonment:

```
OPEN ──► CLOSED (completed)
  │
  └────► ABANDONED (force-closed without completion)
```
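The proposed three-state lifecycle, including auto-reopen, could be modeled as a small transition function (names are illustrative, not existing gt code):

```go
package main

import (
	"errors"
	"fmt"
)

// State models the proposed convoy lifecycle.
type State string

const (
	Open      State = "open"
	Closed    State = "closed"
	Abandoned State = "abandoned"
)

// transition encodes the diagram above: all-issues-closed completes an open
// convoy, force-close abandons it, and adding an issue to a closed convoy
// auto-reopens it. Everything else is rejected.
func transition(s State, event string) (State, error) {
	switch {
	case s == Open && event == "all_closed":
		return Closed, nil
	case s == Open && event == "force_close":
		return Abandoned, nil
	case s == Closed && event == "add_issue":
		return Open, nil // auto-reopen
	default:
		return s, errors.New("no transition for " + string(s) + " on " + event)
	}
}

func main() {
	s, _ := transition(Open, "force_close")
	fmt.Println(s) // abandoned
}
```

Note that ABANDONED is terminal in this sketch: adding issues does not revive a force-closed convoy, which keeps the "work done differently" record intact.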
Optional `due_at` field for convoy deadline:

```shell
gt convoy create "Sprint work" gt-abc --due="2026-01-15"
```

Overdue convoys surface in `gt convoy stranded --overdue`.
`gt convoy close`

```
gt convoy close <convoy-id> [--reason=<reason>] [--notify=<agent>]
```

- `--force` closes even when tracked issues remain open
- sets the `close_reason` field

`gt convoy check`

```shell
# Check all convoys (current behavior)
gt convoy check

# Check specific convoy (new)
gt convoy check <convoy-id>

# Dry-run mode
gt convoy check --dry-run
```

`gt convoy reopen`

```shell
gt convoy reopen <convoy-id>
```

Explicit reopen for clarity (currently implicit via add).
Core convoy manager is fully implemented and tested (see spec.md stories S-01 through S-18, all DONE). Remaining future work:
| Component | File |
|---|---|
| Convoy command | `internal/cmd/convoy.go` |
| Auto-convoy (sling) | `internal/cmd/sling_convoy.go` |
| Convoy operations | `internal/convoy/operations.go` (`CheckConvoysForIssue`, `feedNextReadyIssue`) |
| Daemon manager | `internal/daemon/convoy_manager.go` |
| Formula convoy | `internal/cmd/formula.go` (`executeConvoyFormula`) |