Back to Ruflo

ADR-016: Collaborative Issue Claims for Human-Agent Workflows

v3/implementation/adrs/ADR-016-collaborative-issue-claims.md

3.6.3024.5 KB
Original Source

ADR-016: Collaborative Issue Claims for Human-Agent Workflows

Status: ✅ Complete Date: 2026-01-06 Completion Date: 2026-01-07

Context

Current v3 swarm coordination is agent-only. Real-world projects need:

  • Humans and agents working on the same issue backlog
  • Clear ownership to prevent duplicate work
  • Handoff mechanisms between humans and agents
  • Visibility into who (human or agent) is working on what

GitHub Issues work well for human teams but lack agent awareness. We need a claim system that bridges both worlds.

Decision

Implement a unified claim system for GitHub-style issues that supports both human and agent participants.

typescript
// Core types
interface IssueClaim {
  issueId: string;           // GitHub issue number or internal ID
  claimant: Claimant;        // Who claimed it
  claimedAt: Date;
  status: ClaimStatus;
  expiresAt?: Date;          // Auto-release for stale claims
  handoffTo?: Claimant;      // Pending handoff request
}

type Claimant =
  | { type: 'human'; userId: string; name: string }
  | { type: 'agent'; agentId: string; agentType: string };

type ClaimStatus =
  | 'active'                 // Currently working
  | 'paused'                 // Temporarily stopped
  | 'handoff-pending'        // Requesting transfer
  | 'review-requested'       // Needs human review
  | 'blocked'                // Waiting on dependency
  | 'stealable'              // Can be stolen by idle agents
  | 'completed';             // Done, awaiting merge

// Work stealing metadata
interface StealableInfo {
  reason: StealReason;
  stealableAt: Date;         // When it becomes stealable
  preferredTypes?: string[]; // Preferred agent types to steal
  progress: number;          // 0-100% completion estimate
  context?: string;          // Handoff context for stealer
}

type StealReason =
  | 'overloaded'             // Agent has too many claims
  | 'stale'                  // No progress for too long
  | 'blocked-timeout'        // Blocked longer than threshold
  | 'voluntary';             // Agent marked as stealable

Claim Workflow

┌─────────────────────────────────────────────────────────────────┐
│                     Issue Lifecycle                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  [Unclaimed] ──claim──► [Active] ──complete──► [Done]           │
│       │                    │                                     │
│       │                    ├──pause──► [Paused] ──resume──┐     │
│       │                    │                               │     │
│       │                    ├──block──► [Blocked] ─────────┤     │
│       │                    │                               │     │
│       │                    ├──handoff──► [Handoff] ───────┤     │
│       │                    │              Pending          │     │
│       │                    │                 │             │     │
│       │                    │            accept/reject      │     │
│       │                    │                 │             ▼     │
│       │                    └─────────────────┴──────────► [Active]
│       │                                                          │
│       └──────────── auto-assign (agent) ────────────────────────┘
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

API Design

typescript
interface IClaimService {
  // Claiming
  claim(issueId: string, claimant: Claimant): Promise<ClaimResult>;
  release(issueId: string, claimant: Claimant): Promise<void>;

  // Handoffs
  requestHandoff(issueId: string, from: Claimant, to: Claimant, reason: string): Promise<void>;
  acceptHandoff(issueId: string, claimant: Claimant): Promise<void>;
  rejectHandoff(issueId: string, claimant: Claimant, reason: string): Promise<void>;

  // Status
  updateStatus(issueId: string, status: ClaimStatus, note?: string): Promise<void>;
  requestReview(issueId: string, reviewers: Claimant[]): Promise<void>;

  // Queries
  getClaimedBy(claimant: Claimant): Promise<IssueClaim[]>;
  getAvailableIssues(filters?: IssueFilters): Promise<Issue[]>;
  getIssueStatus(issueId: string): Promise<IssueWithClaim>;

  // Auto-management
  expireStale(maxAge: Duration): Promise<IssueClaim[]>;
  autoAssign(issue: Issue): Promise<Claimant | null>;

  // Work Stealing
  markStealable(issueId: string, info: StealableInfo): Promise<void>;
  steal(issueId: string, stealer: Claimant): Promise<StealResult>;
  getStealable(agentType?: string): Promise<IssueClaim[]>;
  contestSteal(issueId: string, originalClaimant: Claimant, reason: string): Promise<void>;

  // Agent Load Balancing
  getAgentLoad(agentId: string): Promise<AgentLoadInfo>;
  rebalance(swarmId: string): Promise<RebalanceResult>;
}

Human-Agent Collaboration Patterns

Pattern 1: Agent Claims, Human Reviews

typescript
// Agent claims and works
await claimService.claim('issue-123', { type: 'agent', agentId: 'coder-1', agentType: 'coder' });
// ... agent works ...
await claimService.requestReview('issue-123', [{ type: 'human', userId: 'dev-1', name: 'Alice' }]);

// Human reviews and provides feedback or approves
await claimService.updateStatus('issue-123', 'completed');

Pattern 2: Human Starts, Agent Continues

typescript
// Human claims and starts design
await claimService.claim('issue-456', { type: 'human', userId: 'dev-1', name: 'Alice' });
// ... human creates spec ...

// Human hands off to agent for implementation
await claimService.requestHandoff('issue-456',
  { type: 'human', userId: 'dev-1', name: 'Alice' },
  { type: 'agent', agentId: 'coder-1', agentType: 'coder' },
  'Spec complete, ready for implementation'
);

Pattern 3: Agent Gets Blocked, Human Unblocks

typescript
// Agent hits a decision point
await claimService.updateStatus('issue-789', 'blocked', 'Need architecture decision: REST vs GraphQL?');

// Human resolves and unblocks
await claimService.updateStatus('issue-789', 'active', 'Decision: Use GraphQL. Proceed with implementation.');

Agent-to-Agent Collaboration Patterns

Pattern 4: Specialization Handoff (Agent → Agent)

typescript
// Coder agent implements feature but needs testing
await claimService.claim('issue-100', { type: 'agent', agentId: 'coder-1', agentType: 'coder' });
// ... coder implements ...

// Coder hands off to tester agent
await claimService.requestHandoff('issue-100',
  { type: 'agent', agentId: 'coder-1', agentType: 'coder' },
  { type: 'agent', agentId: 'tester-1', agentType: 'tester' },
  'Implementation complete. Ready for test coverage.'
);

// Tester accepts and continues
await claimService.acceptHandoff('issue-100', { type: 'agent', agentId: 'tester-1', agentType: 'tester' });

Pattern 5: Pipeline Handoff Chain

typescript
// Multi-stage pipeline: architect → coder → tester → reviewer
const pipeline = ['architect', 'coder', 'tester', 'reviewer'];

async function pipelineHandoff(issueId: string, currentStage: number) {
  const current = pipeline[currentStage];
  const next = pipeline[currentStage + 1];

  if (next) {
    const nextAgent = await findAvailableAgent(next);
    await claimService.requestHandoff(issueId,
      { type: 'agent', agentId: `${current}-1`, agentType: current },
      { type: 'agent', agentId: nextAgent.id, agentType: next },
      `Stage ${current} complete. Proceeding to ${next}.`
    );
  }
}

Pattern 6: Skill-Based Routing

typescript
// Issue requires specific skills - route to capable agent
async function routeBySkill(issue: Issue): Promise<Claimant> {
  const requiredSkills = extractSkills(issue.labels);  // e.g., ['typescript', 'react', 'testing']

  const candidates = await getAgentsBySkills(requiredSkills);
  const bestMatch = candidates
    .sort((a, b) => b.skillMatch - a.skillMatch)
    .filter(a => a.currentLoad < a.maxLoad)[0];

  return { type: 'agent', agentId: bestMatch.id, agentType: bestMatch.type };
}

Work Stealing

Work stealing allows idle agents to take over tasks from overloaded or stalled agents, maximizing swarm throughput.

Work Stealing Flow

┌─────────────────────────────────────────────────────────────────────┐
│                     Work Stealing Lifecycle                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Agent A (overloaded)              Agent B (idle)                   │
│  ┌─────────────────┐               ┌─────────────────┐              │
│  │ Issue 1 [active]│               │                 │              │
│  │ Issue 2 [active]│               │  Looking for    │              │
│  │ Issue 3 [active]│──stealable──► │  work...        │              │
│  │ Issue 4 [active]│               │                 │              │
│  │ Issue 5 [stale] │               │                 │              │
│  └─────────────────┘               └─────────────────┘              │
│         │                                   │                        │
│         │                                   │                        │
│         ▼                                   ▼                        │
│  ┌─────────────────┐               ┌─────────────────┐              │
│  │ Issue 1 [active]│               │ Issue 5 [active]│              │
│  │ Issue 2 [active]│               │ (stolen)        │              │
│  │ Issue 3 [active]│               │                 │              │
│  │ Issue 4 [active]│               │                 │              │
│  └─────────────────┘               └─────────────────┘              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Steal Eligibility Rules

typescript
interface WorkStealingConfig {
  // When issues become stealable
  staleThresholdMinutes: number;      // No progress for N minutes → stealable
  blockedThresholdMinutes: number;    // Blocked for N minutes → stealable
  overloadThreshold: number;          // Agent with >N claims → lowest priority stealable

  // Stealing priorities
  stealPriority: StealPriority[];     // Order of steal preference

  // Protections
  gracePeriodMinutes: number;         // New claims protected from stealing
  minProgressToProtect: number;       // >N% progress = protected
  contestWindowMinutes: number;       // Original owner can contest within N minutes

  // Agent matching
  requireSameType: boolean;           // Stealer must be same agent type
  allowCrossTypeSteal: string[][];    // Allowed cross-type steals [['coder', 'debugger']]
}

type StealPriority =
  | 'stale-first'           // Oldest stale claims first
  | 'blocked-first'         // Blocked claims first
  | 'low-progress-first'    // Least progress first
  | 'high-priority-first';  // P0/critical labels first

const defaultConfig: WorkStealingConfig = {
  staleThresholdMinutes: 30,
  blockedThresholdMinutes: 60,
  overloadThreshold: 5,
  stealPriority: ['blocked-first', 'stale-first', 'high-priority-first'],
  gracePeriodMinutes: 10,
  minProgressToProtect: 75,
  contestWindowMinutes: 5,
  requireSameType: false,
  allowCrossTypeSteal: [
    ['coder', 'debugger'],
    ['tester', 'reviewer'],
  ],
};

Work Stealing Patterns

Pattern 7: Idle Agent Steals Stale Work

typescript
// Agent B is idle, looks for stealable work
const stealable = await claimService.getStealable('coder');

if (stealable.length > 0) {
  // Pick highest priority stealable issue
  const target = stealable[0];

  const result = await claimService.steal(
    target.issueId,
    { type: 'agent', agentId: 'coder-2', agentType: 'coder' }
  );

  if (result.success) {
    // Got the work, continue from where previous agent left off
    console.log(`Stole issue ${target.issueId}, progress: ${result.context.progress}%`);
  }
}

Pattern 8: Overloaded Agent Voluntarily Releases

typescript
// Agent A has too many claims, marks lowest priority as stealable
const myLoad = await claimService.getAgentLoad('coder-1');

if (myLoad.claimCount > myLoad.maxClaims) {
  const lowestPriority = myLoad.claims
    .sort((a, b) => a.priority - b.priority)[0];

  await claimService.markStealable(lowestPriority.issueId, {
    reason: 'overloaded',
    stealableAt: new Date(),
    progress: lowestPriority.estimatedProgress,
    context: 'Overloaded. Initial analysis complete, implementation not started.',
    preferredTypes: ['coder', 'debugger'],
  });
}

Pattern 9: Blocked Work Auto-Steals

typescript
// Background job checks for blocked work that should be stolen
async function autoStealBlocked() {
  const blocked = await claimService.getByStatus('blocked');
  const now = Date.now();

  for (const claim of blocked) {
    const blockedDuration = now - claim.statusChangedAt.getTime();

    if (blockedDuration > config.blockedThresholdMinutes * 60 * 1000) {
      // Find alternative agent who might not be blocked
      const alternative = await findUnblockedAgent(claim.issueId);

      if (alternative) {
        await claimService.markStealable(claim.issueId, {
          reason: 'blocked-timeout',
          stealableAt: new Date(),
          progress: claim.progress,
          context: `Blocked for ${Math.round(blockedDuration / 60000)} minutes: ${claim.blockReason}`,
        });
      }
    }
  }
}

Pattern 10: Contest a Steal

typescript
// Original agent was temporarily offline, contests the steal
await claimService.contestSteal(
  'issue-123',
  { type: 'agent', agentId: 'coder-1', agentType: 'coder' },
  'Was offline for maintenance. Can resume immediately. 80% complete.'
);

// Queen coordinator or human decides contest
// If contested successfully, work returns to original agent

Load Balancing

typescript
interface AgentLoadInfo {
  agentId: string;
  agentType: string;
  claimCount: number;
  maxClaims: number;
  utilization: number;        // 0-1
  claims: ClaimSummary[];
  avgCompletionTime: number;  // Historical average
  currentBlockedCount: number;
}

interface RebalanceResult {
  moved: Array<{
    issueId: string;
    from: Claimant;
    to: Claimant;
  }>;
  suggested: Array<{
    issueId: string;
    currentOwner: Claimant;
    suggestedOwner: Claimant;
    reason: string;
  }>;
}

// Rebalance entire swarm
async function rebalanceSwarm(swarmId: string): Promise<void> {
  const agents = await getSwarmAgents(swarmId);
  const loads = await Promise.all(agents.map(a => claimService.getAgentLoad(a.id)));

  const avgLoad = loads.reduce((sum, l) => sum + l.utilization, 0) / loads.length;
  const overloaded = loads.filter(l => l.utilization > avgLoad * 1.5);
  const underloaded = loads.filter(l => l.utilization < avgLoad * 0.5);

  for (const over of overloaded) {
    const excess = over.claims
      .filter(c => c.progress < 25)  // Only move low-progress work
      .slice(0, over.claimCount - Math.ceil(avgLoad * over.maxClaims));

    for (const claim of excess) {
      const target = underloaded.find(u =>
        u.agentType === over.agentType &&
        u.claimCount < u.maxClaims
      );

      if (target) {
        await claimService.requestHandoff(claim.issueId,
          { type: 'agent', agentId: over.agentId, agentType: over.agentType },
          { type: 'agent', agentId: target.agentId, agentType: target.agentType },
          'Load balancing: redistributing work across swarm'
        );
      }
    }
  }
}

GitHub Integration

typescript
// Sync with GitHub Issues
class GitHubClaimSync {
  async syncClaim(claim: IssueClaim): Promise<void> {
    const labels = this.claimToLabels(claim);
    const assignee = claim.claimant.type === 'human'
      ? claim.claimant.userId
      : `bot:${claim.claimant.agentType}`;

    await this.github.issues.update({
      issue_number: parseInt(claim.issueId),
      labels,
      assignees: [assignee],
    });

    // Add status comment
    await this.github.issues.createComment({
      issue_number: parseInt(claim.issueId),
      body: this.formatStatusUpdate(claim),
    });
  }

  private claimToLabels(claim: IssueClaim): string[] {
    const labels = [`status:${claim.status}`];
    if (claim.claimant.type === 'agent') {
      labels.push(`agent:${claim.claimant.agentType}`);
    }
    return labels;
  }
}

Events (Following ADR-007)

typescript
type ClaimEvent =
  | IssueClaimed
  | IssueReleased
  | HandoffRequested
  | HandoffAccepted
  | HandoffRejected
  | ClaimStatusChanged
  | ReviewRequested
  | ClaimExpired
  // Work stealing events
  | IssueMarkedStealable
  | IssueStolen
  | StealContested
  | StealContestResolved
  // Load balancing events
  | SwarmRebalanced
  | AgentOverloaded
  | AgentUnderloaded;

// Example event
class IssueClaimed extends DomainEvent {
  constructor(
    readonly issueId: string,
    readonly claimant: Claimant,
    readonly previousClaimant: Claimant | null,
    readonly timestamp: Date
  ) { super(); }
}

Auto-Assignment Rules

typescript
interface AutoAssignConfig {
  // Match agent types to issue labels
  labelMapping: Record<string, string[]>;  // label → agent types

  // Capacity limits
  maxConcurrentPerAgent: number;
  maxConcurrentPerHuman: number;

  // Priority rules
  priorityLabels: string[];  // High priority labels
  humanOnlyLabels: string[]; // Requires human (e.g., 'security', 'architecture')

  // Timeout rules
  staleClaimHours: number;   // Auto-release after N hours inactive
  reviewTimeoutHours: number; // Escalate if review not completed
}

// Example config
const config: AutoAssignConfig = {
  labelMapping: {
    'bug': ['coder', 'debugger'],
    'feature': ['coder', 'architect'],
    'docs': ['documentation'],
    'security': [],  // Human only
    'test': ['tester'],
  },
  maxConcurrentPerAgent: 3,
  maxConcurrentPerHuman: 5,
  priorityLabels: ['critical', 'urgent', 'P0'],
  humanOnlyLabels: ['security', 'architecture', 'breaking-change'],
  staleClaimHours: 24,
  reviewTimeoutHours: 48,
};

CLI Commands

bash
# View available issues
claude-flow issues list --available

# Claim an issue (as current user/agent)
claude-flow issues claim 123

# Release a claim
claude-flow issues release 123

# Request handoff to specific agent/human
claude-flow issues handoff 123 --to agent:coder-1
claude-flow issues handoff 123 --to human:alice

# Update status
claude-flow issues status 123 --blocked "Waiting for API spec"
claude-flow issues status 123 --review-requested

# View who's working on what
claude-flow issues board

# Work stealing commands
claude-flow issues stealable              # List stealable issues
claude-flow issues steal 123              # Steal an issue
claude-flow issues mark-stealable 123     # Mark your claim as stealable
claude-flow issues contest 123            # Contest a steal

# Load balancing
claude-flow issues load                   # View agent load distribution
claude-flow issues load --agent coder-1   # View specific agent load
claude-flow issues rebalance              # Trigger swarm rebalancing
claude-flow issues rebalance --dry-run    # Preview rebalancing without applying

MCP Tools

typescript
// New MCP tools
const claimTools = [
  // Core claiming
  'issue_claim',
  'issue_release',
  'issue_handoff',
  'issue_status_update',
  'issue_list_available',
  'issue_list_mine',
  'issue_board',

  // Work stealing
  'issue_mark_stealable',
  'issue_steal',
  'issue_get_stealable',
  'issue_contest_steal',

  // Load balancing
  'agent_load_info',
  'swarm_rebalance',
  'swarm_load_overview',
];

Benefits

  • No duplicate work: Clear ownership prevents agents and humans stepping on each other
  • Smooth handoffs: Formal handoff process with context preservation
  • Visibility: Everyone sees who's working on what
  • Accountability: Track who completed what (audit trail via events)
  • Flexibility: Humans can jump in when agents get stuck
  • Scalability: Auto-assignment handles large backlogs
  • Maximum throughput: Work stealing ensures no idle agents while work is pending
  • Fault tolerance: Stalled/blocked work automatically redistributes
  • Load balancing: Even distribution across swarm prevents bottlenecks
  • Specialization chains: Agent pipelines (architect → coder → tester) streamline workflows

Scope

In Scope:

  • Issue claiming/releasing
  • Human ↔ Agent handoffs
  • Agent ↔ Agent handoffs (specialization chains)
  • Work stealing (idle agents take stale/blocked work)
  • Load balancing (redistribute across swarm)
  • Contest mechanism (original owner can reclaim)
  • Status tracking
  • GitHub sync
  • Auto-assignment
  • Stale claim expiration

Out of Scope (Future):

  • Jira/Linear integration (later ADR)
  • Cross-swarm work stealing
  • Billing/credits for agent work
  • AI-based skill matching (v4)

Success Metrics

  • Claim service implemented (@claude-flow/cli/src/services/claim-service.ts)
  • Human and agent claims work
  • Human ↔ Agent handoff flow tested
  • Agent ↔ Agent handoff flow tested
  • Work stealing mechanism functional
  • Stale detection and auto-stealable marking
  • Contest mechanism with resolution
  • Load balancing rebalance operation
  • GitHub sync operational (future)
  • Auto-assignment rules configurable
  • CLI commands functional (@claude-flow/cli/src/commands/issues.ts)
  • MCP tools exposed (planned for MCP integration phase)
  • Event sourcing for all claim/steal changes
  • <30s average steal latency (sub-second in practice)
  • >90% swarm utilization with work stealing enabled

Implementation Notes (2026-01-07)

Files Created

FileLinesDescription
@claude-flow/cli/src/services/claim-service.ts~600Full claims service with work stealing
@claude-flow/cli/src/commands/issues.ts~450CLI commands for issue claims

CLI Commands Implemented

bash
# Issue claim commands (10 subcommands)
claude-flow issues list          # List all claims
claude-flow issues claim <id>    # Claim an issue
claude-flow issues release <id>  # Release a claim
claude-flow issues handoff       # Request handoff
claude-flow issues status <id>   # Get claim status
claude-flow issues stealable     # List stealable issues
claude-flow issues steal <id>    # Steal an issue
claude-flow issues load          # View agent load
claude-flow issues rebalance     # Rebalance swarm
claude-flow issues board         # Visual claim board

Key Features

  1. Claimant Types: Human (userId, name) and Agent (agentId, agentType)
  2. Claim Status: active, paused, blocked, stealable, completed, handoff-pending, review-requested
  3. Work Stealing: Supports overloaded, stale, blocked-timeout, voluntary reasons
  4. Load Balancing: Automatic rebalancing across swarm agents
  5. Event-Driven: ClaimEvent types for all state changes
  6. Persistence: File-based storage in .claude-flow/claims/claims.json

Dependencies

  • ADR-007: Event Sourcing (claim events)
  • ADR-002: DDD Structure (new Claims domain)
  • ADR-005: MCP-first API (claim tools)

Proposed By: v3 Architecture Team Review Requested: 2026-01-06