ADR-G013: Evolution Pipeline

v3/@claude-flow/guidance/docs/adrs/ADR-G013-evolution-pipeline.md

Status: Accepted
Date: 2026-02-01
Author: Guidance Control Plane Team

Context

Governance rules must evolve as the system learns, because static rules become stale. But changing governance in a live autonomous system is dangerous: a bad rule change can cascade into widespread failure. Rule evolution therefore needs a structured, auditable, reversible process.

Decision

Implement EvolutionPipeline with a strict lifecycle for rule changes:

Proposal Lifecycle

draft -> signed -> simulating -> compared -> staged -> promoted
                                                    \-> rolled-back

| State | What Happens |
|-------|--------------|
| draft | Author creates proposal with kind, description, risk assessment |
| signed | Proposal receives cryptographic signature from author |
| simulating | Proposal is applied to recorded traces in shadow mode |
| compared | Simulation results compared against baseline (divergence measured) |
| staged | Proposal enters gradual rollout through canary/partial/full stages |
| promoted | Proposal becomes active policy |
| rolled-back | Proposal is reverted due to excessive divergence |
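
The lifecycle above can be read as a small state machine. The following is a minimal sketch, not the actual `EvolutionPipeline` code: the names `ProposalState`, `ALLOWED_TRANSITIONS`, `canTransition`, and `transition` are hypothetical, and it assumes (from the diagram) that `rolled-back` is reachable only from `staged`.

```typescript
// Hypothetical names for illustration; the real EvolutionPipeline API may differ.
type ProposalState =
  | "draft"
  | "signed"
  | "simulating"
  | "compared"
  | "staged"
  | "promoted"
  | "rolled-back";

// Assumption: transitions follow the diagram exactly, with no skipping of states.
const ALLOWED_TRANSITIONS: Record<ProposalState, readonly ProposalState[]> = {
  draft: ["signed"],
  signed: ["simulating"],
  simulating: ["compared"],
  compared: ["staged"],
  staged: ["promoted", "rolled-back"],
  promoted: [],
  "rolled-back": [],
};

// True if the lifecycle permits moving directly from `from` to `to`.
function canTransition(from: ProposalState, to: ProposalState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the move would skip or reverse a step.
function transition(from: ProposalState, to: ProposalState): ProposalState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Encoding the transitions as data keeps the "strict lifecycle" checkable in one place: a proposal cannot, for example, jump from `draft` to `staged` without being signed and simulated first.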

Change Proposal Kinds

| Kind | Description |
|------|-------------|
| add-rule | New governance rule |
| modify-rule | Change to existing rule |
| remove-rule | Deletion of a rule |
| promote-shard | Elevate shard to constitution |
| demote-rule | Move constitution rule to shard |
| adjust-threshold | Change gate thresholds |
| capability-change | Modify capability algebra |
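
One way to model these kinds is a string-literal union attached to a proposal record. This is a sketch under assumptions: `ChangeKind`, `ChangeProposal`, `isChangeKind`, and `createDraft` are hypothetical names, and the field shapes (for instance, a free-text `riskAssessment`) are guesses rather than the package's actual types.

```typescript
// The seven kinds listed in the table above.
type ChangeKind =
  | "add-rule"
  | "modify-rule"
  | "remove-rule"
  | "promote-shard"
  | "demote-rule"
  | "adjust-threshold"
  | "capability-change";

const CHANGE_KINDS: readonly ChangeKind[] = [
  "add-rule",
  "modify-rule",
  "remove-rule",
  "promote-shard",
  "demote-rule",
  "adjust-threshold",
  "capability-change",
];

// Hypothetical proposal shape; field names and types are assumptions.
interface ChangeProposal {
  kind: ChangeKind;
  description: string;
  riskAssessment: string; // assumed free text; the real type may be structured
  author: string;
  signature?: string; // absent while the proposal is still a draft
}

// Runtime guard for untrusted input such as a CLI argument or JSON payload.
function isChangeKind(value: string): value is ChangeKind {
  return (CHANGE_KINDS as readonly string[]).indexOf(value) !== -1;
}

// Builds an unsigned draft proposal.
function createDraft(
  author: string,
  kind: ChangeKind,
  description: string,
  riskAssessment: string
): ChangeProposal {
  return { kind, description, riskAssessment, author };
}
```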

Staged Rollout

Each proposal rolls out through stages:

| Stage | Typical Config |
|-------|----------------|
| Canary | 5-10% of agents, 1 hour |
| Partial | 25-50% of agents, 4 hours |
| Full | 100% of agents |

Auto-rollback triggers if divergence exceeds the configured threshold (default 5%) at any stage. Divergence is measured as the fraction of golden trace decisions that change under the new rule set.
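
The staged rollout with auto-rollback can be sketched as a loop over stages that stops at the first threshold breach. This is an illustration only: `RolloutStage`, `EXAMPLE_STAGES`, `runRollout`, and the `measureDivergence` callback are hypothetical, the stage percentages are picked from the typical ranges above, and "exceeds" is read as strictly greater than the threshold.

```typescript
type StageName = "canary" | "partial" | "full";

// Hypothetical stage config; durationHours is null when no duration is given.
interface RolloutStage {
  name: StageName;
  agentPercent: number;
  durationHours: number | null;
}

// Example values chosen from the typical ranges in the table above.
const EXAMPLE_STAGES: RolloutStage[] = [
  { name: "canary", agentPercent: 10, durationHours: 1 },
  { name: "partial", agentPercent: 50, durationHours: 4 },
  { name: "full", agentPercent: 100, durationHours: null },
];

// Default threshold from the text: 5%, expressed as a fraction.
const DEFAULT_DIVERGENCE_THRESHOLD = 0.05;

type RolloutOutcome =
  | { status: "promoted" }
  | { status: "rolled-back"; failedStage: StageName; divergence: number };

// Walks the stages in order; rolls back as soon as a stage's measured
// divergence exceeds the threshold, otherwise promotes after the last stage.
function runRollout(
  stages: RolloutStage[],
  measureDivergence: (stage: RolloutStage) => number,
  threshold: number = DEFAULT_DIVERGENCE_THRESHOLD
): RolloutOutcome {
  for (const stage of stages) {
    const divergence = measureDivergence(stage);
    if (divergence > threshold) {
      return { status: "rolled-back", failedStage: stage.name, divergence };
    }
  }
  return { status: "promoted" };
}
```

In this sketch a breach during the canary stage means the partial and full stages never run, which is how staging limits the blast radius of a bad change.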

Simulation

Before staging, every proposal is simulated against recorded golden traces:

  • Apply the proposed change to a copy of the rule set
  • Replay all traces through the modified gates
  • Count how many decisions differ (divergence)
  • Identify regressions (previously-passing traces that now fail)
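
The four simulation steps above can be sketched as follows. This is a toy model, not the pipeline's implementation: rules are reduced to forbidden substrings, and `RuleSet`, `GoldenTrace`, `evaluate`, and `simulateAddRule` are hypothetical names. It also assumes a "regression" means a trace whose recorded decision was `allow` and whose replayed decision is `deny`.

```typescript
type Decision = "allow" | "deny";

// Toy rule set: each rule is a substring that causes a deny.
type RuleSet = readonly string[];

interface GoldenTrace {
  id: string;
  input: string;
  recordedDecision: Decision; // decision made under the baseline rule set
}

interface SimulationResult {
  total: number;
  changed: number;
  divergence: number; // fraction of trace decisions that changed
  regressions: string[]; // ids of traces that went from allow to deny
}

// Stand-in for the real gates: deny if any rule pattern appears in the input.
function evaluate(rules: RuleSet, input: string): Decision {
  return rules.some((pattern) => input.indexOf(pattern) !== -1) ? "deny" : "allow";
}

function simulateAddRule(
  baseline: RuleSet,
  newPattern: string,
  traces: GoldenTrace[]
): SimulationResult {
  // Step 1: apply the change to a copy; the baseline array is left untouched.
  const candidate: RuleSet = baseline.concat(newPattern);
  let changed = 0;
  const regressions: string[] = [];
  // Step 2: replay every trace through the modified rule set.
  for (const trace of traces) {
    const decision = evaluate(candidate, trace.input);
    if (decision !== trace.recordedDecision) {
      changed += 1; // Step 3: count differing decisions.
      if (trace.recordedDecision === "allow" && decision === "deny") {
        regressions.push(trace.id); // Step 4: record regressions.
      }
    }
  }
  const divergence = traces.length === 0 ? 0 : changed / traces.length;
  return { total: traces.length, changed, divergence, regressions };
}
```

With four traces and one changed decision, this sketch reports a divergence of 0.25, well above the 5% default, so such a proposal would not proceed to staging unchanged.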

Consequences

  • Rule changes are auditable (every proposal has an author, signature, and risk assessment)
  • Simulation catches regressions before any real agent is affected
  • Staged rollout limits blast radius of bad changes
  • Auto-rollback prevents cascading failures
  • 43 tests validate the full lifecycle, simulation, staging, and rollback

Alternatives Considered

  • Manual rule editing: No audit trail, no simulation, no rollback
  • Feature flags: Too coarse (on/off), no staged rollout or simulation
  • Canary deployments only: Missing the simulation step that catches issues before any real traffic