v3/@claude-flow/guidance/docs/adrs/ADR-G005-proof-envelope.md
Accepted
2026-02-01
Autonomous agent operations (swarms, daemon tasks, headless evaluation) execute tool calls without real-time human oversight. After the fact, teams need to:
The RunEvent type in src/types.ts captures the minimum schema for a run: event ID (UUID), task ID, guidance hash, retrieved rule IDs, tools used, files touched, diff summary, test results, violations, outcome acceptance, rework lines, intent classification, timestamp, duration, and session ID.
The guidance hash in RunEvent.guidanceHash is the SHA-256 hash of the constitution text, truncated to 16 hex characters. It binds each event to the exact rule set version that governed it.
Implement a proof envelope model for run auditing with the following properties:
Each RunEvent receives a UUID (eventId) generated by crypto.randomUUID() at creation time in RunLedger.createEvent(). This ID is immutable once assigned.
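One way to picture the immutable event ID is the sketch below. It is not the actual RunLedger.createEvent() code: SketchRunEvent and createEventSketch are hypothetical names, the event shape is cut down to three fields, and using readonly plus Object.freeze() to enforce immutability is an assumption rather than something this ADR specifies.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical, trimmed-down event shape; the real RunEvent has many more fields.
interface SketchRunEvent {
  readonly eventId: string; // readonly: the compiler rejects reassignment
  readonly taskId: string;
  readonly timestamp: number;
}

// Assumed stand-in for RunLedger.createEvent(): the ID is assigned once, at creation.
function createEventSketch(taskId: string): SketchRunEvent {
  // Object.freeze adds a runtime guard on top of the compile-time readonly.
  return Object.freeze({
    eventId: randomUUID(),
    taskId,
    timestamp: Date.now(),
  });
}

const firstEvent = createEventSketch('task-1');
const secondEvent = createEventSketch('task-1');

// Two events for the same task still get distinct IDs.
console.log(firstEvent.eventId !== secondEvent.eventId);
```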
The guidanceHash field on every RunEvent stores the constitution.hash from the active PolicyBundle. This is a SHA-256 hash truncated to 16 hex characters, computed by GuidanceCompiler.hashContent():
private hashContent(content: string): string {
  return createHash('sha256').update(content).digest('hex').slice(0, 16);
}
If the constitution changes between runs, the hash changes, making it immediately visible in the ledger that different guidance versions were in effect.
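The effect of a constitution change on the hash can be illustrated with a small standalone sketch. It reuses the same truncated SHA-256 as hashContent() above, written as a free function; the two constitution strings are made-up examples, not real guidance text.

```typescript
import { createHash } from 'node:crypto';

// Same truncated SHA-256 as hashContent() above, as a free function.
function hashContent(content: string): string {
  return createHash('sha256').update(content).digest('hex').slice(0, 16);
}

// Hypothetical constitution texts; only the second rule differs.
const constitutionV1 = 'R001: never commit secrets\nR002: run tests before merge';
const constitutionV2 = 'R001: never commit secrets\nR002: run tests and lint before merge';

const hashV1 = hashContent(constitutionV1);
const hashV2 = hashContent(constitutionV2);

console.log(hashV1.length);                          // 16
console.log(hashV1 === hashContent(constitutionV1)); // true: hashing is deterministic
console.log(hashV1 === hashV2);                      // false, barring a 64-bit collision
```

Truncating to 16 hex characters keeps 64 bits of the digest. That is enough to tell guidance versions apart in a ledger, but it is a weaker collision guarantee than the full 256-bit hash.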
RunEvent.retrievedRuleIds records the exact rule IDs that were active during the run. Each gate result (GateResult.triggeredRules) links back to specific rule IDs. Violations (Violation.ruleId) identify which rule was violated.
This three-level traceability (guidance hash -> retrieved rules -> triggered/violated rules) enables precise root-cause analysis: "Rule R042 was active because of guidance version abc123, was triggered by the secrets gate, and produced violation V-001."
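The trace described above could be assembled roughly as follows. This is a sketch under assumptions: the interfaces are reduced to the fields this ADR names, gateName and description are hypothetical field names, triggeredRules is assumed to hold rule ID strings, and traceRule is an illustrative helper rather than part of the package.

```typescript
// Hypothetical, reduced shapes: only fields named in the ADR, plus two
// illustrative ones (gateName, description) that are assumptions.
interface SketchViolation { ruleId: string; description: string; }
interface SketchGateResult { gateName: string; triggeredRules: string[]; }
interface TraceableEvent {
  guidanceHash: string;
  retrievedRuleIds: string[];
  violations: SketchViolation[];
}

// Walk the three levels for one rule: guidance version, active rules,
// then the gates that triggered it and the violations it produced.
function traceRule(ruleId: string, ev: TraceableEvent, gates: SketchGateResult[]): string {
  const active = ev.retrievedRuleIds.includes(ruleId);
  const firedBy = gates
    .filter(g => g.triggeredRules.includes(ruleId))
    .map(g => g.gateName);
  const violated = ev.violations.filter(v => v.ruleId === ruleId).length;
  return `${ruleId}: guidance=${ev.guidanceHash} active=${active} ` +
    `gates=[${firedBy.join(',')}] violations=${violated}`;
}

const tracedEvent: TraceableEvent = {
  guidanceHash: 'abc123',
  retrievedRuleIds: ['R007', 'R042'],
  violations: [{ ruleId: 'R042', description: 'API key written to config file' }],
};
const gateResults: SketchGateResult[] = [
  { gateName: 'secrets', triggeredRules: ['R042'] },
  { gateName: 'destructive-ops', triggeredRules: [] },
];

const traceLine = traceRule('R042', tracedEvent, gateResults);
console.log(traceLine);
// R042: guidance=abc123 active=true gates=[secrets] violations=1
```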
The RunLedger.evaluate() method runs all registered evaluators (IEvaluator implementations) against a finalized event. Each evaluator produces an EvaluatorResult with name, pass/fail, details, and an optional score. The evaluator chain is:
TestsPassEvaluator -- Were tests run? Did they pass?
ForbiddenCommandEvaluator -- Were any forbidden command patterns used?
ForbiddenDependencyEvaluator -- Were forbidden packages introduced?
ViolationRateEvaluator -- Did violations exceed the threshold?
DiffQualityEvaluator -- Did rework exceed the acceptable ratio?
Custom evaluators can be added via RunLedger.addEvaluator().
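The evaluator chain and the addEvaluator() extension point might look like the sketch below. The field names on EvaluatorResult and EvalEvent are guesses based on the description above (name, pass/fail, details, optional score), and LedgerSketch, TestsPassSketch, and AllowedToolsEvaluator are hypothetical stand-ins, not the classes in src/ledger.ts.

```typescript
// Assumed shapes: the ADR lists name, pass/fail, details, and an optional score;
// the exact field names below are guesses, not the real types in src/types.ts.
interface EvaluatorResult { name: string; passed: boolean; details: string; score?: number; }
interface EvalEvent { toolsUsed: string[]; testsRun: boolean; testsPassed: boolean; }
interface IEvaluator { readonly name: string; evaluate(ev: EvalEvent): EvaluatorResult; }

// Mirrors the idea of TestsPassEvaluator: were tests run, and did they pass?
class TestsPassSketch implements IEvaluator {
  readonly name = 'tests-pass';
  evaluate(ev: EvalEvent): EvaluatorResult {
    if (!ev.testsRun) return { name: this.name, passed: false, details: 'no tests were run' };
    return {
      name: this.name,
      passed: ev.testsPassed,
      details: ev.testsPassed ? 'tests passed' : 'tests failed',
    };
  }
}

// Hypothetical custom evaluator: fails when a tool outside the allow-list was used.
class AllowedToolsEvaluator implements IEvaluator {
  readonly name = 'allowed-tools';
  private readonly allowed: Set<string>;
  constructor(allowed: string[]) { this.allowed = new Set(allowed); }
  evaluate(ev: EvalEvent): EvaluatorResult {
    const extra = ev.toolsUsed.filter(t => !this.allowed.has(t));
    return {
      name: this.name,
      passed: extra.length === 0,
      details: extra.length === 0 ? 'all tools allowed' : `unexpected tools: ${extra.join(', ')}`,
    };
  }
}

// Minimal stand-in for the ledger: runs every registered evaluator in order.
class LedgerSketch {
  private evaluators: IEvaluator[] = [];
  addEvaluator(e: IEvaluator): void { this.evaluators.push(e); }
  evaluate(ev: EvalEvent): EvaluatorResult[] { return this.evaluators.map(e => e.evaluate(ev)); }
}

const sketchLedger = new LedgerSketch();
sketchLedger.addEvaluator(new TestsPassSketch());
sketchLedger.addEvaluator(new AllowedToolsEvaluator(['Read', 'Edit']));

const evalResults = sketchLedger.evaluate({ toolsUsed: ['Read', 'Bash'], testsRun: true, testsPassed: true });
for (const r of evalResults) console.log(`${r.name}: ${r.passed ? 'pass' : 'fail'} (${r.details})`);
// tests-pass: pass (tests passed)
// allowed-tools: fail (unexpected tools: Bash)
```

In this sketch, each evaluator returns a result rather than throwing, so one failing check does not hide the outcome of the others.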
RunLedger.rankViolations() aggregates violations across events by rule ID, computing frequency, average cost (rework lines), and a combined score (frequency * cost). This ranked list feeds into the optimizer loop (ADR-G008).
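The ranking arithmetic can be sketched as follows. The types and the function rankViolationsSketch are illustrative, and one detail is an assumption the ADR does not settle: each violation is charged the full rework line count of the event it occurred in.

```typescript
// Hypothetical reduced shapes for illustration.
interface RankedEvent { violations: { ruleId: string }[]; reworkLines: number; }
interface RankingSketch { ruleId: string; frequency: number; cost: number; score: number; }

// Aggregate by rule ID: frequency = number of violations,
// cost = average rework lines per violation, score = frequency * cost.
// Assumption: a violation is charged its whole event's reworkLines.
function rankViolationsSketch(events: RankedEvent[]): RankingSketch[] {
  const totals = new Map<string, { count: number; rework: number }>();
  for (const ev of events) {
    for (const v of ev.violations) {
      const t = totals.get(v.ruleId) ?? { count: 0, rework: 0 };
      t.count += 1;
      t.rework += ev.reworkLines;
      totals.set(v.ruleId, t);
    }
  }
  return Array.from(totals.entries())
    .map(([ruleId, t]) => {
      const cost = t.rework / t.count;
      return { ruleId, frequency: t.count, cost, score: t.count * cost };
    })
    .sort((a, b) => b.score - a.score); // highest combined score first
}

const ranked = rankViolationsSketch([
  { violations: [{ ruleId: 'R042' }], reworkLines: 30 },
  { violations: [{ ruleId: 'R042' }, { ruleId: 'R007' }], reworkLines: 10 },
  { violations: [{ ruleId: 'R007' }], reworkLines: 4 },
]);

for (const r of ranked) console.log(`${r.ruleId} freq=${r.frequency} cost=${r.cost} score=${r.score}`);
// R042 freq=2 cost=20 score=40
// R007 freq=2 cost=7 score=14
```

Both rules are violated twice here, but R042 ranks first because its violations coincide with more rework. That is the signal an optimizer loop would want to act on first.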
The ledger supports temporal queries: getEventsInRange(startMs, endMs), getRecentEvents(count), and getEventsByTask(taskId). These enable windowed analysis for optimization cycles.
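A minimal in-memory sketch of these three queries is shown below. QueryableLedgerSketch, TimedEvent, and record() are hypothetical; treating both range bounds as inclusive and returning recent events in arrival order are assumptions, since this ADR does not state either.

```typescript
// Hypothetical reduced event shape.
interface TimedEvent { eventId: string; taskId: string; timestamp: number; }

class QueryableLedgerSketch {
  private events: TimedEvent[] = [];

  record(e: TimedEvent): void { this.events.push(e); }

  // Assumption: both bounds are inclusive.
  getEventsInRange(startMs: number, endMs: number): TimedEvent[] {
    return this.events.filter(e => e.timestamp >= startMs && e.timestamp <= endMs);
  }

  // Assumption: events are stored in arrival order, so the last `count` are the newest.
  // The guard matters because slice(-0) would return the whole array.
  getRecentEvents(count: number): TimedEvent[] {
    return count > 0 ? this.events.slice(-count) : [];
  }

  getEventsByTask(taskId: string): TimedEvent[] {
    return this.events.filter(e => e.taskId === taskId);
  }
}

const queryLedger = new QueryableLedgerSketch();
queryLedger.record({ eventId: 'e1', taskId: 'task-a', timestamp: 1000 });
queryLedger.record({ eventId: 'e2', taskId: 'task-b', timestamp: 2000 });
queryLedger.record({ eventId: 'e3', taskId: 'task-a', timestamp: 3000 });

console.log(queryLedger.getEventsInRange(1500, 3000).map(e => e.eventId)); // [ 'e2', 'e3' ]
console.log(queryLedger.getRecentEvents(1).map(e => e.eventId));           // [ 'e3' ]
console.log(queryLedger.getEventsByTask('task-a').map(e => e.eventId));    // [ 'e1', 'e3' ]
```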
RunLedger.computeMetrics() derives optimization metrics (violation rate per 10 tasks, self-correction rate, rework lines, task count) directly from the event stream, enabling data-driven guidance evolution.
RunLedger.exportEvents() and importEvents() allow ledger persistence and cross-session analysis.
RunLedger accepts a maxEvents constructor parameter (default 0 = unlimited). When it is set and the limit is exceeded, the oldest 10% of events are evicted in a single batch splice (see ADR-G026). Events can also be exported and cleared periodically via exportEvents() and clear().
The createProofChain() factory requires { signingKey: string } as of ADR-G026. Callers must provide an explicit HMAC key; there is no fallback. The RuvBotBridgeConfig exposes a proofSigningKey field for this purpose.
Event timestamps rely on Date.now(), which can be manipulated on the host. Mitigation: in production, timestamps should be sourced from a trusted time service.
Use git log to reconstruct what happened. Rejected because git does not capture which rules were active, which gates fired, or whether violations were detected. Git captures the what (file changes) but not the why (guidance context).
Hash-chain every event into a Merkle tree for cryptographic proof of ordering. Rejected as over-engineered for the current use case. The guidance hash binding provides sufficient tamper evidence without the complexity of Merkle proofs. It can be added later if regulatory compliance requires it.
Send events to an external audit logging service (e.g., AWS CloudTrail, Datadog). Rejected for the same reason as LLM-based classification: adds a network dependency, latency, and cost. The local ledger is sufficient for development workflows. External export can be layered on top via the exportEvents() API.
Log events without running evaluators. Rejected because the evaluators are what make the ledger actionable. Without them, the ledger is a write-only log that requires manual analysis.
v3/@claude-flow/guidance/src/types.ts -- RunEvent, Violation, EvaluatorResult, ViolationRanking, OptimizationMetrics
v3/@claude-flow/guidance/src/ledger.ts -- RunLedger, all evaluator classes
v3/@claude-flow/guidance/src/compiler.ts -- hashContent() for guidance hash generation
v3/@claude-flow/guidance/src/index.ts -- GuidanceControlPlane.startRun(), finalizeRun()