v3/@claude-flow/guidance/docs/adrs/ADR-G006-deterministic-tool-gateway.md
Status: Accepted
Date: 2026-02-01
The enforcement gates (ADR-G004) evaluate tool calls at the moment they are attempted. In a multi-agent environment (swarms, retries, replays), the same tool call may be evaluated multiple times:
Additionally, autonomous agents have no natural spending limit. Without budget enforcement, a swarm can execute thousands of tool calls, generating massive diffs and consuming unbounded resources.
The EnforcementGates class provides the evaluation primitives, but the orchestrating layer needs to add three cross-cutting concerns: idempotency, schema validation, and budget metering.
Implement deterministic tool evaluation in the GuidanceControlPlane orchestrator (src/index.ts) with three layers wrapping the EnforcementGates:
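Gate evaluation is deterministic by construction. Each of the four gates is a pure check of its input against static configuration (regex matching for commands and content, list membership or numeric comparison for tools and diff size):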
- evaluateDestructiveOps(command) -- matches command against destructivePatterns
- evaluateToolAllowlist(toolName) -- checks toolName against allowedTools
- evaluateDiffSize(filePath, diffLines) -- compares diffLines to diffSizeThreshold
- evaluateSecrets(content) -- matches content against secretPatterns

No gate uses random state, network calls, or time-dependent logic. The same input always produces the same output, provided the GateConfig has not changed.
The GateConfig is set once during initialization and updated only via explicit updateConfig() calls, ensuring stability during a session.
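To make "deterministic by construction" concrete, here is a minimal sketch of the four gates written as free functions that take the config explicitly, rather than as EnforcementGates methods. The GateConfig field names come from the list above; the GateResult shape (gateName, decision, reason) and the decision each gate returns when it triggers are assumptions for illustration, not the contents of src/gates.ts.

```typescript
type GateDecision = 'block' | 'require-confirmation' | 'warn' | 'allow';

// Assumed shapes; the real definitions live in src/types.ts and src/gates.ts.
interface GateResult {
  gateName: string;
  decision: GateDecision;
  reason: string;
}

interface GateConfig {
  // Patterns should not use the g or y flag: RegExp.prototype.test() on such
  // patterns advances lastIndex, so repeated calls on the same input could disagree.
  destructivePatterns: RegExp[];
  allowedTools: string[];
  diffSizeThreshold: number;
  secretPatterns: RegExp[];
}

// Builds a result that is 'allow' unless the gate was triggered.
function toResult(gateName: string, triggered: boolean, onTrigger: GateDecision, reason: string): GateResult {
  return { gateName, decision: triggered ? onTrigger : 'allow', reason: triggered ? reason : 'no match' };
}

// Each gate reads only its arguments: no randomness, no I/O, no clock.
function evaluateDestructiveOps(config: GateConfig, command: string): GateResult {
  const hit = config.destructivePatterns.some((p) => p.test(command));
  return toResult('destructive-ops', hit, 'require-confirmation', 'command matches a destructive pattern');
}

function evaluateToolAllowlist(config: GateConfig, toolName: string): GateResult {
  const denied = !config.allowedTools.includes(toolName);
  return toResult('tool-allowlist', denied, 'block', `tool "${toolName}" is not in allowedTools`);
}

function evaluateDiffSize(config: GateConfig, filePath: string, diffLines: number): GateResult {
  const tooBig = diffLines > config.diffSizeThreshold;
  return toResult('diff-size', tooBig, 'warn', `${filePath}: ${diffLines} lines exceeds ${config.diffSizeThreshold}`);
}

function evaluateSecrets(config: GateConfig, content: string): GateResult {
  const hit = config.secretPatterns.some((p) => p.test(content));
  return toResult('secrets', hit, 'block', 'content matches a secret pattern');
}
```

Calling any of these twice with the same config and input yields structurally identical results, which is what lets retries and replays re-evaluate a tool call safely without any cache.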
EnforcementGates.aggregateDecision() applies a deterministic severity hierarchy:
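```typescript
// Severity ranking used by aggregateDecision(); the highest-ranked decision wins.
// GateDecision is the union type defined in src/types.ts.
const severity: Record<GateDecision, number> = {
  'block': 3,
  'require-confirmation': 2,
  'warn': 1,
  'allow': 0,
};
```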
The most restrictive decision wins. This is a pure function of the input GateResult[] array.
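One way aggregateDecision() could apply that rule is a single max-by-severity pass over the results, sketched below. Returning 'allow' for an empty array and the GateResult field names are assumptions. Because taking a maximum does not depend on order, the same set of gate results yields the same decision regardless of the order in which the gates ran.

```typescript
type GateDecision = 'block' | 'require-confirmation' | 'warn' | 'allow';

// Assumed result shape; the real GateResult lives in src/types.ts.
interface GateResult {
  gateName: string;
  decision: GateDecision;
  reason: string;
}

const severity: Record<GateDecision, number> = {
  'block': 3,
  'require-confirmation': 2,
  'warn': 1,
  'allow': 0,
};

// Pure reduction: keep the most severe decision seen; an empty list means 'allow'.
function aggregateDecision(results: readonly GateResult[]): GateDecision {
  let worst: GateDecision = 'allow';
  for (const r of results) {
    if (severity[r.decision] > severity[worst]) {
      worst = r.decision;
    }
  }
  return worst;
}
```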
The RunLedger in src/ledger.ts tracks cumulative metrics per run:
- diffSummary.linesAdded and diffSummary.linesRemoved -- total diff size
- diffSummary.filesChanged -- number of files modified
- toolsUsed -- list of tools invoked
- durationMs -- elapsed wall time

The GuidanceControlPlane.startRun() method creates a new RunEvent for each task. During the run, recordViolation() appends violations. finalizeRun() closes the event and runs evaluators.
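A minimal sketch of that lifecycle, built around a hypothetical MiniRunLedger class: the method names mirror startRun(), recordViolation() and finalizeRun(), but their signatures, the recordEdit() helper, and the Violation and Evaluator shapes are invented for illustration and do not reflect src/ledger.ts or src/index.ts. Timestamps can be passed in explicitly so the example is reproducible.

```typescript
// Hypothetical shapes modeled on the metrics listed above.
interface Violation {
  ruleId: string;
  description: string;
}

interface RunEvent {
  taskId: string;
  toolsUsed: string[];
  diffSummary: { linesAdded: number; linesRemoved: number; filesChanged: number };
  violations: Violation[];
  startedAt: number;
  durationMs: number;
}

// An evaluator inspects a finished run and returns true if the run passes.
type Evaluator = (event: RunEvent) => boolean;

class MiniRunLedger {
  private readonly runs = new Map<string, { event: RunEvent; files: Set<string> }>();

  // Opens a fresh event for one task.
  startRun(taskId: string, now: number = Date.now()): RunEvent {
    const event: RunEvent = {
      taskId,
      toolsUsed: [],
      diffSummary: { linesAdded: 0, linesRemoved: 0, filesChanged: 0 },
      violations: [],
      startedAt: now,
      durationMs: 0,
    };
    this.runs.set(taskId, { event, files: new Set<string>() });
    return event;
  }

  // Accumulates tool usage and diff size; one toolsUsed entry per invocation.
  recordEdit(taskId: string, toolName: string, filePath: string, added: number, removed: number): void {
    const run = this.openRun(taskId);
    run.event.toolsUsed.push(toolName);
    run.files.add(filePath);
    run.event.diffSummary.linesAdded += added;
    run.event.diffSummary.linesRemoved += removed;
    run.event.diffSummary.filesChanged = run.files.size;
  }

  recordViolation(taskId: string, violation: Violation): void {
    this.openRun(taskId).event.violations.push(violation);
  }

  // Closes the event and stamps wall time. The run passes only if every
  // evaluator passes; every() stops at the first failing evaluator.
  finalizeRun(taskId: string, evaluators: Evaluator[], now: number = Date.now()): boolean {
    const run = this.openRun(taskId);
    this.runs.delete(taskId);
    run.event.durationMs = now - run.event.startedAt;
    return evaluators.every((evaluate) => evaluate(run.event));
  }

  private openRun(taskId: string): { event: RunEvent; files: Set<string> } {
    const run = this.runs.get(taskId);
    if (run === undefined) throw new Error(`no open run for task ${taskId}`);
    return run;
  }
}
```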
The DiffQualityEvaluator computes the rework ratio (reworkLines / totalLines). If the ratio exceeds maxReworkRatio (default 0.3), the evaluator fails, signaling that the run produced low-quality output requiring significant rework.
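As a worked example, a 100-line diff in which 40 lines rework earlier output has a ratio of 40 / 100 = 0.4, above the 0.3 default, so the evaluator fails. The sketch below assumes reworkLines and totalLines are already counted (how DiffQualityEvaluator derives them is not described here) and treats an empty diff as ratio 0, which is also an assumption.

```typescript
// Hypothetical input: counts already extracted from the run's diff.
interface DiffStats {
  reworkLines: number;
  totalLines: number;
}

// reworkLines / totalLines, with an empty diff treated as no rework (assumption).
function reworkRatio({ reworkLines, totalLines }: DiffStats): number {
  return totalLines === 0 ? 0 : reworkLines / totalLines;
}

// Fails only when the ratio strictly exceeds the limit.
function diffQualityPasses(stats: DiffStats, maxReworkRatio: number = 0.3): boolean {
  return reworkRatio(stats) <= maxReworkRatio;
}
```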
Budget enforcement is implicit through the evaluator pipeline: a run that exceeds thresholds (too many violations, too many rework lines, too large a diff) is marked as failed, which feeds back into the optimizer's violation rankings.
The GuidanceControlPlane exposes three facade methods that route to the appropriate gate combination:
| Method | Evaluates | Returns |
|---|---|---|
| `evaluateCommand(command: string)` | destructive-ops, secrets | `GateResult[]` |
| `evaluateToolUse(toolName: string, params: Record<string, unknown>)` | tool-allowlist, secrets (on serialized params) | `GateResult[]` |
| `evaluateEdit(filePath: string, content: string, diffLines: number)` | diff-size, secrets | `GateResult[]` |
These methods are stateless -- they do not modify the ledger or the gate configuration. Side effects (logging, violation recording) happen at the orchestration level via startRun() / recordViolation() / finalizeRun().
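The routing in the table can be sketched as a thin class over a gates object. The Gates interface, the GatewayFacade name, the GateResult shape, and the use of JSON.stringify to serialize params are assumptions, not the actual src/index.ts code. The class keeps no state between calls beyond its reference to the gates, which is what makes the facade stateless.

```typescript
type GateDecision = 'block' | 'require-confirmation' | 'warn' | 'allow';

// Assumed result shape; the real GateResult lives in src/types.ts.
interface GateResult {
  gateName: string;
  decision: GateDecision;
  reason: string;
}

// Hypothetical view of the four EnforcementGates methods named in this ADR.
interface Gates {
  evaluateDestructiveOps(command: string): GateResult;
  evaluateToolAllowlist(toolName: string): GateResult;
  evaluateDiffSize(filePath: string, diffLines: number): GateResult;
  evaluateSecrets(content: string): GateResult;
}

class GatewayFacade {
  private readonly gates: Gates;

  constructor(gates: Gates) {
    this.gates = gates;
  }

  // Shell commands: destructive-op patterns plus secret scanning.
  evaluateCommand(command: string): GateResult[] {
    return [this.gates.evaluateDestructiveOps(command), this.gates.evaluateSecrets(command)];
  }

  // Tool calls: allowlist check plus secret scanning of the serialized params.
  evaluateToolUse(toolName: string, params: Record<string, unknown>): GateResult[] {
    return [this.gates.evaluateToolAllowlist(toolName), this.gates.evaluateSecrets(JSON.stringify(params))];
  }

  // File edits: diff-size threshold plus secret scanning of the new content.
  evaluateEdit(filePath: string, content: string, diffLines: number): GateResult[] {
    return [this.gates.evaluateDiffSize(filePath, diffLines), this.gates.evaluateSecrets(content)];
  }
}
```

Serializing with JSON.stringify keeps the secrets gate deterministic for a given params object, since the same object always produces the same string.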
If updateConfig() is called mid-session, gate decisions may change for the same input. Mitigation: configuration changes are explicit and logged, and the guidance hash in the ledger tracks which version was active.

Caching gate results (hash the input and return the cached decision) was considered and rejected: the gates are already deterministic, so caching adds memory overhead without changing behavior. If performance becomes an issue (unlikely at under 1ms per evaluation), caching can be layered on later.
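If caching is ever layered on, the main subtlety is the updateConfig() risk described above: decisions are a pure function of (config, input), so cached entries are valid only for the config that produced them. The hypothetical CachedGate wrapper below clears its cache on every config change; it keys on the raw input string rather than a hash, for brevity.

```typescript
type GateDecision = 'block' | 'require-confirmation' | 'warn' | 'allow';

// Hypothetical memoizing wrapper around a pure gate function.
class CachedGate<C> {
  private readonly cache = new Map<string, GateDecision>();
  private readonly evaluate: (config: C, input: string) => GateDecision;
  private config: C;

  constructor(config: C, evaluate: (config: C, input: string) => GateDecision) {
    this.config = config;
    this.evaluate = evaluate;
  }

  // A new config can change the answer for the same input, so every cached
  // decision is discarded.
  updateConfig(config: C): void {
    this.config = config;
    this.cache.clear();
  }

  decide(input: string): GateDecision {
    const cached = this.cache.get(input);
    if (cached !== undefined) return cached;
    const decision = this.evaluate(this.config, input);
    // The map grows without bound: this is the memory overhead cited above.
    this.cache.set(input, decision);
    return decision;
  }
}
```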
Active budget enforcement, which would block tool calls once a cumulative budget (diff lines, tool call count, duration) is exceeded, was rejected for now because it risks blocking legitimate long-running tasks. The passive approach (evaluate after the run, then refine rules through the optimizer) is less disruptive. Active enforcement is planned as a future gate.
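For a sense of what the planned gate might look like, here is a sketch of a budget check over the three quantities named above. BudgetUsage, BudgetLimits, evaluateBudget, and any limit values are hypothetical; this ADR defines none of them. Usage, including elapsed time, is passed in as a snapshot so the gate itself never reads the clock and keeps the same determinism property as the other gates.

```typescript
type GateDecision = 'block' | 'require-confirmation' | 'warn' | 'allow';

// Hypothetical cumulative usage snapshot for one run.
interface BudgetUsage {
  diffLines: number;
  toolCalls: number;
  durationMs: number;
}

// Hypothetical limits; no budget values are defined in this ADR.
interface BudgetLimits {
  maxDiffLines: number;
  maxToolCalls: number;
  maxDurationMs: number;
}

interface BudgetResult {
  decision: GateDecision;
  exceeded: string[];
}

// Blocks once any budget is strictly exceeded and reports which ones.
function evaluateBudget(usage: BudgetUsage, limits: BudgetLimits): BudgetResult {
  const exceeded: string[] = [];
  if (usage.diffLines > limits.maxDiffLines) exceeded.push('diffLines');
  if (usage.toolCalls > limits.maxToolCalls) exceeded.push('toolCalls');
  if (usage.durationMs > limits.maxDurationMs) exceeded.push('durationMs');
  return { decision: exceeded.length > 0 ? 'block' : 'allow', exceeded };
}
```

Returning a GateDecision lets such a gate feed into aggregateDecision() alongside the existing four.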
Validating tool call parameters against a schema before execution was also rejected, because Claude Code already validates tool parameters against its own schemas and a second validation layer would be redundant. The guidance layer focuses on policy (should this tool be used?), not schema (are the parameters well-formed?).
- v3/@claude-flow/guidance/src/gates.ts -- EnforcementGates.aggregateDecision(), stateless evaluation methods
- v3/@claude-flow/guidance/src/ledger.ts -- RunLedger.createEvent(), DiffQualityEvaluator
- v3/@claude-flow/guidance/src/index.ts -- GuidanceControlPlane.evaluateCommand(), evaluateToolUse(), evaluateEdit()
- v3/@claude-flow/guidance/src/types.ts -- GateDecision, GateResult