plugins/plugin-local-inference/docs/voice-cancellation-contract.md
Canonical reference for the single-token cancellation flow that binds the voice loop (VAD → ASR → turn detector → LM → TTS) to the runtime's per-turn abort surface. Read this before changing any of:
- `packages/shared/src/voice/voice-cancellation-token.ts`
- `plugins/plugin-local-inference/src/services/voice/cancellation-coordinator.ts`
- `plugins/plugin-local-inference/src/services/voice/optimistic-policy.ts`
- `plugins/plugin-local-inference/src/services/voice/barge-in.ts`
- `packages/core/src/runtime/turn-controller.ts`

Wave 2 ended with three independent abort surfaces:

1. `TurnControllerRegistry` keyed by `roomId` (planner-loop, action handlers, streaming `useModel`).
2. `BargeInController` / `BargeInCancelToken` (`hardStop()` returns an `AbortSignal` the engine threads into the LM fetch).
3. `OptimisticRollbackController` + `VoiceStateMachine`.

R11's audit (`.swarm/research/R11-cancellation.md`) showed that every voice barge-in fires (2) but does not fire (1), so the planner-loop and action handlers had to discover the abort indirectly via the streaming HTTP fetch close. W3-9 binds these to one handle, `VoiceCancellationToken`, and makes the engine bridge fire both the voice abort and the runtime abort in lock-step.
interface VoiceCancellationToken {
readonly runId: string; // stable per-utterance id
readonly slot?: number; // DflashLlamaServer slot id, when known
readonly aborted: boolean; // cheap poll
readonly reason: VoiceCancellationReason | null;
readonly signal: AbortSignal; // standard signal for fetch / model
abort(reason: VoiceCancellationReason): void;
onAbort(listener: (r: VoiceCancellationReason) => void): () => void;
}
type VoiceCancellationReason =
| "barge-in" // VAD speech-start during agent-speaking
| "eot-revoked" // turn detector revoked the previous EOT decision
| "user-cancel" // explicit cancel from UI / API
| "timeout" // turn exceeded its budget
| "external"; // runtime / lifecycle abort (APP_PAUSE etc.)
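A minimal sketch of a token that satisfies this interface, assuming it is backed by a standard `AbortController`. The factory name `createVoiceCancellationToken` and the treatment of listeners registered after an abort are assumptions made for illustration; the real implementation in `voice-cancellation-token.ts` may differ.

```typescript
type VoiceCancellationReason =
  | "barge-in"
  | "eot-revoked"
  | "user-cancel"
  | "timeout"
  | "external";

interface VoiceCancellationToken {
  readonly runId: string;
  readonly slot?: number;
  readonly aborted: boolean;
  readonly reason: VoiceCancellationReason | null;
  readonly signal: AbortSignal;
  abort(reason: VoiceCancellationReason): void;
  onAbort(listener: (r: VoiceCancellationReason) => void): () => void;
}

// Hypothetical factory name; not confirmed by the source.
function createVoiceCancellationToken(runId: string, slot?: number): VoiceCancellationToken {
  const controller = new AbortController();
  const listeners = new Set<(r: VoiceCancellationReason) => void>();
  let reason: VoiceCancellationReason | null = null;

  return {
    runId,
    slot,
    get aborted() {
      return reason !== null;
    },
    get reason() {
      return reason;
    },
    signal: controller.signal,
    abort(r: VoiceCancellationReason): void {
      if (reason !== null) return; // idempotent: the first reason wins
      reason = r;
      controller.abort(r); // flips signal.aborted for fetch / model calls
      for (const listener of [...listeners]) listener(r);
    },
    onAbort(listener: (r: VoiceCancellationReason) => void): () => void {
      // Assumption: listeners added after an abort are simply never called.
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

const token = createVoiceCancellationToken("run-1", 0);
token.onAbort((r) => console.log("aborted:", r));
token.abort("barge-in");
token.abort("timeout"); // ignored, the token is already aborted
console.log(token.reason, token.signal.aborted);
```

Backing the token with an `AbortController` means anything that already accepts an `AbortSignal` (such as `fetch`) can take `token.signal` unchanged.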
Invariants:

- `runId` is stable for the whole utterance.
- `abort()` is idempotent: the first reason wins.
- `signal.aborted === true` after the first `abort()`.

The `VoiceCancellationCoordinator` (the only owner of live tokens, keyed by `roomId`) wires every abort into four sinks:
coordinator.bargeIn(roomId) coordinator.armTurn({ roomId, runId, slot? })
│ │
▼ ▼
token.abort("barge-in") ─────────────► token (registered in VoiceCancellationRegistry)
│
│ onAbort listener (set by coordinator)
▼
┌───────────────────────────────────────────────────────────┐
│ 1. runtime.turnControllers.abortTurn(roomId, reason) │
│ → planner-loop / action handlers / streaming useModel │
│ see TurnAbortedError at the next yield │
│ │
│ 2. slotAbort(slot, reason) [if slot was registered] │
│ → DflashLlamaServer.abortSlot — closes in-flight │
│ fetches against the slot. On a fork with a │
│ slot-cancel REST route, the route is hit instead. │
│ │
│ 3. ttsStop(reason) │
│ → EngineVoiceBridge.triggerBargeIn → audio-sink drain │
│ (SIGKILLs the player child) + FFI/HTTP synthesis │
│ cancel at the next kernel boundary │
│ │
│ 4. AbortSignal listeners │
│ → every fetch / useModel / FFI call that took │
│ token.signal aborts at the next yield point │
└───────────────────────────────────────────────────────────┘
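The fan-out in the diagram can be sketched roughly as follows. This is a simplified illustration, not the coordinator's actual code: the `FanOutSinks` shape, the class name, and the order in which the sinks are called are assumptions, and the token is reduced to a bare `AbortController`.

```typescript
type Reason = "barge-in" | "eot-revoked" | "user-cancel" | "timeout" | "external";

// Hypothetical sink shape, named after the four boxes in the diagram.
interface FanOutSinks {
  abortTurn(roomId: string, reason: Reason): void; // 1. runtime turn controllers
  slotAbort(slot: number, reason: Reason): void; // 2. LM server slot
  ttsStop(reason: Reason): void; // 3. TTS / audio sink
}

interface LiveTurn {
  runId: string;
  slot?: number;
  controller: AbortController; // 4. backs the AbortSignal handed to callers
}

class FanOutCoordinatorSketch {
  private readonly live = new Map<string, LiveTurn>();
  private readonly sinks: FanOutSinks;

  constructor(sinks: FanOutSinks) {
    this.sinks = sinks;
  }

  armTurn(opts: { roomId: string; runId: string; slot?: number }): AbortSignal {
    const turn: LiveTurn = {
      runId: opts.runId,
      slot: opts.slot,
      controller: new AbortController(),
    };
    this.live.set(opts.roomId, turn);
    return turn.controller.signal;
  }

  abort(roomId: string, reason: Reason): boolean {
    const turn = this.live.get(roomId);
    if (!turn) return false; // nothing armed, or already aborted
    this.live.delete(roomId);
    this.sinks.abortTurn(roomId, reason);
    if (turn.slot !== undefined) this.sinks.slotAbort(turn.slot, reason);
    this.sinks.ttsStop(reason);
    turn.controller.abort(reason); // wakes every holder of the signal
    return true;
  }

  bargeIn(roomId: string): boolean {
    return this.abort(roomId, "barge-in");
  }
}

const calls: string[] = [];
const coordinator = new FanOutCoordinatorSketch({
  abortTurn: (roomId, reason) => {
    calls.push(`runtime:${roomId}:${reason}`);
  },
  slotAbort: (slot, reason) => {
    calls.push(`slot:${slot}:${reason}`);
  },
  ttsStop: (reason) => {
    calls.push(`tts:${reason}`);
  },
});
const signal = coordinator.armTurn({ roomId: "room-1", runId: "run-1", slot: 0 });
coordinator.bargeIn("room-1");
console.log(calls, signal.aborted);
```

Removing the turn from the map before calling the sinks keeps a second `abort()` for the same room from fanning out twice, which mirrors the "first reason wins" invariant.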
The reverse direction (runtime → voice) is wired symmetrically: the
coordinator subscribes to runtime.turnControllers.onEvent and aborts
the active voice token whenever the runtime emits aborted /
aborted-cleanup for the same roomId. This is the path that fires
when abortInflightInference(runtime) is called from a lifecycle
listener (APP_PAUSE_EVENT) — the voice loop drops in sync.
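A rough sketch of the runtime-to-voice direction, under stated assumptions: `TurnControllersStub` stands in for `runtime.turnControllers`, the `TurnEvent` shape is hypothetical (the text above only names the `aborted` / `aborted-cleanup` events and the `roomId` key), and voice tokens are reduced to plain `AbortController`s.

```typescript
// Hypothetical event shape for illustration.
interface TurnEvent {
  type: "started" | "aborted" | "aborted-cleanup";
  roomId: string;
}
type TurnListener = (event: TurnEvent) => void;

// Stand-in for runtime.turnControllers, reduced to what this sketch needs.
class TurnControllersStub {
  private readonly listeners = new Set<TurnListener>();

  onEvent(listener: TurnListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(event: TurnEvent): void {
    for (const listener of [...this.listeners]) listener(event);
  }

  abortTurn(roomId: string): void {
    this.emit({ type: "aborted", roomId });
  }
}

// Subscribes once and aborts the matching voice token on runtime aborts.
function bindRuntimeToVoice(
  turnControllers: TurnControllersStub,
  voiceTokens: Map<string, AbortController>,
): () => void {
  return turnControllers.onEvent((event) => {
    if (event.type !== "aborted" && event.type !== "aborted-cleanup") return;
    const token = voiceTokens.get(event.roomId);
    // Skipping already-aborted tokens makes the echo of a voice-initiated
    // abort (voice -> runtime -> voice) harmless.
    if (token && !token.signal.aborted) token.abort("external");
  });
}

const runtimeTurns = new TurnControllersStub();
const tokens = new Map<string, AbortController>([["room-1", new AbortController()]]);
const unbind = bindRuntimeToVoice(runtimeTurns, tokens);
runtimeTurns.abortTurn("room-1"); // e.g. fired from a lifecycle listener
unbind();
```

The sketch uses the `"external"` reason because the reason taxonomy describes it as the runtime / lifecycle abort.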
| Source | Entry point |
|---|---|
| VAD speech-start during agent-speaking | coordinator.bargeIn(roomId) |
| ASR-confirmed barge-in words | BargeInController.hardStop → bindBargeInController |
| Turn detector revokes EOT (user resumed) | coordinator.revokeEot(roomId) |
| Explicit cancel from UI / API | coordinator.abort(roomId, "user-cancel") |
| Lifecycle (APP_PAUSE, container shutdown) | runtime.turnControllers.abortAllTurns(reason) (auto) |
| Per-turn timeout | coordinator.abort(roomId, "timeout") |
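As one illustration of the table, the per-turn timeout row could be armed along these lines. The helper `armTurnTimeout`, the `CoordinatorLike` interface, and the injectable `Timers` object are hypothetical and exist only to keep the sketch self-contained and testable without real delays.

```typescript
type Reason = "barge-in" | "eot-revoked" | "user-cancel" | "timeout" | "external";

// Only the one coordinator method this sketch relies on.
interface CoordinatorLike {
  abort(roomId: string, reason: Reason): void;
}

// Injectable timers so the behaviour can be exercised deterministically.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const systemTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Hypothetical helper: aborts the turn with "timeout" once the budget elapses.
function armTurnTimeout(
  coordinator: CoordinatorLike,
  roomId: string,
  budgetMs: number,
  turnSignal: AbortSignal,
  timers: Timers = systemTimers,
): () => void {
  const handle = timers.set(() => coordinator.abort(roomId, "timeout"), budgetMs);
  const disarm = () => timers.clear(handle);
  // If the turn is aborted for another reason first, drop the timer.
  turnSignal.addEventListener("abort", disarm, { once: true });
  return disarm; // call this when the turn completes normally
}
```

Listening on the turn's own signal means a barge-in that lands before the budget expires also cancels the pending timeout, so the coordinator never sees a late `"timeout"` for a turn that is already gone.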
Optimistic generation on EOT is gated by `OptimisticGenerationPolicy`. Default heuristic:

- Plugged in or unknown power source: enabled.
- Battery: disabled.
- An explicit override (`voice.optimisticGenerationOnEot`) wins.

The state machine's `handlePartialTranscript` consults `policy.shouldStartOptimisticLm(eotProb)` before firing the speculative drafter. When the policy says yes, the coordinator arms a token for the turn and the LM call takes `token.signal`, so subsequent VAD-driven barge-ins abort it cleanly.
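A hedged sketch of the gating logic. Only `shouldStartOptimisticLm(eotProb)` and the three power states appear in this document; the class name, constructor options, `setPowerSource`, and the `0.5` default threshold are assumptions for illustration.

```typescript
type PowerSourceState = "plugged-in" | "battery" | "unknown";

class OptimisticPolicySketch {
  private power: PowerSourceState;
  private readonly override: boolean | undefined;
  private readonly eotThreshold: number;

  constructor(opts: { power: PowerSourceState; override?: boolean; eotThreshold?: number }) {
    this.power = opts.power;
    this.override = opts.override;
    this.eotThreshold = opts.eotThreshold ?? 0.5; // hypothetical default
  }

  // Hot-swap the power source, e.g. when the device is unplugged mid-session.
  setPowerSource(power: PowerSourceState): void {
    this.power = power;
  }

  enabled(): boolean {
    if (this.override !== undefined) return this.override; // explicit override wins
    return this.power !== "battery"; // plugged-in and unknown are both enabled
  }

  shouldStartOptimisticLm(eotProb: number): boolean {
    return this.enabled() && eotProb >= this.eotThreshold;
  }
}

const policy = new OptimisticPolicySketch({ power: "plugged-in" });
console.log(policy.shouldStartOptimisticLm(0.9));
policy.setPowerSource("battery");
console.log(policy.shouldStartOptimisticLm(0.9));
```

Keeping the power state mutable while the override and threshold stay fixed is one way to read "hot-swappable power source"; the actual policy may expose this differently.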
- `packages/shared/src/voice/voice-cancellation-token.test.ts` (16 tests), `plugins/plugin-local-inference/src/services/voice/cancellation-coordinator.test.ts` (12 tests), `plugins/plugin-local-inference/src/services/voice/optimistic-policy.test.ts` (13 tests).
- `packages/app-core/__tests__/voice/barge-in.test.ts` (9 scenarios): covers the W3-9 brief's two load-bearing claims.

EngineVoiceBridge.start() constructs the canonical
VoiceCancellationCoordinator and the OptimisticGenerationPolicy for
the session whenever a runtime option is supplied. The bridge wires
both into the production hot path; nothing in the live voice loop is
manual any more. The flow:
EngineVoiceBridge.start({ runtime, ... })
│
├─► VoiceCancellationCoordinator (per session, owns per-roomId tokens)
│ │
│ ├─ slotAbort ← opts.slotAbort (production wires DflashLlamaServer.abortSlot
│ │ when a slot id is known per turn)
│ ├─ ttsStop ← bridge.triggerBargeIn() ← audio sink drain
│ │ + scheduler chunker flush
│ │ + in-flight TTS cancel
│ └─ runtime.turnControllers.{abortTurn, onEvent} ← both directions
│
└─► OptimisticGenerationPolicy (per session, hot-swappable power source)
│
└─ Primed with resolvePowerSourceState() ← env override OR Linux sysfs
("plugged-in" | "battery"
| "unknown")
Plugged-in / unknown → enabled
Battery → disabled
EngineVoiceBridge.bindBargeInControllerForRoom(roomId)
│
└─► coordinator.bindBargeInController(roomId, scheduler.bargeIn)
│
└─ scheduler.bargeIn.hardStop("barge-in-words")
→ coordinator.bargeIn(roomId)
→ token.abort("barge-in")
→ fan-out (runtime / slotAbort / ttsStop / AbortSignal)
VoiceStateMachine.firePrefill(partial, eotProb, turnId) ← speech-pause / EOT
│
└─ if (optimisticPolicy && !policy.shouldStartOptimisticLm(eotProb))
return — prefill suppressed (battery / below threshold / override off)
else
fire-and-forget prefillOptimistic + retain promise for speech-end
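The barge-in binding in the flow above might look roughly like this. It is a sketch under assumptions: the `onHardStop` hook on the controller is hypothetical, the stub's `hardStop()` returns `void` rather than the `AbortSignal` the real controller returns, and `bargeIn` only records the call instead of aborting a token.

```typescript
type HardStopListener = (reason: string) => void;

// Stand-in for the legacy controller; onHardStop is a hypothetical hook.
class BargeInControllerStub {
  private readonly listeners = new Set<HardStopListener>();

  onHardStop(listener: HardStopListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  hardStop(reason: string): void {
    for (const listener of [...this.listeners]) listener(reason);
  }
}

class BindingCoordinatorSketch {
  readonly bargeIns: string[] = [];
  private readonly bindings = new Map<string, () => void>();

  bargeIn(roomId: string): void {
    // The real coordinator would abort the room's token here.
    this.bargeIns.push(roomId);
  }

  bindBargeInController(roomId: string, controller: BargeInControllerStub): () => void {
    const existing = this.bindings.get(roomId);
    if (existing) return existing; // idempotent: one binding per room
    const off = controller.onHardStop(() => this.bargeIn(roomId));
    const unbind = () => {
      off();
      this.bindings.delete(roomId);
    };
    this.bindings.set(roomId, unbind);
    return unbind;
  }

  // Tears down every remaining binding, as the bridge does on dispose().
  dispose(): void {
    for (const unbind of [...this.bindings.values()]) unbind();
  }
}

const controller = new BargeInControllerStub();
const binder = new BindingCoordinatorSketch();
binder.bindBargeInController("room-1", controller);
controller.hardStop("barge-in-words"); // routed to binder.bargeIn("room-1")
binder.dispose();
```

Returning the existing unsubscribe handle on a repeat bind keeps a single `hardStop` from triggering the room's barge-in more than once.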
Accessors on the bridge (null when no `runtime` was supplied, for backward compatibility with callers that haven't adopted the canonical surface yet):

- `bridge.cancellationCoordinatorOrNull()`: the live coordinator.
- `bridge.optimisticPolicyOrNull()`: the live policy.
- `bridge.bindBargeInControllerForRoom(roomId)`: idempotent binding; returns an unsubscribe handle. Bridge `dispose()` tears down every remaining binding before the FFI context goes away.

`/v1/audio/speech` C++ interrupt. The fused-build synthesis handler is non-streaming and ignores `req.is_connection_closed`; the in-flight `ov_synthesize` keeps running after a client-side abort. The audio sink drain still cuts user-facing audio within one tick (the player child is SIGKILLed), so the wasted GPU work is the only cost. Fix tracked under R11 §5.3: extend the in-source route handler at `plugins/plugin-local-inference/native/llama.cpp/tools/server/server.cpp` (namespace `eliza_omnivoice`, the `audio_speech_handler()` lambda).

`dflash-checkpoint-client.ts` speaks the post-merge shape (`POST /slots/<id>/save?filename=` + `DELETE /slots/<id>`); the bundled fork still serves the legacy `?action=` route. The capability probe `probeCtxCheckpointsSupported` in `dflash-server.ts` is the seam; see R11 §3.3.

To add a new cancellation source:

1. Resolve the `roomId` of the active voice turn.
2. Call `coordinator.abort(roomId, reason)` with the appropriate `VoiceCancellationReason` (extend the union in `packages/shared/src/voice/voice-cancellation-token.ts` first if the new source doesn't map to an existing reason; bump the reason taxonomy with care, because telemetry consumers pin these).
3. Add a test to `cancellation-coordinator.test.ts` proving the fan-out fires for the new entry point.

Do NOT bypass the coordinator. Any voice-side caller that wants to abort in-flight inference goes through this single entry point. The legacy `BargeInController.hardStop()` is preserved for callers that already hold a controller reference, but it should be wired through `coordinator.bindBargeInController()` so the canonical token is the authoritative cancel surface.
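A small sketch of what a new source can look like when it follows this rule. The handler name `handleCancelRequest`, the session-to-room map, and the `CoordinatorLike` interface are hypothetical; only `coordinator.abort(roomId, reason)` and the `"user-cancel"` reason come from this document.

```typescript
type Reason = "barge-in" | "eot-revoked" | "user-cancel" | "timeout" | "external";

// Only the one coordinator method a new source needs.
interface CoordinatorLike {
  abort(roomId: string, reason: Reason): void;
}

// Hypothetical entry point for an explicit cancel arriving from a UI or API.
function handleCancelRequest(
  coordinator: CoordinatorLike,
  sessionRooms: Map<string, string>, // hypothetical sessionId -> roomId lookup
  sessionId: string,
): boolean {
  // Resolve the roomId of the active voice turn.
  const roomId = sessionRooms.get(sessionId);
  if (roomId === undefined) return false; // no active voice turn for this session

  // Go through the coordinator with an existing reason; never abort the
  // LM fetch, the slot, or TTS directly from here.
  coordinator.abort(roomId, "user-cancel");
  return true;
}

// A recording fake is enough to check that the new entry point reaches
// the coordinator with the expected room and reason.
const recorded: string[] = [];
const fakeCoordinator: CoordinatorLike = {
  abort: (roomId, reason) => {
    recorded.push(`${roomId}:${reason}`);
  },
};
handleCancelRequest(fakeCoordinator, new Map([["session-1", "room-1"]]), "session-1");
console.log(recorded);
```

Because the handler only knows about `coordinator.abort`, the four sinks stay an implementation detail of the coordinator and the new source cannot drift out of sync with them.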