docs/adr/ADR-106-dp-sgd-and-primitive-isolation.md
Status: Proposed · Date: 2026-05-22 · Author: SOTA research loop tick-15 · Supersedes: none · Extends: ADR-105
ADR-105 specified federated learning for RuView CSI personalisation with MERIDIAN env-normalisation + Krum byzantine-robust aggregation + R7-style update-level mincut. It deferred two questions:
This ADR closes both. It is a direct extension of ADR-105 and incorporates the constraints from R3 (re-ID privacy) + R14 (empathic appliance privacy) + R15 (RF biometric physical-not-learned identification).
Adopt DP-SGD with explicit primitive-isolation enforcement on every Cognitum Seed before any model delta leaves the device.
Layer 1 — Primitive Isolation (R15 binding constraint). A static list of "on-device-only" biometric primitives. The federation client library enforces that these tensors are never serialised into a transmittable update.
| Primitive | On-device only | Reason |
|---|---|---|
| Raw CSI window (complex64 tensor) | ✅ | ADR-105 baseline |
| Gait stride frequency (Hz scalar per subject) | ✅ | R15 — biometric primitive |
| Breathing rate (BPM scalar per subject) | ✅ | R15 — biometric primitive |
| HRV rate signature (R-R interval array per subject) | ✅ | R15 — biometric primitive |
| RCS frequency response curve (per subject, per-subcarrier amplitude) | ✅ | R15 — biometric primitive |
| Limb timing vector (per subject, per stride) | ✅ | R15 — biometric primitive |
| Per-subject embedding centroid | ✅ | R3 + ADR-105 — re-ID primitive |
| MERIDIAN per-room centroid | ⚠️ | Aggregate over all subjects in the room — not per-subject |
| LoRA weight delta | ⚠️ | Encodes biometric information; mitigated by Layer 2 + Layer 3 |
| Model logits / softmax outputs | ⚠️ | Per-subject during inference; never aggregated for transmission |
| Coordinator-side aggregate model | ❌ | Distributed back to nodes; no per-subject content by construction |
The ✅ rows are enforced at the API surface — the federation client returns an error if a tensor with these tags is passed to submit_delta().
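The API-surface check described above can be sketched as follows. This is a minimal illustration, not the `ruview-fed` implementation: the variant names, `TaggedTensor`, `SubmitError`, and the return value of `submit_delta` are assumptions made for the example. Only the idea of tagging tensors and rejecting the ✅ rows comes from this ADR.

```rust
/// Hypothetical tag set mirroring the table above; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimitiveTag {
    RawCsiWindow,
    GaitStrideFrequency,
    BreathingRate,
    HrvSignature,
    RcsFrequencyResponse,
    LimbTimingVector,
    SubjectEmbeddingCentroid,
    RoomCentroid,
    LoraWeightDelta,
}

impl PrimitiveTag {
    /// True for the ✅ rows: these tensors must never leave the device.
    pub fn on_device_only(self) -> bool {
        !matches!(self, PrimitiveTag::RoomCentroid | PrimitiveTag::LoraWeightDelta)
    }
}

/// A tensor that carries its primitive tag alongside the data (assumed layout).
#[derive(Debug, Clone, PartialEq)]
pub struct TaggedTensor {
    pub tag: PrimitiveTag,
    pub data: Vec<f32>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SubmitError {
    OnDeviceOnly(PrimitiveTag),
}

/// Reject on-device-only tensors before anything is serialised.
/// On success, returns the payload size in bytes as a stand-in for real transport.
pub fn submit_delta(tensor: &TaggedTensor) -> Result<usize, SubmitError> {
    if tensor.tag.on_device_only() {
        return Err(SubmitError::OnDeviceOnly(tensor.tag));
    }
    Ok(tensor.data.len() * std::mem::size_of::<f32>())
}
```

A runtime check like this is the fallback; where the tag is known statically, encoding it in the type system would move the rejection to compile time.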
Layer 2 — Gradient clipping. During local training, each per-sample gradient is clipped to an L2 norm of at most C before it contributes to the LoRA weight delta (the standard DP-SGD step, Abadi 2016). This bounds the sensitivity of the released delta to any single training sample.
Recommended starting value: C = 1.0, tuned per cog by experiment; some cogs may need C ∈ [0.5, 2.0].
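A minimal sketch of per-sample clipping, assuming gradients are flat `f32` slices. The name `clip_gradient_l2` follows the helper listed in this ADR, but the signature and the `clipped_sum` helper are assumptions made for illustration.

```rust
/// Scale `grad` in place so that its L2 norm is at most `c`.
/// Returns the norm measured before clipping.
pub fn clip_gradient_l2(grad: &mut [f32], c: f32) -> f32 {
    let norm = grad.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > c {
        let scale = c / norm;
        for g in grad.iter_mut() {
            *g *= scale;
        }
    }
    norm
}

/// Clip every per-sample gradient, then sum them.
/// Adding or removing one sample changes the result by at most `c` in L2 norm.
pub fn clipped_sum(per_sample: &mut [Vec<f32>], c: f32) -> Vec<f32> {
    let dim = per_sample.first().map_or(0, |g| g.len());
    let mut sum = vec![0.0f32; dim];
    for grad in per_sample.iter_mut() {
        clip_gradient_l2(grad, c);
        for (s, g) in sum.iter_mut().zip(grad.iter()) {
            *s += *g;
        }
    }
    sum
}
```

Gradients already inside the ball of radius C are left untouched, so clipping only biases the update when a sample's gradient is unusually large.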
Layer 3 — Gaussian noise on aggregated deltas. Before transmission to the coordinator, Gaussian noise N(0, σ²C²I) is added to the aggregated LoRA delta. This bounds the per-round privacy leakage.
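The noise step can be sketched as below. To stay dependency-free, the example uses a small SplitMix64 generator with a Box-Muller transform. That generator is an assumption chosen for illustration only: it is not cryptographically secure and not constant-time, so a deployed mechanism would need a vetted CSPRNG and sampler instead. The `add_dp_noise` signature is likewise hypothetical.

```rust
/// Demonstration-only PRNG (SplitMix64). NOT cryptographically secure.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1], so that ln() below stays finite.
    pub fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    pub fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Add independent N(0, (sigma * c)^2) noise to every coordinate of `delta`.
pub fn add_dp_noise(delta: &mut [f32], sigma: f64, c: f64, rng: &mut SplitMix64) {
    let std_dev = sigma * c;
    for d in delta.iter_mut() {
        *d += (std_dev * rng.next_gaussian()) as f32;
    }
}
```

The noise standard deviation is σ·C, so raising the clipping norm without lowering σ raises the absolute noise added to each coordinate.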
Using the Moments Accountant (Abadi 2016) for (ε, δ)-DP across federation rounds:
| Configuration | Per-round σ | Rounds | Total ε (δ=1e-5) | Verdict |
|---|---|---|---|---|
| Conservative (medical-grade) | 1.5 | 50 | 2.0 | Strong; matches HIPAA-aligned recommendations |
| Standard (typical RuView) | 1.0 | 100 | 5.0 | Strong; consistent with Google's federated keyboard work |
| Lenient (faster convergence) | 0.5 | 100 | 8.0 | Moderate; below ε=10 community soft-bound |
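The budget-tracking behaviour (accumulate ε across rounds, refuse to continue once the budget is exceeded) can be illustrated with a deliberately simplified accountant. This sketch is not the Moments Accountant: it swaps in plain Rényi-DP composition of the Gaussian mechanism (Mironov 2017) and assumes no subsampling amplification. Its ε values are therefore far more pessimistic than the table above and are not a substitute for it. The type and method names are hypothetical.

```rust
/// Simplified accountant: Rényi-DP composition of the plain Gaussian mechanism.
/// Assumption: every round uses the same sigma and there is no subsampling.
pub struct RdpAccountant {
    sigma: f64,
    delta: f64,
    epsilon_budget: f64,
    rounds: u32,
}

#[derive(Debug, PartialEq)]
pub struct BudgetExhausted {
    /// Rounds already recorded when the request was refused.
    pub rounds: u32,
    /// The epsilon the refused round would have reached.
    pub epsilon: f64,
}

impl RdpAccountant {
    pub fn new(sigma: f64, delta: f64, epsilon_budget: f64) -> Self {
        Self { sigma, delta, epsilon_budget, rounds: 0 }
    }

    /// Epsilon after `rounds` compositions, minimised over a grid of Rényi orders.
    /// Uses RDP(alpha) = rounds * alpha / (2 sigma^2) and
    /// epsilon = RDP(alpha) + ln(1/delta) / (alpha - 1).
    pub fn epsilon_after(&self, rounds: u32) -> f64 {
        if rounds == 0 {
            return 0.0;
        }
        let log_inv_delta = (1.0 / self.delta).ln();
        (0..2000)
            .map(|i| 1.0 + 0.01 * (i as f64 + 1.0)) // alpha from 1.01 to 21.0
            .map(|alpha| {
                let rdp = rounds as f64 * alpha / (2.0 * self.sigma * self.sigma);
                rdp + log_inv_delta / (alpha - 1.0)
            })
            .fold(f64::INFINITY, f64::min)
    }

    /// Record one more round, or refuse if it would exceed the budget.
    pub fn spend_round(&mut self) -> Result<f64, BudgetExhausted> {
        let eps = self.epsilon_after(self.rounds + 1);
        if eps > self.epsilon_budget {
            return Err(BudgetExhausted { rounds: self.rounds, epsilon: eps });
        }
        self.rounds += 1;
        Ok(eps)
    }
}
```

The refusal happens before the round is counted, so a caller that receives `BudgetExhausted` has not spent any additional budget.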
Recommended starting σ = 1.0 for most RuView cogs, with per-cog tuning:
- cog-person-count (R8 — simple classifier): σ=1.0 sufficient.
- cog-pose-estimation (skeleton output): σ=1.0.
- cog-maritime-watch (R11): σ=1.5 (medical-grade — vessel crew vitals).

The DP-SGD layer slots in at step 4 of ADR-105's protocol summary:
- Delta compression. Compute ΔW_i = W_{T+1}^{(i)} − W_T. [NEW: clip individual-sample gradients to L2 norm C=1.0 during local training; add Gaussian noise N(0, σ²C²I) to ΔW_i with σ from per-cog table above.] Quantise to int8 + LoRA-rank decomposition (rank=8) → ~1 MB per delta.
Krum byzantine-robust aggregation (step 5) operates on DP-noised deltas without modification — Krum's distance metric is robust to additive Gaussian noise at typical σ values.
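For reference, Krum selection over a set of (possibly DP-noised) deltas can be sketched as below. This follows the published Krum rule (score each update by the summed squared distance to its n − f − 2 nearest neighbours, pick the lowest score). It is an illustration, not ADR-105's implementation, and `krum_select` is a hypothetical name. Nothing in the function depends on whether the inputs carry noise, which is why the aggregation step needs no change.

```rust
/// Return the index of the update chosen by Krum, tolerating up to `f` byzantine nodes.
/// Returns None when there are too few updates (Krum needs n >= 2f + 3).
pub fn krum_select(deltas: &[Vec<f32>], f: usize) -> Option<usize> {
    let n = deltas.len();
    if n < 2 * f + 3 {
        return None;
    }
    let k = n - f - 2; // number of nearest neighbours that enter each score
    let mut best: Option<(usize, f32)> = None;
    for i in 0..n {
        // Squared L2 distances from update i to every other update.
        let mut dists: Vec<f32> = (0..n)
            .filter(|&j| j != i)
            .map(|j| {
                deltas[i]
                    .iter()
                    .zip(&deltas[j])
                    .map(|(a, b)| (a - b) * (a - b))
                    .sum::<f32>()
            })
            .collect();
        dists.sort_by(|a, b| a.total_cmp(b));
        let score: f32 = dists[..k].iter().sum();
        if best.map_or(true, |(_, s)| score < s) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}
```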
The ruview-fed crate (per ADR-105 implementation plan, ~500 LOC) gains:
| Component | LOC | Purpose |
|---|---|---|
| PrimitiveTag enum + tensor tagging trait | 60 | Layer 1 primitive isolation |
| clip_gradient_l2(C) helper | 30 | Layer 2 clipping |
| add_dp_noise(sigma, C) helper | 40 | Layer 3 Gaussian noise |
| MomentsAccountant | 120 | (ε, δ) tracking across rounds; aborts federation if budget exceeded |
| Per-cog config schema | 50 | σ, C, max rounds budget |
Total ~300 additional LOC on top of ADR-105's 500. Federation protocol implementation budget revised to ~800 LOC total.
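One possible shape for the per-cog config and its start-of-federation validation is sketched below. The field names, error variants, and `starting_configs` are hypothetical. Treating C ∈ [0.5, 2.0] as a hard validation range, and pairing each cog with the round count of the matching row in the privacy-budget table, are assumptions made for the example. Parsing from YAML/TOML is omitted to keep the sketch dependency-free.

```rust
/// Hypothetical per-cog DP settings: sigma, clipping norm C, and a round budget.
#[derive(Debug, Clone, PartialEq)]
pub struct CogDpConfig {
    pub cog: &'static str,
    pub sigma: f64,
    pub clip_norm: f64,
    pub max_rounds: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    SigmaNotPositive,
    ClipNormOutOfRange,
    MaxRoundsZero,
}

impl CogDpConfig {
    /// Checks intended to run once, before the first federation round.
    pub fn validate(&self) -> Result<(), ConfigError> {
        if !(self.sigma.is_finite() && self.sigma > 0.0) {
            return Err(ConfigError::SigmaNotPositive);
        }
        // Assumed bound: the tuning range for C mentioned in this ADR.
        if !(0.5..=2.0).contains(&self.clip_norm) {
            return Err(ConfigError::ClipNormOutOfRange);
        }
        if self.max_rounds == 0 {
            return Err(ConfigError::MaxRoundsZero);
        }
        Ok(())
    }
}

/// Starting values taken from the per-cog sigma recommendations in this ADR;
/// the max_rounds pairing is an assumption.
pub fn starting_configs() -> Vec<CogDpConfig> {
    vec![
        CogDpConfig { cog: "cog-person-count", sigma: 1.0, clip_norm: 1.0, max_rounds: 100 },
        CogDpConfig { cog: "cog-pose-estimation", sigma: 1.0, clip_norm: 1.0, max_rounds: 100 },
        CogDpConfig { cog: "cog-maritime-watch", sigma: 1.5, clip_norm: 1.0, max_rounds: 50 },
    ]
}
```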
Status: rejected. ADR-105's Krum + LoRA + int8 quantisation provides some implicit privacy, but not a formal guarantee. Membership inference attacks (Shokri 2017) can reveal whether a given sample was used in training when FL is undefended. We need a formal (ε, δ)-DP bound.
Status: rejected. LDP would add noise per sample on the device, so the coordinator only ever receives noisy aggregates. This gives stronger guarantees but degrades model accuracy by 5-15× for the same ε. Central DP (CDP) with byzantine-robust aggregation is the right trade-off for our threat model, in which the coordinator is trusted to apply noise correctly (the coordinator is the cognitum-v0 fleet manager, under the installation owner's control per ADR-100 signing).
Status: deferred. Secure aggregation (Bonawitz 2016) ensures the coordinator never sees individual deltas, only their sum. This is the right next layer for cross-installation federation (which ADR-105 explicitly deferred). For within-installation federation, where the coordinator is owner-controlled, the gains don't justify the 5-10× compute and complexity cost.
Status: rejected. Krum defends against adversarial nodes, not adversarial inference. Even an honest but passive coordinator with moderate compute can extract training samples from undefended deltas. DP-SGD is the proper defence.
| Threat | Layer that mitigates |
|---|---|
| Compromised seed reads its own local biometric primitives | Out of scope — physical compromise = full local compromise |
| Compromised seed exfiltrates a biometric primitive via the federation channel | Layer 1 — primitive isolation API blocks transmission |
| Passive coordinator reconstructs training samples from observed deltas (Shokri 2017) | Layer 2 + 3 — DP-SGD bounds reconstruction quality |
| Membership inference attack on the trained model (Shokri 2017 §3.2) | Layer 2 + 3 — formal (ε, δ) bound |
| Coordinator + 1 colluding seed | Krum (ADR-105) still works; DP-SGD bounds the colluder's info gain |
| Brute-force gradient inversion (Zhu 2019) | Layer 2 + 3 — clipping + noise defeats gradient-from-update attack |
| Active adversary controlling >f Krum nodes | Out of scope — ADR-105 byzantine bound f < (K-2)/2 |
| Side-channel via inference latency | Out of scope — separate ADR (constant-time inference) |
- For cog-person-count v0.0.2 (this loop's earlier work), the baseline 34.3% class-1 accuracy would degrade to ~31-33% with σ=1.0.
- […] ruview-fed. Total federation budget revised to ~800 LOC.
- […] K_subjects × privacy_amplification — discussed in next-generation work.
- […] ruview_fed_privacy_budget (future tool; out of scope for this ADR).

| Step | LOC | Notes |
|---|---|---|
| 1. PrimitiveTag enum + tensor tagging | 60 | Compile-time enforcement where possible |
| 2. Gradient clipping helper | 30 | Per-sample (microbatch-friendly) |
| 3. Gaussian noise helper | 40 | Constant-time sampling (defends weak side-channel) |
| 4. Moments Accountant | 120 | Tracks (ε, δ) across rounds; emits budget-exhausted error |
| 5. Per-cog config schema (σ, C, max_rounds) | 50 | YAML/TOML, validated at federation start |
| 6. End-to-end privacy test | — | Synthetic membership-inference attack vs DP-protected model; verify reconstruction quality is bounded by (ε, δ) prediction |
Combined with ADR-105's 500 LOC, total federation budget revised to ~800 LOC, ~3-week effort.