ADR-121: BFLD Identity Risk Scoring and Coherence Gate

| Field | Value |
|---|---|
| Status | Proposed |
| Date | 2026-05-24 |
| Deciders | ruv |
| Parent | ADR-118 |
| Relates to | ADR-024 (AETHER), ADR-027 (MERIDIAN), ADR-029 (multistatic fusion), ADR-086 (novelty gate precedent), ADR-120 (privacy class) |
| Tracking issue | TBD |

1. Context

BFLD's distinguishing primitive is the identity_risk_score — a scalar that says "is this capture window currently capable of identifying a specific person?". The score has two consumers:

The operator: the score is exposed as an HA diagnostic sensor (ADR-122). A spike above the long-term baseline indicates that the RF environment has shifted toward a higher-leakage regime (new AP firmware, denser MIMO, an attacker-grade sniffer in range).
The privacy gate (ADR-120): when the score crosses a configurable threshold, the gate automatically downgrades the active privacy_class to a more restrictive level (e.g., 2 → 3) until the score recovers.

The score must be:

Bounded in [0, 1] for HA gauge entities.
Calibrated against actual re-ID success rate, ideally on the KIT BFId dataset.
Computable on-device at ≥ 1 Hz on a Pi 5 core or an aarch64 cognitum-v0.
Stable — small environmental changes should not produce wild swings; the score is for slow-moving regime detection, not per-frame chatter.

ADR-086 (edge novelty gate) establishes a precedent for an on-device gate primitive. BFLD's risk scoring borrows that gate pattern, with identity leakage as the trigger condition.

2. Decision

2.1 Nine features (from BFLD spec §5)

The features are computed over a sliding window of W = 32 BFI frames (≈3.2 s at 10 Hz):

| Feature | Definition | Source |
|---|---|---|
| `mean_angle_delta` | mean( ‖ Φ_t − Φ_{t-1} ‖ over subcarriers ) | extractor |
| `subcarrier_variance` | var( ‖ Φ ‖ over subcarrier axis ) | extractor |
| `temporal_entropy` | Shannon entropy of angle-bin histogram over W | extractor |
| `doppler_proxy` | FFT peak magnitude of mean-angle time series | features.rs |
| `path_stability` | 1 − ‖ Φ_t − median(Φ_{t-W..t}) ‖ / scale | features.rs |
| `cross_antenna_correlation` | mean Pearson correlation across n_tx × n_rx pairs | features.rs |
| `burst_motion_score` | high-pass-filtered angular velocity, soft-thresholded | features.rs |
| `stationarity_score` | 1 − rolling KL divergence over W/2 vs W | features.rs |
| `identity_separability_score` | top-1 cosine to nearest AETHER cluster centroid | identity_risk.rs |

The first eight are sensing features (also used by the presence/motion pipeline). Only the ninth depends on the AETHER embedding and therefore on identity_class >= 1.
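
A minimal sketch of two extractor features, mean_angle_delta and temporal_entropy, follows. It rests on assumptions the table does not fix: each BFI frame is held as per-subcarrier vectors of feedback angles in radians (the `Frame` type is hypothetical), ‖·‖ is the Euclidean norm, angle differences are wrapped into [−π, π), the per-pair delta is averaged over all consecutive pairs in the window, and the histogram uses equal-width bins over [0, 2π) with entropy measured in bits.

```rust
use std::f32::consts::PI;

/// Hypothetical frame layout: frame[subcarrier][angle_index], angles in radians.
pub type Frame = Vec<Vec<f32>>;

/// Wrap an angle difference into [-PI, PI) so 0.05 and 2*PI - 0.05 count as close.
fn wrap(d: f32) -> f32 {
    (d + PI).rem_euclid(2.0 * PI) - PI
}

/// mean_angle_delta: for each consecutive frame pair, the mean over subcarriers of
/// the Euclidean norm of the wrapped angle difference, then averaged over all pairs.
pub fn mean_angle_delta(window: &[Frame]) -> f32 {
    if window.len() < 2 {
        return 0.0;
    }
    let per_pair = window.windows(2).map(|pair| {
        let (prev, cur) = (&pair[0], &pair[1]);
        let n_sc = prev.len().min(cur.len());
        let total: f32 = (0..n_sc)
            .map(|k| {
                cur[k]
                    .iter()
                    .zip(&prev[k])
                    .map(|(a, b)| wrap(a - b).powi(2))
                    .sum::<f32>()
                    .sqrt()
            })
            .sum();
        total / n_sc.max(1) as f32
    });
    per_pair.sum::<f32>() / (window.len() - 1) as f32
}

/// temporal_entropy: Shannon entropy (bits) of a histogram of every angle in the
/// window, using `n_bins` equal-width bins over [0, 2*PI).
pub fn temporal_entropy(window: &[Frame], n_bins: usize) -> f32 {
    let n_bins = n_bins.max(1);
    let mut hist = vec![0u32; n_bins];
    let mut count = 0u32;
    for angle in window.iter().flatten().flatten() {
        let x = angle.rem_euclid(2.0 * PI) / (2.0 * PI); // fraction of a full turn
        let bin = ((x * n_bins as f32) as usize).min(n_bins - 1);
        hist[bin] += 1;
        count += 1;
    }
    if count == 0 {
        return 0.0;
    }
    hist.iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f32 / count as f32;
            -p * p.log2()
        })
        .sum()
}
```

For example, two frames whose angles move by (0.3, 0.4) rad on a single subcarrier give a mean_angle_delta of 0.5, and a window whose angles split evenly between two bins has a temporal_entropy of 1 bit.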

2.2 Identity risk formula

```rust
/// Identity risk score: multiplicative combination of four factors, each in [0, 1].
pub fn identity_risk_score(
    sep: f32,     // identity_separability_score, [0, 1]
    stab: f32,    // temporal_stability, [0, 1]: ema(path_stability, alpha = 0.1)
    consist: f32, // cross_perspective_consistency, [0, 1]: from multistatic.rs
    conf: f32,    // sample_confidence, [0, 1]: f(SNR, n_subcarriers, n_rx)
) -> f32 {
    // Clamp each input, then multiply: any factor near 0 pulls the product toward 0.
    // Note: f32::clamp passes NaN through, so a NaN input yields a NaN score.
    let s = sep.clamp(0.0, 1.0);
    let t = stab.clamp(0.0, 1.0);
    let p = consist.clamp(0.0, 1.0);
    let c = conf.clamp(0.0, 1.0);
    // A product of values in [0, 1] stays in [0, 1]; the final clamp is a safeguard.
    (s * t * p * c).clamp(0.0, 1.0)
}
```

Multiplicative combination is chosen so that any weak factor (e.g., very low SNR ⇒ low conf) collapses the score toward 0. This matches the privacy intent: when the system is uncertain, the score should be low and the operator should not be alarmed.
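
The `stab` argument is documented as ema(path_stability, alpha=0.1). A minimal sketch of that smoother follows; the `Ema` type is hypothetical, and seeding the average with the first sample is an assumption. The smoothing is what supports the Stable requirement in §1: after the first sample, a single outlying window moves `stab` by only 10% of the jump.

```rust
/// Exponential moving average, as used for the `stab` input (alpha = 0.1 in §2.2).
#[derive(Debug, Clone)]
pub struct Ema {
    alpha: f32,
    value: Option<f32>,
}

impl Ema {
    pub fn new(alpha: f32) -> Self {
        Self { alpha: alpha.clamp(0.0, 1.0), value: None }
    }

    /// Feed one path_stability sample and return the smoothed value.
    /// The first sample seeds the average directly (an assumption).
    pub fn update(&mut self, x: f32) -> f32 {
        let next = match self.value {
            None => x,
            Some(prev) => prev + self.alpha * (x - prev),
        };
        self.value = Some(next);
        next
    }

    pub fn value(&self) -> Option<f32> {
        self.value
    }
}
```

With alpha = 0.1, a step change in path_stability is about half absorbed after 7 updates (0.9^7 ≈ 0.48 of the gap remains).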

2.3 Calibration target

The score is calibrated against re-ID success rate on a held-out test split of the KIT BFId dataset. A piecewise-linear isotonic regression maps raw scores onto a calibrated [0, 1] scale on which a score ≥ 0.8 corresponds to >80% re-ID accuracy on a 5-second window in the calibration dataset.

Calibration parameters live in v2/crates/wifi-densepose-bfld/data/risk_calibration.toml and are versioned independently of the code. A regression update is a content-only PR.
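
Applying the calibration at runtime amounts to evaluating a monotone piecewise-linear function. The sketch below assumes the calibration file reduces to a list of (raw, calibrated) knot pairs; that layout, the `Calibration` type, and the choice to map NaN to the lowest calibrated value are illustrative assumptions, not the actual schema of risk_calibration.toml.

```rust
/// Piecewise-linear calibration map from raw to calibrated risk score.
/// The (raw, calibrated) knot layout is an assumed stand-in for the real file schema.
#[derive(Debug, Clone)]
pub struct Calibration {
    knots: Vec<(f32, f32)>,
}

impl Calibration {
    /// Accept knots only if raw values strictly increase, calibrated values never
    /// decrease (the isotonic property), and every value lies in [0, 1].
    pub fn new(knots: Vec<(f32, f32)>) -> Option<Self> {
        let in_unit = |v: f32| (0.0..=1.0).contains(&v);
        let sorted = knots.windows(2).all(|w| w[0].0 < w[1].0 && w[0].1 <= w[1].1);
        let bounded = knots.iter().all(|&(x, y)| in_unit(x) && in_unit(y));
        if knots.is_empty() || !sorted || !bounded {
            return None;
        }
        Some(Self { knots })
    }

    /// Map a raw score; inputs outside the knot range clamp to the end values.
    pub fn apply(&self, raw: f32) -> f32 {
        let k = &self.knots;
        let (first, last) = (k[0], k[k.len() - 1]);
        // NaN is treated as the lowest raw score, matching "low when unsure" (assumption).
        if raw.is_nan() || raw <= first.0 {
            return first.1;
        }
        if raw >= last.0 {
            return last.1;
        }
        // Index of the first knot strictly right of `raw`; always in 1..len here.
        let i = k.partition_point(|&(x, _)| x <= raw);
        let (x0, y0) = k[i - 1];
        let (x1, y1) = k[i];
        y0 + (y1 - y0) * (raw - x0) / (x1 - x0)
    }
}
```

Because the map never decreases, it preserves the ordering of raw scores, so the monotonicity required by AC2 carries over to the calibrated score.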

2.4 Coherence gate

The coherence gate (following the coherence_gate.rs pattern from ADR-029) consumes the risk score and emits one of four actions:

```rust
pub enum GateAction {
    Accept,      // score < 0.5: publish normally
    PredictOnly, // 0.5 <= score < 0.7: publish, but flag confidence
    Reject,      // 0.7 <= score < 0.9: drop the event
    Recalibrate, // score >= 0.9: drop the event AND rotate site_salt
}
```

The Recalibrate action triggers a forced site-salt rotation — an aggressive response to a sustained high-risk regime. It costs the operator continuity of long-term aggregate analytics but is the right answer to an attacker-grade sniffer arriving in range.
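
The band lookup and the effect of each action can be sketched as follows, before the hysteresis and debounce of §2.5 are applied. `Outbound`, `Site`, and `salt_epoch` are hypothetical stand-ins: the real event type and site_salt rotation are not specified here, so rotation is modelled as incrementing a counter.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateAction {
    Accept,
    PredictOnly,
    Reject,
    Recalibrate,
}

/// Band lookup from §2.4, with no hysteresis or debounce. NaN fails every
/// comparison and therefore falls through to Accept in this sketch.
pub fn band(score: f32) -> GateAction {
    if score >= 0.9 {
        GateAction::Recalibrate
    } else if score >= 0.7 {
        GateAction::Reject
    } else if score >= 0.5 {
        GateAction::PredictOnly
    } else {
        GateAction::Accept
    }
}

/// Hypothetical outbound event carrying only what is needed to show the effect.
#[derive(Debug, Clone, PartialEq)]
pub struct Outbound {
    pub payload: String,
    pub low_confidence: bool,
}

/// Hypothetical site state; `salt_epoch` stands in for the real site_salt.
#[derive(Debug, Default)]
pub struct Site {
    pub salt_epoch: u64,
}

/// Apply one action to one event and return what, if anything, is published.
pub fn apply(action: GateAction, payload: &str, site: &mut Site) -> Option<Outbound> {
    match action {
        GateAction::Accept => Some(Outbound { payload: payload.to_owned(), low_confidence: false }),
        GateAction::PredictOnly => Some(Outbound { payload: payload.to_owned(), low_confidence: true }),
        GateAction::Reject => None,
        GateAction::Recalibrate => {
            site.salt_epoch += 1; // stand-in for the forced site_salt rotation
            None
        }
    }
}
```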

2.5 Hysteresis

To prevent oscillation around the gate thresholds, the gate uses ±0.05 hysteresis and a 5-second debounce. A score must cross the boundary by the hysteresis margin and persist for the debounce window before the gate action changes.
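
One way to implement the hysteresis and debounce is a small state machine that tracks the committed action plus a pending candidate. This sketch reads the ±0.05 margin symmetrically (enter a higher band only at threshold + 0.05, leave it only below threshold − 0.05) and takes timestamps as an argument rather than reading a clock, so replays stay deterministic. Both choices are assumptions, and `CoherenceGate` is a hypothetical name. Under this reading, entering Recalibrate requires a sustained score of at least 0.95; if AC4's ≥ 0.9 trigger is meant literally, the margin would apply only when leaving a band.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateAction {
    Accept,
    PredictOnly,
    Reject,
    Recalibrate,
}

/// Actions in order of severity, and the lower score bound of each level above Accept.
const LEVELS: [GateAction; 4] = [
    GateAction::Accept,
    GateAction::PredictOnly,
    GateAction::Reject,
    GateAction::Recalibrate,
];
const THRESHOLDS: [f32; 3] = [0.5, 0.7, 0.9];
const HYSTERESIS: f32 = 0.05;
const DEBOUNCE_S: f64 = 5.0;

pub struct CoherenceGate {
    level: usize,                  // index into LEVELS of the committed action
    pending: Option<(usize, f64)>, // candidate level and when it first appeared (s)
}

impl CoherenceGate {
    pub fn new() -> Self {
        Self { level: 0, pending: None }
    }

    pub fn action(&self) -> GateAction {
        LEVELS[self.level]
    }

    /// Level the score points to, with the margin applied relative to the committed
    /// level: moving up past threshold t needs score >= t + HYSTERESIS, moving down
    /// below t needs score < t - HYSTERESIS. NaN fails both tests and keeps the level.
    fn candidate(&self, score: f32) -> usize {
        let mut lvl = self.level;
        while lvl < THRESHOLDS.len() && score >= THRESHOLDS[lvl] + HYSTERESIS {
            lvl += 1;
        }
        while lvl > 0 && score < THRESHOLDS[lvl - 1] - HYSTERESIS {
            lvl -= 1;
        }
        lvl
    }

    /// Feed one score with its timestamp in seconds; returns the committed action.
    /// A new candidate must persist for DEBOUNCE_S before it replaces the current one.
    pub fn update(&mut self, score: f32, now_s: f64) -> GateAction {
        let cand = self.candidate(score);
        if cand == self.level {
            self.pending = None;
        } else {
            match self.pending {
                Some((p, since)) if p == cand => {
                    if now_s - since >= DEBOUNCE_S {
                        self.level = cand;
                        self.pending = None;
                    }
                }
                _ => self.pending = Some((cand, now_s)),
            }
        }
        self.action()
    }
}
```

In this sketch, a score alternating between 0.52 and 0.48 never leaves Accept, and a score held at 0.6 switches to PredictOnly once it has persisted for 5 s.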

2.6 Compute budget

| Stage | Target latency | Implementation |
|---|---|---|
| Feature extraction (8 features) | < 3 ms per window | ndarray + nalgebra; vectorized over subcarriers |
| Separability (cosine to centroids) | < 5 ms per window | RuVector RaBitQ index (ADR-085) over ≤ 1k centroids |
| Risk score | < 0.1 ms | scalar multiplicative |
| Gate decision + hysteresis | < 0.1 ms | scalar |

Total p95 latency must stay ≤ 10 ms per window on one Pi 5 core, with an 8 ms target. Headroom on cognitum-v0 (Pi 5 + Hailo) is ample. An ESP32-S3 node runs only the extraction stage: it computes the features, and the risk score is computed host-side (ADR-123).
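
Both this budget and AC1 are stated as p95 latencies. A minimal measurement harness could look like the sketch below, assuming the nearest-rank definition of the percentile (other definitions give slightly different values on small samples); `percentile` and `p95_latency` are hypothetical helpers, not part of the crate.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile of latency samples, for p in (0, 100].
/// Sorts `samples` in place; returns None for an empty set or an invalid p.
pub fn percentile(samples: &mut [Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    samples.sort_unstable();
    let rank = (p / 100.0 * samples.len() as f64).ceil() as usize; // 1-based rank
    Some(samples[rank.clamp(1, samples.len()) - 1])
}

/// Run `stage` once per window index in 0..n, timing each call, and return the p95.
pub fn p95_latency<F: FnMut(usize)>(n: usize, mut stage: F) -> Option<Duration> {
    let mut samples = Vec::with_capacity(n);
    for i in 0..n {
        let start = Instant::now();
        stage(i);
        samples.push(start.elapsed());
    }
    percentile(&mut samples, 95.0)
}
```

A stage such as feature extraction would be passed as the closure and the result compared against its row in the table; timings from a debug build will overstate release-build latency.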

3. Consequences

Positive

The risk score becomes a first-class diagnostic surface for operators and a structural input to the privacy gate — both consumers from a single computation.
Multiplicative combination is conservative under uncertainty; the system is biased toward "report low risk when unsure", which is the right default.
Calibration is a content-only update — no recompile needed when the calibration file changes.
The recalibration gate action gives the system a self-healing response to a sniffer arrival without operator intervention.

Negative

Calibration requires the KIT BFId dataset; without it the score is uncalibrated and serves only as an internal trigger, not a publishable signal.
Multiplicative scoring can be dominated by sample_confidence, which is sensitive to channel conditions. A persistent low-SNR environment will keep the published score near 0 even when the underlying separability is high — an under-reporting failure mode that the documentation must call out.
The recalibrate action breaks historical hash continuity by design; an operator who wants long-term aggregates needs to know they will see a discontinuity on recalibrate events.

Neutral

The nine features overlap with those computed by the existing CSI pipeline: BFLD computes them on BFI, while the CSI pipeline computes them on CSI. The two can be fused via cross_perspective_consistency.

4. Alternatives Considered

Alt 1: Additive scoring (`(s + t + p + c) / 4`)

Rejected: a sample with high separability but very low confidence would still produce a moderate score, which over-reports risk in degraded RF conditions.

Alt 2: Maximum scoring (`max(s, t, p, c)`)

Rejected: over-reports risk because any single high factor pins the output, even if the others contradict it.
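
A worked example makes the two rejections concrete. The sketch computes the three candidate combinations for the same inputs; the function names are illustrative. With (sep, stab, consist, conf) = (0.95, 0.9, 0.9, 0.2), i.e. strong identity cues seen through poor sample confidence, the multiplicative rule gives ≈ 0.154, the additive rule 0.7375, and the maximum 0.95. Read against the §2.4 bands (ignoring calibration), that is Accept, Reject, and Recalibrate respectively.

```rust
/// The three combination rules compared in §2.2 and Alternatives 1 and 2.
/// Inputs are (sep, stab, consist, conf), already clamped to [0, 1].
pub fn multiplicative(f: [f32; 4]) -> f32 {
    f.iter().product()
}

pub fn additive(f: [f32; 4]) -> f32 {
    f.iter().sum::<f32>() / 4.0
}

/// Starting the fold at 0.0 is safe because inputs are non-negative.
pub fn maximum(f: [f32; 4]) -> f32 {
    f.iter().copied().fold(0.0, f32::max)
}
```

With (0.95, 0.1, 0.1, 0.1) the maximum still reports 0.95 even though three of the four factors contradict it, while the product falls below 0.001.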

Alt 3: Learned scoring (a small MLP)

Rejected for this ADR: introduces an opaque model whose output cannot be audited from first principles. The multiplicative formula is simple, conservative, and directly explainable to operators. A learned model is a future option once enough calibration data is in hand.

Alt 4: Per-feature thresholds instead of a continuous score

Rejected: a continuous score is needed for the HA gauge entity and for downstream calibration. Per-feature thresholds would force operators to interpret nine separate binary signals.

5. Acceptance Criteria

AC1: All nine features are computed in < 8 ms p95 per window on a Pi 5 core.
AC2: identity_risk_score is monotonic non-decreasing in any single input when the other three are held constant.
AC3: Calibration regression on the KIT BFId test split: score ≥ 0.8 corresponds to ≥ 80% re-ID accuracy ± 5%.
AC4: The coherence gate emits Recalibrate if the score stays ≥ 0.9 for ≥ 5 seconds.
AC5: Hysteresis prevents action oscillation across ± 0.05 of a threshold within a 5-second window.
AC6: At privacy_class = 3, the risk score is computed but not published to MQTT (kept local for the gate only).
AC7: A reproducible 1,000-frame synthetic fixture produces a deterministic score sequence (bit-identical across runs).
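
AC2 and AC7 lend themselves to automated checks. The sketch below generates a synthetic fixture from a seeded xorshift32 generator (a hypothetical stand-in for whatever fixture the test suite actually uses), records each score as its raw bit pattern so runs can be compared exactly, and sweeps each input of the §2.2 formula with the other three held fixed. Bit-identical here means repeatable within one build on one target; the formula uses only clamping and f32 multiplication, which IEEE 754 rounds deterministically.

```rust
/// Risk score from §2.2 (multiplicative combination of clamped inputs).
pub fn identity_risk_score(sep: f32, stab: f32, consist: f32, conf: f32) -> f32 {
    let (s, t) = (sep.clamp(0.0, 1.0), stab.clamp(0.0, 1.0));
    let (p, c) = (consist.clamp(0.0, 1.0), conf.clamp(0.0, 1.0));
    (s * t * p * c).clamp(0.0, 1.0)
}

/// Hypothetical fixture generator: xorshift32 with a fixed seed.
struct XorShift32(u32);

impl XorShift32 {
    /// Next value, uniform in [0, 1), built from the top 24 bits of the state.
    fn next_unit(&mut self) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        (x >> 8) as f32 / (1u32 << 24) as f32
    }
}

/// AC7: scores for `n` synthetic frames, as raw bit patterns for exact comparison.
pub fn fixture_scores(seed: u32, n: usize) -> Vec<u32> {
    let mut rng = XorShift32(seed.max(1)); // a zero state would stay zero forever
    (0..n)
        .map(|_| {
            let x = [rng.next_unit(), rng.next_unit(), rng.next_unit(), rng.next_unit()];
            identity_risk_score(x[0], x[1], x[2], x[3]).to_bits()
        })
        .collect()
}

/// AC2 spot check: sweep each input over a grid with the other three held at fixed
/// values and confirm the score never decreases.
pub fn monotone_in_each_input(steps: usize) -> bool {
    let steps = steps.max(1);
    let base = [0.3f32, 0.6, 0.9, 0.5];
    (0..4).all(|k| {
        let scores: Vec<f32> = (0..=steps)
            .map(|i| {
                let mut x = base;
                x[k] = i as f32 / steps as f32;
                identity_risk_score(x[0], x[1], x[2], x[3])
            })
            .collect();
        scores.windows(2).all(|w| w[0] <= w[1])
    })
}
```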

6. References

ADR-118 (umbrella)
ADR-024 (AETHER encoder for separability)
ADR-029 (coherence_gate.rs precedent)
ADR-086 (edge novelty gate pattern)
ADR-120 §2.4 (class transition consumed by gate)
KIT BFId dataset: https://publikationen.bibliothek.kit.edu/1000185756
