docs/research/sota-2026-05-22/R15-rf-biometric-primitives.md
Status: synthesis + privacy framing · 2026-05-22
R3 asked "can we re-identify the same person across two rooms?" and answered yes, conditional on MERIDIAN env-subtraction. R15 asks the deeper question: what features in the CSI signal are environment-invariant by construction — properties of the person's physiology that exist independent of multipath geometry?
If R3 is "the same vector appears in two embedding spaces", R15 is "what physical attribute of the body actually drives that vector". Without R15, R3 is statistical pattern-matching with no theory of why it works.
This thread catalogues five biometric primitives that survive cross-environment transfer, ranks them by invariance + discriminability + measurement difficulty, and frames the privacy implications.
Physical basis: stride frequency is determined by leg length, mass distribution, and gait pattern (asymmetry coefficient). An individual's stride frequency is reproducible to within ~3-5% over a year (Murray 1964); across years it drifts with fitness and age. It is invariant to environment.
Discriminability: ~5-7 bits per person (Begg 2006, gait literature consensus). Enough to separate ~30-100 individuals before false-match probability exceeds 1%.
Measurement difficulty: R10's gait-band DSP (0.5-15 Hz) already extracts this. Stride frequency is robust to multipath; stride asymmetry needs higher SNR because it depends on gait phase shape, not just rate.
Cross-room invariance: HIGH. The carrier of the gait signature is the Doppler shift induced by leg motion; the magnitude depends on environment (Fresnel envelope, R6) but the frequency doesn't.
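The frequency-versus-magnitude argument can be illustrated with a toy model. The sketch below is not R10's gait-band DSP: `dominant_frequency` and `synth_gait` are hypothetical helpers, the "room" is reduced to a single scalar gain on a pure sinusoid, and the 50 Hz sample rate and 1.8 Hz stride are made-up values. It only shows that a change in magnitude leaves the location of the spectral peak inside the 0.5-15 Hz band unchanged.

```python
import math

def dominant_frequency(samples, sample_rate_hz, f_lo=0.5, f_hi=15.0, step_hz=0.05):
    """Scan a frequency grid in [f_lo, f_hi] and return the strongest frequency (Hz)."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # drop the DC component
    best_f, best_power = f_lo, -1.0
    steps = int(round((f_hi - f_lo) / step_hz))
    for k in range(steps + 1):
        f = f_lo + k * step_hz
        w = 2.0 * math.pi * f / sample_rate_hz
        # Single-bin DFT evaluated at frequency f.
        re = sum(x * math.cos(w * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(w * i) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f

def synth_gait(stride_hz, gain, sample_rate_hz=50.0, seconds=10.0):
    """Toy gait signal: one sinusoid whose amplitude stands in for the environment."""
    n = int(sample_rate_hz * seconds)
    return [gain * math.sin(2.0 * math.pi * stride_hz * i / sample_rate_hz) for i in range(n)]

# Same (assumed) stride frequency, two different environment gains.
room_a = synth_gait(1.8, gain=1.0)
room_b = synth_gait(1.8, gain=0.25)
f_a = dominant_frequency(room_a, 50.0)
f_b = dominant_frequency(room_b, 50.0)
print(f_a, f_b)
```

Real CSI is noisier and frequency-selective, so this is a sanity check on the reasoning rather than evidence for the HIGH rating.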
Physical basis: resting respiration rate is a person-specific physiological setpoint (12-20 BPM normal range, individual ±2 BPM). The tidal-volume envelope (chest expansion amplitude) scales with lung capacity, which scales with body size and age. Invariant to environment at the rate level.
Discriminability: ~3-4 bits at the rate level alone. Combined with envelope amplitude it could reach 5-6 bits. The combined signal also has phase information (inhale/exhale ratio, breathing irregularity) that adds another 1-2 bits.
Measurement difficulty: the vital_signs pipeline already extracts breathing rate. Envelope amplitude is noisier and needs ~10× more averaging.
Cross-room invariance: HIGH. Same reasoning as gait — temporal frequency is invariant, only amplitude is environment-dependent.
Physical basis: HRV is a person-specific autonomic-nervous-system signature. Resting HRV varies ±15-30 ms between individuals; under stress it changes predictably per person.
Discriminability: ~4-5 bits per person (Hjortskov 2004, HRV literature). The full HRV time-series adds another 2-3 bits over the summary statistics.
Measurement difficulty: R13's NEGATIVE physics scrutiny showed that waveform-shape HR recovery from CSI is 5 dB short of the floor. Rate-level HRV (R-R interval variability) is achievable; contour-shape HRV (which gives the autonomic signature) is not.
Cross-room invariance: HIGH at rate level, LOW at contour level. The achievable subset is rate-level HRV, which is real but offers lower discriminability than published claims that assume contour recovery.
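For concreteness, rate-level HRV can be summarised with standard time-domain statistics over R-R intervals. The sketch below assumes the intervals have already been recovered (in milliseconds) by some upstream step; `rate_level_hrv` is a hypothetical helper, not part of the vital_signs pipeline, and the sample intervals are made up. SDNN is computed here as the sample standard deviation, and RMSSD as the root mean square of successive differences.

```python
import math
import statistics

def rate_level_hrv(rr_ms):
    """Summary HRV statistics from a list of R-R intervals in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three R-R intervals")
    # Successive differences between neighbouring beats.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_rr_ms": statistics.fmean(rr_ms),
        "sdnn_ms": statistics.stdev(rr_ms),  # sample standard deviation
        "rmssd_ms": math.sqrt(statistics.fmean([d * d for d in diffs])),
    }

# Illustrative intervals only; not measured data.
hrv = rate_level_hrv([800.0, 810.0, 790.0, 805.0, 795.0])
print(hrv)
```

None of these statistics need the beat waveform shape, which is why they fall in the achievable subset.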
Physical basis: the radar cross-section (RCS) of a stationary human at WiFi frequencies is roughly proportional to body surface area (~0.6 m² for an adult, ~0.2 m² for a small child). The frequency-dependent RCS shape encodes body size and body composition (fat/muscle/water ratios affect dielectric properties).
Discriminability: ~3-5 bits per person. Lower than gait or HRV because it's gross-body-only.
Measurement difficulty: Needs calibration against a known reference target in the same environment. Cross-room calibration is a research problem.
Cross-room invariance: MEDIUM. Absolute RCS depends on environment (Fresnel envelope, R6); but the ratio of RCS at different subcarrier frequencies (the frequency response of the body) is environment-invariant by R6's forward model.
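The ratio argument can be sketched under a strong simplifying assumption: that the environment contributes one scalar gain common to all subcarriers. R6's forward model is not reproduced here, and real multipath is frequency-selective, so this is only an illustration of why normalising removes a common factor. `frequency_response_shape` is a hypothetical helper and the per-subcarrier values are invented.

```python
def frequency_response_shape(rcs_by_subcarrier):
    """Divide each subcarrier's RCS estimate by the mean so a common gain cancels."""
    mean = sum(rcs_by_subcarrier) / len(rcs_by_subcarrier)
    if mean <= 0:
        raise ValueError("mean RCS must be positive")
    return [v / mean for v in rcs_by_subcarrier]

# Invented body response across four subcarriers (arbitrary units).
body = [0.50, 0.55, 0.62, 0.58]

# Assumption: each room applies a single scalar gain across all subcarriers.
room_a = [1.00 * v for v in body]
room_b = [0.25 * v for v in body]

shape_a = frequency_response_shape(room_a)
shape_b = frequency_response_shape(room_b)
max_diff = max(abs(a - b) for a, b in zip(shape_a, shape_b))
print(shape_a, max_diff)
```

If the environment gain varies across subcarriers, the shapes no longer match exactly, which is consistent with the MEDIUM rather than HIGH rating.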
Physical basis: per-individual stride length, step-time asymmetry, hip-sway pattern. These are determined by skeletal proportions + neuromuscular control. Highly invariant to environment.
Discriminability: 6-9 bits per person when full dynamics are recovered (Cunado 2003, biometric-gait literature). Among the highest-discriminability biometrics short of fingerprint.
Measurement difficulty: Requires recovering the pose (limb positions) from CSI, not just the gait rate. The full pose-from-CSI pipeline (ADR-079, ADR-101) reaches ~92.9% PCK@20, which is good enough to extract limb timing in clean conditions.
Cross-room invariance: HIGH when pose is recovered correctly. The pose extractor itself uses MERIDIAN (R3) for cross-room transfer; if the pose pipeline works cross-room, so does the gait dynamics biometric.
Combining all five under an assumption of statistical independence gives only a soft upper bound, because the assumption does not hold (gait correlates with body size, HRV correlates with age, and so on):
| Primitive | Bits (cross-room achievable) |
|---|---|
| Gait stride frequency | 5 |
| Breathing rate + envelope | 5 |
| HRV (rate-level only) | 4 |
| Body-size RCS frequency response | 4 |
| Walking dynamics (limb timing) | 7 |
| Composite (statistically independent upper bound) | 25 bits |
| Composite (realistic correlation correction) | ~12-15 bits |
12-15 bits of biometric information is enough to uniquely identify a person within a population of ~4k-33k (2^12 = 4,096 to 2^15 = 32,768). For a household of 4 people, that's overwhelming discrimination. For a building of 1000 people, easily sufficient. For city-scale surveillance, it would need to combine with other modalities, but the primitive is already there.
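The arithmetic behind the composite figures is short enough to check directly. The sketch below sums the per-primitive bits from the table and converts a bit budget into the number of distinguishable identities, treating b bits as 2^b equally likely classes. That is an idealisation, and the 12-15 bit correlation-corrected range is taken from the table rather than derived.

```python
# Per-primitive cross-room bits, copied from the table above.
bits = {
    "gait_stride_frequency": 5,
    "breathing_rate_envelope": 5,
    "hrv_rate_level": 4,
    "rcs_frequency_response": 4,
    "walking_dynamics": 7,
}

# Bits add only if the primitives are statistically independent.
independent_upper_bound = sum(bits.values())

def population_size(n_bits):
    """Number of identities separable with n_bits, assuming equally likely classes."""
    return 2 ** n_bits

realistic_low = population_size(12)
realistic_high = population_size(15)
print(independent_upper_bound, realistic_low, realistic_high)
```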
This is the part R14 + R3 hinted at but didn't fully spell out:
RF biometric is harder to remove than visual biometric. A face can be obscured with a mask. A fingerprint can be left at home. A gait + breathing + RCS signature is emitted continuously, without subject awareness, through walls.
Specifically:
These constraints take the R3 + ADR-105 framework and push it harder:
| R3 / ADR-105 constraint | R15-strengthened version |
|---|---|
| No cross-installation linkage | Hardware-isolated embedding spaces, with cryptographic proof of isolation |
| Embedding storage requires opt-in | Storage of any RF-biometric-derivable signature requires opt-in, not just the final embedding |
| Cryptographically verifiable forgetting | Forget the raw extracted biometric primitives (gait freq, breath rate, RCS curve) — not just the model output |
| No re-ID across legal entities | No sharing of any RF biometric primitive across legal entities, including aggregate / derived versions |
The federation protocol (ADR-105) needs an additional constraint:
The federation aggregator MUST NOT receive any raw per-subject biometric primitive (gait frequency, breath rate, RCS curve, limb timing). It MAY receive aggregated, MERIDIAN-normalised embedding deltas. Per-subject primitives stay on-device.
This is stronger than ADR-105's existing "data stays on-device" because MERIDIAN deltas are not "data" in the conventional sense — they're learned model parameters. But the learned parameters encode biometric features. R15 says: encode them as you must, but the measurement of the underlying biometric must never leave the device.
Concretely: the Cognitum Seed runs extract_gait_freq(csi_window) locally, produces a 5-bit signature, uses it in inference, and does not send the signature to the coordinator. The coordinator sees only the model delta that influenced inference outcomes.
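One way to make the on-device-only rule mechanical is an outbound guard in front of the coordinator channel. The sketch below is an assumption about how that might look, not the ADR-105 implementation: the key names, `PrimitiveLeakError`, and `build_coordinator_payload` are all hypothetical. It refuses any payload that carries a per-subject primitive and otherwise forwards only allowlisted fields.

```python
# Hypothetical key names for the primitives that must stay on-device.
ON_DEVICE_ONLY = frozenset({"gait_frequency", "breath_rate", "rcs_curve", "limb_timing"})
# Hypothetical allowlist: only the model delta may leave the device.
ALLOWED_OUTBOUND = frozenset({"model_delta"})

class PrimitiveLeakError(ValueError):
    """Raised when an outbound payload contains a per-subject primitive."""

def build_coordinator_payload(outbound):
    """Return the allowlisted subset of outbound, or raise if a primitive is present."""
    leaked = sorted(ON_DEVICE_ONLY.intersection(outbound))
    if leaked:
        raise PrimitiveLeakError("on-device-only fields in payload: " + ", ".join(leaked))
    return {k: v for k, v in outbound.items() if k in ALLOWED_OUTBOUND}

# A payload holding only a model delta passes through unchanged.
payload = build_coordinator_payload({"model_delta": [0.1, -0.2]})
print(payload)
```

Failing loudly on a forbidden key, rather than silently dropping it, surfaces the bug in whatever code tried to export the primitive. The allowlist then handles fields nobody anticipated.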
This adds a constraint to the ADR-105 implementation. ADR-106 (next ADR after the deferred DP-SGD) should formalise the on-device-only primitive list.
This is the loop's final research thread before the deferred follow-up items begin. After R15:
Closed: the question "what RF biometrics exist and how do they invariantise" has a worked answer.
Open: ADR-106 (on-device DP-SGD + primitive isolation), R6.1 (multi-scatterer extension), R3 follow-up (physics-informed env_sig prediction), R6.2 (Fresnel-aware antenna placement).
Together with the 12 prior threads, R15 makes the per-occupant feature surface (R14 V1/V2/V3) fully grounded in physics and constraints, with no remaining unspecified primitives. The remaining work is implementation + measurement, not research.