docs/adr/ADR-103-learned-multi-person-counter.md
PR #491 stopped the bleeding on #499. The fix replaced hard-coded denominators (variance/300, motion_band_power/250, spectral_power/500) with a self-calibrating RollingP95 streaming estimator and exposed the multi-node dedup_factor as a runtime knob. Day-0 deployments no longer collapse dynamic range, and operators can auto-tune the divisor from a known person count.
That gets us to a stable heuristic that adapts to the room. It does not get us to the published WiFi-CSI counting state of the art:
| System | Setup | Reported accuracy | Method |
|---|---|---|---|
| WiCount (CMU, 2017) | Intel 5300 3×3 MIMO | 89% within ±1 | LSTM over CSI amplitude |
| DeepCount (2018) | Atheros 3×3 | 92% within ±1, 5-room | CNN + cross-environment transfer |
| CrossCount (2019) | Atheros, 6 rooms | 84% cross-room within ±1 | Domain-adversarial CNN |
| HeadCount (2021) | Intel 5300 | <1 person MAE, 5 envs | Multi-stream CSI + attention |
| RuView today (PR #491) | ESP32-S3 1×1 SISO | Calibrated heuristic; not measured against ground truth | RollingP95 + dedup_factor |
The literature uses 3×3 MIMO research NICs; RuView uses 1×1 SISO ESP32-S3 nodes. The published numbers are therefore not directly attainable. However, the architectural gap is large enough that we expect a learned counter on our hardware to comfortably beat today's slot heuristic. The infrastructure to train one already exists in this repo: Candle on an RTX 5080 trained `pose_v1.safetensors` in 2.1 s yesterday (see `docs/benchmarks/pose-estimation-cog.md`).
Five primitives we already have but don't yet compose into a counter:
- `scripts/collect-ground-truth.py` + `scripts/align-ground-truth.js` (PR #641 streaming-safe). 1,077 samples currently; #645 tracks the path to ~30K.
- `ruvector-mincut` (already a workspace dep). The Candle trainer used it yesterday and reported `Min-cut value: 0.1538 — partition: [55, 1] subcarriers`.
- `ruvnet/wifi-densepose-pretrained` on HF (12.2M training steps, 60K frames, 128-dim embeddings, ~165k emb/s on M4 Pro).
- `multistatic_bridge.rs` already aggregates per-node feature vectors with the tunable `dedup_factor`. The new model output can be a drop-in replacement for the existing dedup divisor.

Train and ship a small learned multi-person counter as a new Cognitum Cog (`cog-person-count`), modelled on the same packaging path as `cog-pose-estimation` (ADR-101). Wire it into the sensing-server's existing person-count call site (`csi.rs::score_to_person_count`) as a drop-in replacement for the slot heuristic.
```
                        ┌──────────────────────────────┐
per-node CSI window     │ Encoder (frozen first 50 ep) │
[56 sub × 20 frames] ─► │ init from ruvnet/wifi-       │
                        │ densepose-pretrained         │
                        │ → 128-dim embedding          │
                        └──────────────┬───────────────┘
                                       │
                      ┌────────────────┴────────────────┐
                      ▼                                 ▼
            ┌────────────────────┐          ┌────────────────────────┐
            │ Count head         │          │ Confidence head        │
            │ Linear(128→64)     │          │ Linear(128→32)         │
            │ ReLU               │          │ ReLU                   │
            │ Linear(64→8)       │          │ Linear(32→1) + sigmoid │
            │ → softmax over     │          │ → calibrated p(correct)│
            │ {0..7} persons     │          └────────────────────────┘
            └────────┬───────────┘
                     │ (per-node prediction)
                     │
N nodes' per-node    │
counts + confidences ▼
  ┌─────────────────────────────────────┐
  │ Multi-node fusion (Stoer-Wagner)    │
  │ • build graph: nodes × subcarrier   │
  │   feature similarity                │
  │ • min-cut → distinct-person bound   │
  │ • combine with per-node count head  │
  │   via confidence-weighted vote      │
  └──────────────────┬──────────────────┘
                     ▼
                   { count: int,
                     confidence: float [0,1],
                     count_p95_low: int,
                     count_p95_high: int,
                     per_node_breakdown: [...] }
```
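The two heads in the diagram are small MLPs on top of the 128-dim embedding. Below is a minimal, dependency-free sketch of their forward pass. It assumes plain nested `Vec` weights; the real `inference.rs` is described as a Candle forward pass loading `count_v1.safetensors`. The type names `Linear` and `Heads` are hypothetical, and the encoder is omitted.

```rust
/// Dense layer y = W·x + b; `w` holds one row of weights per output unit.
pub struct Linear {
    pub w: Vec<Vec<f32>>,
    pub b: Vec<f32>,
}

impl Linear {
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.w
            .iter()
            .zip(&self.b)
            .map(|(row, bias)| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>() + bias)
            .collect()
    }
}

fn relu(v: Vec<f32>) -> Vec<f32> {
    v.into_iter().map(|x| x.max(0.0)).collect()
}

/// Numerically stable softmax: subtract the largest logit before exponentiating.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    let m = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|z| (z - m).exp()).collect();
    let total: f32 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

/// Count head:      Linear(128→64) → ReLU → Linear(64→8) → softmax over {0..7}.
/// Confidence head: Linear(128→32) → ReLU → Linear(32→1) → sigmoid.
pub struct Heads {
    pub count_hidden: Linear,
    pub count_out: Linear,
    pub conf_hidden: Linear,
    pub conf_out: Linear,
}

impl Heads {
    /// Returns (p(count = k) for k in 0..8, confidence in (0, 1)).
    pub fn forward(&self, embedding: &[f32]) -> (Vec<f32>, f32) {
        let count_logits = self.count_out.forward(&relu(self.count_hidden.forward(embedding)));
        let conf_logit = self.conf_out.forward(&relu(self.conf_hidden.forward(embedding)))[0];
        (softmax(&count_logits), sigmoid(conf_logit))
    }
}
```

With all-zero weights, the count head returns a uniform 1/8 over `{0..7}` and the confidence head returns 0.5. This is a handy sanity check before loading trained weights.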
Five things to call out about this architecture:
- Classification over `{0..7}` people, not regression to a real number. Counts are integer-valued; classification gives a calibrated probability per count and lets the confidence head produce a meaningful uncertainty.
- The output is `{count, confidence, count_p95_low, count_p95_high}`, not a single integer. Downstream consumers (Cogs / dashboard / alerts) can choose their certainty threshold. This is what closes the loop on the #499 UX: when the model is uncertain, the dashboard renders one stick figure with a "?" badge rather than two ghosts.

Training reuses the exact pipeline that produced `pose_v1.safetensors` yesterday. Differences:
| | Pose cog (today) | Count cog (this ADR) |
|---|---|---|
| Input | `[56, 20]` CSI window | `[56, 20]` CSI window (identical) |
| Encoder init | random (HF arch mismatch) | from HF presence model (architectures are compatible: same encoder Φ) |
| Output head | Linear(128 → 256 → 34) keypoints | Linear(128 → 64 → 8) count classes + Linear(128 → 32 → 1) confidence |
| Loss | Confidence-weighted SmoothL1 | Categorical cross-entropy + Brier-score uncertainty calibration |
| Labels | MediaPipe keypoints | Camera count (length of MediaPipe `pose_landmarks`) |
| Data | 1,077 paired (P7) | Same source, same script: `collect-ground-truth.py` already records `n_persons` per frame |
Crucially, we get the count labels for free from the existing pose data-collection pipeline. `collect-ground-truth.py` already records `"n_persons"` per camera frame, and `align-ground-truth.js` already preserves it through windowing. No new data-collection campaign is required to bootstrap; we can train tomorrow on the same 1,077 samples that produced `pose_v1`.
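Turning the recorded `n_persons` into training targets and a loss takes only a few lines. Below is a minimal sketch built on our own assumptions:

- Counts above 7 are clamped into the top class.
- Cross-entropy and the Brier term are weighted equally.
- The Brier term scores the confidence head against whether the argmax was correct.
- Each sample carries a per-frame MediaPipe confidence (`label_conf`, a hypothetical field name). It is used both for the ≥ 0.7 filter and as a loss weight, mirroring the label-noise mitigation this ADR proposes.

```rust
/// Count classes {0..7}.
pub const NUM_CLASSES: usize = 8;

/// One training example (field names are illustrative, not the repo's schema).
pub struct Sample {
    pub probs: Vec<f64>, // count-head softmax output, len NUM_CLASSES
    pub confidence: f64, // confidence-head output in [0, 1]
    pub n_persons: u32,  // camera count recorded by collect-ground-truth.py
    pub label_conf: f64, // MediaPipe detection confidence for this frame (assumed available)
}

/// Map a camera count to a class index; counts above 7 go to the top class.
pub fn label_from_n_persons(n_persons: u32) -> usize {
    (n_persons as usize).min(NUM_CLASSES - 1)
}

fn argmax(v: &[f64]) -> usize {
    // On ties, `max_by` returns the last maximal element.
    v.iter().enumerate().max_by(|a, b| a.1.total_cmp(b.1)).map(|(k, _)| k).unwrap()
}

/// Cross-entropy on the count head plus the Brier score of the confidence
/// head against "was the argmax prediction correct?".
pub fn sample_loss(probs: &[f64], confidence: f64, label: usize) -> f64 {
    let ce = -probs[label].max(1e-12).ln(); // floor avoids ln(0)
    let correct = if argmax(probs) == label { 1.0 } else { 0.0 };
    ce + (confidence - correct).powi(2)
}

/// Confidence-weighted mean loss over samples whose label confidence passes
/// the filter; `None` if nothing survives.
pub fn batch_loss(batch: &[Sample], min_label_conf: f64) -> Option<f64> {
    let kept: Vec<&Sample> = batch.iter().filter(|s| s.label_conf >= min_label_conf).collect();
    let total_weight: f64 = kept.iter().map(|s| s.label_conf).sum();
    if total_weight <= 0.0 {
        return None;
    }
    let weighted: f64 = kept
        .iter()
        .map(|s| s.label_conf * sample_loss(&s.probs, s.confidence, label_from_n_persons(s.n_persons)))
        .sum();
    Some(weighted / total_weight)
}
```

A perfect, fully confident prediction scores zero. A sample whose MediaPipe confidence falls below the threshold contributes nothing, however wrong its label is.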
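Per node, the count head emits a categorical distribution p_n(k) over `{0..7}` and the confidence head emits a scalar c_n. With N nodes, we have N such distributions plus N confidence scalars. Two fusion paths:

- Confidence-weighted log-sum: log p_fused(k) = Σ_n c_n · log p_n(k), re-normalised over k. It is simple and adds no extra parameters. This is a weighted log-linear opinion pool (a weighted product of experts), a standard way to combine expert distributions.
- Min-cut clipper: the Stoer-Wagner min-cut over the node × subcarrier feature-similarity graph bounds the number of distinct persons. The fused distribution is clipped to `{0..min-cut}` before re-normalising. This is exactly what `ruvector-mincut` was added to the workspace for; it has been waiting for a counting consumer.

Both fuse cleanly. v0.1.0 ships the log-sum; v0.2.0 adds the min-cut clipper after the first round of evaluation.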
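The v0.1.0 log-sum and the v0.2.0 clipper compose into one small function. Below is a minimal sketch. It assumes each node supplies its count distribution and confidence as plain `f64`s, and that the min-cut bound has already been computed; here the bound is just an `Option<usize>` argument, and the `ruvector-mincut` API is not shown. The name `fuse` is hypothetical.

```rust
/// Confidence-weighted log-linear pool over per-node count distributions:
///   log p_fused(k) = Σ_n c_n · log p_n(k)   (then re-normalised over k).
/// If `bound` is Some(b), counts above b get zero mass (the min-cut clipper).
pub fn fuse(per_node: &[(Vec<f64>, f64)], bound: Option<usize>) -> Vec<f64> {
    assert!(!per_node.is_empty(), "need at least one node");
    let classes = per_node[0].0.len();
    let mut log_p = vec![0.0_f64; classes];
    for (probs, c) in per_node {
        for (acc, p) in log_p.iter_mut().zip(probs) {
            *acc += c * p.max(1e-12).ln(); // floor keeps ln() finite
        }
    }
    if let Some(b) = bound {
        for acc in log_p.iter_mut().skip(b + 1) {
            *acc = f64::NEG_INFINITY;
        }
    }
    // Log-sum-exp normalisation; index 0 is never clipped, so `max` is finite.
    let max = log_p.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = log_p.iter().map(|l| (l - max).exp()).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}
```

The confidences are not normalised to sum to 1. Several confident, agreeing nodes therefore sharpen the fused distribution, and a zero-confidence node has no effect. A single node with c = 1 passes through unchanged, which is the behaviour a single-node degradation test would check.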
| Failure mode of today's slot heuristic | How the learned counter avoids it |
|---|---|
| #499: fixed denominators clamp, so one person renders as 2+ groups | Encoder produces a fixed-dim embedding; the count head is invariant to feature magnitude and sensitive only to feature shape |
| `dedup_factor` per-room tuning is operator-visible toil | Count head's softmax is a learned per-room normaliser by construction |
| Adding nodes makes the count noisier under the slot heuristic | Multi-node fusion is additive in confidence, so each node either reduces uncertainty or stays neutral; it never amplifies it |
| No per-frame uncertainty signal | `confidence` + `count_p95_low/high` exposed in every emit |
| Catastrophic failure on novel environments | LoRA per-room adapter (per ADR-079 P9 plan), hot-swappable without retraining |
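The per-frame uncertainty fields can be derived directly from the count head's categorical distribution. The ADR does not pin down their definition. This sketch assumes `count_p95_low/high` are the 2.5% and 97.5% quantiles (a central 95% interval) and that `count` is the most probable class. The struct and function names are hypothetical.

```rust
/// Per-frame summary of the count head's distribution. `confidence` comes from
/// the separate confidence head and is left out of this sketch.
#[derive(Debug, PartialEq)]
pub struct CountSummary {
    pub count: usize,
    pub count_p95_low: usize,
    pub count_p95_high: usize,
}

/// Smallest k whose cumulative probability reaches `q`.
fn quantile(probs: &[f64], q: f64) -> usize {
    let mut cdf = 0.0_f64;
    for (k, p) in probs.iter().enumerate() {
        cdf += p;
        if cdf >= q {
            return k;
        }
    }
    probs.len() - 1 // rounding can leave the total a hair below q
}

/// Point estimate = most probable count; interval = central 95% of the mass
/// (2.5% and 97.5% quantiles of the categorical distribution over {0..7}).
pub fn summarise(probs: &[f64]) -> CountSummary {
    let count = probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(k, _)| k)
        .expect("distribution must be non-empty");
    CountSummary {
        count,
        count_p95_low: quantile(probs, 0.025),
        count_p95_high: quantile(probs, 0.975),
    }
}
```

For example, `[0.0, 0.1, 0.7, 0.15, 0.05, 0.0, 0.0, 0.0]` summarises to count 2 with interval [1, 4]. A frame whose interval spans several counts is one the dashboard could render with the "?" badge.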
| Gate | v0.1.0 (initial release) | v0.2.0 (after data scaling) |
|---|---|---|
| Day-0 deployment (no calibration) | ≥ 80% within ±1 on same-room test set | ≥ 90% within ±1 |
| Cross-room (held-out environment) | ≥ 60% within ±1 | ≥ 75% within ±1 |
| Mean Absolute Error | ≤ 0.6 persons | ≤ 0.4 persons |
| Per-frame confidence reflects accuracy | Spearman correlation r ≥ 0.5 between confidence and (predicted == true) | r ≥ 0.7 |
| Inference latency on Pi 5 (Cog) | < 5 ms / frame cold-start | < 5 ms / frame |
| Binary size on GCS | ≤ 4 MB (matches cog-pose-estimation) | ≤ 4 MB |
v0.1.0 is intentionally modest — it's bounded by data-collection scale (#645). The framework is the deliverable; the accuracy follows the data.
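The accuracy gates above reduce to three metrics over a held-out test set. Below is a minimal sketch of how they might be computed; the function names are hypothetical. Spearman's rank correlation is computed as the Pearson correlation of average ranks. This handles the heavy ties in a 0/1 "predicted == true" vector.

```rust
/// Fraction of frames with |predicted − true| ≤ 1 ("within ±1").
pub fn within_one(pred: &[i64], truth: &[i64]) -> f64 {
    let hits = pred.iter().zip(truth).filter(|&(&p, &t)| (p - t).abs() <= 1).count();
    hits as f64 / pred.len() as f64
}

/// Mean absolute error in persons.
pub fn mae(pred: &[i64], truth: &[i64]) -> f64 {
    let total: i64 = pred.iter().zip(truth).map(|(p, t)| (p - t).abs()).sum();
    total as f64 / pred.len() as f64
}

/// 1-based ranks; tied values share the mean of their ranks.
fn ranks(v: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    idx.sort_by(|&a, &b| v[a].total_cmp(&v[b]));
    let mut r = vec![0.0; v.len()];
    let mut i = 0;
    while i < idx.len() {
        let mut j = i;
        while j + 1 < idx.len() && v[idx[j + 1]] == v[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for &k in &idx[i..=j] {
            r[k] = avg;
        }
        i = j + 1;
    }
    r
}

/// Pearson correlation (NaN if either input is constant).
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let (mx, my) = (x.iter().sum::<f64>() / n, y.iter().sum::<f64>() / n);
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let vx: f64 = x.iter().map(|a| (a - mx).powi(2)).sum();
    let vy: f64 = y.iter().map(|b| (b - my).powi(2)).sum();
    cov / (vx * vy).sqrt()
}

/// Spearman rank correlation between, e.g., confidence and correctness.
pub fn spearman(x: &[f64], y: &[f64]) -> f64 {
    pearson(&ranks(x), &ranks(y))
}
```

For the confidence gate, `x` would be the per-frame confidence and `y` would be 1.0 where the predicted count equals the camera count and 0.0 otherwise.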
```
v2/crates/cog-person-count/              # NEW (this ADR)
├── Cargo.toml
├── src/
│   ├── main.rs          # cog runtime: version | manifest | health | run
│   ├── lib.rs
│   ├── inference.rs     # Candle forward pass on per-node CSI
│   ├── fusion.rs        # Stoer-Wagner upper-bound + confidence-weighted log-sum
│   └── publisher.rs     # emits {count, confidence, count_p95_low, count_p95_high}
├── cog/
│   ├── manifest.template.json
│   ├── config.schema.json
│   ├── README.md
│   └── artifacts/       # filled by the release pipeline
│       ├── count_v1.safetensors
│       ├── count_v1.onnx
│       └── train_results.json
└── tests/
    ├── smoke.rs         # 5+ tests
    └── fusion_test.rs   # multi-node-fusion math
```
Plus a small server-side wiring change:
`v2/crates/wifi-densepose-sensing-server/src/csi.rs::score_to_person_count`: call the cog over the same /api/v1/edge/registry-discovered runtime as `cog-pose-estimation`. It falls back to today's PR #491 heuristic if the cog isn't installed (per the ADR-100 stub-fallback pattern).

- `confidence` + `count_p95_low/high` outputs let the UI render uncertainty instead of inventing ghosts.
- Accuracy is bounded by the same data-collection scale as `pose_v1` (#645). Without more multi-room data, v0.1.0 ships with modest absolute accuracy.
- The min-cut clipper adds a per-frame call into `ruvector-mincut`. Acceptable given the ≤ 5 ms budget but worth tracking.
- Camera labels are noisy: MediaPipe can record `n_persons = 0` even when a person was clearly in the room. The count head learns from this noisy signal. Mitigations include filtering by MediaPipe confidence ≥ 0.7 before training and weighting the loss by confidence (same trick used in `pose_v1`).
- … `train_results.json` to make this call empirically.
- Single-node deployments must degrade gracefully (covered by `tests/fusion_test.rs::single_node_degrades_gracefully`).

1. Train `count_v1.safetensors` on the existing 1,077 paired samples + `n_persons` labels, using the same Candle pipeline that produced `pose_v1`.
2. … `cognitum-v0` per ADR-101's pattern.
3. Wire `csi.rs::score_to_person_count` to call the cog when installed; keep PR #491's heuristic as fallback.

- PR #491: RollingP95 + `dedup_factor` (the heuristic this learned counter replaces).
- `docs/benchmarks/pose-estimation-cog.md`: measured perf envelope for the cog runtime this ADR targets.