docs/adr/ADR-072-wiflow-architecture.md
The WiFi-DensePose project needs a neural architecture that can convert raw CSI amplitude
data into 17-keypoint COCO pose estimates. The existing train-ruvllm.js pipeline uses a
simple 2-layer FC encoder (8 -> 64 -> 128) that produces contrastive embeddings for
presence detection but cannot output spatial keypoint coordinates.
We evaluated published WiFi-based pose estimation architectures:
| Architecture | Params | Input | Key Innovation | Publication |
|---|---|---|---|---|
| WiFlow | 4.82M | 540x20 | TCN + AsymConv + Axial Attention | arXiv:2602.08661 |
| WiPose | 11.2M | 3x3x30x20 | 3D CNN + heatmap regression | CVPR 2021 |
| MetaFi++ | 8.6M | 114x30x20 | Transformer + meta-learning | NeurIPS 2023 |
| Person-in-WiFi 3D | 15.3M | Multi-antenna | Deformable attention + 3D | CVPR 2024 |
WiFlow is the lightest published SOTA architecture, designed specifically for commercial WiFi hardware. Its key advantage is operating on CSI amplitude only (no phase), which is critical for ESP32-S3 where phase calibration is unreliable.
Implement the WiFlow architecture in pure JavaScript (ruvllm native) with the following adaptations for our ESP32 single TX/RX deployment.
```
CSI Amplitude [128, 20]
            |
Stage 1: TCN (Dilated Causal Conv)
    dilation = (1, 2, 4, 8), kernel = 7
    128 -> 256 -> 192 -> 128 channels
            |
Stage 2: Asymmetric Conv Encoder
    1xk conv (k = 3), stride (1, 2)
    [1, 128, 20] -> [256, 8, 20]
            |
Stage 3: Axial Self-Attention
    Width (temporal): 8 heads
    Height (feature): 8 heads
            |
Decoder: Adaptive Avg Pool + Linear
    [256, 8, 20] -> pool -> [2048] -> [17, 2]
            |
17 COCO Keypoints [x, y] in [0, 1]
```
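The stage-by-stage tensor shapes above can be traced with a small bookkeeping helper. A minimal sketch, shape arithmetic only (no weights or layers); the function name `traceShapes` is illustrative, not part of wiflow-model.js:

```javascript
// Trace tensor shapes through the four WiFlow stages in the diagram above.
// Shape bookkeeping only; stage names mirror the pipeline diagram.
function traceShapes(inputChannels = 128, timeSteps = 20) {
  const stages = [];
  // Stage 1: TCN keeps the [C, T] layout; channels go 128 -> 256 -> 192 -> 128.
  stages.push({ stage: 'tcn', shape: [128, timeSteps] });
  // Stage 2: 4 asymmetric-conv blocks with stride (1, 2) halve the feature
  // axis four times (128 -> 64 -> 32 -> 16 -> 8) and lift channels to 256.
  let h = inputChannels;
  for (let i = 0; i < 4; i++) h = Math.floor(h / 2);
  stages.push({ stage: 'asymConv', shape: [256, h, timeSteps] });
  // Stage 3: axial self-attention is shape-preserving.
  stages.push({ stage: 'axialAttention', shape: [256, h, timeSteps] });
  // Decoder: adaptive average pool over time -> flatten (256 * 8 = 2048)
  // -> linear head to 17 COCO keypoints with [x, y] each.
  stages.push({ stage: 'decoder', shape: [17, 2] });
  return stages;
}
```

Running the helper confirms the [256, 8, 20] intermediate shape and the [17, 2] output claimed in the diagram.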
| Aspect | WiFlow Original | Our Adaptation | Reason |
|---|---|---|---|
| Input channels | 540 (18 links x 30 SC) | 128 (1 TX x 1 RX x 128 SC) | Single ESP32 link |
| Time steps | 20 | 20 | Same |
| TCN channels | 540 -> 256 -> 128 -> 64 | 128 -> 256 -> 192 -> 128 | Proportional reduction |
| Spatial blocks | 4 (stride 2) | 4 (stride 2) | Same |
| Attention heads | 8 | 8 | Same |
| Parameters | 4.82M | ~1.8M | Fewer input channels |
| Input type | Amplitude only | Amplitude only | Same |
| Output | 17 x 2 | 17 x 2 | Same |
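The adaptations in the table can be collected into one configuration object. A hedged sketch; the key names are illustrative and may differ from the actual scripts/wiflow-model.js:

```javascript
// Hyperparameters of the ESP32 adaptation, as listed in the table above.
// Key names are illustrative, not the real wiflow-model.js API.
const WIFLOW_ESP32_CONFIG = {
  inputChannels: 128,               // 1 TX x 1 RX x 128 subcarriers
  timeSteps: 20,                    // unchanged from WiFlow
  tcnChannels: [128, 256, 192, 128],
  tcnKernel: 7,
  tcnDilations: [1, 2, 4, 8],
  spatialBlocks: 4,                 // asymmetric 1x3 convs, stride 2
  attentionHeads: 8,                // per axis (width + height)
  keypoints: 17,                    // COCO keypoint set
  outputDims: 2,                    // [x, y] normalized to [0, 1]
};
```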
| Stage | Parameters | % of Total |
|---|---|---|
| TCN (4 blocks, k=7, d=1,2,4,8) | ~969K | 54% |
| Asymmetric Conv (4 blocks, 1x3, stride 2) | ~174K | 10% |
| Axial Attention (width + height, 8 heads) | ~592K | 33% |
| Pose Decoder (pool + linear -> 17x2) | ~70K | 4% |
| Total | ~1.8M | 100% |
The pose loss combines a Smooth-L1 keypoint term with a bone-length prior:

```
L   = L_H + 0.2 * L_B
L_H = SmoothL1(predicted, target, beta = 0.1)
L_B = (1/14) * sum_b (bone_length_b - prior_b)^2
```
The 14 bone connections enforce anatomical constraints; all bone lengths are normalized to person height.
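The combined loss above can be written out directly. A minimal sketch under the stated definitions (Smooth L1 with beta = 0.1, bone weight 0.2); the helper names are illustrative:

```javascript
// Smooth L1 (Huber-style) on one scalar, with transition point beta.
function smoothL1(pred, target, beta = 0.1) {
  const d = Math.abs(pred - target);
  return d < beta ? (0.5 * d * d) / beta : d - 0.5 * beta;
}

// L_H: mean Smooth L1 over all 17 x 2 keypoint coordinates.
function keypointLoss(pred, target) {
  let sum = 0;
  for (let k = 0; k < pred.length; k++) {
    sum += smoothL1(pred[k][0], target[k][0]);
    sum += smoothL1(pred[k][1], target[k][1]);
  }
  return sum / (2 * pred.length);
}

// L_B: mean squared deviation of predicted bone lengths from the
// height-normalized priors. `bones` is a list of [i, j] keypoint index pairs.
function boneLoss(pred, bones, priors) {
  let sum = 0;
  for (let b = 0; b < bones.length; b++) {
    const [i, j] = bones[b];
    const len = Math.hypot(pred[i][0] - pred[j][0], pred[i][1] - pred[j][1]);
    sum += (len - priors[b]) ** 2;
  }
  return sum / bones.length;
}

// Total loss: L = L_H + 0.2 * L_B
function wiflowLoss(pred, target, bones, priors) {
  return keypointLoss(pred, target) + 0.2 * boneLoss(pred, bones, priors);
}
```

When the prediction matches the target exactly and every bone matches its prior, both terms vanish, so the loss is zero only for anatomically plausible, accurate poses.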
Since we have no ground-truth pose labels from cameras, training proceeds in three phases:
1. Contrastive pretraining with the ContrastiveTrainer (triplet + InfoNCE loss); no labels needed.
2. Pose-proxy training with the bone-constrained loss.
3. Per-node LoRA fine-tuning and TurboQuant quantization.

```
train-ruvllm.js (ADR-071)              train-wiflow.js (ADR-072)
          |                                       |
   8-dim features                      128-dim raw CSI amplitude
   -> 128-dim embedding                -> 17x2 keypoint coordinates
   -> presence/activity/vitals         -> bone-constrained pose
          |                                       |
          +-- ContrastiveTrainer -----+-----------+
          +-- TrainingPipeline -------+-----------+
          +-- LoRA per-node ----------+-----------+
          +-- TurboQuant quantize ----+-----------+
          +-- SafeTensors export -----+-----------+
```
Both pipelines share the ruvllm infrastructure; WiFlow adds the deeper architecture for direct pose regression while the simple encoder handles embedding tasks.
| Metric | Target | Notes |
|---|---|---|
| PCK@20 | > 80% | On lab data with 2+ nodes |
| Forward latency | < 50ms | Pi Zero 2W at INT8 |
| Model size (INT8) | < 2 MB | TurboQuant |
| Bone violation rate | < 10% | 50% tolerance |
| Temporal jitter | < 3cm | Exponential smoothing |
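The PCK@20 target (fraction of keypoints whose prediction lies within 20% of a normalization scale from ground truth) and the exponential smoothing used against temporal jitter can both be sketched briefly. This assumes height-normalized [0, 1] coordinates as produced by the decoder; the smoothing factor below is illustrative:

```javascript
// PCK@alpha: fraction of keypoints with Euclidean error <= alpha * scale.
// Coordinates are assumed normalized to [0, 1] (scale = person height = 1).
function pck(pred, target, alpha = 0.2, scale = 1.0) {
  let correct = 0;
  for (let k = 0; k < pred.length; k++) {
    const err = Math.hypot(pred[k][0] - target[k][0], pred[k][1] - target[k][1]);
    if (err <= alpha * scale) correct++;
  }
  return correct / pred.length;
}

// Exponential smoothing across frames to damp temporal jitter.
// alpha close to 1 tracks the new frame; closer to 0 smooths harder.
function emaSmooth(prevPose, currPose, alpha = 0.7) {
  return currPose.map(([x, y], k) => [
    alpha * x + (1 - alpha) * prevPose[k][0],
    alpha * y + (1 - alpha) * prevPose[k][1],
  ]);
}
```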
| Risk | Severity | Mitigation |
|---|---|---|
| Single TX/RX has less spatial info than 18 links | High | 2-node multi-static compensates; cross-node fusion from ADR-029 |
| Camera-free labels are coarse | Medium | Bone constraints enforce anatomy; contrastive pretrain provides structure |
| Pure JS too slow for real-time | Medium | INT8 quantization; axial attention is O(H^2W+HW^2) not O(H^2W^2) |
| Overfitting with ~5K frames | Medium | Temporal augmentation + noise + cross-node interpolation |
| Phase not available (amplitude-only) | Low | WiFlow was designed amplitude-only; not a limitation |
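The INT8 mitigation in the table relies on standard quantization arithmetic. A minimal sketch of symmetric per-tensor INT8 quantization; this illustrates the underlying idea only and is not the TurboQuant API:

```javascript
// Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
// with a single scale derived from the absolute maximum weight.
function quantizeInt8(weights) {
  const absMax = Math.max(...weights.map(Math.abs)) || 1;
  const scale = absMax / 127;
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

// Recover approximate floats; error is bounded by scale / 2 per weight.
function dequantizeInt8({ q, scale }) {
  return Array.from(q, (v) => v * scale);
}
```

Per-tensor INT8 shrinks each weight to one byte plus a shared scale, which is how the ~1.8M-parameter model fits the < 2 MB target.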
| File | Purpose |
|---|---|
| scripts/wiflow-model.js | WiFlow architecture (all stages, loss, metrics) |
| scripts/train-wiflow.js | Training pipeline (contrastive + pose proxy + LoRA + quant) |
| scripts/benchmark-wiflow.js | Benchmarking (latency, params, FLOPs, memory, quality) |
| docs/adr/ADR-072-wiflow-architecture.md | This document |
```bash
# Train on collected data
node scripts/train-wiflow.js --data data/recordings/pretrain-*.csi.jsonl

# Train with more epochs and custom output
node scripts/train-wiflow.js --data data/recordings/*.csi.jsonl --epochs 50 --output models/wiflow-v2

# Contrastive pretraining only (no labels needed)
node scripts/train-wiflow.js --data data/recordings/*.csi.jsonl --contrastive-only

# Benchmark
node scripts/benchmark-wiflow.js

# Benchmark with trained model
node scripts/benchmark-wiflow.js --model models/wiflow-v1
```
Reused ruvllm components (vendor/ruvector/npm/packages/ruvllm/src/):
- ContrastiveTrainer, tripletLoss, infoNCELoss, computeGradient
- TrainingPipeline
- LoraAdapter, LoraManager
- EwcManager
- ModelExporter, SafeTensorsWriter