Detect people, track movement, and measure breathing -- through walls, without cameras, using a $27 sensor kit.
|  |  |
|---|---|
| License | MIT |
| Framework | ONNX Runtime |
| Hardware | ESP32-S3 ($9) + optional Cognitum Seed ($15) |
| Training | Self-supervised contrastive learning (no labels needed) |
| Privacy | No cameras, no images, no personally identifiable data |
This model turns ordinary WiFi signals into a human sensing system. It can detect whether someone is in a room, count how many people are present, classify what they are doing, and even measure their breathing rate -- all without any cameras.
How does it work? Every WiFi router constantly sends signals that bounce off walls, furniture, and people. When a person moves -- or even just breathes -- those bouncing signals change in tiny but measurable ways. WiFi chips can capture these changes as numbers called Channel State Information (CSI). Think of it like ripples in a pond: drop a stone and the ripples tell you something happened, even if you cannot see the stone.
This model learned to read those "WiFi ripples" and figure out what is happening in the room. It was trained using a technique called contrastive learning, which means it taught itself by comparing thousands of WiFi signal snapshots -- no human had to manually label anything.
The result is a small, fast model that runs on a $9 microcontroller and preserves complete privacy because it never captures images or audio.
| Capability | Accuracy | What you need | Notes |
|---|---|---|---|
| Presence detection | >95% | 1x ESP32-S3 ($9) | Is anyone in the room? |
| Motion classification | >90% | 1x ESP32-S3 ($9) | Still, walking, exercising, fallen |
| Breathing rate | +/- 2 BPM | 1x ESP32-S3 ($9) | Best when person is sitting or lying still |
| Heart rate estimate | +/- 5 BPM | 1x ESP32-S3 ($9) | Experimental -- less accurate during movement |
| Person counting | 1-4 people | 2x ESP32-S3 ($18) | Uses cross-node signal fusion |
| Pose estimation | 17 COCO keypoints | 2x ESP32-S3 + Seed ($27) | Full skeleton: head, shoulders, elbows, etc. |
```bash
pip install onnxruntime numpy
```
```python
import onnxruntime as ort
import numpy as np

# Load the encoder model
session = ort.InferenceSession("pretrained-encoder.onnx")

# Simulated 8-dim CSI feature vector from ESP32-S3
# Dimensions: [amplitude_mean, amplitude_std, phase_slope, doppler_energy,
#              subcarrier_variance, temporal_stability, csi_ratio, spectral_entropy]
features = np.array(
    [[0.45, 0.30, 0.69, 0.75, 0.50, 0.25, 0.00, 0.54]],
    dtype=np.float32,
)

# Encode into 128-dim embedding
result = session.run(None, {"input": features})
embedding = result[0]  # shape: (1, 128)

print(f"Embedding shape: {embedding.shape}")
print(f"First 8 values: {embedding[0][:8]}")
```
```python
# Load the task heads model
heads = ort.InferenceSession("pretrained-heads.onnx")

# Feed the embedding from the encoder
predictions = heads.run(None, {"embedding": embedding})

presence_score = predictions[0]  # 0.0 = empty, 1.0 = occupied
person_count = predictions[1]    # estimated count (float, round to int)
activity_class = predictions[2]  # [still, walking, exercise, fallen]
vitals = predictions[3]          # [breathing_bpm, heart_bpm]

print(f"Presence: {presence_score[0]:.2f}")
print(f"People: {int(round(person_count[0]))}")
print(f"Activity: {['still', 'walking', 'exercise', 'fallen'][activity_class.argmax()]}")
print(f"Breathing: {vitals[0][0]:.1f} BPM")
print(f"Heart: {vitals[0][1]:.1f} BPM")
```
```
WiFi signals --> ESP32-S3 --> 8-dim features --> Encoder (TCN) --> 128-dim embedding --> Task Heads --+-- Presence (binary)
   (CSI)       (on-device)                      (~2.5M params)                            (~100K)     |
                                                                                                      +-- Person Count
                                                                                                      |
                                                                                                      +-- Activity (4 classes)
                                                                                                      |
                                                                                                      +-- Vitals (BR + HR)
```
The ESP32-S3 captures raw CSI frames at ~100 Hz and computes 8 summary features per window (a host-side sketch of this computation follows the table):
| Feature | Description |
|---|---|
| `amplitude_mean` | Average signal strength across subcarriers |
| `amplitude_std` | Variation in signal strength (movement indicator) |
| `phase_slope` | Rate of phase change across subcarriers |
| `doppler_energy` | Energy in the Doppler spectrum (velocity indicator) |
| `subcarrier_variance` | How much individual subcarriers differ |
| `temporal_stability` | Consistency of signal over time (stillness indicator) |
| `csi_ratio` | Ratio between antenna pairs (direction indicator) |
| `spectral_entropy` | Randomness of the frequency spectrum |
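The authoritative implementation of these features lives in the ESP32-S3 firmware. As a rough host-side illustration, assuming a complex CSI window of shape `(n_frames, n_subcarriers)`, and with the normalization, the off-DC Doppler proxy, and the antenna-ratio convention below all being assumptions, the computation might look like:

```python
import numpy as np

def extract_features(csi_window: np.ndarray, csi_ref: np.ndarray = None) -> np.ndarray:
    """Approximate the 8 summary features from one complex CSI window.

    csi_window: shape (n_frames, n_subcarriers), e.g. ~100 frames = 1 s at 100 Hz.
    csi_ref: optional window from a second antenna, for the ratio feature.
    """
    amp = np.abs(csi_window)
    phase = np.unwrap(np.angle(csi_window), axis=1)

    amplitude_mean = amp.mean()
    amplitude_std = amp.std()

    # Least-squares phase slope across subcarriers, averaged over frames
    k = np.arange(csi_window.shape[1])
    phase_slope = np.polyfit(k, phase.T, 1)[0].mean()

    # Fraction of temporal-FFT power away from DC (a crude Doppler-energy proxy)
    spectrum = np.abs(np.fft.rfft(amp - amp.mean(axis=0), axis=0)) ** 2
    doppler_energy = spectrum[1:].sum() / (spectrum.sum() + 1e-12)

    subcarrier_variance = amp.mean(axis=0).var()

    # Small frame-to-frame change => high stability (stillness indicator)
    temporal_stability = 1.0 / (1.0 + np.abs(np.diff(amp, axis=0)).mean())

    csi_ratio = 0.0
    if csi_ref is not None:
        csi_ratio = amp.mean() / (np.abs(csi_ref).mean() + 1e-12)

    # Normalized Shannon entropy of the temporal power spectrum
    p = spectrum.sum(axis=1)
    p = p / (p.sum() + 1e-12)
    spectral_entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))

    return np.array([amplitude_mean, amplitude_std, phase_slope, doppler_energy,
                     subcarrier_variance, temporal_stability, csi_ratio,
                     spectral_entropy], dtype=np.float32)


# Smoke test on synthetic CSI (100 frames x 52 subcarriers)
rng = np.random.default_rng(0)
window = rng.standard_normal((100, 52)) + 1j * rng.standard_normal((100, 52))
print(extract_features(window))
```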
This model was trained using self-supervised contrastive learning, which means it learned entirely from unlabeled WiFi signals. No cameras, no manual annotations, and no privacy-invasive data collection were needed.
The training process works like this:
1. ESP32-S3 nodes capture raw CSI frames and compute 8-dim feature windows on-device.
2. The encoder teaches itself by contrasting thousands of unlabeled feature snapshots; no manual annotation is involved.
3. Every collection and training step is recorded in a cryptographic witness file (`collection-witness.json`) that proves data provenance and integrity.

The `collection-witness.json` file contains a chain of SHA-256 hashes linking every step from raw CSI capture through feature extraction to model training. This allows anyone to verify that the published model was trained on data collected by specific hardware at a specific time.
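Because the witness is just a hash chain, verification needs only the standard library. The field names below (`chain`, `step`, `payload_sha256`, `link_hash`) are hypothetical; inspect the actual `collection-witness.json` for the real schema before relying on this sketch:

```python
import hashlib
import json

def verify_witness_chain(path: str = "collection-witness.json") -> None:
    """Recompute each SHA-256 link and confirm the chain is unbroken.

    NOTE: the schema assumed here (a "chain" list whose entries carry a
    payload digest and a link hash) is an illustration, not the spec.
    """
    with open(path) as f:
        witness = json.load(f)

    prev = ""
    for i, entry in enumerate(witness["chain"]):
        expected = hashlib.sha256((prev + entry["payload_sha256"]).encode()).hexdigest()
        if entry["link_hash"] != expected:
            raise ValueError(f"chain broken at step {i}: {entry['step']}")
        prev = entry["link_hash"]
    print(f"witness chain OK ({len(witness['chain'])} links)")
```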
| Component | What it does | Cost | Where to get it |
|---|---|---|---|
| ESP32-S3 (8MB flash) | Captures WiFi CSI + runs feature extraction | ~$9 | Amazon, AliExpress, Adafruit |
| USB-C cable | Power + data | ~$3 | Any electronics store |
This gets you: presence detection, motion classification, breathing rate.
Add a second ESP32-S3 to enable cross-node signal fusion for better accuracy and person counting.
| Component | What it does | Cost |
|---|---|---|
| 2x ESP32-S3 (8MB) | WiFi CSI sensing nodes | ~$18 |
| Cognitum Seed (Pi Zero 2W) | Runs inference + collects ground truth | ~$15 |
| USB-C cables (x3) | Power + data | ~$9 |
| Total |  | ~$27 |
The Cognitum Seed runs the ONNX models on-device, orchestrates the ESP32 nodes over USB serial, and provides environmental ground truth via its onboard PIR and BME280 sensors.
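The exact cross-node fusion RuView performs is not spelled out here. One minimal way to combine two nodes, assuming each produces its own `(1, 8)` feature vector, is to encode them separately and mean-pool the embeddings before the task heads; mean-pooling is an illustrative choice, and the real pipeline may weight, concatenate, or align nodes differently:

```python
import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("pretrained-encoder.onnx")
heads = ort.InferenceSession("pretrained-heads.onnx")

def fused_predictions(node_a: np.ndarray, node_b: np.ndarray) -> list:
    """Encode each node's (1, 8) feature vector, then average the embeddings."""
    emb_a = encoder.run(None, {"input": node_a})[0]
    emb_b = encoder.run(None, {"input": node_b})[0]
    fused = ((emb_a + emb_b) / 2.0).astype(np.float32)  # simple mean-pooling
    return heads.run(None, {"embedding": fused})
```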
| File | Size | Description |
|---|---|---|
| `pretrained-encoder.onnx` | ~2 MB | Contrastive encoder (TCN backbone, 8-dim input, 128-dim output) |
| `pretrained-heads.onnx` | ~100 KB | Task heads (presence, count, activity, vitals) |
| `pretrained.rvf` | ~500 KB | RuVector format embeddings for advanced fusion pipelines |
| `room-profiles.json` | ~10 KB | Environment calibration profiles (room geometry, baseline noise) |
| `collection-witness.json` | ~5 KB | Cryptographic witness chain proving data provenance |
| `config.json` | ~2 KB | Training configuration (hyperparameters, feature schema, versions) |
| `README.md` | -- | This file |
The .rvf file contains pre-computed embeddings in RuVector format, used by the RuView application for advanced multi-node fusion and cross-viewpoint pose estimation. You only need this if you are using the full RuView pipeline. For basic inference, the ONNX files are sufficient.
RuView is the open-source application that ties everything together: firmware flashing, real-time sensing, and a browser-based dashboard.
```bash
git clone https://github.com/ruvnet/RuView.git
cd RuView

# Flash firmware (requires ESP-IDF v5.4 or use pre-built binaries from Releases)
# See the repo README for platform-specific instructions

pip install huggingface_hub
huggingface-cli download ruvnet/wifi-densepose-pretrained --local-dir models/

# Start the CSI bridge (connects ESP32 serial output to the inference pipeline)
python scripts/seed_csi_bridge.py --port COM7 --model models/pretrained-encoder.onnx

# Or run the full sensing server with web dashboard
cargo run -p wifi-densepose-sensing-server
```
The model works best after a brief calibration period (~60 seconds of no movement) to learn the baseline signal characteristics of your specific room. The room-profiles.json file contains example profiles; the system will create one for your environment automatically.
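If you want to drive calibration yourself rather than let the system do it, the logic reduces to averaging empty-room feature windows. The sketch below is illustrative only: `read_features` is a hypothetical callback into the CSI bridge, and the profile fields do not necessarily match the real `room-profiles.json` schema.

```python
import json
import time
import numpy as np

def calibrate_room(read_features, seconds: int = 60,
                   out: str = "my-room-profile.json") -> None:
    """Average ~60 s of empty-room feature windows into a baseline profile.

    read_features() is assumed to return one 8-dim feature vector per
    window; keep the room empty and still while this runs.
    """
    samples = []
    deadline = time.time() + seconds
    while time.time() < deadline:
        samples.append(read_features())
    baseline = np.asarray(samples, dtype=np.float32)
    profile = {
        "baseline_mean": baseline.mean(axis=0).tolist(),  # illustrative fields
        "baseline_std": baseline.std(axis=0).tolist(),
        "n_windows": len(samples),
    }
    with open(out, "w") as f:
        json.dump(profile, f, indent=2)
```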
Be honest about what this technology can and cannot do:

- Heart-rate estimation is experimental and becomes noticeably less accurate while the person is moving.
- Breathing-rate measurement works best when the person is sitting or lying still.
- Person counting tops out around 4 people and requires two sensing nodes.
- Pose estimation requires the full two-node + Seed kit.
- Every room needs a brief (~60 second) calibration before the accuracy figures above apply.
WiFi sensing is a privacy-preserving alternative to cameras, but it still detects human presence and activity. Consider these points:

- No images or audio are ever captured; the model only sees aggregate signal statistics.
- Because the signals pass through walls, the system may sense people in adjacent spaces you did not intend to monitor.
- Presence, activity, and vital signs are still personal data; inform the people being sensed and obtain consent where required.
If you use this model in your research, please cite:
```bibtex
@software{wifi_densepose_2026,
  title   = {WiFi-DensePose: Human Pose Estimation from WiFi Channel State Information},
  author  = {ruvnet},
  year    = {2026},
  url     = {https://github.com/ruvnet/RuView},
  license = {MIT},
  note    = {Self-supervised contrastive learning on ESP32-S3 CSI data}
}
```
MIT License. See LICENSE for details.
You are free to use, modify, and distribute this model for any purpose, including commercial applications.