ADR-071: ruvllm Training Pipeline for CSI Sensing Models

  • Status: Proposed
  • Date: 2026-04-02
  • Deciders: ruv
  • Relates to: ADR-069 (Cognitum Seed CSI Pipeline), ADR-070 (Self-Supervised Pretraining), ADR-024 (Contrastive CSI Embedding / AETHER), ADR-016 (RuVector Training Pipeline)

Context

The WiFi-DensePose project needs a training pipeline to convert collected CSI data (.csi.jsonl frames from ESP32 nodes) into deployable models for presence detection, activity classification, and vital sign estimation.

Previous ADRs established the data collection protocol (ADR-070) and Cognitum Seed inference target (ADR-069). What was missing was the actual training, refinement, quantization, and export pipeline connecting raw CSI recordings to deployable models.

Why ruvllm instead of PyTorch

| Criterion | ruvllm | PyTorch | ONNX Runtime |
|---|---|---|---|
| Runtime dependency | Node.js only | Python + CUDA + pip | C++ runtime |
| Install size | ~5 MB (npm) | ~2 GB (torch + CUDA) | ~50 MB |
| SONA adaptation | <1 ms native | N/A | N/A |
| Quantization | 2/4/8-bit TurboQuant | INT8/FP16 (separate tool) | INT8 only |
| LoRA fine-tuning | Built-in LoraAdapter | Requires PEFT library | N/A |
| EWC protection | Built-in EwcManager | Manual implementation | N/A |
| SafeTensors export | Native SafeTensorsWriter | Via safetensors library | N/A |
| Contrastive training | Built-in ContrastiveTrainer | Manual triplet loss | N/A |
| Edge deployment | ESP32, Pi Zero, browser | GPU servers only | ARM (limited) |
| M4 Pro performance | 88-135 tok/s native | ~30 tok/s (MPS) | ~50 tok/s |
| Ecosystem integration | RuVector, Cognitum Seed | Standalone | Standalone |

The ruvllm package (@ruvector/ruvllm v2.5.4) provides the complete training lifecycle in a single dependency: contrastive pretraining, task head training, LoRA refinement, EWC consolidation, quantization, and SafeTensors/RVF export. No Python dependency means the entire pipeline runs on the same Node.js runtime as the Cognitum Seed inference engine.

Decision

Use ruvllm's ContrastiveTrainer, TrainingPipeline, LoraAdapter, EwcManager, SafeTensorsWriter, and ModelExporter for the complete CSI model training lifecycle.

Training Phases

The pipeline executes five sequential phases:

Phase 1: Contrastive Pretraining

Learns an embedding space where temporally and spatially similar CSI states are close and dissimilar states are far apart.

  • Encoder architecture: 8-dim CSI feature vector -> 64-dim hidden (ReLU) -> 128-dim embedding (L2-normalized)
  • Loss functions: Triplet loss (margin=0.3) + InfoNCE (temperature=0.07)
  • Triplet strategies:
    • Temporal positive: frames within 1 second (same environment state)
    • Temporal negative: frames >30 seconds apart (different state)
    • Cross-node positive: same timestamp from different ESP32 nodes (same person, different viewpoint)
    • Cross-node negative: different timestamp + different node
    • Hard negatives: frames near motion energy transition boundaries
  • Hyperparameters: 20 epochs, batch size 32, hard negative ratio 0.7
  • Implementation: ContrastiveTrainer.addTriplet() + .train() (see the sketch below)
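
A minimal sketch of this phase, assuming the ContrastiveTrainer named above accepts the constructor options and triplet shape shown here (those details are illustrative, not confirmed ruvllm signatures):

```js
import { ContrastiveTrainer } from '@ruvector/ruvllm';

// Toy frames: { t: timestamp (ms), nodeId, feature: 8-dim CSI vector }.
const frames = [
  { t: 0,     nodeId: 1, feature: [0.1, 0.2, 0.3, 0.1, 0.0, 0.4, 0.2, 0.1] },
  { t: 500,   nodeId: 1, feature: [0.1, 0.2, 0.3, 0.1, 0.1, 0.4, 0.2, 0.1] },
  { t: 45000, nodeId: 1, feature: [0.7, 0.1, 0.0, 0.5, 0.3, 0.1, 0.6, 0.2] },
];

// Temporal triplet mining per the rules above: positives within 1 s of the
// anchor, negatives more than 30 s away.
function buildTriplets(frames) {
  const triplets = [];
  for (const anchor of frames) {
    const pos = frames.find(f => f !== anchor && Math.abs(f.t - anchor.t) < 1000);
    const neg = frames.find(f => Math.abs(f.t - anchor.t) > 30000);
    if (pos && neg) {
      triplets.push({ anchor: anchor.feature, positive: pos.feature, negative: neg.feature });
    }
  }
  return triplets;
}

const trainer = new ContrastiveTrainer({
  margin: 0.3,            // triplet loss margin (ADR value)
  temperature: 0.07,      // InfoNCE temperature (ADR value)
  epochs: 20,
  batchSize: 32,
  hardNegativeRatio: 0.7,
});
for (const t of buildTriplets(frames)) trainer.addTriplet(t);
await trainer.train();
```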

Phase 2: Task Head Training

Trains supervised heads on top of the frozen embedding for specific sensing tasks.

  • Presence head: 128 -> 1 (sigmoid), threshold at presence_score > 0.3
  • Activity head: 128 -> 3 (softmax: still/moving/empty), derived from motion_energy thresholds
  • Vitals head: 128 -> 2 (linear: breathing BPM, heart rate BPM), normalized targets
  • Implementation: TrainingPipeline.addData() + .train() with cosine LR scheduler, early stopping (patience=5), and quality-weighted MSE loss (see the sketch below)
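
A sketch of the head-training step; TrainingPipeline.addData() and .train() are named in this ADR, while the option names and sample schema below are assumptions:

```js
import { TrainingPipeline } from '@ruvector/ruvllm';

// One labeled sample per frame: frozen 128-dim embedding plus weak targets.
const samples = [{
  embedding: new Array(128).fill(0.01),   // from the frozen Phase 1 encoder
  presence: 1,                            // presence_score > 0.3
  activity: 1,                            // 0 = still, 1 = moving, 2 = empty
  vitals: [14.2, 68.0],                   // breathing BPM, heart rate BPM
  quality: 0.9,                           // weight for quality-weighted MSE
}];

const pipeline = new TrainingPipeline({
  scheduler: 'cosine',          // cosine LR schedule (per this ADR)
  earlyStoppingPatience: 5,     // stop after 5 epochs with no improvement
});
for (const s of samples) pipeline.addData(s);
const metrics = await pipeline.train();
console.log(metrics);
```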

Phase 3: LoRA Refinement

Per-node LoRA adapters for room-specific adaptation without forgetting the base model.

  • Configuration: rank=4, alpha=8, dropout=0.1
  • Per-node training: Each ESP32 node gets its own LoRA adapter trained on node-specific data with reduced learning rate (0.5x base)
  • Implementation: LoraManager.create() for each node, TrainingPipeline with the LoraAdapter passed to the constructor (see the sketch below)
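
A sketch of the per-node loop, assuming LoraManager.create() takes the rank/alpha/dropout config and that the resulting adapter serializes to JSON (both assumptions beyond what this ADR states):

```js
import { writeFile } from 'node:fs/promises';
import { LoraManager, TrainingPipeline } from '@ruvector/ruvllm';

const BASE_LR = 0.001;   // illustrative base learning rate
const framesByNode = new Map([['node-1', []], ['node-2', []]]);

const loraManager = new LoraManager();
for (const [nodeId, nodeFrames] of framesByNode) {
  // One rank-4 adapter per ESP32 node (ADR configuration).
  const adapter = loraManager.create({ rank: 4, alpha: 8, dropout: 0.1 });
  const pipeline = new TrainingPipeline({
    lora: adapter,                // adapter passed to the constructor
    learningRate: BASE_LR * 0.5,  // 0.5x base LR for refinement
  });
  for (const f of nodeFrames) pipeline.addData(f);
  await pipeline.train();
  await writeFile(`models/csi-v1/lora/${nodeId}.json`, JSON.stringify(adapter));
}
```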

Phase 4: Quantization (TurboQuant)

Reduces model size for edge deployment with minimal quality loss.

| Bit Width | Compression | Typical RMSE | Target Device |
|---|---|---|---|
| 8-bit | 4x | <0.001 | Cognitum Seed (Pi Zero) |
| 4-bit | 8x | <0.01 | Standard edge inference |
| 2-bit | 16x | <0.05 | ESP32-S3 feature extraction |
  • Method: Uniform affine quantization with scale/zero-point per tensor
  • Quality validation: RMSE between original fp32 and dequantized weights
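
For reference, a standalone sketch of per-tensor uniform affine quantization and the RMSE check; this re-implements the idea for illustration and is not TurboQuant's actual code:

```js
// Quantize fp32 weights to `bits` using one scale/zero-point per tensor.
function quantize(weights, bits) {
  const qmax = (1 << bits) - 1;              // 255 for 8-bit, 15 for 4-bit
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / qmax || 1;     // guard against constant tensors
  const zeroPoint = Math.round(-min / scale);
  const q = weights.map(w =>
    Math.max(0, Math.min(qmax, Math.round(w / scale) + zeroPoint)));
  return { q, scale, zeroPoint };
}

function dequantize({ q, scale, zeroPoint }) {
  return q.map(v => (v - zeroPoint) * scale);
}

// Quality validation: RMSE between original fp32 and dequantized weights.
function rmse(a, b) {
  const sse = a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);
  return Math.sqrt(sse / a.length);
}

const w = [0.12, -0.53, 0.98, -0.07, 0.31];
for (const bits of [8, 4, 2]) {
  console.log(bits, rmse(w, dequantize(quantize(w, bits))));
}
```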

Phase 5: EWC Consolidation

Elastic Weight Consolidation prevents catastrophic forgetting when the model is later fine-tuned on new room data or updated CSI conditions.

  • Fisher information: Computed from training data gradients
  • Lambda: 2000 (base), 3000 (per-node)
  • Tasks registered: Base pretraining + one per ESP32 node
  • Implementation: EwcManager.registerTask() for each training phase (penalty term sketched below)
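
The regularizer EWC adds to the loss is small enough to show inline. EwcManager computes this internally; the standalone function here is only illustrative:

```js
// EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - thetaStar_i)^2,
// where F is the Fisher information and thetaStar the consolidated weights.
function ewcPenalty(theta, thetaStar, fisher, lambda) {
  let sum = 0;
  for (let i = 0; i < theta.length; i++) {
    // F_i measures how important parameter i was to the registered task;
    // important parameters become expensive to move.
    sum += fisher[i] * (theta[i] - thetaStar[i]) ** 2;
  }
  return (lambda / 2) * sum;
}

// lambda = 2000 for the base task, 3000 for per-node tasks (ADR values).
console.log(ewcPenalty([0.5, -0.2], [0.48, -0.25], [1.2, 0.3], 2000));
```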

Data Pipeline

.csi.jsonl files
    |
    v
Parse frames: feature (8-dim), vitals, raw CSI
    |
    v
Generate contrastive triplets (temporal, cross-node, hard negatives)
    |
    v
Encode through CsiEncoder (8 -> 64 -> 128)
    |
    v
Phase 1: ContrastiveTrainer (triplet + InfoNCE loss)
    |
    v
Phase 2: TrainingPipeline (presence + activity + vitals heads)
    |
    v
Phase 3: LoRA per-node refinement
    |
    v
Phase 4: TurboQuant (2/4/8-bit quantization)
    |
    v
Phase 5: EWC consolidation
    |
    v
Export: SafeTensors, JSON config, RVF manifest, per-node LoRA adapters
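
A sketch of the first stage, streaming a .csi.jsonl recording; the per-frame field names (feature, vitals, csi) are assumptions based on the description above:

```js
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Yield one parsed frame per JSONL line.
async function* readCsiFrames(path) {
  const rl = createInterface({ input: createReadStream(path) });
  for await (const line of rl) {
    if (!line.trim()) continue;
    const frame = JSON.parse(line);
    yield {
      feature: frame.feature,   // 8-dim CSI feature vector
      vitals: frame.vitals,     // breathing / heart-rate estimates
      raw: frame.csi,           // raw CSI, used for triplet mining
    };
  }
}

let count = 0;
for await (const f of readCsiFrames('data/recordings/pretrain-1775182186.csi.jsonl')) {
  count++;                      // frames feed the CsiEncoder and Phase 1
}
console.log(`parsed ${count} frames`);
```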

Export Formats

| Format | File | Consumer |
|---|---|---|
| SafeTensors | model.safetensors | HuggingFace ecosystem, general inference |
| JSON config | config.json | Model loading metadata |
| JSON model | model.json | Full model state for Node.js loading |
| Quantized binaries | quantized/model-q{2,4,8}.bin | Edge deployment |
| Per-node LoRA | lora/node-{id}.json | Room-specific adaptation |
| RVF manifest | model.rvf.jsonl | Cognitum Seed ingest (ADR-069) |
| Training metrics | training-metrics.json | Dashboards, CI validation |

Hardware Targets

| Device | Role | Quantization | Expected Latency |
|---|---|---|---|
| Mac Mini M4 Pro | Training (primary) | fp32 | <5 min total |
| Cognitum Seed Pi Zero | Inference | 4-bit / 8-bit | <10 ms per frame |
| ESP32-S3 | Feature extraction only | 2-bit (encoder weights) | <5 ms per frame |
| Browser (WASM) | Visualization | 4-bit | <20 ms per frame |

Performance Targets

| Metric | Target | Measured |
|---|---|---|
| Training time (5,783 frames, M4 Pro) | <5 min | TBD |
| Inference latency (M4 Pro) | <1 ms | TBD |
| Inference latency (Pi Zero) | <10 ms | TBD |
| SONA adaptation | <1 ms | <0.05 ms (ruvllm spec) |
| Presence detection accuracy | >85% | TBD |
| 4-bit quality loss (RMSE) | <0.01 | TBD |
| 2-bit quality loss (RMSE) | <0.05 | TBD |

Consequences

Positive

  • Zero Python dependency: The entire training and inference pipeline runs on Node.js, eliminating Python/CUDA/pip dependency management on training and deployment targets.
  • Integrated lifecycle: Contrastive pretraining, task heads, LoRA refinement, EWC consolidation, and quantization in a single script using one library.
  • Edge-first: 2-bit quantization enables running the encoder on ESP32-S3. 4-bit quantization fits comfortably on Cognitum Seed Pi Zero.
  • Continual learning: EWC protection means the model can be updated with new room data without losing previously learned patterns.
  • Per-node adaptation: LoRA adapters allow room-specific fine-tuning with minimal storage overhead (rank-4 adapter ~2KB per node).
  • HuggingFace compatibility: SafeTensors export enables sharing models on the HuggingFace Hub and loading in other frameworks.
  • Reproducibility: Seeded encoder initialization and deterministic data pipeline ensure reproducible training runs.

Negative

  • No GPU acceleration: ruvllm's JS training loop does not use GPU compute. For the small model sizes in CSI sensing (8->64->128), this is acceptable (a few seconds of compute on an M4 Pro), but it would not scale to large vision models.
  • Simplified backpropagation: The LoRA backward pass and contrastive training use approximate gradient updates rather than full automatic differentiation. Sufficient for the target model sizes but not equivalent to PyTorch autograd.
  • Quantization is post-training only: No quantization-aware training (QAT). For 4-bit and 8-bit this produces acceptable quality loss; 2-bit may need QAT in future if quality degrades.

Risks

  • Quality ceiling: The simplified training may produce lower accuracy than a PyTorch-trained equivalent. Mitigated by: (a) the model is small enough that the training loop converges quickly, (b) SONA adaptation can compensate at inference time, (c) we can switch to PyTorch for training only if needed while keeping ruvllm for inference.
  • ruvllm API stability: The library is at v2.5.4 with active development. Mitigated by vendoring the package in vendor/ruvector/npm/packages/ruvllm/.

Implementation

Scripts

| Script | Purpose |
|---|---|
| scripts/train-ruvllm.js | Full 5-phase training pipeline |
| scripts/benchmark-ruvllm.js | Model benchmarking (latency, quality, accuracy) |

Usage

```bash
# Train on collected CSI data
node scripts/train-ruvllm.js \
  --data data/recordings/pretrain-1775182186.csi.jsonl \
  --output models/csi-v1 \
  --epochs 20

# Train with benchmark
node scripts/train-ruvllm.js \
  --data data/recordings/pretrain-*.csi.jsonl \
  --output models/csi-v1 \
  --benchmark

# Standalone benchmark
node scripts/benchmark-ruvllm.js \
  --model models/csi-v1 \
  --data data/recordings/pretrain-*.csi.jsonl \
  --samples 5000 \
  --json
```

Output Structure

models/csi-v1/
  model.safetensors          # SafeTensors (HuggingFace compatible)
  config.json                # Model configuration
  model.json                 # Full JSON model state
  model.rvf.jsonl            # RVF manifest for Cognitum Seed
  training-metrics.json      # Training loss curves, timing, config
  contrastive/
    triplets.jsonl           # Contrastive training pairs
    triplets.csv             # CSV format for analysis
    embeddings.json          # Embedding matrices
  quantized/
    model-q2.bin             # 2-bit quantized (ESP32 edge)
    model-q4.bin             # 4-bit quantized (Pi Zero default)
    model-q8.bin             # 8-bit quantized (high quality)
  lora/
    node-1.json              # LoRA adapter for ESP32 node 1
    node-2.json              # LoRA adapter for ESP32 node 2

Camera-Free Supervision

Motivation

Traditional WiFi-based pose estimation (WiFlow, Person-in-WiFi) requires camera-supervised training: a camera captures ground-truth poses during CSI collection, and the model learns to map CSI to those poses. This creates a deployment paradox — the camera is needed for training but the whole point of WiFi sensing is to avoid cameras.

The camera-free pipeline (scripts/train-camera-free.js) replaces camera supervision with 10 sensor signals from the Cognitum Seed and 2 ESP32 nodes, generating weak labels through sensor fusion.

10 Supervision Signals (No Camera)

| # | Signal | Source | Provides |
|---|---|---|---|
| 1 | PIR sensor | Seed GPIO 6 | Binary presence ground truth |
| 2 | BME280 temperature | Seed I2C 0x76 | Occupancy proxy (temp rises with people) |
| 3 | BME280 humidity | Seed I2C 0x76 | Breathing confirmation / zone |
| 4 | Cross-node RSSI | 2 ESP32 nodes | Rough XY position (differential triangulation) |
| 5 | Vitals stability | ESP32 CSI | HR/BR variance indicates activity level |
| 6 | Temporal CSI patterns | ESP32 CSI | Periodic=walking, stable=sitting, flat=empty |
| 7 | kNN cluster labels | Seed vector store | Natural groupings in embedding space |
| 8 | Boundary fragility | Seed Stoer-Wagner | Regime change detection (entry/exit/activity) |
| 9 | Reed switch | Seed GPIO 5 | Door open/close events |
| 10 | Vibration sensor | Seed GPIO 13 | Footstep detection |

Camera-Free Training Phases

The pipeline extends the base five phases with additional stages specific to camera-free supervision:

Phase 0: Multi-Modal Data Collection
  ├── UDP port 5006 → ESP32 CSI features + vitals
  ├── HTTPS → Seed sensor embeddings (45-dim, every 100ms)
  ├── HTTPS → Seed boundary/coherence (every 10s)
  └── Build synchronized MultiModalFrame timeline

Phase 1: Weak Label Generation
  ├── Presence: PIR || CSI_presence > 0.3 || temp_rising > 0.1°C/min
  ├── Position: RSSI differential → 5×5 grid (25 zones)
  ├── Activity: CSI variance + FFT periodicity → stationary/walking/gesture/empty
  ├── Occupancy: max(node1_persons, node2_persons) validated by temp
  ├── Body region: upper/lower subcarrier groups → which body part moves
  ├── Entry/exit: reed_switch + PIR transition + boundary fragility spike
  ├── Breathing zone: humidity change rate → person location
  └── Pose proxy: 5-keypoint coarse pose from RSSI + subcarrier asymmetry + vibration
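
A sketch of the presence rule from the diagram above, fusing the three signals with OR logic (the frame field names are illustrative):

```js
// Weak presence label: any one of three sensors is enough (OR fusion).
function presenceLabel(frame) {
  const pirActive = frame.pir === 1;               // PIR on Seed GPIO 6
  const csiPresent = frame.csiPresence > 0.3;      // presence head threshold
  const tempRising = frame.tempSlopePerMin > 0.1;  // °C/min occupancy proxy
  return pirActive || csiPresent || tempRising ? 1 : 0;
}
```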

Phase 2: Enhanced Contrastive Pretraining
  ├── Base triplets (temporal, cross-node, transition, scenario boundary)
  ├── Sensor-verified negatives: PIR=0 vs PIR=1 must differ
  ├── Activity boundary: before/after fragility spike must differ
  └── Cross-modal: CSI embedding ≈ Seed embedding for same state

Phase 3: Pose Proxy Training (5-keypoint)
  ├── Head: RSSI centroid between 2 nodes
  ├── Hands: per-subcarrier variance asymmetry (left/right from 2 nodes)
  ├── Feet: vibration sensor + RSSI ground reflection
  └── Skeleton physics constraints (anthropometric bone length limits)

Phase 4: 17-Keypoint Interpolation
  ├── Shoulders = 0.3 × head + 0.7 × hands
  ├── Elbows = midpoint(shoulder, hand)
  ├── Hips = midpoint(head, feet)
  ├── Knees = midpoint(hip, foot)
  ├── Face = derived from head position
  └── Iterative bone length constraint projection (3 iterations)
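
The interpolation rules above are simple enough to write out. A sketch over 2-D keypoints (the iterative bone-length projection step is omitted here):

```js
const mid = (a, b) => ({ x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 });
const blend = (a, b, w) => ({ x: w * a.x + (1 - w) * b.x,
                              y: w * a.y + (1 - w) * b.y });

// p holds the 5 proxy keypoints: head, left/right hand, left/right foot.
function interpolate17(p) {
  const leftShoulder  = blend(p.head, p.leftHand, 0.3);  // 0.3*head + 0.7*hand
  const rightShoulder = blend(p.head, p.rightHand, 0.3);
  const leftElbow  = mid(leftShoulder, p.leftHand);      // midpoint(shoulder, hand)
  const rightElbow = mid(rightShoulder, p.rightHand);
  const leftHip  = mid(p.head, p.leftFoot);              // midpoint(head, foot)
  const rightHip = mid(p.head, p.rightFoot);
  const leftKnee  = mid(leftHip, p.leftFoot);            // midpoint(hip, foot)
  const rightKnee = mid(rightHip, p.rightFoot);
  // Face keypoints (nose, eyes, ears) are all derived from the head position.
  return { ...p, leftShoulder, rightShoulder, leftElbow, rightElbow,
           leftHip, rightHip, leftKnee, rightKnee };
}
```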

Phase 5: Self-Refinement Loop (3 rounds)
  ├── Run inference on all collected data
  ├── Keep predictions where temporal consistency confidence > 0.8
  ├── Use as pseudo-labels for next training round
  └── Decaying learning rate per round (diminishing returns)
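
One refinement round, sketched with illustrative model methods (infer, finetune) and an assumed halving decay; only the 0.8 confidence gate and the decaying learning rate come from this ADR:

```js
async function refinementRound(model, frames, round, baseLr) {
  // Keep only predictions whose temporal-consistency confidence clears 0.8.
  const pseudoLabels = [];
  for (const f of frames) {
    const pred = model.infer(f.feature);
    if (pred.confidence > 0.8) pseudoLabels.push({ input: f.feature, target: pred });
  }
  // Decaying learning rate per round (halving here is an assumption).
  const lr = baseLr * 0.5 ** round;
  return model.finetune(pseudoLabels, { learningRate: lr });
}
```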

Seed API Endpoints Used

| Endpoint | Data | Collection Rate |
|---|---|---|
| GET /api/v1/sensor/stream | SSE sensor readings | Continuous (100ms) |
| GET /api/v1/sensor/embedding/latest | 45-dim sensor embedding | Per-frame |
| GET /api/v1/boundary | Fragility score | Every 10s |
| GET /api/v1/coherence/profile | Temporal phase boundaries | Every 10s |
| GET /api/v1/store/query | kNN similarity search | On demand |
| POST /api/v1/boundary/recompute | Trigger analysis | On regime change |
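
A sketch of polling the embedding endpoint at the 100 ms cadence listed above; the response shape ({ embedding: [...] }) is an assumption:

```js
const SEED_URL = 'https://169.254.42.1:8443';

async function latestSeedEmbedding() {
  const res = await fetch(`${SEED_URL}/api/v1/sensor/embedding/latest`);
  if (!res.ok) throw new Error(`Seed API returned ${res.status}`);
  const { embedding } = await res.json();   // 45-dim sensor embedding
  return embedding;
}

setInterval(async () => {
  try {
    const embedding = await latestSeedEmbedding();
    // attach to the current MultiModalFrame (Phase 0 timeline)
    void embedding;
  } catch {
    // Seed unreachable: the pipeline degrades to CSI-only labels
  }
}, 100);
```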

Graceful Degradation

The pipeline works with or without the Cognitum Seed:

| Mode | Signals | Pose Quality |
|---|---|---|
| Full (Seed + 2 ESP32) | 10 signals | 5-keypoint trained, 17-keypoint interpolated |
| CSI-only (2 ESP32) | 3 signals (RSSI, vitals, temporal) | Coarser position/activity only |
| Single node | 2 signals (vitals, temporal) | Presence + activity only |

When the Seed API is unreachable, the pipeline automatically falls back to CSI-only training, producing the same output format (SafeTensors, HuggingFace, quantized) with reduced label quality.
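
A sketch of that fallback decision at startup, mapping onto the modes in the table above (the choice of probe endpoint is an assumption):

```js
// Probe the Seed once, then pick the supervision mode for this run.
async function chooseMode(seedUrl, nodeCount) {
  const seedUp = await fetch(`${seedUrl}/api/v1/boundary`)
    .then(res => res.ok)
    .catch(() => false);
  if (seedUp && nodeCount >= 2) return 'full';   // all 10 signals
  if (nodeCount >= 2) return 'csi-only';         // RSSI + vitals + temporal
  return 'single-node';                          // vitals + temporal only
}
```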

Output Format

Same as the base pipeline (SafeTensors + HuggingFace compatible), plus:

| File | Description |
|---|---|
| pose-decoder.json | 5-keypoint pose decoder weights |
| model.rvf.jsonl | Extended with camera_free_supervision record |
| training-metrics.json | Includes weak label stats and multi-modal triplet counts |

Usage

```bash
# Full pipeline with Seed
node scripts/train-camera-free.js \
  --data data/recordings/pretrain-*.csi.jsonl \
  --seed-url https://169.254.42.1:8443 \
  --output models/csi-camerafree-v1

# CSI-only (no Seed)
node scripts/train-camera-free.js \
  --data data/recordings/pretrain-*.csi.jsonl \
  --no-seed \
  --output models/csi-camerafree-v1

# With benchmark
node scripts/train-camera-free.js \
  --data data/recordings/*.csi.jsonl \
  --benchmark
```

References

  • ruvllm source — v2.5.4
  • ADR-069 — Cognitum Seed CSI Pipeline
  • ADR-070 — Self-Supervised Pretraining Protocol
  • ADR-024 — Contrastive CSI Embedding / AETHER
  • ADR-016 — RuVector Training Pipeline Integration