docs/adr/ADR-024-contrastive-csi-embedding-model.md
| Field | Value |
|---|---|
| Status | Proposed |
| Date | 2026-03-01 |
| Deciders | ruv |
| Codename | AETHER -- Ambient Electromagnetic Topology for Hierarchical Embedding and Recognition |
| Relates to | ADR-004 (HNSW Fingerprinting), ADR-005 (SONA Self-Learning), ADR-006 (GNN-Enhanced CSI), ADR-014 (SOTA Signal Processing), ADR-015 (Public Datasets), ADR-016 (RuVector Integration), ADR-023 (Trained DensePose Pipeline) |
WiFi CSI signals encode a rich manifold of environmental and human information: room geometry via multipath reflections, human body configuration via Fresnel zone perturbations, and temporal dynamics via Doppler-like subcarrier phase shifts. The CsiToPoseTransformer (ADR-023) already learns to decode this manifold into 17-keypoint body poses through cross-attention and GNN message passing, producing intermediate body_part_features of shape [17 x d_model] that implicitly represent the latent CSI state.
These representations are currently task-coupled: they exist only as transient activations during pose regression and are discarded after the xyz_head and conf_head produce keypoint predictions. There is no mechanism to:
The gap between what the transformer internally knows and what the system externally exposes is the central problem AETHER addresses.
The name reflects the historical concept of the luminiferous aether -- the invisible medium through which electromagnetic waves were once theorized to propagate. In our context, WiFi signals propagate through physical space, and AETHER extracts a latent geometric understanding of that space from the signals themselves. The name captures three core ideas:
We evaluated and rejected a generative "RuvLLM" approach. The GOAP analysis:
| Factor | Generative (Autoregressive) | Contrastive (AETHER) |
|---|---|---|
| Domain fit | CSI is 56 continuous floats at 20 Hz -- not a discrete token vocabulary. Autoregressive generation is architecturally mismatched. | Contrastive learning on continuous sensor data is the established SOTA (SimCLR, BYOL, VICReg, CAPC). |
| Model size | Generative transformers need millions of parameters for meaningful sequence modeling. | Reuses existing 28K-param CsiToPoseTransformer + 25K projection head = 53K total. |
| Edge deployment | Cannot run on ESP32 (240 MHz, 520 KB SRAM). | INT8-quantized 53K params = ~53 KB. 10% of ESP32 SRAM. |
| Training data | Requires massive CSI corpus for autoregressive pretraining to converge. | Self-supervised augmentations work with any CSI stream -- even minutes of data. |
| Inference | Autoregressive decoding is sequential; violates 20 Hz real-time constraint. | Single forward pass: <2 ms at INT8. |
| Infrastructure | New model architecture, tokenizer, trainer, quantizer, RVF packaging. | One new module (embedding.rs), one new loss term, one new RVF segment type. |
| Collapse risk | Mode collapse in generation manifests as repetitive outputs. | Embedding collapse is detectable (variance monitoring) and preventable (VICReg regularization). |
| Component | File | Relevant API |
|---|---|---|
| CsiToPoseTransformer | graph_transformer.rs | embed() returns [17 x d_model] body_part_features (already exists) |
| Linear layers | graph_transformer.rs | Linear::new(), flatten_into(), unflatten_from() |
| GnnStack | graph_transformer.rs | 2-layer GCN on COCO skeleton with symmetric normalized adjacency |
| CrossAttention | graph_transformer.rs | 4-head scaled dot-product attention |
| SONA | sona.rs | LoraAdapter, EwcRegularizer, EnvironmentDetector, SonaProfile |
| Trainer | trainer.rs | 6-term composite loss, SGD+momentum, cosine LR, PCK/OKS metrics, checkpointing |
| Sparse Inference | sparse_inference.rs | INT8 symmetric/asymmetric quantization, FP16, neuron profiling, sparse forward |
| RVF Container | rvf_container.rs | Segment-based binary format: VEC, META, QUANT, WITNESS, PROFILE, MANIFEST |
| Dataset Pipeline | dataset.rs | MM-Fi (56 subcarriers, 17 COCO keypoints), Wi-Pose (resampled), unified DataPipeline |
| HNSW Index | ruvector-core | VectorIndex trait: add(), search(), remove(), cosine/L2/dot metrics |
| Micro-HNSW | micro-hnsw-wasm | no_std HNSW for WASM/edge: 16-dim, 32 vectors/core, LIF neurons, STDP |
Recent advances that directly inform AETHER's design:
Add a lightweight projection head that maps the GNN body-part features into a normalized embedding space while preserving the existing pose regression path:
CSI Frame(s) [n_pairs x n_subcarriers]
|
v
csi_embed (Linear 56 -> d_model=64) [EXISTING]
|
v
CrossAttention (Q=keypoint_queries, [EXISTING]
K,V=csi_embed)
|
v
GnnStack (2-layer GCN, COCO skeleton) [EXISTING]
|
+---> body_part_features [17 x 64] [EXISTING, now exposed via embed()]
| |
| v
| GlobalMeanPool --> frame_feature [64] [NEW: mean over 17 keypoints]
| |
| v
| ProjectionHead: [NEW]
| proj_1: Linear(64, 128) + BatchNorm1D(128) + ReLU
| proj_2: Linear(128, 128)
| L2-normalize
| |
| v
| z_csi [128-dim unit vector] [NEW: contrastive embedding]
|
+---> xyz_head (Linear 64->3) + conf_head [EXISTING: pose regression]
--> keypoints [17 x (x,y,z,conf)]
Key design choices:
2-layer MLP with BatchNorm: Following SimCLR v2 findings that a deeper projection head with batch normalization improves downstream task performance. The projection head discards information not useful for the contrastive objective, keeping the backbone representations richer.
128-dim output: Standard in contrastive learning literature (SimCLR, MoCo, CLIP). Large enough for high-recall HNSW search, small enough for edge deployment. L2-normalized to the unit hypersphere for cosine similarity.
BatchNorm1D in projection head: Prevents representation collapse by maintaining feature variance across the batch dimension. Acts as an implicit contrastive mechanism (VICReg insight) -- decorrelates embedding dimensions.
Shared backbone, independent heads: The backbone (csi_embed, cross-attention, GNN) is shared between pose regression and embedding extraction. This enables multi-task training where contrastive and supervised signals co-regularize the backbone.
Given a batch of N CSI windows, each augmented twice to produce 2N views, the InfoNCE loss for positive pair (i, j) is:
L_InfoNCE(i, j) = -log( exp(sim(z_i, z_j) / tau) / sum_{k != i} exp(sim(z_i, z_k) / tau) )
where:
sim(u, v) = u^T v / (||u|| * ||v||) is cosine similarity (= dot product for L2-normalized vectors)tau is the temperature hyperparameter controlling concentrationThe symmetric NT-Xent loss averages over both directions of each positive pair:
L_NT-Xent = (1 / 2N) * sum_{k=1}^{N} [ L_InfoNCE(2k-1, 2k) + L_InfoNCE(2k, 2k-1) ]
Temperature selection: tau = 0.07 (following SimCLR). Lower temperature sharpens the distribution, making the loss more sensitive to hard negatives. We use a learnable temperature initialized to 0.07 with a floor of 0.01.
Pure InfoNCE can collapse when batch sizes are small (common in CSI settings). We add VICReg regularization terms:
L_variance = (1/d) * sum_{j=1}^{d} max(0, gamma - sqrt(Var(z_j) + epsilon))
L_covariance = (1/d) * sum_{i != j} C(z)_{ij}^2
L_AETHER = alpha * L_NT-Xent + beta * L_variance + gamma_cov * L_covariance
where:
Var(z_j) is the variance of embedding dimension j across the batchC(z) is the covariance matrix of embeddings in the batchgamma = 1.0 is the target standard deviation per dimensionepsilon = 1e-4 prevents zero-variance gradientsalpha = 1.0, beta = 25.0, gamma_cov = 1.0 (per VICReg paper)The variance term prevents all embeddings from collapsing to a single point. The covariance term decorrelates dimensions, maximizing information content.
Each augmentation must preserve the identity of the CSI observation (same room, same person, same activity) while varying the irrelevant dimensions (noise, timing, hardware drift). All augmentations are physically motivated by WiFi signal propagation:
| Augmentation | Operation | Physical Motivation | Default Params |
|---|---|---|---|
| Temporal jitter | Shift window start by U(-J, +J) frames | Clock synchronization offset between AP and client | J = 3 frames |
| Subcarrier masking | Zero p_mask fraction of random subcarriers | Frequency-selective fading from narrowband interference | p_mask ~ U(0.05, 0.20) |
| Gaussian noise | Add N(0, sigma) to amplitude | Thermal noise at the receiver front-end | sigma ~ U(0.01, 0.05) |
| Phase rotation | Add U(0, 2*pi) uniform random offset per frame | Local oscillator phase drift and carrier frequency offset | per-frame |
| Amplitude scaling | Multiply by U(s_lo, s_hi) | Path loss variation from distance/obstruction changes | s_lo=0.8, s_hi=1.2 |
| Subcarrier permutation | Randomly swap adjacent subcarrier pairs with probability p_swap | Subcarrier reordering artifacts in different WiFi chipsets | p_swap = 0.1 |
| Temporal crop | Randomly drop p_drop fraction of frames from the window, then interpolate | Packet loss and variable CSI reporting rates | p_drop ~ U(0.0, 0.15) |
Each view applies 2-4 randomly selected augmentations composed sequentially. The composition is sampled per-view, ensuring the two views of the same CSI window differ.
When paired CSI + camera pose data is available (MM-Fi, Wi-Pose), align the CSI embedding space with pose semantics:
z_pose = L2_normalize(PoseEncoder(pose_keypoints_flat))
PoseEncoder: Linear(51, 128) -> ReLU -> Linear(128, 128) [51 = 17 keypoints * 3 coords]
L_cross = (1/N) * sum_{k=1}^{N} [ -log( exp(sim(z_csi_k, z_pose_k) / tau) / sum_{j} exp(sim(z_csi_k, z_pose_j) / tau) ) ]
L_total = L_supervised_pose + lambda_c * L_contrastive + lambda_x * L_cross
This ensures that CSI embeddings of the same pose are close in embedding space, enabling pose retrieval from CSI queries.
Raw CSI Window W (any stream, any environment)
|
+---> Aug_1(W) ---> CsiToPoseTransformer.embed() ---> MeanPool ---> ProjectionHead ---> z_1
| |
| L_AETHER(z_1, z_2)
| |
+---> Aug_2(W) ---> CsiToPoseTransformer.embed() ---> MeanPool ---> ProjectionHead ---> z_2
alignment = E[||z_i - z_j||^2] for positive pairs (should decrease) and uniformity = log(E[exp(-2 * ||z_i - z_j||^2)]) over all pairs (should decrease, indicating uniform distribution on hypersphere)After pretraining, attach xyz_head and conf_head and fine-tune with the existing 6-term composite loss (ADR-023 Phase 4), optionally keeping the contrastive loss as a regularizer:
L_total = L_pose_composite + lambda_c * L_contrastive
lambda_c = 0.1 (contrastive acts as regularizer, not primary objective)
The pretrained backbone starts with representations that already understand CSI spatial structure, typically requiring 3-10x fewer labeled samples for equivalent pose accuracy.
Adds L_cross to align CSI and pose embedding spaces. Only applicable when paired CSI + camera pose data is available (MM-Fi provides this).
The 128-dim L2-normalized z_csi embeddings feed four specialized HNSW indices, each serving a distinct recognition task:
| Index | Source Embedding | Update Frequency | Distance Metric | M | ef_construction | Max Elements | Use Case |
|---|---|---|---|---|---|---|---|
env_fingerprint | Mean of z_csi over 10-second window (200 frames @ 20 Hz) | On environment change detection (SONA drift) | Cosine | 16 | 200 | 10K | Room/zone identification |
activity_pattern | z_csi at activity transition boundaries (detected via embedding velocity) | Per detected activity segment | Cosine | 12 | 150 | 50K | Activity classification |
temporal_baseline | z_csi during calibration period (first 60 seconds) | At deployment / recalibration | Cosine | 16 | 200 | 1K | Anomaly/intrusion detection |
person_track | Per-person z_csi sequences (clustered by embedding trajectory) | Per confirmed detection | Cosine | 16 | 200 | 10K | Re-identification across sessions |
Index operations:
pub trait EmbeddingIndex {
/// Insert an embedding with metadata
fn insert(&mut self, embedding: &[f32; 128], metadata: EmbeddingMetadata) -> VectorId;
/// Search for k nearest neighbors
fn search(&self, query: &[f32; 128], k: usize) -> Vec<(VectorId, f32, EmbeddingMetadata)>;
/// Remove stale entries older than `max_age`
fn prune(&mut self, max_age: std::time::Duration) -> usize;
/// Index statistics
fn stats(&self) -> IndexStats;
}
pub struct EmbeddingMetadata {
pub timestamp: u64,
pub environment_id: Option<String>,
pub person_id: Option<u32>,
pub activity_label: Option<String>,
pub confidence: f32,
pub sona_profile: Option<String>,
}
Anomaly detection uses the temporal_baseline index: compute d = 1 - cosine_sim(z_current, nearest_baseline). If d > threshold_anomaly (default 0.3) for >= n_consecutive frames (default 5), flag as anomaly. This catches intrusions, falls, and environmental changes without any task-specific model.
Each SonaProfile already represents an environment-specific adaptation. AETHER adds a compact environment descriptor:
pub struct SonaProfile {
// ... existing fields ...
/// AETHER: Mean embedding of calibration CSI in this environment.
/// 128 floats = 512 bytes. Used for O(1) environment identification
/// before loading the full LoRA profile.
pub env_embedding: Option<[f32; 128]>,
}
Environment switching workflow:
z_csi for incoming CSIenv_embedding of all known SonaProfiles (128-dim dot product, <1 us each)env_embeddingThis replaces the current EnvironmentDetector statistical drift test with a semantically-aware embedding comparison.
Add a new segment type for embedding model configuration:
/// Embedding model configuration and projection head weights.
/// Segment type: SEG_EMBED = 0x0C
const SEG_EMBED: u8 = 0x0C;
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EmbeddingModelConfig {
/// Backbone feature dimension (input to projection head)
pub d_model: usize, // 64
/// Embedding output dimension
pub d_proj: usize, // 128
/// Whether to L2-normalize the output
pub normalize: bool, // true
/// Pretraining method used
pub pretrain_method: String, // "simclr" | "vicreg" | "capc"
/// Temperature for InfoNCE (if applicable)
pub temperature: f32, // 0.07
/// Augmentations used during pretraining
pub augmentations: Vec<String>,
/// Number of pretraining epochs completed
pub pretrain_epochs: usize,
/// Alignment metric at end of pretraining
pub alignment_score: f32,
/// Uniformity metric at end of pretraining
pub uniformity_score: f32,
}
The projection head weights (25K floats = 100 KB at FP32, 25 KB at INT8) are stored in the existing VEC segment alongside the transformer weights. The RVF manifest distinguishes model types:
{
"model_type": "aether-embedding",
"backbone": "csi-to-pose-transformer",
"embedding_dim": 128,
"pose_capable": true,
"pretrain_method": "simclr+vicreg"
}
Embedding extraction benefits from the same INT8 quantization and sparse neuron pruning. Critical validation: cosine distance ordering must be preserved under quantization.
Rank preservation metric:
rho = SpearmanRank(ranking_fp32, ranking_int8)
where ranking is the order of k-nearest neighbors for a test query. Requirement: rho > 0.95 for k = 10. If rho < 0.95, apply mixed-precision: backbone at INT8, projection head at FP16.
Quantization budget:
| Component | Parameters | FP32 | INT8 | FP16 |
|---|---|---|---|---|
| CsiToPoseTransformer backbone | ~28,000 | 112 KB | 28 KB | 56 KB |
| ProjectionHead (proj_1 + proj_2) | ~24,960 | 100 KB | 25 KB | 50 KB |
| PoseEncoder (cross-modal, optional) | ~7,040 | 28 KB | 7 KB | 14 KB |
| Total (without PoseEncoder) | ~53,000 | 212 KB | 53 KB | 106 KB |
| Total (with PoseEncoder) | ~60,000 | 240 KB | 60 KB | 120 KB |
ESP32 SRAM budget: 520 KB. Model at INT8: 53-60 KB = 10-12% of SRAM. Ample margin for activations, HNSW index, and runtime stack.
All new/modified files in rust-port/wifi-densepose-rs/crates/wifi-densepose-sensing-server/src/:
embedding.rs (NEW, ~450 lines)// ── Core types ──────────────────────────────────────────────────────
/// Configuration for the AETHER embedding system.
pub struct AetherConfig {
pub d_model: usize, // 64 (from TransformerConfig)
pub d_proj: usize, // 128
pub temperature: f32, // 0.07
pub vicreg_alpha: f32, // 1.0 (InfoNCE weight)
pub vicreg_beta: f32, // 25.0 (variance weight)
pub vicreg_gamma: f32, // 1.0 (covariance weight)
pub variance_target: f32, // 1.0
pub n_augmentations: usize, // 2-4 per view
}
/// 2-layer MLP projection head: Linear -> BN -> ReLU -> Linear -> L2-norm.
pub struct ProjectionHead {
proj_1: Linear, // d_model -> d_proj
bn_running_mean: Vec<f32>, // d_proj
bn_running_var: Vec<f32>, // d_proj
bn_gamma: Vec<f32>, // d_proj (learnable scale)
bn_beta: Vec<f32>, // d_proj (learnable shift)
proj_2: Linear, // d_proj -> d_proj
}
impl ProjectionHead {
pub fn new(d_model: usize, d_proj: usize) -> Self;
pub fn forward(&self, x: &[f32]) -> Vec<f32>; // returns L2-normalized
pub fn forward_train(&mut self, batch: &[Vec<f32>]) -> Vec<Vec<f32>>; // updates BN stats
pub fn flatten_into(&self, out: &mut Vec<f32>);
pub fn unflatten_from(data: &[f32], d_model: usize, d_proj: usize) -> (Self, usize);
pub fn param_count(&self) -> usize;
}
/// CSI-specific data augmentation pipeline.
pub struct CsiAugmenter {
rng: Rng64,
config: AugmentConfig,
}
pub struct AugmentConfig {
pub temporal_jitter_frames: usize, // 3
pub mask_ratio_range: (f32, f32), // (0.05, 0.20)
pub noise_sigma_range: (f32, f32), // (0.01, 0.05)
pub scale_range: (f32, f32), // (0.8, 1.2)
pub swap_prob: f32, // 0.1
pub drop_ratio_range: (f32, f32), // (0.0, 0.15)
}
impl CsiAugmenter {
pub fn new(seed: u64) -> Self;
pub fn augment(&mut self, csi_window: &[Vec<f32>]) -> Vec<Vec<f32>>;
}
/// InfoNCE loss with temperature scaling.
pub fn info_nce_loss(embeddings_a: &[Vec<f32>], embeddings_b: &[Vec<f32>], temperature: f32) -> f32;
/// VICReg variance loss: penalizes dimensions with std < target.
pub fn variance_loss(embeddings: &[Vec<f32>], target: f32) -> f32;
/// VICReg covariance loss: penalizes correlated dimensions.
pub fn covariance_loss(embeddings: &[Vec<f32>]) -> f32;
/// Combined AETHER loss = alpha * InfoNCE + beta * variance + gamma * covariance.
pub fn aether_loss(
z_a: &[Vec<f32>], z_b: &[Vec<f32>],
temperature: f32, alpha: f32, beta: f32, gamma: f32, var_target: f32,
) -> AetherLossComponents;
pub struct AetherLossComponents {
pub total: f32,
pub info_nce: f32,
pub variance: f32,
pub covariance: f32,
}
/// Full embedding extraction pipeline.
pub struct EmbeddingExtractor {
transformer: CsiToPoseTransformer,
projection: ProjectionHead,
config: AetherConfig,
}
impl EmbeddingExtractor {
pub fn new(transformer: CsiToPoseTransformer, config: AetherConfig) -> Self;
/// Extract 128-dim L2-normalized embedding from CSI features.
pub fn embed(&self, csi_features: &[Vec<f32>]) -> Vec<f32>;
/// Extract both pose keypoints AND embedding in a single forward pass.
pub fn forward_dual(&self, csi_features: &[Vec<f32>]) -> (PoseOutput, Vec<f32>);
/// Flatten all weights (transformer + projection head).
pub fn flatten_weights(&self) -> Vec<f32>;
/// Unflatten all weights.
pub fn unflatten_weights(&mut self, params: &[f32]) -> Result<(), String>;
/// Total trainable parameters.
pub fn param_count(&self) -> usize;
}
// ── Monitoring ──────────────────────────────────────────────────────
/// Alignment metric: mean L2 distance between positive pair embeddings.
pub fn alignment_metric(z_a: &[Vec<f32>], z_b: &[Vec<f32>]) -> f32;
/// Uniformity metric: log of average pairwise Gaussian kernel.
pub fn uniformity_metric(embeddings: &[Vec<f32>], t: f32) -> f32;
trainer.rs (MODIFICATIONS)// Add to LossComponents:
pub struct LossComponents {
// ... existing 6 terms ...
pub contrastive: f32, // NEW: AETHER contrastive loss
}
// Add to LossWeights:
pub struct LossWeights {
// ... existing 6 weights ...
pub contrastive: f32, // NEW: default 0.0 (disabled), set to 0.1 for joint training
}
// Add to TrainerConfig:
pub struct TrainerConfig {
// ... existing fields ...
pub contrastive_loss_weight: f32, // NEW: 0.0 = no contrastive, 0.1 = regularizer
pub aether_config: Option<AetherConfig>, // NEW: None = no AETHER
}
// New method on Trainer:
impl Trainer {
/// Self-supervised pretraining epoch using AETHER contrastive loss.
/// No pose labels required -- only raw CSI windows.
pub fn pretrain_epoch(
&mut self,
csi_windows: &[Vec<Vec<f32>>],
augmenter: &mut CsiAugmenter,
) -> PretrainEpochStats;
/// Full self-supervised pretraining loop.
pub fn run_pretraining(
&mut self,
csi_windows: &[Vec<Vec<f32>>],
n_epochs: usize,
) -> PretrainResult;
}
pub struct PretrainEpochStats {
pub epoch: usize,
pub loss: f32,
pub info_nce: f32,
pub variance: f32,
pub covariance: f32,
pub alignment: f32,
pub uniformity: f32,
pub lr: f32,
}
pub struct PretrainResult {
pub best_epoch: usize,
pub best_alignment: f32,
pub best_uniformity: f32,
pub history: Vec<PretrainEpochStats>,
pub total_time_secs: f64,
}
rvf_container.rs (MINOR ADDITION)/// Embedding model configuration segment type.
const SEG_EMBED: u8 = 0x0C;
impl RvfBuilder {
/// Add AETHER embedding model configuration.
pub fn add_embedding_config(&mut self, config: &EmbeddingModelConfig) {
let payload = serde_json::to_vec(config).unwrap_or_default();
self.push_segment(SEG_EMBED, &payload);
}
}
impl RvfReader {
/// Parse and return the embedding model config, if present.
pub fn embedding_config(&self) -> Option<EmbeddingModelConfig> {
self.find_segment(SEG_EMBED)
.and_then(|data| serde_json::from_slice(data).ok())
}
}
graph_transformer.rs (NO CHANGES NEEDED)The embed() method already exists and returns [17 x d_model]. No modifications required.
| Component | Params | Breakdown | FP32 | INT8 |
|---|---|---|---|---|
csi_embed | 3,648 | 56*64 + 64 | 14.6 KB | 3.6 KB |
keypoint_queries | 1,088 | 17*64 | 4.4 KB | 1.1 KB |
CrossAttention (4-head) | 16,640 | 4*(64*64+64) | 66.6 KB | 16.6 KB |
GnnStack (2 layers) | 8,320 | 2*(64*64+64) | 33.3 KB | 8.3 KB |
xyz_head | 195 | 64*3 + 3 | 0.8 KB | 0.2 KB |
conf_head | 65 | 64*1 + 1 | 0.3 KB | 0.1 KB |
| Backbone subtotal | 29,956 | 119.8 KB | 29.9 KB | |
proj_1 (Linear) | 8,320 | 64*128 + 128 | 33.3 KB | 8.3 KB |
bn_1 (gamma + beta) | 256 | 128 + 128 | 1.0 KB | 0.3 KB |
proj_2 (Linear) | 16,512 | 128*128 + 128 | 66.0 KB | 16.5 KB |
| ProjectionHead subtotal | 25,088 | 100.4 KB | 25.1 KB | |
| AETHER Total | 55,044 | 220.2 KB | 55.0 KB | |
PoseEncoder (optional) | 7,040 | 51128+128 + 128128+128 | 28.2 KB | 7.0 KB |
| Full system | 62,084 | 248.3 KB | 62.1 KB |
| Metric | Target | Measurement |
|---|---|---|
| Embedding extraction latency (FP32, x86) | < 1 ms | BenchmarkRunner::benchmark_inference() |
| Embedding extraction latency (INT8, ESP32) | < 2 ms | Hardware benchmark at 240 MHz |
| HNSW search latency (10K vectors, k=5) | < 0.5 ms | ruvector-core benchmark suite |
| Self-supervised pretrain convergence | < 200 epochs | Alignment/uniformity plateau detection |
| Room identification accuracy (5 rooms) | > 95% | k-NN on env_fingerprint index |
| Activity classification accuracy (6 activities) | > 85% | k-NN on activity_pattern index |
| Person re-identification mAP (5 subjects) | > 80% | Rank-1 on person_track index |
| Anomaly detection F1 | > 0.90 | Distance threshold on temporal_baseline |
| INT8 rank correlation vs FP32 | > 0.95 | Spearman over 1000 query-neighbor pairs |
| Model size at INT8 | < 65 KB | param_count * 1 byte |
| Training memory overhead | < 50 MB | Peak RSS during pretraining |
micro-hnsw-wasm stores up to 32 reference embeddings per core (256 cores = 8K embeddings)ruvector-core HNSW index in full mode (up to 1M vectors)POST /api/v1/embedding/extract (input: CSI frame, output: 128-dim vector)POST /api/v1/embedding/search (input: 128-dim vector, output: k nearest neighbors)ws://.../embedding/stream (streaming CSI -> streaming embeddings)Files:
embedding.rs (NEW): ProjectionHead, CsiAugmenter, EmbeddingExtractor, loss functions, metricsrvf_container.rs (MODIFY): Add SEG_EMBED, add_embedding_config(), embedding_config()lib.rs (MODIFY): Add pub mod embedding;Deliverables:
ProjectionHead with forward(), forward_train(), flatten_into(), unflatten_from()CsiAugmenter with all 7 augmentation strategiesinfo_nce_loss(), variance_loss(), covariance_loss(), aether_loss()EmbeddingExtractor with embed() and forward_dual()alignment_metric() and uniformity_metric()Files:
trainer.rs (MODIFY): Add pretrain_epoch(), run_pretraining(), contrastive loss to compositeembedding.rs (EXTEND): Add PretrainEpochStats, PretrainResultDeliverables:
Trainer::pretrain_epoch() running SimCLR+VICReg on raw CSI windowsTrainer::run_pretraining() full loop with monitoringLossComponents and LossWeightstrainer.rsFiles:
embedding.rs (EXTEND): Add EmbeddingIndex trait, EmbeddingMetadata, index managementmain.rs or new api_embedding.rs (MODIFY/NEW): REST endpoints for embedding searchDeliverables:
/api/v1/embedding/extract, /api/v1/embedding/searchEnvironmentDetectorFiles:
embedding.rs (EXTEND): Add PoseEncoder, cross_modal_loss()Deliverables:
PoseEncoder: Linear(51 -> 128) -> ReLU -> Linear(128 -> 128) -> L2-normFiles:
sparse_inference.rs (EXTEND): Add SpearmanRankCorrelation, embedding-specific quantization testsrvf_pipeline.rs (MODIFY): Package AETHER model into RVF with SEG_EMBEDDeliverables:
Deliverables:
Total estimated effort: 8-12 days
SonaProfile for environment identification. 900x compression with better discriminative power.| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Augmentations produce collapsed embeddings (all vectors identical) | Medium | High | VICReg variance term (beta=25) with per-dimension variance monitoring. Alert if Var(z_j) < 0.1 for any j. Switch to BYOL (stop-gradient) if collapse persists. |
| INT8 quantization degrades HNSW recall below 90% | Low | Medium | Spearman rho > 0.95 gate. Mixed-precision fallback: INT8 backbone + FP16 projection head (+25 KB). |
| Contrastive pretraining does not improve downstream pose accuracy | Low | Low | Pretraining is optional. Supervised-only training (ADR-023) remains the fallback path. Even if pose accuracy is unchanged, embeddings still enable fingerprinting/search. |
| Cross-modal alignment requires too much paired data for convergence | Medium | Low | Phase C is optional. Self-supervised CSI-only pretraining (Phase A) is the primary path. Cross-modal alignment is an enhancement, not a requirement. |
| Projection head overfits to pretraining augmentations | Low | Medium | Freeze projection head during supervised fine-tuning (only fine-tune backbone + pose heads). Alternatively, use stop-gradient on the projection head during joint training. |
| Embedding space is not discriminative enough for person re-identification | Medium | Medium | WhoFi (2025) demonstrates 95.5% accuracy with transformer CSI encoding. Our architecture is comparable. If insufficient, add a supervised contrastive loss with person labels during fine-tuning. |
embedding.rs)#[cfg(test)]
mod tests {
// ProjectionHead
fn projection_head_output_is_128_dim();
fn projection_head_output_is_l2_normalized();
fn projection_head_zero_input_does_not_nan();
fn projection_head_flatten_unflatten_roundtrip();
fn projection_head_param_count_correct();
// CsiAugmenter
fn augmenter_output_same_shape_as_input();
fn augmenter_two_views_differ();
fn augmenter_deterministic_with_same_seed();
fn temporal_jitter_shifts_window();
fn subcarrier_masking_zeros_expected_fraction();
fn gaussian_noise_changes_values();
fn amplitude_scaling_within_range();
// Loss functions
fn info_nce_zero_for_identical_embeddings();
fn info_nce_positive_for_different_embeddings();
fn info_nce_decreases_with_closer_positives();
fn variance_loss_zero_when_variance_at_target();
fn variance_loss_positive_when_variance_below_target();
fn covariance_loss_zero_for_uncorrelated_dims();
fn aether_loss_finite_for_random_embeddings();
// Metrics
fn alignment_zero_for_identical_pairs();
fn uniformity_decreases_with_uniform_distribution();
// EmbeddingExtractor
fn extractor_embed_output_shape();
fn extractor_dual_forward_produces_both_outputs();
fn extractor_flatten_unflatten_preserves_output();
}
#[cfg(test)]
mod integration_tests {
// Pretraining
fn pretrain_5_epochs_alignment_improves();
fn pretrain_loss_is_finite_throughout();
fn pretrain_embeddings_not_collapsed(); // variance > 0.5 per dim
// Joint training
fn joint_train_contrastive_plus_pose_loss_finite();
fn joint_train_pose_accuracy_not_degraded();
// RVF
fn rvf_embed_config_round_trip();
fn rvf_full_aether_model_package();
// Quantization
fn int8_embedding_rank_correlation_above_095();
fn fp16_embedding_rank_correlation_above_099();
}
Status: Required (promoted from Future Work after capability audit)
The RuVector v2.0.4 vendor crates provide 50+ attention mechanisms, contrastive losses, and optimization tools that Phases 1-6 do not use (0% utilization). Phase 7 integrates the highest-impact capabilities directly into the embedding pipeline.
Integrate sona.rs::LoraAdapter into ProjectionHead for environment-adaptive embedding projection with minimal parameters:
pub struct ProjectionHead {
proj_1: Linear, // base weights (frozen after pretraining)
proj_1_lora: Option<LoraAdapter>, // rank-4 environment delta (NEW)
// ... bn fields ...
proj_2: Linear, // base weights (frozen)
proj_2_lora: Option<LoraAdapter>, // rank-4 environment delta (NEW)
}
Parameter budget per environment:
proj_1_lora: rank 4 * (64 + 128) = 768 paramsproj_2_lora: rank 4 * (128 + 128) = 1,024 paramsMethods to add:
ProjectionHead::with_lora(rank: usize) — constructor with LoRA adaptersProjectionHead::forward() modified: out = base_out + lora.forward(input) when adapters presentProjectionHead::merge_lora() / unmerge_lora() — for fast environment switchingProjectionHead::freeze_base() — freeze base weights, train only LoRAProjectionHead::lora_params() -> Vec<f32> — flatten only LoRA weights for checkpointEnvironment switching workflow:
z_csi for incoming CSIenv_embedding of all known profiles (128-dim dot product, <1us)unmerge_lora(old) then merge_lora(new)Effort: ~120 lines in embedding.rs
Apply sona.rs::EwcRegularizer to prevent catastrophic forgetting of contrastive structure during supervised fine-tuning:
Phase A (pretrain): Train backbone + projection with InfoNCE + VICReg
↓
Consolidation: fisher = EwcRegularizer::compute_fisher(pretrained_params, contrastive_loss)
ewc.consolidate(pretrained_params)
↓
Phase B (finetune): L_total = L_pose + lambda * ewc.penalty(current_params)
grad += ewc.penalty_gradient(current_params)
Implementation:
embedding_ewc: Option<EwcRegularizer> field to Trainerrun_pretraining() completes, call ewc.compute_fisher() on contrastive loss surfacetrain_epoch(), add ewc.penalty(current_params) to total lossewc.penalty_gradient(current_params) to gradient computationEffort: ~80 lines in trainer.rs
Wire sona.rs::EnvironmentDetector into EmbeddingExtractor for real-time drift awareness:
pub struct EmbeddingExtractor {
transformer: CsiToPoseTransformer,
projection: ProjectionHead,
config: AetherConfig,
drift_detector: EnvironmentDetector, // NEW
}
Behavior:
extract() calls drift_detector.update(csi_mean, csi_var) on each framedrift_detected() returns true:
anomalous: true in FingerprintIndexDriftInfo exposed via REST: GET /api/v1/embedding/driftEffort: ~60 lines across embedding.rs
Add hard-negative mining to the contrastive loss for more efficient training:
pub struct HardNegativeMiner {
pub ratio: f32, // 0.5 = use top 50% hardest negatives
pub warmup_epochs: usize, // 5 = use all negatives for first 5 epochs
}
impl HardNegativeMiner {
/// Select top-K hardest negatives from similarity matrix.
/// Hard negatives are non-matching pairs with highest cosine similarity
/// (i.e., the model is most confused about them).
pub fn mine(&self, sim_matrix: &[Vec<f32>], epoch: usize) -> Vec<(usize, usize)>;
}
Modify info_nce_loss() to accept optional miner:
warmup_epochs: use all negatives (standard InfoNCE)ratio hardest negatives per anchorEffort: ~80 lines in embedding.rs
Extend RVF container to store embedding model config AND per-environment LoRA deltas:
pub const SEG_EMBED: u8 = 0x0C;
pub const SEG_LORA: u8 = 0x0D; // NEW: LoRA weight deltas
pub struct EmbeddingModelConfig {
pub d_model: usize,
pub d_proj: usize,
pub normalize: bool,
pub pretrain_method: String,
pub temperature: f32,
pub augmentations: Vec<String>,
pub lora_rank: Option<usize>, // Some(4) if MicroLoRA enabled
pub ewc_lambda: Option<f32>, // Some(5000.0) if EWC active
pub hard_negative_ratio: Option<f32>,
}
impl RvfBuilder {
pub fn add_embedding_config(&mut self, config: &EmbeddingModelConfig);
pub fn add_lora_profile(&mut self, name: &str, lora_weights: &[f32]);
}
impl RvfReader {
pub fn embedding_config(&self) -> Option<EmbeddingModelConfig>;
pub fn lora_profile(&self, name: &str) -> Option<Vec<f32>>;
pub fn lora_profiles(&self) -> Vec<String>; // list all stored profiles
}
Effort: ~100 lines in rvf_container.rs
| Sub-phase | What | New Params | Lines |
|---|---|---|---|
| 7.1 MicroLoRA on ProjectionHead | Environment-specific embeddings | 1,792/env | ~120 |
| 7.2 EWC++ consolidation | Pretrain→finetune memory preservation | 0 (regularizer) | ~80 |
| 7.3 EnvironmentDetector integration | Drift-aware embedding extraction | 0 | ~60 |
| 7.4 Hard-negative mining | More efficient contrastive training | 0 | ~80 |
| 7.5 RVF SEG_EMBED + SEG_LORA | Full model + LoRA profile packaging | 0 | ~100 |
| Total | 1,792/env | ~440 |
ruvector-hyperbolic-hnsw crate to embed activities in Poincare ball space, capturing the natural hierarchy (locomotion > walking > shuffling).MoEAttention for routing CSI frames to specialized embedding experts, HyperbolicAttention for hierarchical CSI structure, and SheafAttention for early-exit during embedding extraction.