Back to Ruview

ADR-004: HNSW Vector Search for Signal Fingerprinting

docs/adr/ADR-004-hnsw-vector-search-fingerprinting.md

0.7.011.8 KB
Original Source

ADR-004: HNSW Vector Search for Signal Fingerprinting

Status

Partially realized by ADR-024; extended by ADR-027

Note: ADR-024 (AETHER) implements HNSW-compatible fingerprint indices with 4 index types. ADR-027 (MERIDIAN) extends this with domain-disentangled embeddings so fingerprints match across environments, not just within a single room.

Date

2026-02-28

Context

Current Signal Matching Limitations

The WiFi-DensePose system needs to match incoming CSI patterns against known signatures for:

  1. Environment recognition: Identifying which room/area the device is in based on CSI characteristics
  2. Activity classification: Matching current CSI patterns to known human activities (walking, sitting, falling)
  3. Anomaly detection: Determining whether current readings deviate significantly from baseline
  4. Survivor re-identification (MAT module): Tracking individual survivors across scan sessions

Current approach in CSIProcessor._calculate_detection_confidence():

python
# Fixed thresholds, no similarity search
amplitude_indicator = np.mean(features.amplitude_mean) > 0.1
phase_indicator = np.std(features.phase_difference) > 0.05
motion_indicator = motion_score > 0.3
confidence = (0.4 * amplitude_indicator + 0.3 * phase_indicator + 0.3 * motion_indicator)

This is a O(1) fixed-threshold check that:

  • Cannot learn from past observations
  • Has no concept of "similar patterns seen before"
  • Requires manual threshold tuning per environment
  • Produces binary indicators (above/below threshold) losing gradient information

What HNSW Provides

Hierarchical Navigable Small World (HNSW) graphs enable approximate nearest-neighbor search in high-dimensional vector spaces with:

  • O(log n) query time vs O(n) brute-force
  • High recall: >95% recall at 10x speed of exact search
  • Dynamic insertion: New vectors added without full rebuild
  • SIMD acceleration: RuVector's implementation uses AVX2/NEON for distance calculations

RuVector extends standard HNSW with:

  • Hyperbolic HNSW: Search in Poincaré ball space for hierarchy-aware results (e.g., "walking" is closer to "running" than to "sitting" in activity hierarchy)
  • GNN enhancement: Graph neural networks refine neighbor connections after queries
  • Tiered compression: 2-32x memory reduction through adaptive quantization

Decision

We will integrate RuVector's HNSW implementation as the primary similarity search engine for all CSI pattern matching operations, replacing fixed-threshold detection with similarity-based retrieval.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    HNSW Search Pipeline                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  CSI Input     Feature           Vector         HNSW            │
│  ────────▶    Extraction   ────▶ Encode   ────▶ Search          │
│              (existing)        (new)          (new)             │
│                                                  │              │
│                                    ┌─────────────┤              │
│                                    ▼             ▼              │
│                              Top-K Results    Confidence        │
│                              [vec_id, dist,   Score from        │
│                               metadata]       Distance Dist.   │
│                                    │                            │
│                                    ▼                            │
│                              ┌────────────┐                     │
│                              │ Decision   │                     │
│                              │ Fusion     │                     │
│                              └────────────┘                     │
│                               Combines HNSW similarity with     │
│                               existing threshold-based logic    │
└─────────────────────────────────────────────────────────────────┘

Index Configuration

rust
/// HNSW configuration tuned for CSI vector characteristics
pub struct CsiHnswConfig {
    /// Vector dimensionality (matches CsiFeatures encoding)
    dim: usize,  // 329 for 64 subcarriers

    /// Maximum number of connections per node per layer
    /// Higher M = better recall, more memory
    /// CSI vectors are moderately dimensional; M=16 balances well
    m: usize,  // 16

    /// Size of dynamic candidate list during construction
    /// ef_construction = 200 gives >99% recall for 329-dim vectors
    ef_construction: usize,  // 200

    /// Size of dynamic candidate list during search
    /// ef_search = 64 gives >95% recall with <1ms latency at 100K vectors
    ef_search: usize,  // 64

    /// Distance metric
    /// Cosine similarity works best for normalized CSI features
    metric: DistanceMetric,  // Cosine

    /// Maximum elements (pre-allocated for performance)
    max_elements: usize,  // 1_000_000

    /// Enable SIMD acceleration
    simd: bool,  // true

    /// Quantization level for memory reduction
    quantization: Quantization,  // PQ8 (product quantization, 8-bit)
}

Multiple Index Strategy

Different use cases require different index configurations:

Index NameVectorsDimDistanceUse Case
env_fingerprint10K-1M329CosineEnvironment/room identification
activity_pattern1K-50K329EuclideanActivity classification
temporal_pattern10K-500K329CosineTemporal anomaly detection
survivor_track100-10K329CosineMAT survivor re-identification

Similarity-Based Detection Enhancement

Replace fixed thresholds with distance-based confidence:

rust
/// Enhanced detection using HNSW similarity search
pub struct SimilarityDetector {
    /// HNSW index of known human-present CSI patterns
    human_patterns: HnswIndex,

    /// HNSW index of known empty-room CSI patterns
    empty_patterns: HnswIndex,

    /// Fusion weight between similarity and threshold methods
    fusion_alpha: f64,  // 0.7 = 70% similarity, 30% threshold
}

impl SimilarityDetector {
    /// Detect human presence using similarity search + threshold fusion
    pub fn detect(&self, features: &CsiFeatures) -> DetectionResult {
        let query_vec = features.to_rvf_vector();

        // Search both indices
        let human_neighbors = self.human_patterns.search(&query_vec, k=5);
        let empty_neighbors = self.empty_patterns.search(&query_vec, k=5);

        // Distance-based confidence
        let avg_human_dist = human_neighbors.mean_distance();
        let avg_empty_dist = empty_neighbors.mean_distance();

        // Similarity confidence: how much closer to human patterns vs empty
        let similarity_confidence = avg_empty_dist / (avg_human_dist + avg_empty_dist);

        // Fuse with traditional threshold-based confidence
        let threshold_confidence = self.traditional_threshold_detect(features);
        let fused_confidence = self.fusion_alpha * similarity_confidence
                             + (1.0 - self.fusion_alpha) * threshold_confidence;

        DetectionResult {
            human_detected: fused_confidence > 0.5,
            confidence: fused_confidence,
            similarity_confidence,
            threshold_confidence,
            nearest_human_pattern: human_neighbors[0].metadata.clone(),
            nearest_empty_pattern: empty_neighbors[0].metadata.clone(),
        }
    }
}

Incremental Learning Loop

Every confirmed detection enriches the index:

1. CSI captured → features extracted → vector encoded
2. HNSW search returns top-K neighbors + distances
3. Detection decision made (similarity + threshold fusion)
4. If confirmed (by temporal consistency or ground truth):
   a. Insert vector into appropriate index (human/empty)
   b. GNN layer updates neighbor relationships (ADR-006)
   c. SONA adapts fusion weights (ADR-005)
5. Periodically: prune stale vectors, rebuild index layers

Performance Analysis

Memory requirements (PQ8 quantization):

Vector CountRaw SizePQ8 CompressedHNSW OverheadTotal
10,00012.9 MB1.6 MB2.5 MB4.1 MB
100,000129 MB16 MB25 MB41 MB
1,000,0001.29 GB160 MB250 MB410 MB

Latency expectations (329-dim vectors, ef_search=64):

Vector CountBrute ForceHNSWSpeedup
10,0003.2 ms0.08 ms40x
100,00032 ms0.3 ms107x
1,000,000320 ms0.9 ms356x

Hyperbolic Extension for Activity Hierarchy

WiFi-sensed activities have natural hierarchy:

                    motion
                   /      \
              locomotion   stationary
              /    \         /    \
         walking  running  sitting  lying
         /    \
      normal  shuffling

Hyperbolic HNSW in Poincaré ball space preserves this hierarchy during search, so a query for "shuffling" returns "walking" before "sitting" even if Euclidean distances are similar.

rust
/// Hyperbolic HNSW for hierarchy-aware activity matching
pub struct HyperbolicActivityIndex {
    index: HnswIndex,
    curvature: f64,  // -1.0 for unit Poincaré ball
}

impl HyperbolicActivityIndex {
    pub fn search(&self, query: &[f32], k: usize) -> Vec<SearchResult> {
        // Uses Poincaré distance: d(u,v) = arcosh(1 + 2||u-v||²/((1-||u||²)(1-||v||²)))
        self.index.search_hyperbolic(query, k, self.curvature)
    }
}

Consequences

Positive

  • Adaptive detection: System improves with more data; no manual threshold tuning
  • Sub-millisecond search: HNSW provides <1ms queries even at 1M vectors
  • Memory efficient: PQ8 reduces storage 8x with <5% recall loss
  • Hierarchy-aware: Hyperbolic mode respects activity relationships
  • Incremental: New patterns added without full index rebuild
  • Explainable: "This detection matched pattern X from room Y at time Z"

Negative

  • Cold-start problem: Need initial fingerprint data before similarity search is useful
  • Index maintenance: Periodic pruning and layer rebalancing needed
  • Approximation: HNSW is approximate; may miss exact nearest neighbor (mitigated by high ef_search)
  • Memory for indices: HNSW graph structure adds 2.5x overhead on top of vectors

Migration Strategy

  1. Phase 1: Run HNSW search in parallel with existing threshold detection, log both results
  2. Phase 2: A/B test fusion weights (alpha parameter) on labeled data
  3. Phase 3: Gradually increase fusion_alpha from 0.0 (pure threshold) to 0.7 (primarily similarity)
  4. Phase 4: Threshold detection becomes fallback for cold-start/empty-index scenarios

References