ADR-004: HNSW Vector Search for Signal Fingerprinting

Status

Partially realized by ADR-024; extended by ADR-027

Note: ADR-024 (AETHER) implements HNSW-compatible fingerprint indices with 4 index types. ADR-027 (MERIDIAN) extends this with domain-disentangled embeddings so fingerprints match across environments, not just within a single room.

Date

2026-02-28

Context

Current Signal Matching Limitations

The WiFi-DensePose system needs to match incoming CSI patterns against known signatures for:

Environment recognition: Identifying which room/area the device is in based on CSI characteristics
Activity classification: Matching current CSI patterns to known human activities (walking, sitting, falling)
Anomaly detection: Determining whether current readings deviate significantly from baseline
Survivor re-identification (MAT module): Tracking individual survivors across scan sessions

Current approach in CSIProcessor._calculate_detection_confidence():

python

# Fixed thresholds, no similarity search
amplitude_indicator = np.mean(features.amplitude_mean) > 0.1
phase_indicator = np.std(features.phase_difference) > 0.05
motion_indicator = motion_score > 0.3
confidence = (0.4 * amplitude_indicator + 0.3 * phase_indicator + 0.3 * motion_indicator)

This is a O(1) fixed-threshold check that:

Cannot learn from past observations
Has no concept of "similar patterns seen before"
Requires manual threshold tuning per environment
Produces binary indicators (above/below threshold) losing gradient information

What HNSW Provides

Hierarchical Navigable Small World (HNSW) graphs enable approximate nearest-neighbor search in high-dimensional vector spaces with:

O(log n) query time vs O(n) brute-force
High recall: >95% recall at 10x speed of exact search
Dynamic insertion: New vectors added without full rebuild
SIMD acceleration: RuVector's implementation uses AVX2/NEON for distance calculations

RuVector extends standard HNSW with:

Hyperbolic HNSW: Search in Poincaré ball space for hierarchy-aware results (e.g., "walking" is closer to "running" than to "sitting" in activity hierarchy)
GNN enhancement: Graph neural networks refine neighbor connections after queries
Tiered compression: 2-32x memory reduction through adaptive quantization

Decision

We will integrate RuVector's HNSW implementation as the primary similarity search engine for all CSI pattern matching operations, replacing fixed-threshold detection with similarity-based retrieval.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    HNSW Search Pipeline                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  CSI Input     Feature           Vector         HNSW            │
│  ────────▶    Extraction   ────▶ Encode   ────▶ Search          │
│              (existing)        (new)          (new)             │
│                                                  │              │
│                                    ┌─────────────┤              │
│                                    ▼             ▼              │
│                              Top-K Results    Confidence        │
│                              [vec_id, dist,   Score from        │
│                               metadata]       Distance Dist.   │
│                                    │                            │
│                                    ▼                            │
│                              ┌────────────┐                     │
│                              │ Decision   │                     │
│                              │ Fusion     │                     │
│                              └────────────┘                     │
│                               Combines HNSW similarity with     │
│                               existing threshold-based logic    │
└─────────────────────────────────────────────────────────────────┘

Index Configuration

rust

/// HNSW configuration tuned for CSI vector characteristics
pub struct CsiHnswConfig {
    /// Vector dimensionality (matches CsiFeatures encoding)
    dim: usize,  // 329 for 64 subcarriers

    /// Maximum number of connections per node per layer
    /// Higher M = better recall, more memory
    /// CSI vectors are moderately dimensional; M=16 balances well
    m: usize,  // 16

    /// Size of dynamic candidate list during construction
    /// ef_construction = 200 gives >99% recall for 329-dim vectors
    ef_construction: usize,  // 200

    /// Size of dynamic candidate list during search
    /// ef_search = 64 gives >95% recall with <1ms latency at 100K vectors
    ef_search: usize,  // 64

    /// Distance metric
    /// Cosine similarity works best for normalized CSI features
    metric: DistanceMetric,  // Cosine

    /// Maximum elements (pre-allocated for performance)
    max_elements: usize,  // 1_000_000

    /// Enable SIMD acceleration
    simd: bool,  // true

    /// Quantization level for memory reduction
    quantization: Quantization,  // PQ8 (product quantization, 8-bit)
}

Multiple Index Strategy

Different use cases require different index configurations:

Index Name	Vectors	Dim	Distance	Use Case
`env_fingerprint`	10K-1M	329	Cosine	Environment/room identification
`activity_pattern`	1K-50K	329	Euclidean	Activity classification
`temporal_pattern`	10K-500K	329	Cosine	Temporal anomaly detection
`survivor_track`	100-10K	329	Cosine	MAT survivor re-identification

Similarity-Based Detection Enhancement

Replace fixed thresholds with distance-based confidence:

rust

/// Enhanced detection using HNSW similarity search
pub struct SimilarityDetector {
    /// HNSW index of known human-present CSI patterns
    human_patterns: HnswIndex,

    /// HNSW index of known empty-room CSI patterns
    empty_patterns: HnswIndex,

    /// Fusion weight between similarity and threshold methods
    fusion_alpha: f64,  // 0.7 = 70% similarity, 30% threshold
}

impl SimilarityDetector {
    /// Detect human presence using similarity search + threshold fusion
    pub fn detect(&self, features: &CsiFeatures) -> DetectionResult {
        let query_vec = features.to_rvf_vector();

        // Search both indices
        let human_neighbors = self.human_patterns.search(&query_vec, k=5);
        let empty_neighbors = self.empty_patterns.search(&query_vec, k=5);

        // Distance-based confidence
        let avg_human_dist = human_neighbors.mean_distance();
        let avg_empty_dist = empty_neighbors.mean_distance();

        // Similarity confidence: how much closer to human patterns vs empty
        let similarity_confidence = avg_empty_dist / (avg_human_dist + avg_empty_dist);

        // Fuse with traditional threshold-based confidence
        let threshold_confidence = self.traditional_threshold_detect(features);
        let fused_confidence = self.fusion_alpha * similarity_confidence
                             + (1.0 - self.fusion_alpha) * threshold_confidence;

        DetectionResult {
            human_detected: fused_confidence > 0.5,
            confidence: fused_confidence,
            similarity_confidence,
            threshold_confidence,
            nearest_human_pattern: human_neighbors[0].metadata.clone(),
            nearest_empty_pattern: empty_neighbors[0].metadata.clone(),
        }
    }
}

Incremental Learning Loop

Every confirmed detection enriches the index:

1. CSI captured → features extracted → vector encoded
2. HNSW search returns top-K neighbors + distances
3. Detection decision made (similarity + threshold fusion)
4. If confirmed (by temporal consistency or ground truth):
   a. Insert vector into appropriate index (human/empty)
   b. GNN layer updates neighbor relationships (ADR-006)
   c. SONA adapts fusion weights (ADR-005)
5. Periodically: prune stale vectors, rebuild index layers

Performance Analysis

Memory requirements (PQ8 quantization):

Vector Count	Raw Size	PQ8 Compressed	HNSW Overhead	Total
10,000	12.9 MB	1.6 MB	2.5 MB	4.1 MB
100,000	129 MB	16 MB	25 MB	41 MB
1,000,000	1.29 GB	160 MB	250 MB	410 MB

Latency expectations (329-dim vectors, ef_search=64):

Vector Count	Brute Force	HNSW	Speedup
10,000	3.2 ms	0.08 ms	40x
100,000	32 ms	0.3 ms	107x
1,000,000	320 ms	0.9 ms	356x

Hyperbolic Extension for Activity Hierarchy

WiFi-sensed activities have natural hierarchy:

                    motion
                   /      \
              locomotion   stationary
              /    \         /    \
         walking  running  sitting  lying
         /    \
      normal  shuffling

Hyperbolic HNSW in Poincaré ball space preserves this hierarchy during search, so a query for "shuffling" returns "walking" before "sitting" even if Euclidean distances are similar.

rust

/// Hyperbolic HNSW for hierarchy-aware activity matching
pub struct HyperbolicActivityIndex {
    index: HnswIndex,
    curvature: f64,  // -1.0 for unit Poincaré ball
}

impl HyperbolicActivityIndex {
    pub fn search(&self, query: &[f32], k: usize) -> Vec<SearchResult> {
        // Uses Poincaré distance: d(u,v) = arcosh(1 + 2||u-v||²/((1-||u||²)(1-||v||²)))
        self.index.search_hyperbolic(query, k, self.curvature)
    }
}

Consequences

Positive

Adaptive detection: System improves with more data; no manual threshold tuning
Sub-millisecond search: HNSW provides <1ms queries even at 1M vectors
Memory efficient: PQ8 reduces storage 8x with <5% recall loss
Hierarchy-aware: Hyperbolic mode respects activity relationships
Incremental: New patterns added without full index rebuild
Explainable: "This detection matched pattern X from room Y at time Z"

Negative

Cold-start problem: Need initial fingerprint data before similarity search is useful
Index maintenance: Periodic pruning and layer rebalancing needed
Approximation: HNSW is approximate; may miss exact nearest neighbor (mitigated by high ef_search)
Memory for indices: HNSW graph structure adds 2.5x overhead on top of vectors

Migration Strategy

Phase 1: Run HNSW search in parallel with existing threshold detection, log both results
Phase 2: A/B test fusion weights (alpha parameter) on labeled data
Phase 3: Gradually increase fusion_alpha from 0.0 (pure threshold) to 0.7 (primarily similarity)
Phase 4: Threshold detection becomes fallback for cold-start/empty-index scenarios

References

HNSW: Efficient and Robust Approximate Nearest Neighbor
Product Quantization for Nearest Neighbor Search
Poincaré Embeddings for Learning Hierarchical Representations
RuVector HNSW Implementation
ADR-003: RVF Cognitive Containers for CSI Data