docs/research/sota-2026-05-22/R3_2-embedding-level-physics-env.md
Status: corrected architecture matches labelled oracle (with zero labels), but synthetic AETHER stand-in is too weak to reach 80%+ · 2026-05-22
R3.1 NEGATIVE showed that physics-informed env subtraction at raw-CSI level fails because within-room position variance dominates. R3.1's corrected sketch:
raw CSI → AETHER embedding (position-invariant) → physics-informed env subtraction → K-NN
This tick implements the corrected architecture. The question: does moving the operation from raw CSI to the embedding level actually close the cross-room gap?
Same 2-room setup as R3.1 (5×5 m and 4×6 m rooms, 10 subjects with body-size variation 0.85-1.15×, 3 positions per room). AETHER is simulated by the per-subject-per-room mean across positions, which gives a position-invariant signature. (Real AETHER does this via contrastive learning; mean-pooling is a soft approximation.) Four cross-room K-NN approaches are benchmarked, alongside a within-room sanity check.
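The mean-pooling stand-in described above can be sketched in a few lines. This is a minimal illustration, not the script used for this tick: the function name `mean_pool_signatures`, the array layout `(rooms, subjects, positions, 52)`, the random placeholder CSI, and the choice to stack real and imaginary parts into a real-valued vector for K-NN are all assumptions.

```python
import numpy as np


def mean_pool_signatures(csi: np.ndarray) -> np.ndarray:
    """Collapse the position axis of complex CSI into one embedding per subject per room.

    csi: complex array of shape (rooms, subjects, positions, n_values).
    Returns a real array of shape (rooms, subjects, 2 * n_values).
    """
    pooled = csi.mean(axis=2)  # average over positions -> (rooms, subjects, n_values)
    # Assumption: stack real and imaginary parts so Euclidean K-NN can be applied.
    return np.concatenate([pooled.real, pooled.imag], axis=-1)


# Placeholder data with the dimensions quoted in this note:
# 2 rooms, 10 subjects, 3 positions, 52 complex values per signature.
rng = np.random.default_rng(0)
shape = (2, 10, 3, 52)
csi = rng.normal(size=shape) + 1j * rng.normal(size=shape)

embeddings = mean_pool_signatures(csi)
print(embeddings.shape)  # (2, 10, 104)
```

Averaging removes position-dependent variation only to the extent that it cancels across the 3 sampled positions, which is why the note treats this as a soft approximation of a learned position-invariant embedding.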
| Approach | 1-shot K-NN accuracy |
|---|---|
| Within-room AETHER (sanity check) | 100% |
| Cross-room AETHER raw (no env subtraction) | 10% (= chance) |
| Cross-room AETHER + labelled MERIDIAN (oracle) | 20% (2× chance) |
| Cross-room AETHER + physics-informed env (no labels) | 10% (= chance) |
| Cross-room AETHER + physics + residual correction | 20% (2× chance) |
| Chance | 10% |
The architecturally correct approach (physics + residual correction) MATCHES the labelled MERIDIAN oracle with ZERO labels. That is the meaningful positive finding: at the embedding level, the label-free pipeline recovers as much cross-room accuracy as the labelled oracle does.
But the labelled oracle is itself only 2× chance. Neither approach reaches the 80%+ target from R3 tick 12. Why?
In R3 tick 12, AETHER was simulated as 128-dim Gaussian embeddings with a strong per-subject signal direction. There, MERIDIAN reached 100%. In R3.2, AETHER is simulated by mean-pooling complex 52-dim CSI signatures across 3 positions, with the per-subject signal coming only from the 30% spread in body size (0.85-1.15×).
The per-subject signal in R3.2's setup is much weaker than in R3 tick 12. Cross-room MERIDIAN reaches only 20% because the per-subject signature itself does not rise above the residual noise floor.
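The dependence on per-subject signal strength can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reproduction of R3 tick 12 or R3.2: embeddings are 128-dim Gaussians with a per-subject unit direction scaled by a hypothetical `strength` parameter, each room adds a random offset, and per-room mean-centering is swapped in as a simple stand-in for env subtraction (it is neither MERIDIAN nor the physics-informed method).

```python
import numpy as np


def cross_room_accuracy(strength, n_subjects=10, dim=128, trials=200, seed=0):
    """Average 1-shot nearest-neighbour accuracy: enrol in room A, query in room B."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        # One random unit direction per subject (the per-subject signal).
        directions = rng.normal(size=(n_subjects, dim))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        rooms = []
        for _room in range(2):
            offset = rng.normal(scale=5.0, size=dim)     # room-specific shift
            noise = rng.normal(size=(n_subjects, dim))   # unit-variance residual noise
            emb = strength * directions + offset + noise
            # Stand-in env subtraction: remove the per-room mean embedding.
            rooms.append(emb - emb.mean(axis=0))
        gallery, queries = rooms
        # Pairwise Euclidean distances, shape (queries, gallery).
        dists = np.linalg.norm(queries[:, None, :] - gallery[None, :, :], axis=-1)
        hits += int(np.sum(dists.argmin(axis=1) == np.arange(n_subjects)))
    return hits / (trials * n_subjects)


weak = cross_room_accuracy(strength=0.0)     # no per-subject signal at all
strong = cross_room_accuracy(strength=30.0)  # signal well above the noise floor
print(f"weak signal: {weak:.2f}, strong signal: {strong:.2f}")
```

With no per-subject signal the matcher has nothing to separate subjects and sits near the 1-in-10 chance level, however well the room offset is removed. With a signal far above the noise it identifies subjects reliably. Intermediate values of `strength` trace the region between those two regimes.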
R3.2 is the third explicit "this synthetic experiment is too weak to demonstrate the production claim" finding:
| Tick | Finding | Production implication |
|---|---|---|
| R3.1 | Physics-informed at raw level fails (architecture error) | Apply at embedding level (R3.1 → R3.2) |
| R6.2.2.1 | 2D N=5 knee doesn't hold in 3D | Use chest zones + bump N (R6.2.2.1 → R6.2.4) |
| R3.2 (this) | Mean-pooling AETHER too weak; can't reach 80%+ | Need real AETHER (contrastive); structural validation only |
All three "honest scope" findings are productive: they don't kill the architectural sketch, they identify the gap that production work must fill.
Replace the mean-pooling AETHER stand-in with a contrastive-learning head (ADR-024). Train on MM-Fi or a similar dataset, freeze the AETHER head, and run the R3.2 protocol again with real embeddings. Expected result: if the architecture is correct, cross-room K-NN should reach 70-90%+ (real AETHER's per-subject signal is expected to be much stronger than the 30% body-size variation used here).
This experiment needs ~1-2 days of training work + a real AETHER checkpoint. Out of scope for this 12-hour synthetic loop.
- R3 (tick 12): synthetic embedding-space, claimed 100% with MERIDIAN
- R3.1: raw-CSI level fails, identifies architecture error
- R3.2: embedding-level physics-informed structurally validated; empirical performance bounded by synthetic AETHER weakness
The arc has produced: