Back to Ruview

QE Queen Summary Report -- wifi-densepose

docs/qe-reports/00-qe-queen-summary.md

1.99.0-pip20.2 KB
Original Source

QE Queen Summary Report -- wifi-densepose

Date: 2026-04-05
Fleet ID: fleet-02558e91
Orchestrator: QE Queen Coordinator (ADR-001)
Domains Activated: test-generation, coverage-analysis, quality-assessment, security-compliance, defect-intelligence


1. Project Scope and Quality Posture Overview

1.1 Codebase Dimensions

Language / LayerFilesLines of CodePurpose
Rust (.rs)379153,139Core workspace -- 19 crates (16 in workspace, 3 excluded/auxiliary)
Python (.py)10538,656v1 implementation -- API, services, sensing, hardware, middleware
C/H (firmware)489,445ESP32 CSI node firmware -- collectors, OTA, WASM runtime
TypeScript/TSX (mobile)487,571React Native mobile app -- screens, stores, services
JavaScript (UI)~11725,798Web observatory UI, components, utilities
Markdown (docs)~79+70,53979 ADRs, user guides, research, witness logs
Total~776~305,148

1.2 Architecture Summary

The project implements WiFi-based human pose estimation using Channel State Information (CSI). It is structured as a multi-language, multi-platform system:

  • Rust workspace (v0.3.0): 16 crates in workspace plus wifi-densepose-wasm-edge (excluded for wasm32 target) and ruv-neural (auxiliary). Covers signal processing (RuvSense with 14 modules), neural inference (ONNX/PyTorch/Candle), mass casualty assessment (MAT), cross-viewpoint fusion (RuVector v2.0.4), hardware TDM protocol, and web APIs.
  • Python v1: Original implementation with 12 source modules covering API endpoints, CSI extraction, pose services, sensing, database, and middleware.
  • ESP32 firmware: C code for real WiFi CSI collection, edge processing, OTA updates, mmWave sensor integration, WASM runtime, and swarm bridging.
  • Mobile UI: React Native app with pose visualization, MAT screens, vitals monitoring, and RSSI scanning.
  • Web observatory: Three.js-based visualization for RF sensing, phase constellations, and subcarrier manifolds.

1.3 Governance and Process Maturity

IndicatorStatusDetails
Architecture Decision RecordsStrong79 ADRs documented in docs/adr/
CI/CD pipelinesStrong8 GitHub Actions workflows (CI, CD, security scan, firmware CI, QEMU, desktop release, verify pipeline, submodules)
Security scanningStrongDedicated security-scan.yml with Bandit, Semgrep, Safety; runs daily on schedule
Deterministic verificationStrongSHA-256 proof pipeline (archive/v1/data/proof/verify.py) with witness bundles (ADR-028)
Code formattingModerateBlack/Flake8 enforced for Python in CI; no rustfmt.toml found for Rust
Type checkingModerateMyPy configured in CI for Python; Rust has native type safety
Dependency managementStrongWorkspace-level Cargo.toml with pinned versions; requirements.txt for Python

2. Test Pyramid Health

2.1 Overall Test Inventory

Test LayerRustPythonMobile (TS)Firmware (C)Total
Unit tests2,618 #[test]322 functions / 15 files202 test cases / 25 files03,142
Integration tests16 files / 7 crates132 functions / 11 files00148+ functions
E2E tests08 functions / 1 file008 functions
Performance tests026 functions / 2 files0026 functions
Fuzz tests0003 files (harnesses)3 harnesses
Subtotal~2,634~488~2023~3,327

2.2 Test Pyramid Shape Analysis

Ideal Pyramid          Actual Shape          Assessment
                                            
    /\                    /\                  
   /E2E\                / 8 \                E2E: CRITICALLY THIN
  /------\              /----\               
 / Integ. \            / 148  \              Integration: THIN
/----------\          /--------\             
/   Unit    \        /  3,142   \            Unit: HEALTHY base
--------------      --------------          

Pyramid Ratio (unit : integration : e2e):

  • Actual: 394 : 19 : 1
  • Healthy target: 70 : 20 : 10 (percentage)
  • Actual percentage: 95.3% : 4.5% : 0.2%

Verdict: The pyramid is severely bottom-heavy. Unit tests are plentiful (good), but integration and E2E layers are dangerously thin relative to the project's complexity. For a multi-crate, multi-service system with hardware integration, the integration layer should be 3-4x larger, and E2E should be 10-20x larger.

2.3 Rust Test Distribution by Crate

CrateSource LinesTest CountTests per 1K LOCIntegration TestsAssessment
wifi-densepose-wasm-edge28,88864322.33 filesGood
wifi-densepose-signal16,19437022.81 fileGood
ruv-neural~558 (test-only)364N/A1 fileTest-only crate
wifi-densepose-train10,56229928.36 filesStrong
wifi-densepose-sensing-server17,82527415.43 filesModerate
wifi-densepose-mat19,5721598.11 fileNeeds improvement
wifi-densepose-wifiscan5,77915026.00Unit only
wifi-densepose-hardware4,00510626.50Unit only
wifi-densepose-ruvector4,62910622.90Unit only
wifi-densepose-vitals1,8635227.90Unit only
wifi-densepose-desktop3,3093911.81 fileThin
wifi-densepose-core2,5962810.80Thin for core crate
wifi-densepose-nn2,959237.80Needs improvement
wifi-densepose-cli1,31753.80Critically thin
wifi-densepose-wasm1,80500.00ZERO tests
wifi-densepose-api1 (stub)0N/A0Stub only
wifi-densepose-config1 (stub)0N/A0Stub only
wifi-densepose-db1 (stub)0N/A0Stub only

2.4 Python Test Coverage by Module

Source ModuleSource LinesHas Unit TestsHas Integration TestsAssessment
api (13 files)3,694NoYes (test_api_endpoints, test_rate_limiting)Partial
services (7 files)3,038NoYes (test_inference_pipeline)Partial
sensing (6 files)2,117Yes (test_sensing)Yes (test_streaming_pipeline)Moderate
tasks (3 files)1,977NoNoZERO coverage
middleware (4 files)1,798NoNoZERO coverage
database (5 files)1,715NoNoZERO coverage
commands (3 files)1,161NoNoZERO coverage
core (4 files)1,117No (tests focus on CSI extractor from hardware/)NoZERO coverage
config (3 files)923NoNoZERO coverage
hardware (3 files)755Yes (test_csi_extractor, test_esp32_binary_parser)Yes (test_hardware_integration)Good
models (3 files)578NoNoZERO coverage
testing (3 files)500NoNoZERO coverage

Key finding: Python unit tests concentrate heavily on CSI extraction and processing (the hardware layer). 11 of 12 source modules have zero dedicated unit test files. The 322 unit test functions map almost entirely to hardware/csi_extractor.py and related signal processing code.

2.5 Mobile UI Test Coverage

The mobile UI has 25 test files with 202 test cases, covering:

  • Stores: poseStore (21), matStore (18), settingsStore (13) -- good state management coverage
  • Components: SignalBar, GaugeArc, ConnectionBanner, SparklineChart, OccupancyGrid, StatusDot, HudOverlay -- 7 components tested
  • Hooks: useServerReachability, useRssiScanner, usePoseStream -- 3 hooks tested
  • Services: api (14), ws (7), simulation (10), rssi (6) -- good service layer coverage
  • Screens: MAT (4), Live (4), Vitals (5), Zones (6), Settings (6) -- all main screens tested
  • Utils: ringBuffer (20), urlValidator (13), colorMap (9) -- thorough utility testing

Assessment: Mobile testing is the strongest layer relative to its codebase size. Good breadth across stores, components, services, and screens.

2.6 Firmware Test Coverage

Test TypeCountCoverage
Fuzz harnesses3fuzz_csi_serialize.c, fuzz_edge_enqueue.c, fuzz_nvs_config.c
Unit tests0No structured unit testing framework
Integration tests0No automated hardware-in-the-loop tests

Assessment: The firmware has fuzz testing (a positive for security-critical embedded code), but lacks structured unit tests. The 9,445 lines of C code for a safety-relevant embedded system (disaster survivor detection via MAT) warrant stronger test coverage.


3. Cross-Cutting Quality Concerns

3.1 Code Complexity and Maintainability

MetricValueThresholdStatus
AQE quality score37/100>70FAIL
Cyclomatic complexity (avg)24.09<15FAIL
Maintainability index24.35>50FAIL
Security score85/100>80PASS

Large file risk (>500 lines in Rust src/):

FileLinesRisk
sensing-server/src/main.rs4,846Monolith risk -- nearly 10x the 500-line guideline
sensing-server/src/training_api.rs1,946High complexity
wasm/src/mat.rs1,673Hard to test, 0 tests in crate
train/src/metrics.rs1,664Complex math, needs exhaustive testing
signal/src/ruvsense/pose_tracker.rs1,523Critical path, well-tested
mat/src/integration/csi_receiver.rs1,401Integration boundary
mat/src/integration/hardware_adapter.rs1,360Hardware boundary, audit needed

24 Rust source files exceed 500 lines, violating the project's own CLAUDE.md guideline.

3.2 Error Handling Quality (Rust)

PatternCountAssessment
Result<> returns450Good -- idiomatic error handling in use
.unwrap() calls720HIGH RISK -- 720 potential panic points in production code
.expect() calls35Acceptable -- provides context on failure
panic!() calls1Good -- minimal explicit panics
unsafe blocks340NEEDS AUDIT -- high count for an application-level project

Critical concern: The 720 .unwrap() calls represent potential runtime panics. In a system processing real-time WiFi CSI data for pose estimation (and mass casualty assessment), an unwrap failure could crash the entire pipeline. Each call should be reviewed and converted to proper error propagation with ? operator or explicit error handling.

The 340 unsafe blocks are high for a project that is not a systems-level library. These need a focused audit to verify memory safety invariants are upheld, especially in signal processing and hardware interaction code.

3.3 Security Posture

CheckResultDetails
Hardcoded secrets in Python0 foundClean
SQL injection risk (f-string SQL)0 foundClean -- likely using parameterized queries
Python eval() usage2 callsSafe -- both are PyTorch model.eval() (inference mode), not Python eval
Firmware buffer overflow risk0 strcpy/sprintfClean -- uses safe string functions
CI security scanningActiveBandit, Semgrep, Safety in dedicated workflow, runs daily
Dependency scanningActiveSafety checks in CI

Security assessment: GOOD. The project follows secure coding practices. The dedicated security-scan workflow with daily scheduling is a strong indicator of security maturity. No critical vulnerabilities detected in static analysis patterns.

3.4 Documentation Quality

MetricValueAssessment
Rust /// doc comments11,965Strong
Rust //! module docs3,512Strong
Rust pub fn with docs1,781 / 3,912 (45.5%)Moderate -- 54.5% of public functions lack doc comments
Python functions with docstrings~543 / ~801 (67.8%)Good
Python classes with docstrings~121 / ~150 (80.7%)Strong
ADRs79Excellent governance
TODO/FIXME markers1 (Python), 0 (Rust)Clean -- no deferred technical debt markers

3.5 CI/CD Pipeline Coverage

WorkflowTriggerScope
ci.ymlPush/PR to main, develop, feature/*Python quality (Black, Flake8, MyPy), security (Bandit, Safety)
cd.yml(deployment)Production deployment
security-scan.ymlPush/PR + daily cronSAST with Bandit, Semgrep; dependency scanning with Safety
firmware-ci.ymlPush/PRESP32 firmware build verification
firmware-qemu.ymlPush/PRESP32 QEMU emulation tests
desktop-release.ymlReleaseDesktop application packaging
verify-pipeline.ymlPush/PRDeterministic proof verification
update-submodules.ymlManual/scheduledGit submodule sync

Gap: No CI workflow runs cargo test --workspace for the Rust codebase. The 2,618+ Rust tests appear to run only locally. This is a significant gap -- the largest and most critical codebase has no automated CI test execution.


4. Recommendations Matrix

#RecommendationPriorityEffortImpactDomain
R1Add Rust workspace tests to CI -- Create a GitHub Actions workflow that runs cargo test --workspace --no-default-features. The 2,618 Rust tests are the project's primary safety net but run only locally.CRITICALLow (1-2 days)Very HighCI/CD
R2Reduce .unwrap() calls -- Audit and convert the 720 .unwrap() calls in Rust production code to proper ? error propagation. Prioritize crates in the real-time pipeline: signal, mat, hardware, sensing-server.CRITICALHigh (2-3 weeks)Very HighReliability
R3Audit unsafe blocks -- Review all 340 unsafe blocks. Document safety invariants for each. Consider using unsafe_code lint to flag new additions.CRITICALMedium (1-2 weeks)HighSecurity
R4Add Python unit tests for untested modules -- 11 of 12 Python source modules have zero unit tests. Priority targets: api/ (3,694 LOC), services/ (3,038 LOC), database/ (1,715 LOC), middleware/ (1,798 LOC).HIGHMedium (2-3 weeks)HighCoverage
R5Add integration tests for 7 Rust crates -- wifi-densepose-core, wifi-densepose-hardware, wifi-densepose-nn, wifi-densepose-ruvector, wifi-densepose-vitals, wifi-densepose-wifiscan, wifi-densepose-cli have unit tests but no integration test directory.HIGHMedium (2 weeks)HighCoverage
R6Break up sensing-server/src/main.rs (4,846 lines) -- Extract route handlers, middleware, and configuration into separate modules. This single file is nearly 10x the project's 500-line guideline.HIGHMedium (1 week)MediumMaintainability
R7Add E2E tests -- Only 1 E2E test file exists (test_healthcare_scenario.py with 8 tests). For a system with REST API, WebSocket streaming, hardware integration, and mobile clients, E2E coverage is critically insufficient.HIGHHigh (3-4 weeks)Very HighCoverage
R8Add tests to wifi-densepose-wasm (1,805 LOC, 0 tests) -- This crate contains MAT WebAssembly bindings used in browser deployment. Zero test coverage for a user-facing interface is unacceptable.HIGHLow (3-5 days)MediumCoverage
R9Add firmware unit tests -- Adopt a C unit test framework (Unity, CMock, or CTest) for the 9,445 lines of ESP32 firmware. The fuzz harnesses are a good start but do not substitute for structured unit tests.MEDIUMMedium (2 weeks)MediumCoverage
R10Improve Rust public API documentation -- 54.5% of pub fn declarations lack doc comments. Add #![warn(missing_docs)] to crate lib.rs files to enforce documentation.MEDIUMMedium (1-2 weeks)MediumDocumentation
R11Add rustfmt.toml -- No Rust formatting configuration found. Add workspace-level rustfmt.toml and enforce in CI with cargo fmt --check.LOWLow (1 day)LowConsistency
R12Reduce cyclomatic complexity -- Average complexity of 24.09 is well above the 15 threshold. Target the 24 files over 500 lines for refactoring.MEDIUMHigh (3-4 weeks)HighMaintainability

5. Overall Quality Score

5.1 Scoring Methodology

Weighted scoring across 8 dimensions, each rated 0-100:

DimensionWeightScoreWeightedRationale
Unit test coverage20%6813.63,142 unit tests is strong for Rust/mobile, but Python modules severely undertested
Integration test coverage15%324.8Only 7 of 19 Rust crates have integration tests; Python integration tests exist but skip core modules
E2E test coverage10%80.81 E2E file with 8 tests for a multi-platform system is critically insufficient
Security posture15%8212.3Strong CI security scanning, clean code patterns, daily Bandit/Semgrep/Safety; offset by 340 unsafe blocks needing audit
Code quality / complexity15%355.3AQE score 37/100, 720 unwraps, 24 oversized files, high cyclomatic complexity
CI/CD maturity10%555.58 workflows is good breadth, but missing Rust test execution in CI is a major gap
Documentation10%787.879 ADRs, strong docstrings in Python, moderate Rust doc coverage, witness bundles
Architecture governance5%904.5Exemplary ADR practice, DDD bounded contexts, deterministic verification pipeline
Total100%54.6

5.2 Final Verdict

+---------------------------------------------------------------+
|              QE QUEEN ORCHESTRATION COMPLETE                   |
+---------------------------------------------------------------+
|  Project: wifi-densepose (WiFi CSI Pose Estimation)            |
|  Total Codebase: ~305K lines across 5 languages                |
|  Total Tests: 3,327 (2,618 Rust + 488 Python + 202 Mobile     |
|               + 3 firmware fuzz + 16 Rust integration files)   |
|  Fleet ID: fleet-02558e91                                      |
|  Domains Analyzed: 5                                           |
|  Duration: ~120s                                               |
|  Status: COMPLETED                                             |
|                                                                |
|  OVERALL QUALITY SCORE: 55 / 100                               |
|  GRADE: C+                                                     |
|  RELEASE READINESS: NOT READY (quality gate FAILED)            |
+---------------------------------------------------------------+

5.3 Summary Assessment

Strengths:

  • Exceptional architecture governance with 79 ADRs and deterministic verification (witness bundles)
  • Strong Rust unit test count (2,618) with good distribution across signal processing and training crates
  • Mature security CI pipeline with daily scheduled scanning (Bandit, Semgrep, Safety)
  • Mobile UI has the best test-to-code ratio in the entire project
  • No hardcoded secrets, no unsafe string operations in firmware, clean security patterns

Critical Gaps:

  • Rust tests do not run in CI -- the 2,618 tests are only a local safety net
  • 720 .unwrap() calls create panic risk in production signal processing pipelines
  • 340 unsafe blocks need formal audit with documented safety invariants
  • 11 of 12 Python source modules have zero unit tests
  • Only 8 E2E test functions for a multi-platform, multi-service system
  • sensing-server/main.rs at 4,846 lines is a monolith risk

Path to Release Readiness (target: 75/100):

  1. Add Rust CI workflow (+10 points to CI maturity)
  2. Add Python unit tests for top 4 untested modules (+8 points to unit coverage)
  3. Audit and reduce .unwrap() count by 50% (+5 points to code quality)
  4. Add 5+ E2E test scenarios (+4 points to E2E coverage)
  5. Add integration tests to core, hardware, nn crates (+5 points to integration coverage)

Report generated by QE Queen Coordinator (fleet-02558e91)
Learnings stored: queen-orchestration-full-qe-2026-04-05 in namespace learning
AQE v3 quality assessment saved to: .agentic-qe/results/quality/2026-04-05T11-02-19_assessment.json