docs/qe-reports/00-qe-queen-summary.md
Date: 2026-04-05
Fleet ID: fleet-02558e91
Orchestrator: QE Queen Coordinator (ADR-001)
Domains Activated: test-generation, coverage-analysis, quality-assessment, security-compliance, defect-intelligence
| Language / Layer | Files | Lines of Code | Purpose |
|---|---|---|---|
| Rust (.rs) | 379 | 153,139 | Core workspace -- 19 crates (16 in workspace, 3 excluded/auxiliary) |
| Python (.py) | 105 | 38,656 | v1 implementation -- API, services, sensing, hardware, middleware |
| C/H (firmware) | 48 | 9,445 | ESP32 CSI node firmware -- collectors, OTA, WASM runtime |
| TypeScript/TSX (mobile) | 48 | 7,571 | React Native mobile app -- screens, stores, services |
| JavaScript (UI) | ~117 | 25,798 | Web observatory UI, components, utilities |
| Markdown (docs) | ~79+ | 70,539 | 79 ADRs, user guides, research, witness logs |
| Total | ~776 | ~305,148 |
The project implements WiFi-based human pose estimation using Channel State Information (CSI). It is structured as a multi-language, multi-platform system:
wifi-densepose-wasm-edge (excluded for wasm32 target) and ruv-neural (auxiliary). Covers signal processing (RuvSense with 14 modules), neural inference (ONNX/PyTorch/Candle), mass casualty assessment (MAT), cross-viewpoint fusion (RuVector v2.0.4), hardware TDM protocol, and web APIs.| Indicator | Status | Details |
|---|---|---|
| Architecture Decision Records | Strong | 79 ADRs documented in docs/adr/ |
| CI/CD pipelines | Strong | 8 GitHub Actions workflows (CI, CD, security scan, firmware CI, QEMU, desktop release, verify pipeline, submodules) |
| Security scanning | Strong | Dedicated security-scan.yml with Bandit, Semgrep, Safety; runs daily on schedule |
| Deterministic verification | Strong | SHA-256 proof pipeline (archive/v1/data/proof/verify.py) with witness bundles (ADR-028) |
| Code formatting | Moderate | Black/Flake8 enforced for Python in CI; no rustfmt.toml found for Rust |
| Type checking | Moderate | MyPy configured in CI for Python; Rust has native type safety |
| Dependency management | Strong | Workspace-level Cargo.toml with pinned versions; requirements.txt for Python |
| Test Layer | Rust | Python | Mobile (TS) | Firmware (C) | Total |
|---|---|---|---|---|---|
| Unit tests | 2,618 #[test] | 322 functions / 15 files | 202 test cases / 25 files | 0 | 3,142 |
| Integration tests | 16 files / 7 crates | 132 functions / 11 files | 0 | 0 | 148+ functions |
| E2E tests | 0 | 8 functions / 1 file | 0 | 0 | 8 functions |
| Performance tests | 0 | 26 functions / 2 files | 0 | 0 | 26 functions |
| Fuzz tests | 0 | 0 | 0 | 3 files (harnesses) | 3 harnesses |
| Subtotal | ~2,634 | ~488 | ~202 | 3 | ~3,327 |
Ideal Pyramid Actual Shape Assessment
/\ /\
/E2E\ / 8 \ E2E: CRITICALLY THIN
/------\ /----\
/ Integ. \ / 148 \ Integration: THIN
/----------\ /--------\
/ Unit \ / 3,142 \ Unit: HEALTHY base
-------------- --------------
Pyramid Ratio (unit : integration : e2e):
Verdict: The pyramid is severely bottom-heavy. Unit tests are plentiful (good), but integration and E2E layers are dangerously thin relative to the project's complexity. For a multi-crate, multi-service system with hardware integration, the integration layer should be 3-4x larger, and E2E should be 10-20x larger.
| Crate | Source Lines | Test Count | Tests per 1K LOC | Integration Tests | Assessment |
|---|---|---|---|---|---|
| wifi-densepose-wasm-edge | 28,888 | 643 | 22.3 | 3 files | Good |
| wifi-densepose-signal | 16,194 | 370 | 22.8 | 1 file | Good |
| ruv-neural | ~558 (test-only) | 364 | N/A | 1 file | Test-only crate |
| wifi-densepose-train | 10,562 | 299 | 28.3 | 6 files | Strong |
| wifi-densepose-sensing-server | 17,825 | 274 | 15.4 | 3 files | Moderate |
| wifi-densepose-mat | 19,572 | 159 | 8.1 | 1 file | Needs improvement |
| wifi-densepose-wifiscan | 5,779 | 150 | 26.0 | 0 | Unit only |
| wifi-densepose-hardware | 4,005 | 106 | 26.5 | 0 | Unit only |
| wifi-densepose-ruvector | 4,629 | 106 | 22.9 | 0 | Unit only |
| wifi-densepose-vitals | 1,863 | 52 | 27.9 | 0 | Unit only |
| wifi-densepose-desktop | 3,309 | 39 | 11.8 | 1 file | Thin |
| wifi-densepose-core | 2,596 | 28 | 10.8 | 0 | Thin for core crate |
| wifi-densepose-nn | 2,959 | 23 | 7.8 | 0 | Needs improvement |
| wifi-densepose-cli | 1,317 | 5 | 3.8 | 0 | Critically thin |
| wifi-densepose-wasm | 1,805 | 0 | 0.0 | 0 | ZERO tests |
| wifi-densepose-api | 1 (stub) | 0 | N/A | 0 | Stub only |
| wifi-densepose-config | 1 (stub) | 0 | N/A | 0 | Stub only |
| wifi-densepose-db | 1 (stub) | 0 | N/A | 0 | Stub only |
| Source Module | Source Lines | Has Unit Tests | Has Integration Tests | Assessment |
|---|---|---|---|---|
| api (13 files) | 3,694 | No | Yes (test_api_endpoints, test_rate_limiting) | Partial |
| services (7 files) | 3,038 | No | Yes (test_inference_pipeline) | Partial |
| sensing (6 files) | 2,117 | Yes (test_sensing) | Yes (test_streaming_pipeline) | Moderate |
| tasks (3 files) | 1,977 | No | No | ZERO coverage |
| middleware (4 files) | 1,798 | No | No | ZERO coverage |
| database (5 files) | 1,715 | No | No | ZERO coverage |
| commands (3 files) | 1,161 | No | No | ZERO coverage |
| core (4 files) | 1,117 | No (tests focus on CSI extractor from hardware/) | No | ZERO coverage |
| config (3 files) | 923 | No | No | ZERO coverage |
| hardware (3 files) | 755 | Yes (test_csi_extractor, test_esp32_binary_parser) | Yes (test_hardware_integration) | Good |
| models (3 files) | 578 | No | No | ZERO coverage |
| testing (3 files) | 500 | No | No | ZERO coverage |
Key finding: Python unit tests concentrate heavily on CSI extraction and processing (the hardware layer). 11 of 12 source modules have zero dedicated unit test files. The 322 unit test functions map almost entirely to hardware/csi_extractor.py and related signal processing code.
The mobile UI has 25 test files with 202 test cases, covering:
Assessment: Mobile testing is the strongest layer relative to its codebase size. Good breadth across stores, components, services, and screens.
| Test Type | Count | Coverage |
|---|---|---|
| Fuzz harnesses | 3 | fuzz_csi_serialize.c, fuzz_edge_enqueue.c, fuzz_nvs_config.c |
| Unit tests | 0 | No structured unit testing framework |
| Integration tests | 0 | No automated hardware-in-the-loop tests |
Assessment: The firmware has fuzz testing (a positive for security-critical embedded code), but lacks structured unit tests. The 9,445 lines of C code for a safety-relevant embedded system (disaster survivor detection via MAT) warrant stronger test coverage.
| Metric | Value | Threshold | Status |
|---|---|---|---|
| AQE quality score | 37/100 | >70 | FAIL |
| Cyclomatic complexity (avg) | 24.09 | <15 | FAIL |
| Maintainability index | 24.35 | >50 | FAIL |
| Security score | 85/100 | >80 | PASS |
Large file risk (>500 lines in Rust src/):
| File | Lines | Risk |
|---|---|---|
sensing-server/src/main.rs | 4,846 | Monolith risk -- nearly 10x the 500-line guideline |
sensing-server/src/training_api.rs | 1,946 | High complexity |
wasm/src/mat.rs | 1,673 | Hard to test, 0 tests in crate |
train/src/metrics.rs | 1,664 | Complex math, needs exhaustive testing |
signal/src/ruvsense/pose_tracker.rs | 1,523 | Critical path, well-tested |
mat/src/integration/csi_receiver.rs | 1,401 | Integration boundary |
mat/src/integration/hardware_adapter.rs | 1,360 | Hardware boundary, audit needed |
24 Rust source files exceed 500 lines, violating the project's own CLAUDE.md guideline.
| Pattern | Count | Assessment |
|---|---|---|
Result<> returns | 450 | Good -- idiomatic error handling in use |
.unwrap() calls | 720 | HIGH RISK -- 720 potential panic points in production code |
.expect() calls | 35 | Acceptable -- provides context on failure |
panic!() calls | 1 | Good -- minimal explicit panics |
unsafe blocks | 340 | NEEDS AUDIT -- high count for an application-level project |
Critical concern: The 720 .unwrap() calls represent potential runtime panics. In a system processing real-time WiFi CSI data for pose estimation (and mass casualty assessment), an unwrap failure could crash the entire pipeline. Each call should be reviewed and converted to proper error propagation with ? operator or explicit error handling.
The 340 unsafe blocks are high for a project that is not a systems-level library. These need a focused audit to verify memory safety invariants are upheld, especially in signal processing and hardware interaction code.
| Check | Result | Details |
|---|---|---|
| Hardcoded secrets in Python | 0 found | Clean |
| SQL injection risk (f-string SQL) | 0 found | Clean -- likely using parameterized queries |
Python eval() usage | 2 calls | Safe -- both are PyTorch model.eval() (inference mode), not Python eval |
| Firmware buffer overflow risk | 0 strcpy/sprintf | Clean -- uses safe string functions |
| CI security scanning | Active | Bandit, Semgrep, Safety in dedicated workflow, runs daily |
| Dependency scanning | Active | Safety checks in CI |
Security assessment: GOOD. The project follows secure coding practices. The dedicated security-scan workflow with daily scheduling is a strong indicator of security maturity. No critical vulnerabilities detected in static analysis patterns.
| Metric | Value | Assessment |
|---|---|---|
Rust /// doc comments | 11,965 | Strong |
Rust //! module docs | 3,512 | Strong |
Rust pub fn with docs | 1,781 / 3,912 (45.5%) | Moderate -- 54.5% of public functions lack doc comments |
| Python functions with docstrings | ~543 / ~801 (67.8%) | Good |
| Python classes with docstrings | ~121 / ~150 (80.7%) | Strong |
| ADRs | 79 | Excellent governance |
| TODO/FIXME markers | 1 (Python), 0 (Rust) | Clean -- no deferred technical debt markers |
| Workflow | Trigger | Scope |
|---|---|---|
ci.yml | Push/PR to main, develop, feature/* | Python quality (Black, Flake8, MyPy), security (Bandit, Safety) |
cd.yml | (deployment) | Production deployment |
security-scan.yml | Push/PR + daily cron | SAST with Bandit, Semgrep; dependency scanning with Safety |
firmware-ci.yml | Push/PR | ESP32 firmware build verification |
firmware-qemu.yml | Push/PR | ESP32 QEMU emulation tests |
desktop-release.yml | Release | Desktop application packaging |
verify-pipeline.yml | Push/PR | Deterministic proof verification |
update-submodules.yml | Manual/scheduled | Git submodule sync |
Gap: No CI workflow runs cargo test --workspace for the Rust codebase. The 2,618+ Rust tests appear to run only locally. This is a significant gap -- the largest and most critical codebase has no automated CI test execution.
| # | Recommendation | Priority | Effort | Impact | Domain |
|---|---|---|---|---|---|
| R1 | Add Rust workspace tests to CI -- Create a GitHub Actions workflow that runs cargo test --workspace --no-default-features. The 2,618 Rust tests are the project's primary safety net but run only locally. | CRITICAL | Low (1-2 days) | Very High | CI/CD |
| R2 | Reduce .unwrap() calls -- Audit and convert the 720 .unwrap() calls in Rust production code to proper ? error propagation. Prioritize crates in the real-time pipeline: signal, mat, hardware, sensing-server. | CRITICAL | High (2-3 weeks) | Very High | Reliability |
| R3 | Audit unsafe blocks -- Review all 340 unsafe blocks. Document safety invariants for each. Consider using unsafe_code lint to flag new additions. | CRITICAL | Medium (1-2 weeks) | High | Security |
| R4 | Add Python unit tests for untested modules -- 11 of 12 Python source modules have zero unit tests. Priority targets: api/ (3,694 LOC), services/ (3,038 LOC), database/ (1,715 LOC), middleware/ (1,798 LOC). | HIGH | Medium (2-3 weeks) | High | Coverage |
| R5 | Add integration tests for 7 Rust crates -- wifi-densepose-core, wifi-densepose-hardware, wifi-densepose-nn, wifi-densepose-ruvector, wifi-densepose-vitals, wifi-densepose-wifiscan, wifi-densepose-cli have unit tests but no integration test directory. | HIGH | Medium (2 weeks) | High | Coverage |
| R6 | Break up sensing-server/src/main.rs (4,846 lines) -- Extract route handlers, middleware, and configuration into separate modules. This single file is nearly 10x the project's 500-line guideline. | HIGH | Medium (1 week) | Medium | Maintainability |
| R7 | Add E2E tests -- Only 1 E2E test file exists (test_healthcare_scenario.py with 8 tests). For a system with REST API, WebSocket streaming, hardware integration, and mobile clients, E2E coverage is critically insufficient. | HIGH | High (3-4 weeks) | Very High | Coverage |
| R8 | Add tests to wifi-densepose-wasm (1,805 LOC, 0 tests) -- This crate contains MAT WebAssembly bindings used in browser deployment. Zero test coverage for a user-facing interface is unacceptable. | HIGH | Low (3-5 days) | Medium | Coverage |
| R9 | Add firmware unit tests -- Adopt a C unit test framework (Unity, CMock, or CTest) for the 9,445 lines of ESP32 firmware. The fuzz harnesses are a good start but do not substitute for structured unit tests. | MEDIUM | Medium (2 weeks) | Medium | Coverage |
| R10 | Improve Rust public API documentation -- 54.5% of pub fn declarations lack doc comments. Add #![warn(missing_docs)] to crate lib.rs files to enforce documentation. | MEDIUM | Medium (1-2 weeks) | Medium | Documentation |
| R11 | Add rustfmt.toml -- No Rust formatting configuration found. Add workspace-level rustfmt.toml and enforce in CI with cargo fmt --check. | LOW | Low (1 day) | Low | Consistency |
| R12 | Reduce cyclomatic complexity -- Average complexity of 24.09 is well above the 15 threshold. Target the 24 files over 500 lines for refactoring. | MEDIUM | High (3-4 weeks) | High | Maintainability |
Weighted scoring across 8 dimensions, each rated 0-100:
| Dimension | Weight | Score | Weighted | Rationale |
|---|---|---|---|---|
| Unit test coverage | 20% | 68 | 13.6 | 3,142 unit tests is strong for Rust/mobile, but Python modules severely undertested |
| Integration test coverage | 15% | 32 | 4.8 | Only 7 of 19 Rust crates have integration tests; Python integration tests exist but skip core modules |
| E2E test coverage | 10% | 8 | 0.8 | 1 E2E file with 8 tests for a multi-platform system is critically insufficient |
| Security posture | 15% | 82 | 12.3 | Strong CI security scanning, clean code patterns, daily Bandit/Semgrep/Safety; offset by 340 unsafe blocks needing audit |
| Code quality / complexity | 15% | 35 | 5.3 | AQE score 37/100, 720 unwraps, 24 oversized files, high cyclomatic complexity |
| CI/CD maturity | 10% | 55 | 5.5 | 8 workflows is good breadth, but missing Rust test execution in CI is a major gap |
| Documentation | 10% | 78 | 7.8 | 79 ADRs, strong docstrings in Python, moderate Rust doc coverage, witness bundles |
| Architecture governance | 5% | 90 | 4.5 | Exemplary ADR practice, DDD bounded contexts, deterministic verification pipeline |
| Total | 100% | 54.6 |
+---------------------------------------------------------------+
| QE QUEEN ORCHESTRATION COMPLETE |
+---------------------------------------------------------------+
| Project: wifi-densepose (WiFi CSI Pose Estimation) |
| Total Codebase: ~305K lines across 5 languages |
| Total Tests: 3,327 (2,618 Rust + 488 Python + 202 Mobile |
| + 3 firmware fuzz + 16 Rust integration files) |
| Fleet ID: fleet-02558e91 |
| Domains Analyzed: 5 |
| Duration: ~120s |
| Status: COMPLETED |
| |
| OVERALL QUALITY SCORE: 55 / 100 |
| GRADE: C+ |
| RELEASE READINESS: NOT READY (quality gate FAILED) |
+---------------------------------------------------------------+
Strengths:
Critical Gaps:
.unwrap() calls create panic risk in production signal processing pipelinesunsafe blocks need formal audit with documented safety invariantssensing-server/main.rs at 4,846 lines is a monolith riskPath to Release Readiness (target: 75/100):
.unwrap() count by 50% (+5 points to code quality)core, hardware, nn crates (+5 points to integration coverage)Report generated by QE Queen Coordinator (fleet-02558e91)
Learnings stored: queen-orchestration-full-qe-2026-04-05 in namespace learning
AQE v3 quality assessment saved to: .agentic-qe/results/quality/2026-04-05T11-02-19_assessment.json