
Quality Experience (QX) Analysis: WiFi-DensePose

docs/qe-reports/05-quality-experience.md



Report ID: QX-2026-005
Date: 2026-04-05
Scope: Full-stack quality experience across API, CLI, Mobile, DX, and Hardware
QX Score: 71/100 (C+)


Table of Contents

  1. Executive Summary
  2. Overall QX Scores
  3. User Journey Analysis by Persona
  4. API Experience Analysis
  5. CLI Experience Analysis
  6. Mobile App UX Analysis
  7. Developer Experience (DX) Analysis
  8. Hardware Integration UX Analysis
  9. Cross-Cutting Quality Concerns
  10. Oracle Problems Detected
  11. Prioritized Recommendations
  12. Heuristic Scoring Summary

1. Executive Summary

The WiFi-DensePose system demonstrates strong architectural foundations with a well-structured FastAPI backend, a mature React Native mobile app, and a comprehensive CLI. However, the quality experience is uneven across touchpoints, with several gaps that impact different user personas in distinct ways.

Key Findings

Strengths:

  • Comprehensive error handling middleware with structured error responses, request IDs, and environment-aware detail levels (archive/v1/src/middleware/error_handler.py)
  • Robust WebSocket reconnection with exponential backoff and automatic simulation fallback in the mobile app (ui/mobile/src/services/ws.service.ts)
  • Well-designed health check architecture with component-level status, readiness probes, and liveness endpoints (archive/v1/src/api/routers/health.py)
  • Strong input validation on API models with Pydantic, including range constraints and clear field descriptions (archive/v1/src/api/routers/pose.py)
  • Persistent settings with AsyncStorage in the mobile app, surviving app restarts (ui/mobile/src/stores/settingsStore.ts)
  • Server URL validation with test-before-save workflow in mobile settings (ui/mobile/src/screens/SettingsScreen/ServerUrlInput.tsx)

Critical Issues:

  • API documentation is disabled in production (docs_url=None, redoc_url=None when is_production=True), leaving production API consumers without discoverability (archive/v1/src/api/main.py lines 146-148)
  • No user-facing progress indicator during calibration -- the calibration endpoint returns an estimated duration, but the status endpoint exposes no progress detail beyond a percentage (archive/v1/src/api/routers/pose.py lines 320-361)
  • Rate limit responses lack a human-readable Retry-After message body; the client receives a bare "Rate limit exceeded" string with retry information only in HTTP headers (archive/v1/src/middleware/rate_limit.py line 323)
  • CLI status command uses emoji/Unicode characters that break in terminals without UTF-8 support (archive/v1/src/commands/status.py lines 360-474)
  • Mobile app MainTabs.tsx passes an inline arrow function as the component prop to Tab.Screen (line 130), causing unnecessary re-renders on every parent render cycle

Top 3 Recommendations:

  1. Add a separate production API documentation URL (e.g., /api-docs) with authentication, rather than removing docs entirely
  2. Implement a WebSocket-based calibration progress stream or add a polling endpoint that returns step-by-step progress
  3. Add a --no-emoji CLI flag or auto-detect terminal capabilities to avoid broken status output

2. Overall QX Scores

| Dimension | Score | Grade | Assessment |
|---|---|---|---|
| Overall QX | 71/100 | C+ | Functional but inconsistent across touchpoints |
| API Experience | 78/100 | B- | Well-structured endpoints, good error model, weak discoverability |
| CLI Experience | 65/100 | D+ | Adequate commands, poor terminal compatibility, limited help |
| Mobile UX | 80/100 | B | Strong connection handling, good fallbacks, minor render issues |
| Developer Experience | 68/100 | D+ | Steep learning curve, complex build, limited onboarding docs |
| Hardware UX | 62/100 | D | Complex provisioning, limited error recovery guidance |
| Accessibility | 45/100 | F | No ARIA consideration in mobile, no high-contrast support |
| Trust & Reliability | 76/100 | B- | Good health checks, rate limiting, auth framework in place |
| Cross-Codebase Consistency | 70/100 | C | Different error formats between API/CLI, naming inconsistencies |

3. User Journey Analysis by Persona

3.1 Developer Persona

Journey: Clone repo -> Set up environment -> Build -> Run tests -> Develop -> Submit PR

| Step | Success Rate | Pain Level | Bottleneck |
|---|---|---|---|
| Clone & orient | Moderate | MEDIUM | Multiple codebases (Python v1, Rust, firmware, mobile) with no single entry point guide |
| Environment setup | Low | HIGH | Requires Python + Rust toolchain + Node.js + ESP-IDF for full development |
| Build Python API | Moderate | MEDIUM | Dependency management not containerized for easy onboarding |
| Run Rust tests | High | LOW | cargo test --workspace --no-default-features works reliably (1,031+ tests) |
| Run Python tests | Moderate | MEDIUM | Requires database setup, Redis optional but affects behavior |
| Contribute to mobile | Moderate | MEDIUM | Expo/React Native setup is standard but undocumented within this repo |

Key Findings:

  • CLAUDE.md is comprehensive for AI agents but not optimized for human developers; it mixes agent configuration with build instructions
  • No CONTRIBUTING.md file exists
  • Build commands are scattered: Python uses pip, Rust uses cargo, mobile uses npm, firmware uses ESP-IDF
  • Test commands differ between npm test, cargo test, and python -m pytest with no unified runner
  • The pre-merge checklist in CLAUDE.md has 12 items, which is thorough but creates friction for external contributors

3.2 Operator Persona

Journey: Install -> Configure -> Start server -> Monitor -> Troubleshoot

| Step | Success Rate | Pain Level | Bottleneck |
|---|---|---|---|
| Install | Low | HIGH | No single installation script or Docker Compose for the full stack |
| Configure | Moderate | MEDIUM | Config file path must be specified; no --init to generate default config |
| Start server | Moderate | MEDIUM | wifi-densepose start works but database must be initialized first |
| Monitor status | High | LOW | wifi-densepose status --detailed provides comprehensive output |
| Stop server | High | LOW | Both graceful and force-stop options available |
| Troubleshoot | Low | HIGH | Error messages reference internal exceptions; no runbook or FAQ |

Key Findings:

  • The CLI offers start, stop, status, db init/migrate/rollback, config show/validate/failsafe, tasks run/status, and version -- a reasonable command set
  • However, there is no wifi-densepose init command to scaffold a working configuration from scratch
  • The config validate command checks database, Redis, and directory availability -- good for operators
  • The config failsafe command showing SQLite fallback status is a strong resilience feature
  • Missing: log rotation configuration, log level adjustment at runtime, and a wifi-densepose doctor self-diagnosis command

3.3 End-User Persona (Mobile App User)

Journey: Open app -> Connect to server -> View live data -> Check vitals -> Manage zones -> Configure settings

| Step | Success Rate | Pain Level | Bottleneck |
|---|---|---|---|
| Open app | High | LOW | Clean initial load with loading spinners |
| Connect to server | Moderate | MEDIUM | Default URL is localhost:3000 which will not work on physical devices |
| View live data | High | LOW | Simulation fallback ensures something is always displayed |
| Check vitals | High | LOW | Gauges, sparklines, and classification render smoothly |
| Manage zones | Moderate | LOW | Heatmap visualization is functional |
| Configure settings | High | LOW | Server URL validation, test connection, save workflow is solid |

Key Findings:

  • The default serverUrl in settingsStore.ts is http://localhost:3000, which will fail on a physical device where the server runs on a different machine; a first-run setup wizard would improve this
  • Connection state management is well-implemented with three visible states: LIVE STREAM, SIMULATED DATA, and DISCONNECTED via ConnectionBanner.tsx
  • The simulation fallback (generateSimulatedData()) activates automatically when WebSocket connection fails, ensuring the app never shows a blank screen
  • The MAT (Mass Casualty Assessment Tool) screen seeds a training scenario on first load, which may confuse users who expect a clean state
  • ErrorBoundary provides crash recovery with a "Retry" button, but the error message is the raw JavaScript error (error.message) without user-friendly context

4. API Experience Analysis

4.1 Endpoint Structure (Score: 82/100)

The API follows RESTful conventions with clear resource paths:

GET  /health/health       - System health
GET  /health/ready        - Readiness probe
GET  /health/live         - Liveness probe
GET  /health/metrics      - System metrics (auth required for detailed)
GET  /health/version      - Version info

GET  /api/v1/pose/current - Current pose estimation
POST /api/v1/pose/analyze - Custom analysis (auth required)
GET  /api/v1/pose/zones/{zone_id}/occupancy - Zone occupancy
GET  /api/v1/pose/zones/summary - All zones summary
POST /api/v1/pose/historical - Historical data (auth required)
GET  /api/v1/pose/activities - Recent activities
POST /api/v1/pose/calibrate - Start calibration (auth required)
GET  /api/v1/pose/calibration/status - Calibration status
GET  /api/v1/pose/stats - Statistics

WS   /api/v1/stream/pose  - Real-time pose stream
WS   /api/v1/stream/events - Event stream

Issues Found:

  • GET /health/health is redundantly nested: the health router is mounted at the /health prefix and also defines a /health route, so the full path becomes /health/health. Either serve the check at the router root (giving /health) or mount the health router at /
  • POST /api/v1/pose/historical uses POST for a read operation. While this is common for complex queries, it violates REST conventions. A GET with query parameters or a POST /api/v1/pose/query would be clearer
  • The root endpoint (GET /) exposes feature flags (authentication, rate_limiting) which could leak security posture information

4.2 Error Handling (Score: 85/100)

The ErrorHandler class in archive/v1/src/middleware/error_handler.py is well-designed:

Strengths:

  • Structured error responses with consistent format: { "error": { "code": "...", "message": "...", "timestamp": "...", "request_id": "..." } }
  • Request ID tracking via X-Request-ID header for debugging
  • Environment-aware: tracebacks included in development, hidden in production
  • Specialized handlers for HTTP, validation, Pydantic, database, and external service errors
  • Custom exception classes (BusinessLogicError, ResourceNotFoundError, ConflictError, ServiceUnavailableError) with domain context

Issues Found:

  • The ErrorHandlingMiddleware class exists but is commented out (lines 432-434 in error_handler.py), meaning errors are handled by setup_error_handling() exception handlers instead. The middleware class and the exception handlers use different ErrorHandler instances, creating potential inconsistency if one is changed without the other
  • The _is_database_error() check uses string matching on module names (lines 355-373), which is fragile. "ConnectionError" will match aiohttp.ConnectionError (an external service error), not just database connection errors
  • Error responses do not include a documentation_url field that could guide users to relevant docs
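
The fragility of name-based matching can be illustrated with a small sketch that classifies an exception by the top-level package defining its class, rather than by a substring of the class name. The `DB_PACKAGES` set and the `is_database_error` helper are assumptions for illustration, not the project's actual code.

```python
# Hypothetical set of top-level packages whose exceptions count as database errors.
DB_PACKAGES = frozenset({"sqlalchemy", "psycopg2", "asyncpg", "sqlite3"})


def is_database_error(exc: BaseException) -> bool:
    """Return True if any class in the exception's MRO is defined by a database package."""
    for cls in type(exc).__mro__:
        top_level = (cls.__module__ or "").split(".")[0]
        if top_level in DB_PACKAGES:
            return True
    return False
```

With this approach, two exception classes that share the name ConnectionError are told apart by where they are defined, so an HTTP client failure is not reported as a database failure.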

4.3 Rate Limiting UX (Score: 72/100)

Strengths:

  • Dual algorithm support: sliding window counter and token bucket
  • Per-endpoint rate limiting with per-user differentiation
  • Standard X-RateLimit-* headers on all responses
  • Retry-After header on 429 responses
  • Health/docs/metrics paths exempted from rate limiting
  • Configurable presets for development, production, API, and strict modes

Issues Found:

  • The 429 response body is "Rate limit exceeded" (a plain string). No structured error response with the ErrorResponse format is used. The rate limit middleware raises HTTPException directly rather than using CustomHTTPException or ErrorResponse
  • No information about which rate limit bucket was exhausted (per-IP vs per-user vs per-endpoint)
  • No rate limit dashboard or endpoint to check current rate limit status without making a request
  • The RateLimitConfig presets (development, production, api, strict) are defined but there is no CLI command or API endpoint to switch between them
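
A minimal sketch of a structured 429 body follows, reusing the envelope described in section 4.2. The `RATE_LIMIT_EXCEEDED` code name and the `retry_after_seconds` and `bucket` fields are hypothetical additions, not part of the existing ErrorResponse format.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_rate_limit_error(
    retry_after_seconds: int,
    bucket: str,
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a 429 body in the same envelope the error handler uses.

    `bucket` (for example "ip", "user", or "endpoint") and
    `retry_after_seconds` are hypothetical extra fields.
    """
    return {
        "error": {
            "code": "RATE_LIMIT_EXCEEDED",  # assumed code name
            "message": (
                f"Rate limit exceeded for the {bucket} bucket. "
                f"Retry in {retry_after_seconds} seconds."
            ),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "request_id": request_id,
            "retry_after_seconds": retry_after_seconds,
            "bucket": bucket,
        }
    }
```

Repeating the retry interval in the body lets clients that do not inspect headers still back off correctly, and naming the bucket addresses the missing per-IP, per-user, or per-endpoint information.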

4.4 WebSocket Experience (Score: 80/100)

Strengths:

  • Connection confirmation message with client ID and configuration on connect
  • Structured message protocol with type field (ping, update_config, get_status)
  • Invalid JSON is handled gracefully with an error message back to client
  • Stale connection cleanup every 60 seconds with 5-minute timeout
  • Zone-based and stream-type-based filtering for broadcasts
  • Client-side config updates without reconnection via update_config message

Issues Found:

  • Authentication is checked after websocket.accept() (lines 80-93 in stream.py), meaning unauthenticated clients briefly hold a connection before being closed. This wastes resources and leaks the existence of the endpoint
  • The handle_websocket_message function handles unknown message types with an error, but does not suggest valid message types: "Unknown message type: foo" should list valid options
  • No heartbeat/keepalive mechanism initiated from the server. The client must send ping messages. If the client does not ping, the connection will be considered stale after 5 minutes even if data is flowing
  • Close codes are not documented for clients to handle reconnection logic
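
The unknown-message-type issue could be addressed along these lines. This is a sketch only: the `VALID_MESSAGE_TYPES` tuple contains just the types named in this report, and the `unknown_type_error` helper and `valid_types` field are hypothetical.

```python
from typing import Any, Dict

# Message types named in this report; the real handler may accept others.
VALID_MESSAGE_TYPES = ("ping", "update_config", "get_status")


def unknown_type_error(message_type: str) -> Dict[str, Any]:
    """Build an error frame that tells the client which types are accepted."""
    return {
        "type": "error",
        "message": (
            f"Unknown message type: {message_type}. "
            f"Valid types: {', '.join(VALID_MESSAGE_TYPES)}"
        ),
        "valid_types": list(VALID_MESSAGE_TYPES),
    }
```

Including the list both in the human-readable message and as a machine-readable field lets client developers and client code recover without consulting the source.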

4.5 API Documentation & Discoverability (Score: 58/100)

Issues Found:

  • Swagger UI (/docs) and ReDoc (/redoc) are disabled in production (lines 146-148 of main.py): docs_url=settings.docs_url if not settings.is_production else None
  • No alternative documentation hosting for production environments
  • The GET / root endpoint and GET /api/v1/info endpoint provide feature information but no link to documentation
  • Pydantic models have good Field(description=...) annotations, which would generate useful OpenAPI docs -- but only visible in development
  • No API changelog or versioning documentation beyond the version field

5. CLI Experience Analysis

5.1 Command Structure (Score: 70/100)

The CLI uses Click with a nested group structure:

wifi-densepose [--config FILE] [--verbose] [--debug]
  start   [--host] [--port] [--workers] [--reload] [--daemon]
  stop    [--force] [--timeout]
  status  [--format text|json] [--detailed]
  db
    init      [--url]
    migrate   [--revision]
    rollback  [--steps]
  tasks
    run       [--task cleanup|monitoring|backup]
    status
  config
    show
    validate
    failsafe  [--format text|json]
  version

Strengths:

  • Logical grouping of commands (server, db, tasks, config)
  • Global options --config, --verbose, --debug available on all commands
  • --daemon mode with PID file management and stale PID detection
  • JSON output format option on status and failsafe for scripting

Issues Found:

  • No shell completion support (Click supports it but it is not configured)
  • No init or setup command to generate a default configuration file
  • No logs command to tail or search server logs
  • The tasks status subcommand reuses the function name of the top-level status command (lines 347-348 in cli.py define def status(ctx): under the tasks group), which works but creates confusion
  • No --quiet option for scripting (opposite of --verbose)
  • Error output goes through logger.error() which depends on logging configuration; if logging is misconfigured, errors are silently lost

5.2 Error Messages (Score: 60/100)

Issues Found:

  • Errors from start command show the raw exception: "Failed to start server: {e}" where {e} is the Python exception string
  • No suggestion for common failure scenarios. For example, if the database connection fails during start, the error is "Database connection failed: [psycopg2 error]" with no guidance like "Check your DATABASE_URL setting" or "Run 'wifi-densepose db init' first"
  • The config validate command outputs check-style messages ("X Database connection: FAILED - {e}") which is helpful, but the X and checkmark characters use Unicode that may not render in all terminals
  • The stop command handles "Server is not running" gracefully, which is good
  • Missing: error codes that users could search for in documentation
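
One way to avoid broken check and X characters is to test whether the output stream can encode them and fall back to ASCII tags otherwise. The sketch below assumes hypothetical helpers `supports_unicode` and `status_marker`; neither exists in the current CLI.

```python
import sys
from typing import Optional, TextIO


def supports_unicode(stream: Optional[TextIO] = None) -> bool:
    """Return True if the stream's encoding can represent the check mark glyph."""
    stream = stream if stream is not None else sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        "\u2713".encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def status_marker(ok: bool, unicode_ok: bool) -> str:
    """Pick a Unicode glyph or a plain ASCII tag for a check result."""
    if unicode_ok:
        return "\u2713" if ok else "\u2717"
    return "[OK]" if ok else "[FAIL]"
```

A call such as `status_marker(db_ok, supports_unicode())` would then render safely in either kind of terminal, and the same switch could back an explicit flag for users who prefer plain text.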

5.3 Help Text (Score: 65/100)

Strengths:

  • Each command has a one-line description
  • Options have help text and defaults documented

Issues Found:

  • No examples in help text. The argparse epilog pattern used in provision.py is good practice but is not used in the Click CLI
  • No --help examples showing common workflows like "Start a development server", "Deploy to production", or "Initialize a fresh installation"
  • Command descriptions are terse: "Start the WiFi-DensePose API server" does not mention prerequisites

5.4 Configuration Workflow (Score: 68/100)

Strengths:

  • config show displays the full configuration without secrets
  • config validate checks database, Redis, and directory access
  • config failsafe shows SQLite fallback and Redis degradation status
  • Settings can be loaded from a file via --config flag

Issues Found:

  • No config init to generate a template configuration file
  • No config set KEY VALUE to modify individual settings
  • No environment variable listing showing which variables affect configuration
  • The config show output dumps JSON but does not annotate which values are defaults vs user-configured

6. Mobile App UX Analysis

6.1 Screen Flow Architecture (Score: 82/100)

The app uses a bottom tab navigator with five screens:

Live (wifi icon) -> Vitals (heart) -> Zones (grid) -> MAT (shield) -> Settings (gear)

Strengths:

  • Lazy loading of all screens with React.lazy and suspense fallbacks showing loading indicator with screen name
  • Fallback placeholder screens for any screen that fails to load: "{label} screen not implemented yet" with a "Placeholder shell" subtitle
  • MAT screen badge showing alert count in the tab bar
  • Icon mapping is clear and semantically appropriate

Issues Found:

  • MainTabs.tsx line 130: component={() => <Suspended component={component} />} creates a new function reference on every render. This should be refactored to a stable component reference to prevent unnecessary tab re-renders
  • No deep linking support for navigating directly to a screen from a notification or external URL
  • No screen transition animations configured; the default tab switch is abrupt
  • Tab labels use fontFamily: 'Courier New' which may not be available on all devices, with no fallback font specified

6.2 Connection Handling (Score: 88/100)

The WebSocket connection strategy in ws.service.ts is well-designed:

Strengths:

  • Exponential backoff reconnection: delays of 1s, 2s, 4s, 8s, 16s
  • Maximum 10 reconnection attempts before falling back to simulation
  • Simulation mode provides continuous data display even when disconnected
  • Connection status propagated to all screens via Zustand store
  • Clean disconnect with close code 1000
  • Auto-connect on app mount via usePoseStream hook
  • URL validation before attempting connection
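
The reconnection schedule can be worked through numerically. The sketch below is written in Python rather than the app's TypeScript and assumes the delay stays at 16 seconds after the fifth attempt; the report lists only the first five delays, so the cap is an assumption, as is the `backoff_delays` helper.

```python
from typing import List


def backoff_delays(
    base_seconds: float = 1.0,
    max_delay_seconds: float = 16.0,
    max_attempts: int = 10,
) -> List[float]:
    """Delay before each reconnection attempt: doubling, then held at the cap."""
    return [
        min(base_seconds * (2 ** attempt), max_delay_seconds)
        for attempt in range(max_attempts)
    ]
```

Under that assumption the ten delays are 1, 2, 4, 8, 16 and then five more waits of 16 seconds, for a total of 111 seconds of waiting before the attempts are exhausted.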

Issues Found:

  • When reconnecting, the simulation timer starts immediately during the backoff delay, which means the user briefly sees "SIMULATED DATA" then "LIVE STREAM" then potentially "SIMULATED DATA" again if the reconnect fails. This creates a flickering experience
  • No user notification when switching between live and simulated modes beyond the banner color change
  • The WebSocket URL construction in buildWsUrl() hardcodes the path /ws/sensing, but the API server expects /api/v1/stream/pose. This path mismatch (WS_PATH = '/api/v1/stream/pose' in constants/websocket.ts vs /ws/sensing in ws.service.ts) is a potential connection failure point
  • No explicit ping/pong keepalive from the client; relies on the WebSocket protocol's built-in mechanism

6.3 Loading & Error States (Score: 78/100)

Strengths:

  • LoadingSpinner component with smooth rotation animation using react-native-reanimated
  • ErrorBoundary wraps the LiveScreen with crash recovery
  • LiveScreen shows a dedicated error state with "Live visualization failed", the error message, and a "Retry" button
  • Retry increments a viewerKey to force component remount
  • ConnectionBanner provides three distinct visual states with semantic colors (green/amber/red)

Issues Found:

  • The ErrorBoundary shows error.message directly, which may be a technical JavaScript error string like "Cannot read property 'x' of undefined". A user-friendly message mapping would improve the experience
  • No timeout handling on loading states. If the GaussianSplat WebView never fires onReady, the loading spinner displays indefinitely
  • The VitalsScreen shows N/A for features when no data is available, but the gauges (BreathingGauge, HeartRateGauge) behavior at zero/null values is not guarded in the screen code
  • No skeleton loading states; screens jump from blank to fully rendered

6.4 State Management (Score: 85/100)

Strengths:

  • Zustand stores are well-structured with clear separation: poseStore (real-time data), settingsStore (configuration), matStore (MAT data)
  • settingsStore uses persist middleware with AsyncStorage for cross-session persistence
  • poseStore uses a RingBuffer for RSSI history, capping at 60 entries to prevent memory growth
  • Clean reset() method on poseStore to clear all state

Issues Found:

  • poseStore is not persisted, so all historical data is lost on app restart. For a monitoring application, this is a significant gap
  • The handleFrame method updates 6 state properties atomically in one set() call, which is correct, but the rssiHistory is computed from a module-level RingBuffer that exists outside the store, creating a potential synchronization issue during hot reload
  • No state migration strategy for settingsStore -- if the schema changes between app versions, persisted state may cause errors

6.5 Server Configuration UX (Score: 82/100)

The ServerUrlInput component in the Settings screen provides:

Strengths:

  • Real-time URL validation with validateServerUrl() showing error messages inline
  • "Test Connection" button that measures and displays response latency
  • Visual feedback: border turns red on invalid URL, test result shows checkmark/X with timing
  • "Save" button separated from "Test" to allow testing before committing

Issues Found:

  • Default server URL http://localhost:3000 will never work on a physical device. The first-run experience should prompt for the server address or attempt auto-discovery via mDNS/Bonjour
  • No QR code scanner to configure server URL (common in IoT companion apps)
  • Test result is ephemeral -- it disappears when navigating away and returning
  • No validation of port range or IP address format beyond URL syntax
  • Save does not confirm success to the user; the connection simply restarts silently
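
The missing port and IP checks could look like the following sketch. It is written in Python for illustration rather than the app's TypeScript; `validate_server_url` is a hypothetical analogue of the app's `validateServerUrl()`, and the error wording is assumed.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def validate_server_url(url: str) -> Optional[str]:
    """Return an error message for an invalid server URL, or None if it looks valid."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return "URL is malformed"
    if parts.scheme not in ("http", "https"):
        return "URL must start with http:// or https://"
    host = parts.hostname
    if not host:
        return "URL must include a host"
    try:
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return "Port must be a number between 1 and 65535"
    if port == 0:
        return "Port must be a number between 1 and 65535"
    # A host made only of digits and dots is meant to be an IPv4 address.
    if host.replace(".", "").isdigit():
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            return f"'{host}' is not a valid IPv4 address"
    return None
```

For example, `http://192.168.1.300:3000` passes a pure URL syntax check but is rejected here because 300 is not a valid octet.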

7. Developer Experience (DX) Analysis

7.1 Build Process (Score: 65/100)

Issues Found:

  • Four separate build systems: Python (pip/poetry), Rust (cargo), Node.js (npm), and ESP-IDF for firmware
  • No unified Makefile, Taskfile, or justfile to abstract build commands
  • CLAUDE.md lists build commands but they are mixed with AI agent configuration
  • Docker support is mentioned in the pre-merge checklist but no docker-compose.yml for local development was found
  • The Rust workspace has 15 crates with a specific publishing order -- this dependency chain is documented but not automated

7.2 Testing Experience (Score: 72/100)

Strengths:

  • Rust workspace has 1,031+ tests with a single command: cargo test --workspace --no-default-features
  • Deterministic proof verification via python archive/v1/data/proof/verify.py with SHA-256 hash checking
  • Mobile app has comprehensive test coverage with tests for components, hooks, screens, services, stores, and utilities
  • Witness bundle verification with VERIFY.sh providing 7/7 pass/fail attestation

Issues Found:

  • No unified test runner across codebases
  • Python test command (python -m pytest tests/ -x -q) requires proper environment setup first
  • Mobile tests require additional setup (jest, React Native testing libraries)
  • No integration test suite that tests the full stack (API + WebSocket + Mobile)
  • No test coverage reporting configured for the Python codebase

7.3 Documentation Quality (Score: 62/100)

Strengths:

  • 43 Architecture Decision Records (ADRs) in docs/adr/
  • Domain-Driven Design documentation in docs/ddd/
  • Comprehensive hardware audit in ADR-028 with witness bundle
  • User guide at docs/user-guide.md

Issues Found:

  • No quickstart guide for first-time contributors
  • CLAUDE.md is 500+ lines but is primarily an AI agent configuration file, not a developer guide
  • No API reference documentation beyond the auto-generated Swagger (which is disabled in production)
  • No architecture diagram showing how the Python API, Rust core, mobile app, and ESP32 firmware interact
  • The pre-merge checklist references a changelog, but its location is not specified

7.4 Error Messages for Developers (Score: 70/100)

Strengths:

  • FastAPI validation errors return field-level details with type, message, and location
  • Rust crate errors use typed error types (wifi-densepose-core)
  • Middleware error handler includes traceback in development mode

Issues Found:

  • Python API errors in handlers use f-string formatting with raw exception messages: f"Pose estimation failed: {str(e)}". These are user-facing but contain internal details
  • No error code catalog or error reference documentation
  • Startup validation errors print checkmarks but do not provide remediation steps
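
Remediation steps could be attached to errors by exception type, as in the sketch below. The `DatabaseConnectionFailed` class, the `REMEDIATION_HINTS` mapping, and `format_cli_error` are all hypothetical; the hint text reuses the guidance suggested in section 5.2.

```python
from typing import Dict, Type


class DatabaseConnectionFailed(Exception):
    """Hypothetical exception standing in for a failed database connection."""


# Hypothetical mapping from exception type to operator guidance.
REMEDIATION_HINTS: Dict[Type[BaseException], str] = {
    DatabaseConnectionFailed: (
        "Check your DATABASE_URL setting, or run 'wifi-densepose db init' first."
    ),
    FileNotFoundError: "Check the path passed to --config.",
}


def format_cli_error(prefix: str, error: BaseException) -> str:
    """Append a remediation hint to an error line when one is known."""
    message = f"{prefix}: {error}"
    for exc_type, hint in REMEDIATION_HINTS.items():
        if isinstance(error, exc_type):
            return f"{message}\nHint: {hint}"
    return message
```

Errors without a known hint pass through unchanged, so the mapping can grow incrementally as common failure scenarios are identified.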

7.5 Configuration Management (Score: 68/100)

Strengths:

  • Pydantic Settings class with environment variable support
  • Configuration file loading via --config CLI flag
  • Database failsafe with SQLite fallback
  • Redis optional with graceful degradation

Issues Found:

  • No .env.example or .env.template file to guide environment variable setup
  • No configuration schema documentation beyond code inspection
  • Sensitive settings (database URL, JWT secret) are validated but error messages do not specify which environment variables to set
  • The config show command redacts secrets but does not explain where secrets should be configured

8. Hardware Integration UX Analysis

8.1 ESP32 Provisioning Flow (Score: 65/100)

The provision.py script in firmware/esp32-csi-node/ handles WiFi credential and mesh configuration:

Strengths:

  • Clear --help text with usage examples in the argparse epilog
  • Parameter validation: TDM slot/total must be specified together, channel ranges validated, MAC format validated
  • --dry-run option to generate binary without flashing
  • Fallback CSV generation when NVS binary generation fails, with manual flash instructions
  • Password masked in output: "WiFi Password: ****"
  • Multiple NVS generator discovery methods (Python module, ESP-IDF bundled script)

Issues Found:

  • No auto-detection of serial port. The --port is required, but users may not know which port their ESP32 is on. A --port auto option using serial.tools.list_ports would help
  • No verification step after flashing to confirm the provisioned values were written correctly
  • Error when esptool or nvs_partition_gen is not installed is a raw Python exception. A friendlier message like "Required tool 'esptool' not found. Install with: pip install esptool" would be better
  • The script name is provision.py but it is invoked as python firmware/esp32-csi-node/provision.py, which is a long path. A CLI subcommand like wifi-densepose hw provision would integrate better
  • 22 command-line arguments is overwhelming; grouped parameter presets (e.g., --profile basic, --profile mesh, --profile edge) would simplify common use cases
  • No interactive mode for guided provisioning
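
The friendlier missing-tool message suggested above can be sketched with the standard library's `shutil.which`. The helpers `find_tool` and `require_tool` are hypothetical, and the candidate executable names passed to them are an assumption about how the tool is installed on a given machine.

```python
import shutil
from typing import Optional, Sequence


def find_tool(candidates: Sequence[str]) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def require_tool(display_name: str, candidates: Sequence[str], install_hint: str) -> str:
    """Return the tool path, or exit with a readable message instead of a traceback."""
    path = find_tool(candidates)
    if path is None:
        raise SystemExit(
            f"Required tool '{display_name}' not found. Install with: {install_hint}"
        )
    return path
```

A call such as `require_tool("esptool", ["esptool.py", "esptool"], "pip install esptool")` placed before flashing would fail early with an actionable message rather than a raw exception midway through provisioning.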

8.2 Serial Monitoring (Score: 55/100)

Issues Found:

  • Serial monitoring is done via python -m serial.tools.miniterm COM7 115200, which is a raw tool with no structured log parsing
  • No custom monitoring tool that parses ESP32 output, highlights errors, or shows CSI data visualization
  • No documentation on what serial output to expect during normal operation vs error conditions
  • Baud rate (115200) must be known; no auto-baud detection

8.3 Firmware Update Process (Score: 60/100)

Issues Found:

  • Firmware flashing uses idf.py flash which requires the full ESP-IDF toolchain
  • No OTA (Over-The-Air) update workflow documented for field deployments
  • The ota_data_initial.bin is listed in the release process but OTA update instructions are not provided
  • No firmware version reporting from the device to verify the update was successful
  • 8MB and 4MB builds require different sdkconfig.defaults files with manual copying

9. Cross-Cutting Quality Concerns

9.1 Error Handling Quality Across Touchpoints (Score: 73/100)

| Touchpoint | Error Format | User Guidance | Recovery Path |
|---|---|---|---|
| API REST | Structured JSON with code, message, request_id | No documentation links | Retry logic needed by client |
| API WebSocket | JSON { type: "error", message: "..." } | Lists valid message types: No | Reconnect |
| CLI | Logger output to stderr | No remediation suggestions | Exit code 1 |
| Mobile | ErrorBoundary with retry, ConnectionBanner | Raw error messages | Retry button, reconnect |
| Provisioning | Python exceptions | Fallback CSV on failure | Manual flash instructions |

Key Gap: Error message styles differ between API (structured JSON) and CLI (logger strings). A unified error taxonomy would improve consistency.

9.2 Feedback Loops (Score: 72/100)

| Action | Feedback Mechanism | Timeliness | Quality |
|---|---|---|---|
| API request | HTTP status + response body | Immediate | Good |
| WebSocket connect | connection_established message | Immediate | Good |
| CLI start | Log messages to stdout | Real-time | Adequate |
| CLI stop | "Server stopped gracefully" | After completion | Good |
| Calibration start | Returns calibration_id and estimated_duration_minutes | Immediate | Incomplete (no progress stream) |
| Mobile connect | Banner color change | ~1s delay | Good |
| Firmware flash | print() statements | Real-time | Adequate |
| Settings save | No confirmation | Silent | Poor |
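
The calibration feedback gap could be closed with a step-level progress payload. The sketch below is hypothetical throughout: the `CalibrationStep` and `CalibrationProgress` types, the status values, and any step names are illustrative and do not reflect the existing calibration status response.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CalibrationStep:
    name: str    # hypothetical step name
    status: str  # "pending", "running", or "done"


@dataclass
class CalibrationProgress:
    calibration_id: str
    steps: List[CalibrationStep]

    @property
    def percent_complete(self) -> int:
        """Share of steps marked done, as a whole percentage."""
        if not self.steps:
            return 0
        done = sum(1 for step in self.steps if step.status == "done")
        return round(100 * done / len(self.steps))

    @property
    def current_step(self) -> Optional[str]:
        """Name of the first step that is not done, or None when finished."""
        for step in self.steps:
            if step.status != "done":
                return step.name
        return None
```

A payload of this shape would serve either a polling endpoint or a WebSocket progress stream, since each update carries both the overall percentage and the name of the step in progress.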

9.3 Recovery Paths (Score: 68/100)

| Failure Scenario | Recovery Path | Automated? | Documentation |
|---|---|---|---|
| Database connection fails | SQLite failsafe fallback | Yes | config failsafe command |
| Redis unavailable | Continues without Redis, logs warning | Yes | Mentioned in startup output |
| WebSocket disconnects | Exponential backoff reconnection, simulation fallback | Yes | Not documented |
| Stale PID file | Detected and cleaned up on start/stop | Yes | Not documented |
| API server crash | No automatic restart | No | No systemd/supervisor config |
| Mobile app crash | ErrorBoundary with retry | Partial | Not documented |
| Firmware flash fails | Fallback CSV with manual instructions | Partial | Inline help |
| Calibration fails | No documented recovery | No | Not documented |

9.4 Accessibility (Score: 45/100)

Issues Found:

  • Mobile app uses hardcoded hex colors throughout (e.g., '#0F141E', '#0F6B2A', '#8A1E2A') with no high-contrast mode support
  • No accessibilityLabel or accessibilityRole props on interactive components in the mobile app
  • ConnectionBanner relies on color alone to distinguish states (green/amber/red). The text labels (LIVE STREAM, SIMULATED DATA, DISCONNECTED) help, but there is no screen reader announcement on state change
  • CLI status output uses emoji (checkmarks, X marks, weather symbols) as semantic indicators with no text-only fallback
  • API documentation (when available) has no known accessibility testing
  • No ARIA landmarks or roles in the sensing server web UI (if any)
  • Font sizes are fixed in the mobile theme with no dynamic type/accessibility sizing support

10. Oracle Problems Detected

Oracle Problem 1 (HIGH): Production API Documentation vs Security

Type: User Need vs Business Need Conflict

  • User Need: API consumers need documentation to discover and integrate with endpoints
  • Business Need: Hiding Swagger/ReDoc in production reduces attack surface
  • Conflict: Disabling docs entirely (docs_url=None when is_production=True) leaves production API consumers without any discoverability mechanism

Failure Modes:

  1. Developers working against production endpoints cannot discover available APIs
  2. Third-party integrators have no self-service documentation
  3. Internal teams must maintain separate documentation that can drift from the actual API

Resolution Options:

| Option | User Score | Security Score | Recommendation |
|---|---|---|---|
| Keep docs disabled | 20 | 95 | Current state |
| Auth-gated docs endpoint | 85 | 80 | Recommended |
| Separate docs site from OpenAPI spec export | 90 | 90 | Best but more effort |
| Rate-limited docs with no auth | 70 | 60 | Compromise |
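
The core of the auth-gated option is a credential check in front of the docs route. The sketch below shows only that check, using the standard library and a hypothetical function name `docs_access_status`. It assumes a static bearer token and lowercased header keys; wiring it into the FastAPI application is not shown.

```python
import hmac
from typing import Mapping


def docs_access_status(headers: Mapping[str, str], expected_token: str) -> int:
    """Return the HTTP status a gated docs endpoint should answer with.

    200 = serve the docs, 401 = missing or malformed credentials,
    403 = wrong token, 404 = docs disabled because no token is configured.
    Header keys are assumed to be lowercased by the caller.
    """
    if not expected_token:
        return 404
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401
    # compare_digest avoids leaking the token through timing differences
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 403
    return 200
```

Returning 404 when no token is configured preserves the current production behavior by default, so the docs become reachable only after an operator opts in.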

Oracle Problem 2 (MEDIUM): Simulation Fallback vs Data Integrity

Type: User Experience vs Data Accuracy Conflict

  • User Need: The app should always show something; blank screens feel broken
  • Business Need: Users should know when they are seeing real vs simulated data
  • Conflict: Automatic simulation fallback means users may not realize they lost their real data feed

Failure Modes:

  1. Operator monitors "activity" that is actually simulated, missing real events
  2. MAT (Mass Casualty Assessment) screen shows simulated survivor data during a real incident
  3. Vitals screen displays simulated breathing/heart rate data, creating false confidence

Resolution Options:

| Option | UX Score | Safety Score | Recommendation |
|---|---|---|---|
| Current: auto-simulate with banner | 80 | 50 | Risky for safety-critical screens |
| Disable simulation on MAT/Vitals screens | 60 | 85 | Recommended |
| Prominent modal overlay for simulated mode | 70 | 80 | Good compromise |
| Require user confirmation to enter simulation | 55 | 90 | Safest |
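
The recommended and safest options can be combined into a single decision rule. The sketch below expresses that rule in Python for illustration only; the mobile app is TypeScript, and the names `Source`, `SAFETY_CRITICAL_SCREENS`, and `resolve_source` are hypothetical.

```python
from enum import Enum


class Source(Enum):
    LIVE = "live"
    SIMULATED = "simulated"
    UNAVAILABLE = "unavailable"


# Assumption: these labels stand for the MAT and Vitals screens discussed above.
SAFETY_CRITICAL_SCREENS = frozenset({"MAT", "Vitals"})


def resolve_source(screen: str, connected: bool, user_opted_in: bool = False) -> Source:
    """Decide which data source a screen may display.

    Safety-critical screens never fall back to simulation silently;
    they show an explicit unavailable state unless the user opted in.
    """
    if connected:
        return Source.LIVE
    if screen in SAFETY_CRITICAL_SCREENS and not user_opted_in:
        return Source.UNAVAILABLE
    return Source.SIMULATED
```

Under this rule, a disconnected MAT screen resolves to `Source.UNAVAILABLE` rather than simulated survivors, while the Live screen keeps its current fallback behavior.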

Oracle Problem 3 (MEDIUM): WebSocket Path Mismatch

Type: Missing Information / Implementation Inconsistency

  • Evidence: The mobile app's ws.service.ts constructs the WebSocket URL as /ws/sensing (line 104), while constants/websocket.ts defines WS_PATH = '/api/v1/stream/pose'. The API server serves WebSocket on /api/v1/stream/pose (stream router). These paths do not match.
  • Impact: The actual connection behavior depends on which path the sensing server uses (the lightweight Axum server may use /ws/sensing), but the inconsistency creates confusion and potential silent connection failures
  • Resolution: Align the WebSocket paths across the mobile app and server, or make the path configurable
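
One way to make the path configurable is to derive the WebSocket URL from a single path setting instead of hardcoding it at the call site. The following sketch illustrates the idea in Python; `build_ws_url` and `DEFAULT_WS_PATH` are hypothetical names, and the default simply reuses the WS_PATH value quoted above.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_WS_PATH = "/api/v1/stream/pose"  # the WS_PATH value quoted above


def build_ws_url(server_url: str, ws_path: str = DEFAULT_WS_PATH) -> str:
    """Derive a WebSocket URL from the configured server URL and one path setting."""
    parts = urlsplit(server_url)
    if parts.scheme not in ("http", "https", "ws", "wss"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("server URL has no host")
    # Secure HTTP maps to secure WebSocket; everything else maps to plain ws
    scheme = "wss" if parts.scheme in ("https", "wss") else "ws"
    path = "/" + ws_path.lstrip("/")
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

With a single source for the path, pointing the app at the sensing server becomes a configuration change, for example `build_ws_url(url, "/ws/sensing")`, rather than a code edit.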

11. Prioritized Recommendations

Priority 1 -- Critical (address before next release)

| # | Recommendation | Effort | Impact | Persona |
|---|---|---|---|---|
| 1.1 | Add auth-gated API documentation endpoint for production | Low | High | Developer, Operator |
| 1.2 | Resolve WebSocket path mismatch between ws.service.ts and constants/websocket.ts | Low | High | End-User |
| 1.3 | Disable automatic simulation fallback on MAT screen (safety-critical) | Low | High | End-User, Operator |
| 1.4 | Fix MainTabs.tsx inline arrow function causing unnecessary re-renders (line 130) | Low | Medium | End-User |
| 1.5 | Include structured error body in 429 rate limit responses using ErrorResponse format | Low | Medium | Developer |
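
As an illustration of recommendation 1.5, the sketch below builds a structured 429 response. The ErrorResponse schema is not reproduced in this section, so the field names (`code`, `type`, `message`, `request_id`) and the function `rate_limit_response` are assumptions, not the project's actual format. The Retry-After header is standard HTTP.

```python
import json
import math
from typing import Dict, Tuple


def rate_limit_response(
    limit: int, window_seconds: int, retry_after: float, request_id: str
) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a rate-limited request.

    The body field names are hypothetical stand-ins for the ErrorResponse format.
    """
    wait = max(1, math.ceil(retry_after))  # Retry-After takes whole seconds
    body = {
        "error": {
            "code": 429,
            "type": "rate_limit_exceeded",
            "message": (
                f"Rate limit of {limit} requests per {window_seconds}s exceeded. "
                f"Retry in {wait}s."
            ),
            "request_id": request_id,
        }
    }
    headers = {"Content-Type": "application/json", "Retry-After": str(wait)}
    return 429, headers, json.dumps(body)
```

Carrying the request ID in the body keeps 429 responses traceable in the same way as the other structured errors.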

Priority 2 -- High (next sprint)

| # | Recommendation | Effort | Impact | Persona |
|---|---|---|---|---|
| 2.1 | Add wifi-densepose init command to scaffold default configuration | Medium | High | Operator |
| 2.2 | Change default mobile serverUrl from localhost:3000 to empty string with first-run setup prompt | Medium | High | End-User |
| 2.3 | Add terminal capability detection to CLI for emoji/unicode fallback | Medium | Medium | Operator |
| 2.4 | Add calibration progress WebSocket stream or polling endpoint with step-by-step updates | Medium | Medium | Operator, Developer |
| 2.5 | Create a CONTRIBUTING.md with quickstart for each codebase | Medium | High | Developer |
| 2.6 | Map ErrorBoundary error messages to user-friendly strings | Low | Medium | End-User |
| 2.7 | Add loading timeout to LiveScreen WebView initialization | Low | Medium | End-User |

Priority 3 -- Medium (next quarter)

| # | Recommendation | Effort | Impact | Persona |
|---|---|---|---|---|
| 3.1 | Create unified Makefile or Taskfile for cross-codebase builds and tests | High | High | Developer |
| 3.2 | Add --port auto to provisioning script with serial port auto-detection | Medium | Medium | Operator |
| 3.3 | Add accessibility labels to mobile app interactive components | Medium | Medium | End-User |
| 3.4 | Create architecture diagram showing component interactions | Medium | High | Developer |
| 3.5 | Add .env.example file documenting all environment variables | Low | Medium | Developer, Operator |
| 3.6 | Implement wifi-densepose doctor for self-diagnosis | High | Medium | Operator |
| 3.7 | Add wifi-densepose logs command with filtering and formatting | Medium | Medium | Operator |
| 3.8 | Persist poseStore RSSI history for post-restart analysis | Medium | Low | End-User |
| 3.9 | Add provisioning parameter presets (--profile basic/mesh/edge) | Medium | Medium | Operator |
| 3.10 | Authenticate WebSocket before websocket.accept() | Low | Low | Developer |

12. Heuristic Scoring Summary

Problem Analysis (H1)

| Heuristic | Score | Finding |
|---|---|---|
| H1.1: Understand the Problem | 75/100 | The system addresses WiFi-based pose estimation well but the quality experience varies significantly across touchpoints. The core problem (sensing and display) is well-solved; the surrounding experience (setup, configuration, debugging) needs work. |
| H1.2: Identify Stakeholders | 70/100 | Three personas (developer, operator, end-user) are implicitly served but not explicitly designed for. The mobile app targets end-users well; the CLI targets operators adequately; developer experience is the weakest. |
| H1.3: Define Quality Criteria | 65/100 | Health checks define "healthy/degraded/unhealthy" but no SLA or quality thresholds are documented. Rate limits are configurable but default values are not justified. |
| H1.4: Map Failure Modes | 72/100 | Database failsafe, Redis degradation, and WebSocket reconnection cover major failure modes. Missing: calibration failure recovery, firmware flash failure recovery, mobile app state corruption. |

User Needs (H2)

| Heuristic | Score | Finding |
|---|---|---|
| H2.1: Task Completion | 78/100 | Core tasks (view live data, check vitals, manage zones) are completable. Setup tasks (install, configure, provision) have friction. |
| H2.2: Error Recovery | 68/100 | Some automated recovery (database failsafe, WebSocket reconnect). Missing recovery paths for calibration failure and firmware issues. |
| H2.3: Learning Curve | 60/100 | Steep onboarding across four codebases. No quickstart guide. Mobile app is the most intuitive touchpoint. |
| H2.4: Feedback Clarity | 72/100 | API provides structured feedback. CLI provides log-style feedback. Mobile provides visual feedback. Calibration progress is the biggest gap. |
| H2.5: Consistency | 70/100 | Error formats differ between API (JSON) and CLI (logger). Mobile is internally consistent. Naming conventions mostly aligned. |

Business Needs (H3)

| Heuristic | Score | Finding |
|---|---|---|
| H3.1: Reliability | 76/100 | Health checks, failsafes, and reconnection strategies demonstrate reliability focus. No documented SLAs or uptime targets. |
| H3.2: Security Posture | 72/100 | Authentication framework exists but JWT validation is not implemented. Rate limiting is configurable. Production docs are hidden. Secrets redacted in config output. |
| H3.3: Scalability | 68/100 | Multi-worker support, WebSocket connection management, per-endpoint rate limiting. No load testing results or capacity planning documented. |
| H3.4: Maintainability | 74/100 | Well-separated crates, clear module boundaries, typed interfaces. Pre-merge checklist ensures documentation updates. ADR process is mature. |

Balance (H4)

| Heuristic | Score | Finding |
|---|---|---|
| H4.1: UX vs Security | 65/100 | Production API docs disabled for security, but no alternative provided. Authentication errors are informative without leaking implementation details. |
| H4.2: Simplicity vs Capability | 68/100 | Provisioning script has 22 parameters. CLI has good grouping but missing convenience features. API has comprehensive endpoints. |
| H4.3: Consistency vs Flexibility | 72/100 | Error handling is structured but not uniform across touchpoints. Settings are flexible (env vars + config file + CLI flags). |

Impact (H5)

| Heuristic | Score | Finding |
|---|---|---|
| H5.1: Visible Impact (GUI/UX) | 76/100 | Mobile app provides clear visual states. CLI status output is detailed. API responses are informative. |
| H5.2: Invisible Impact (Performance) | 70/100 | cpu_percent(interval=1) in health check blocks for 1 second per request. Rate limiting uses async locks correctly. RingBuffer prevents memory growth. |
| H5.3: Safety Impact | 62/100 | MAT screen auto-simulation is a safety concern. Simulated vitals data could mislead operators. No data provenance indicator beyond the connection banner. |
| H5.4: Data Integrity | 72/100 | Pydantic validation on all inputs. Zone ID existence checks. Time range validation on historical queries. Deterministic proof verification for core pipeline. |
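
The one-second block noted under H5.2 could be reduced by caching the CPU sample instead of measuring on every request. The sketch below shows a small time-to-live cache around an injected sampling function. `CachedSample` is a hypothetical helper, and the sampler is passed in so that the example does not depend on psutil.

```python
import threading
import time
from typing import Callable, Optional


class CachedSample:
    """Cache an expensive measurement and reuse it for ttl seconds."""

    def __init__(
        self,
        sample_fn: Callable[[], float],
        ttl: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sample_fn = sample_fn
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Optional[float] = None
        self._taken_at = 0.0

    def get(self) -> float:
        """Return the cached value, re-sampling only when it has expired."""
        with self._lock:
            now = self._clock()
            if self._value is None or now - self._taken_at >= self._ttl:
                self._value = self._sample_fn()  # the only slow path
                self._taken_at = now
            return self._value
```

The refresh itself still blocks the caller that triggers it, so an async handler would likely run the sampler in a worker thread or refresh it from a background task.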

Creativity (H6)

| Heuristic | Score | Finding |
|---|---|---|
| H6.1: Novel Testing Approaches | 68/100 | Witness bundle verification is creative. Deterministic proof with SHA-256 is strong. No mutation testing or property-based testing. |
| H6.2: Alternative Perspectives | 65/100 | The simulation fallback is creative but creates oracle problems. Database failsafe is a pragmatic solution. |
| H6.3: Cross-Domain Insights | 70/100 | WiFi CSI for pose estimation is inherently cross-domain (RF + computer vision + IoT). The mobile app's GaussianSplat visualization is innovative. |

Methodology

This Quality Experience analysis was performed by examining source code across all touchpoints of the WiFi-DensePose system. Files analyzed include:

API Layer (8 files):

  • archive/v1/src/api/main.py -- FastAPI application setup, middleware configuration, exception handlers
  • archive/v1/src/api/routers/health.py -- Health check endpoints
  • archive/v1/src/api/routers/pose.py -- Pose estimation endpoints
  • archive/v1/src/api/routers/stream.py -- WebSocket streaming endpoints
  • archive/v1/src/api/websocket/connection_manager.py -- WebSocket connection lifecycle
  • archive/v1/src/api/dependencies.py -- Dependency injection, authentication, authorization
  • archive/v1/src/middleware/error_handler.py -- Error handling middleware
  • archive/v1/src/middleware/rate_limit.py -- Rate limiting middleware

CLI Layer (4 files):

  • archive/v1/src/cli.py -- Click CLI entry point
  • archive/v1/src/commands/start.py -- Server start command
  • archive/v1/src/commands/stop.py -- Server stop command
  • archive/v1/src/commands/status.py -- Server status command

Mobile Layer (17 files):

  • ui/mobile/src/screens/LiveScreen/index.tsx -- Live visualization screen
  • ui/mobile/src/screens/VitalsScreen/index.tsx -- Vitals monitoring screen
  • ui/mobile/src/screens/ZonesScreen/index.tsx -- Zone occupancy screen
  • ui/mobile/src/screens/MATScreen/index.tsx -- Mass casualty assessment screen
  • ui/mobile/src/screens/SettingsScreen/index.tsx -- Settings screen
  • ui/mobile/src/screens/SettingsScreen/ServerUrlInput.tsx -- Server URL configuration
  • ui/mobile/src/navigation/MainTabs.tsx -- Tab navigation
  • ui/mobile/src/components/ErrorBoundary.tsx -- Error boundary
  • ui/mobile/src/components/ConnectionBanner.tsx -- Connection status banner
  • ui/mobile/src/components/LoadingSpinner.tsx -- Loading indicator
  • ui/mobile/src/services/ws.service.ts -- WebSocket service
  • ui/mobile/src/services/api.service.ts -- HTTP API service
  • ui/mobile/src/stores/poseStore.ts -- Real-time data store
  • ui/mobile/src/stores/settingsStore.ts -- Persisted settings store
  • ui/mobile/src/utils/urlValidator.ts -- URL validation
  • ui/mobile/src/hooks/usePoseStream.ts -- Pose data stream hook
  • ui/mobile/src/constants/websocket.ts -- WebSocket constants

Hardware Layer (1 file):

  • firmware/esp32-csi-node/provision.py -- ESP32 provisioning script

The analysis applied 23 QX heuristics across 6 categories (Problem Analysis, User Needs, Business Needs, Balance, Impact, Creativity) and identified 3 oracle problems where quality criteria conflict across stakeholders.