docs/adr/ADR-102-edge-module-registry.md
The Cognitum app ecosystem publishes a canonical app store catalog at:
https://storage.googleapis.com/cognitum-apps/app-registry.json
As of v2.1.0 (2026-05-13) the registry advertises 105 cogs across 11 categories (health, security, building, retail, industrial, research, ai, swarm, signal, network, developer). Each entry carries id, name, category, version, description, size_kb, difficulty, sha256, binary_size, and a config[] schema describing the runtime parameters the appliance offers when installing the cog.
RuView today has no live awareness of this catalog. The README.md capability table is hand-curated, the dashboard UI surfaces only the capabilities its HTML hard-codes, and nothing in wifi-densepose-sensing-server references the registry. As a result, when Cognitum ships a new cog (the registry was last updated 6 days ago, so the cadence is fast), RuView stays unaware until someone manually edits the README. Customers running the RuView dashboard against a real appliance see a fixed set of 10 capabilities in the UI, while the appliance can actually install 105 cogs.
Today's [email protected] release (PRs #642 / #643, ADR-100, ADR-101) is the first cog this repo ships to that registry. We need the discovery side to match.
wifi-densepose-sensing-server will fetch app-registry.json on demand, cache it in process memory with a TTL, and serve it back through a new endpoint:

- `GET /api/v1/edge/registry`
- `GET /api/v1/edge/registry?refresh=1`: force-bypass the cache; every use is logged at debug level so abuse is visible.
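A minimal, framework-agnostic sketch of how a handler might read the refresh switch from the raw query string. The helper name `wants_refresh` is hypothetical, and treating any value other than `1` (including a bare `refresh`) as an ordinary cached read is an assumption; the ADR only defines `?refresh=1`.

```rust
/// Returns true when the query string asks to bypass the registry cache.
/// Hypothetical helper: only the exact pair `refresh=1` counts as a refresh.
pub fn wants_refresh(query: Option<&str>) -> bool {
    query
        .unwrap_or("")
        .split('&')
        // Pairs without '=' (e.g. a bare `refresh`) are ignored.
        .filter_map(|pair| pair.split_once('='))
        .any(|(key, value)| key == "refresh" && value == "1")
}
```

Keeping the check this strict means a typo such as `?refresh=true` falls back to the cached path rather than silently hitting upstream.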
The registry is passively surfaced, not modified. RuView is a presentation layer for the canonical Cognitum catalog; it never re-signs entries or re-hosts binaries.
The implementation lives in `v2/crates/wifi-densepose-sensing-server/src/edge_registry.rs` (small, ~150 lines):
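```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

pub struct EdgeRegistry {
    cached: RwLock<Option<CachedEntry>>, // None until the first successful fetch
    ttl: Duration,                       // --edge-registry-ttl-secs (default 3600)
    upstream_url: String,                // --edge-registry-url
}

#[derive(Clone)] // snapshot reads clone the entry out of the lock (T12)
struct CachedEntry {
    payload: serde_json::Value, // upstream JSON as an untyped tree (T7)
    fetched_at: Instant,        // compared against `ttl`
    upstream_sha256: String,    // hex SHA-256 of the raw payload bytes
}
```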
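The read path for this struct, following the cache semantics listed next, can be modelled with std alone. This is a simplified sketch under stated assumptions, not the code in `edge_registry.rs`: `RegistryCache`, `Snapshot`, and `get` are hypothetical names, the payload is a `String` instead of a `serde_json::Value`, and the upstream call (`fetch`) and clock (`now`) are injected so the fresh, expired, forced-refresh, and stale-on-error paths can be exercised without a network.

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// What a read hands back: the payload plus the metadata the response echoes.
#[derive(Clone, Debug)]
pub struct Snapshot {
    pub payload: String,
    pub fetched_at: Instant,
    pub stale: bool,
}

/// Hypothetical stand-in for `EdgeRegistry` (payload stored as a String).
pub struct RegistryCache {
    cached: RwLock<Option<(String, Instant)>>,
    ttl: Duration,
}

impl RegistryCache {
    pub fn new(ttl: Duration) -> Self {
        Self { cached: RwLock::new(None), ttl }
    }

    /// `fetch` stands in for the upstream HTTP call and `now` for the clock.
    pub fn get<F>(&self, force_refresh: bool, now: Instant, fetch: F) -> Result<Snapshot, String>
    where
        F: FnOnce() -> Result<String, String>,
    {
        // Fast path: serve a fresh entry unless the caller forced a refresh.
        if !force_refresh {
            let guard = self.cached.read().unwrap();
            if let Some((payload, fetched_at)) = &*guard {
                if now.duration_since(*fetched_at) < self.ttl {
                    return Ok(Snapshot { payload: payload.clone(), fetched_at: *fetched_at, stale: false });
                }
            }
        } // the read guard is released here, before the upstream call

        match fetch() {
            Ok(payload) => {
                *self.cached.write().unwrap() = Some((payload.clone(), now));
                Ok(Snapshot { payload, fetched_at: now, stale: false })
            }
            // Upstream failed: fall back to the last good payload, flagged as
            // stale once it is past the TTL; with nothing cached, propagate.
            Err(err) => match &*self.cached.read().unwrap() {
                Some((payload, fetched_at)) => Ok(Snapshot {
                    payload: payload.clone(),
                    fetched_at: *fetched_at,
                    stale: now.duration_since(*fetched_at) >= self.ttl,
                }),
                None => Err(err),
            },
        }
    }
}
```

One design choice worth flagging: the sketch releases the read lock before calling upstream and takes the write lock only to store the result, so readers are never blocked by a slow fetch, but two concurrent misses can both reach upstream. Holding the write lock across `fetch` would serialise fetches at the cost of blocking readers for up to the 10-second timeout.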
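Cache semantics:

- The first request fetches upstream and caches the payload; requests within the TTL are served from memory, and the first request after the TTL expires re-fetches.
- `?refresh=1` bypasses the cache but writes a debug log so accidental abuse is visible.
- If the upstream fetch fails while a previous payload is cached, the endpoint serves that payload with a `stale: true` marker in the response and a 200 status (preserving the UI), not a 5xx.

Response shape:

```jsonc
{
  "fetched_at": 1779200000,  // server-side fetch timestamp (unix seconds)
  "ttl_seconds": 3600,
  "stale": false,            // true when serving past TTL because upstream is down
  "upstream_url": "https://storage.googleapis.com/cognitum-apps/app-registry.json",
  "upstream_sha256": "<sha256-of-payload-bytes>",
  "registry": { /* full canonical JSON as returned upstream */ }
}
```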
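The `registry` field inlines the upstream JSON so consumers don't need to make a second hop. Because it is re-serialised from the parsed `serde_json::Value`, key order and whitespace can differ from the upstream file (serde_json's map sorts keys unless its `preserve_order` feature is enabled). `upstream_sha256` is computed over the raw upstream bytes, so a paranoid consumer should compare it against a pinned hash of the upstream file, not a hash of this field.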
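On the consumer side, pinning reduces to comparing two hex digests. A sketch, assuming the pinned value is supplied out of band (for example by the operator's monitoring); `check_pin` and `PinCheck` are hypothetical names. A constant-time comparison is unnecessary because neither value is secret.

```rust
/// Result of comparing the endpoint's `upstream_sha256` with a pinned digest.
#[derive(Debug, PartialEq)]
pub enum PinCheck {
    Match,
    Drift,
    Malformed,
}

/// True for exactly 64 hex characters (a SHA-256 digest), in either case.
fn is_sha256_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Malformed input is reported separately so a broken pin is not mistaken for drift.
pub fn check_pin(reported: &str, pinned: &str) -> PinCheck {
    if !is_sha256_hex(reported) || !is_sha256_hex(pinned) {
        return PinCheck::Malformed;
    }
    if reported.eq_ignore_ascii_case(pinned) {
        PinCheck::Match
    } else {
        PinCheck::Drift
    }
}
```

An alerting job would fire on `Drift` and treat `Malformed` as a misconfiguration of the pin itself.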
`binary_sha256` + `binary_signature` (ADR-100) are the trust roots for installs. The registry itself is not signed today.

New page `ui/edge-modules.html` renders the registry into category sections with cog cards. Each card links out to the Cognitum V0 appliance's `/cogs` page (`http://cognitum-v0:9000/cogs#<id>`) for the install action; RuView itself never installs.
The existing dashboard's "Capabilities" section continues to show RuView-native sensing capabilities (presence, breathing, pose, etc. — the things RuView itself runs); the new edge-modules page shows the broader Cognitum cog catalog. The two are distinct surfaces and shouldn't be merged.
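The page's grouping and deep-link logic is small. It is sketched here in Rust to match the rest of this ADR, although the page would implement the equivalent in its own script. `CogEntry`, `group_by_category`, and `install_link` are hypothetical names; `CogEntry` keeps only three of the registry fields, the appliance base URL is a parameter rather than the hard-coded `http://cognitum-v0:9000`, and cog ids are assumed to be fragment-safe (no percent-encoding is done).

```rust
use std::collections::BTreeMap;

/// Minimal subset of a registry entry; real entries carry more fields.
pub struct CogEntry {
    pub id: String,
    pub name: String,
    pub category: String,
}

/// Deep link to the appliance's install page, following `<base>/cogs#<id>`.
pub fn install_link(appliance_base: &str, cog_id: &str) -> String {
    format!("{}/cogs#{}", appliance_base.trim_end_matches('/'), cog_id)
}

/// Groups cogs into category sections: categories sorted by name (BTreeMap),
/// cogs within each section sorted by display name.
pub fn group_by_category(entries: &[CogEntry]) -> BTreeMap<&str, Vec<&CogEntry>> {
    let mut sections: BTreeMap<&str, Vec<&CogEntry>> = BTreeMap::new();
    for entry in entries {
        sections.entry(entry.category.as_str()).or_default().push(entry);
    }
    for cogs in sections.values_mut() {
        cogs.sort_by(|a, b| a.name.cmp(&b.name));
    }
    sections
}
```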
| Scenario | Behaviour |
|---|---|
| Upstream returns 200 with valid JSON | Cache it, return it. |
| Upstream returns 200 with invalid JSON | Treat as failure; serve stale if available else 503. Log the upstream sha + the parse error. |
| Upstream returns 4xx / 5xx | Same as JSON-invalid: serve stale if available else 503. |
| TLS / DNS / timeout error | Same. |
| Upstream is permanently moved | Operator repoints the endpoint with the `--edge-registry-url` CLI flag. No code change is required to migrate registries. |
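The table collapses to one rule: every failure mode gets the same response. A sketch of that mapping, assuming a hypothetical `UpstreamOutcome` classification (the ADR's fetcher reports failures as `FetcherError`, whose variants beyond `Network` are not specified here) and a flag saying whether an earlier payload is cached. `Reply` and `reply_for` are likewise hypothetical names.

```rust
/// Result of one upstream attempt (hypothetical classification).
pub enum UpstreamOutcome {
    /// HTTP 200 and the body parsed as JSON.
    ValidJson,
    /// HTTP 200 but the body did not parse.
    InvalidJson,
    /// Any 4xx / 5xx status.
    HttpStatus(u16),
    /// TLS, DNS, or timeout failure before a status was received.
    Transport,
}

/// Status code the handler sends plus the `stale` flag in the body.
#[derive(Debug, PartialEq)]
pub struct Reply {
    pub status: u16,
    pub stale: bool,
}

/// Every non-success outcome is handled identically: serve the cached
/// payload marked stale if one exists, otherwise answer 503.
pub fn reply_for(outcome: &UpstreamOutcome, have_cached_copy: bool) -> Reply {
    match outcome {
        UpstreamOutcome::ValidJson => Reply { status: 200, stale: false },
        UpstreamOutcome::InvalidJson
        | UpstreamOutcome::HttpStatus(_)
        | UpstreamOutcome::Transport => {
            if have_cached_copy {
                Reply { status: 200, stale: true }
            } else {
                Reply { status: 503, stale: false }
            }
        }
    }
}
```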
Configuration flags:

- `--edge-registry-url <URL>`: override the upstream URL (default: `https://storage.googleapis.com/cognitum-apps/app-registry.json`).
- `--edge-registry-ttl-secs <N>`: override the cache TTL (default: 3600).
- `--no-edge-registry`: disable the endpoint entirely (it returns 404). For air-gapped deployments.

Consequences:

- The server gains an outbound dependency on GCS; deployments that must stay offline opt out with `--no-edge-registry`.
- When upstream is unreachable, RuView keeps serving the last cached payload with `stale: true` + `fetched_at` in the response so the UI can render a "registry possibly out of date" badge.
- If the GCS bucket `cognitum-apps` is deleted or replaced, the endpoint goes dark. The `--edge-registry-url` flag lets operators repoint without a code change. Long-term, RuView could mirror the registry into its own GCS bucket if the relationship requires it.

A review of the attack surface this endpoint introduces:
| # | Threat | Mitigation in this ADR |
|---|---|---|
| T1 | SSRF — operator-supplied --edge-registry-url redirects fetches to an internal target | Flag is operator-only (CLI / env) — there is no API endpoint to mutate it at runtime. Operators are already trusted (they control the binary). |
| T2 | Outbound dependency reveals deployment — a passive observer of the egress sees the appliance phoning home to GCS | Documented in the docstring + the runtime startup log. Operators wanting offline deployments use --no-edge-registry. |
| T3 | Malicious upstream registry — Cognitum's GCS bucket is breached and a poisoned app-registry.json is served | Two layers absorb this: (a) the registry's role is discovery only — installs verify the per-cog binary_sha256 + binary_signature (ADR-100); a wrong description string can mislead a human, but a wrong binary still has to pass Ed25519 against COGNITUM_OWNER_SIGNING_KEY. (b) The endpoint exposes upstream_sha256 so a paranoid operator can pin the expected registry hash externally and alert on drift. |
| T4 | Response inflation — upstream returns a multi-GB payload to exhaust memory | MAX_PAYLOAD_BYTES = 8 MiB cap (current registry is ~50–200 KB). Exceeding cap returns an error without buffering past the cap. |
| T5 | Slow upstream blocking server threads — Slowloris-style stall on the fetch | 10-second wire timeout via ureq::AgentBuilder. Per-handler fetch runs inside tokio::task::spawn_blocking so a stalled fetch never blocks the async runtime. |
| T6 | Denial via ?refresh=1 abuse — unauthenticated callers force-bypass the cache repeatedly | Cache lives in process; ?refresh=1 triggers a single upstream fetch behind a synchronous code path. A flood of refresh requests is rate-limited by the upstream's own throttling (GCS) and locally serialised by Rust's RwLock. Refresh requests are logged at debug so abuse is visible. Follow-up: add per-IP rate-limit middleware if seen abused (separate PR; tracked in #574-style follow-up). |
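| T7 | JSON deserialisation panics — malformed registry triggers a Rust panic | The payload is parsed with serde_json into an untyped serde_json::Value, never into a strongly-typed struct. serde_json reports malformed input as an Err rather than panicking, and the untyped tree also tolerates fields being added or removed upstream. Failure is propagated as FetcherError::Network, which the handler maps to 503 (or to a stale 200 when a cached copy exists). |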
| T8 | Stale-on-error masks outages from operators | Response carries stale: true + fetched_at (unix timestamp). UI rendering MUST surface this badge — encoded as an explicit field, not an implicit silence. |
| T9 | TLS downgrade / MITM on the fetch | ureq is built with the tls feature (rustls) by default. No --insecure flag exists. The server certificate is validated by rustls against the root store bundled with ureq's default TLS configuration (webpki-roots). Certificate pinning is out of scope because it would break every time the bucket's certificates rotate. |
| T10 | Unauthenticated access exposes ‘what cogs exist’ | The registry is canonical-public information (already public-read on GCS via anonymous HTTP GET). Surfacing it on a local LAN HTTP API does not increase its disclosure. The endpoint stays under the project's existing RUVIEW_API_TOKEN Bearer auth — when set, the registry is gated like other /api/v1/* routes. |
| T11 | Configuration injection via env var — RUVIEW_EDGE_REGISTRY_URL set to a malicious URL by an attacker who controls the process environment | If an attacker controls the env, they own the process; this is not a new threat surface. Documented in the CLI help. |
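| T12 | Cache mutation across threads / poisoning | The cache is RwLock<Option<CachedEntry>>. Writes go through cached.write() once per fetch. Snapshot reads clone() the CachedEntry so concurrent readers don't share mutable state. serde_json::Value is not reference-counted, so the clone is a deep copy of the tree; at the current ~50–200 KB registry size that is cheap enough, and storing the payload behind an Arc would make snapshots O(1) if it ever shows up in profiles. Tests cover the multi-call path; no unsafe is used. |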
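T4's cap can be enforced with `std::io::Read::take`, which stops reading after a fixed byte count. A minimal sketch: `read_capped` is a hypothetical helper, `MAX_PAYLOAD_BYTES` mirrors the 8 MiB figure from the table, and reading one byte past the limit is what distinguishes a body of exactly 8 MiB from an oversize one.

```rust
use std::io::{self, Read};

/// Cap from the T4 row above (8 MiB).
pub const MAX_PAYLOAD_BYTES: u64 = 8 * 1024 * 1024;

/// Reads at most `limit + 1` bytes; anything longer than `limit` is rejected
/// without the rest of the body ever being buffered.
pub fn read_capped<R: Read>(body: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    body.take(limit + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("registry payload exceeds {limit} bytes"),
        ));
    }
    Ok(buf)
}
```

With ureq 2.x the response body is available as a reader via `Response::into_reader()`, so it could be passed to a helper like this before the bytes are hashed and handed to serde_json.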
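Known gaps:

- Registry signing: the registry is unsigned today. If a detached `.json.sig` signature is published alongside it, a follow-up ADR will require it. Until then the per-cog binary signature (ADR-100) is the actual trust root for installs; the registry is metadata.
- No local rate limit on `?refresh=1`: it relies on the upstream's own throttling. If we see abuse we'll add a token-bucket middleware; not needed for v0.0.1.

Tests:

| Test | What it verifies |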
|---|---|
| `first_call_hits_upstream_and_caches` | Single fetch, then cache hit |
| `ttl_expiry_triggers_refetch` | Cache TTL bound respected |
| `force_refresh_bypasses_fresh_cache` | `?refresh=1` semantics |
| `stale_serve_on_upstream_failure_after_cached_success` | T8 explicit (`stale: true` returned) |
| `no_cache_no_upstream_returns_error` | T3/T5: error propagated cleanly when there is nothing to fall back on |
| `upstream_invalid_json_is_treated_as_error` | T7: malformed payload doesn't panic |
| `upstream_sha256_is_deterministic` | T3: hash field is reliable for external pinning |
All 7 tests in src/edge_registry.rs::tests pass.
Follow-ups:

- Build `ui/edge-modules.html` and link it from `index.html`.
- In `README.md`, replace the hand-curated capability table with a small "see the appliance for the full catalog" pointer.

Related:

- `docs/benchmarks/pose-estimation-cog.md`: the per-cog benchmark format this ADR's response shape complements.