
management/handlers + wiring — HTTP API + gRPC delivery

docs/agent-networks/modules/22-management-handlers-wiring.md



Risk level: Medium. The surface is mostly additive, but two changes are load-bearing: injectAllProxyPolicies runs on every per-peer compute, and shallowCloneMapping must round-trip Private (a missed field silently breaks every MODIFIED live update).

Backward-compat impact: Additive on the wire (new routes, new RPCs, new proto fields, a new gorm column on AccessLogEntry). One management-internal break: nbhttp.NewAPIHandler gains a trailing agentNetworkManager parameter; nil is tolerated and silently skips route registration.

Module boundary

This module is the seam between the public Agent Network HTTP API and the proxy fleet that serves agent traffic. North side: a /api/agent-network/* surface (providers, policies, guardrails, budget rules, settings, consumption) on the existing gorilla router, delegating to agentnetwork.Manager. Handlers are thin: they translate between api.* and types.*, validate request shape, and forward to the manager. RBAC and event emission stay inside the manager (manager.go:680-682).
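To make the "thin handler" contract concrete, here is a minimal Go sketch of a create handler that extracts the caller, decodes the wire type, checks shape only, and forwards to the manager. Every name in it (providersHandler, providerManager, errPermissionDenied, ctxKey, the JSON fields) is hypothetical; the real handlers use nbcontext.GetUserAuthFromContext, gorilla/mux and the generated api.* types, none of which are reproduced here.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// ctxKey and userAuth are hypothetical stand-ins for nbcontext's UserAuth plumbing.
type ctxKey struct{}

type userAuth struct {
	AccountID string
	UserID    string
}

// apiProvider plays the wire type (api.*); provider plays the domain type (types.*).
type apiProvider struct {
	Name string `json:"name"`
	Kind string `json:"kind"`
}

type provider struct {
	ID, AccountID, Name, Kind string
}

var errPermissionDenied = errors.New("permission denied")

// providerManager is a hypothetical slice of agentnetwork.Manager.
// RBAC and event emission happen behind this interface, never in the handler.
type providerManager interface {
	CreateProvider(ctx context.Context, accountID, userID string, p *provider) (*provider, error)
}

type providersHandler struct{ mgr providerManager }

func (h *providersHandler) createProvider(w http.ResponseWriter, r *http.Request) {
	auth, ok := r.Context().Value(ctxKey{}).(userAuth)
	if !ok {
		http.Error(w, "missing user auth", http.StatusUnauthorized)
		return
	}
	var req apiProvider
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	if strings.TrimSpace(req.Name) == "" { // shape check only; business rules live in the manager
		http.Error(w, "name is required", http.StatusBadRequest)
		return
	}
	created, err := h.mgr.CreateProvider(r.Context(), auth.AccountID, auth.UserID,
		&provider{Name: req.Name, Kind: req.Kind}) // api.* -> types.*
	switch {
	case errors.Is(err, errPermissionDenied):
		http.Error(w, err.Error(), http.StatusForbidden)
		return
	case err != nil:
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(apiProvider{Name: created.Name, Kind: created.Kind}) // types.* -> api.*
}

// fakeManager allows or denies every call, like an always-allow/always-deny permissions mock.
type fakeManager struct{ allow bool }

func (f fakeManager) CreateProvider(_ context.Context, accountID, _ string, p *provider) (*provider, error) {
	if !f.allow {
		return nil, errPermissionDenied
	}
	p.ID, p.AccountID = "prov-1", accountID
	return p, nil
}

// post runs createProvider with an authenticated request and returns the status code.
func post(mgr providerManager, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/agent-network/providers", strings.NewReader(body))
	req = req.WithContext(context.WithValue(req.Context(), ctxKey{}, userAuth{AccountID: "acc-1", UserID: "user-1"}))
	rec := httptest.NewRecorder()
	(&providersHandler{mgr: mgr}).createProvider(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(post(fakeManager{allow: true}, `{"name":"openai","kind":"openai"}`)) // 200
	fmt.Println(post(fakeManager{allow: true}, `{"name":"  "}`))                    // 400
	fmt.Println(post(fakeManager{allow: false}, `{"name":"openai"}`))               // 403
}
```

The handler never decides whether a caller may act; it only maps the manager's answer (here a sentinel error) onto an HTTP status, which is what keeps RBAC in one place.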

South side: ProxyServiceServer (proxy.go) learns to:

  • ship synth services to a proxy on initial snapshot;
  • resolve agent-network domains in getServiceByDomain for OIDC/session/tunnel-peer flows;
  • gate LLM requests via CheckLLMPolicyLimits + RecordLLMUsage;
  • preserve Private through shallowCloneMapping so per-proxy live updates don't silently flip services public.

The network_map controller prepends synth services to account.Services on every per-peer compute, and accesslogentry.go gains an indexed AgentNetwork column so the dashboard can filter cheaply.

Files

| Path | Role |
|---|---|
| handlers/agentnetwork/providers_handler.go | Catalog + provider CRUD + central AddEndpoints |
| handlers/agentnetwork/policies_handler.go | Policy CRUD + shared validatePolicy* |
| handlers/agentnetwork/guardrails_handler.go | Guardrail CRUD |
| handlers/agentnetwork/budget_handler.go | Account-level budget rule CRUD |
| handlers/agentnetwork/settings_handler.go | GET (200 + null if unbootstrapped) + PUT toggles |
| handlers/agentnetwork/consumption_handler.go | Read-only consumption rows |
| handlers/agentnetwork/handlers_test.go | Real-store fixture; wire round-trip + validation |
| handlers/agentnetwork/budget_handler_test.go | Budget-rule + settings toggles |
| server/http/handler.go | New agentNetworkManager arg; conditional AddEndpoints |
| server/permissions/modules/module.go | New AgentNetwork module key |
| internals/server/boot.go | Wires synthesiser adapter + limits service into proxy server |
| internals/server/modules.go | AgentNetworkManager() lazy-create node |
| internals/controllers/network_map/controller/controller.go | injectAllProxyPolicies replaces 4 InjectProxyPolicies calls |
| internals/controllers/network_map/controller/repository.go | SynthesizeAgentNetworkServices repo method |
| internals/modules/reverseproxy/service/service.go | MiddlewareConfig, capture limits, AgentNetwork, DisableAccessLog + proto |
| internals/modules/reverseproxy/accesslogs/accesslogentry.go | Indexed AgentNetwork bool from proto |
| internals/shared/grpc/proxy.go | Synth wiring, 2 RPCs, domain fallback, Private in clone |
| internals/shared/grpc/proxy_clone_test.go | Locks every ProxyMapping field minus AuthToken |
| server/activity/codes.go | 13 new activity codes (125-137) |

HTTP routes added

All routes inherit the platform's auth middleware and are mounted under /api. Permissions are enforced inside agentnetwork.Manager.requirePermission (manager.go:680-682) against modules.AgentNetwork. The Perm column shows the operation passed to requirePermission (read = Read, create = Create, and so on).

| Method | Path | Perm | Handler |
|---|---|---|---|
| GET | /agent-network/catalog/providers | authn only | providers_handler.go:43 |
| GET | /agent-network/providers | read | providers_handler.go:57 |
| POST | /agent-network/providers | create | providers_handler.go:97 |
| GET | /agent-network/providers/{providerId} | read | providers_handler.go:77 |
| PUT | /agent-network/providers/{providerId} | update | providers_handler.go:132 |
| DELETE | /agent-network/providers/{providerId} | delete | providers_handler.go:172 |
| GET | /agent-network/policies | read | policies_handler.go:32 |
| POST | /agent-network/policies | create | policies_handler.go:72 |
| GET | /agent-network/policies/{policyId} | read | policies_handler.go:52 |
| PUT | /agent-network/policies/{policyId} | update | policies_handler.go:102 |
| DELETE | /agent-network/policies/{policyId} | delete | policies_handler.go:142 |
| GET | /agent-network/guardrails | read | guardrails_handler.go:25 |
| POST | /agent-network/guardrails | create | guardrails_handler.go:65 |
| GET | /agent-network/guardrails/{guardrailId} | read | guardrails_handler.go:45 |
| PUT | /agent-network/guardrails/{guardrailId} | update | guardrails_handler.go:95 |
| DELETE | /agent-network/guardrails/{guardrailId} | delete | guardrails_handler.go:135 |
| GET | /agent-network/budget-rules | read | budget_handler.go:24 |
| POST | /agent-network/budget-rules | create | budget_handler.go:64 |
| GET | /agent-network/budget-rules/{ruleId} | read | budget_handler.go:44 |
| PUT | /agent-network/budget-rules/{ruleId} | update | budget_handler.go:95 |
| DELETE | /agent-network/budget-rules/{ruleId} | delete | budget_handler.go:135 |
| GET | /agent-network/settings | read | settings_handler.go:53 (200 + null if no row) |
| PUT | /agent-network/settings | update | settings_handler.go:27 |
| GET | /agent-network/consumption | read | consumption_handler.go:21 |

gRPC RPCs added (or modified)

| RPC | Direction | Trigger |
|---|---|---|
| CheckLLMPolicyLimits | proxy→mgmt, unary | Pre-flight gate; returns allow/deny, selected policy, attribution group, window, deny code+reason (proxy.go:259-301). Returns Unimplemented when the limits service is nil. |
| RecordLLMUsage | proxy→mgmt, unary | Post-flight write of tokens+cost against the policy-window dimensions and every applicable account budget rule (proxy.go:303-349). window_seconds==0 ⇒ no policy cap; only the account fan-out runs. |
| GetMappingUpdate/SendServiceUpdate (stream) | mgmt→proxy | Snapshot (proxy.go:752-780) now appends SynthesizeServicesForCluster. Live updates use SendServiceUpdateToCluster + shallowCloneMapping. |

Architecture & flow

HTTP request lifecycle

```mermaid
sequenceDiagram
    participant DB as Dashboard
    participant R as gorilla.Router (/api)
    participant H as handler (agentnetwork)
    participant M as agentnetwork.Manager
    participant S as store.Store
    participant AM as accountManager (StoreEvent)

    DB->>R: POST /api/agent-network/providers
    R->>H: createProvider (auth mw sets UserAuth)
    H->>H: GetUserAuthFromContext + validate(req)
    H->>M: CreateProvider(userID, provider, bootstrapCluster)
    M->>M: requirePermission(AgentNetwork, Create)
    M->>S: SaveAgentNetworkProvider
    M->>AM: StoreEvent(AgentNetworkProviderCreated)
    M-->>H: created provider
    H-->>DB: 200 + api.AgentNetworkProvider JSON
```

Synth-service delivery via gRPC

```mermaid
sequenceDiagram
    participant P as Proxy
    participant G as ProxyServiceServer
    participant SM as service.Manager (persisted)
    participant SA as synthesizerAdapter
    participant AN as SynthesizeServicesForCluster
    participant ST as store.Store

    Note over P,G: Initial snapshot
    P->>G: GetMappingUpdate (stream open)
    G->>SM: GetServicesForCluster(conn.address)
    SM-->>G: persisted []*Service
    G->>SA: SynthesizeServicesForCluster(conn.address)
    SA->>AN: SynthesizeServicesForCluster(store, clusterAddr)
    AN->>ST: walk every account; read providers/policies/settings
    AN-->>SA: in-memory []*Service
    SA-->>G: []*Service
    G->>P: response (persisted + synth)

    Note over G,P: Per-request live update
    G->>G: SendServiceUpdateToCluster(update, clusterAddr)
    %% Private MUST survive the clone
    G->>G: shallowCloneMapping(update)
    G->>P: response with single mapping
```

End-to-end: an HTTP write persists rows and emits an activity event; the manager then triggers proxyController.SendServiceUpdate so proxies re-render. Of the two stream paths, only the snapshot calls into the synthesiser: on stream open it pulls persisted services, then appends the synth services for the cluster. Synth services are never persisted. For OIDC/session/tunnel-peer flows, getServiceByDomain falls back to SynthesizeServicesForCluster(clusterFromDomain(domain)) when the persisted lookup misses (proxy.go:1763-1793). The network_map contribution is orthogonal: each per-peer compute prepends the same synth services to account.Services before InjectProxyPolicies.

Permissions model added

  • permissions/modules/module.go:22 adds AgentNetwork Module = "agent_network", registered in All (module.go:42), with the standard operations.{Read,Create,Update,Delete} matrix.
  • Handlers don't call permissionsManager directly — they extract UserAuth and delegate to agentnetwork.Manager, which gates every mutation through requirePermission (manager.go:168, 308, 549, etc.). Confirm your role-set provider has agent_network rows for owner/admin/user/billing-admin before merging.
  • getCatalogProviders (providers_handler.go:43) intentionally skips RBAC — catalog is global static data.
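Because handlers delegate every check to requirePermission, a role-set provider that lacks agent_network rows would most likely fail closed (every request denied) rather than erroring at boot. A minimal pre-merge check in Go, assuming a hypothetical in-memory role → module → operation matrix; the real permissions package has its own types and operations constants, the role names come from the checklist above, and the example grants are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

type module string
type operation string

const agentNetwork module = "agent_network"

// Hypothetical string values; the real code uses operations.{Read,Create,Update,Delete}.
const (
	opRead   operation = "read"
	opCreate operation = "create"
	opUpdate operation = "update"
	opDelete operation = "delete"
)

// rolePerms is a hypothetical role -> module -> operation matrix.
type rolePerms map[string]map[module]map[operation]bool

// missingModule lists (sorted) the roles that have no row at all for m.
func missingModule(perms rolePerms, roles []string, m module) []string {
	var missing []string
	for _, r := range roles {
		if _, ok := perms[r][m]; !ok { // indexing a nil inner map is safe and yields ok == false
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	return missing
}

// allowed mirrors a requirePermission-style lookup; absent entries deny.
func allowed(perms rolePerms, role string, m module, op operation) bool {
	return perms[role][m][op]
}

func main() {
	full := map[operation]bool{opRead: true, opCreate: true, opUpdate: true, opDelete: true}
	perms := rolePerms{
		"owner": {agentNetwork: full},
		"admin": {agentNetwork: full},
		"user":  {agentNetwork: {opRead: true}},
	}
	roles := []string{"owner", "admin", "user", "billing-admin"}
	fmt.Println(missingModule(perms, roles, agentNetwork))      // [billing-admin]
	fmt.Println(allowed(perms, "user", agentNetwork, opCreate)) // false
}
```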

Activity codes added

activity/codes.go:244-274 adds activities 125-137 plus their string/code mappings (codes.go:428-444). Codes follow the <domain>.<resource>.<action> convention (e.g., agent_network.provider.create). Audit-log exporters and SIEM forwarders need to know the new codes.
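A SIEM forwarder that routes on the <domain>.<resource>.<action> convention could split codes as below. This is a minimal Go sketch; splitActivityCode and isAgentNetworkActivity are hypothetical helpers, and since the only code string given in this module is agent_network.provider.create, no numeric ID is mapped to a specific string here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitActivityCode splits a code such as "agent_network.provider.create"
// into its domain, resource and action parts; anything else is rejected.
func splitActivityCode(code string) (domain, resource, action string, ok bool) {
	parts := strings.Split(code, ".")
	if len(parts) != 3 {
		return "", "", "", false
	}
	for _, p := range parts {
		if p == "" {
			return "", "", "", false
		}
	}
	return parts[0], parts[1], parts[2], true
}

// isAgentNetworkActivity reports whether a numeric activity is in the new 125-137 block.
func isAgentNetworkActivity(id int) bool { return id >= 125 && id <= 137 }

func main() {
	d, r, a, ok := splitActivityCode("agent_network.provider.create")
	fmt.Println(d, r, a, ok)                                              // agent_network provider create true
	fmt.Println(isAgentNetworkActivity(125), isAgentNetworkActivity(138)) // true false
}
```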

Invariants

  • Synth services are never persisted. Snapshot appends after serviceManager.GetServicesForCluster (proxy.go:761-770); network_map prepends before InjectProxyPolicies (controller.go:117-126).
  • shallowCloneMapping must round-trip every ProxyMapping field except AuthToken; proxy_clone_test.go:50-58 enforces this via gproto.Equal. The bug it guards against: a missing Private made every MODIFIED update arrive with private=false, so the proxy skipped ValidateTunnelPeer, UserGroups stayed empty, and llm_router denied with no_authorised_provider. A restart "fixed" it because the snapshot uses the original mapping.
  • The limit-window floor is 60s (policies_handler.go:189-220), and an enabled cap with both the per-group and per-user limits at zero is rejected. Budget rules reuse the same validator (budget_handler.go:170).
  • Manager is optional at boot. NewAPIHandler registers routes only when non-nil (handler.go:129); ProxyServiceServer returns Unimplemented from both RPCs when limits service is unwired (proxy.go:262-265, 306-309).
  • Settings GET on an unbootstrapped account returns 200 + null (settings_handler.go:65-72) — not 404.

Things to scrutinize

Correctness

  • injectAllProxyPolicies runs on every per-peer compute (controller.go:163, 309, 415, 681). Because sendUpdateAccountPeers is the target of the buffered fan-out, synth runs once per debounced account-update tick and once per direct UpdateAccountPeer. Cost is O(providers + policies × users-per-group) per account under LockingStrengthNone, and there is no per-account synth cache; verify it fits within the buffer interval for your largest tenant.
  • clusterFromDomain drops everything up to and including the first . (proxy.go:1784-1792). A domain with no dot yields "", and the synth call then walks every account. Confirm no path reaches this with a malformed or internal domain.
  • Account-budget RecordConsumption fans out even when window_seconds == 0 (proxy.go:341-348) — intentional. Verify the proxy never sends RecordLLMUsage for a request that wasn't actually allowed.

Security

  • Every handler extracts UserAuth via nbcontext.GetUserAuthFromContext before any work. Routes live behind the standard /api mux; bypass list is not extended.
  • CheckLLMPolicyLimits / RecordLLMUsage ride the existing proxy → mgmt gRPC connection auth. No additional token check inside the RPCs — they trust the connection. Confirm the proxy-side token-verification interceptor in this package gates both.
  • RecordLLMUsage only validates account_id != "" (proxy.go:317-319). A compromised proxy can attribute cost to any account in its cluster — was already true for prior RPCs but is louder now that data drives denials.

Concurrency

  • SetAgentNetworkSynthesizer / SetAgentNetworkLimitsService write under s.mu.Lock; read paths copy the interface under read lock (proxy.go:236-247, 260-263, 304-307). Same pattern as existing serviceManager/proxyController setters.
  • Manager writes use LockingStrengthUpdate; synth reads use LockingStrengthNone — read-after-write via the proxy snapshot can observe a stale view by up to one fan-out tick.
  • The network_map controller is single-threaded per account; different accounts are processed in parallel.

Backward compatibility

  • proxy_clone_test.go is the regression net; any new ProxyMapping field must be cloned or explicitly nulled in the test.
  • AccessLogEntry adds an indexed AgentNetwork bool via implicit gorm AutoMigrate; the deployment plan must handle the table-rewrite cost on high-volume access-log tables.
  • TargetOptions gains seven omitempty JSON fields (service.go:69-94), so the on-wire shape stays compatible. targetOptionsToProto checks every field when deciding whether to return nil (service.go:551-556).
  • NewAPIHandler signature changes — every caller must pass agentNetworkManager; nil is supported.

Observability

  • 13 new activity codes via accountManager.StoreEvent in the manager — confirm dashboard's audit-log UI maps them.
  • AccessLogEntry.AgentNetwork is indexed for the dashboard's agent-network log filter.
  • New RPCs log at error level on store/selector failures (proxy.go:284, 327, 332, 348). Snapshot synth failures degrade to warnings — stream is not aborted (proxy.go:765).

Test coverage

| Test | Locks down |
|---|---|
| handlers_test.go::TestPolicyHandler_WindowSecondsRoundTrip | GET carries window_seconds; legacy window_hours/window_days absent. |
| handlers_test.go::TestPolicyHandler_RejectsSubMinuteWindow | POST with a window under 60s returns 4xx. |
| handlers_test.go::TestConsumptionHandler_EmptyAccountReturnsArray | /consumption returns [], never null. |
| handlers_test.go::TestConsumptionHandler_PopulatedAccountListsRows | RecordConsumption×2 surfaces both rows with correct tokens/cost/window. |
| budget_handler_test.go::TestBudgetRuleHandler_RoundTrip | Targets + PolicyLimits shape round-trip. |
| budget_handler_test.go::TestBudgetRuleHandler_ListReturnsArray | Empty-list shape. |
| budget_handler_test.go::TestBudgetRuleHandler_{RejectsMissingName,RejectsSubMinuteWindow} | Validation rejections are 4xx. |
| budget_handler_test.go::TestSettingsHandler_GetExposesCollectionToggles | All four toggles + computed Endpoint. |
| proxy_clone_test.go::TestShallowCloneMapping_PreservesAllFieldsExceptAuthToken | Future-proofs the clone: every field round-trips, AuthToken dropped. |

Handler tests use a real sqlite store, a real manager, and an always-allow permissions mock (handlers_test.go:53-75). Create/update/delete success paths flow through accountManager.StoreEvent, which the fixture doesn't wire; those paths are covered by manager-level no-mock tests outside this module.

Known limitations / explicit non-goals

  • No pagination on any list endpoint; no bulk endpoints.
  • Synth result is not cached — every snapshot and every per-peer compute repeats the store walk.
  • getSettings returning 200 + null is a deliberate dashboard concession.
  • No rate-limiting beyond the global /api rate limiter.

Cross-references