docs/agent-networks/modules/21-management-agentnetwork.md
Risk level: High. Central business logic, budget enforcement, and the source of every middleware-chain change the proxy executes.

Backward-compat impact: Additive within the agent-network surface; one behavioural difference for opted-out accounts in parser capture (the capture flag is stamped explicitly false instead of being absent; see capture-pointer semantics below). Non-agent-network proxy services are untouched (the synth chain only ships on agent-net-svc-* targets).
management/server/agentnetwork owns every agent-network entity (providers, policies, guardrails, account budget rules, per-account settings, consumption rows) and translates them into the in-memory *rpservice.Service that the reverse-proxy controller turns into proto.ProxyMappings and pushes to clusters. It is the only writer of the agent-network middleware chain.
Inside the package: manager.go is the CRUD + permissions-gated facade; synthesizer.go walks settings + providers + policies + guardrails and emits the per-account service plus every middleware's JSON config; policyselect.go runs per-request attribution (min-wins account ceiling, then "drain bigger pool first"); reconcile.go diffs successive synth outputs and emits precise Create/Update/Delete proxy-mapping updates plus a peer-map refresh. labelgen/ mints DNS-safe subdomain labels; catalog/ is the static provider catalogue; types/ carries gorm entity structs. The _realstack_test.go files in the parent management/server/ directory exercise the manager + network-map controller end-to-end with no mocks.
| Path | Role |
|---|---|
| `agentnetwork/manager.go` | Manager interface + CRUD + permission gates + bootstrap-settings + reconcile trigger |
| `agentnetwork/synthesizer.go` | Settings/policy → wire-format synthesis; sole writer of the proxy middleware chain |
| `agentnetwork/policyselect.go` | Per-request policy attribution + account-budget ceiling (min-wins) |
| `agentnetwork/reconcile.go` | Per-account synth diff vs in-memory cache → Create/Update/Delete |
| `agentnetwork/catalog/catalog.go` | Static provider catalogue (auth headers, identity-injection shapes) |
| `agentnetwork/labelgen/{labelgen,words}.go` | DNS-safe subdomain picker + curated wordlist |
| `agentnetwork/types/provider.go` | Provider entity + APIKey + Models + ExtraValues + SessionKeys |
| `agentnetwork/types/policy.go` | Policy entity + PolicyLimits (token + budget) |
| `agentnetwork/types/guardrail.go` | Guardrail entity (ModelAllowlist, PromptCapture) |
| `agentnetwork/types/budgetrule.go` | AccountBudgetRule (reuses PolicyLimits) |
| `agentnetwork/types/settings.go` | Per-account Settings (Cluster, Subdomain, 3 toggles) |
| `agentnetwork/types/consumption.go` | Consumption row + WindowStart aligner |
| `agentnetwork/{synthesizer,policyselect,reconcile,wire_shape}_*test.go` | See test coverage table |
| `agentnetwork/types/consumption_test.go` | WindowStart alignment proofs |
| `agentnetwork/labelgen/labelgen_test.go` | Deterministic picks + exhaustion + fallback |
| `management/server/agentnetwork_realstack_test.go` | No-mock provider CRUD → network-map fan-out |
| `management/server/agentnetwork_budgetrule_realstack_test.go` | No-mock budget-rule CRUD + settings preserve-immutable |
flowchart TD
A[Mutation: provider/policy/guardrail/settings] --> B[managerImpl.reconcile accountID]
B --> C{proxyController nil?}
C -- yes --> D[accountManager.UpdateAccountPeers only]
C -- no --> E[SynthesizeServices]
E --> F[loadSettings — NotFound returns ok=false, no synth]
F --> G[filterEnabledProviders sorted by CreatedAt]
G --> H[filterEnabledPolicies]
H --> I[backfillProviderSessionKeys if missing]
I --> J[indexProviderGroups: providerID -> sorted source groups]
J --> K[buildRouterConfigJSON drops orphan providers]
J --> L[buildIdentityInjectConfigJSON per catalog entry]
H --> M[mergeGuardrails: union allowlist, OR redact]
M --> N[applyAccountCollectionControls account toggle = SOLE capture control]
N --> O[marshalGuardrailConfig]
K --> P[buildMiddlewareChain 8 middleware entries]
L --> P
O --> P
P --> Q[buildAccountService: AccessGroups=union source groups, noop.invalid target]
Q --> R[reconcile.diffMappings vs cache]
R --> S[SendServiceUpdateToCluster CREATE/MODIFY/REMOVE]
R --> T[accountManager.UpdateAccountPeers — fans synth ACLs into network map]
flowchart TD
A[SelectPolicyForRequest in] --> B[checkAccountBudget — runs FIRST, independent of policies]
B --> C[GetAccountAgentNetworkBudgetRules]
C --> D{for each enabled rule}
D --> E{budgetRuleApplies?}
E -- no --> D
E -- yes --> F[attrGroup = lowestIntersect TargetGroups, in.GroupIDs]
F --> G{Token cap enabled?}
G -- yes --> H[evalTokenCap user dim + group dim]
H --> I{exhausted?}
I -- yes --> J[DENY: llm_account.token_cap_exceeded - STOP]
I -- no --> K{Budget cap enabled?}
G -- no --> K
K -- yes --> L[evalBudgetCap user dim + group dim]
L --> M{exhausted?}
M -- yes --> N[DENY: llm_account.budget_cap_exceeded - STOP]
M -- no --> D
K -- no --> D
D --> O[All rules passed -> fall through to per-policy selection]
Key invariant: rules are checked sequentially and ANY exhausted rule denies (all-must-pass / min-wins). Untargeted rules (len(TargetGroups)==0 && len(TargetUsers)==0) apply to every caller (policyselect.go:393).
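
The all-must-pass gate described above can be sketched in a few lines of Go. This is a minimal illustration, not the code in policyselect.go: the `budgetRule` type, the `Exhausted` flag (standing in for the real token and budget cap evaluation), and the user-or-group match for targeted rules are assumptions made for the example.

```go
package main

import "fmt"

// budgetRule is a hypothetical, trimmed stand-in for AccountBudgetRule.
type budgetRule struct {
	Name         string
	Enabled      bool
	TargetGroups []string
	TargetUsers  []string
	Exhausted    bool // stand-in for the outcome of the real cap evaluation
}

func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

// ruleApplies: an untargeted rule binds every caller. The user-or-group match
// for targeted rules is an assumption of this sketch.
func ruleApplies(r budgetRule, userID string, groupIDs []string) bool {
	if len(r.TargetGroups) == 0 && len(r.TargetUsers) == 0 {
		return true
	}
	if contains(r.TargetUsers, userID) {
		return true
	}
	for _, g := range groupIDs {
		if contains(r.TargetGroups, g) {
			return true
		}
	}
	return false
}

// checkAccountRules walks the rules in order and stops at the first
// applicable, enabled rule that is exhausted (all-must-pass).
func checkAccountRules(rules []budgetRule, userID string, groupIDs []string) (bool, string) {
	for _, r := range rules {
		if !r.Enabled || !ruleApplies(r, userID, groupIDs) {
			continue
		}
		if r.Exhausted {
			return false, r.Name
		}
	}
	return true, ""
}

func main() {
	rules := []budgetRule{
		{Name: "account-wide", Enabled: true},
		{Name: "eng-only", Enabled: true, TargetGroups: []string{"eng"}, Exhausted: true},
	}
	fmt.Println(checkAccountRules(rules, "alice", []string{"eng"}))
	fmt.Println(checkAccountRules(rules, "bob", []string{"sales"}))
}
```

A caller in the `eng` group is denied by the exhausted targeted rule even though the account-wide rule still has headroom; a caller outside that group only has to pass the untargeted rule.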
flowchart TD
A[Account-budget gate passed] --> B[GetAccountAgentNetworkPolicies]
B --> C[filterApplicablePolicies enabled + provider match + group intersect]
C --> D{candidates empty?}
D -- yes --> E[Allow, empty SelectedPolicyID]
D -- no --> F[scoreCandidates -> scoreOne per policy]
F --> G[scoreOne: attrGroup + window]
G --> H{any cap exhausted?}
H -- yes --> I[Drop policy; record last deny code]
H -- no --> K[Keep as live candidate]
F --> L{live candidates exist?}
L -- no --> M[Deny with last exhaustion code]
L -- yes --> N[Sort: uncapped wins -> larger group token -> group budget -> user token -> user budget -> oldest CreatedAt]
N --> O[winner = scored 0]
O --> P[Allow + SelectedPolicyID + AttributionGroupID + WindowSeconds]
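
The sort step in the flow above can be read as a lexicographic comparator. The sketch below is an illustration under assumptions: the `candidate` struct and its field names are hypothetical, and "larger" is taken to mean the larger configured cap (the pool size), which is how the "drain bigger pool first" description reads.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// candidate is a hypothetical view of one live (non-exhausted) policy.
type candidate struct {
	PolicyID       string
	Uncapped       bool
	GroupTokenCap  int64
	GroupBudgetCap float64
	UserTokenCap   int64
	UserBudgetCap  float64
	CreatedAt      time.Time
}

// less applies the tie-break chain: uncapped first, then larger group token
// pool, group budget, user token pool, user budget, then oldest CreatedAt.
func less(a, b candidate) bool {
	if a.Uncapped != b.Uncapped {
		return a.Uncapped
	}
	if a.GroupTokenCap != b.GroupTokenCap {
		return a.GroupTokenCap > b.GroupTokenCap
	}
	if a.GroupBudgetCap != b.GroupBudgetCap {
		return a.GroupBudgetCap > b.GroupBudgetCap
	}
	if a.UserTokenCap != b.UserTokenCap {
		return a.UserTokenCap > b.UserTokenCap
	}
	if a.UserBudgetCap != b.UserBudgetCap {
		return a.UserBudgetCap > b.UserBudgetCap
	}
	return a.CreatedAt.Before(b.CreatedAt)
}

// pickWinner sorts a copy of the live candidates and returns the first one.
func pickWinner(live []candidate) (candidate, bool) {
	if len(live) == 0 {
		return candidate{}, false
	}
	sorted := append([]candidate(nil), live...)
	sort.SliceStable(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	return sorted[0], true
}

func main() {
	live := []candidate{
		{PolicyID: "small", GroupTokenCap: 1_000},
		{PolicyID: "large", GroupTokenCap: 50_000},
	}
	if w, ok := pickWinner(live); ok {
		fmt.Println("winner:", w.PolicyID)
	}
}
```

Sorting a copy keeps the caller's slice untouched, and an empty candidate list is reported separately so the caller can fall back to the deny path.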
End-to-end: a mutation calls managerImpl.reconcile(ctx, accountID) (manager.go:205,239,...). Reconcile defers an accountManager.UpdateAccountPeers so the network-map controller re-runs and injectAllProxyPolicies picks up the new access groups; with a proxyController wired, it re-synthesizes the service, diffs against reconcileCache[accountID] (guarded by reconcileMu), and emits proto mappings to the cluster derived from the mapping's domain (reconcile.go:120). Synthesis is stateless and idempotent. Sole persistent side effect: backfillProviderSessionKeys (synthesizer.go:249) mints ed25519 keys on legacy provider rows and writes them back.
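
The diff-against-cache step can be illustrated with a small sketch. Everything here is hypothetical (the `mapping`, `update`, and `reconciler` types, and the choice to emit nothing for an unchanged mapping, which may differ from the real reconcile); it only shows the shape of the technique: compute the diff and swap the cache under the lock, then send outside it.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// mapping is a hypothetical, comparable stand-in for a synthesized proxy mapping.
type mapping struct{ ID, Domain, Config string }

type op string

const (
	opCreate op = "CREATE"
	opModify op = "MODIFY"
	opRemove op = "REMOVE"
)

type update struct {
	Op      op
	Mapping mapping
}

// diffMappings compares the cached output with the new one, keyed by mapping ID.
func diffMappings(prev, next map[string]mapping) []update {
	var out []update
	for id, n := range next {
		p, existed := prev[id]
		switch {
		case !existed:
			out = append(out, update{Op: opCreate, Mapping: n})
		case p != n:
			out = append(out, update{Op: opModify, Mapping: n})
		}
	}
	for id, p := range prev {
		if _, kept := next[id]; !kept {
			out = append(out, update{Op: opRemove, Mapping: p})
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Mapping.ID < out[j].Mapping.ID })
	return out
}

type reconciler struct {
	mu    sync.Mutex
	cache map[string]map[string]mapping // accountID -> mappings by ID
}

func newReconciler() *reconciler {
	return &reconciler{cache: map[string]map[string]mapping{}}
}

// reconcile holds the lock only while diffing and swapping the cache entry.
func (r *reconciler) reconcile(accountID string, next map[string]mapping, send func(update)) {
	r.mu.Lock()
	updates := diffMappings(r.cache[accountID], next)
	r.cache[accountID] = next
	r.mu.Unlock()
	for _, u := range updates {
		send(u)
	}
}

func main() {
	r := newReconciler()
	send := func(u update) { fmt.Println(u.Op, u.Mapping.ID) }
	r.reconcile("acct", map[string]mapping{"svc": {ID: "svc", Domain: "a.example", Config: "v1"}}, send)
	r.reconcile("acct", map[string]mapping{"svc": {ID: "svc", Domain: "a.example", Config: "v2"}}, send)
	r.reconcile("acct", map[string]mapping{}, send)
}
```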
At request time the path is independent: the proxy calls SelectPolicyForRequest (policyselect.go:56); account-budget ceiling first, then per-policy scoring. Token + budget caps share evalTokenCap / evalBudgetCap — same primitive for account rules and policy limits, label differentiates the deny reason. After a served request, RecordAccountBudgetUsage (policyselect.go:415) fans deltas to every applicable rule's distinct (dim_kind, dim_id, window) tuple, deduplicating to prevent double-count when two rules share target+window.
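
The deduplication described for RecordAccountBudgetUsage can be sketched as a per-call set keyed on the full tuple. The `tuple` type and `recordUsage` helper below are hypothetical names for illustration; how each rule derives its tuples is left out and passed in as input.

```go
package main

import "fmt"

// tuple is a hypothetical counter key: (dim_kind, dim_id, window).
type tuple struct {
	Kind   string
	ID     string
	Window int64
}

// recordUsage adds delta once per distinct tuple, even when several rules
// resolve to the same tuple. The seen set lives for one call only.
func recordUsage(perRule [][]tuple, delta int64, counters map[tuple]int64) {
	seen := map[tuple]struct{}{}
	for _, tuples := range perRule {
		for _, t := range tuples {
			if _, dup := seen[t]; dup {
				continue
			}
			seen[t] = struct{}{}
			counters[t] += delta
		}
	}
}

func main() {
	user := tuple{Kind: "user", ID: "alice", Window: 3600}
	group := tuple{Kind: "group", ID: "eng", Window: 3600}
	counters := map[tuple]int64{}
	// Two rules share the user tuple; only one of them has a group tuple.
	recordUsage([][]tuple{{user}, {user, group}}, 10, counters)
	fmt.Println(counters[user], counters[group])
}
```

Because the key includes the window, two rules on the same user with different windows still book to separate counters.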
Manager interface (manager.go:48-80): CRUD for Providers/Policies/Guardrails/BudgetRules; GetSettings/UpdateSettings (cluster + subdomain immutable, only the three toggles mutate); ListConsumption/RecordConsumption(account, kind, dimID, windowSec, in, out, USD); RecordAccountBudgetUsage(account, user, groups, in, out, USD); SelectPolicyForRequest(ctx, PolicySelectionInput) → *PolicySelectionResult{Allow, SelectedPolicyID, AttributionGroupID, WindowSeconds, DenyCode, DenyReason}.
PolicySelectionInput (manager.go:85-90): {AccountID, UserID, GroupIDs, ProviderID} — populated by the proxy from CapturedData + llm_router resolution.
Synthesized middleware chain (synthesizer.go:576-657), order load-bearing — response slot runs reverse-of-slice:
| Slot | Idx | ID | ConfigJSON shape | CanMutate |
|---|---|---|---|---|
| on_request | 0 | llm_request_parser | {"capture_prompt": <bool>, "redact_pii"?: true} | – |
| on_request | 1 | llm_router | {"providers":[{id, models[], upstream_*, auth_header_*, allowed_group_ids[]}]} | true |
| on_request | 2 | llm_limit_check | {} | – |
| on_request | 3 | llm_identity_inject | {"providers":[{provider_id, header_pair?, json_metadata?, extra_headers?}]} | true |
| on_request | 4 | llm_guardrail | {"model_allowlist"?, "prompt_capture":{enabled,redact_pii}} | – |
| on_response | 5 | llm_limit_record | {} (runs LAST at runtime) | – |
| on_response | 6 | cost_meter | {} | – |
| on_response | 7 | llm_response_parser | {"capture_completion": <bool>, "redact_pii"?: true} | – |
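
To make the "response slot runs reverse-of-slice" note concrete, the sketch below derives the runtime order from the slice order in the table. The `middleware` type and the `runtimeOrder` helper are hypothetical; only the IDs and slots are copied from the table.

```go
package main

import "fmt"

type slot string

const (
	onRequest  slot = "on_request"
	onResponse slot = "on_response"
)

// middleware is a hypothetical, trimmed chain entry (ID and slot only).
type middleware struct {
	ID   string
	Slot slot
}

// synthChain lists the eight entries in slice order, as in the table.
func synthChain() []middleware {
	return []middleware{
		{"llm_request_parser", onRequest},
		{"llm_router", onRequest},
		{"llm_limit_check", onRequest},
		{"llm_identity_inject", onRequest},
		{"llm_guardrail", onRequest},
		{"llm_limit_record", onResponse},
		{"cost_meter", onResponse},
		{"llm_response_parser", onResponse},
	}
}

// runtimeOrder returns request middlewares in slice order and response
// middlewares in reverse slice order.
func runtimeOrder(chain []middleware) (req, resp []string) {
	for _, m := range chain {
		if m.Slot == onRequest {
			req = append(req, m.ID)
		}
	}
	for i := len(chain) - 1; i >= 0; i-- {
		if chain[i].Slot == onResponse {
			resp = append(resp, chain[i].ID)
		}
	}
	return req, resp
}

func main() {
	req, resp := runtimeOrder(synthChain())
	fmt.Println("request: ", req)
	fmt.Println("response:", resp)
}
```

Under this reading the response parser runs first, cost_meter second, and llm_limit_record last, which matches the "(runs LAST at runtime)" note on index 5.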
Synthesized service shape (synthesizer.go:739): Mode=HTTP, Private=true, Domain=<subdomain>.<cluster>, AccessGroups=unionSourceGroups(enabledPolicies), one TargetTypeCluster target with Host=noop.invalid:443 (router rewrites per request), Options.{DirectUpstream,AgentNetwork}=true, DisableAccessLog=!settings.EnableLogCollection, CaptureMax{Req,Resp}Bytes=1<<20, CaptureContentTypes=["application/json","text/event-stream"].
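
A few of the derived fields in the service shape can be sketched as follows. The types and helper names are hypothetical, and sorting the group union is an assumption made here for deterministic output; the sketch only illustrates the three derivations named above (domain join, source-group union, inverted log toggle).

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, trimmed stand-ins for the real entities.
type policy struct {
	Enabled      bool
	SourceGroups []string
}

type settings struct {
	Cluster             string
	Subdomain           string
	EnableLogCollection bool
}

type serviceShape struct {
	Domain           string
	AccessGroups     []string
	DisableAccessLog bool
	TargetHost       string
}

// unionSourceGroups deduplicates the source groups of enabled policies.
func unionSourceGroups(policies []policy) []string {
	seen := map[string]struct{}{}
	var out []string
	for _, p := range policies {
		if !p.Enabled {
			continue
		}
		for _, g := range p.SourceGroups {
			if _, dup := seen[g]; dup {
				continue
			}
			seen[g] = struct{}{}
			out = append(out, g)
		}
	}
	sort.Strings(out) // assumption: stable ordering keeps successive synths comparable
	return out
}

func buildShape(s settings, policies []policy) serviceShape {
	return serviceShape{
		Domain:           s.Subdomain + "." + s.Cluster,
		AccessGroups:     unionSourceGroups(policies),
		DisableAccessLog: !s.EnableLogCollection,
		TargetHost:       "noop.invalid:443", // placeholder; the router rewrites per request
	}
}

func main() {
	s := settings{Cluster: "eu.proxy.example", Subdomain: "brave-otter"}
	shape := buildShape(s, []policy{
		{Enabled: true, SourceGroups: []string{"g-b", "g-a"}},
		{Enabled: true, SourceGroups: []string{"g-a"}},
		{Enabled: false, SourceGroups: []string{"g-z"}},
	})
	fmt.Printf("%+v\n", shape)
}
```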
- […] (checkAccountBudget, policyselect.go:353): every applicable enabled rule is checked; first exhausted cap denies. Untargeted rules bind every caller.
- applyAccountCollectionControls (synthesizer.go:701) sets merged.PromptCapture.Enabled = settings.EnablePromptCollection unconditionally.
- EnableLogCollection ↔ DisableAccessLog is the only access-log toggle (synthesizer.go:770). Default off ⇒ access log suppressed.
- RedactPii flows verbatim to BOTH parsers (synthesizer.go:584-585) and is OR'd into the merged guardrail (synthesizer.go:706).
- UpdateSettings reloads existing row and overlays only the three toggles (manager.go:558-561).
- […] (synthesizer.go:351-357); skipped from identity_inject for symmetry.
- […] api_key (manager.go:175); deletion refuses while any policy still references it (manager.go:265-273).
- […] (manager.go:226-228): server-managed, copied through every UpdateProvider, never API-surfaced.

[…] `*bool` vs `bool`. Three states, owned by separate sides:

- buildParserConfigJSON (synthesizer.go:678-693) always stamps the capture field. Agent-network targets ship `"capture_prompt": false` or `"capture_prompt": true`, never absent. Same for `"capture_completion"`. The happy-path test pins `{"capture_prompt":false}` (synthesizer_test.go:174).
- […] `*bool`. Matrix:
  - nil (field absent) → legacy default = emit. Preserved for non-agent-network callers and pre-existing tests (the backward-compat hook).
  - false (field present, value false) → suppress emission entirely. The behaviour for opted-out agent-network accounts. Without this, enable_log_collection=true + enable_prompt_collection=false would leak raw user input AND raw model output to the access log.
  - true → emit normally.

- json.Marshal error fallback at synthesizer.go:687 degrades to `{}`. The comment claims it is unreachable, but if it ever fired it would re-introduce the leak. Consider fail-closed (return literal `{"capture_prompt":false}`) instead.
- scoreCandidates non-cumulative deny code. Only the last exhausted policy's deny code survives (policyselect.go:188-190). Iteration order is store's natural order. Auth signal is len(scored)==0, so this is informational only; verify no UI depends on "first exhausted policy" semantics.
- effectiveWindowSeconds token-wins tiebreak. When both halves are enabled with different windows, token's window wins (policyselect.go:482). Verify RecordLLMUsage increments against the winning window only.
- RecordAccountBudgetUsage dedup. Two rules with the same (kind, dim_id, window) would double-count without the tuples map (policyselect.go:434-449). Key includes all three dimensions: correct.
- […] (synthesizer.go:794-796) or empty API key (synthesizer.go:801-803) drops the entire account's synth, not just the bad provider. Confirm matches operator UX.
- RedactPii = account OR guardrail (synthesizer.go:706). Parser-side flag is settings.RedactPii only, NOT the OR: a guardrail-only opt-in does not propagate to parsers. Correct because the account toggle gates capture, but worth noting on the proxy side.
- […] accountID (policyselect.go:73, 286, 298, 322, 334, 354); lowestIntersect uses caller's claimed groups only (policyselect.go:494). Risk surface is upstream (handler populates in.GroupIDs).
- UpdateSettings preserves immutable Cluster + Subdomain (manager.go:558). A client can't rebind the cluster.
- […] SaveAgentNetworkProvider (synthesizer.go:256) from a read-shaped call. Idempotent → worst case is a wasted write under concurrent reconcile + snapshot.
- reconcileMu guards reconcileCache. Lock window is narrow: compute diff inside, send outside (reconcile.go:56-68).
- labelRngMu guards labelRng because math/rand.Source is unsafe for concurrent use (manager.go:638-640).
- store.NewTestStoreFromSQL with t.TempDir() per test: no shared state, no t.Parallel().
- RecordAccountBudgetUsage dedup tuples map is per-call; concurrent calls fan out fully, which is correct (each request's tokens book once per applicable rule).
- UpdateAccountPeers runs inline after the proxy push (reconcile.go:28-35); a slow call stretches CRUD response time.
- […] `capture_*` value.
- TestSynthesizeServices_HappyPath was updated: request-parser config moved from `{}` to `{"capture_prompt":false}` (synthesizer_test.go:174). External snapshot tests against synth output need updating.
- MergedGuardrails retains zeroed TokenLimits/Budget/Retention even though Policy.Limits carries the real values now; llm_limit_check is the authoritative enforcement. Comment at synthesizer.go:940-948 calls this out.
- SynthesizeServices runs on every controller tick / mutation reconcile. Cost: 4 store reads + optional per-provider keypair backfill. Sort + index + merge are O(N log N) / O(P × G); dominant cost is JSON marshalling. No nested loops escape these dimensions.
- reconcile.diffMappings is O(N + M) with N=M=1 per account today, so effectively constant.
- SynthesizeServicesForCluster (synthesizer.go:71) walks every account on a cluster; per-account failures are swallowed (synthesizer.go:91-93) so a single misconfigured account doesn't drop the cluster. Runs per proxy reconnect.
- AgentNetwork{Provider,Policy,Guardrail,BudgetRule}{Created,Updated,Deleted}; AgentNetworkSettingsUpdated with log_collection/prompt_collection/redact_pii payload (manager.go:567-571).
- No activity code for SelectPolicyForRequest denies; they are surfaced via proxy access log only (likely intentional given volume).
- […] llm_policy.{token,budget}_cap_exceeded, llm_account.{token,budget}_cap_exceeded (policyselect.go:18-26).
- […] (reconcile.go:42-44). Persistent synth failures (e.g. unknown catalog id) silently keep the proxy out of sync; consider a manager-level synth-health surface if this becomes a support burden.

| Test file | Locks down |
|---|---|
| `synthesizer_test.go` | Mock-store: HappyPath (8-mw chain ordering, `{"capture_prompt":false}` baseline); No{Settings,Providers}; Disabled{Provider,Policy}_NoService; RouterConfigOrdering; PolicyCheckConfig_UnionsSourceGroups; OrphanProvider_HasEmptyAllowedGroups; identity-inject for LiteLLM / Bifrost (overrides + partial disable) / Cloudflare / Portkey / Vercel / OpenRouter / generic non-customizable; GuardrailMerge_AllowlistUnion_LimitsRestrictive; BackfillsMissingSessionKeys; HTTPUpstream_KeepsExplicitPort; UpstreamURLPath_FlowsToRouter; UnknownProviderID_FailsClosed; EmptyAPIKey_FailsClosed. |
| `synthesizer_realstore_test.go` | Real-sqlite: SurvivesStatusToggle reproduces the disable/re-enable 403 regression; Reconcile_RealStore_PushesPrivateAfterStatusToggle extends through reconcile push. |
| `synthesizer_guardrail_realstore_test.go` | PromptCaptureAccountIsSoleControl; PromptCaptureFlowsWhenAccountOptsIn; AccountRedactWithoutGuardrailRedact; NoGuardrail_CaptureOff. |
| `synthesizer_log_collection_realstore_test.go` | LogCollection{Off_SuppressesAccessLog,On_PermitsAccessLog}: verifies DisableAccessLog propagation through ToProtoMapping. |
| `synthesizer_parser_redact_realstore_test.go` | Capture-pointer regression suite: ParserConfigsCarryRedactPii; ParserConfigsSuppressCaptureWhenLogCollectionOnly (log=on/prompt=off ⇒ both capture flags false); ParserConfigsOmitRedactPiiWhenOff. |
| `policyselect_test.go` | Mock-store: NoApplicablePolicies; AllowWithLowestGroupAttribution; LargerPoolWinsAcrossUsageLevels; StaysOnLargerPoolAfterPartialDrain; FallsThroughToSmallerPoolWhenLargerExhausted; TiebreakBy{LargerGroupPool,CreatedAt}; DeniesWhenAllExhausted; UncappedPolicyAlwaysWinsAgainstCapped; DisabledPolicyIgnored; StoreErrorPropagates; RejectsEmptyAccount; SharesGroupCounterAcrossPolicies; AntiFallThroughOnLowestGroup; BudgetOnlyExhaustionDenies; BudgetTighterThanTokenWins. |
| `policyselect_realstore_test.go` | Real-sqlite regression guard: NoApplicablePolicies; AllowAndLowestGroupAttribution; LargerPoolWins_FallsThroughWhenExhausted; BudgetCapDenies; GroupCounterSharedAcrossPolicies; DisabledPolicyIgnored. |
| `policyselect_account_realstore_test.go` | Account budget rules: AccountCeilingBindsEvenWithUncappedPolicy (min-wins); AccountGroupCeiling; AccountTargetUsersBindsOnlyThatUser; AccountRuleRecordsToOwnWindow. |
| `reconcile_test.go` | FirstSynth_EmitsCreate; NoChange_EmitsNothingExtra (re-push as Modified; verify desired); PolicyRemoved_EmitsDelete; NilProxyController_NoOp; EmptyAccountID_NoOp; ClusterFromMapping. |
| `wire_shape_test.go` | TestSynthesizedService_WireShape: proto-shape lockdown via ToProtoMapping. Catches "service not matching" (mapping reaches proxy but no SNI/HTTP route). Asserts ID, Domain, Mode, AuthToken, Private, Auth.Oidc=false, one path / + https://noop.invalid/, 8 middlewares with correct slot enums, router config auth_header_value="Bearer sk-test-key". |
| `labelgen/labelgen_test.go` | PickUnique_{DeterministicWithSeededRng,AvoidsTakenWordsWhenMostAreReserved,FallsBackWhenAllReserved}; UniqueWords_DropsDuplicates. |
| `types/consumption_test.go` | WindowStart_{AlignedToUnixEpoch,WithinWindowConverges,AcrossWindowsDiverges,DifferentWindowsHaveDifferentBuckets,SubMinuteAndMinuteAlignment,ZeroWindowReturnsInputUTC}. Bucket alignment so multi-node reads converge. |
| `agentnetwork_realstack_test.go` | ProviderCRUD_FansOutToProxyAndClientPeers: no-mock end-to-end through real account manager + network-map + agentnetwork; provider create propagates the updated map to both proxy peer and client peer with the synth DNS surface. |
| `agentnetwork_budgetrule_realstack_test.go` | BudgetRuleCRUD_RealManager; UpdateSettings_PreservesImmutableAndTogglesCollection. |
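
The capture-pointer behaviour that the parser regression suite pins down (explicit false suppresses, absent means legacy emit) can be illustrated with a pointer-typed JSON field. This is a sketch under assumptions: `parserConfig`, `buildParserConfig`, and `shouldEmit` are hypothetical names, and the fail-closed fallback shows the variant suggested in the review notes rather than the current behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parserConfig is a hypothetical shape for the request-parser config.
// A *bool distinguishes "absent" (nil) from an explicit false.
type parserConfig struct {
	CapturePrompt *bool `json:"capture_prompt,omitempty"`
	RedactPII     bool  `json:"redact_pii,omitempty"`
}

// buildParserConfig always stamps the capture field; redact_pii is only
// written when true. On a marshal error it fails closed.
func buildParserConfig(capture, redact bool) []byte {
	b, err := json.Marshal(parserConfig{CapturePrompt: &capture, RedactPII: redact})
	if err != nil {
		return []byte(`{"capture_prompt":false}`) // suggested fail-closed fallback
	}
	return b
}

// shouldEmit decodes on the consuming side: nil keeps the legacy default
// (emit), an explicit value is honoured as-is.
func shouldEmit(raw []byte) (bool, error) {
	var c parserConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return false, err
	}
	if c.CapturePrompt == nil {
		return true, nil
	}
	return *c.CapturePrompt, nil
}

func main() {
	fmt.Println(string(buildParserConfig(false, false)))
	fmt.Println(string(buildParserConfig(true, true)))
	for _, raw := range []string{`{}`, `{"capture_prompt":false}`, `{"capture_prompt":true}`} {
		emit, err := shouldEmit([]byte(raw))
		fmt.Println(raw, emit, err)
	}
}
```

The `omitempty` tag drops a nil pointer but keeps a pointer to false, which is what lets the synthesizer side ship an explicit `false` while legacy configs without the field still decode to nil.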
- MergedGuardrails.TokenLimits/Budget/Retention emit at zero (synthesizer.go:940-948); real enforcement is Policy.Limits via llm_limit_check. Future cleanup implied.
- […] (pickServiceSessionKeys, synthesizer.go:270). Existing session cookies survive provider edits only while the first-by-CreatedAt provider stays in place. Document for operators.
- […] (reconcile.go:42-44). Persistent failures keep the proxy out of sync until the next reconcile.
- scoreCandidates exposes only the LAST exhaustion's deny code when multiple policies are exhausted.
- bootstrapSettingsIfNeeded failure is non-fatal to provider create (manager.go:200): provider lands, synth is no-op until the next provider create retries the bootstrap.
- […] (manager.go:476-477). Request-time evaluation only; new rules take effect on the next request without a proxy push.
- […] service/proxy/sessionkey packages, management/server/permissions + activity.
- […] injectAllProxyPolicies fan-out).