docs/agent-networks/modules/50-path-routed-providers.md
This guide pulls the path-routed provider story together in one place
because it crosses the catalog, the synthesiser, the request parser, and the
router. The relevant building blocks are the llm_router /
llm_request_parser middlewares
(31-proxy-middleware-builtin.md), the
per-provider parser surface (32-proxy-llm-parsers.md),
and the synthesiser's catalog → ProviderRoute mapping
(21-management-agentnetwork.md).
Sibling modules: 31-proxy-middleware-builtin.md (router + request parser) and 32-proxy-llm-parsers.md (Bedrock parser + pricing).
Most catalog providers carry the model in the request body ({"model": …}),
so llm_router selects an upstream by matching the model name against each
provider's Models claim. Two providers instead carry the model in the URL
path, so they are routed by path before the model/vendor table is consulted:
| Catalog id | Style flag | Request path shape |
|---|---|---|
vertex_ai_api | IsVertexPathStyle → ProviderRoute.Vertex | /v1/projects/{project}/locations/{region}/publishers/{publisher}/models/{model}:{action} |
bedrock_api | IsBedrockPathStyle → ProviderRoute.Bedrock | /model/{modelId}/{action} (optionally behind /bedrock) |
The catalog declares the style with
catalog.IsVertexPathStyle / catalog.IsBedrockPathStyle
and the synthesiser copies the result onto the router route as the Vertex /
Bedrock booleans
(synthesizer.go:450-451).
On the request leg llm_router.Invoke dispatches isVertexPath / isBedrockPath
before the model lookup
(llm_router/middleware.go:138-216)
so a model the parser extracted from the path can't be claimed by a same-vendor
body-routed provider (e.g. claude-* on api.anthropic.com).
vertex_ai_api)KindProvider, parser surface left unset on the catalog entry — the request
parser picks the parser from the URL publisher segment, not from
ParserID. Upstream host is <region>-aiplatform.googleapis.com
(https://aiplatform.googleapis.com for the global location). The catalog
lists the Claude-on-Vertex lineup (claude-opus-4-*, claude-sonnet-4-*,
claude-haiku-4-5, claude-fable-5) at the same per-token rates as the
first-party Anthropic entry
(catalog.go:333-363).
keyfile::)Vertex does not accept a static API key. The operator sets the provider
api_key to:
keyfile::<base64 of the GCP service-account JSON key>
The synthesiser recognises the keyfile:: prefix in providerAuthHeader
(synthesizer.go:897-903),
emits no static auth value, and carries the base64 key material on the
route as GCPServiceAccountKeyB64
(factory.go:56-61).
At request time the router mints a short-lived OAuth2 access token from the key
(cloud-platform scope) and injects Authorization: Bearer <access-token> —
never the key itself
(llm_router/middleware.go:621-692):
oauth2.TokenSource is cached per key (keyed by a
SHA-256 of the base64 material), so token minting happens once and refreshes
amortise across requests.gcpTokenTimeout) so
a slow Google token endpoint can't hang the request.llm_policy.upstream_auth_failed at HTTP 502 (an upstream problem, not a
policy denial) — see denyUpstreamAuth.The request parser extracts {publisher, model, action} from the path
(parseVertexPath, llm_request_parser/middleware.go:237-263),
strips the @version suffix from the model, and maps the publisher to a parser
surface via vertexPublisherVendor:
anthropic → llm.provider="anthropic" → metered through the Anthropic
parser, priced under the anthropic block in defaults_pricing.yaml
(the parser emits the standard Anthropic provider label, so Vertex Claude
reuses first-party Anthropic prices).openai → llm.provider="openai" (reserved; not in the catalog lineup
today).google / Gemini) → empty vendor → no parser.Gemini is intentionally denied as unmeterable. When the parser emits no
llm.provider for a Vertex publisher, llm_router returns
llm_policy.unmeterable_publisher (403) rather than forwarding the request
uncounted — serving it would bypass token / budget metering
(llm_router/middleware.go:144-162, 712-728).
A Gemini parser would lift this restriction; until then the google publisher
is omitted from the catalog.
Caveat: cross-region inference profiles in
eu/apaccarry a ~10% price premium that the base per-token rates do not model — cost annotations for those regions read low. Operators who need exact regional billing override the affected entries inpricing.yaml.
bedrock_api)KindProvider, upstream host bedrock-runtime.<region>.amazonaws.com. Metered
models are the Anthropic-on-Bedrock lineup (anthropic.claude-*) plus Amazon
Nova and Llama 3.3 entries
(catalog.go:300-332).
Anthropic-on-Bedrock reuses the first-party Claude prices (with additive cache
buckets); Nova / Llama report no cache, so cost is input + output.
Bedrock uses the AWS Bedrock API key as a static bearer. The operator sets
the provider api_key directly (no keyfile:: prefix); the catalog template
is Authorization: Bearer ${API_KEY}
(catalog.go:306-307).
No token minting — the synthesiser substitutes the key into the template and
the router injects the resulting Authorization header after stripping inbound
vendor auth (including client-supplied AWS SigV4 material: X-Amz-Date,
X-Amz-Security-Token, X-Amz-Content-Sha256, see strippedAuthHeaders).
Bedrock model ids in the request path must be the cross-region
inference-profile form, e.g.
eu.anthropic.claude-sonnet-4-5-20250929-v1:0. The bare
anthropic.claude-… id is rejected by AWS. normalizeBedrockModel
(llm_request_parser/middleware.go:398-414)
strips the region prefix (us. / eu. / apac. / global.), an optional ARN
wrapper, and the -YYYYMMDD-vN[:N] version/throughput suffix so the normalised
id (anthropic.claude-sonnet-4-5) matches the catalog/pricing key.
/model/{modelId}/{action} where action ∈ invoke,
invoke-with-response-stream, converse, converse-stream
(llm_request_parser/middleware.go:363-390).
invoke / converse are non-streaming; the -stream actions set the streaming
flag.
"anthropic_version":"bedrock-2023-05-31" and snake_case usage with additive
cache buckets.totalTokens.BedrockParser reads both shapes on the response leg
(bedrock.go); the request parser
doesn't need to distinguish them (ParseRequest is a no-op — model + stream
come from the path).The -stream actions return application/vnd.amazon.eventstream (the AWS
binary event-stream framing), and streaming is metered.
accumulateBedrockStream
(llm_response_parser/streaming_bedrock.go)
decodes the frames with aws-sdk-go-v2/aws/protocol/eventstream:
chunk frames wrap a base64 {"bytes":…} payload carrying a
vendor-native (Anthropic) stream event — folded through the shared Anthropic
stream accumulator.contentBlockDelta frames carry text; the trailing metadata frame
carries the final usage block./bedrock gateway-namespace prefixClients may place an optional /bedrock prefix before the native path
(/bedrock/model/{modelId}/{action}) to disambiguate Bedrock from other
providers that also use /model/.... Both the request parser
(trimBedrockNamespace) and the router (splitBedrockNamespace) accept it.
When the prefix is present, the router sets
RewriteUpstream.StripPathPrefix = "/bedrock" so the native path
(/model/...) is what reaches bedrock-runtime.<region>.amazonaws.com
(llm_router/middleware.go:168-184, 320-348).
Because the model lives in the URL rather than the body, a path-routed provider
credential could otherwise be used for any model the upstream supports. The
router still enforces the route's Models allowlist via matchPathRoute
(llm_router/middleware.go:370-416):
Vertex / Bedrock).AllowedGroupIDs authorise the caller's groups
(else no_authorised_provider).Models list = catch-all (serve any model);
a non-empty list serves only the listed models (else model_not_routable).UpstreamPath prefix match.So an operator who lists explicit models on a Vertex/Bedrock provider gets a
hard allowlist; an operator who leaves Models empty accepts every model the
upstream serves (still subject to the unmeterable-publisher gate on Vertex).
Model-less OpenAI endpoints (GET /v1/models) are never routed to a
Vertex/Bedrock provider — matchModelless skips path-routed routes
(llm_router/middleware.go:427-462)
so a model-listing call can't be rewritten onto an upstream that would 404 it.
Catalog prices and context windows are cross-checked against LiteLLM's
model_prices_and_context_window.json. The proxy's embedded
defaults_pricing.yaml covers every metered first-party model the catalog
enumerates — guarded by
TestDefaultTable_FirstPartyModelCoverage
(pricing/defaults_coverage_test.go),
which fails if a catalog model has no embedded price. Bedrock entries are keyed
by the normalised id the request parser emits (region prefix + version
suffix stripped). Vertex Claude carries no Bedrock-style prefix, so it prices
straight off the anthropic block.
Security. The Vertex service-account key is never forwarded — only a minted
short-lived bearer. Confirm the key material stays out of access logs (it lives
on ProviderRoute.GCPServiceAccountKeyB64, not in any emitted metadata key).
The unmeterable-publisher deny is the only thing standing between an
operator-misconfigured Vertex provider and unmetered Gemini traffic; verify
vertexPublisherVendor stays conservative (deny by default for unknown
publishers).
Correctness. normalizeBedrockModel is the join between the wire id and the
pricing key — a model that normalises to something not in defaults_pricing.yaml
meters at cost.skipped=unknown_model rather than failing the request. The
/bedrock prefix strip must run on both the parser side (so the model is
extracted) and the router side (so the upstream path is native); a regression in
either silently breaks the other.
Metering caveats. eu/apac cross-region Bedrock + Vertex profiles carry a
~10% premium not modelled by base pricing — flagged in both the catalog comment
and defaults_pricing.yaml. Operators needing exact regional billing override
the relevant entries.
keyfile:: handling: 21-management-agentnetwork.md