docs/content/features/cloud-proxy.md
+++ title = "Cloud passthrough proxy" weight = 28 toc = true description = "Forward requests to OpenAI, Anthropic, or any compatible provider" tags = ["Proxy", "Cloud", "Routing", "Advanced"] categories = ["Features"] +++
LocalAI can forward chat-completion and Anthropic Messages requests to an
external provider instead of running them through the local gRPC backend
pipeline. Configure a model with backend: cloud-proxy and a proxy.upstream_url,
and LocalAI bypasses templating, MCP injection, and the local model loader
entirely — the upstream sees the body the client sent (with only the top-level
model field optionally rewritten).
The streaming PII filter still runs over the upstream's SSE stream, so cloud egress remains subject to the same redaction rules a local model would apply.
/v1/chat/completions (OpenAI-shaped) or
/v1/messages (Anthropic-shaped).cloud-proxy backend in passthrough mode and
loads the cloud-proxy gRPC backend, which owns the outbound HTTP.proxy.upstream_url with provider-aware
authentication, then streams the SSE response back to core.Passthrough mode is wire-format-faithful — it does not translate request shapes between providers. A client posting an OpenAI-shaped body to an Anthropic upstream will get a confused upstream. Use the matching wire format, or switch to translate mode (below).
The cloud-proxy backend has one knob — the provider it should authenticate against — and two modes:
proxy.mode | What it does | When to use |
|---|---|---|
passthrough (default) | Forwards the request body verbatim to upstream_url. Client must speak the upstream's wire format. | Same wire format on both ends. |
translate | Backend converts internal proto to the upstream's wire format. Client can speak OpenAI-shaped requests to an Anthropic upstream, etc. | Cross-format adaptation. |
proxy.provider selects the auth scheme and (in translate mode) the wire
format. Supported values: openai, anthropic.
API keys are loaded from either an environment variable (api_key_env) or a
file (api_key_file). The key never appears in the config file or the admin
UI; pick whichever fits your secret-management setup.
name: gpt-4o-proxy
backend: cloud-proxy
# When set, replaces the client's "model" field before forwarding.
# Useful when the LocalAI alias differs from the upstream's canonical name.
proxy:
mode: passthrough
provider: openai
upstream_url: https://api.openai.com/v1/chat/completions
api_key_env: OPENAI_API_KEY
upstream_model: gpt-4o
request_timeout_seconds: 120
# PII filtering defaults to ON for cloud-proxy backends. Override by setting
# pii.enabled: false explicitly. Per-pattern action overrides go in
# pii.patterns; see the Middleware admin page or the Middleware feature doc.
pii:
enabled: true
Then start LocalAI with the API key in the environment:
export OPENAI_API_KEY=sk-...
local-ai run
Clients hit http://localhost:8080/v1/chat/completions with "model": "gpt-4o-proxy"
and the request lands on OpenAI's API.
name: claude-sonnet-proxy
backend: cloud-proxy
proxy:
mode: passthrough
provider: anthropic
upstream_url: https://api.anthropic.com/v1/messages
api_key_env: ANTHROPIC_API_KEY
upstream_model: claude-3-5-sonnet-20241022
request_timeout_seconds: 300
pii:
enabled: true
# Block — not just mask — leaked credentials before they reach the upstream.
patterns:
- id: api_key_prefix
action: block
Anthropic clients hit http://localhost:8080/v1/messages with
"model": "claude-sonnet-proxy".
Most third-party providers (Together, Groq, DeepInfra, OpenRouter, …) speak
the OpenAI chat-completions wire format. Use provider: openai with the
provider's URL and API key:
name: llama-3-70b-via-together
backend: cloud-proxy
proxy:
mode: passthrough
provider: openai
upstream_url: https://api.together.xyz/v1/chat/completions
api_key_env: TOGETHER_API_KEY
upstream_model: meta-llama/Llama-3-70b-chat-hf
In translate mode the cloud-proxy backend converts LocalAI's internal proto to the provider's wire format. This lets a client speak one shape (e.g. OpenAI Chat Completions) against an upstream that expects another (e.g. Anthropic Messages).
name: claude-via-openai-clients
backend: cloud-proxy
proxy:
mode: translate
provider: anthropic
upstream_url: https://api.anthropic.com/v1/messages
api_key_env: ANTHROPIC_API_KEY
upstream_model: claude-3-5-sonnet-20241022
Translate mode currently routes only pure-text completions — tool calls,
image blocks, and per-request usage tokens are dropped through the
internal Predict() signature. Use passthrough mode when your clients need
the upstream's full feature set.
api_key_file is an alternative to api_key_env when your secret manager
mounts keys as files (e.g. Kubernetes secrets, Docker secrets, Vault Agent):
proxy:
api_key_file: /run/secrets/openai_api_key
The file is read at backend load time and trimmed of surrounding whitespace.
api_key_env and api_key_file are mutually exclusive.
A router model can spread traffic across local and cloud candidates. The score classifier reads the policy descriptions and routes per request:
name: smart-router
router:
classifier: score
classifier_model: arch-router-1.5b
fallback: qwen-3-7b-local
activation_threshold: 0.40
policies:
- label: casual
description: small talk, greetings, short answers
- label: code
description: writing or debugging code in any programming language
- label: heavy-reasoning
description: long-form analysis, complex math, multi-step reasoning
candidates:
- model: qwen-3-7b-local
labels: [casual]
- model: gpt-4o-proxy
labels: [casual, code]
- model: claude-sonnet-proxy
labels: [casual, code, heavy-reasoning]
The router rewrites input.Model to the chosen candidate; per-model PII,
ACLs, and the cloud-proxy fork all run against the resolved target.
See [Middleware: PII filtering and intelligent routing]({{< relref "middleware.md" >}}) for the full router and PII-filter reference.
mode: translate (with
the constraints documented above) or send requests that match the upstream's
format.502 Bad Gateway.usage field when present.request_timeout_seconds defensively — a hung upstream otherwise ties
up an HTTP handler until the client disconnects.