terraform/litellm/gcp/README.md
The button above opens the DeployStack installer in Cloud Shell, walks you through TUTORIAL.md, and runs terraform apply once you've answered the prompts. The rest of this README is the manual / advanced path.
Deploys the componentized LiteLLM proxy on GCP:
- A GCS bucket (GCS_BUCKET_NAME)
- Secret Manager secrets for LITELLM_MASTER_KEY and DATABASE_PASSWORD
- Three Cloud Run services: gateway (port 4000), backend (port 4001), and ui (port 3000), all using a shared runtime service account
- A Cloud Run Job (litellm-migrations) that runs prisma migrate deploy from the dedicated ghcr.io/berriai/litellm-migrations image

There are four images: litellm-gateway, litellm-backend, litellm-ui,
and litellm-migrations (slim image used only by the one-off Cloud Run
Job — runs prisma migrate deploy against the writer DB and exits).
Bump them together when bumping LiteLLM.
Required override. The image_registry default (ghcr.io/berriai)
does not work as-is — Cloud Run only accepts images from Artifact
Registry, [region.]gcr.io, or docker.io, and rejects ghcr.io URIs
at apply time. Every deploy (including HCP Terraform 1-click) must
supply either image_registry pointed at an Artifact Registry remote
repo backed by GHCR, or full per-component *_image URIs against
images you've already mirrored. The default is present only so
terraform plan succeeds during local iteration.
One-time setup (per project): create a remote repo and let Cloud Run pull through it.
gcloud artifacts repositories create litellm \
--repository-format=docker \
--location=us-central1 \
--mode=remote-repository \
--remote-repo-config-desc="GitHub Container Registry passthrough" \
--remote-docker-repo=https://ghcr.io
Then point the stack at it via image_registry:
image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai"
image_tag = "v1.86.0-dev"
The four litellm-<component>:${image_tag} URIs are composed from those
two vars. Set gateway_image / backend_image / ui_image /
migrations_image only if you need a per-component override (custom
build, different tag).
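As a rough illustration of that composition, the sketch below joins the two variables into a full image URI. The image_uri helper is hypothetical, and the exact join (registry, slash, litellm-<component>, colon, tag) is an assumption inferred from the description above rather than copied from the module.

```shell
#!/bin/sh
# Hypothetical helper: compose a component image URI.
# $1 = image_registry, $2 = component, $3 = image_tag
# Assumes the layout <registry>/litellm-<component>:<tag>.
image_uri() {
  printf '%s/litellm-%s:%s\n' "$1" "$2" "$3"
}

# Usage:
#   image_uri "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai" gateway v1.86.0-dev
```

Running the helper for each of gateway, backend, ui, and migrations is a quick way to see which URIs must be pullable before an apply.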
Two further notes:
The runtime SAs the stack creates do not need
roles/artifactregistry.reader — Cloud Run pulls images using the
per-project serverless agent
(service-<project-num>@serverless-robot-prod.iam.gserviceaccount.com),
not the runtime SA.
For a fully air-gapped option, mirror the images into a regular AR repository instead of a remote repo:
# Mirror each image under the same litellm-<component> name the stack composes.
for c in gateway backend ui migrations; do
  docker pull ghcr.io/berriai/litellm-$c:<tag>
  docker tag ghcr.io/berriai/litellm-$c:<tag> \
    us-central1-docker.pkg.dev/$PROJECT/litellm/litellm-$c:<tag>
  docker push us-central1-docker.pkg.dev/$PROJECT/litellm/litellm-$c:<tag>
done
then set image_registry = "us-central1-docker.pkg.dev/$PROJECT/litellm"
(drop the /berriai suffix — the mirrored layout has no org segment).
LiteLLM's init_iam_db_url_from_env() mints AWS RDS tokens via boto3 —
it doesn't speak GCP IAM. To IAM-auth against Cloud SQL from Cloud Run you'd
need the Cloud SQL Auth Proxy as a sidecar, which complicates the service
spec. This stack therefore uses password authentication:
- The app user's password is stored in Secret Manager (<name>-db-password).
- Cloud Run injects it into the container as DATABASE_PASSWORD via value_source.secret_key_ref.
- The container entrypoint assembles DATABASE_URL (and DATABASE_URL_READ_REPLICA) from DATABASE_HOST / DATABASE_PASSWORD before exec'ing uvicorn, so the password never appears in the service spec or in logs.

If you need GCP-native IAM auth later, add cloud-sql-proxy as a sidecar
container under template.template.containers (Cloud Run v2 supports
multiple containers) and replace the password-based URL with the proxy's
Unix socket.
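The password-based URL assembly can be sketched in shell as follows. This is a minimal illustration, not the stack's actual entrypoint: build_database_url is a hypothetical helper, and DATABASE_USER, DATABASE_NAME, DATABASE_PORT and their defaults are assumed names. Only DATABASE_HOST, DATABASE_PASSWORD, and DATABASE_URL come from this README. It also assumes the password contains only URL-safe characters, so no percent-encoding is done.

```shell
#!/bin/sh
# Hypothetical helper: compose a Postgres URL from discrete env vars.
# DATABASE_USER, DATABASE_PORT and DATABASE_NAME are assumed names.
build_database_url() {
  # Abort early if the two required inputs are missing.
  : "${DATABASE_HOST:?DATABASE_HOST must be set}"
  : "${DATABASE_PASSWORD:?DATABASE_PASSWORD must be set}"
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "${DATABASE_USER:-litellm}" \
    "$DATABASE_PASSWORD" \
    "$DATABASE_HOST" \
    "${DATABASE_PORT:-5432}" \
    "${DATABASE_NAME:-litellm}"
}

# Usage inside an entrypoint, before exec'ing the server:
#   DATABASE_URL="$(build_database_url)"
#   export DATABASE_URL
```

Because the URL is built at container start from a secret-backed variable, it exists only in the process environment and not in the Cloud Run service spec.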
proxy_config mirrors the helm chart's gateway.config.proxy_config. The map is
YAML-encoded and uploaded to a dedicated GCS bucket as config.yaml, then
mounted read-only into the gateway and backend at /etc/litellm via Cloud
Run v2's gcsfuse volume. CONFIG_FILE_PATH points at the mount path. A
hash of the YAML rides along as an env var so an edit to proxy_config
forces a new Cloud Run revision; without it the new file would sit in the
bucket unread until the next unrelated revision rollover. The migrations
job doesn't get the config (it only runs prisma migrate deploy).
proxy_config = {
model_list = [
{
model_name = "gpt-4o"
litellm_params = {
model = "openai/gpt-4o"
api_key = "os.environ/OPENAI_API_KEY"
}
},
]
general_settings = {
master_key = "os.environ/LITELLM_MASTER_KEY"
database_url = "os.environ/DATABASE_URL"
}
}
LiteLLM resolves os.environ/<NAME> references against the container
environment. Provider API keys belong in *_extra_secrets and are
referenced from the YAML by env-var name.
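The revision-forcing hash described earlier is computed by the stack itself; the shell sketch below only illustrates the idea that any edit to the rendered YAML yields a different digest. The config_hash helper is hypothetical, and sha256sum (GNU coreutils) is an assumption about the local environment.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 hex digest of a rendered config file.
# $1 = path to the YAML file
config_hash() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Usage:
#   config_hash ./config.yaml
```

Two files with identical bytes produce the same digest, so a no-op apply leaves the env var unchanged and does not roll a new revision.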
Non-sensitive env vars:
gateway_extra_env = {
LANGFUSE_HOST = "https://us.cloud.langfuse.com"
}
Sensitive values — create the secret in Secret Manager first, then reference its resource ID:
echo -n "sk-proj-..." | gcloud secrets create openai-api-key --data-file=-
gateway_extra_secrets = {
OPENAI_API_KEY = "projects/my-gcp-project/secrets/openai-api-key"
}
The Cloud Run runtime SA auto-gains roles/secretmanager.secretAccessor on
every secret referenced. Pass the bare secret resource ID only —
projects/.../secrets/openai-api-key, never the version-suffixed form
projects/.../secrets/openai-api-key/versions/3. The Cloud Run
secret_key_ref binding and the stack's IAM secret_id grant both
reject the version suffix; version is always resolved as latest. If
you need a pinned version, edit local.gateway_extra_secret_kv in
cloudrun.tf directly to set version = "3" for the entry in question.
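A small pre-flight check can catch a version-suffixed ID before it reaches terraform plan. The sketch below is an illustration only: is_bare_secret_id is a hypothetical helper, not part of the stack, and it checks the shape of the string without contacting Secret Manager.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a bare Secret Manager resource ID
# of the form projects/<project>/secrets/<name>.
is_bare_secret_id() {
  case "$1" in
    # Anything after the secret name (such as /versions/3) is rejected.
    projects/*/secrets/*/*) return 1 ;;
    projects/?*/secrets/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_bare_secret_id "projects/my-gcp-project/secrets/openai-api-key" \
#     || echo "pass the bare secret ID, without /versions/N" >&2
```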
OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is
opt-in and gated entirely on otel_endpoint. Empty (default) and nothing
OTel-related lands in the container env. Set it and both gateway and
backend gain LITELLM_OTEL_V2=true plus the OTEL_* block, with
OTEL_SERVICE_NAME stamped per component (${tenant}-litellm-${env}-gateway
and -backend) so spans land tagged with the right hop. Any OTEL_* key
set in gateway_extra_env / backend_extra_env overrides the default for
that service (Cloud Run rejects duplicate env names, so the override is
predictable).
otel_endpoint = "https://otel.example.com:4318"
otel_exporter = "otlp_http" # or otlp_grpc
otel_environment_name = "prod" # default: var.env
otel_headers_secret = "projects/my-gcp-project/secrets/otel-headers"
OTEL_HEADERS is wired as a Secret Manager secret_key_ref since it
typically carries the collector's auth token; create the secret with the
literal header string, e.g. Authorization=Bearer <token>.
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT defaults to
no_content; flip otel_capture_message_content = "prompt_and_completion"
only after auditing what lands in the backend, since prompts and
completions are typically sensitive.
Behavior matches the AWS stack 1:1; the only naming difference is
otel_headers_secret (a Secret Manager resource ID) vs. AWS's
otel_headers_secret_arn (a Secrets Manager ARN).
Every resource the stack creates is named ${tenant}-litellm-${env} (or
that plus a per-resource suffix), so multiple tenants and multiple
environments coexist in the same project as long as the (tenant, env)
pair differs:
| tenant | env | Example resource name |
|---|---|---|
| acme | stage | acme-litellm-stage-gateway |
| acme | prod | acme-litellm-prod-master-key |
| globex | dev | globex-litellm-dev-license |
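The naming rule in the table can be reproduced with a short shell sketch, which is handy when scripting gcloud lookups against a given tenant and env. The resource_name helper is hypothetical; the pattern itself is the one stated above.

```shell
#!/bin/sh
# Hypothetical helper: build a resource name from tenant, env and an
# optional per-resource suffix.
# $1 = tenant, $2 = env, $3 = suffix (optional)
resource_name() {
  if [ -n "${3:-}" ]; then
    printf '%s-litellm-%s-%s\n' "$1" "$2" "$3"
  else
    printf '%s-litellm-%s\n' "$1" "$2"
  fi
}

# Usage:
#   resource_name acme stage gateway
```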
For a per-tenant instance via the example root, the only inputs that change are the tenant slug, env, and the two pre-issued secrets:
cd terraform/litellm/gcp/examples/default
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
terraform apply \
-var "project_id=my-gcp-project" \
-var "region=us-central1" \
-var "tenant=acme" \
-var "env=stage"
To run many tenants from a single config, call the module with
for_each instead of one root per tenant — only possible because the
module declares no provider block (see "Using as a module").
Both litellm_master_key and litellm_license are optional:
- Omit litellm_master_key → the stack auto-generates a random sk-… value (trial/dev path).
- Omit litellm_license → no license secret is created and gateway/backend run without LITELLM_LICENSE (OSS-only).

Use TF_VAR_* env vars rather than tfvars files for these; values written to a tfvars file end up in terraform.tfstate and any committed example files.
cd terraform/litellm/gcp/examples/default
cp terraform.tfvars.example terraform.tfvars
# Edit: project, region, tenant, env, image_registry, proxy_config, gateway_extra_secrets.
terraform init
terraform apply
examples/default/ is a thin root that configures the google /
google-beta providers and calls the module (../../). It exposes a
curated variable surface; for advanced knobs (per-component
CPU/memory/instances, Cloud SQL tier/edition, Memorystore tier,
per-component image pins) set them on the module "litellm" block in
examples/default/main.tf, or call the module from your own config — see
"Using as a module" below.
That single apply provisions everything, runs the prisma schema migration via
the Cloud Run job (auto-triggered by bootstrap.tf), and only then starts the
gateway/backend services. When it returns, the stack is serving traffic.
terraform output lb_url
# UI login: admin / <master key>
gcloud secrets versions access latest --secret="$(terraform output -raw master_key_secret_id)"
The migration_run_command output is preserved for break-glass manual re-runs.
Prerequisite: gcloud must be authenticated (gcloud auth login) and the
required APIs must be enabled (run, sqladmin, redis, secretmanager,
vpcaccess, compute, servicenetworking, storage, artifactregistry).
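One way to enable the listed APIs in a single call is sketched below. The required_apis helper is hypothetical, and expanding each short name to <name>.googleapis.com is an assumption about how the list above maps to service names, so check the output against your project before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the full service name for each API listed above.
# Assumes each short name maps to <name>.googleapis.com.
required_apis() {
  for api in run sqladmin redis secretmanager vpcaccess compute \
             servicenetworking storage artifactregistry; do
    printf '%s.googleapis.com\n' "$api"
  done
}

# Usage (needs an authenticated gcloud with a project selected):
#   gcloud services enable $(required_apis)
```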
terraform plan refuses to provision an HTTP-only LB by default — TLS
is the supported posture. Two paths:
Production / staging — set lb_domains:
1. Run terraform apply once with allow_plaintext_lb = true (intentional chicken-and-egg escape hatch) to provision the LB and read the anycast IP from terraform output -raw lb_ip.
2. Point DNS for each domain at that IP, then set lb_domains = ["proxy.example.com"] and remove allow_plaintext_lb; re-apply.

Result: a 443 forwarding rule with a Google-managed cert covering each
listed domain; the 80 forwarding rule is rewritten to serve a permanent
301 redirect to HTTPS, so HTTP clients are automatically upgraded. The
managed cert sits in PROVISIONING for ~15-60 min on first apply until
DNS propagation completes — gcloud compute ssl-certificates describe <tenant>-litellm-<env>-cert shows the state.
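Waiting for the certificate can be scripted. The sketch below polls the managed status until it reads ACTIVE; cert_status and wait_for_cert are hypothetical helpers, and the --global flag assumes the certificate is a global resource, which fits a global external LB but is not confirmed by this README.

```shell
#!/bin/sh
# Hypothetical helper: print the managed status of a certificate.
# Assumes a global (not regional) certificate.
cert_status() {
  gcloud compute ssl-certificates describe "$1" --global \
    --format='value(managed.status)'
}

# Hypothetical helper: poll until the certificate is ACTIVE.
# $1 = cert name, $2 = max attempts (default 60), $3 = seconds between attempts (default 60)
wait_for_cert() {
  name=$1
  attempts=${2:-60}
  interval=${3:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(cert_status "$name")
    if [ "$status" = "ACTIVE" ]; then
      return 0
    fi
    echo "cert $name is ${status:-unknown}; retrying in ${interval}s" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_for_cert acme-litellm-prod-cert
```

With the defaults this waits up to about an hour, matching the upper end of the provisioning window mentioned above.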
Trial / dev — explicitly opt into HTTP-only:
Set allow_plaintext_lb = true and leave lb_domains = []. Without the
flag, plan fails with a clear error pointing at the precondition.
Intended for short-lived trial / dev stacks only.
The directory itself is a module with no provider block — the caller
owns provider config. You can call it directly with for_each (many
tenants from one config), count, depends_on, or providers configured
to impersonate a service account / target a different project:
provider "google" {
project = "my-gcp-project"
region = "us-central1"
}
provider "google-beta" {
project = "my-gcp-project"
region = "us-central1"
}
module "litellm" {
source = "github.com/BerriAI/litellm//terraform/litellm/gcp?ref=<tag>"
project = "my-gcp-project"
region = "us-central1"
tenant = "acme"
env = "prod"
# ...any of the inputs in variables.tf...
}
Both the default google and google-beta configs are inherited by the
module automatically through the call; declare both in the caller.
Labels: the module stamps its own litellm-stack and managed-by labels
onto every label-supporting resource (Cloud Run services and the
migrations job, Cloud SQL writer and reader, Memorystore, Secret Manager
entries, GCS buckets, the LB global address and forwarding rules) and
merges var.labels on top. Use the labels input for per-deployment
labels; mirrors the AWS stack's tags input.
for_each shares one provider config. The module's versions.tf declares
google / google-beta without configuration_aliases, so it only ever
receives the caller's single default (unaliased) google / google-beta
providers. That's deliberate — it keeps the one-command path simple — but it
means a for_each over the module runs every instance against the same
project, region, and credentials. Use for_each for many tenants in one
project (distinct tenant/env); it cannot fan out across projects or regions
on its own. To deploy into separate projects/regions, give each its own root
with its own provider config (one examples/default-style root per project),
or fork the module to add configuration_aliases and pass per-instance
providers = { ... }.
Two opt-in tripwires guard against accidental data loss on
terraform destroy:
- cloudsql_deletion_protection (Cloud SQL writer + reader; default true): destroy fails with a clear error rather than dropping the database.
- gcs_force_destroy (GCS bucket holding request log archives, /v1/files content, and the GCS cache backend; default false): terraform destroy against a non-empty bucket fails.

Flip cloudsql_deletion_protection to false or gcs_force_destroy to true only for ephemeral / CI stacks where you accept losing the data.
Memorystore runs with transit_encryption_mode = "SERVER_AUTHENTICATION",
so the proxy connects via rediss://. The instance's self-signed CA cert
(server_ca_certs[0].cert) is shipped to gateway + backend as
REDIS_CA_PEM_B64; their entrypoint shell decodes it to /tmp/redis-ca.pem
before uvicorn starts and points REDIS_SSL_CA_CERTS at that path. No
extra config needed — but if you ever swap Memorystore for an external
Redis, override REDIS_HOST/REDIS_PORT and either drop these env vars
or point them at your own CA.
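The CA decoding step can be sketched as below. This is an approximation of what the paragraph describes, not the stack's actual entrypoint: write_redis_ca is a hypothetical helper, and base64 -d assumes a GNU or BusyBox base64.

```shell
#!/bin/sh
# Hypothetical helper: decode REDIS_CA_PEM_B64 to a file and point
# REDIS_SSL_CA_CERTS at it.
# $1 = destination path (defaults to the path named in this README)
write_redis_ca() {
  dest=${1:-/tmp/redis-ca.pem}
  printf '%s' "$REDIS_CA_PEM_B64" | base64 -d > "$dest" || return 1
  export REDIS_SSL_CA_CERTS="$dest"
}

# Usage inside an entrypoint, before starting the server:
#   write_redis_ca
```

When pointing at an external Redis with your own CA, the same helper works if REDIS_CA_PEM_B64 holds that CA's base64-encoded PEM.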
| File | What's in it |
|---|---|
| versions.tf | Terraform + required_providers constraints (module declares no provider config) |
| examples/default/ | Thin root: google / google-beta providers + a call to the module. The one-command deploy path. |
| variables.tf | All input variables |
| locals.tf | Path-prefix lists (mirror of helm/.../ingress.yaml) + proxy_config helpers |
| network.tf | VPC, subnet, PSA range, Serverless VPC connector |
| secrets.tf | Secret Manager entries + random master_key |
| cloudsql.tf | Cloud SQL writer + read replica + app user + password secret |
| redis.tf | Memorystore Redis (private IP) |
| gcs.tf | GCS bucket + objectAdmin binding |
| iam.tf | Runtime SA + Cloud SQL client + Secret Manager accessor |
| cloudrun.tf | 3 Cloud Run services + Cloud Run Job for migrations |
| load_balancer.tf | External HTTPS LB, serverless NEGs, URL map for path routing |
| outputs.tf | LB IP, service URLs, secret IDs, migration execute command |