Back to Litellm

LiteLLM Terraform stacks

terraform/litellm/README.md

1.88.111.6 KB
Original Source

LiteLLM Terraform stacks

Two self-contained Terraform root modules that deploy the componentized LiteLLM proxy — the gateway, backend, and UI as three independent containers (see helm/litellm/ for the canonical chart with the same split).

StackComputeDatabase (writer + reader)CacheObject storePublic entrypoint
aws/ECS FargateAurora Postgres (IAM auth)ElastiCacheS3Application LB
gcp/Cloud RunCloud SQL Postgres (password auth)MemorystoreGCSExternal HTTPS LB

Each stack creates its own VPC and managed data stores — drop in a tfvars file and run terraform apply. Both stacks support a typed proxy_config input (mirrors helm/litellm's gateway.config.proxy_config) and per-component extra env vars / secret-manager refs.

Components

The proxy is split into three deployables:

ComponentDefault imagePortRole
gatewayghcr.io/berriai/litellm-gateway:main-stable4000LLM data plane (/v1/chat/completions, /v1/embeddings, …)
backendghcr.io/berriai/litellm-backend:main-stable4001Management API (/key/*, /user/*, /team/*, /model/*, …)
uighcr.io/berriai/litellm-ui:main-stable3000Static Next.js dashboard served by nginx

The load balancer routes gateway path prefixes (mirrored verbatim from gateway/routes/allowlist.py) to the gateway, UI asset paths (/, /litellm-asset-prefix/*, /_next/*, /favicon.ico) to the UI, and everything else to the backend.

Architecture

AWS (terraform/litellm/aws/)

                        ┌───────────────────────────────────────┐
                        │            Public Internet            │
                        └─────────────────┬─────────────────────┘
                                          │ HTTP/80
                          ┌───────────────▼───────────────┐
                          │   Application Load Balancer   │
                          │   (path-routing listener)     │
                          └─┬─────────────┬─────────────┬─┘
                            │             │             │
            UI assets, /    │  /v1/chat,  │   /key/*    │
            /_next/*, …     │  /v1/embed, │   /user/*   │
                            │  …          │   …         │
              ┌─────────────▼───┐  ┌──────▼──────┐  ┌───▼──────────────┐
              │    ECS Service  │  │ ECS Service │  │   ECS Service    │
              │       (ui)      │  │  (gateway)  │  │    (backend)     │
              │   Fargate :3000 │  │ Fargate:4000│  │  Fargate :4001   │
              └─────────────────┘  └──────┬──────┘  └────────┬─────────┘
                                          │                  │
              ┌─── private subnets (one per AZ) ──────────────────────┐
              │                                                       │
              │   ┌────────────────────────┐    ┌────────────────┐   │
              │   │  Aurora Postgres       │    │  ElastiCache   │   │
              │   │  cluster (IAM auth)    │    │  Redis (1 node)│   │
              │   │  ┌───────┐  ┌───────┐  │    └────────────────┘   │
              │   │  │writer │  │reader │  │                         │
              │   │  └───────┘  └───────┘  │    ┌────────────────┐   │
              │   └────────────────────────┘    │  S3 bucket     │   │
              │                                  │  (versioned)   │   │
              │   ┌────────────────────────┐    └────────────────┘   │
              │   │  Secrets Manager       │                         │
              │   │  • LITELLM_MASTER_KEY  │    ┌────────────────┐   │
              │   │  • DB master password  │    │ One-off ECS    │   │
              │   │  • user-supplied API   │    │ task: prisma   │   │
              │   │    keys (referenced)   │    │ migrate deploy │   │
              │   └────────────────────────┘    └────────────────┘   │
              │                                                       │
              └─── VPC ───────────────────────────────────────────────┘
                          │ NAT gateway in one public subnet
                          ▼
                    egress to LLM providers

GCP (terraform/litellm/gcp/)

                        ┌───────────────────────────────────────┐
                        │            Public Internet            │
                        └─────────────────┬─────────────────────┘
                                          │ HTTP/80
                          ┌───────────────▼───────────────┐
                          │ External HTTPS Load Balancer  │
                          │   (global, URL map routing)   │
                          └─┬─────────────┬─────────────┬─┘
                            │             │             │
                            │ Serverless NEGs (one per service)
                            │             │             │
              ┌─────────────▼───┐  ┌──────▼──────┐  ┌───▼──────────────┐
              │   Cloud Run     │  │  Cloud Run  │  │    Cloud Run     │
              │      (ui)       │  │  (gateway)  │  │    (backend)     │
              │      :3000      │  │   :4000     │  │      :4001       │
              └─────────────────┘  └──────┬──────┘  └────────┬─────────┘
                                          │                  │
                                          │ Serverless VPC Access connector
              ┌─── VPC (private services access range) ──────────────────┐
              │                                                          │
              │   ┌────────────────────────┐    ┌──────────────────┐    │
              │   │  Cloud SQL Postgres    │    │  Memorystore     │    │
              │   │  ┌───────┐  ┌───────┐  │    │  Redis           │    │
              │   │  │writer │  │reader │  │    └──────────────────┘    │
              │   │  └───────┘  └───────┘  │                            │
              │   └────────────────────────┘    ┌──────────────────┐    │
              │                                  │  GCS bucket      │    │
              │   ┌────────────────────────┐    │  (versioned)     │    │
              │   │  Secret Manager        │    └──────────────────┘    │
              │   │  • LITELLM_MASTER_KEY  │                            │
              │   │  • DB password         │    ┌──────────────────┐    │
              │   │  • user-supplied API   │    │ Cloud Run Job:   │    │
              │   │    keys (referenced)   │    │ prisma migrate   │    │
              │   └────────────────────────┘    │ deploy           │    │
              │                                  └──────────────────┘    │
              └──────────────────────────────────────────────────────────┘

Images

Both stacks take per-component image references as variables. The defaults point at the public ghcr.io/berriai/litellm-<component>:main-stable images, so the stack is runnable end-to-end without pre-flight setup — pin to a specific tag for production:

  • AWS can pull from any registry the task execution role can reach. The role gets AmazonECSTaskExecutionRolePolicy attached, which grants ECR pull permissions for repositories in the same account.

  • GCP Cloud Run can only pull from Artifact Registry or gcr.io-style registries. To use images hosted elsewhere, mirror them into Artifact Registry first.

Migrations

LiteLLM's proxy runs prisma migrate deploy at startup, but on first apply the gateway/backend can race the empty database. Both stacks expose a one-off migration task that runs python litellm/proxy/prisma_migration.py against the backend image:

  • AWS: an aws_ecs_task_definition (litellm-migrations). Run with aws ecs run-task — the command is printed in terraform output.
  • GCP: a google_cloud_run_v2_job (litellm-migrations). Run with gcloud run jobs execute — the command is printed in terraform output.

Run the migration job once after the first terraform apply and before the gateway/backend services start serving traffic.

What's not included

  • TLS certificates / custom domains. Both stacks expose plain-HTTP load balancers; bring your own ACM cert (AWS) or managed cert (GCP) and wire it into the LB resource.
  • Remote state backends. Default local state — add an s3 or gcs backend block to versions.tf when graduating to a team environment.
  • Observability beyond the cloud provider's defaults (CloudWatch logs on AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse via the *_extra_env variables.