
Railway deploy story

packages/cloud-infra/cloud/RAILWAY.md



Where each piece of the Eliza Cloud backend actually runs today, and where it is heading.

Topology (current)

| Surface | Runtime | Repo path | Config |
| --- | --- | --- | --- |
| cloud-frontend (dashboard SPA) | Cloudflare Pages | packages/cloud-frontend/ | Wrangler / Pages project |
| cloud-api (REST + auth + billing) | Cloudflare Worker | packages/cloud-api/ | apps/api/wrangler.toml (env vars, secrets via wrangler secret) |
| headscale (Tailscale coordination server for customer tunnels) | Railway | packages/cloud-services/headscale/ | railway.toml, Dockerfile |
| tunnel-proxy (public HTTPS -> tailnet bridge) | Railway | packages/cloud-services/tunnel-proxy/ | railway.toml, Dockerfile |
| gateway-discord | Cloudflare Worker | packages/cloud-services/gateway-discord/ | own wrangler.toml |
| gateway-webhook | Cloudflare Worker | packages/cloud-services/gateway-webhook/ | own wrangler.toml |
| agent-server (per-customer agent runtime) | Hetzner containers | packages/cloud-services/agent-server/ | provisioned via container-control-plane |
| container-control-plane (provisioning API) | Hetzner / VPS | packages/cloud-services/container-control-plane/ | env-driven |
| Database migrations | GitHub Actions -> Neon (Postgres) | packages/cloud-api/db/ | .github/workflows/cloud-deploy-backend.yml |

The deprecated agent VPS deploy still exists behind the deploy_legacy_vps workflow_dispatch input on cloud-deploy-backend.yml. It is off by default and only runs when an operator explicitly opts in. New code should not target it.

Railway services in detail

headscale

  • Builder: Dockerfile (pinned headscale v0.28.0).
  • Healthcheck: GET /health on listen_addr (port 8080). Headscale v0.28 serves this natively.
  • Volume: /var/lib/headscale (SQLite db + generated keys).
  • Public domain: headscale.elizacloud.ai.
  • Provisioning runbook: packages/cloud-services/headscale/DEPLOY.md.

tunnel-proxy

  • Builder: Dockerfile (Go binary).
  • Healthcheck: GET /health (served by main.go line 117).
  • Volume: /var/lib/tunnel-proxy (tsnet node identity).
  • Public domain: tunnel.elizacloud.ai + wildcard *.tunnel.elizacloud.ai.
  • Provisioning runbook: packages/cloud-services/headscale/DEPLOY.md (covers both services).
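Both Railway services expose GET /health, so a post-deploy smoke check can be sketched in a few lines. This is a minimal illustration, not code from the repo: healthUrl, isHealthy, and the Fetcher type are hypothetical names, and it assumes /health is reachable through the public domains listed above (the document only states that Railway's own healthcheck uses it).

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node typings.
type HealthResponse = { ok: boolean; status: number };
type Fetcher = (url: string) => Promise<HealthResponse>;

// Public domains as listed in this document. Whether /health is exposed on
// them (rather than only on the internal port) is an assumption.
const RAILWAY_SERVICES = {
  headscale: "https://headscale.elizacloud.ai",
  "tunnel-proxy": "https://tunnel.elizacloud.ai",
} as const;

// Hypothetical helper: builds the healthcheck URL for a service base URL,
// tolerating trailing slashes on the base.
function healthUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/health";
}

// Hypothetical helper: resolves to true only when /health answers with a
// 2xx status. Network errors are treated as unhealthy.
async function isHealthy(baseUrl: string, fetcher: Fetcher): Promise<boolean> {
  try {
    const res = await fetcher(healthUrl(baseUrl));
    return res.ok;
  } catch {
    return false;
  }
}
```

On a runtime with a global fetch (Node 18+ or Bun), a call might look like isHealthy(RAILWAY_SERVICES.headscale, (url) => fetch(url)). Passing the fetcher in keeps the helper testable without network access.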

Where Railway is heading

The strategic direction is to retire AWS and move central services to Railway, with container-based workloads provisioned on Hetzner via the container-control-plane. Concretely:

  • Anything new that needs a long-running stateful HTTP service should target Railway. Add a railway.toml next to its Dockerfile, point the healthcheck at a real endpoint the service serves, and document it here.
  • Anything new that is per-customer compute or GPU-bound should target Hetzner via container-control-plane.
  • Anything that fits the edge model (stateless REST, low-latency, JWT-gated) should stay on Cloudflare Workers.
  • AWS resources (legacy gateway-discord on AWS Lambda, S3 buckets, etc.) are being phased out. New AWS dependencies should not be added.
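The placement rules above can be read as a small decision function. The sketch below is only an illustration of that reading: the Workload fields and chooseTarget are hypothetical, the order of the checks is an assumption, and combinations the rules do not cover are reported as "undecided" rather than guessed.

```typescript
type Target = "railway" | "hetzner" | "cloudflare-workers" | "undecided";

// Hypothetical workload descriptor; the field names are illustrative only.
interface Workload {
  perCustomer: boolean; // per-customer compute
  gpuBound: boolean;
  longRunning: boolean;
  stateful: boolean;
}

// Maps a workload to a deploy target following the rules in this section.
// Checking Hetzner first is an assumption; the document does not rank the rules.
function chooseTarget(w: Workload): Target {
  if (w.perCustomer || w.gpuBound) return "hetzner";
  if (w.longRunning && w.stateful) return "railway";
  if (!w.longRunning && !w.stateful) return "cloudflare-workers";
  return "undecided"; // not covered by the stated rules
}

// Example: a long-running stateful HTTP service such as headscale.
const headscaleLike: Workload = {
  perCustomer: false,
  gpuBound: false,
  longRunning: true,
  stateful: true,
};
const target = chooseTarget(headscaleLike); // "railway"
```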

AWS retirement summary

Full classification, plan, owners, and outstanding items live in AWS_RETIREMENT.md. Quick map:

| AWS thing | Status | Target |
| --- | --- | --- |
| @aws-sdk/client-s3 (cloud-shared) | Keep | Cloudflare R2 / Supabase / generic S3 endpoint; the SDK is provider-agnostic |
| @aws-sdk/client-kms (cloud-shared encryption) | Keep (optional) | LocalKMSProvider (AES-256-GCM with SECRETS_MASTER_KEY) is the default. AWS KMS provider only fires when AWS_KMS_KEY_ID is set |
| legacy-gateway-discord-aws/ terraform | Deleted | n/a (was a stale duplicate) |
| cloud-services/gateway-discord/terraform/ (EKS) | Retire | Gateway-discord is a Docker/Bun service; redeploy on Railway / Hetzner. Terraform + CI workflow kept until Railway path lands. |
| packages/examples/aws/ Lambda example | Keep | Documentation example for users who want to deploy elizaOS on Lambda. Not part of Eliza Cloud infra. |
| AWS ECR/ECS code | Already removed | Replaced by container-control-plane + Hetzner. README references are stale and have been pruned. |
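The KMS row describes a simple selection rule: the local provider is the default, and AWS KMS is used only when AWS_KMS_KEY_ID is set. A minimal sketch of that rule follows. It is not the cloud-shared implementation: selectKmsProvider and KmsChoice are hypothetical names, and treating an empty string as unset and throwing when SECRETS_MASTER_KEY is missing are assumptions.

```typescript
// Hypothetical result type; the real provider classes are not shown here.
type KmsChoice =
  | { provider: "aws-kms"; keyId: string }
  | { provider: "local"; masterKey: string };

type Env = Record<string, string | undefined>;

// Picks the AWS KMS provider only when AWS_KMS_KEY_ID is set, otherwise
// falls back to the local provider keyed by SECRETS_MASTER_KEY.
function selectKmsProvider(env: Env): KmsChoice {
  const keyId = env["AWS_KMS_KEY_ID"];
  if (keyId !== undefined && keyId !== "") {
    return { provider: "aws-kms", keyId };
  }
  const masterKey = env["SECRETS_MASTER_KEY"];
  if (masterKey === undefined || masterKey === "") {
    // Assumption: the local provider cannot work without a master key.
    throw new Error("SECRETS_MASTER_KEY is required when AWS_KMS_KEY_ID is not set");
  }
  return { provider: "local", masterKey };
}
```

Under this reading, a deployment with no AWS variables set never touches AWS KMS, which is why the SDK dependency can stay as an optional path.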

Removed: legacy fullstack railway.toml

packages/cloud-infra/cloud/railway.toml used to deploy the old Next.js fullstack cloud app to Railway. Its healthcheck pointed at /login, a Next.js page route. That deployment is gone: cloud-frontend is a Vite SPA on Cloudflare Pages and cloud-api is a Cloudflare Worker. The file has been removed; nothing in the repo or in CI referenced it.