packages/cloud-infra/cloud/AWS_RETIREMENT.md
Living audit of AWS dependencies in this repo, what they do, what they were migrated to (or kept and redirected), and what is still outstanding.
Pair this with RAILWAY.md (where central services run today)
and the storage notes in
../../cloud-shared/src/lib/storage/s3-compatible-client.ts
(how the S3 SDK is pointed at Cloudflare R2 / Supabase / generic S3
endpoints).
We are retiring AWS as a primary backend. The replacements per surface:
| AWS service | Replacement |
|---|---|
| S3 | Cloudflare R2 (via @aws-sdk/client-s3 against R2 endpoint) |
| KMS | LocalKMSProvider (AES-256-GCM with SECRETS_MASTER_KEY) |
| ECS / EKS containers | Hetzner via container-control-plane |
| Lambda | Cloudflare Workers (cloud-api, gateway-*) |
| RDS | Neon Postgres |
| ElastiCache | Upstash Redis (managed) / redis package |
| CloudFront / Route53 | Cloudflare (Pages + DNS) |
| SQS / SNS | Not currently used in core services |
| Dependency | Where | Why kept |
|---|---|---|
@aws-sdk/client-s3 | packages/cloud-shared | S3 wire protocol is the de-facto standard for object storage. s3-compatible-client.ts resolves STORAGE_PROVIDER (r2, supabase, s3) and points the client at R2 (*.r2.cloudflarestorage.com), self-hosted Supabase, or any generic S3 endpoint. There is no AWS S3 account in the production path. |
@aws-sdk/client-kms | packages/cloud-shared/src/lib/services/secrets/encryption.ts | Lazy-loaded inside AWSKMSProvider. Default is LocalKMSProvider (AES-256-GCM with SECRETS_MASTER_KEY). The AWS provider is only constructed when AWS_KMS_KEY_ID is set — kept so existing deployments with a provisioned KMS key continue to decrypt their secrets. New deployments use LocalKMSProvider. |
packages/examples/aws/ | Examples package | A documentation example showing how to run an elizaOS worker on AWS Lambda + API Gateway. Not part of Eliza Cloud infrastructure. Keep as a user-facing example. |
s3-storage plugin registry entry (AWS_* config fields) | packages/app-core/src/registry/entries/plugins/s3-storage.json | Generic user-facing S3 plugin. The AWS_REGION / AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_S3_ENDPOINT schema fields are how the broader S3 ecosystem labels these creds, regardless of provider. Keep. |
plugin-sql/src/config.toml references to AWS_ACCESS_KEY_ID | Generic Supabase/S3 storage config | Same reason — config keys are the standard names. Keep. |
plugin-registry env-var prefix allowlist (AWS_) | plugins/plugin-registry/src/api/app-plugins-routes.ts line 321 | Catch-all prefix used to identify infra credentials for any provider that uses AWS_* env names (R2, MinIO, etc). Keep. |
| Dependency | Where | Target | Status |
|---|---|---|---|
gateway-discord EKS deployment | packages/cloud-services/gateway-discord/terraform/ and .github/workflows/cloud-gateway-discord.yml | Railway (railway.toml + Dockerfile). Service is already a Bun/Docker app. | Done. terraform/ and chart/ removed; railway.toml added; workflow stripped of all AWS jobs (terraform plan/apply/destroy, EKS update-kubeconfig, ECR/Helm deploy). Workflow now only runs tests; Railway auto-deploys on push. |
gateway-webhook EKS deployment | packages/cloud-services/gateway-webhook/ | Railway. | Done. No terraform/ directory existed; railway.toml is in place; workflow has no AWS jobs. |
TERRAFORM_AWS_ROLE_ARN / GATEWAY_AWS_ROLE_ARN GitHub secrets | CI vars | Remove from the gateway-dev / gateway-prd environments now that the workflow no longer references them. | Ready for removal. Workflow no longer reads either variable; the env-vars in the GitHub environments themselves should be deleted by a maintainer with org-admin access. |
| Path | Lines | Why | Status |
|---|---|---|---|
packages/cloud-infra/cloud/terraform/legacy-gateway-discord-aws/ | 19 files / ~1.9k LOC | Explicitly named "legacy"; duplicate of cloud-services/gateway-discord/terraform/; quarantined per the parent README. | Done. |
packages/cloud-services/gateway-discord/terraform/ and chart/ | Active EKS terraform + Helm chart | Replaced by railway.toml. | Done. |
AWS ECR/ECS client code (formerly packages/cloud-shared/src/.../ecr.ts, ecs.ts) | n/a | No client-ecr / client-ecs imports remain. | Done. |
cloud-shared/README.md stale AWS ECS/ECR/EKS/CloudFormation/CloudWatch refs | Documentation only | Replaced with Hetzner / container-control-plane / Railway language. | Done. |
| Item | Reason kept | Plan | Owner |
|---|---|---|---|
@aws-sdk/client-kms retention | Existing deployments may have provisioned KMS keys | AWSKMSProvider is now marked @deprecated and emits a logger.warn on first use. Rotate all known production secrets through SecretsEncryptionService.rotate() under LocalKMSProvider, then remove the class and drop the dep. | cloud-shared |
TERRAFORM_AWS_ROLE_ARN / GATEWAY_AWS_ROLE_ARN environment vars in GitHub | CI vars in the gateway-dev / gateway-prd environments | Workflow no longer references them; delete from GitHub environment settings. | cloud-infra |
Stage 0 — Audit + quarantine cleanup (done):
legacy-gateway-discord-aws/ quarantined terraform copy.cloud-infra/cloud/terraform/README.md to reflect the deletion.RAILWAY.md with the AWS retirement table.AWS_RETIREMENT.md.Stage 1 — gateway-discord on Railway (done):
packages/cloud-services/gateway-discord/railway.toml in place.Stage 2a — gateway-webhook on Railway (done):
packages/cloud-services/gateway-webhook/railway.toml in place.cloud-gateway-webhook.yml workflow contains no AWS jobs.Stage 2b — gateway-discord cutover (done):
terraform/ and chart/ directories deleted from cloud-services/gateway-discord/..github/workflows/cloud-gateway-discord.yml
(terraform plan/apply/destroy, AWS OIDC, EKS update-kubeconfig, Helm deploy,
image build job that fed the EKS path).TERRAFORM_AWS_ROLE_ARN and GATEWAY_AWS_ROLE_ARN from the gateway-dev /
gateway-prd GitHub environments. No code references them.Stage 3 — KMS sunset (in progress):
AWSKMSProvider is now annotated @deprecated with a migration recipe in
the JSDoc and emits a logger.warn on first construction pointing to this
document.SecretsEncryptionService.rotate()
once SECRETS_MASTER_KEY is set, then remove AWSKMSProvider and drop
@aws-sdk/client-kms from cloud-shared/package.json.Stage 4 — paper cleanup (done):
packages/cloud-shared/README.md (TOC, feature list, architecture
diagrams, schema docs, troubleshooting, additional-resources links).cloud-services/container-control-plane (Hetzner) and RAILWAY.md.The Phase 5 checks from the audit:
# Remaining AWS SDK deps — all justified in this doc.
rg "@aws-sdk/" --type json | grep -v node_modules | grep -v "/dist/"
# Remaining hard AWS env vars — only the three KMS-related ones in
# encryption.ts (justified above) and stale workflow/README references
# (tracked in the outstanding table).
rg "AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY|REGION)" -g '!node_modules' \
-g '!dist' -g '!*.lock' --type ts --type toml --type yml
Each remaining hit corresponds to a row in this document.