Back to Eliza

AWS retirement plan

packages/cloud-infra/cloud/AWS_RETIREMENT.md

2.0.18.0 KB
Original Source

AWS retirement plan

Living audit of AWS dependencies in this repo, what they do, what they were migrated to (or kept and redirected), and what is still outstanding.

Pair this with RAILWAY.md (where central services run today) and the storage notes in ../../cloud-shared/src/lib/storage/s3-compatible-client.ts (how the S3 SDK is pointed at Cloudflare R2 / Supabase / generic S3 endpoints).

TL;DR

We are retiring AWS as a primary backend. The replacements per surface:

AWS serviceReplacement
S3Cloudflare R2 (via @aws-sdk/client-s3 against R2 endpoint)
KMSLocalKMSProvider (AES-256-GCM with SECRETS_MASTER_KEY)
ECS / EKS containersHetzner via container-control-plane
LambdaCloudflare Workers (cloud-api, gateway-*)
RDSNeon Postgres
ElastiCacheUpstash Redis (managed) / redis package
CloudFront / Route53Cloudflare (Pages + DNS)
SQS / SNSNot currently used in core services

Classification

(K) Keep — provider-agnostic, points at non-AWS backend

DependencyWhereWhy kept
@aws-sdk/client-s3packages/cloud-sharedS3 wire protocol is the de-facto standard for object storage. s3-compatible-client.ts resolves STORAGE_PROVIDER (r2, supabase, s3) and points the client at R2 (*.r2.cloudflarestorage.com), self-hosted Supabase, or any generic S3 endpoint. There is no AWS S3 account in the production path.
@aws-sdk/client-kmspackages/cloud-shared/src/lib/services/secrets/encryption.tsLazy-loaded inside AWSKMSProvider. Default is LocalKMSProvider (AES-256-GCM with SECRETS_MASTER_KEY). The AWS provider is only constructed when AWS_KMS_KEY_ID is set — kept so existing deployments with a provisioned KMS key continue to decrypt their secrets. New deployments use LocalKMSProvider.
packages/examples/aws/Examples packageA documentation example showing how to run an elizaOS worker on AWS Lambda + API Gateway. Not part of Eliza Cloud infrastructure. Keep as a user-facing example.
s3-storage plugin registry entry (AWS_* config fields)packages/app-core/src/registry/entries/plugins/s3-storage.jsonGeneric user-facing S3 plugin. The AWS_REGION / AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_S3_ENDPOINT schema fields are how the broader S3 ecosystem labels these creds, regardless of provider. Keep.
plugin-sql/src/config.toml references to AWS_ACCESS_KEY_IDGeneric Supabase/S3 storage configSame reason — config keys are the standard names. Keep.
plugin-registry env-var prefix allowlist (AWS_)plugins/plugin-registry/src/api/app-plugins-routes.ts line 321Catch-all prefix used to identify infra credentials for any provider that uses AWS_* env names (R2, MinIO, etc). Keep.

(M) Migrate — actual AWS dependency that should leave

DependencyWhereTargetStatus
gateway-discord EKS deploymentpackages/cloud-services/gateway-discord/terraform/ and .github/workflows/cloud-gateway-discord.ymlRailway (railway.toml + Dockerfile). Service is already a Bun/Docker app.Done. terraform/ and chart/ removed; railway.toml added; workflow stripped of all AWS jobs (terraform plan/apply/destroy, EKS update-kubeconfig, ECR/Helm deploy). Workflow now only runs tests; Railway auto-deploys on push.
gateway-webhook EKS deploymentpackages/cloud-services/gateway-webhook/Railway.Done. No terraform/ directory existed; railway.toml is in place; workflow has no AWS jobs.
TERRAFORM_AWS_ROLE_ARN / GATEWAY_AWS_ROLE_ARN GitHub secretsCI varsRemove from the gateway-dev / gateway-prd environments now that the workflow no longer references them.Ready for removal. Workflow no longer reads either variable; the env-vars in the GitHub environments themselves should be deleted by a maintainer with org-admin access.

(D) Delete — legacy / unreachable

PathLinesWhyStatus
packages/cloud-infra/cloud/terraform/legacy-gateway-discord-aws/19 files / ~1.9k LOCExplicitly named "legacy"; duplicate of cloud-services/gateway-discord/terraform/; quarantined per the parent README.Done.
packages/cloud-services/gateway-discord/terraform/ and chart/Active EKS terraform + Helm chartReplaced by railway.toml.Done.
AWS ECR/ECS client code (formerly packages/cloud-shared/src/.../ecr.ts, ecs.ts)n/aNo client-ecr / client-ecs imports remain.Done.
cloud-shared/README.md stale AWS ECS/ECR/EKS/CloudFormation/CloudWatch refsDocumentation onlyReplaced with Hetzner / container-control-plane / Railway language.Done.

Outstanding AWS dependencies

ItemReason keptPlanOwner
@aws-sdk/client-kms retentionExisting deployments may have provisioned KMS keysAWSKMSProvider is now marked @deprecated and emits a logger.warn on first use. Rotate all known production secrets through SecretsEncryptionService.rotate() under LocalKMSProvider, then remove the class and drop the dep.cloud-shared
TERRAFORM_AWS_ROLE_ARN / GATEWAY_AWS_ROLE_ARN environment vars in GitHubCI vars in the gateway-dev / gateway-prd environmentsWorkflow no longer references them; delete from GitHub environment settings.cloud-infra

Staged retirement plan

  1. Stage 0 — Audit + quarantine cleanup (done):

    • Deleted legacy-gateway-discord-aws/ quarantined terraform copy.
    • Updated cloud-infra/cloud/terraform/README.md to reflect the deletion.
    • Extended RAILWAY.md with the AWS retirement table.
    • Published this AWS_RETIREMENT.md.
  2. Stage 1 — gateway-discord on Railway (done):

    • packages/cloud-services/gateway-discord/railway.toml in place.
    • Existing Dockerfile builds on Railway.
  3. Stage 2a — gateway-webhook on Railway (done):

    • packages/cloud-services/gateway-webhook/railway.toml in place.
    • cloud-gateway-webhook.yml workflow contains no AWS jobs.
  4. Stage 2b — gateway-discord cutover (done):

    • terraform/ and chart/ directories deleted from cloud-services/gateway-discord/.
    • All AWS-specific jobs removed from .github/workflows/cloud-gateway-discord.yml (terraform plan/apply/destroy, AWS OIDC, EKS update-kubeconfig, Helm deploy, image build job that fed the EKS path).
    • Remaining task: maintainer with org-admin to remove TERRAFORM_AWS_ROLE_ARN and GATEWAY_AWS_ROLE_ARN from the gateway-dev / gateway-prd GitHub environments. No code references them.
  5. Stage 3 — KMS sunset (in progress):

    • AWSKMSProvider is now annotated @deprecated with a migration recipe in the JSDoc and emits a logger.warn on first construction pointing to this document.
    • Outstanding: rotate any remaining production secrets that were encrypted under an AWS KMS-derived DEK through SecretsEncryptionService.rotate() once SECRETS_MASTER_KEY is set, then remove AWSKMSProvider and drop @aws-sdk/client-kms from cloud-shared/package.json.
  6. Stage 4 — paper cleanup (done):

    • All AWS ECS/ECR/EKS/CloudFormation/CloudWatch references pruned from packages/cloud-shared/README.md (TOC, feature list, architecture diagrams, schema docs, troubleshooting, additional-resources links).
    • Container deployment documentation now points at cloud-services/container-control-plane (Hetzner) and RAILWAY.md.

Verification

The Phase 5 checks from the audit:

bash
# Remaining AWS SDK deps — all justified in this doc.
rg "@aws-sdk/" --type json | grep -v node_modules | grep -v "/dist/"

# Remaining hard AWS env vars — only the three KMS-related ones in
# encryption.ts (justified above) and stale workflow/README references
# (tracked in the outstanding table).
rg "AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY|REGION)" -g '!node_modules' \
  -g '!dist' -g '!*.lock' --type ts --type toml --type yml

Each remaining hit corresponds to a row in this document.