docs/craft/infra/todos.md
Things to codify so enabling Craft on a new cluster becomes "set ENABLE_CRAFT=true in values.yaml" instead of following a manual setup guide. Each item is independent — ship in any order.
Render everything in the sandbox namespace required for Craft via the Helm chart when ENABLE_CRAFT=true:
- eks.amazonaws.com/skip-containers=sandbox annotation so only the sidecar container receives cloud credentials.
- pods, pods/exec, services verbs.

Source identifiers (IAM role ARN, bound SA names) from a configurable values block and mark them required so a misconfigured deploy fails fast.
This removes the need for manual kubectl annotate and kubectl create rolebinding steps when onboarding a new cluster. Existing clusters whose Role is currently shipped via raw manifests / external GitOps need a one-time cleanup so the chart becomes the single source of truth.
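The fail-fast requirement can also be checked before Helm runs at all. The following is a minimal sketch, assuming the identifiers live under a hypothetical `craft` values key with hypothetical fields `roleArn` and `boundServiceAccounts`; none of these names come from the chart itself.

```python
from typing import Any, Mapping

# Hypothetical key names; the real chart's values schema may differ.
REQUIRED_KEYS = ("roleArn", "boundServiceAccounts")


def missing_craft_values(values: Mapping[str, Any]) -> list[str]:
    """Return the required Craft identifiers that are absent or empty."""
    if str(values.get("ENABLE_CRAFT", "")).lower() != "true":
        return []  # Craft is disabled, so nothing is required.
    craft = values.get("craft") or {}
    return [key for key in REQUIRED_KEYS if not craft.get(key)]


def check_values(values: Mapping[str, Any]) -> None:
    """Raise before deploying rather than render a half-configured sandbox."""
    missing = missing_craft_values(values)
    if missing:
        raise ValueError(
            "ENABLE_CRAFT is true but these values are missing: "
            + ", ".join(missing)
        )
```

Inside the chart, Helm's `required` template function can give the same effect at render time; the Python check is only a pre-flight convenience.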
A shared Terraform module that provisions the cloud-side prerequisites for Craft on a given cluster:
- Outputs (role_arn, bucket_name) to wire into the cluster's Helm values

Existing buckets/roles on already-deployed clusters need to be imported into module state, not recreated.
A dedicated sandbox node group must carry the same set of security groups that the cluster's regular managed node groups carry — typically the EKS cluster SG plus the shared node SG. If the launch template only attaches the cluster SG, pods on sandbox nodes can't reach pods on the regular node group (DNS, in-cluster service calls, etc.) because the shared node SG's ingress is self-referential.
Acceptance: the Terraform launch template for the sandbox node group attaches both SGs by default, matching how EKS managed node groups normally provision.
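The acceptance criterion amounts to a set comparison between the security groups on the regular node groups and those on the sandbox launch template. A minimal sketch, assuming the SG IDs have already been collected by some other means (the IDs in the usage note are placeholders):

```python
from typing import Iterable


def missing_security_groups(
    sandbox_sgs: Iterable[str], regular_sgs: Iterable[str]
) -> list[str]:
    """Return SG IDs carried by regular node groups but absent from the sandbox launch template."""
    return sorted(set(regular_sgs) - set(sandbox_sgs))
```

For example, `missing_security_groups(["sg-cluster"], ["sg-cluster", "sg-shared-node"])` reports the shared node SG as missing, which is the failure mode described above. An empty result means the sandbox node group matches.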
Enforce IMDSv2 with hop-limit 1 on the sandbox node group via Terraform so containers can't reach the instance metadata service. If a cluster's node group is currently managed outside Terraform, converting it is a prerequisite.
Acceptance: from inside any sandbox pod, a curl to the metadata service times out.
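The same check can be scripted rather than run by hand with curl. This is a sketch only, assuming Python is available in the sandbox image; it sends the IMDSv2 token request and reports whether anything answered. The `opener` parameter is an illustrative hook for testing, not part of any existing tooling.

```python
import urllib.error
import urllib.request
from typing import Any, Callable

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def metadata_reachable(
    url: str = IMDS_TOKEN_URL,
    timeout: float = 2.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bool:
    """Return True if the metadata service answers, False if it is unreachable."""
    request = urllib.request.Request(
        url,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with opener(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the service was reachable.
        return True
    except OSError:
        # Timeouts and connection failures (URLError is a subclass of OSError).
        return False
```

With hop-limit 1 enforced, the expectation is that `metadata_reachable()` returns `False` from inside a sandbox pod.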
Replicate the production network-firewall setup in every region that runs Craft. The firewall should:
- 0.0.0.0/0 at the firewall endpoint

Acceptance: from inside a sandbox pod, RFC1918 + metadata-service requests fail; normal outbound HTTPS to LLM providers still works.
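When writing the acceptance test, it helps to classify each probe destination up front. The sketch below uses the standard RFC1918 ranges and the usual link-local metadata address; the function name is illustrative, and the firewall's actual rule set may cover more than this.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
METADATA_ADDRESS = ipaddress.ip_address("169.254.169.254")


def should_be_blocked(destination: str) -> bool:
    """Return True if a sandbox pod's request to this IP is expected to fail."""
    address = ipaddress.ip_address(destination)
    if address == METADATA_ADDRESS:
        return True
    return any(address in network for network in RFC1918_NETWORKS)
```

Probes to addresses where `should_be_blocked` is true should fail from inside a sandbox pod, while public destinations such as LLM provider endpoints should still succeed.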
Onboarding a new Craft cluster becomes:
1. terraform apply against the cluster (provisions bucket + role, sets metadata hop-limit).
2. Copy role_arn and bucket_name from terraform outputs into the cluster's Helm values alongside ENABLE_CRAFT: "true".
3. helm upgrade (creates namespace, SA, network policy).

Item 5 is independent and bolts on to any cluster after the rest is in place.
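The hand-off between Terraform and Helm could be scripted. The following is a rough sketch, assuming the module exposes `role_arn` and `bucket_name` as outputs and that the chart reads them from a hypothetical `craft` block; the `roleArn` and `bucketName` key names are placeholders. The input is the JSON printed by `terraform output -json`, where each output is an object with a `value` field.

```python
import json
from typing import Any


def helm_values_from_outputs(terraform_output_json: str) -> dict[str, Any]:
    """Build a Helm values fragment from `terraform output -json` text."""
    outputs = json.loads(terraform_output_json)
    return {
        "ENABLE_CRAFT": "true",
        # Hypothetical nesting and key names; adjust to the chart's schema.
        "craft": {
            "roleArn": outputs["role_arn"]["value"],
            "bucketName": outputs["bucket_name"]["value"],
        },
    }
```

A missing output raises `KeyError`, so an incomplete `terraform apply` is caught before `helm upgrade` runs.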