Back to Tuist

CloudNativePG

infra/cnpg/README.md

4.195.155.3 KB
Original Source

CloudNativePG

In-cluster Postgres operated by the CloudNativePG operator. Replaces the Supabase-managed Postgres the managed deployment previously connected to.

The chart-rendered Cluster CR (infra/helm/tuist/templates/postgresql-cnpg.yaml) owns instance count, storage, sync replication, role lifecycle, and the WAL archive. This directory holds the SQL files that grant per-table privileges that don't fit managed.roles[].inRoles — they run once per fresh cluster bootstrap as the cluster's superuser.

Files

  • tuist-processor-grants.sqloban_jobs / oban_peers write grants for the tuist_processor role. The role itself is created declaratively by CNPG via managed.roles[]; this file adds the per-table privileges the worker needs.
  • tuist-ops-ro-grants.sqlCONNECT on the application database for the tuist_ops_ro role, plus an explicit REVOKE of write privileges on public (defense-in-depth against a future grant-by-default change in Postgres widening pg_read_all_data). The role is for ad-hoc operator psql access; the /ops/db LiveView uses Tuist.Repo and enforces read-only at the app layer (see Tuist.Ops.Database.execute/2).

When to run

  • Once per env, immediately after the CNPG Cluster reports phase: Cluster in healthy state — the cluster has bootstrapped its primary, ESO has synced the managed-role password Secrets, and CNPG has created the roles themselves. The oban_jobs table only exists after the first Ecto migration, so run these after the migration Job ran for the first time too.
  • After a fresh cluster restore from a backuppg_basebackup-style restores re-create role objects but not the per-table GRANT state, so the SQL re-runs are needed.

The files use GRANT … TO <role> against pre-existing tables and roles, so re-running them on an existing cluster is a no-op outside of explicit grant changes.

How to run

The cluster's postgres superuser Secret (<cluster-name>-superuser) is generated by CNPG and only readable by an operator with secrets/get on the cluster's namespace. The kubectl-cnpg plugin gives you a one-liner that opens a psql session as that superuser inside an ephemeral pod on the cluster's network:

bash
ENV=staging  # or canary | production
NAMESPACE=tuist-$ENV
CLUSTER=tuist-tuist-pg

# -d tuist switches psql to the application database that CNPG creates
# via `bootstrap.initdb.database`. The maintenance database (`postgres`)
# the cluster's superuser defaults to does not have the application's
# schema, so GRANTs against `oban_jobs`/`accounts`/`projects` need the
# right -d on the psql side.
kubectl cnpg psql -n "$NAMESPACE" "$CLUSTER" -- -d tuist -f - \
  < infra/cnpg/tuist-processor-grants.sql

kubectl cnpg psql -n "$NAMESPACE" "$CLUSTER" -- -d tuist -f - \
  < infra/cnpg/tuist-ops-ro-grants.sql

Each file ends with a sanity-check SELECT … information_schema.role_table_grants … query that prints the exact privilege set the role holds after the run. A clean run shows the expected arwd/postgres shape on the granted tables.

Why not an Ecto migration

Same three reasons as the old infra/supabase/ directory — CREATE ROLE is superuser-only, role state is infra rather than app schema, and CNPG's declarative role surface keeps the future operator-driven path open. CNPG also offers bootstrap.initdb.postInitApplicationSQL for SQL to run on the very first cluster bootstrap, but it only fires once and never re-runs (e.g., not on a backup restore), so we keep the grants in a re-runnable file instead of inlining them in the Cluster CR.

Password rotation

bash
ENV=staging   # or canary | production
ROLE=tuist_processor  # or tuist_ops_ro

NEW_PW="$(openssl rand -base64 32 | tr -d '/+=')"
op item edit "op://tuist-k8s-$ENV/$(echo "$ROLE" | tr '[:lower:]' '[:upper:]')_PASSWORD" password="$NEW_PW"

# Force ESO to re-sync before CNPG's next reconcile, otherwise the
# rotation takes up to `refreshInterval` to land.
kubectl -n tuist-$ENV annotate externalsecret \
  tuist-tuist-pg-$ROLE force-sync=$(date +%s) --overwrite

CNPG sees the updated Secret on its next reconcile and runs ALTER ROLE … PASSWORD … itself; no manual psql step.

Backup configuration

Continuous WAL archiving + daily 03:00 UTC base backups, both targeting the per-env Tigris bucket (tuist-{stag,can,prod}-pg-backups). Retention is 30 days; older base backups (and the WALs they pin) are pruned by barman on the next backup run.

The Tigris key used for backups is separate from the workload's S3_CREDENTIALS — a dedicated S3_BACKUP_CREDENTIALS item per env, scoped to the env's backup bucket only. Rotate it independently from the workload key.

Restore-validation drill (run quarterly, or before any operation that depends on backups being usable):

bash
ENV=staging
NAMESPACE=tuist-$ENV
CLUSTER=tuist-tuist-pg

# Create a one-shot cluster restored to the latest available recovery
# point. CNPG provisions a new primary, replays from the WAL archive,
# and reports the recovery target it landed on.
kubectl cnpg backup-status -n "$NAMESPACE" "$CLUSTER"
# Then build a restore manifest using `kubectl cnpg restore` (see the
# CNPG docs); destroy it once the validation queries finish.