hosting/k8s/helm/README.md
This Helm chart deploys the Trigger.dev v4 self-hosting stack to Kubernetes.
# Build Helm dependencies (required for Bitnami charts)
helm dependency build
# Extract dependency charts for local template testing
for file in ./charts/*.tgz; do echo "Extracting $file"; tar -xzf "$file" -C ./charts; done
# Alternative: Use --dependency-update flag for template testing
helm template trigger . --dependency-update
# Deploy with default values (testing/development only)
helm install trigger .
# Deploy to specific namespace
helm install trigger . -n trigger --create-namespace
# Deploy with custom values for production
helm install trigger . -f values-production.yaml -n trigger --create-namespace
# Upgrade existing release
helm upgrade trigger .
# Upgrade with new values
helm upgrade trigger . -f values-production.yaml
# Access the dashboard locally
kubectl port-forward svc/trigger-webapp 3040:3030 --address 0.0.0.0
Dashboard: http://localhost:3040/
# The --push arg is required when testing locally
npx trigger.dev@latest deploy --push
IMPORTANT: The default secrets are for TESTING ONLY and must be changed for production.
All secrets must be exactly 32 hexadecimal characters (16 bytes):
- sessionSecret - User authentication sessions
- magicLinkSecret - Passwordless login tokens
- encryptionKey - Sensitive data encryption
- managedWorkerSecret - Worker authentication

# Generate all four secrets
for i in {1..4}; do openssl rand -hex 16; done
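Each openssl call above prints one 32-character hex string. A quick sanity check before pasting a value (plain shell, nothing chart-specific):

# Verify a generated secret is exactly 32 characters
secret="$(openssl rand -hex 16)"
echo -n "$secret" | wc -c # should print 32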
# values-production.yaml
secrets:
  sessionSecret: "your-generated-secret-1"
  magicLinkSecret: "your-generated-secret-2"
  encryptionKey: "your-generated-secret-3"
  managedWorkerSecret: "your-generated-secret-4"
  objectStore:
    accessKeyId: "your-s3-access-key"
    secretAccessKey: "your-s3-secret-key"
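If you prefer not to keep secrets in a values file, the same keys can be set inline. This is a sketch assuming the values layout above; generate the values once and reuse them on upgrades:

# Alternative: set secrets inline instead of via a values file
helm install trigger . -n trigger --create-namespace \
  --set secrets.sessionSecret="<secret-1>" \
  --set secrets.magicLinkSecret="<secret-2>" \
  --set secrets.encryptionKey="<secret-3>" \
  --set secrets.managedWorkerSecret="<secret-4>"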
This chart deploys the following components: webapp, supervisor, Electric, PostgreSQL, Redis, ClickHouse, an S3-compatible object store, and a Docker registry.
Key webapp settings:
webapp:
  # Application URLs
  appOrigin: "https://trigger.example.com"
  loginOrigin: "https://trigger.example.com"
  apiOrigin: "https://trigger.example.com"

  # Bootstrap mode (auto-creates worker group)
  bootstrap:
    enabled: true # Enable for combined setups
    workerGroupName: "bootstrap"
Use external managed services instead of bundled components:
# External PostgreSQL
postgres:
  deploy: false
  external:
    host: "your-postgres.rds.amazonaws.com"
    port: 5432
    database: "trigger"
    username: "trigger_user"
    password: "your-password"

# External Redis
redis:
  deploy: false
  external:
    host: "your-redis.cache.amazonaws.com"
    port: 6379
    password: "your-password"
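When pointing the chart at external services, it helps to first verify they are reachable from inside the cluster. A one-off check pod (the image choice and pod name are arbitrary):

# Test connectivity to external PostgreSQL from inside the cluster
kubectl run pg-check --rm -it --restart=Never --image=postgres:16 -- \
  pg_isready -h your-postgres.rds.amazonaws.com -p 5432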
# External Docker registry (e.g., Kind local registry)
registry:
  deploy: false
  external:
    host: "localhost"
    port: 5001
    username: ""
    password: ""
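For local testing, one way to provide such a registry is the pattern from the Kind docs (container name and port mapping are assumptions to adjust):

# Run a disposable local registry on port 5001
docker run -d --restart=always -p "127.0.0.1:5001:5000" --name kind-registry registry:2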
# Webapp ingress
webapp:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: trigger.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: trigger-tls
        hosts:
          - trigger.example.com
# Registry ingress
registry:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: registry.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: registry-tls
        hosts:
          - registry.example.com
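After an upgrade with these values, confirm the ingress objects exist and the certificates became ready (the certificate resource requires cert-manager to be installed):

# Verify ingress and TLS certificate status
kubectl get ingress -n trigger
kubectl get certificate -n trigger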
Resource requests and limits can be set per service:

webapp:
  resources:
    limits:
      cpu: 2000m
      memory: 4Gi
    requests:
      cpu: 1000m
      memory: 2Gi

postgres:
  primary:
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi
All services support persistent storage, and the storage class can be controlled globally or per service:
global:
  storageClass: "fast-ssd" # Default for all services

# Bitnami chart services (simplified configuration)
postgres:
  primary:
    persistence:
      enabled: true
      size: 10Gi
      storageClass: "postgres-nvme" # Optional: override for PostgreSQL

redis:
  master:
    persistence:
      enabled: true
      size: 5Gi
      storageClass: "redis-ssd" # Optional: override for Redis

clickhouse:
  persistence:
    enabled: true
    size: 10Gi
    storageClass: "analytics-hdd" # Optional: override for ClickHouse

s3:
  persistence:
    enabled: true
    size: 10Gi
    storageClass: "objectstore-ssd" # Optional: override for S3
Our internal services (Registry) support the complete Bitnami persistence configuration pattern:
# Registry - full persistence configuration options
registry:
  persistence:
    enabled: true
    # Name to assign the volume
    volumeName: "data"
    # Name of an existing PVC to use
    existingClaim: ""
    # The path the volume will be mounted at
    mountPath: "/var/lib/registry"
    # The subdirectory of the volume to mount to
    subPath: ""
    # PVC storage class for the registry data volume
    storageClass: "registry-ssd"
    # PVC access modes for the registry volume
    accessModes:
      - "ReadWriteOnce"
    # PVC storage request for the registry volume
    size: 10Gi
    # Annotations for the PVC
    annotations:
      backup.velero.io/backup-volumes: "data"
    # Labels for the PVC
    labels:
      app.kubernetes.io/component: "storage"
    # Selector to match an existing persistent volume
    selector:
      matchLabels:
        tier: "registry"
    # Custom PVC data source
    dataSource:
      name: "registry-snapshot"
      kind: "VolumeSnapshot"
      apiGroup: "snapshot.storage.k8s.io"
# Shared persistent volume for the worker token file
persistence:
  shared:
    enabled: true
    size: 5Mi
    accessMode: ReadWriteOnce
    # accessMode: ReadWriteMany # Use for cross-node deployments
    storageClass: ""
    retain: true # Prevents deletion on uninstall
Health checks are configured for all services. All non-Bitnami services (webapp, supervisor, Electric, registry) support configurable health probes:
# Webapp health probes
webapp:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 5
    successThreshold: 1
  startupProbe:
    enabled: false
    initialDelaySeconds: 0
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 60
    successThreshold: 1

# Supervisor health probes
supervisor:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 5
    successThreshold: 1
  startupProbe:
    enabled: false
    initialDelaySeconds: 0
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 60
    successThreshold: 1

# Electric health probes
electric:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 5
    successThreshold: 1
  startupProbe:
    enabled: false
    initialDelaySeconds: 0
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 60
    successThreshold: 1

# Registry health probes
registry:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 5
    successThreshold: 1
  startupProbe:
    enabled: false
    initialDelaySeconds: 0
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 60
    successThreshold: 1
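If a pod keeps restarting, the probe events usually explain why. The component labels match the values keys above:

# Inspect probe failures and restart events for a service
kubectl describe pod -l app.kubernetes.io/component=webapp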
ServiceMonitors are available for webapp and supervisor services:
webapp:
  serviceMonitor:
    enabled: true
    interval: "30s"
    path: "/metrics"
    labels:
      release: prometheus-stack

supervisor:
  serviceMonitor:
    enabled: true
    interval: "30s"
    path: "/metrics"
    labels:
      release: prometheus-stack
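To confirm the monitors were created, list them (requires the Prometheus Operator CRDs; the namespace is an assumption from the install examples):

# List ServiceMonitors created by the chart
kubectl get servicemonitors -n trigger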
When you need to force all pods to restart (e.g., to pick up updated secrets or config):
# Force restart using timestamp annotation (Helm-native approach)
helm upgrade <release-name> . --set-string podAnnotations.restartedAt="$(date +%s)"
# Example
helm upgrade trigger . --set-string podAnnotations.restartedAt="$(date +%s)"
This approach is Helm-native: the annotation change updates the pod templates, so the deployments roll their pods without any manual deletion, and the restart is tracked as part of the release.
After changing secrets or ConfigMaps in your values file:
# 1. Upgrade with new values
helm upgrade trigger . -f values-production.yaml
# 2. Force pod restart to pick up changes
helm upgrade trigger . -f values-production.yaml \
  --set-string podAnnotations.restartedAt="$(date +%s)"
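A plain-kubectl alternative, assuming the standard Helm instance label and the release name trigger (note that Helm will not record this restart):

# Rolling-restart all deployments in the release without Helm
kubectl rollout restart deployment -l app.kubernetes.io/instance=trigger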
# Check pod status
kubectl get pods -l app.kubernetes.io/name=trigger.dev
# Webapp logs
kubectl logs -l app.kubernetes.io/component=webapp
# Database logs
kubectl logs -l app.kubernetes.io/component=postgres
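Two more log patterns that help in practice (the pod name is a placeholder):

# Follow webapp logs
kubectl logs -l app.kubernetes.io/component=webapp -f --tail=100

# Logs from the previous container after a crash
kubectl logs <pod-name> --previous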
# Run chart tests
helm test trigger

# Check Helm template syntax
helm template trigger . > /dev/null && echo "Template validation successful"
# Test webapp health endpoint (requires port forwarding)
curl -s -o /dev/null -w "%{http_code}" http://localhost:3040/healthcheck || echo "Connection failed"
# Port forward to access webapp locally
kubectl port-forward svc/trigger-webapp 3040:3030 --address 0.0.0.0
npx trigger.dev@latest deploy --push

See values-production-example.yaml for a complete production configuration example.
The Helm chart uses three types of versions:
- Chart version (Chart.yaml: version) - Helm chart packaging version
- App version (Chart.yaml: appVersion) - Trigger.dev application version
- Service versions (values.yaml) - individual service versions (Electric, ClickHouse, etc.)

Update the chart version for chart changes:
# Edit Chart.yaml
version: 4.1.0 # Increment for chart changes (semver)
Update the app version when Trigger.dev releases a new version:
# Edit Chart.yaml
appVersion: "v4.1.0" # Match Trigger.dev release (v-prefixed image tag)
Release via GitHub:
# Tag and push
git tag helm-v4.1.0
git push origin helm-v4.1.0
# GitHub Actions will automatically build and publish to GHCR
# Install specific chart version
helm upgrade --install trigger \
  oci://ghcr.io/triggerdotdev/charts/trigger.dev \
  --version 4.1.0

# Install latest chart version
helm upgrade --install trigger \
  oci://ghcr.io/triggerdotdev/charts/trigger.dev

# Override app version (advanced)
helm upgrade --install trigger . \
  --set webapp.image.tag=v4.0.1
Generate unique secrets (never use defaults):
# Generate 4 secrets
for i in {1..4}; do openssl rand -hex 16; done
Configure security contexts:
webapp:
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop: [ALL]
- Enable network policies if your cluster supports them (see the sketch below)
- Configure proper RBAC for the supervisor
- Use TLS ingress with cert-manager
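Assuming the chart does not create a NetworkPolicy for you, here is a minimal sketch that restricts webapp ingress to the ingress controller namespace (the policy name, namespace label, and port are assumptions to adapt):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: trigger-webapp-ingress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: webapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3030 # webapp container port, per the port-forward examples above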
Set resource limits and requests - for example:
webapp:
  resources:
    limits:
      cpu: 2000m
      memory: 4Gi
    requests:
      cpu: 1000m
      memory: 2Gi

postgres:
  primary:
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi

redis:
  master:
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 250m
        memory: 512Mi

# ClickHouse can be very resource intensive, so we recommend setting limits and requests accordingly.
# Note: not doing this can cause OOM crashes, which will cause issues across many different features.
clickhouse:
  resources:
    limits:
      cpu: 4000m
      memory: 16Gi
    requests:
      cpu: 2000m
      memory: 8Gi

supervisor:
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 512Mi
Configure persistent storage for all services - for example:
global:
  storageClass: "fast-nvme" # Default for all services

postgres:
  primary:
    persistence:
      size: 500Gi

redis:
  master:
    persistence:
      size: 20Gi

clickhouse:
  persistence:
    size: 100Gi

s3:
  persistence:
    size: 200Gi

# Internal services support full Bitnami-style configuration
registry:
  persistence:
    enabled: true
    size: 100Gi
    storageClass: "registry-ssd"
    annotations:
      backup.velero.io/backup-volumes: "data"
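After installing with these values, it is worth confirming that the claims bound to the intended storage classes (the namespace is an assumption from the earlier install examples):

# Confirm PVCs are bound with the expected storage classes
kubectl get pvc -n trigger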