# Airflow
Airflow is a platform to programmatically author, schedule and monitor workflows.
(Helm 2) install the Airflow Helm Chart:

```sh
helm install stable/airflow \
  --name "airflow" \
  --version "X.X.X" \
  --namespace "airflow" \
  --values ./custom-values.yaml
```
(Helm 3) install the Airflow Helm Chart:

```sh
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
helm install "airflow" stable/airflow \
  --version "X.X.X" \
  --namespace "airflow" \
  --values ./custom-values.yaml
```
Get the status of the Airflow Helm Chart:

```sh
helm status "airflow"
```

Uninstall the Airflow Helm Chart:

```sh
helm delete "airflow"
```
Run bash commands in the Airflow Webserver Pod:

```sh
# create an interactive bash session in the Webserver Pod
# use this bash session for commands like: `airflow create_user`
kubectl exec \
  -it \
  --namespace airflow \
  --container airflow-web \
  Deployment/airflow-web \
  -- /bin/bash
```
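For example, inside that session you could create an RBAC admin user (a sketch using the Airflow 1.10 `create_user` CLI; the username, password, and email below are placeholders):

```sh
airflow create_user \
  --role "Admin" \
  --username "admin" \
  --password "admin" \
  --email "admin@example.com" \
  --firstname "Air" \
  --lastname "Flow"
```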
Chart version numbers can be found in Chart.yaml, or on Artifact Hub.
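With Helm 3, you can also list the chart versions available in the repo you added earlier:

```sh
helm search repo stable/airflow --versions
```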
Here are some starting points for your custom-values.yaml:
| Name | File | Description |
|---|---|---|
| (CeleryExecutor) Minimal | examples/minikube/custom-values.yaml | a non-production starting point |
| (CeleryExecutor) Google Cloud | examples/google-gke/custom-values.yaml | a production starting point for GKE on Google Cloud |
While we don't expose the airflow.cfg directly, you can use environment variables to set Airflow configs.
We expose the airflow.config value to make this easier:
```yaml
airflow:
  config:
    ## Security
    AIRFLOW__CORE__SECURE_MODE: "True"
    AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.deny_all"
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "False"
    AIRFLOW__WEBSERVER__RBAC: "False"

    ## DAGS
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
    AIRFLOW__CORE__LOAD_EXAMPLES: "False"

    ## Email (SMTP)
    AIRFLOW__EMAIL__EMAIL_BACKEND: "airflow.utils.email.send_email_smtp"
    AIRFLOW__SMTP__SMTP_HOST: "smtpmail.example.com"
    AIRFLOW__SMTP__SMTP_STARTTLS: "False"
    AIRFLOW__SMTP__SMTP_SSL: "False"
    AIRFLOW__SMTP__SMTP_PORT: "25"
    AIRFLOW__SMTP__SMTP_MAIL_FROM: "admin@example.com"

    ## Disable noisy "Handling signal: ttou" Gunicorn log messages
    GUNICORN_CMD_ARGS: "--log-level WARNING"
```
We expose the scheduler.connections value to specify Airflow Connections, which will be automatically imported by the airflow-scheduler when it starts up.
By default, we will delete and re-create connections each time the airflow-scheduler restarts.
(If you want to manually modify a connection in the WebUI, you should disable this behaviour by setting scheduler.refreshConnections to false)
For example, to add a connection called my_aws:
```yaml
scheduler:
  connections:
    - id: my_aws
      type: aws
      extra: |
        {
          "aws_access_key_id": "XXXXXXXX",
          "aws_secret_access_key": "XXXXXXXX",
          "region_name": "eu-central-1"
        }
```
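Once imported, DAGs can use this connection by name; for example, through the AwsHook (a sketch using the Airflow 1.10 contrib import path):

```python
from airflow.contrib.hooks.aws_hook import AwsHook

# builds a boto3 session from the `my_aws` connection defined above
hook = AwsHook(aws_conn_id="my_aws")
session = hook.get_session(region_name="eu-central-1")
```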
If you don't want to store connections in your values.yaml, use scheduler.existingSecretConnections to specify the name of an existing Kubernetes Secret containing an add-connections.sh script.
Note, your script will be run EACH TIME the airflow-scheduler Pod restarts, and scheduler.connections will no longer work.
Here is an example Secret you might create:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-airflow-connections
type: Opaque
stringData:
  add-connections.sh: |
    #!/usr/bin/env bash

    # remove any existing connection
    airflow connections --delete \
      --conn_id "my_aws"

    # re-add your custom connection
    airflow connections --add \
      --conn_id "my_aws" \
      --conn_type "aws" \
      --conn_extra "{\"aws_access_key_id\": \"XXXXXXXX\", \"aws_secret_access_key\": \"XXXXXXXX\", \"region_name\":\"eu-central-1\"}"
```
We expose the scheduler.variables value to specify Airflow Variables, which will be automatically imported by the airflow-scheduler when it starts up.
For example, to specify a variable called environment:
```yaml
scheduler:
  variables: |
    { "environment": "dev" }
```
We expose the scheduler.pools value to specify Airflow Pools, which will be automatically imported by the Airflow scheduler when it starts up.
For example, to create a pool called example:
```yaml
scheduler:
  pools: |
    {
      "example": {
        "description": "This is an example pool with 2 slots.",
        "slots": 2
      }
    }
```
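Tasks are then assigned to the pool by name; for example, with a BashOperator (a sketch; any operator accepts the pool argument):

```python
from airflow.operators.bash_operator import BashOperator

# at most 2 tasks assigned to the "example" pool will run at once
task = BashOperator(
    task_id="say_hello",
    bash_command="echo hello",
    pool="example",
)
```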
We expose the airflow.extraEnv value to mount extra environment variables; this can be used to pass sensitive configs to Airflow.
For example, passing a Fernet key and LDAP password (the airflow and ldap Kubernetes Secrets must already exist):
```yaml
airflow:
  extraEnv:
    - name: AIRFLOW__CORE__FERNET_KEY
      valueFrom:
        secretKeyRef:
          name: airflow
          key: fernet-key
    - name: AIRFLOW__LDAP__BIND_PASSWORD
      valueFrom:
        secretKeyRef:
          name: ldap
          key: password
```
We expose the airflow.extraConfigmapMounts value to mount extra Kubernetes ConfigMaps.
For example, a webserver_config.py file:
```yaml
airflow:
  extraConfigmapMounts:
    - name: my-webserver-config
      mountPath: /opt/airflow/webserver_config.py
      configMap: my-airflow-webserver-config
      readOnly: true
      subPath: webserver_config.py
```
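For reference, a minimal webserver_config.py might look like this (a sketch assuming the RBAC UI with flask_appbuilder's database-backed logins; adjust for your auth backend):

```python
from flask_appbuilder.security.manager import AUTH_DB

# authenticate WebUI users against the Airflow metadata database
AUTH_TYPE = AUTH_DB
```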
To create the my-airflow-webserver-config ConfigMap, you could use:
```sh
kubectl create configmap \
  my-airflow-webserver-config \
  --from-file=webserver_config.py \
  --namespace airflow
```
We expose the airflow.extraPipPackages and web.extraPipPackages values to install Python pip packages; these will work with any package that you can install with pip install XXXX.
For example, enabling the airflow-exporter package:
```yaml
airflow:
  extraPipPackages:
    - "airflow-exporter==1.3.1"
```
For example, you may be using flask_oauthlib to integrate with Okta/Google/etc for authorizing WebUI users:
```yaml
web:
  extraPipPackages:
    - "apache-airflow[google_auth]==1.10.12"
```
This chart provides an optional Kubernetes Ingress resource, for accessing airflow-webserver and airflow-flower outside of the cluster.
If you already have something hosted at the root of your domain, you might want to place airflow under a URL-prefix.
In this example, you would set these values:
```yaml
web:
  baseUrl: "http://example.com/airflow/"

flower:
  urlPrefix: "/airflow/flower"

ingress:
  web:
    path: "/airflow"

  flower:
    path: "/airflow/flower"
```
We expose the ingress.web.precedingPaths and ingress.web.succeedingPaths values, which add extra Ingress paths before and after the default path, respectively.
A common use-case is enabling https with the aws-alb-ingress-controller ssl-redirect, which needs a redirect path to be hit before the airflow-webserver one.
You would set precedingPaths as follows:
```yaml
ingress:
  web:
    precedingPaths:
      - path: "/*"
        serviceName: "ssl-redirect"
        servicePort: "use-annotation"
```
We use a Kubernetes StatefulSet for the Celery workers, which allows the webserver to request logs from each worker individually, with a fixed DNS name.
Celery workers can be scaled using the Horizontal Pod Autoscaler.
To enable autoscaling, you must set workers.autoscaling.enabled=true, then provide workers.autoscaling.maxReplicas, and workers.replicas for the minimum number of workers.
Assume every task a worker executes consumes approximately 200Mi of memory; that makes memory a good metric for utilisation monitoring.
For a worker Pod you can calculate it as WORKER_CONCURRENCY * 200Mi, so a worker running 10 tasks will consume ~2Gi of memory.
In the following config, if a worker consumes 80% of 2Gi (which will happen if it runs 9-10 tasks at the same time), an autoscaling event will be triggered, and a new worker will be added.
If you have many tasks in a queue, Kubernetes will keep adding workers until maxReplicas is reached, in this case 16.
```yaml
workers:
  # the initial/minimum number of workers
  replicas: 2

  resources:
    requests:
      memory: "2Gi"

  podDisruptionBudget:
    enabled: true
    ## prevents losing more than 20% of current worker task slots in a voluntary disruption
    maxUnavailable: "20%"

  autoscaling:
    enabled: true
    maxReplicas: 16
    metrics:
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 80

  celery:
    instances: 10

    ## wait at most 9min for running tasks to complete before SIGTERM
    ## WARNING:
    ## - some cluster-autoscaler (GKE) will not respect graceful
    ##   termination periods over 10min
    gracefullTermination: true
    gracefullTerminationPeriod: 540

  ## how many seconds (after the 9min) to wait before SIGKILL
  terminationPeriod: 60

dags:
  git:
    gitSync:
      resources:
        requests:
          ## IMPORTANT! for autoscaling to work
          memory: "64Mi"
```
We expose the workers.secrets value to allow mounting secrets at {workers.secretsDir}/<secret-name> in airflow-worker Pods.
For example, mounting password Secrets:
```yaml
workers:
  secretsDir: /var/airflow/secrets
  secrets:
    - redshift-user
    - redshift-password
    - elasticsearch-user
    - elasticsearch-password
```
With the above configuration, you could read the redshift-user secret from within a DAG or Python function using:
```python
from pathlib import Path

def get_secret(secret_name):
    # secrets are mounted as files under workers.secretsDir
    secrets_dir = Path('/var/airflow/secrets')
    secret_path = secrets_dir / secret_name
    assert secret_path.exists(), f'could not find {secret_name} at {secret_path}'
    secret_data = secret_path.read_text().strip()
    return secret_data

redshift_user = get_secret('redshift-user')
```
To create the redshift-user Secret, you could use:
```sh
kubectl create secret generic \
  redshift-user \
  --from-literal=redshift-user=MY_REDSHIFT_USERNAME \
  --namespace airflow
```
We expose the extraManifests.[] value to add custom Kubernetes manifests to the chart.
For example, adding a BackendConfig resource for GKE:
```yaml
extraManifests:
  - apiVersion: cloud.google.com/v1beta1
    kind: BackendConfig
    metadata:
      name: "{{ .Release.Name }}-test"
    spec:
      securityPolicy:
        name: "gcp-cloud-armor-policy-test"
```
If the value scheduler.initdb is set to true (this is the default), the airflow-scheduler container will run airflow initdb as part of its startup script.
If the value scheduler.preinitdb is set to true, then we ALSO RUN airflow initdb in an init-container (retrying 5 times).
This is usually NOT necessary, unless your synced DAGs include custom database hooks that prevent airflow initdb from running.
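For example, to make both behaviours explicit in your custom-values.yaml:

```yaml
scheduler:
  ## run `airflow initdb` in the scheduler's startup script (the default)
  initdb: true

  ## ALSO run `airflow initdb` in an init-container (retrying 5 times)
  preinitdb: false
```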
PostgreSQL is the default database in this chart. Because we use insecure username/password combinations by default, you should create secure credentials before installing the Helm chart.
Example bash commands to create the required Kubernetes Secrets:

```sh
# set the postgres password
kubectl create secret generic \
  airflow-postgresql \
  --from-literal=postgresql-password=$(openssl rand -base64 13) \
  --namespace airflow

# set the redis password
kubectl create secret generic \
  airflow-redis \
  --from-literal=redis-password=$(openssl rand -base64 13) \
  --namespace airflow
```
Example values.yaml, to use those secrets:
```yaml
postgresql:
  existingSecret: airflow-postgresql

redis:
  existingSecret: airflow-redis
```
While this chart comes with an embedded stable/postgresql, this is NOT SUITABLE for production.
You should use an external MySQL or PostgreSQL database; for example, one managed by your cloud provider.
Example values for an external Postgres database, with an existing airflow_cluster1 database:
```yaml
externalDatabase:
  type: postgres
  host: postgres.example.org
  port: 5432
  database: airflow_cluster1
  user: airflow_cluster1
  passwordSecret: "airflow-cluster1-postgres-password"
  passwordSecretKey: "postgresql-password"
```
WARNING: Airflow requires that explicit_defaults_for_timestamp=1 in your MySQL instance; see here.
Example values for an external MySQL database, with an existing airflow_cluster1 database:
```yaml
externalDatabase:
  type: mysql
  host: mysql.example.org
  port: 3306
  database: airflow_cluster1
  user: airflow_cluster1
  passwordSecret: "airflow-cluster1-mysql-password"
  passwordSecretKey: "mysql-password"
```
By default, logs from the airflow-web/scheduler/worker are written within the Docker container's filesystem; therefore, any restart of the Pod will wipe the logs. For a production deployment, you will likely want to persist the logs.
You must give airflow credentials for it to read/write on the remote bucket; this can be achieved with AIRFLOW__CORE__REMOTE_LOG_CONN_ID, or by using something like Workload Identity (GKE) or IAM Roles for Service Accounts (EKS).
Example, using AIRFLOW__CORE__REMOTE_LOG_CONN_ID (can be used with AWS too):
```yaml
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://<<MY-BUCKET-NAME>>/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "google_cloud_airflow"

scheduler:
  connections:
    - id: google_cloud_airflow
      type: google_cloud_platform
      extra: |-
        {
          "extra__google_cloud_platform__num_retries": "5",
          "extra__google_cloud_platform__keyfile_dict": "{...}"
        }
```
Example, using Workload Identity (GKE):
```yaml
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://<<MY-BUCKET-NAME>>/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "google_cloud_default"

serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: "<<MY-ROLE-NAME>>@<<MY-PROJECT-NAME>>.iam.gserviceaccount.com"
```
Example, using IAM Roles for Service Accounts (EKS):
```yaml
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://<<MY-BUCKET-NAME>>/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "aws_default"

scheduler:
  securityContext:
    fsGroup: 65534

web:
  securityContext:
    fsGroup: 65534

workers:
  securityContext:
    fsGroup: 65534

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::XXXXXXXXXX:role/<<MY-ROLE-NAME>>"
```
Alternatively, you can persist logs on a PVC, rather than shipping them to a remote bucket:

```yaml
logs:
  persistence:
    enabled: true
```
The ServiceMonitor is something introduced by the CoreOS Prometheus Operator. To expose metrics to Prometheus you need to install a plugin; this can be added to the docker image. A good one is epoch8/airflow-exporter, which exposes dag and task based metrics from Airflow.
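For example, a sketch of the relevant values (key names and defaults are from the parameter tables at the end of this README):

```yaml
serviceMonitor:
  ## deploy a ServiceMonitor resource
  enabled: true

  ## labels your Prometheus instance uses to select ServiceMonitors
  selector:
    prometheus: "kube-prometheus"

  ## the metrics endpoint exposed by the exporter plugin
  path: /admin/metrics
  interval: 30s
```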
For more information, see the serviceMonitor section of values.yaml.
This method places a git sidecar in each worker/scheduler/web Kubernetes Pod, that perpetually syncs your git repo into the dag folder every dags.git.gitSync.refreshTime seconds.
WARNING: In the dags.git.secret the known_hosts file is present to reduce the possibility of a man-in-the-middle attack.
However, if you want to implicitly trust all repo host signatures set dags.git.sshKeyscan to true.
```yaml
dags:
  git:
    url: ssh://git@repo.example.com/example.git
    repoHost: repo.example.com
    secret: airflow-git-keys
    privateKeyName: id_rsa

    gitSync:
      enabled: true
      refreshTime: 60
```
You can create the dags.git.secret from your local ~/.ssh folder using:
```sh
kubectl create secret generic \
  airflow-git-keys \
  --from-file=id_rsa=$HOME/.ssh/id_rsa \
  --from-file=id_rsa.pub=$HOME/.ssh/id_rsa.pub \
  --from-file=known_hosts=$HOME/.ssh/known_hosts \
  --namespace airflow
```
This method stores your DAGs in a Kubernetes Persistent Volume Claim (PVC); you must use some external system to ensure this volume has your latest DAGs. For example, you could use your CI/CD pipeline to perform a sync as changes are pushed to a git repo.
Since ALL Pods MUST HAVE the same collection of DAG files, it is recommended to create just one PVC that is shared.
To share a PVC with multiple Pods, the PVC needs to have accessMode set to ReadOnlyMany or ReadWriteMany (Note: different StorageClasses support different access modes).
If you are using Kubernetes on a public cloud, a persistent volume controller is likely built in:

- Amazon EKS
- Azure AKS
- Google GKE
For example, to use the storage class called default:
```yaml
dags:
  persistence:
    enabled: true
    storageClass: default
    accessMode: ReadOnlyMany
    size: 1Gi
```
You may want to store DAGs and logs on the same volume and configure Airflow to use subdirectories for them.
WARNING: you must use a PVC which supports accessMode: ReadWriteMany
Here's an approach that achieves this (see the values sketch after the notes below):

- set airflow.extraVolumes and airflow.extraVolumeMounts to put a volume at /opt/airflow/efs
- set dags.persistence.enabled and logs.persistence.enabled to false
- set dags.path to /opt/airflow/efs/dags
- set logs.path to /opt/airflow/efs/logs

We expose the dags.installRequirements value to enable installing any requirements.txt found at the root of your dags.path folder as the airflow-workers start.
WARNING: if you update the requirements.txt, you will have to restart your airflow-workers for changes to take effect
NOTE: you might also want to consider using airflow.extraPipPackages
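Putting the shared-volume approach together, here is a hedged values.yaml sketch (the my-efs-pvc claim name is hypothetical; it must reference a pre-created ReadWriteMany PVC):

```yaml
airflow:
  extraVolumes:
    - name: efs
      ## hypothetical pre-created ReadWriteMany PVC
      persistentVolumeClaim:
        claimName: my-efs-pvc

  extraVolumeMounts:
    - name: efs
      mountPath: /opt/airflow/efs

dags:
  path: /opt/airflow/efs/dags
  persistence:
    enabled: false

logs:
  path: /opt/airflow/efs/logs
  persistence:
    enabled: false
```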
Full documentation can be found in the comments of the values.yaml file, but a high level overview is provided here.
Global Values:
| Parameter | Description | Default |
|---|---|---|
airflow.image.* | configs for the docker image of the web/scheduler/worker | <see values.yaml> |
airflow.executor | the airflow executor type to use | CeleryExecutor |
airflow.fernetKey | the fernet key used to encrypt the connections/variables in the database | 7T512UXSSmBOkpWimFHIVb8jK6lfmSAvx4mO6Arehnc= |
airflow.config | environment variables for the web/scheduler/worker pods (for airflow configs) | {} |
airflow.podAnnotations | extra annotations for the web/scheduler/worker/flower Pods | {} |
airflow.extraEnv | extra environment variables for the web/scheduler/worker/flower Pods | [] |
airflow.extraConfigmapMounts | extra configMap volumeMounts for the web/scheduler/worker/flower Pods | [] |
airflow.extraContainers | extra containers for the web/scheduler/worker Pods | [] |
airflow.extraPipPackages | extra pip packages to install in the web/scheduler/worker Pods | [] |
airflow.extraVolumeMounts | extra volumeMounts for the web/scheduler/worker Pods | [] |
airflow.extraVolumes | extra volumes for the web/scheduler/worker Pods | [] |
Airflow Scheduler values:
| Parameter | Description | Default |
|---|---|---|
scheduler.resources | resource requests/limits for the scheduler Pods | {} |
scheduler.nodeSelector | the nodeSelector configs for the scheduler Pods | {} |
scheduler.affinity | the affinity configs for the scheduler Pods | {} |
scheduler.tolerations | the toleration configs for the scheduler Pods | [] |
scheduler.securityContext | the security context for the scheduler Pods | {} |
scheduler.labels | labels for the scheduler Deployment | {} |
scheduler.podLabels | Pod labels for the scheduler Deployment | {} |
scheduler.annotations | annotations for the scheduler Deployment | {} |
scheduler.podAnnotations | Pod Annotations for the scheduler Deployment | {} |
scheduler.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
scheduler.podDisruptionBudget.* | configs for the PodDisruptionBudget of the scheduler | <see values.yaml> |
scheduler.connections | custom airflow connections for the airflow scheduler | [] |
scheduler.refreshConnections | if scheduler.connections are deleted and re-added after each scheduler restart | true |
scheduler.existingSecretConnections | the name of an existing Secret containing an add-connections.sh script to run on scheduler start | "" |
scheduler.variables | custom airflow variables for the airflow scheduler | "{}" |
scheduler.pools | custom airflow pools for the airflow scheduler | "{}" |
scheduler.numRuns | the value of the airflow --num_runs parameter used to run the airflow scheduler | -1 |
scheduler.initdb | if we run airflow initdb when the scheduler starts | true |
scheduler.preinitdb | if we run airflow initdb inside a special initContainer | false |
scheduler.initialStartupDelay | the number of seconds to wait (in bash) before starting the scheduler container | 0 |
scheduler.livenessProbe.* | configs for the scheduler liveness probe | <see values.yaml> |
scheduler.extraInitContainers | extra init containers to run before the scheduler pod | [] |
Airflow Webserver Values:
| Parameter | Description | Default |
|---|---|---|
web.resources | resource requests/limits for the airflow web pods | {} |
web.replicas | the number of web Pods to run | 1 |
web.nodeSelector | the nodeSelector configs for the web Pods | {} |
web.affinity | the affinity configs for the web Pods | {} |
web.tolerations | the toleration configs for the web Pods | [] |
web.securityContext | the security context for the web Pods | {} |
web.labels | labels for the web Deployment | {} |
web.podLabels | Pod labels for the web Deployment | {} |
web.annotations | annotations for the web Deployment | {} |
web.podAnnotations | Pod annotations for the web Deployment | {} |
web.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
web.podDisruptionBudget.* | configs for the PodDisruptionBudget of the web Deployment | <see values.yaml> |
web.service.* | configs for the Service of the web pods | <see values.yaml> |
web.baseUrl | sets AIRFLOW__WEBSERVER__BASE_URL | http://localhost:8080 |
web.serializeDAGs | sets AIRFLOW__CORE__STORE_SERIALIZED_DAGS | false |
web.extraPipPackages | extra pip packages to install in the web container | [] |
web.initialStartupDelay | the number of seconds to wait (in bash) before starting the web container | 0 |
web.minReadySeconds | the number of seconds to wait before declaring a new Pod available | 5 |
web.readinessProbe.* | configs for the web Service readiness probe | <see values.yaml> |
web.livenessProbe.* | configs for the web Service liveness probe | <see values.yaml> |
web.secretsDir | the directory in which to mount secrets on web containers | /var/airflow/secrets |
web.secrets | the names of existing Kubernetes Secrets to mount as files at {web.secretsDir}/<secret_name>/<keys_in_secret> | [] |
web.secretsMap | the name of an existing Kubernetes Secret to mount as files to {web.secretsDir}/<keys_in_secret> | "" |
Airflow Worker Values:
| Parameter | Description | Default |
|---|---|---|
workers.enabled | if the airflow workers StatefulSet should be deployed | true |
workers.resources | resource requests/limits for the airflow worker Pods | {} |
workers.replicas | the number of workers Pods to run | 1 |
workers.nodeSelector | the nodeSelector configs for the worker Pods | {} |
workers.affinity | the affinity configs for the worker Pods | {} |
workers.tolerations | the toleration configs for the worker Pods | [] |
workers.securityContext | the security context for the worker Pods | {} |
workers.labels | labels for the worker StatefulSet | {} |
workers.podLabels | Pod labels for the worker StatefulSet | {} |
workers.annotations | annotations for the worker StatefulSet | {} |
workers.podAnnotations | Pod annotations for the worker StatefulSet | {} |
workers.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
workers.podDisruptionBudget.* | configs for the PodDisruptionBudget of the worker StatefulSet | <see values.yaml> |
workers.autoscaling.* | configs for the HorizontalPodAutoscaler of the worker Pods | <see values.yaml> |
workers.initialStartupDelay | the number of seconds to wait (in bash) before starting each worker container | 0 |
workers.celery.* | configs for the celery worker Pods | <see values.yaml> |
workers.terminationPeriod | how many seconds to wait after SIGTERM before SIGKILL of the celery worker | 60 |
workers.secretsDir | directory in which to mount secrets on worker containers | /var/airflow/secrets |
workers.secrets | the names of existing Kubernetes Secrets to mount as files at {workers.secretsDir}/<secret_name>/<keys_in_secret> | [] |
workers.secretsMap | the name of an existing Kubernetes Secret to mount as files to {workers.secretsDir}/<keys_in_secret> | "" |
Airflow Flower Values:
| Parameter | Description | Default |
|---|---|---|
flower.enabled | if the Flower UI should be deployed | true |
flower.resources | resource requests/limits for the flower Pods | {} |
flower.affinity | the affinity configs for the flower Pods | {} |
flower.tolerations | the toleration configs for the flower Pods | [] |
flower.securityContext | the security context for the flower Pods | {} |
flower.labels | labels for the flower Deployment | {} |
flower.podLabels | Pod labels for the flower Deployment | {} |
flower.annotations | annotations for the flower Deployment | {} |
flower.podAnnotations | Pod annotations for the flower Deployment | {} |
flower.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
flower.podDisruptionBudget.* | configs for the PodDisruptionBudget of the flower Deployment | <see values.yaml> |
flower.oauthDomains | the value of the flower --auth argument | "" |
flower.basicAuthSecret | the name of a pre-created secret containing the basic authentication value for flower | "" |
flower.basicAuthSecretKey | the key within flower.basicAuthSecret containing the basic authentication string | "" |
flower.urlPrefix | sets AIRFLOW__CELERY__FLOWER_URL_PREFIX | "" |
flower.service.* | configs for the Service of the flower Pods | <see values.yaml> |
flower.initialStartupDelay | the number of seconds to wait (in bash) before starting the flower container | 0 |
flower.minReadySeconds | the number of seconds to wait before declaring a new Pod available | 5 |
flower.extraConfigmapMounts | extra ConfigMaps to mount on the flower Pods | [] |
Airflow Logs Values:
| Parameter | Description | Default |
|---|---|---|
logs.path | the airflow logs folder | /opt/airflow/logs |
logs.persistence.* | configs for the logs PVC | <see values.yaml> |
Airflow DAGs Values:
| Parameter | Description | Default |
|---|---|---|
dags.path | the airflow dags folder | /opt/airflow/dags |
dags.doNotPickle | whether to disable pickling dags from the scheduler to workers | false |
dags.installRequirements | install any Python requirements.txt at the root of dags.path automatically | false |
dags.persistence.* | configs for the dags PVC | <see values.yaml> |
dags.git.* | configs for the DAG git repository & sync container | <see values.yaml> |
dags.initContainer.* | configs for the git-clone container | <see values.yaml> |
Airflow Ingress Values:
| Parameter | Description | Default |
|---|---|---|
ingress.enabled | if we should deploy Ingress resources | false |
ingress.web.* | configs for the Ingress of the web Service | <see values.yaml> |
ingress.flower.* | configs for the Ingress of the flower Service | <see values.yaml> |
Airflow Kubernetes Values:
| Parameter | Description | Default |
|---|---|---|
rbac.create | if Kubernetes RBAC resources are created | true |
rbac.events | if the created RBAC role has GET/LIST access to Event resources | false |
serviceAccount.create | if a Kubernetes ServiceAccount is created | true |
serviceAccount.name | the name of the ServiceAccount | "" |
serviceAccount.annotations | annotations for the ServiceAccount | {} |
extraManifests | additional Kubernetes manifests to include with this chart | [] |
Airflow Database (Internal PostgreSQL) Values:
| Parameter | Description | Default |
|---|---|---|
postgresql.enabled | if the stable/postgresql chart is used | true |
postgresql.postgresqlDatabase | the postgres database to use | airflow |
postgresql.postgresqlUsername | the postgres user to create | postgres |
postgresql.postgresqlPassword | the postgres user's password | airflow |
postgresql.existingSecret | the name of a pre-created secret containing the postgres password | "" |
postgresql.existingSecretKey | the key within postgresql.existingSecret containing the password string | postgresql-password |
postgresql.persistence.* | configs for the PVC of postgresql | <see values.yaml> |
postgresql.master.* | configs for the postgres StatefulSet | <see values.yaml> |
Airflow Database (External) Values:
| Parameter | Description | Default |
|---|---|---|
externalDatabase.type | the type of external database: {mysql,postgres} | postgres |
externalDatabase.host | the host of the external database | localhost |
externalDatabase.port | the port of the external database | 5432 |
externalDatabase.database | the database/scheme to use within the external database | airflow |
externalDatabase.user | the user of the external database | airflow |
externalDatabase.passwordSecret | the name of a pre-created secret containing the external database password | "" |
externalDatabase.passwordSecretKey | the key within externalDatabase.passwordSecret containing the password string | postgresql-password |
externalDatabase.properties | the connection properties e.g. "?sslmode=require" | "" |
Airflow Redis (Internal) Values:
| Parameter | Description | Default |
|---|---|---|
redis.enabled | if the stable/redis chart is used | true |
redis.password | the redis password | airflow |
redis.existingSecret | the name of a pre-created secret containing the redis password | "" |
redis.existingSecretPasswordKey | the key within redis.existingSecret containing the password string | redis-password |
redis.cluster.* | configs for redis cluster mode | <see values.yaml> |
redis.master.* | configs for the redis master | <see values.yaml> |
redis.slave.* | configs for the redis slaves | <see values.yaml> |
Airflow Redis (External) Values:
| Parameter | Description | Default |
|---|---|---|
externalRedis.host | the host of the external redis | localhost |
externalRedis.port | the port of the external redis | 6379 |
externalRedis.databaseNumber | the database number to use within the external redis | 1 |
externalRedis.passwordSecret | the name of a pre-created secret containing the external redis password | "" |
externalRedis.passwordSecretKey | the key within externalRedis.passwordSecret containing the password string | redis-password |
Airflow Prometheus Values:
| Parameter | Description | Default |
|---|---|---|
serviceMonitor.enabled | if the ServiceMonitor resources should be deployed | false |
serviceMonitor.selector | labels for ServiceMonitor, so that Prometheus can select it | { prometheus: "kube-prometheus" } |
serviceMonitor.path | the ServiceMonitor web endpoint path | /admin/metrics |
serviceMonitor.interval | how often Prometheus should scrape the ServiceMonitor endpoint | 30s |
prometheusRule.enabled | if the PrometheusRule resources should be deployed | false |
prometheusRule.additionalLabels | labels for PrometheusRule, so that Prometheus can select it | {} |
prometheusRule.groups | alerting rules for Prometheus | [] |