⚠️ THIS CHART HAS MOVED ⚠️

New location: https://github.com/airflow-helm/charts/tree/main/charts/airflow



Airflow Helm Chart

Airflow is a platform to programmatically author, schedule and monitor workflows.

Installation

(Helm 2) install the Airflow Helm Chart:

bash
helm install stable/airflow \
  --name "airflow" \
  --version "X.X.X" \
  --namespace "airflow" \
  --values ./custom-values.yaml

(Helm 3) install the Airflow Helm Chart:

bash
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update

helm install "airflow" stable/airflow \
  --version "X.X.X" \
  --namespace "airflow" \
  --values ./custom-values.yaml

Get the status of the Airflow Helm Chart:

bash
helm status "airflow"

Uninstall the Airflow Helm Chart:

bash
helm delete "airflow"

Run bash commands in the Airflow Webserver Pod:

bash
# create an interactive bash session in the Webserver Pod
# use this bash session for commands like: `airflow create_user`
kubectl exec \
  -it \
  --namespace airflow \
  --container airflow-web \
  Deployment/airflow-web \
  /bin/bash
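
For example, inside that session you could create a WebUI user (a sketch using the Airflow 1.10 RBAC CLI; the flags differ in Airflow 2.x):

bash
airflow create_user \
  -r Admin \
  -u admin \
  -e admin@example.com \
  -f Admin \
  -l User \
  -p MY_PASSWORD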

Upgrade Steps

For chart version numbers, see the chart's Chart.yaml or its listing on Artifact Hub.
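
A typical upgrade then looks much like the install (a Helm 3 sketch; review the chart changelog for breaking changes before bumping the version):

bash
helm repo update

helm upgrade "airflow" stable/airflow \
  --version "X.X.X" \
  --namespace "airflow" \
  --values ./custom-values.yaml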


Example Helm Values

Here are some starting points for your custom-values.yaml:

| Name | File | Description |
| --- | --- | --- |
| (CeleryExecutor) Minimal | examples/minikube/custom-values.yaml | a non-production starting point |
| (CeleryExecutor) Google Cloud | examples/google-gke/custom-values.yaml | a production starting point for GKE on Google Cloud |

Docs (Airflow) - Configs

While we don't expose the airflow.cfg directly, you can use environment variables to set Airflow configs.

We expose the airflow.config value to make this easier:

yaml
airflow:
  config:
    ## Security
    AIRFLOW__CORE__SECURE_MODE: "True"
    AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.deny_all"
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "False"
    AIRFLOW__WEBSERVER__RBAC: "False"

    ## DAGS
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
    AIRFLOW__CORE__LOAD_EXAMPLES: "False"

    ## Email (SMTP)
    AIRFLOW__EMAIL__EMAIL_BACKEND: "airflow.utils.email.send_email_smtp"
    AIRFLOW__SMTP__SMTP_HOST: "smtpmail.example.com"
    AIRFLOW__SMTP__SMTP_STARTTLS: "False"
    AIRFLOW__SMTP__SMTP_SSL: "False"
    AIRFLOW__SMTP__SMTP_PORT: "25"
    AIRFLOW__SMTP__SMTP_MAIL_FROM: "[email protected]"

    ## Disable noisy "Handling signal: ttou" Gunicorn log messages
    GUNICORN_CMD_ARGS: "--log-level WARNING"
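
You can confirm that these configs reached the containers by listing their environment variables, for example:

bash
kubectl exec \
  --namespace airflow \
  --container airflow-web \
  Deployment/airflow-web \
  -- env | grep "AIRFLOW__"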

Docs (Airflow) - Connections

Option 1 - Values.yaml

We expose the scheduler.connections value to specify Airflow Connections, which will be automatically imported by the airflow-scheduler when it starts up.

By default, we will delete and re-create connections each time the airflow-scheduler restarts. (If you want to manually modify a connection in the WebUI, you should disable this behaviour by setting scheduler.refreshConnections to false)

For example, to add a connection called my_aws:

yaml
scheduler:
  connections:
    - id: my_aws
      type: aws
      extra: |
        {
          "aws_access_key_id": "XXXXXXXX",
          "aws_secret_access_key": "XXXXXXXX",
          "region_name":"eu-central-1"
        }
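
Tasks can then reference this connection by its id; a minimal sketch using the Airflow 1.10 import path:

python
from airflow.hooks.base_hook import BaseHook

# look up the "my_aws" connection defined in values.yaml
conn = BaseHook.get_connection("my_aws")
region = conn.extra_dejson.get("region_name")  # -> "eu-central-1"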

Option 2 - Kubernetes Secret

If you don't want to store connections in your values.yaml, use scheduler.existingSecretConnections to specify the name of an existing Kubernetes Secret containing an add-connections.sh script. Note, your script will be run EACH TIME the airflow-scheduler Pod restarts, and scheduler.connections will no longer work.

Here is an example Secret you might create:

yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-airflow-connections
type: Opaque
stringData:
  add-connections.sh: |
    #!/usr/bin/env bash

    # remove any existing connection
    airflow connections --delete \
      --conn_id "my_aws"
  
    # re-add your custom connection
    airflow connections --add \
      --conn_id "my_aws" \
      --conn_type "aws" \
      --conn_extra "{\"aws_access_key_id\": \"XXXXXXXX\", \"aws_secret_access_key\": \"XXXXXXXX\", \"region_name\":\"eu-central-1\"}"

Docs (Airflow) - Variables

We expose the scheduler.variables value to specify Airflow Variables, which will be automatically imported by the airflow-scheduler when it starts up.

For example, to specify a variable called environment:

yaml
scheduler:
  variables: |
    { "environment": "dev" }

Docs (Airflow) - Pools

We expose the scheduler.pools value to specify Airflow Pools, which will be automatically imported by the Airflow scheduler when it starts up.

For example, to create a pool called example:

yaml
scheduler:
  pools: |
    {
      "example": {
        "description": "This is an example pool with 2 slots.",
        "slots": 2
      }
    }
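
Tasks are then assigned to the pool with the pool argument accepted by every operator; a minimal sketch (the DAG id and schedule are arbitrary):

python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG("pool_example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    task = BashOperator(
        task_id="example_task",
        bash_command="echo hello",
        pool="example",  # limited to the 2 slots of the "example" pool
    )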

Docs (Airflow) - Environment Variables

We expose the airflow.extraEnv value to mount extra environment variables; this can be used to pass sensitive configs to Airflow.

For example, passing a Fernet key and LDAP password (the airflow and ldap Kubernetes Secrets must already exist):

yaml
airflow:
  extraEnv:
    - name: AIRFLOW__CORE__FERNET_KEY
      valueFrom:
        secretKeyRef:
          name: airflow
          key: fernet-key
    - name: AIRFLOW__LDAP__BIND_PASSWORD
      valueFrom:
        secretKeyRef:
          name: ldap
          key: password
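
To create the airflow Secret referenced above, you could generate a Fernet key and store it in Kubernetes (a sketch, assuming the cryptography package is installed locally):

bash
# generate a new Fernet key (requires the `cryptography` python package)
FERNET_KEY=$(python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")

kubectl create secret generic \
  airflow \
  --from-literal=fernet-key="${FERNET_KEY}" \
  --namespace airflow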

Docs (Airflow) - ConfigMaps

We expose the airflow.extraConfigmapMounts value to mount extra Kubernetes ConfigMaps.

For example, a webserver_config.py file:

yaml
airflow:
  extraConfigmapMounts:
    - name: my-webserver-config
      mountPath: /opt/airflow/webserver_config.py
      configMap: my-airflow-webserver-config
      readOnly: true
      subPath: webserver_config.py

To create the my-airflow-webserver-config ConfigMap, you could use:

bash
kubectl create configmap \
  my-airflow-webserver-config \
  --from-file=webserver_config.py \
  --namespace airflow

Docs (Airflow) - Install Python Packages

We expose the airflow.extraPipPackages and web.extraPipPackages values to install Python pip packages; these will work with any package that you can install with pip install XXXX.

For example, enabling the airflow-exporter package:

yaml
airflow:
  extraPipPackages:
    - "airflow-exporter==1.3.1"

For example, you may be using flask_oauthlib to integrate with Okta/Google/etc for authorizing WebUI users:

yaml
web:
  extraPipPackages:
    - "apache-airflow[google_auth]==1.10.12"

Docs (Kubernetes) - Ingress

This chart provides an optional Kubernetes Ingress resource, for accessing airflow-webserver and airflow-flower outside of the cluster.

URL Prefix:

If you already have something hosted at the root of your domain, you might want to place airflow under a URL-prefix:

In this example, you would set these values:

yaml
web:
  baseUrl: "http://example.com/airflow/"

flower:
  urlPrefix: "/airflow/flower"

ingress:
  web:
    path: "/airflow"

  flower:
    path: "/airflow/flower"

Custom Paths:

We expose the ingress.web.precedingPaths and ingress.web.succeedingPaths values, which add paths before and after the default path, respectively.

A common use-case is enabling https with the aws-alb-ingress-controller ssl-redirect, which needs a redirect path to be hit before the airflow-webserver one.

You would set precedingPaths as follows:

yaml
ingress:
  web:
    precedingPaths:
      - path: "/*"
        serviceName: "ssl-redirect"
        servicePort: "use-annotation"

Docs (Kubernetes) - Worker Autoscaling

We use a Kubernetes StatefulSet for the Celery workers; this allows the webserver to request logs from each worker individually, using a fixed DNS name.

Celery workers can be scaled using the Horizontal Pod Autoscaler. To enable autoscaling, you must set workers.autoscaling.enabled=true, provide workers.autoscaling.maxReplicas, and use workers.replicas as the minimum number of workers.

Assume every task a worker executes consumes approximately 200Mi of memory; that makes memory a good metric for utilisation monitoring. For a worker Pod, you can estimate memory usage as WORKER_CONCURRENCY * 200Mi, so a worker running 10 tasks will consume ~2Gi of memory.

In the following config, if a worker consumes 80% of 2Gi (which will happen if it runs 9-10 tasks at the same time), an autoscaling event is triggered and a new worker is added. If many tasks are queued, Kubernetes will keep adding workers until maxReplicas is reached, in this case 16.

yaml
workers:
  # the initial/minimum number of workers
  replicas: 2

  resources:
    requests:
      memory: "2Gi"

  podDisruptionBudget:
    enabled: true
    ## prevents losing more than 20% of current worker task slots in a voluntary disruption
    maxUnavailable: "20%"

  autoscaling:
    enabled: true
    maxReplicas: 16
    metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

  celery:
    instances: 10

    ## wait at most 9min for running tasks to complete before SIGTERM
    ## WARNING: 
    ##  - some cluster-autoscaler (GKE) will not respect graceful 
    ##    termination periods over 10min
    gracefullTermination: true
    gracefullTerminationPeriod: 540

  ## how many seconds (after the 9min) to wait before SIGKILL
  terminationPeriod: 60

dags:
  git:
    gitSync:
      resources:
        requests:
          ## IMPORTANT! for autoscaling to work
          memory: "64Mi"

Docs (Kubernetes) - Worker Secrets

We expose the workers.secrets value to allow mounting secrets at {workers.secretsDir}/<secret-name> in airflow-worker Pods.

For example, mounting password Secrets:

yaml
workers:
  secretsDir: /var/airflow/secrets
  secrets:
    - redshift-user
    - redshift-password
    - elasticsearch-user
    - elasticsearch-password

With the above configuration, you could read the redshift-user password from within a DAG or Python function using:

python
import os
from pathlib import Path

def get_secret(secret_name):
    secrets_dir = Path('/var/airflow/secrets')
    secret_path = secrets_dir / secret_name
    assert secret_path.exists(), f'could not find {secret_name} at {secret_path}'
    secret_data = secret_path.read_text().strip()
    return secret_data

redshift_user = get_secret('redshift-user')

To create the redshift-user Secret, you could use:

bash
kubectl create secret generic \
  redshift-user \
  --from-literal=redshift-user=MY_REDSHIFT_USERNAME \
  --namespace airflow

Docs (Kubernetes) - Additional Manifests

We expose the extraManifests.[] value to add custom Kubernetes manifests to the chart.

For example, adding a BackendConfig resource for GKE:

yaml
extraManifests:
  - apiVersion: cloud.google.com/v1beta1
    kind: BackendConfig
    metadata:
      name: "{{ .Release.Name }}-test"
    spec:
      securityPolicy:
        name: "gcp-cloud-armor-policy-test"

Docs (Database) - DB Initialization

If the value scheduler.initdb is set to true (this is the default), the airflow-scheduler container will run airflow initdb as part of its startup script.

If the value scheduler.preinitdb is set to true, then we ALSO RUN airflow initdb in an init-container (retrying 5 times). This is usually NOT necessary, unless your synced DAGs include custom database hooks that prevent airflow initdb from running.

Docs (Database) - Passwords

PostgreSQL is the default database in this chart. Because we use insecure username/password combinations by default, you should create secure credentials before installing the Helm chart.

Example bash command to create the required Kubernetes Secrets:

bash
# set postgres password
kubectl create secret generic \
  airflow-postgresql \
  --from-literal=postgresql-password=$(openssl rand -base64 13) \
  --namespace airflow

# set redis password
kubectl create secret generic \
  airflow-redis \
  --from-literal=redis-password=$(openssl rand -base64 13) \
  --namespace airflow

Example values.yaml, to use those secrets:

yaml
postgresql:
  existingSecret: airflow-postgresql

redis:
  existingSecret: airflow-redis

Docs (Database) - External Database

While this chart comes with an embedded stable/postgresql, this is NOT SUITABLE for production. You should make use of an external mysql or postgres database, for example, one that is managed by your cloud provider.

Option 1 - Postgres

Example values for an external Postgres database, with an existing airflow_cluster1 database:

yaml
externalDatabase:
  type: postgres
  host: postgres.example.org
  port: 5432
  database: airflow_cluster1
  user: airflow_cluster1
  passwordSecret: "airflow-cluster1-postgres-password"
  passwordSecretKey: "postgresql-password"

Option 2 - MySQL

WARNING: Airflow requires that explicit_defaults_for_timestamp=1 is set on your MySQL instance.

Example values for an external MySQL database, with an existing airflow_cluster1 database:

yaml
externalDatabase:
  type: mysql
  host: mysql.example.org
  port: 3306
  database: airflow_cluster1
  user: airflow_cluster1
  passwordSecret: "airflow-cluster1-mysql-password"
  passwordSecretKey: "mysql-password"

Docs (Other) - Log Persistence

By default, logs from the airflow-web/scheduler/worker are written within the Docker container's filesystem; therefore, any restart of the Pod will wipe the logs. For a production deployment, you will likely want to persist the logs.

Option 1 - Remote Bucket (S3/GCS)

You must give Airflow credentials for it to read/write on the remote bucket; this can be achieved with AIRFLOW__CORE__REMOTE_LOG_CONN_ID, or by using something like Workload Identity (GKE) or IAM Roles for Service Accounts (EKS).

Example, using AIRFLOW__CORE__REMOTE_LOG_CONN_ID (can be used with AWS too):

yaml
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://<<MY-BUCKET-NAME>>/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "google_cloud_airflow"

scheduler:
  connections:
    - id: google_cloud_airflow
      type: google_cloud_platform
      extra: |-
        {
         "extra__google_cloud_platform__num_retries": "5",
         "extra__google_cloud_platform__keyfile_dict": "{...}"
        }

Example, using Workload Identity (GKE):

yaml
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://<<MY-BUCKET-NAME>>/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "google_cloud_default"

serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: "<<MY-ROLE-NAME>>@<<MY-PROJECT-NAME>>.iam.gserviceaccount.com"

Example, using IAM Roles for Service Accounts (EKS):

yaml
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://<<MY-BUCKET-NAME>>/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "aws_default"

scheduler:
  securityContext:
    fsGroup: 65534

web:
  securityContext:
    fsGroup: 65534

workers:
  securityContext:
    fsGroup: 65534

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::XXXXXXXXXX:role/<<MY-ROLE-NAME>>"

Option 2 - Kubernetes PVC

yaml
logs:
  persistence:
    enabled: true

Docs (Other) - Service Monitor

The ServiceMonitor resource is provided by the CoreOS Prometheus Operator. To expose metrics to Prometheus, you need to install a plugin; this can be added to the Docker image. A good one is epoch8/airflow-exporter, which exposes DAG and task based metrics from Airflow.

For more information, see the serviceMonitor section of values.yaml.
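
For example, to enable the ServiceMonitor with the chart defaults (values match the table at the end of this page):

yaml
serviceMonitor:
  enabled: true
  ## labels that your Prometheus instance is configured to select
  selector:
    prometheus: "kube-prometheus"
  path: /admin/metrics
  interval: "30s"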


Docs (Other) - DAG Storage

Option 1 - Git-Sync Sidecar (SSH)

This method places a git-sync sidecar in each worker/scheduler/web Kubernetes Pod, which perpetually syncs your git repo into the dag folder every dags.git.gitSync.refreshTime seconds.

WARNING: In the dags.git.secret the known_hosts file is present to reduce the possibility of a man-in-the-middle attack. However, if you want to implicitly trust all repo host signatures, set dags.git.sshKeyscan to true.

yaml
dags:
  git:
    url: ssh://git@repo.example.com/example.git
    repoHost: repo.example.com
    secret: airflow-git-keys
    privateKeyName: id_rsa

    gitSync:
      enabled: true
      refreshTime: 60

You can create the dags.git.secret from your local ~/.ssh folder using:

bash
kubectl create secret generic \
  airflow-git-keys \
  --from-file=id_rsa=~/.ssh/id_rsa \
  --from-file=id_rsa.pub=~/.ssh/id_rsa.pub \
  --from-file=known_hosts=~/.ssh/known_hosts \
  --namespace airflow

Option 2 - Mount a Shared Persistent Volume

This method stores your DAGs in a Kubernetes Persistent Volume Claim (PVC); you must use some external system to keep this volume up to date with your latest DAGs. For example, you could use your CI/CD pipeline to perform a sync as changes are pushed to a git repo.

Since ALL Pods MUST HAVE the same collection of DAG files, it is recommended to create just one PVC that is shared. To share a PVC with multiple Pods, the PVC needs to have accessMode set to ReadOnlyMany or ReadWriteMany (Note: different StorageClasses support different access modes). If you are using Kubernetes on a public cloud, a persistent volume controller is likely built in: Amazon EKS, Azure AKS, Google GKE.

For example, to use the storage class called default:

yaml
dags:
  persistence:
    enabled: true
    storageClass: default
    accessMode: ReadOnlyMany
    size: 1Gi

Option 2a - Single PVC for DAGs & Logs

You may want to store DAGs and logs on the same volume and configure Airflow to use subdirectories for them.

WARNING: you must use a PVC which supports accessMode: ReadWriteMany

Here's an approach that achieves this (see the sketch after this list):

  • Configure airflow.extraVolumes and airflow.extraVolumeMounts to put a volume at /opt/airflow/efs
  • Configure dags.persistence.enabled and logs.persistence.enabled to be false
  • Configure dags.path to be /opt/airflow/efs/dags
  • Configure logs.path to be /opt/airflow/efs/logs
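
A minimal sketch of those values (the airflow-efs PVC name is hypothetical; it must already exist and support ReadWriteMany):

yaml
dags:
  path: /opt/airflow/efs/dags
  persistence:
    enabled: false

logs:
  path: /opt/airflow/efs/logs
  persistence:
    enabled: false

airflow:
  extraVolumes:
    - name: efs
      persistentVolumeClaim:
        claimName: airflow-efs  ## hypothetical pre-existing ReadWriteMany PVC
  extraVolumeMounts:
    - name: efs
      mountPath: /opt/airflow/efs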

Docs (Other) - requirements.txt

We expose the dags.installRequirements value to enable installing any requirements.txt found at the root of your dags.path folder as airflow-workers start.

WARNING: if you update the requirements.txt, you will have to restart your airflow-workers for changes to take effect

NOTE: you might also want to consider using airflow.extraPipPackages
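
For example, to enable this behaviour:

yaml
dags:
  ## install /opt/airflow/dags/requirements.txt (the root of dags.path) as workers start
  installRequirements: true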


Helm Chart Values

Full documentation can be found in the comments of the values.yaml file, but a high level overview is provided here.

Global Values:

| Parameter | Description | Default |
| --- | --- | --- |
| airflow.image.* | configs for the docker image of the web/scheduler/worker | <see values.yaml> |
| airflow.executor | the airflow executor type to use | CeleryExecutor |
| airflow.fernetKey | the fernet key used to encrypt the connections/variables in the database | 7T512UXSSmBOkpWimFHIVb8jK6lfmSAvx4mO6Arehnc= |
| airflow.config | environment variables for the web/scheduler/worker pods (for airflow configs) | {} |
| airflow.podAnnotations | extra annotations for the web/scheduler/worker/flower Pods | {} |
| airflow.extraEnv | extra environment variables for the web/scheduler/worker/flower Pods | [] |
| airflow.extraConfigmapMounts | extra configMap volumeMounts for the web/scheduler/worker/flower Pods | [] |
| airflow.extraContainers | extra containers for the web/scheduler/worker Pods | [] |
| airflow.extraPipPackages | extra pip packages to install in the web/scheduler/worker Pods | [] |
| airflow.extraVolumeMounts | extra volumeMounts for the web/scheduler/worker Pods | [] |
| airflow.extraVolumes | extra volumes for the web/scheduler/worker Pods | [] |

Airflow Scheduler values:

| Parameter | Description | Default |
| --- | --- | --- |
| scheduler.resources | resource requests/limits for the scheduler Pods | {} |
| scheduler.nodeSelector | the nodeSelector configs for the scheduler Pods | {} |
| scheduler.affinity | the affinity configs for the scheduler Pods | {} |
| scheduler.tolerations | the toleration configs for the scheduler Pods | [] |
| scheduler.securityContext | the security context for the scheduler Pods | {} |
| scheduler.labels | labels for the scheduler Deployment | {} |
| scheduler.podLabels | Pod labels for the scheduler Deployment | {} |
| scheduler.annotations | annotations for the scheduler Deployment | {} |
| scheduler.podAnnotations | Pod annotations for the scheduler Deployment | {} |
| scheduler.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
| scheduler.podDisruptionBudget.* | configs for the PodDisruptionBudget of the scheduler | <see values.yaml> |
| scheduler.connections | custom airflow connections for the airflow scheduler | [] |
| scheduler.refreshConnections | if scheduler.connections are deleted and re-added after each scheduler restart | true |
| scheduler.existingSecretConnections | the name of an existing Secret containing an add-connections.sh script to run on scheduler start | "" |
| scheduler.variables | custom airflow variables for the airflow scheduler | "{}" |
| scheduler.pools | custom airflow pools for the airflow scheduler | "{}" |
| scheduler.numRuns | the value of the airflow --num_runs parameter used to run the airflow scheduler | -1 |
| scheduler.initdb | if we run airflow initdb when the scheduler starts | true |
| scheduler.preinitdb | if we run airflow initdb inside a special initContainer | false |
| scheduler.initialStartupDelay | the number of seconds to wait (in bash) before starting the scheduler container | 0 |
| scheduler.livenessProbe.* | configs for the scheduler liveness probe | <see values.yaml> |
| scheduler.extraInitContainers | extra init containers to run before the scheduler pod | [] |

Airflow Webserver Values:

| Parameter | Description | Default |
| --- | --- | --- |
| web.resources | resource requests/limits for the airflow web pods | {} |
| web.replicas | the number of web Pods to run | 1 |
| web.nodeSelector | the nodeSelector configs for the web Pods | {} |
| web.affinity | the affinity configs for the web Pods | {} |
| web.tolerations | the toleration configs for the web Pods | [] |
| web.securityContext | the security context for the web Pods | {} |
| web.labels | labels for the web Deployment | {} |
| web.podLabels | Pod labels for the web Deployment | {} |
| web.annotations | annotations for the web Deployment | {} |
| web.podAnnotations | Pod annotations for the web Deployment | {} |
| web.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
| web.podDisruptionBudget.* | configs for the PodDisruptionBudget of the web Deployment | <see values.yaml> |
| web.service.* | configs for the Service of the web pods | <see values.yaml> |
| web.baseUrl | sets AIRFLOW__WEBSERVER__BASE_URL | http://localhost:8080 |
| web.serializeDAGs | sets AIRFLOW__CORE__STORE_SERIALIZED_DAGS | false |
| web.extraPipPackages | extra pip packages to install in the web container | [] |
| web.initialStartupDelay | the number of seconds to wait (in bash) before starting the web container | 0 |
| web.minReadySeconds | the number of seconds to wait before declaring a new Pod available | 5 |
| web.readinessProbe.* | configs for the web Service readiness probe | <see values.yaml> |
| web.livenessProbe.* | configs for the web Service liveness probe | <see values.yaml> |
| web.secretsDir | the directory in which to mount secrets on web containers | /var/airflow/secrets |
| web.secrets | the names of existing Kubernetes Secrets to mount as files at {web.secretsDir}/<secret_name>/<keys_in_secret> | [] |
| web.secretsMap | the name of an existing Kubernetes Secret to mount as files to {web.secretsDir}/<keys_in_secret> | "" |

Airflow Worker Values:

| Parameter | Description | Default |
| --- | --- | --- |
| workers.enabled | if the airflow workers StatefulSet should be deployed | true |
| workers.resources | resource requests/limits for the airflow worker Pods | {} |
| workers.replicas | the number of worker Pods to run | 1 |
| workers.nodeSelector | the nodeSelector configs for the worker Pods | {} |
| workers.affinity | the affinity configs for the worker Pods | {} |
| workers.tolerations | the toleration configs for the worker Pods | [] |
| workers.securityContext | the security context for the worker Pods | {} |
| workers.labels | labels for the worker StatefulSet | {} |
| workers.podLabels | Pod labels for the worker StatefulSet | {} |
| workers.annotations | annotations for the worker StatefulSet | {} |
| workers.podAnnotations | Pod annotations for the worker StatefulSet | {} |
| workers.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
| workers.podDisruptionBudget.* | configs for the PodDisruptionBudget of the worker StatefulSet | <see values.yaml> |
| workers.autoscaling.* | configs for the HorizontalPodAutoscaler of the worker Pods | <see values.yaml> |
| workers.initialStartupDelay | the number of seconds to wait (in bash) before starting each worker container | 0 |
| workers.celery.* | configs for the celery worker Pods | <see values.yaml> |
| workers.terminationPeriod | how many seconds to wait after SIGTERM before SIGKILL of the celery worker | 60 |
| workers.secretsDir | the directory in which to mount secrets on worker containers | /var/airflow/secrets |
| workers.secrets | the names of existing Kubernetes Secrets to mount as files at {workers.secretsDir}/<secret_name>/<keys_in_secret> | [] |
| workers.secretsMap | the name of an existing Kubernetes Secret to mount as files to {workers.secretsDir}/<keys_in_secret> | "" |

Airflow Flower Values:

| Parameter | Description | Default |
| --- | --- | --- |
| flower.enabled | if the Flower UI should be deployed | true |
| flower.resources | resource requests/limits for the flower Pods | {} |
| flower.affinity | the affinity configs for the flower Pods | {} |
| flower.tolerations | the toleration configs for the flower Pods | [] |
| flower.securityContext | the security context for the flower Pods | {} |
| flower.labels | labels for the flower Deployment | {} |
| flower.podLabels | Pod labels for the flower Deployment | {} |
| flower.annotations | annotations for the flower Deployment | {} |
| flower.podAnnotations | Pod annotations for the flower Deployment | {} |
| flower.safeToEvict | if we should tell the Kubernetes Autoscaler that it's safe to evict these Pods | true |
| flower.podDisruptionBudget.* | configs for the PodDisruptionBudget of the flower Deployment | <see values.yaml> |
| flower.oauthDomains | the value of the flower --auth argument | "" |
| flower.basicAuthSecret | the name of a pre-created secret containing the basic authentication value for flower | "" |
| flower.basicAuthSecretKey | the key within flower.basicAuthSecret containing the basic authentication string | "" |
| flower.urlPrefix | sets AIRFLOW__CELERY__FLOWER_URL_PREFIX | "" |
| flower.service.* | configs for the Service of the flower Pods | <see values.yaml> |
| flower.initialStartupDelay | the number of seconds to wait (in bash) before starting the flower container | 0 |
| flower.minReadySeconds | the number of seconds to wait before declaring a new Pod available | 5 |
| flower.extraConfigmapMounts | extra ConfigMaps to mount on the flower Pods | [] |

Airflow Logs Values:

| Parameter | Description | Default |
| --- | --- | --- |
| logs.path | the airflow logs folder | /opt/airflow/logs |
| logs.persistence.* | configs for the logs PVC | <see values.yaml> |

Airflow DAGs Values:

| Parameter | Description | Default |
| --- | --- | --- |
| dags.path | the airflow dags folder | /opt/airflow/dags |
| dags.doNotPickle | whether to disable pickling dags from the scheduler to workers | false |
| dags.installRequirements | install any Python requirements.txt at the root of dags.path automatically | false |
| dags.persistence.* | configs for the dags PVC | <see values.yaml> |
| dags.git.* | configs for the DAG git repository & sync container | <see values.yaml> |
| dags.initContainer.* | configs for the git-clone container | <see values.yaml> |

Airflow Ingress Values:

| Parameter | Description | Default |
| --- | --- | --- |
| ingress.enabled | if we should deploy Ingress resources | false |
| ingress.web.* | configs for the Ingress of the web Service | <see values.yaml> |
| ingress.flower.* | configs for the Ingress of the flower Service | <see values.yaml> |

Airflow Kubernetes Values:

| Parameter | Description | Default |
| --- | --- | --- |
| rbac.create | if Kubernetes RBAC resources are created | true |
| rbac.events | if the created RBAC role has GET/LIST access to Event resources | false |
| serviceAccount.create | if a Kubernetes ServiceAccount is created | true |
| serviceAccount.name | the name of the ServiceAccount | "" |
| serviceAccount.annotations | annotations for the ServiceAccount | {} |
| extraManifests | additional Kubernetes manifests to include with this chart | [] |

Airflow Database (Internal PostgreSQL) Values:

| Parameter | Description | Default |
| --- | --- | --- |
| postgresql.enabled | if the stable/postgresql chart is used | true |
| postgresql.postgresqlDatabase | the postgres database to use | airflow |
| postgresql.postgresqlUsername | the postgres user to create | postgres |
| postgresql.postgresqlPassword | the postgres user's password | airflow |
| postgresql.existingSecret | the name of a pre-created secret containing the postgres password | "" |
| postgresql.existingSecretKey | the key within postgresql.existingSecret containing the password string | postgresql-password |
| postgresql.persistence.* | configs for the PVC of postgresql | <see values.yaml> |
| postgresql.master.* | configs for the postgres StatefulSet | <see values.yaml> |

Airflow Database (External) Values:

| Parameter | Description | Default |
| --- | --- | --- |
| externalDatabase.type | the type of external database: {mysql,postgres} | postgres |
| externalDatabase.host | the host of the external database | localhost |
| externalDatabase.port | the port of the external database | 5432 |
| externalDatabase.database | the database/schema to use within the external database | airflow |
| externalDatabase.user | the user of the external database | airflow |
| externalDatabase.passwordSecret | the name of a pre-created secret containing the external database password | "" |
| externalDatabase.passwordSecretKey | the key within externalDatabase.passwordSecret containing the password string | postgresql-password |
| externalDatabase.properties | the connection properties, e.g. "?sslmode=require" | "" |

Airflow Redis (Internal) Values:

| Parameter | Description | Default |
| --- | --- | --- |
| redis.enabled | if the stable/redis chart is used | true |
| redis.password | the redis password | airflow |
| redis.existingSecret | the name of a pre-created secret containing the redis password | "" |
| redis.existingSecretPasswordKey | the key within redis.existingSecret containing the password string | redis-password |
| redis.cluster.* | configs for redis cluster mode | <see values.yaml> |
| redis.master.* | configs for the redis master | <see values.yaml> |
| redis.slave.* | configs for the redis slaves | <see values.yaml> |

Airflow Redis (External) Values:

| Parameter | Description | Default |
| --- | --- | --- |
| externalRedis.host | the host of the external redis | localhost |
| externalRedis.port | the port of the external redis | 6379 |
| externalRedis.databaseNumber | the database number to use within the external redis | 1 |
| externalRedis.passwordSecret | the name of a pre-created secret containing the external redis password | "" |
| externalRedis.passwordSecretKey | the key within externalRedis.passwordSecret containing the password string | redis-password |

Airflow Prometheus Values:

| Parameter | Description | Default |
| --- | --- | --- |
| serviceMonitor.enabled | if the ServiceMonitor resources should be deployed | false |
| serviceMonitor.selector | labels for ServiceMonitor, so that Prometheus can select it | { prometheus: "kube-prometheus" } |
| serviceMonitor.path | the ServiceMonitor web endpoint path | /admin/metrics |
| serviceMonitor.interval | the ServiceMonitor scrape interval | 30s |
| prometheusRule.enabled | if the PrometheusRule resources should be deployed | false |
| prometheusRule.additionalLabels | labels for PrometheusRule, so that Prometheus can select it | {} |
| prometheusRule.groups | alerting rules for Prometheus | [] |