Back to Mlflow

MLflow Helm Chart

charts/README.md

3.13.04.9 KB
Original Source

MLflow Helm Chart

A production-ready Helm chart for deploying MLflow on Kubernetes.

Features

  • MLflow server with configurable CLI options
  • TLS support via an existing Kubernetes Secret
  • Persistent storage with a PersistentVolumeClaim for SQLite or file-based artifact stores
  • Ingress for external access
  • Prometheus metrics and optional ServiceMonitor for the Prometheus Operator
  • NetworkPolicy restricting ingress and egress to required ports
  • RBAC with independent namespace-scoped (namespace_rbac) and cluster-scoped (cluster_rbac) rules
  • Garbage collection via an optional CronJob that runs mlflow gc

Prerequisites

  • Kubernetes 1.23+
  • Helm 3.8+

Installation

bash
helm install mlflow ./charts --namespace mlflow --create-namespace

With custom values:

bash
helm install mlflow ./charts \
  --namespace mlflow \
  --create-namespace \
  -f my-values.yaml

Quick Start: Shared Dev Instance

The simplest way to get a shared MLflow instance running on a cluster — no external database or object store required. MLflow stores metadata in SQLite and artifacts on a PersistentVolumeClaim:

bash
helm install mlflow ./charts \
  --namespace mlflow \
  --create-namespace \
  --set storage.enabled=true \
  --set mlflow.backendStoreUri="sqlite:////mlflow/mlflow.db" \
  --set mlflow.defaultArtifactRoot="/mlflow/artifacts"

Access the UI via port-forward:

bash
kubectl port-forward -n mlflow svc/mlflow-mlflow 5000:5000

Then open http://localhost:5000 in your browser.

Note: SQLite and local file storage are not suitable for production or high-concurrency use. For production deployments see the Backend store and Artifact store sections below.

Configuration

See values.yaml for the full list of configurable parameters. Common scenarios are described below.

Backend store (metadata database)

Inline URI (password visible in values — use only for development):

yaml
mlflow:
  backendStoreUri: "postgresql://user:password@postgres:5432/mlflow"

Read from a Kubernetes Secret (recommended for production):

bash
kubectl create secret generic mlflow-db-secret \
  --from-literal=uri="postgresql://user:password@postgres:5432/mlflow"
yaml
mlflow:
  backendStoreUriFrom:
    secretKeyRef:
      name: mlflow-db-secret
      key: uri

Artifact store

yaml
mlflow:
  defaultArtifactRoot: "s3://my-bucket/mlflow"
  artifactsDestination: "s3://my-bucket/mlflow"

env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: secret-access-key

Persistent local storage (SQLite / file store)

yaml
storage:
  enabled: true
  size: 10Gi

mlflow:
  backendStoreUri: "sqlite:////mlflow/mlflow.db"
  defaultArtifactRoot: "/mlflow/artifacts"

TLS

bash
kubectl create secret tls mlflow-tls \
  --cert=tls.crt \
  --key=tls.key
yaml
tls:
  enabled: true
  secretName: mlflow-tls

Ingress

MLflow's host-validation middleware only allows localhost and private-IP hosts by default. When exposing MLflow through an Ingress with a public hostname, set allowed_hosts to match that hostname, otherwise requests will be rejected with HTTP 403.

yaml
server:
  value_options:
    allowed_hosts: "mlflow.example.com"

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: mlflow.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: mlflow-tls
      hosts:
        - mlflow.example.com

Prometheus metrics

yaml
metrics:
  enabled: true
  path: /metrics

serviceMonitor:
  enabled: true

Garbage collection

Periodically remove soft-deleted runs, experiments, and their artifacts:

yaml
garbageCollection:
  enabled: true
  schedule: "0 2 * * 0" # weekly at 2 AM on Sunday
  olderThan: "30d" # only remove resources soft-deleted for 30+ days

Resource limits

yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Example

See example-mlflow-charts.yaml for a production-oriented example with a PostgreSQL backend, S3 artifact store, and Secret-based credential injection.

bash
helm install mlflow ./charts \
  --namespace mlflow \
  --create-namespace \
  -f charts/example-mlflow-charts.yaml

Upgrading

bash
helm upgrade mlflow ./charts --namespace mlflow -f my-values.yaml

Uninstalling

bash
helm uninstall mlflow --namespace mlflow

Note: helm uninstall does not delete PersistentVolumeClaims. If storage.enabled=true, delete the PVC manually after uninstalling:

bash
kubectl delete pvc -n mlflow --all