docs/how-to-guides/feast-operator/06-batch-and-jobs.md
This guide covers two related top-level spec fields:

- `spec.batchEngine` — override the compute engine used for materialization
- `spec.cronJob` — schedule periodic `feast materialize-incremental` (or any command) as a Kubernetes CronJob

## Batch engine (`spec.batchEngine`)

By default, Feast runs materialization using the local batch engine (in-process Python). For large feature sets you can point the operator at a Spark, Ray, or other supported engine via a Kubernetes ConfigMap.
Create a ConfigMap whose value is a YAML snippet identical to the `batch_engine` section of `feature_store.yaml`. Include the `type:` key and all engine-specific options:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: feast-batch-engine
data:
  config: |
    type: spark
    spark_conf:
      spark.master: k8s://https://kubernetes.default.svc
      spark.kubernetes.namespace: feast
      spark.kubernetes.container.image: ghcr.io/feast-dev/feast-spark:latest
      spark.executor.instances: "2"
      spark.executor.memory: 4g
      spark.driver.memory: 2g
```
Reference the ConfigMap from the CR:
```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: sample-spark
spec:
  feastProject: my_project
  batchEngine:
    configMapRef:
      name: feast-batch-engine   # ConfigMap name
      configMapKey: config       # key inside the ConfigMap (default: "config")
```
| `type` | Notes |
|---|---|
| `local` | Default; in-process Python, no extra infra |
| `spark` | Apache Spark; requires a Spark operator or standalone cluster |
| `ray` | Ray cluster; requires a Ray operator |
| `bytewax` | Bytewax streaming engine |
| `snowflake.engine` | Snowflake Snowpark compute |
For engine-specific YAML options (Spark conf, Ray address, etc.) see the Feast SDK — Compute Engine docs.
## Scheduled materialization (`spec.cronJob`)

The operator can deploy a Kubernetes CronJob that runs `feast materialize-incremental` (or any custom command) on a schedule. This is the recommended way to keep your online store fresh without managing an external job scheduler.
The CronJob container image is resolved through the following priority chain:

1. `cronJob.containerConfigs.image` in the CR — per-CronJob override
2. `RELATED_IMAGE_CRON_JOB` env var on the operator pod — cluster-wide default set by OLM/platform (default: `quay.io/openshift/origin-cli:4.17`)

```shell
# Override cluster-wide for all CronJobs
kubectl set env deployment/feast-operator-controller-manager \
  RELATED_IMAGE_CRON_JOB=my-registry.example.com/tools/cli:latest \
  -n feast-operator-system
```
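The priority chain can be sketched as a tiny resolver (illustrative only; `resolve_cron_job_image` is a hypothetical helper, not actual operator code):

```python
def resolve_cron_job_image(cr_image=None, operator_env_image=None):
    """Sketch of the image priority chain: the per-CR override wins,
    then the RELATED_IMAGE_CRON_JOB env var, then the built-in default."""
    return cr_image or operator_env_image or "quay.io/openshift/origin-cli:4.17"

print(resolve_cron_job_image(cr_image="quay.io/feastdev/feature-server:0.62.0"))
# -> quay.io/feastdev/feature-server:0.62.0
print(resolve_cron_job_image())
# -> quay.io/openshift/origin-cli:4.17
```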
```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: feast-production
spec:
  feastProject: my_project
  cronJob:
    schedule: "0 2 * * *"   # every day at 02:00 UTC
    containerConfigs:
      image: quay.io/feastdev/feature-server:0.62.0
```
The CronJob runs the default `feast materialize-incremental` command using the same `feature_store.yaml` that the operator generated for this FeatureStore.
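The five fields of the `schedule` expression are minute, hour, day of month, month, and day of week. A minimal matcher (a hypothetical helper for illustration only; it supports just `*` and plain numbers, not ranges, lists, or `*/n` steps) shows how `"0 2 * * *"` selects 02:00 every day:

```python
from datetime import datetime

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check a datetime against a 5-field cron expression.
    Supports only '*' and plain numbers (no ranges, lists, or steps)."""
    minute, hour, dom, month, dow = expr.split()
    checks = [
        (minute, dt.minute),
        (hour, dt.hour),
        (dom, dt.day),
        (month, dt.month),
        (dow, dt.isoweekday() % 7),  # cron uses 0 (or 7) for Sunday
    ]
    return all(f == "*" or int(f) == v for f, v in checks)

print(cron_matches("0 2 * * *", datetime(2024, 6, 1, 2, 0)))   # True
print(cron_matches("0 2 * * *", datetime(2024, 6, 1, 2, 30)))  # False
```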
Override the container command to run any Feast CLI command:

```yaml
cronJob:
  schedule: "*/30 * * * *"
  containerConfigs:
    image: quay.io/feastdev/feature-server:0.62.0
    commands:
      - feast
      - materialize-incremental
      - "2099-01-01T00:00:00"   # materialize up to a fixed end time
```
Or run a Python script instead:

```yaml
containerConfigs:
  commands:
    - python
    - /app/materialize.py
```
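Conceptually, `materialize-incremental` advances a watermark: each run covers the span from the previous end timestamp up to the given (or current) end time. A rough sketch of that semantics (not Feast's actual implementation):

```python
from datetime import datetime

def incremental_window(last_end, new_end):
    """Sketch of materialize-incremental semantics (not Feast's code):
    each run materializes the span (last_end, new_end]."""
    if new_end <= last_end:
        return None  # nothing new to materialize
    return (last_end, new_end)

print(incremental_window(datetime(2024, 1, 1), datetime(2024, 1, 2)))
```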
Set an explicit time zone for the schedule:

```yaml
cronJob:
  schedule: "0 3 * * *"
  timeZone: "Asia/Kolkata"   # defaults to the kube-controller-manager time zone
```
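To sanity-check what a zoned schedule means in UTC, note that 03:00 in Asia/Kolkata (UTC+05:30) is 21:30 UTC the previous day; for example, with Python's standard `zoneinfo`:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# "0 3 * * *" with timeZone "Asia/Kolkata" fires at 03:00 IST;
# converted to UTC that is 21:30 on the previous day (IST = UTC+05:30).
local = datetime(2024, 6, 2, 3, 0, tzinfo=ZoneInfo("Asia/Kolkata"))
print(local.astimezone(ZoneInfo("UTC")).isoformat())  # 2024-06-01T21:30:00+00:00
```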
Control overlapping runs and missed starts:

```yaml
cronJob:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid        # Allow | Forbid | Replace
  startingDeadlineSeconds: 300     # skip if missed by > 5 minutes
```
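The effect of `startingDeadlineSeconds` can be sketched as follows (illustrative only, not the controller's actual code):

```python
from datetime import datetime, timedelta

def skip_missed_run(scheduled_at, now, starting_deadline_seconds):
    """A missed run is skipped once more than startingDeadlineSeconds
    have elapsed past its scheduled time (sketch of CronJob semantics)."""
    return now - scheduled_at > timedelta(seconds=starting_deadline_seconds)

# Run scheduled for 02:00, controller catches up at 02:06 -> skipped (> 5 min late).
print(skip_missed_run(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 2, 6), 300))  # True
```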
Limit how many finished Jobs are kept for inspection:

```yaml
cronJob:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3   # keep last 3 successful runs
  failedJobsHistoryLimit: 5       # keep last 5 failed runs
```
To temporarily pause scheduled runs without deleting the CronJob:

```yaml
cronJob:
  schedule: "0 2 * * *"
  suspend: true
```
Set resource requests and limits for the CronJob container:

```yaml
cronJob:
  schedule: "0 2 * * *"
  containerConfigs:
    image: quay.io/feastdev/feature-server:0.62.0
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
```
Inject environment variables, for example to pass data-store credentials from a Secret:

```yaml
cronJob:
  schedule: "0 2 * * *"
  containerConfigs:
    image: quay.io/feastdev/feature-server:0.62.0
    envFrom:
      - secretRef:
          name: feast-data-stores
    env:
      - name: FEAST_USAGE
        value: "false"
```
Tune the underlying Job and pod template:

```yaml
cronJob:
  schedule: "0 2 * * *"
  jobSpec:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 3600   # abort if job takes more than 1 hour
    backoffLimit: 2               # retry up to 2 times on failure
  podTemplateAnnotations:
    prometheus.io/scrape: "false"
  containerConfigs:
    image: quay.io/feastdev/feature-server:0.62.0
```
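`backoffLimit` caps how many times a failed run is retried before the Job is marked failed. Roughly (a sketch of Job retry semantics, not the Kubernetes controller's actual logic):

```python
def run_job(task, backoff_limit):
    """Sketch of Job retry semantics: the pod is re-run until it succeeds
    or the number of failures exceeds backoffLimit."""
    failures = 0
    while failures <= backoff_limit:
        if task():
            return "Complete"
        failures += 1
    return "Failed"

# A task that fails twice then succeeds completes within backoffLimit=2.
attempts = iter([False, False, True])
print(run_job(lambda: next(attempts), backoff_limit=2))  # Complete
```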
## `cronJob` field reference

| Field | Type | Default | Description |
|---|---|---|---|
| `schedule` | string | — | Cron expression (required) |
| `timeZone` | string | kube-controller-manager TZ | IANA time zone name |
| `concurrencyPolicy` | string | `Allow` | `Allow` / `Forbid` / `Replace` |
| `suspend` | bool | `false` | Suspend future runs |
| `startingDeadlineSeconds` | int64 | — | Abort if start missed by this many seconds |
| `successfulJobsHistoryLimit` | int32 | — | Successful job history to keep |
| `failedJobsHistoryLimit` | int32 | — | Failed job history to keep |
| `annotations` | map | — | CronJob metadata annotations |
| `jobSpec.parallelism` | int32 | 1 | Job pod parallelism |
| `jobSpec.completions` | int32 | 1 | Required completions |
| `jobSpec.activeDeadlineSeconds` | int64 | — | Max job duration |
| `jobSpec.backoffLimit` | int32 | — | Retry limit |
| `containerConfigs.image` | string | operator default | Feature server image |
| `containerConfigs.commands` | []string | `feast materialize-incremental` | Override container command |
| `containerConfigs.resources` | ResourceRequirements | — | CPU/memory requests and limits |
| `containerConfigs.env` / `envFrom` | — | — | Environment variables |
Use both together to run scheduled Spark-based materialization:
```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: feast-spark
spec:
  feastProject: my_project
  batchEngine:
    configMapRef:
      name: feast-spark-engine
  cronJob:
    schedule: "0 1 * * *"
    concurrencyPolicy: Forbid
    containerConfigs:
      image: quay.io/feastdev/feature-server:0.62.0
      resources:
        requests:
          memory: "4Gi"
```