Back to Feast

MLflow Integration

docs/reference/mlflow.md

0.64.014.0 KB
Original Source

MLflow Integration

Feast provides native integration with MLflow for automatic feature lineage tracking alongside ML experiments. When enabled, every feature retrieval is logged to the active MLflow run.

Overview

  • Which features did this model use? -- auto-logged on every get_historical_features() / get_online_features() call
  • Which feature service should I use to serve this model? -- resolved from model URI via store.mlflow.resolve_features()
  • Can I reproduce the exact training data? -- entity DataFrame saved as an MLflow artifact
  • Which models break if I change a feature view? -- reverse index via the Feast UI /api/mlflow-feature-usage endpoint
  • When was the feature store last updated? -- feast apply and feast materialize logged to a separate ops experiment

Capabilities

CapabilityHow
Auto-log feature metadataTags on every retrieval inside an active MLflow run
Entity DataFrame archivalentity_df.parquet artifact for full reproducibility
Model registration with lineagefeast.feature_service tag propagated to model versions
Training-to-prediction linkagestore.mlflow.load_model() links prediction runs back to training runs
Model-to-feature resolutionMap any model URI back to its Feast feature service
Operation audit trailfeast apply / feast materialize logged to {project}-feast-ops
store.mlflow APISingle entry point — zero import mlflow, zero client objects
Feast UI integrationPer-feature-view usage stats and registered model associations

Installation

MLflow is an optional dependency:

bash
pip install feast[mlflow]

Configuration

Add the mlflow section to your feature_store.yaml:

yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

mlflow:
  enabled: true
  tracking_uri: http://127.0.0.1:5000   # optional, falls back to MLFLOW_TRACKING_URI env var
  auto_log: true                         # default
  auto_log_entity_df: false              # default
  entity_df_max_rows: 100000             # default
  log_operations: false                  # default
  ops_experiment_suffix: "-feast-ops"    # default

Configuration options

OptionTypeDefaultDescription
enabledboolfalseMaster switch for the entire integration
tracking_uristring(none)MLflow tracking server URI. Falls back to MLFLOW_TRACKING_URI env var, then MLflow default (./mlruns)
auto_logbooltrueAutomatically log feature metadata on every retrieval when an active MLflow run exists
auto_log_entity_dfboolfalseSave the entity DataFrame as entity_df.parquet artifact on historical retrieval
entity_df_max_rowsint100000Skip entity DataFrame artifact upload for DataFrames exceeding this limit
log_operationsboolfalseLog feast apply and feast materialize to a separate MLflow experiment
ops_experiment_suffixstring"-feast-ops"Suffix appended to project name for the operations experiment

Tracking URI resolution

The tracking URI is resolved in this order:

  1. tracking_uri field in feature_store.yaml
  2. MLFLOW_TRACKING_URI environment variable
  3. MLflow's default (./mlruns local directory)

This means you can omit tracking_uri from the YAML and set MLFLOW_TRACKING_URI in your environment instead, or it would be pulled from ./mlruns automatically when both are not set.

What gets logged

Tags on retrieval runs

When auto_log: true and an active MLflow run exists, each get_historical_features() or get_online_features() call records:

TagExampleDescription
feast.projectmy_projectFeast project name
feast.retrieval_typehistorical / onlineType of feature retrieval
feast.feature_servicedriver_activity_v1Auto-resolved feature service name (if matched)
feast.feature_viewsdriver_hourly_statsComma-separated feature view names
feast.feature_refsdriver_hourly_stats:conv_rate,...All feature references
feast.entity_count200Number of entities in the request
feast.feature_count5Number of features retrieved

Metrics

MetricExampleDescription
feast.job_submission_sec0.4321Feature retrieval duration in seconds

Artifacts

When auto_log_entity_df: true and the entity DataFrame has fewer than entity_df_max_rows rows:

ArtifactDescription
entity_df.parquetFull entity DataFrame used in the retrieval

When a model is logged via store.mlflow.log_model():

ArtifactDescription
feast_features.jsonJSON list of feature references the model was trained on

Entity DataFrame metadata

Regardless of auto_log_entity_df, the following metadata is logged when present:

Tag / ParamWhenDescription
feast.entity_df_typeAlwaysdataframe, sql, or range
feast.entity_df_rowsDataFrame inputRow count
feast.entity_df_columnsDataFrame inputColumn names
feast.entity_df_querySQL inputThe SQL query string
feast.start_date / feast.end_dateRange-based inputDate range

Operation logs

When log_operations: true, feast apply and feast materialize create self-contained runs in the {project}{ops_experiment_suffix} experiment (default: my_project-feast-ops):

Apply runs:

Tag / MetricExample
feast.operationapply
feast.projectmy_project
feast.feature_views_changeddriver_hourly_stats,order_stats
feast.feature_services_changeddriver_activity_v1
feast.entities_changeddriver,restaurant
feast.apply.feature_views_count2
feast.apply.feature_services_count1
feast.apply.entities_count2

Materialize runs:

Tag / MetricExample
feast.operationmaterialize / materialize_incremental
feast.projectmy_project
feast.materialize.feature_viewsdriver_hourly_stats
feast.materialize.start_date2024-01-01T00:00:00
feast.materialize.end_date2024-01-02T00:00:00
feast.materialize.duration_sec12.3456

Usage

Automatic logging (zero code)

With the configuration above, feature metadata is logged automatically whenever there is an active MLflow run. No explicit import mlflow is needed — just use store.mlflow:

python
from feast import FeatureStore

store = FeatureStore(".")

with store.mlflow.start_run(run_name="my_training"):
    training_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    # The run is now tagged with feast.feature_refs, feast.feature_views, etc.

    model = train(training_df)
    store.mlflow.log_model(model, "model")

No extra code needed — the tags are written automatically.

store.mlflow is the primary way to interact with the Feast–MLflow integration. It provides Feast-enhanced versions of common MLflow operations, and delegates everything else to the raw mlflow module:

python
from feast import FeatureStore
from sklearn.linear_model import LogisticRegression

store = FeatureStore(".")

# Training
with store.mlflow.start_run(run_name="v1_training"):
    df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()

    model = LogisticRegression().fit(X, y)
    store.mlflow.log_model(model, "model")     # Feast-enhanced: saves feast_features.json
    train_run_id = store.mlflow.active_run_id

# Register model (auto-tags version with feast.feature_service)
store.mlflow.register_model(f"runs:/{train_run_id}/model", "driver_model")

# Prediction (auto-links to training run)
with store.mlflow.start_run(run_name="prediction"):
    model = store.mlflow.load_model("models:/driver_model/1")
    online_features = store.get_online_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_rows=[{"driver_id": 1001}],
    )
    predictions = model.predict(...)

feast.mlflow module API (alternative)

For users who prefer a module-level import, feast.mlflow is a drop-in replacement for import mlflow that delegates to the same store.mlflow client under the hood:

python
import feast.mlflow
from feast import FeatureStore

store = FeatureStore(".")   # auto-registers with feast.mlflow

with feast.mlflow.start_run(run_name="training"):
    df = store.get_historical_features(...).to_df()
    feast.mlflow.log_params({"lr": "0.01"})     # plain passthrough
    feast.mlflow.log_metrics({"f1": 0.85})       # plain passthrough
    feast.mlflow.log_model(model, "model")       # Feast-enhanced

Store resolution

feast.mlflow resolves its FeatureStore in this order:

  1. Explicit feast.mlflow.init(store) — if called, overrides everything
  2. Auto-registered — the most recently created FeatureStore with mlflow.enabled=true registers itself automatically
  3. Auto-discovery — falls back to FeatureStore(".") from the current directory

In most cases, simply creating a FeatureStore(...) is enough — no init() needed.

Error handling

feast.mlflow raises clear errors on first use if something is misconfigured:

ConditionError
No feature_store.yaml in cwd and no store createdRuntimeError with guidance to call feast.mlflow.init(store)
mlflow.enabled is not set to trueRuntimeError with guidance to set mlflow.enabled=true
mlflow pip package not installedImportError with guidance to run pip install feast[mlflow]

When mlflow.enabled is false (or omitted), store.mlflow returns None, allowing callers to guard with if store.mlflow:. The feast.mlflow module raises RuntimeError only when you attempt to use it without an enabled store.

Feast-enhanced functions

These functions add automatic Feast tagging and lineage on top of their MLflow counterparts:

FunctionEnhancement
store.mlflow.start_run(run_name, tags)Auto-tags run with feast.project
store.mlflow.log_model(model, path, flavor)Auto-attaches feast_features.json artifact
store.mlflow.register_model(model_uri, name)Auto-tags model version with feast.feature_service
store.mlflow.load_model(model_uri)Auto-tags prediction run with training lineage

Supported model flavors for log_model(): sklearn, pytorch, xgboost, lightgbm, tensorflow, keras, pyfunc.

Feast-only functions

These are unique to the Feast integration and have no mlflow equivalent:

FunctionDescription
store.mlflow.resolve_features(model_uri)Resolve model URI to Feast feature service name
store.mlflow.get_training_entity_df(run_id, ...)Recover entity DataFrame from a past MLflow run
store.mlflow.log_training_dataset(df, dataset_name)Log a training DataFrame as an MLflow dataset input
store.mlflow.active_run_idCurrent active MLflow run ID (or None)
store.mlflow.clientThe underlying MlflowClient instance for advanced queries
feast.mlflow.init(store)Explicitly bind feast.mlflow module to a FeatureStore (optional)

Passthrough behavior

The feast.mlflow module delegates any attribute not listed above to the raw mlflow module. This means you can use feast.mlflow as a drop-in replacement for import mlflow:

python
feast.mlflow.log_params(params)             # passes through to mlflow.log_params
feast.mlflow.log_metrics(metrics)
feast.mlflow.set_tag("env", "staging")
feast.mlflow.MlflowClient()

store.mlflow does not have this passthrough — it only exposes the Feast-enhanced and Feast-only methods listed above. To access raw mlflow functions from store.mlflow, use the escape hatches:

python
store.mlflow.client.log_param(run_id, "lr", "0.01")  # via MlflowClient instance
store.mlflow.mlflow.log_params(params)                # via raw mlflow module

Resolve a model back to its feature service

python
from feast import FeatureStore

store = FeatureStore(".")
fs_name = store.mlflow.resolve_features("models:/driver_model/1")
# Returns: "driver_activity_v1"

Resolution order:

  1. Model version tag feast.feature_service (set by register_model())
  2. Training run tag feast.feature_service (set by auto-logging)

Reproduce training from a past run

python
from feast import FeatureStore

store = FeatureStore(".")

entity_df = store.mlflow.get_training_entity_df(run_id="abc123")

with store.mlflow.start_run(run_name="retrain_v2"):
    new_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    model = train(new_df)
    store.mlflow.log_model(model, "model")

This requires auto_log_entity_df: true to have been enabled when the original run was recorded.

Feast UI integration

The Feast UI server exposes three API endpoints that aggregate data from MLflow:

EndpointDescription
/api/mlflow-runsAll Feast-tagged MLflow runs with linked registered models
/api/mlflow-feature-usagePer-feature-view usage stats (run count, last used, associated models)
/api/mlflow-feature-modelsReverse index of feature refs to registered models

The feature view detail page in the Feast UI displays:

  • MLflow Training Runs count and Last Used date in the header stats
  • An MLflow Usage panel showing training run count, relative last-used time, and a table of registered models that depend on the feature view

Start the Feast UI with:

bash
feast ui --host 127.0.0.1 --port 8888