README

metadata-ingestion/docs/sources/mlflow/README.md

Overview

MLflow is a machine learning platform. Learn more in the official MLflow documentation.

The DataHub integration for MLflow covers ML entities such as models and features, along with related lineage metadata. Depending on module capabilities, it can also capture usage, profiling, ownership, tags, and stateful deletion detection.

Concept Mapping

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| Registered Model | MlModelGroup | The name of a Model Group is the same as a Registered Model's name (e.g. my_mlflow_model). Registered Models serve as containers for multiple versions of the same model in MLflow. |
| Model Version | MlModel | The name of a Model is {registered_model_name}{model_name_separator}{model_version} (e.g. my_mlflow_model_1 for Registered Model named my_mlflow_model and Version 1, my_mlflow_model_2, etc.). Each Model Version represents a specific iteration of a model with its own artifacts and metadata. |
| Experiment | Container | Each Experiment in MLflow is mapped to a Container in DataHub. Experiments organize related runs and serve as logical groupings for model development iterations, allowing tracking of parameters, metrics, and artifacts. |
| Run | DataProcessInstance | Captures the run's execution details, parameters, metrics, and lineage to a model. |
| Model Stage | Tag | Model Stages indicate the deployment status of each version. The mapping between Model Stages and generated Tags is the following: Production: mlflow_production; Staging: mlflow_staging; Archived: mlflow_archived; None: mlflow_none. |
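
The naming and tagging rules in the table can be illustrated with a small Python sketch. This is not DataHub's implementation: the helper names `model_name` and `stage_tag` are hypothetical, and the default separator `"_"` is an assumption inferred from the `my_mlflow_model_1` example above.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the DataHub or MLflow APIs.

# Stage-to-tag mapping as listed in the Concept Mapping table.
STAGE_TO_TAG = {
    "Production": "mlflow_production",
    "Staging": "mlflow_staging",
    "Archived": "mlflow_archived",
    "None": "mlflow_none",
}


def model_name(registered_model_name, model_version, model_name_separator="_"):
    """Build a Model name as {registered_model_name}{model_name_separator}{model_version}.

    The default separator "_" is assumed from the documented example.
    """
    return f"{registered_model_name}{model_name_separator}{model_version}"


def stage_tag(stage):
    """Return the tag name generated for an MLflow Model Stage."""
    return STAGE_TO_TAG[stage]


if __name__ == "__main__":
    print(model_name("my_mlflow_model", 1))
    print(stage_tag("Staging"))
```

Under these assumptions, version 1 of the Registered Model `my_mlflow_model` would appear in DataHub as `my_mlflow_model_1`, and a version in the Staging stage would carry the `mlflow_staging` tag.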