metadata-ingestion/docs/sources/mlflow/README.md
MLflow is a machine learning platform. Learn more in the official MLflow documentation.
The DataHub integration for MLflow covers ML entities such as models, features, and related lineage metadata. Depending on module capabilities, it can also support capabilities such as lineage, usage, profiling, ownership, tags, and stateful deletion detection.
| Source Concept | DataHub Concept | Notes |
|---|---|---|
| Registered Model | MlModelGroup | The name of a Model Group is the same as the Registered Model's name (e.g. `my_mlflow_model`). Registered Models serve as containers for multiple versions of the same model in MLflow. |
| Model Version | MlModel | The name of a Model is `{registered_model_name}{model_name_separator}{model_version}` (e.g. `my_mlflow_model_1` for Version 1 of the Registered Model `my_mlflow_model`, `my_mlflow_model_2` for Version 2, etc.). Each Model Version represents a specific iteration of a model with its own artifacts and metadata. |
| Experiment | Container | Each Experiment in MLflow is mapped to a Container in DataHub. Experiments organize related runs and serve as logical groupings for model development iterations, allowing tracking of parameters, metrics, and artifacts. |
| Run | DataProcessInstance | Captures the run's execution details, parameters, metrics, and lineage to a model. |
| Model Stage | Tag | The mapping between Model Stages and generated Tags is the following: |
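
The Model Version naming rule in the table can be illustrated with a minimal Python sketch. The helper `datahub_model_name` is hypothetical and is not part of the DataHub or MLflow APIs. The `"_"` default for the separator is an assumption inferred from the `my_mlflow_model_1` example above.

```python
from typing import Union


def datahub_model_name(
    registered_model_name: str,
    model_version: Union[int, str],
    model_name_separator: str = "_",  # assumed default, inferred from the example name
) -> str:
    """Build a Model name as {registered_model_name}{model_name_separator}{model_version}.

    Hypothetical helper for illustration only; it mirrors the naming pattern
    described in the concept mapping table.
    """
    return f"{registered_model_name}{model_name_separator}{model_version}"


if __name__ == "__main__":
    # Versions 1 and 2 of the Registered Model "my_mlflow_model"
    for version in (1, 2):
        print(datahub_model_name("my_mlflow_model", version))
```

Because the separator is a parameter, the same sketch also covers a different `model_name_separator` value, for example `datahub_model_name("my_mlflow_model", 1, "-")`.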