Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
:::note Version Compatibility
This connector requires MLflow server version 1.28.0 or later.
With an earlier server version, ingestion of Experiments and Runs is skipped.
:::
You can configure the MLflow source to authenticate with the MLflow server using the username and password configuration options.
source:
  type: mlflow
  config:
    tracking_uri: "http://127.0.0.1:5000"
    username: <username>
    password: <password>
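To keep credentials out of the recipe file, the same configuration can be sketched with environment variable references. This assumes DataHub's `${VAR}` substitution in recipes; the variable names `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` are illustrative and must be exported in the shell that runs the ingestion.

```yaml
source:
  type: mlflow
  config:
    tracking_uri: "http://127.0.0.1:5000"
    # Illustrative variable names; any exported environment variable can be referenced.
    username: "${MLFLOW_TRACKING_USERNAME}"
    password: "${MLFLOW_TRACKING_PASSWORD}"
```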
You can map MLflow run datasets to specific DataHub platforms using the source_mapping_to_platform configuration option. Each key is an MLflow dataset source type, and each value is the DataHub platform to associate with datasets from that source.
Example:
source_mapping_to_platform:
  huggingface: snowflake # Maps Hugging Face datasets to the Snowflake platform
  http: s3 # Maps HTTP data sources to the S3 platform
Default behavior: Links to existing datasets by platform and name; does not create new datasets.
To create datasets automatically, enable materialize_dataset_inputs:
materialize_dataset_inputs: true # Creates new datasets if they don't exist
You can configure these options independently:
# Only map to existing datasets
materialize_dataset_inputs: false
source_mapping_to_platform:
  huggingface: snowflake # Maps Hugging Face datasets to the Snowflake platform
  pytorch: snowflake # Maps PyTorch datasets to the Snowflake platform

# Create new datasets and map platforms
materialize_dataset_inputs: true
source_mapping_to_platform:
  huggingface: snowflake
  pytorch: snowflake
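For orientation, the fragments above could sit together in one recipe roughly as follows. This is a minimal sketch, not a recommended setup: the `datahub-rest` sink and its `http://localhost:8080` server address are assumptions about your DataHub deployment, and the platform mappings are the same illustrative values used above.

```yaml
source:
  type: mlflow
  config:
    tracking_uri: "http://127.0.0.1:5000"
    username: <username>
    password: <password>
    # Create datasets that do not exist yet (defaults to linking only).
    materialize_dataset_inputs: true
    # Keys are MLflow dataset source types; values are DataHub platforms.
    source_mapping_to_platform:
      huggingface: snowflake
      http: s3

# Assumed sink: a DataHub instance reachable over REST on localhost.
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

Both options are nested under `config`, alongside the connection settings.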
Module behavior is constrained by source APIs, permissions, and metadata exposed by the platform. Refer to capability notes for unsupported or conditional features.
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.