MLflow automatic logging allows you to log metrics, parameters, and models without the need for explicit log statements. SynapseML supports autologging for every model in the library.
To enable autologging for SynapseML:

1. Install the latest MLflow with `%pip install mlflow`.
2. Upload your customized `log_model_allowlist.txt` file to DBFS, for example by clicking the File/Upload Data button in the Databricks UI.
3. Set the Spark configuration `spark.mlflow.pysparkml.autolog.logModelAllowlistFile` to the path of your `log_model_allowlist.txt` file. The path can be a DBFS FUSE path such as `/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt` or a blob storage path such as `wasb://<containername>@<accountname>.blob.core.windows.net/PATH_TO_YOUR/log_model_allowlist.txt`.
4. Call `mlflow.pyspark.ml.autolog()` before your training code to enable autologging for all supported models.

Note: don't wrap your training code in `with mlflow.start_run()`, as it might cause multiple runs for one single model or one run for multiple models.
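The allowlist file itself is plain text: one fully qualified model class name per line. The entries below are illustrative examples of the format only; check the `log_model_allowlist.txt` file you uploaded for the exact class names your setup supports.

```text
pyspark.ml.classification.LogisticRegressionModel
pyspark.ml.regression.LinearRegressionModel
pyspark.ml.classification.RandomForestClassificationModel
```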
You can customize how autologging works by supplying appropriate parameters.
The logged results appear in the Experiments tab of the MLflow UI.

```python
from pyspark.ml.linalg import Vectors
from synapse.ml.nn import *

df = spark.createDataFrame([
    (Vectors.dense(2.0, 2.0, 2.0), "foo", 1),
    (Vectors.dense(2.0, 2.0, 4.0), "foo", 3),
    (Vectors.dense(2.0, 2.0, 6.0), "foo", 4),
    (Vectors.dense(2.0, 2.0, 8.0), "foo", 3),
    (Vectors.dense(2.0, 2.0, 10.0), "foo", 1),
    (Vectors.dense(2.0, 2.0, 12.0), "foo", 2),
    (Vectors.dense(2.0, 2.0, 14.0), "foo", 0),
    (Vectors.dense(2.0, 2.0, 16.0), "foo", 1),
    (Vectors.dense(2.0, 2.0, 18.0), "foo", 3),
    (Vectors.dense(2.0, 2.0, 20.0), "foo", 0),
    (Vectors.dense(2.0, 4.0, 2.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 4.0), "foo", 4),
    (Vectors.dense(2.0, 4.0, 6.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 8.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 10.0), "foo", 4),
    (Vectors.dense(2.0, 4.0, 12.0), "foo", 3),
    (Vectors.dense(2.0, 4.0, 14.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 16.0), "foo", 1),
    (Vectors.dense(2.0, 4.0, 18.0), "foo", 4),
    (Vectors.dense(2.0, 4.0, 20.0), "foo", 4)
], ["features", "values", "labels"])

cnn = ConditionalKNN().setOutputCol("prediction")
cnnm = cnn.fit(df)

test_df = spark.createDataFrame([
    (Vectors.dense(2.0, 2.0, 2.0), "foo", 1, [0, 1]),
    (Vectors.dense(2.0, 2.0, 4.0), "foo", 4, [0, 1]),
    (Vectors.dense(2.0, 2.0, 6.0), "foo", 2, [0, 1]),
    (Vectors.dense(2.0, 2.0, 8.0), "foo", 4, [0, 1]),
    (Vectors.dense(2.0, 2.0, 10.0), "foo", 4, [0, 1])
], ["features", "values", "labels", "conditioner"])

display(cnnm.transform(test_df))
```
This code should log one run with a ConditionalKNNModel artifact and its parameters.