examples/synapseml/autologging.md
MLflow automatic logging allows you to log metrics, parameters, and models without the need for explicit log statements. SynapseML supports autologging for every model in the library.
Install the SynapseML library by following this guidance.
The default MLflow log_model_allowlist file already includes some SynapseML models. To enable more models, you can call the `mlflow.pyspark.ml.autolog(log_model_allowlist=YOUR_SET_OF_MODELS)` function, or follow the guidance below by pointing the Spark configuration at a custom allowlist file.
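If you only need a few extra models, passing the allowlist directly can be simpler than maintaining a file. A minimal sketch, assuming (as the `YOUR_SET_OF_MODELS` placeholder above suggests) that the argument is a set of fully qualified model class names; both entries shown are illustrative:

```python
import mlflow

mlflow.pyspark.ml.autolog(
    log_model_allowlist={
        "pyspark.ml.classification.LogisticRegressionModel",  # illustrative entry
        "synapse.ml.nn.ConditionalKNNModel",  # illustrative entry
    }
)
```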
To enable autologging with your custom log_model_allowlist file:
1. Upload your `log_model_allowlist.txt` file to a location your cluster can read, for example `wasb://<containername>@<accountname>.blob.core.windows.net/PATH_TO_YOUR/log_model_allowlist.txt` or `/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt`.
2. Set the Spark configuration `spark.mlflow.pysparkml.autolog.logModelAllowlistFile` to the path of your `log_model_allowlist.txt` file.
3. Call `mlflow.pyspark.ml.autolog()` before your training code to enable autologging for all supported models.
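If you manage your own SparkSession, steps 2 and 3 can also be done programmatically. A minimal sketch, assuming the allowlist was uploaded to the placeholder path below; on a managed cluster you would set this key in the cluster's Spark configuration instead:

```python
from pyspark.sql import SparkSession
import mlflow

# Point the autologger at the custom allowlist file (placeholder path).
spark = (
    SparkSession.builder.config(
        "spark.mlflow.pysparkml.autolog.logModelAllowlistFile",
        "/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt",
    ).getOrCreate()
)

# Enable autologging before any training code runs.
mlflow.pyspark.ml.autolog()
```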
Note: if you want to support autologging of PySpark models not present in the log_model_allowlist file, you can add such models to the file.
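The allowlist file itself is plain text. A sketch of what added entries could look like, assuming the same one-fully-qualified-class-name-per-line format as MLflow's default allowlist (both entries are illustrative):

```text
pyspark.ml.classification.LogisticRegressionModel
synapse.ml.nn.ConditionalKNNModel
```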
To configure this on Databricks, for example:

1. Install the latest MLflow via `%pip install mlflow -U`.
2. Upload your customized `log_model_allowlist.txt` file to DBFS by clicking the File/Upload Data button in the Databricks UI.
3. Set the cluster's Spark configuration:

   ```
   spark.mlflow.pysparkml.autolog.logModelAllowlistFile /dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt
   ```

4. Run the following call before your training code executes:
```python
import mlflow

mlflow.pyspark.ml.autolog()
```
You can customize how autologging works by supplying appropriate parameters, as in the sketch below.
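A minimal sketch using standard `mlflow.pyspark.ml.autolog()` parameters:

```python
import mlflow

mlflow.pyspark.ml.autolog(
    log_models=True,  # log fitted models as MLflow artifacts
    log_input_examples=False,  # skip logging sample input rows
    log_model_signatures=True,  # infer and log input/output schemas
    log_post_training_metrics=True,  # log metrics computed after training
)
```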
You can find your experiment's results via the Experiments tab of the MLflow UI.

The following example trains a ConditionalKNN model; with autologging enabled, the fit call is recorded automatically.

```python
from pyspark.ml.linalg import Vectors
from synapse.ml.nn import ConditionalKNN
df = spark.createDataFrame(
    [
        (Vectors.dense(2.0, 2.0, 2.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 4.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 6.0), "foo", 4),
        (Vectors.dense(2.0, 2.0, 8.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 10.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 12.0), "foo", 2),
        (Vectors.dense(2.0, 2.0, 14.0), "foo", 0),
        (Vectors.dense(2.0, 2.0, 16.0), "foo", 1),
        (Vectors.dense(2.0, 2.0, 18.0), "foo", 3),
        (Vectors.dense(2.0, 2.0, 20.0), "foo", 0),
        (Vectors.dense(2.0, 4.0, 2.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 4.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 6.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 8.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 10.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 12.0), "foo", 3),
        (Vectors.dense(2.0, 4.0, 14.0), "foo", 2),
        (Vectors.dense(2.0, 4.0, 16.0), "foo", 1),
        (Vectors.dense(2.0, 4.0, 18.0), "foo", 4),
        (Vectors.dense(2.0, 4.0, 20.0), "foo", 4),
    ],
    ["features", "values", "labels"],
)

# Fitting the model triggers an autologged MLflow run with its parameters
cnn = ConditionalKNN().setOutputCol("prediction")
cnnm = cnn.fit(df)
test_df = spark.createDataFrame(
    [
        (Vectors.dense(2.0, 2.0, 2.0), "foo", 1, [0, 1]),
        (Vectors.dense(2.0, 2.0, 4.0), "foo", 4, [0, 1]),
        (Vectors.dense(2.0, 2.0, 6.0), "foo", 2, [0, 1]),
        (Vectors.dense(2.0, 2.0, 8.0), "foo", 4, [0, 1]),
        (Vectors.dense(2.0, 2.0, 10.0), "foo", 4, [0, 1]),
    ],
    ["features", "values", "labels", "conditioner"],
)
display(cnnm.transform(test_df))
```
This code should log one run with a ConditionalKNNModel artifact and its parameters.
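To inspect what was captured, one option is `mlflow.last_active_run()`, which returns the run that autologging created for the `fit()` call above. A minimal sketch:

```python
import mlflow

run = mlflow.last_active_run()
print(run.info.run_id)  # ID of the autologged run
print(run.data.params)  # ConditionalKNN parameters captured automatically
```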