Model Interpretation on Spark

Interpretable Machine Learning

Interpretable Machine Learning helps developers, data scientists and business stakeholders in an organization gain a comprehensive understanding of their machine learning models. It can also be used to debug models, explain predictions, and support audits for regulatory compliance.

Why run model interpretation on Spark

Model-agnostic interpretation methods can be computationally expensive because computing each explanation requires many model evaluations. Model interpretation on Spark lets users interpret a black-box model at massive scale with the Apache Spark™ distributed computing ecosystem. Various components support local interpretation for tabular, vector, image and text classification models, with two popular model-agnostic interpretation methods: LIME and Kernel SHAP.

Usage

Both LIME and Kernel SHAP are local interpretation methods. A local interpretation explains why the model predicts a certain outcome for a given observation.

Both explainers extend org.apache.spark.ml.Transformer. After setting the explainer parameters, call the transform function on a DataFrame of observations to interpret the model's behavior on those observations.
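
As a rough sketch of that workflow in Python, the snippet below configures a TabularSHAP explainer and calls transform on a DataFrame of observations. The import path `synapse.ml.explainers`, the `explain` helper, the column names and the sample count are assumptions made for illustration and do not come from this page; the param names are the ones listed in the tables in this document.

```python
# Explainer params; names follow the tables in this document, values are illustrative.
SHAP_PARAMS = {
    "inputCols": ["age", "income"],  # hypothetical feature columns of the model
    "outputCol": "shapValues",       # hypothetical name for the result column
    "targetCol": "probability",      # explain the class probabilities
    "targetClasses": [1],            # explain class index 1 only
    "numSamples": 5000,              # illustrative override of the default
}


def explain(model, background_df, observations_df):
    """Configure a TabularSHAP explainer and run it on the observations.

    `model` is a fitted Spark ML Transformer; both DataFrames must contain
    the columns named in SHAP_PARAMS["inputCols"].
    """
    # Assumption: the explainers are exposed to Python under this package.
    # Imported lazily so the module loads without a Spark installation.
    from synapse.ml.explainers import TabularSHAP

    explainer = TabularSHAP(
        model=model,
        backgroundData=background_df,
        **SHAP_PARAMS,
    )
    # An explainer is itself a Transformer, so transform() returns the input
    # rows with the explanation column appended.
    return explainer.transform(observations_df)
```

Because the explainer is a Transformer, the same pattern should apply to the other explainers by swapping the class and its model-specific params.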

To see examples of model interpretability on Spark in action, take a look at these sample notebooks:

	Tabular models	Vector models	Image models	Text models
LIME explainers	TabularLIME	VectorLIME	ImageLIME	TextLIME
Kernel SHAP explainers	TabularSHAP	VectorSHAP	ImageSHAP	TextSHAP

Common local explainer params

All local explainers support the following params:

Param	Type	Default	Description
targetCol	`String`	"probability"	The column name of the prediction target to explain (i.e. the response variable). This is usually set to "prediction" for regression models and "probability" for probabilistic classification models.
targetClasses	`Array[Int]`	empty array	The indices of the classes to explain, for multinomial classification models.
targetClassesCol	`String`		The name of the column that holds the indices of the classes to explain, for multinomial classification models.
outputCol	`String`		The name of the output column for interpretation results.
model	`Transformer`		The model to be explained.
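
The interplay of targetCol and targetClasses can be sketched in plain Python. This is an assumed reading of the param descriptions above, not the library's code: for a vector-valued target such as "probability", only the entries at the requested class indices are explained, while a scalar target such as "prediction" is explained as is. The `select_targets` helper is hypothetical.

```python
from typing import List, Sequence, Union


def select_targets(
    target_value: Union[float, Sequence[float]],
    target_classes: Sequence[int],
) -> List[float]:
    """Return the values to explain from one row's target column."""
    if isinstance(target_value, (int, float)):
        # Scalar target, e.g. "prediction" from a regression model.
        return [float(target_value)]
    # Vector target, e.g. "probability": keep only the requested class indices.
    return [float(target_value[i]) for i in target_classes]


probability = [0.1, 0.7, 0.2]
print(select_targets(probability, [1, 2]))  # [0.7, 0.2]
print(select_targets(3.5, []))              # [3.5]
```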

Common LIME explainer params

All LIME based explainers (TabularLIME, VectorLIME, ImageLIME, TextLIME) support the following params:

Param	Type	Default	Description
regularization	`Double`	0	Regularization param for the underlying lasso regression.
kernelWidth	`Double`	sqrt(number of features) * 0.75	Kernel width for the exponential kernel.
numSamples	`Int`	1000	Number of samples to generate.
metricsCol	`String`	"r2"	Column name for fitting metrics.
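
To make the kernelWidth default concrete, here is a minimal sketch that computes it and applies an exponential kernel to a few distances. The kernel form `sqrt(exp(-d^2 / width^2))` is the one used by the reference LIME implementation and is assumed here; this page does not spell it out. The function names are hypothetical.

```python
import math


def default_kernel_width(num_features: int) -> float:
    """Default from the table above: sqrt(number of features) * 0.75."""
    return math.sqrt(num_features) * 0.75


def exponential_kernel(distance: float, kernel_width: float) -> float:
    """Weight of a perturbed sample at the given distance from the observation.

    Assumed form: sqrt(exp(-distance^2 / kernel_width^2)).
    """
    return math.sqrt(math.exp(-(distance ** 2) / (kernel_width ** 2)))


width = default_kernel_width(16)  # sqrt(16) * 0.75 = 3.0
for d in (0.0, 1.0, 3.0, 6.0):
    # Samples closer to the observation receive weights closer to 1.
    print(d, exponential_kernel(d, width))
```

A larger kernelWidth flattens the weights, so distant samples influence the local surrogate model more; a smaller one concentrates the fit near the observation.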

Common SHAP explainer params

All Kernel SHAP based explainers (TabularSHAP, VectorSHAP, ImageSHAP, TextSHAP) support the following params:

Param	Type	Default	Description
infWeight	`Double`	1E8	The double value to represent infinite weight.
numSamples	`Int`	2 * (number of features) + 2048	Number of samples to generate.
metricsCol	`String`	"r2"	Column name for fitting metrics.
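
The role of infWeight can be illustrated with the Shapley kernel from the Kernel SHAP method, which gives the empty and the full coalition infinite weight. The sketch below substitutes a large finite number for those two cases and also computes the default numSamples from the table above. It is an illustration under that assumption, not the library's implementation, and the function names are hypothetical.

```python
from math import comb

INF_WEIGHT = 1e8  # default infWeight from the table above


def default_num_samples(num_features: int) -> int:
    """Default from the table above: 2 * (number of features) + 2048."""
    return 2 * num_features + 2048


def shapley_kernel_weight(
    num_features: int, coalition_size: int, inf_weight: float = INF_WEIGHT
) -> float:
    """Shapley kernel weight of a coalition with `coalition_size` features present."""
    m, k = num_features, coalition_size
    if k == 0 or k == m:
        # The exact kernel is infinite here; use a large finite stand-in.
        return inf_weight
    return (m - 1) / (comb(m, k) * k * (m - k))


print(default_num_samples(10))  # 2068
print([shapley_kernel_weight(4, k) for k in range(5)])
# [100000000.0, 0.25, 0.125, 0.25, 100000000.0]
```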

Tabular model explainer params

All tabular model explainers (TabularLIME, TabularSHAP) support the following params:

Param	Type	Default	Description
inputCols	`Array[String]`		The names of input columns to the black-box model.
backgroundData	`DataFrame`		A dataframe containing background data. It must contain all the input columns needed by the black-box model.

Vector model explainer params

All vector model explainers (VectorLIME, VectorSHAP) support the following params:

Param	Type	Default	Description
inputCol	`String`		The name of the input vector column to the black-box model.
backgroundData	`DataFrame`		A dataframe containing background data. It must contain the input vector column needed by the black-box model.

Image model explainer params

All image model explainers (ImageLIME, ImageSHAP) support the following params:

Param	Type	Default	Description
inputCol	`String`		The name of the input image column to the black-box model.
cellSize	`Double`	16	Number that controls the size of the super-pixels.
modifier	`Double`	130	Controls the trade-off between spatial and color distance of super-pixels.
superpixelCol	`String`	"superpixels"	The column holding the super-pixel decompositions.

Text model explainer params

All text model explainers (TextLIME, TextSHAP) support the following params:

Param	Type	Default	Description
inputCol	`String`		The name of the input text column to the black-box model.
tokensCol	`String`	"tokens"	The column holding the text tokens.