scientific-skills/shap/references/theory.md
This document explains the theoretical foundations of SHAP (SHapley Additive exPlanations), including Shapley values from game theory, the principles that make SHAP unique, and connections to other explanation methods.
SHAP is grounded in Shapley values, a solution concept from cooperative game theory introduced by Lloyd Shapley in 1953.
Core Concept: In cooperative game theory, players collaborate to achieve a total payoff, and the question is: how should this payoff be fairly distributed among players?
Mapping to Machine Learning: the "game" is the prediction task, the "players" are the features, and the "payoff" is the model's prediction relative to a baseline. A feature's Shapley value is its fair share of that payoff.
For a feature $i$, its Shapley value $\phi_i$ is:
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
Where:
- $F$ is the set of all features
- $S$ is a coalition (subset of features) that excludes feature $i$
- $f(S)$ is the model output when only the features in $S$ are present
Interpretation: The Shapley value averages the marginal contribution of feature $i$ across all possible feature coalitions (subsets). The contribution is weighted by how likely each coalition is to occur.
1. Efficiency (Additivity): $$\sum_{i=1}^{n} \phi_i = f(x) - f(\emptyset)$$
The sum of all SHAP values equals the difference between the model's prediction for the instance and the expected value (baseline).
This is why SHAP waterfall plots always sum to the total prediction change.
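This property is easy to check numerically; a minimal sketch, assuming a fitted tree-based regressor `model` and a feature matrix `X`:

```python
import numpy as np
import shap

# Assumes `model` is a fitted tree-based regressor and `X` a feature matrix.
explainer = shap.TreeExplainer(model)
explanation = explainer(X)

# Efficiency: SHAP values plus the baseline reconstruct each prediction.
reconstructed = explanation.values.sum(axis=1) + explanation.base_values
assert np.allclose(reconstructed, model.predict(X))
```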
2. Symmetry: If two features $i$ and $j$ contribute equally to all coalitions, then $\phi_i = \phi_j$.
Features with identical effects receive identical attribution.
3. Dummy: If a feature $i$ doesn't change the model output for any coalition, then $\phi_i = 0$.
Irrelevant features receive zero attribution.
4. Monotonicity: If a feature's marginal contribution increases across coalitions, its Shapley value increases.
Computing exact Shapley values requires evaluating the model on all $2^{|F|}$ possible feature coalitions.
This exponential complexity makes exact computation intractable for most real-world models.
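To make the formula and its cost concrete, here is a brute-force sketch (the helper `exact_shapley` is illustrative, not part of the shap API) that fills absent features from a single background row as a crude interventional $f(S)$:

```python
import itertools
import math

import numpy as np

def exact_shapley(predict, x, background):
    """Brute-force Shapley values: O(2^n) model evaluations per feature.

    Absent features are filled from a single background row (a crude
    interventional f(S)); `predict` maps a 2-D array to a 1-D score array.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes |S| = 0 .. n-1
            for S in itertools.combinations(others, size):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = (math.factorial(size)
                          * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i, without_i = background.copy(), background.copy()
                idx = list(S)
                without_i[idx] = x[idx]
                with_i[idx + [i]] = x[idx + [i]]
                phi[i] += weight * (predict(with_i[None, :])[0]
                                    - predict(without_i[None, :])[0])
    return phi

# Example: a linear model f(x) = 2*x0 + 3*x1 - x2 with a zero background.
predict = lambda X: X @ np.array([2.0, 3.0, -1.0])
phi = exact_shapley(predict, np.array([1.0, 1.0, 1.0]), np.zeros(3))
# phi == [2., 3., -1.], matching w_i * (x_i - background_i) for a linear model.
```

With $n$ features this performs $O(2^n)$ model evaluations, which is exactly why the specialized explainers described below exist.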
SHAP connects Shapley values to additive feature attribution methods, enabling efficient computation.
Additive Feature Attribution Model: $$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$
Where:
- $g$ is the explanation model
- $z' \in \{0,1\}^M$ indicates which simplified features are present
- $M$ is the number of simplified features
- $\phi_0$ is the baseline (expected model output) and $\phi_i$ is feature $i$'s attribution
SHAP proves that Shapley values are the only attribution values satisfying three desirable properties: local accuracy, missingness, and consistency.
1. Local Accuracy: The explanation matches the model's output: $$f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i$$
Interpretation: SHAP values exactly account for the model's prediction. This enables waterfall plots to precisely decompose predictions.
2. Missingness: If a feature is missing (not observed), its attribution is zero: $$x'_i = 0 \Rightarrow \phi_i = 0$$
Interpretation: Only features that are present contribute to explanations.
3. Consistency: If a model changes so that a feature's marginal contribution increases (or stays the same) for all inputs, that feature's attribution does not decrease.
Interpretation: If a feature becomes more important to the model, its SHAP value reflects this. This enables meaningful model comparisons.
SHAP unifies several existing explanation methods by showing they're special cases of Shapley values under specific assumptions.
LIME's Approach: Fit a local linear model around a prediction using perturbed samples.
Connection to SHAP: LIME approximates Shapley values, but with suboptimal sample weighting. SHAP uses the theoretically optimal weights derived from the Shapley value formula (the Shapley kernel).
Key Difference: LIME's loss function and sampling don't guarantee consistency or exact additivity; SHAP does.
DeepLIFT's Approach: Backpropagate contributions through neural networks by comparing to reference activations.
Connection to SHAP: DeepExplainer builds on DeepLIFT but averages over multiple reference samples to approximate conditional expectations, yielding approximate Shapley values.
LRP's Approach: Decompose neural network predictions by propagating relevance scores backward through layers.
Connection to SHAP: LRP is a special case of SHAP with specific propagation rules. SHAP generalizes these rules with Shapley value theory.
Integrated Gradients' Approach: Integrate gradients along a path from a baseline to the input.
Connection to SHAP: When using a single reference point, Integrated Gradients approximates SHAP values for smooth models.
Different SHAP explainers use specialized algorithms to compute Shapley values efficiently for specific model types.
Innovation: Exploits tree structure to compute exact Shapley values in polynomial time instead of exponential.
Algorithm: recursively traverse each tree, tracking for every decision path the fraction of feature coalitions that would follow it, and combine leaf values weighted by these fractions; this yields each feature's exact contribution in a single pass per tree.
Complexity: $O(TLD^2)$ where $T$ = number of trees, $L$ = max leaves, $D$ = max depth
Key Advantage: Exact Shapley values computed efficiently for tree-based models (XGBoost, LightGBM, Random Forest, etc.)
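A minimal usage sketch, assuming a feature matrix `X` and target `y`:

```python
import shap
import xgboost

# Any supported tree model works (XGBoost, LightGBM, Random Forest, ...).
model = xgboost.XGBRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)    # exact Tree SHAP, polynomial time
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)
```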
Innovation: Uses weighted linear regression to estimate Shapley values for any model.
Algorithm:
1. Sample coalitions $z' \in \{0,1\}^M$
2. Map each coalition to a model input, filling absent features from background data
3. Evaluate the model on these inputs
4. Fit a weighted linear regression using the Shapley kernel weights
5. Read off the regression coefficients as the estimated SHAP values
Complexity: $O(n \cdot 2^M)$ for exact computation; in practice a much smaller sample of coalitions is used.
Key Advantage: Model-agnostic; works with any prediction function
Trade-off: Slower than specialized explainers; approximate rather than exact
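The weights in step 4 come from the Shapley kernel of Lundberg & Lee (2017); a small sketch of the weight for a coalition of size $s$ out of $M$ features (the helper name is illustrative):

```python
from math import comb

def shapley_kernel_weight(M: int, s: int) -> float:
    """Regression weight for a coalition of size s out of M features."""
    if s == 0 or s == M:
        # Empty and full coalitions get infinite weight; Kernel SHAP
        # enforces them as exact constraints rather than weighted samples.
        return float("inf")
    return (M - 1) / (comb(M, s) * s * (M - s))
```

Weights are largest for very small and very large coalitions, which carry the most information about individual features' marginal contributions.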
Innovation: Combines DeepLIFT with Shapley value sampling.
Algorithm:
1. Select a set of background (reference) samples
2. For each reference, backpropagate DeepLIFT-style contributions from reference to input
3. Average the contributions across references to approximate Shapley values
Complexity: $O(n \cdot m)$ where $m$ = number of reference samples
Key Advantage: Efficiently approximates Shapley values for deep neural networks
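A usage sketch, assuming a trained Keras/TensorFlow model `model` and arrays `X_train`, `X_test`:

```python
import shap

# More references improve the approximation at linear cost (the m in O(n*m)).
background = X_train[:100]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_test[:10])
```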
Innovation: Closed-form Shapley values for linear models.
Algorithm: compute $\phi_i = w_i (x_i - E[x_i])$ directly from the model coefficients and the feature means (assuming feature independence).
Complexity: $O(n)$ - nearly instantaneous
Key Advantage: Exact Shapley values with minimal computation
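A sketch comparing the closed form against `shap.LinearExplainer`, assuming numeric arrays `X`, `y`; the two should agree under the default independence assumption:

```python
import numpy as np
import shap
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)

# Closed form under feature independence: phi_i = w_i * (x_i - mean_i)
phi_manual = model.coef_ * (X - X.mean(axis=0))

explainer = shap.LinearExplainer(model, X)
phi_shap = explainer.shap_values(X)
assert np.allclose(phi_manual, phi_shap)
```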
Computing $f(S)$ (model output given only features in $S$) requires handling missing features.
Question: How should we represent "missing" features when the model requires all features as input?
1. Interventional (Marginal) Approach: replace missing features with values drawn from a background dataset, ignoring feature correlations.
2. Observational (Conditional) Approach: sample missing features from their conditional distribution given the observed features, respecting correlations.
Trade-offs: the interventional approach is "true to the model" but may evaluate it on unrealistic inputs; the observational approach is "true to the data" but can spread credit onto correlated features the model never uses.
TreeExplainer supports both via the `feature_perturbation` parameter ("interventional" or "tree_path_dependent"), as sketched below.
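A sketch of the two modes, assuming a fitted tree model `model` and a background matrix `X_background`:

```python
import shap

# Interventional: breaks correlations, needs background data ("true to the model").
explainer_int = shap.TreeExplainer(
    model, data=X_background, feature_perturbation="interventional"
)

# Observational: conditions on tree paths, no background needed ("true to the data").
explainer_obs = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")
```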
The baseline $\phi_0 = E[f(x)]$ represents the model's average prediction.
For TreeExplainer: the expected value comes from the training data stored in the trees (path-dependent mode) or from the supplied background dataset (interventional mode).
For DeepExplainer / KernelExplainer: the expected value is the mean model prediction over the supplied background samples.
SHAP values have the same units as the model output: log-odds for a margin-output classifier such as XGBoost, probabilities when explaining `predict_proba`, and raw target units for regression.
Magnitude: Higher absolute SHAP value = stronger feature impact
Sign: positive values push the prediction above the baseline; negative values push it below.
For a prediction $f(x)$: $$f(x) = E[f(X)] + \sum_{i=1}^{n} \phi_i(x)$$
Example (illustrative numbers): with baseline $E[f(X)] = 0.40$ and contributions $\phi_{\text{age}} = +0.15$, $\phi_{\text{income}} = -0.05$, $\phi_{\text{other}} = +0.10$, the prediction is $f(x) = 0.60$.
Local (Instance-level): SHAP values explain a single prediction; each $\phi_i(x)$ is feature $i$'s contribution for that specific instance.
Global (Dataset-level): averaging $|\phi_i|$ across many instances yields a global importance ranking, as in SHAP summary and bar plots.
Key Insight: Global importance is the aggregation of local importances, maintaining consistency between instance and dataset explanations.
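A sketch of that aggregation, assuming `shap_values` with shape `(n_samples, n_features)`:

```python
import numpy as np

# Global importance = mean absolute local attribution per feature.
global_importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(global_importance)[::-1]   # most important feature first
```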
Permutation Importance: measures the drop in model performance when a feature is shuffled; it is global only, unsigned, and can be distorted by correlated features.
SHAP: attributes the prediction itself rather than performance, carries a sign, and provides per-instance detail that aggregates into a global view.
Feature Coefficients ($w_i$): describe the model's global slope per unit of a feature and ignore where a particular instance sits.
SHAP for Linear Models: $\phi_i = w_i (x_i - E[x_i])$ accounts for both the coefficient and how far the instance's value lies from the feature's average.
Gini/Split Importance: computed from impurity reductions during training; biased toward high-cardinality features and not consistent across models.
SHAP (Tree SHAP): computed from the model's behavior on data, satisfies consistency, and is comparable across models.
Standard SHAP captures main effects. SHAP interaction values capture pairwise interactions.
Formula for Interaction: $$\phi_{i,j} = \sum_{S \subseteq F \setminus \{i,j\}} \frac{|S|!\,(|F|-|S|-2)!}{2\,(|F|-1)!} \Delta_{ij}(S)$$
Where $\Delta_{ij}(S)$ is the interaction effect of features $i$ and $j$ given coalition $S$.
Interpretation: for $i \neq j$, $\phi_{i,j}$ is the part of the attribution due to the interaction between features $i$ and $j$ (split equally between the pair); the diagonal term $\phi_{i,i}$ is feature $i$'s main effect.
Property: $$\phi_i = \phi_{i,i} + \sum_{j \neq i} \phi_{i,j}$$
A feature's total SHAP value equals its main effect plus its share of every pairwise interaction it participates in (each interaction is split equally, so $\phi_{i,j}$ already carries feature $i$'s half).
TreeExplainer supports exact interaction computation:
```python
import shap

explainer = shap.TreeExplainer(model)  # assumes a fitted tree-based model
shap_interaction_values = explainer.shap_interaction_values(X)  # (n_samples, n_features, n_features)
```
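The additivity property above can be checked numerically: summing the interaction matrix over one feature axis should recover the ordinary SHAP values.

```python
import numpy as np

shap_values = explainer.shap_values(X)
# Each feature's row of interactions sums back to its total SHAP value.
assert np.allclose(shap_values, shap_interaction_values.sum(axis=-1), atol=1e-6)
```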
Limitation: interaction values are exponentially costly for other explainers; exact computation is only practical for tree models.
Exact Computation: $O(2^n)$ - intractable for large $n$
Specialized Algorithms:
- Tree SHAP: $O(TLD^2)$, exact
- Linear SHAP: $O(n)$, exact
- Kernel SHAP: sampling-based approximation
- Deep SHAP: approximation averaged over reference samples
Implication: For non-tree models with many features, explanations may be approximate.
Assumption: Kernel SHAP and the basic interventional implementations assume features can be manipulated independently of one another.
Challenge: Real features are often correlated (e.g., height and weight)
Solutions:
- Use the observational/conditional approach (e.g., TreeExplainer's tree_path_dependent mode)
- Group strongly correlated features and attribute to the group as a whole
- Choose background data that preserves realistic feature combinations
Issue: Creating coalitions by replacing features may create unrealistic samples (outside the training distribution)
Example: Setting "Age=5" and "Has PhD=Yes" simultaneously, a combination that never occurs in real data
Implication: SHAP values reflect model behavior on potentially unrealistic inputs
Mitigation: Use observational approach or carefully selected background data
SHAP measures association, not causation
SHAP answers: "How does the model's prediction change with this feature?" SHAP does NOT answer: "What would happen if we changed this feature in reality?"
Example (illustrative): a model predicting illness may assign a large SHAP value to "number of hospital visits"; the feature is highly predictive, but increasing visits would not cause illness.
Implication: Combine SHAP with domain knowledge before drawing causal conclusions; SHAP alone doesn't establish causation.
SHAP is the unique additive attribution method satisfying local accuracy, missingness, and consistency.
Proof: Lundberg & Lee (2017) showed Shapley values are the only solution satisfying these axioms.
SHAP values correspond to first-order terms in functional ANOVA decomposition: $$f(x) = f_0 + \sum_i f_i(x_i) + \sum_{i,j} f_{ij}(x_i, x_j) + ...$$
Where $f_i(x_i)$ captures main effect of feature $i$, and $\phi_i \approx f_i(x_i)$.
SHAP generalizes sensitivity analysis: where sensitivity analysis perturbs one feature at a time around the input, SHAP averages each feature's effect over all coalitions, capturing interactions that one-at-a-time perturbation misses.
Gradient-based methods (GradientExplainer, Integrated Gradients) approximate SHAP using derivatives.
Foundational Papers:
- Shapley, L. S. (1953). "A Value for n-Person Games." In Contributions to the Theory of Games II.
- Lundberg, S. M., & Lee, S.-I. (2017). "A Unified Approach to Interpreting Model Predictions." NeurIPS.
- Lundberg, S. M., et al. (2020). "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence.
Key Concepts: Shapley values, additive feature attribution, local accuracy / missingness / consistency, interventional vs. observational expectations, interaction values.
This theoretical foundation explains why SHAP is a principled, versatile, and powerful tool for model interpretation.