scientific-skills/shap/references/theory.md
This document explains the theoretical foundations of SHAP (SHapley Additive exPlanations), including Shapley values from game theory, the principles that make SHAP unique, and connections to other explanation methods.
SHAP is grounded in Shapley values, a solution concept from cooperative game theory introduced by Lloyd Shapley in 1953.
Core Concept: In cooperative game theory, players collaborate to achieve a total payoff, and the question is: how should this payoff be fairly distributed among players?
Mapping to Machine Learning: the "game" is the prediction task, the "players" are the features, and the "payoff" is the model's prediction relative to a baseline. A feature's Shapley value is its fair share of that payoff.
For a feature $i$, its Shapley value $\phi_i$ is:
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
Where:
- $F$ is the set of all features
- $S$ is a coalition (subset of features) that excludes feature $i$
- $f(S)$ is the model output when only the features in $S$ are present
Interpretation: The Shapley value averages the marginal contribution of feature $i$ across all possible feature coalitions (subsets). The contribution is weighted by how likely each coalition is to occur.
1. Efficiency (Additivity): $$\sum_{i=1}^{n} \phi_i = f(x) - f(\emptyset)$$
The sum of all SHAP values equals the difference between the model's prediction for the instance and the expected value (baseline).
This is why SHAP waterfall plots always sum to the total prediction change.
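This property is easy to check numerically; a minimal sketch, assuming a fitted tree-based regressor `model` and a feature matrix `X`:

```python
import numpy as np
import shap

# Assumes `model` is a fitted tree-based regressor and `X` a feature matrix.
explainer = shap.TreeExplainer(model)
explanation = explainer(X)

# Efficiency: SHAP values plus the baseline reconstruct each prediction.
reconstructed = explanation.values.sum(axis=1) + explanation.base_values
assert np.allclose(reconstructed, model.predict(X))
```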
2. Symmetry: If two features $i$ and $j$ contribute equally to all coalitions, then $\phi_i = \phi_j$.
Features with identical effects receive identical attribution.
3. Dummy: If a feature $i$ doesn't change the model output for any coalition, then $\phi_i = 0$.
Irrelevant features receive zero attribution.
4. Monotonicity: If a feature's marginal contribution increases across coalitions, its Shapley value increases.
Computing exact Shapley values requires evaluating the model on all $2^{|F|}$ possible feature coalitions.
This exponential complexity makes exact computation intractable for most real-world models.
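To make the formula and its cost concrete, here is a brute-force sketch (the helper `exact_shapley` is illustrative, not part of the shap API) that fills absent features from a single background row as a crude interventional $f(S)$:

```python
import itertools
import math

import numpy as np

def exact_shapley(predict, x, background):
    """Brute-force Shapley values: O(2^n) model evaluations per feature.

    Absent features are filled from a single background row (a crude
    interventional f(S)); `predict` maps a 2-D array to a 1-D score array.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes |S| = 0 .. n-1
            for S in itertools.combinations(others, size):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = (math.factorial(size)
                          * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i, without_i = background.copy(), background.copy()
                idx = list(S)
                without_i[idx] = x[idx]
                with_i[idx + [i]] = x[idx + [i]]
                phi[i] += weight * (predict(with_i[None, :])[0]
                                    - predict(without_i[None, :])[0])
    return phi

# Example: a linear model f(x) = 2*x0 + 3*x1 - x2 with a zero background.
predict = lambda X: X @ np.array([2.0, 3.0, -1.0])
phi = exact_shapley(predict, np.array([1.0, 1.0, 1.0]), np.zeros(3))
# phi == [2., 3., -1.], matching w_i * (x_i - background_i) for a linear model.
```

With $n$ features this performs $O(2^n)$ model evaluations, which is exactly why the specialized explainers described below exist.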
SHAP connects Shapley values to additive feature attribution methods, enabling efficient computation.
Additive Feature Attribution Model: $$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$
Where:
- $g$ is the explanation model
- $z' \in \{0,1\}^M$ indicates which simplified features are present
- $M$ is the number of simplified features
- $\phi_0$ is the baseline (expected model output) and $\phi_i$ is feature $i$'s attribution
SHAP proves that Shapley values are the only attribution values satisfying three desirable properties: local accuracy, missingness, and consistency.
1. Local Accuracy: The explanation matches the model's output: $$f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i$$
Interpretation: SHAP values exactly account for the model's prediction. This enables waterfall plots to precisely decompose predictions.
2. Missingness: If a feature is missing (not observed), its attribution is zero: $$x'_i = 0 \Rightarrow \phi_i = 0$$
Interpretation: Only features that are present contribute to explanations.
3. Consistency: If a model changes so that a feature's marginal contribution increases (or stays the same) for all inputs, that feature's attribution does not decrease.
Interpretation: If a feature becomes more important to the model, its SHAP value reflects this. This enables meaningful model comparisons.
SHAP unifies several existing explanation methods by showing they're special cases of Shapley values under specific assumptions.
LIME's Approach: Fit a local linear model around a prediction using perturbed samples.
Connection to SHAP: LIME approximates Shapley values, but with suboptimal sample weighting. SHAP uses the theoretically optimal weights derived from the Shapley value formula (the Shapley kernel).
Key Difference: LIME's loss function and sampling don't guarantee consistency or exact additivity; SHAP does.
DeepLIFT's Approach: Backpropagate contributions through neural networks by comparing to reference activations.
Connection to SHAP: DeepExplainer builds on DeepLIFT but averages over multiple reference samples to approximate conditional expectations, yielding approximate Shapley values.
LRP's Approach: Decompose neural network predictions by propagating relevance scores backward through layers.
Connection to SHAP: LRP is a special case of SHAP with specific propagation rules. SHAP generalizes these rules with Shapley value theory.
Integrated Gradients' Approach: Integrate gradients along a path from a baseline to the input.
Connection to SHAP: When using a single reference point, Integrated Gradients approximates SHAP values for smooth models.
Different SHAP explainers use specialized algorithms to compute Shapley values efficiently for specific model types.
Innovation: Exploits tree structure to compute exact Shapley values in polynomial time instead of exponential.
Algorithm: recursively traverse each tree, tracking for every decision path the fraction of feature coalitions that would follow it, and combine leaf values weighted by these fractions; this yields each feature's exact contribution in a single pass per tree.
Complexity: $O(TLD^2)$ where $T$ = number of trees, $L$ = max leaves, $D$ = max depth
Key Advantage: Exact Shapley values computed efficiently for tree-based models (XGBoost, LightGBM, Random Forest, etc.)
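A minimal usage sketch, assuming a feature matrix `X` and target `y`:

```python
import shap
import xgboost

# Any supported tree model works (XGBoost, LightGBM, Random Forest, ...).
model = xgboost.XGBRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)    # exact Tree SHAP, polynomial time
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)
```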
Innovation: Uses weighted linear regression to estimate Shapley values for any model.
Algorithm:
1. Sample coalitions $z' \in \{0,1\}^M$
2. Map each coalition to a model input, filling absent features from background data
3. Evaluate the model on these inputs
4. Fit a weighted linear regression using the Shapley kernel weights
5. Read off the regression coefficients as the estimated SHAP values
Complexity: $O(n \cdot 2^M)$ for exact computation; in practice a much smaller sample of coalitions is used.
Key Advantage: Model-agnostic; works with any prediction function
Trade-off: Slower than specialized explainers; approximate rather than exact
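The weights in step 4 come from the Shapley kernel of Lundberg & Lee (2017); a small sketch of the weight for a coalition of size $s$ out of $M$ features (the helper name is illustrative):

```python
from math import comb

def shapley_kernel_weight(M: int, s: int) -> float:
    """Regression weight for a coalition of size s out of M features."""
    if s == 0 or s == M:
        # Empty and full coalitions get infinite weight; Kernel SHAP
        # enforces them as exact constraints rather than weighted samples.
        return float("inf")
    return (M - 1) / (comb(M, s) * s * (M - s))
```

Weights are largest for very small and very large coalitions, which carry the most information about individual features' marginal contributions.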
Innovation: Combines DeepLIFT with Shapley value sampling.
Algorithm:
1. Select a set of background (reference) samples
2. For each reference, backpropagate DeepLIFT-style contributions from reference to input
3. Average the contributions across references to approximate Shapley values
Complexity: $O(n \cdot m)$ where $m$ = number of reference samples
Key Advantage: Efficiently approximates Shapley values for deep neural networks
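A usage sketch, assuming a trained Keras/TensorFlow model `model` and arrays `X_train`, `X_test`:

```python
import shap

# More references improve the approximation at linear cost (the m in O(n*m)).
background = X_train[:100]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_test[:10])
```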
Innovation: Closed-form Shapley values for linear models.
Algorithm: compute $\phi_i = w_i (x_i - E[x_i])$ directly from the model coefficients and the feature means (assuming feature independence).
Complexity: $O(n)$ - nearly instantaneous
Key Advantage: Exact Shapley values with minimal computation
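A sketch comparing the closed form against `shap.LinearExplainer`, assuming numeric arrays `X`, `y`; the two should agree under the default independence assumption:

```python
import numpy as np
import shap
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)

# Closed form under feature independence: phi_i = w_i * (x_i - mean_i)
phi_manual = model.coef_ * (X - X.mean(axis=0))

explainer = shap.LinearExplainer(model, X)
phi_shap = explainer.shap_values(X)
assert np.allclose(phi_manual, phi_shap)
```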
Computing $f(S)$ (model output given only features in $S$) requires handling missing features.
Question: How should we represent "missing" features when the model requires all features as input?
1. Interventional (Marginal) Approach: replace missing features with values drawn from a background dataset, ignoring feature correlations.
2. Observational (Conditional) Approach: sample missing features from their conditional distribution given the observed features, respecting correlations.
Trade-offs: the interventional approach is "true to the model" but may evaluate it on unrealistic inputs; the observational approach is "true to the data" but can spread credit onto correlated features the model never uses.
TreeExplainer supports both via the `feature_perturbation` parameter ("interventional" or "tree_path_dependent"), as sketched below.
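A sketch of the two modes, assuming a fitted tree model `model` and a background matrix `X_background`:

```python
import shap

# Interventional: breaks correlations, needs background data ("true to the model").
explainer_int = shap.TreeExplainer(
    model, data=X_background, feature_perturbation="interventional"
)

# Observational: conditions on tree paths, no background needed ("true to the data").
explainer_obs = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")
```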
The baseline $\phi_0 = E[f(x)]$ represents the model's average prediction.
For TreeExplainer: the expected value comes from the training data stored in the trees (path-dependent mode) or from the supplied background dataset (interventional mode).
For DeepExplainer / KernelExplainer: the expected value is the mean model prediction over the supplied background samples.
SHAP values have the same units as the model output: log-odds for a margin-output classifier such as XGBoost, probabilities when explaining `predict_proba`, and raw target units for regression.
Magnitude: Higher absolute SHAP value = stronger feature impact
Sign: positive values push the prediction above the baseline; negative values push it below.
For a prediction $f(x)$: $$f(x) = E[f(X)] + \sum_{i=1}^{n} \phi_i(x)$$
Example (illustrative numbers): with baseline $E[f(X)] = 0.40$ and contributions $\phi_{\text{age}} = +0.15$, $\phi_{\text{income}} = -0.05$, $\phi_{\text{other}} = +0.10$, the prediction is $f(x) = 0.60$.
Local (Instance-level): SHAP values explain a single prediction; each $\phi_i(x)$ is feature $i$'s contribution for that specific instance.
Global (Dataset-level): averaging $|\phi_i|$ across many instances yields a global importance ranking, as in SHAP summary and bar plots.
Key Insight: Global importance is the aggregation of local importances, maintaining consistency between instance and dataset explanations.
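A sketch of that aggregation, assuming `shap_values` with shape `(n_samples, n_features)`:

```python
import numpy as np

# Global importance = mean absolute local attribution per feature.
global_importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(global_importance)[::-1]   # most important feature first
```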
Permutation Importance: measures the drop in model performance when a feature is shuffled; it is global only, unsigned, and can be distorted by correlated features.
SHAP: attributes the prediction itself rather than performance, carries a sign, and provides per-instance detail that aggregates into a global view.
Feature Coefficients ($w_i$): describe the model's global slope per unit of a feature and ignore where a particular instance sits.
SHAP for Linear Models: $\phi_i = w_i (x_i - E[x_i])$ accounts for both the coefficient and how far the instance's value lies from the feature's average.
Gini/Split Importance: computed from impurity reductions during training; biased toward high-cardinality features and not consistent across models.
SHAP (Tree SHAP): computed from the model's behavior on data, satisfies consistency, and is comparable across models.
Standard SHAP captures main effects. SHAP interaction values capture pairwise interactions.
Formula for Interaction: $$\phi_{i,j} = \sum_{S \subseteq F \setminus \{i,j\}} \frac{|S|!\,(|F|-|S|-2)!}{2\,(|F|-1)!} \Delta_{ij}(S)$$
Where $\Delta_{ij}(S)$ is the interaction effect of features $i$ and $j$ given coalition $S$.
Interpretation: for $i \neq j$, $\phi_{i,j}$ is the part of the attribution due to the interaction between features $i$ and $j$ (split equally between the pair); the diagonal term $\phi_{i,i}$ is feature $i$'s main effect.
Property: $$\phi_i = \phi_{i,i} + \sum_{j \neq i} \phi_{i,j}$$
A feature's total SHAP value equals its main effect plus its share of every pairwise interaction it participates in (each interaction is split equally, so $\phi_{i,j}$ already carries feature $i$'s half).
TreeExplainer supports exact interaction computation:
```python
import shap

explainer = shap.TreeExplainer(model)  # assumes a fitted tree-based model
shap_interaction_values = explainer.shap_interaction_values(X)  # (n_samples, n_features, n_features)
```
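The additivity property above can be checked numerically: summing the interaction matrix over one feature axis should recover the ordinary SHAP values.

```python
import numpy as np

shap_values = explainer.shap_values(X)
# Each feature's row of interactions sums back to its total SHAP value.
assert np.allclose(shap_values, shap_interaction_values.sum(axis=-1), atol=1e-6)
```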
Limitation: interaction values are exponentially costly for other explainers; exact computation is only practical for tree models.
Exact Computation: $O(2^n)$ - intractable for large $n$
Specialized Algorithms:
- Tree SHAP: $O(TLD^2)$, exact
- Linear SHAP: $O(n)$, exact
- Kernel SHAP: sampling-based approximation
- Deep SHAP: approximation averaged over reference samples
Implication: For non-tree models with many features, explanations may be approximate.
Assumption: Kernel SHAP and the basic interventional implementations assume features can be manipulated independently of one another.
Challenge: Real features are often correlated (e.g., height and weight)
Solutions:
- Use the observational/conditional approach (e.g., TreeExplainer's tree_path_dependent mode)
- Group strongly correlated features and attribute to the group as a whole
- Choose background data that preserves realistic feature combinations
Issue: Creating coalitions by replacing features may create unrealistic samples (outside the training distribution)
Example: Setting "Age=5" and "Has PhD=Yes" simultaneously, a combination that never occurs in real data
Implication: SHAP values reflect model behavior on potentially unrealistic inputs
Mitigation: Use observational approach or carefully selected background data
SHAP measures association, not causation
SHAP answers: "How does the model's prediction change with this feature?" SHAP does NOT answer: "What would happen if we changed this feature in reality?"
Example (illustrative): a model predicting illness may assign a large SHAP value to "number of hospital visits"; the feature is highly predictive, but increasing visits would not cause illness.
Implication: Combine SHAP with domain knowledge before drawing causal conclusions; SHAP alone doesn't establish causation.
SHAP is the unique additive attribution method satisfying local accuracy, missingness, and consistency.
Proof: Lundberg & Lee (2017) showed Shapley values are the only solution satisfying these axioms.
SHAP values correspond to first-order terms in functional ANOVA decomposition: $$f(x) = f_0 + \sum_i f_i(x_i) + \sum_{i,j} f_{ij}(x_i, x_j) + ...$$
Where $f_i(x_i)$ captures main effect of feature $i$, and $\phi_i \approx f_i(x_i)$.
SHAP generalizes sensitivity analysis: where sensitivity analysis perturbs one feature at a time around the input, SHAP averages each feature's effect over all coalitions, capturing interactions that one-at-a-time perturbation misses.
Gradient-based methods (GradientExplainer, Integrated Gradients) approximate SHAP using derivatives.
Foundational Papers:
- Shapley, L. S. (1953). "A Value for n-Person Games." In Contributions to the Theory of Games II.
- Lundberg, S. M., & Lee, S.-I. (2017). "A Unified Approach to Interpreting Model Predictions." NeurIPS.
- Lundberg, S. M., et al. (2020). "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence.
Key Concepts: Shapley values, additive feature attribution, local accuracy / missingness / consistency, interventional vs. observational expectations, interaction values.
This theoretical foundation explains why SHAP is a principled, versatile, and powerful tool for model interpretation.