
Shapley Value Interpretability

Updated 28 January 2026
  • Shapley value-based interpretability is a formal framework using cooperative game theory to allocate model outputs to input features while satisfying fairness axioms.
  • Practical methods like KernelSHAP, permutation sampling, and deep amortized approaches enable efficient approximation of Shapley values for complex models and structured data.
  • Recent advances integrate statistical guarantees, uncertainty quantification, and privacy mitigation to deliver reliable, contrastive, and real-time model explanations.

Shapley value-based interpretability is a principled class of methods for attributing the output of a machine learning model to its input features via the formalism of cooperative game theory. Shapley values uniquely satisfy fairness, symmetry, dummy, and efficiency axioms, and have become a dominant paradigm in explaining predictions of black-box models across regression, classification, and reinforcement learning. Key advances address how to define the attribution game, handle categorical and structured data, estimate or approximate the value function under sampling constraints, and quantify statistical reliability and privacy risks.

1. Mathematical Formalism and Foundational Properties

At the core of Shapley-based interpretability is the allocation of a model output to features (“players”) as follows. For a model $f:\mathbb{R}^d \to \mathbb{R}$ and input $x \in \mathbb{R}^d$, the Shapley value of feature $i$ is

$$\phi_i(x,f) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(d-|S|-1)!}{d!} \left[ v(S \cup \{i\}) - v(S) \right]$$

where $N = \{1,\ldots,d\}$ and $v(S)$ is the expected model output under a mechanism for “removing” features not in $S$. Standard choices include marginalizing out or conditioning on held-out features, and the precise definition of $v(S)$ determines whether a method is “true to model” (interventional, marginal) or “true to data” (conditional) (Ter-Minassian et al., 2023).

Shapley values satisfy:

  • Efficiency: $\sum_i \phi_i(x,f) = f(x) - f(\text{baseline})$
  • Symmetry: Indistinguishable features receive identical attributions.
  • Dummy: Features that nowhere affect $f$ receive zero attribution.
  • Additivity: Attributions are linear over sums of games or models.

Exact computation for $d$ features requires $\mathcal{O}(2^d)$ value-function evaluations in general, motivating approximation schemes such as weighted least-squares (KernelSHAP) (Jethani et al., 2021), permutation sampling (Mitchell et al., 2021), and learned amortizers (Jethani et al., 2021, Alkhatib et al., 7 May 2025).
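As a concrete illustration of the exact exponential-cost computation, the sketch below enumerates every subset $S \subseteq N \setminus \{i\}$ and applies the Shapley weighting directly, using a one-sample interventional value function (a single reference point stands in for the baseline distribution); all function names here are illustrative, not from any library.

```python
from itertools import combinations
from math import factorial

def shapley_exact(f, x, baseline):
    """Exact Shapley values by enumerating all 2^d coalitions.

    v(S) evaluates f with features in S taken from x and the rest
    taken from a single reference point (a one-sample marginal game).
    """
    d = len(x)

    def v(S):
        z = [x[i] if i in S else baseline[i] for i in range(d)]
        return f(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                # Classic Shapley weight |S|!(d-|S|-1)!/d!
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy model with an interaction term between features 1 and 2.
f = lambda z: z[0] + 2 * z[1] * z[2]
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_exact(f, x, base)
print(phi, sum(phi))  # efficiency: attributions sum to f(x) - f(base) = 3
```

Note how symmetry is visible in the output: features 1 and 2 enter the model interchangeably, so they receive identical attributions.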

2. Defining the Value Function and the Role of Reference Distribution

The choice of value function $v(S)$ is fundamental, impacting both the semantics and the faithfulness of attributions. Common conventions include:

  • Marginal SHAP (interventional): $v(S) = \mathbb{E}[f(x_S, X_{-S})]$ with $X_{-S}$ sampled i.i.d. from the empirical distribution.
  • Conditional SHAP (observational): $v(S) = \mathbb{E}[f(x_S, X_{-S}) \mid X_S = x_S]$.
  • Contrastive/Reference-based games: $v_{x,r}(S) = f(z(x,r,S)) - f(r)$, interpreting explanations relative to a user-supplied reference $r$ or a distribution $D$ (Merrick et al., 2019).

Different choices yield diverging explanations; notably, conditional SHAP can violate the dummy property (assigning nonzero attribution to irrelevant features) when the model does not use feature $i$ but conditional independence is not respected (Merrick et al., 2019, Ter-Minassian et al., 2023).

Shapley attributions can be made fully contrastive, answering, “Why does $f(x)$ differ from the typical case $f(r)$?” Selection of $D$ or $r$ must be tailored to the interpretive task, as reference-distribution semantics dominate clinical, fairness, and contrastive scenarios (Ter-Minassian et al., 2023, Merrick et al., 2019).
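To make the contrast between these games concrete, here is a minimal sketch of the marginal and reference-based value functions. It uses an empirical background sample for the marginal game and a single user-supplied reference for the contrastive game; function names and the sampling scheme are illustrative assumptions.

```python
import random

def v_marginal(f, x, S, background, n_samples=256, seed=0):
    """Marginal (interventional) v(S): features in S are fixed at x,
    the rest are drawn independently from rows of a background sample."""
    rng = random.Random(seed)
    d = len(x)
    total = 0.0
    for _ in range(n_samples):
        row = rng.choice(background)
        z = [x[i] if i in S else row[i] for i in range(d)]
        total += f(z)
    return total / n_samples

def v_contrastive(f, x, r, S):
    """Reference-based game v_{x,r}(S) = f(z(x,r,S)) - f(r):
    features in S come from x, the rest from the reference r."""
    z = [x[i] if i in S else r[i] for i in range(len(x))]
    return f(z) - f(r)

f = lambda z: z[0] + z[1]
x, r = [1.0, 2.0], [0.0, 0.0]
print(v_contrastive(f, x, r, {0, 1}))  # full coalition: f(x) - f(r) = 3.0
print(v_contrastive(f, x, r, set()))   # empty coalition: 0.0
```

By construction the contrastive game satisfies $v_{x,r}(N) = f(x) - f(r)$ and $v_{x,r}(\emptyset) = 0$, so efficiency allocates exactly the difference from the reference.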

3. Practical Algorithms and Computational Advances

Due to combinatorial explosion, practical estimation incorporates Monte Carlo, regression-based, or learning-based methods:

  • KernelSHAP/Weighted Least Squares: Approximates Shapley values by solving a regression over sampled subsets $S$, weighted by the Shapley kernel $w(S)$ (Jethani et al., 2021). Extends to fast, model-agnostic explanations when a surrogate model is available.
  • Permutation Sampling and Quasi-Monte Carlo: Samples random or structured permutations of features to estimate Shapley values, with advanced techniques such as kernel herding, sequential Bayesian quadrature, or low-discrepancy sphere embeddings yielding lower RMSE and faster convergence than vanilla Monte Carlo (Mitchell et al., 2021).
  • Energy-based and Deep Amortized Methods: Learning the full conditional $p(x_{\bar S} \mid x_S)$ using energy-based models or deep nets with auxiliary losses provides globally accurate, scalable Shapley estimation for complex models and data (Lu et al., 2024, Alkhatib et al., 7 May 2025).
  • Intrinsic/Jointly Trained Explanation Models: ViaSHAP (Alkhatib et al., 7 May 2025), Shapley Explanation Networks (Wang et al., 2021), InstaSHAP (Enouen et al., 20 Feb 2025), Shapley-auxiliary multitask frameworks (Fan et al., 16 Dec 2025), and FastSHAP (Jethani et al., 2021) all exploit explicit Shapley objectives embedded at training time to provide explanations at inference “for free”—critical for large-scale and real-time applications.
  • Structured and Functional Data: For graph-structured or infinite-dimensional data, L-Shapley/C-Shapley (Chen et al., 2018) and continuous Shapley frameworks (Delicado et al., 2024) approximate or extend attributions to local neighborhoods or continuous domains, relying on Markov properties or set-function measures.
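The permutation-sampling idea from the list above can be sketched in a few lines: each feature's marginal contribution is averaged over random orderings, here against a single reference point (the one-reference game and all names are illustrative assumptions). Because contributions telescope within each ordering, the estimates satisfy efficiency exactly at any sample size.

```python
import random

def shapley_permutation(f, x, baseline, n_perms=2000, seed=0):
    """Monte Carlo Shapley estimates: average each feature's marginal
    contribution over random feature orderings (one-reference game)."""
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for _ in range(n_perms):
        order = list(range(d))
        rng.shuffle(order)
        z = list(baseline)        # start from the reference point
        prev = f(z)
        for i in order:           # reveal features one at a time
            z[i] = x[i]
            cur = f(z)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [p / n_perms for p in phi]

# For an additive model the estimator is exact for every permutation.
f = lambda z: z[0] + z[1] + z[2]
phi = shapley_permutation(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], n_perms=200)
print(phi)  # → [1.0, 2.0, 3.0]
```

For non-additive models the per-feature estimates carry Monte Carlo variance, which is where the structured sampling schemes above (kernel herding, quasi-Monte Carlo) improve on uniform shuffling.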

For tree ensembles, specialized discrete and leaf-based estimators exploit the tree partition to yield more accurate, variance-stabilized attributions than “naïve” implementations, with correct handling of categorical features (Amoukou et al., 2021).

4. Extensions Beyond Classical Features: Coalitions, Structured, and Functional Explanations

Naïvely summing Shapley attributions over dummy-encoded variables is incorrect for categorical data. Coalitional Shapley values—where sets of features are treated as unified players—ensure correct accounting and invariance to encoding (Amoukou et al., 2021). For graph-structured data, L-Shapley and C-Shapley restrict summations to local or connected neighborhoods, relating to the Myerson value and reducing computational complexity under conditional independence (Chen et al., 2018).
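A minimal sketch of the coalitional idea, assuming dummy-encoded columns and illustrative names: groups of column indices are treated as single players, so a categorical variable's attribution does not depend on how it was encoded.

```python
from itertools import combinations
from math import factorial

def shapley_grouped(f, x, baseline, groups):
    """Coalitional Shapley values: each group of column indices
    (e.g. the dummy columns of one categorical variable) acts as
    a single player, making attributions encoding-invariant."""
    g = len(groups)

    def v(S):
        keep = {c for gi in S for c in groups[gi]}
        z = [x[c] if c in keep else baseline[c] for c in range(len(x))]
        return f(z)

    phi = [0.0] * g
    for i in range(g):
        others = [j for j in range(g) if j != i]
        for k in range(g):
            for S in combinations(others, k):
                w = factorial(k) * factorial(g - k - 1) / factorial(g)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Numeric feature in column 0; a two-level categorical one-hot
# encoded into columns 1 and 2 is treated as one player.
f = lambda z: z[0] + 3 * z[1] - z[2]
x, base = [2.0, 1.0, 0.0], [0.0, 0.0, 1.0]
phi = shapley_grouped(f, x, base, groups=[[0], [1, 2]])
print(phi)  # sums to f(x) - f(base) = 5 - (-1) = 6
```

Summing per-dummy attributions instead would split the categorical variable's credit across coalitions in which only some of its dummy columns are “present”, a state that never occurs in the data.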

In high-dimensional vision tasks, interpretable attributions are achieved by restricting the player set to semantically meaningful regions (e.g., superpixels, high-level attributes), or by generating counterfactuals (e.g., varying “blonde hair” in face GANs) and applying the classic Shapley formula within this space (Lahiri et al., 2022, Fan et al., 16 Dec 2025).

Functional regression introduces the continuous Shapley value, distributing model relevance over an uncountable index set (e.g., $t \in [0,1]$ for $X(t)$), operationalized via discretization and permutation averaging (Delicado et al., 2024).

5. Statistical Guarantees, Testing, and Limitations

Recent work connects Shapley value marginal contributions to statistical hypothesis testing, particularly via randomized conditional independence tests (SHAP-XRT) (Teneggi et al., 2022). For each local test of feature relevance, expected $p$-values are bounded in terms of the mean Shapley marginal contribution, and the full Shapley value bounds a combined global-null $p$-value. This formal link enables statistical reporting (effect size and Type I error) with sharp error control.

However, several fundamental limitations persist:

  • Predictive and Causal Implications: High Shapley value for a feature does not imply significant predictive loss upon removal nor causal relevance. In settings with complicated dependence or colliders, redundant features may have higher attribution than direct causes (Ma et al., 2020).
  • Globality of Explanations: Shapley values aggregate over all possible coalitions, so even features “inactive” at $x$ may receive nonzero attribution, especially in models with conditional logic or gated sub-functions (Amoukou et al., 2021).
  • Estimator Bias and Variance: The accuracy of approximations (marginal, conditional, or learned) depends on the expressiveness of the conditional model, sample size, and the match between the data distribution and the actual model’s local geometry. This is particularly acute for complex dependencies, tail events, or highly correlated features (Lu et al., 2024, Amoukou et al., 2021).
  • Privacy Risks: Released Shapley vectors can leak substantial information about the underlying sensitive input features, enabling adversaries to reconstruct $x$ from $\phi(f,x)$, especially with auxiliary data or exploitable local linearities. Mitigations include quantization, noise addition, dimensionality reduction, and cryptographically secure protocols, but all trade off interpretability fidelity (Luo et al., 2024).
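As a toy illustration of the noise-plus-quantization mitigations just mentioned (not a formal differential-privacy mechanism; the noise scale, rounding precision, and function name are illustrative assumptions):

```python
import random

def privatize_attributions(phi, scale=0.1, decimals=1, seed=0):
    """Sketch of a leakage mitigation: add zero-mean Gaussian noise,
    then quantize each attribution. Coarser output makes input
    reconstruction harder at the cost of explanation fidelity."""
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0.0, scale) for p in phi]
    return [round(p, decimals) for p in noisy]

phi = [0.123, -0.456, 0.789]
out = privatize_attributions(phi, scale=0.05, decimals=1)
print(out)  # coarse, noised attributions; sum no longer exactly efficient
```

Note the explicit trade-off: both steps perturb the efficiency identity, so released attributions only approximately sum to $f(x) - f(\text{baseline})$.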

6. Application Domains and Case Studies

Shapley value-based interpretability underpins clinical diagnostics (e.g., heart-disease pre-screening (Rodriguez et al., 2024), cohort-specific clinical explanations (Ter-Minassian et al., 2023)), fairness audits, credit scoring, scientific model auditing, multi-agent reinforcement learning (MSV, SHAQ (Wang et al., 2021)), and explanation of high-dimensional models across vision, text, and functional data (Lahiri et al., 2022, Delicado et al., 2024).

Domain-specific advances include pairwise Shapley values for actionable, comparison-based feature attribution (Xu et al., 18 Feb 2025), counterfactual-based explanations in vision (Lahiri et al., 2022), and joint-training approaches for robust, real-time explanation at minimal inference cost (Wang et al., 2021, Alkhatib et al., 7 May 2025, Fan et al., 16 Dec 2025, Jethani et al., 2021).

7. Open Challenges and Methodological Considerations

The continued development of Shapley value-based interpretability is characterized by several challenges:

  • Optimal selection and justification of reference distributions for task- and domain-specific interpretability.
  • Balancing computational complexity, estimator variance, and explanation fidelity, especially for large $d$ and complex conditional dependencies.
  • Integrating statistical confidence intervals and uncertainty quantification into explanations (Merrick et al., 2019).
  • Reconciling interpretability and privacy, formalizing “explanation privacy,” and constructing robust, privacy-preserving explanations (Luo et al., 2024).
  • Overcoming the limitations of purely additive decompositions (GAMs, ANOVA-1) when input features are highly correlated or interacting beyond pairwise, as Shapley only characterizes singleton contributions in such cases (Enouen et al., 20 Feb 2025).

The state of the art thus combines principled game-theoretic allocation with advances in conditional estimation, function approximation, sampling, and statistical inference—forming a comprehensive and evolving framework for faithful, efficient, and actionable model interpretation.
