Marginal SHAP Tensor
- MST is a formal framework that unifies local and coalitional SHAP values using tensor networks to encode feature attributions.
- It employs tensor contraction and tensor train methods to compute SHAP values efficiently with provable complexity guarantees.
- The approach extends to models like decision trees, linear models, and binarized neural networks, enabling scalable interpretability.
The Marginal SHAP Tensor (MST) is a formalization of Shapley additive explanations (SHAP) in the context of Tensor Networks (TNs). It unifies local and coalitional SHAP values into a structured, contractible tensor object that encodes feature attributions for any input–feature–output tuple, and allows provably tractable computation under restricted model classes, notably Tensor Trains (TTs). The MST also underpins reductions and computational results for a broad class of machine learning models, including decision trees, tree ensembles, linear models, and binarized neural networks, by leveraging the algebra of tensor contraction and parallelism (Marzouk et al., 24 Oct 2025).
1. Mathematical Definition of Marginal SHAP Tensor
Let $d$ denote the number of input features, $p$ the output dimension, $\mathcal{X}$ a finite input domain, $f \colon \mathcal{X} \to \mathbb{R}^p$ a tensor network model, and $\mathcal{P}$ a data-generating distribution (also represented as a TN over $\mathcal{X}$). The Marginal SHAP Tensor is defined as

$$\Phi \in \mathbb{R}^{|\mathcal{X}| \times d \times p}, \qquad \Phi[x, i, :] = \phi_i(f, \mathcal{P}, x),$$

where $\phi_i(f, \mathcal{P}, x)$ is the SHAP value for feature $i$ at input $x$, relative to $f$ under $\mathcal{P}$. Specifically,

$$\phi_i(f, \mathcal{P}, x) = \sum_{S \subseteq [d] \setminus \{i\}} c(|S|)\,\bigl(v_{f,\mathcal{P},x}(S \cup \{i\}) - v_{f,\mathcal{P},x}(S)\bigr),$$

with the combinatorial factor $c(s) = \frac{s!\,(d-s-1)!}{d!}$, and the marginal value operator

$$v_{f,\mathcal{P},x}(S) = \mathbb{E}_{z \sim \mathcal{P}}\bigl[f(x_S, z_{[d] \setminus S})\bigr].$$
Alternatively, the construction utilizes two explicit tensors:
- a Modified Weighted Coalitional Tensor $W$, encoding weight signs and factorial coefficients over feature–coalition pairs,
- a Marginal-Value Tensor $M$, collecting all coalitional marginal expectations $v_{f,\mathcal{P},x}(S)$.

The full contraction

$$\Phi = W \cdot M$$

over the suitable mode pairings (feature–coalition modes) yields the MST (Marzouk et al., 24 Oct 2025).
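To make this contraction concrete, here is a minimal brute-force NumPy sketch (illustrative only, not the paper's sparse TN construction; the names `build_W` and `marginal_values` are ours) that materializes a dense analogue of $W$ and $M$ for a toy model under a product-of-Bernoulli distribution and recovers SHAP values as a single matrix–vector contraction:

```python
import itertools
import math
import numpy as np

def build_W(d):
    """Dense analogue of the weighted coalitional tensor: phi = W @ M.
    W[i, S] = +c(|S|-1) if i in S, else -c(|S|), with c(s) = s!(d-s-1)!/d!."""
    c = lambda s: math.factorial(s) * math.factorial(d - s - 1) / math.factorial(d)
    W = np.zeros((d, 2 ** d))
    for bits in range(2 ** d):
        S = [j for j in range(d) if (bits >> j) & 1]
        for i in range(d):
            W[i, bits] = c(len(S) - 1) if i in S else -c(len(S))
    return W

def marginal_values(f, x, probs):
    """M[S] = E_{z~P}[f(x_S, z outside S)] for every coalition S,
    under a product-of-Bernoulli distribution with parameters `probs`."""
    d = len(x)
    M = np.zeros(2 ** d)
    for bits in range(2 ** d):
        out = [j for j in range(d) if not (bits >> j) & 1]
        for z in itertools.product([0, 1], repeat=len(out)):
            pt, w = list(x), 1.0
            for j, zj in zip(out, z):
                pt[j] = zj
                w *= probs[j] if zj else 1.0 - probs[j]
            M[bits] += w * f(pt)
    return M

# Toy linear model f(x) = 2*x0 + 3*x1 with independent Bernoulli(1/2) features.
f = lambda x: 2 * x[0] + 3 * x[1]
x, probs = [1, 1], [0.5, 0.5]
phi = build_W(2) @ marginal_values(f, x, probs)
# For a linear model with independent features, phi_i = w_i * (x_i - E[x_i]).
```

The sign pattern in `build_W` comes from reindexing the Shapley sum over all coalitions $T$: a coalition containing $i$ contributes $+c(|T|-1)\,v(T)$, one excluding $i$ contributes $-c(|T|)\,v(T)$. The cost here is exponential in $d$; the tensor-network point is precisely that structured $W$ and $M$ avoid this blow-up.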
2. Coalitional Properties and Encoded Contributions
The MST encodes, for all inputs $x \in \mathcal{X}$, features $i \in [d]$, and outputs, the complete set of coalitional SHAP contributions. Contraction along the coalition index recovers the classic Shapley coefficients $\phi_i(f, \mathcal{P}, x)$, where, in the MST formalism, the exponential sum over coalitions is replaced by the contraction $W \cdot M$: $W$ determines the sign and combinatorial scaling, while $M$ provides the expected model outcome for every possible coalition (Marzouk et al., 24 Oct 2025).
3. Algorithmic Construction and Complexity Analysis
For general TNs, constructing the MST decomposes into the following steps:
- Formation of $W$ as a sparse "coalitional–weight" TT with small, structured cores. This step is parallelizable in polylogarithmic time using polynomially many processors.
- Assembly of $M$ via "router" tensors, which control the switching of each feature between the model and distribution channels; each router is sparse, and this step is parallelizable in polylogarithmic time per feature.
- Single-mode contraction over all coalitional indices to produce the final MST.
The computational complexity for general TNs is dominated by the optimal contraction ordering problem, and, in the absence of special structure, is #P-hard [(Marzouk et al., 24 Oct 2025), Prop. 4.1].
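The sensitivity of cost to contraction order can be seen even on a three-matrix chain: $(AB)C$ and $A(BC)$ differ in FLOP count by orders of magnitude when the boundary dimensions are thin. A minimal NumPy illustration (generic, not the paper's TN machinery) using `np.einsum_path` to search for a good ordering:

```python
import numpy as np

rng = np.random.default_rng(0)
# Thin-fat-thin chain: contracting B with C first is far cheaper than A with B.
A = rng.standard_normal((2, 100))
B = rng.standard_normal((100, 100))
C = rng.standard_normal((100, 2))

# einsum_path returns a contraction ordering plus a printable cost report.
path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')

# The precomputed path can be reused for the actual contraction.
result = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)
```

For general tensor networks, finding the optimal ordering is itself a hard combinatorial problem, which is why the #P-hardness above is stated for unrestricted TN structure.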
4. Efficient Parallel MST Computation in Tensor Train Models
When both $f$ and $\mathcal{P}$ are presented as Tensor Trains, the MST itself admits a TT representation of length $d$, whose $i$-th core combines the $i$-th cores of $f$ and $\mathcal{P}$ with the coalitional weight factors. The computation of SHAP values for a given input $x$ then reduces to right-contraction of this TT with the one-hot encoding of $x$. Each core multiplication is a matrix operation in NC (parallelizable), and the resulting matrix-chain contraction yields overall complexity in NC, i.e., polylogarithmic parallel time with polynomially many processors [(Marzouk et al., 24 Oct 2025), Theorem 4.2, Prop. 4.3].
Consequently, MST leads to provably efficient, parallelizable SHAP evaluation for TT-structured models, providing strict complexity guarantees unattainable for arbitrary black-box networks.
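As a minimal illustration of this evaluation pattern (generic TT contraction against one-hot inputs, not the paper's specific MST cores), selecting an index on each core's middle mode leaves a chain of small matrices, multiplied left to right:

```python
import numpy as np

def tt_eval(cores, x):
    """Contract a tensor train with the one-hot encoding of index tuple x.
    Each core has shape (r_prev, n_i, r_next); selecting x[i] on the middle
    mode leaves a matrix chain, contracted here left to right."""
    v = np.ones((1,))
    for G, xi in zip(cores, x):
        v = v @ G[:, xi, :]
    return v.item()

rng = np.random.default_rng(1)
cores = [rng.standard_normal((1, 2, 3)),
         rng.standard_normal((3, 2, 3)),
         rng.standard_normal((3, 2, 1))]

# Cross-check against the fully materialized tensor (boundary modes summed out).
full = np.einsum('aib,bjc,ckd->ijk', cores[0], cores[1], cores[2])
```

The loop above is sequential, but since each step is a small matrix product, the chain can be multiplied in a balanced binary tree of depth $O(\log d)$, which is the source of the NC bound.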
5. Extensions to Structured Models via TT Reductions
Many popular model classes reduce in NC to Tensor Trains, and thus inherit the tractable SHAP computation of TT-MST:
- Decision trees and ensembles: Each path is encoded as a DFA; the set product is a TN reducible to a TT [(Marzouk et al., 24 Oct 2025), Theorem 5.1]. Therefore, SHAP explanations in tree models, when cast via TT, are in NC.
- Linear models and linear RNNs: Second-order linear RNNs are equivalent to stationary TTs of bounded bond-dimension. This subsumes linear regression as a trivial subclass.
- Binarized Neural Networks (BNNs): Parameterized complexity analysis shows that SHAP computation becomes tractable when network width (optionally combined with sparsity) is bounded, and remains hard when only depth is restricted. This isolates width (with optional sparsity) as the critical bottleneck for SHAP computation in BNNs, while depth alone does not guarantee tractability [(Marzouk et al., 24 Oct 2025), Theorems 6.1.1–6.1.3].
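The general flavor of such reductions can be illustrated with a generic TT-SVD (a standard sequential-SVD decomposition, not the paper's DFA-product construction): the truth table of a small decision tree compresses into a tensor train with small bond dimensions. All function names here are ours.

```python
import numpy as np

def tt_svd(T, tol=1e-10):
    """Decompose a d-way tensor into TT cores via sequential truncated SVDs."""
    shape = T.shape
    d = len(shape)
    cores, r_prev = [], 1
    mat = T.reshape(shape[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int((s > tol * s[0]).sum()))  # numerical rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = (s[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Recontract TT cores back into the full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=(-1, 0))
    return out.squeeze(axis=(0, -1))

# Truth table of a depth-2 decision tree on three binary features:
# if x0 == 1 predict x1, else predict x2.
T = np.array([[[float(k if i == 0 else j) for k in (0, 1)]
               for j in (0, 1)] for i in (0, 1)])
cores = tt_svd(T)
ranks = [G.shape[2] for G in cores[:-1]]  # internal TT bond dimensions
```

Materializing the full truth table is of course exponential in $d$; the NC reductions cited above construct bounded-rank TTs directly from the tree or automaton structure without ever forming the dense tensor.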
6. Relation to Feature Discarding and Global Interpretations
The MST framework is naturally compatible with the aggregation principle described for global SHAP over the product-of-marginals ("extended") distribution. Aggregate SHAP values computed over this domain permit theoretically sound feature selection and safe discarding:
- If the aggregate SHAP value of the $i$-th feature over the extended support is small, there exists a function $g$ not depending on the $i$-th feature such that $f$ and $g$ are close in $L^2$, with a quantitative bound: whenever the aggregate SHAP value falls below a threshold $\epsilon$, the $L^2$ distance between $f$ and $g$ is correspondingly controlled [(Bhattacharjee et al., 29 Mar 2025), Theorem 2].
- In practice, the MST structure can be leveraged for robust global interpretability and efficient feature pruning: aggregate attributions are computed via parallelizable tensor contractions, with the guarantee that a small aggregate measure corresponds to a negligible per-feature effect in the model (Marzouk et al., 24 Oct 2025, Bhattacharjee et al., 29 Mar 2025).
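The discarding principle can be checked numerically on a toy model. The brute-force sketch below (illustrative names, exponential cost, product-of-Bernoulli distribution assumed) computes the aggregate SHAP of a weakly relevant feature and compares it with the $L^2$ gap left after averaging that feature out:

```python
import itertools
import math
import numpy as np

def shap_values(f, x, probs):
    """Exact SHAP at x under a product-of-Bernoulli distribution (brute force)."""
    d = len(x)
    def v(S):
        out = [j for j in range(d) if j not in S]
        total = 0.0
        for z in itertools.product([0, 1], repeat=len(out)):
            pt, w = list(x), 1.0
            for j, zj in zip(out, z):
                pt[j] = zj
                w *= probs[j] if zj else 1.0 - probs[j]
            total += w * f(pt)
        return total
    phi = np.zeros(d)
    for i in range(d):
        rest = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(rest, r):
                c = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[i] += c * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy model in which feature 2 has a tiny coefficient.
f = lambda x: 2.0 * x[0] + 3.0 * x[1] + 0.01 * x[2]
probs = [0.5, 0.5, 0.5]
pts = list(itertools.product([0, 1], repeat=3))
weight = lambda x: math.prod(p if xi else 1 - p for xi, p in zip(x, probs))

# Aggregate SHAP of feature 2 over the product-of-marginals support.
agg = sum(weight(x) * abs(shap_values(f, x, probs)[2]) for x in pts)

# Discard feature 2 by averaging it out: g no longer depends on x2.
g = lambda x: 0.5 * (f([x[0], x[1], 0]) + f([x[0], x[1], 1]))
l2 = math.sqrt(sum(weight(x) * (f(x) - g(x)) ** 2 for x in pts))
```

Here both quantities come out small and of the same order, matching the qualitative content of the bound: a feature with negligible aggregate attribution can be marginalized away with negligible change to the model.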
7. Significance and Broader Implications
The Marginal SHAP Tensor supplies a unified, algebraic and computational framework for SHAP value analysis and computation beyond the classical tabular or ad hoc settings:
- It reduces the exponential coalitional sum underlying SHAP values to a single structured tensor contraction, making the problem tractable for TT-type and similarly structured models.
- The construction generalizes efficiently to classes reducible to TTs, providing new parallel algorithms for SHAP in decision trees, tree ensembles, linear models, and linear RNNs.
- Parameterized complexity results for BNNs localize the computational hardness to network width, demonstrating that width and not depth is the limiting factor for tractable attributions in TN-aligned representations.
- These results advance understanding of the computational–statistical tradeoffs in explainable AI, clarifying the boundaries of efficient SHAP computation and the formal justification for global feature importance aggregation in both synthetic and practical modeling scenarios (Marzouk et al., 24 Oct 2025, Bhattacharjee et al., 29 Mar 2025).