Marginal SHAP Tensor
- MST is a formal framework that unifies local and coalitional SHAP values using tensor networks to encode feature attributions.
- It employs tensor contraction and tensor train methods to compute SHAP values efficiently with provable complexity guarantees.
- The approach extends to models like decision trees, linear models, and binarized neural networks, enabling scalable interpretability.
The Marginal SHAP Tensor (MST) is a formalization of Shapley additive explanations (SHAP) in the context of Tensor Networks (TNs). It unifies local and coalitional SHAP values into a structured, contractible tensor object that encodes feature attributions for any input–feature–output tuple, and allows provably tractable computation under restricted model classes, notably Tensor Trains (TTs). The MST also underpins reductions and computational results for a broad class of machine learning models, including decision trees, tree ensembles, linear models, and binarized neural networks, by leveraging the algebra of tensor contraction and parallelism (Marzouk et al., 24 Oct 2025).
1. Mathematical Definition of Marginal SHAP Tensor
Let $d$ denote the number of input features, $p$ the output dimension, $\mathcal{X}$ a finite input domain, $f \colon \mathcal{X} \to \mathbb{R}^p$ a tensor network model, and $\mathcal{P}$ a data-generating distribution (also represented as a TN over $\mathcal{X}$). The Marginal SHAP Tensor is defined as

$$\Phi \in \mathbb{R}^{|\mathcal{X}| \times d \times p}, \qquad \Phi[x, i, :] = \phi_i(f, \mathcal{P}, x),$$

where $\phi_i(f, \mathcal{P}, x)$ is the SHAP value for feature $i$ at input $x$, relative to $f$ under $\mathcal{P}$. Specifically,

$$\phi_i(f, \mathcal{P}, x) = \sum_{S \subseteq [d] \setminus \{i\}} c(|S|)\,\bigl(v_{f,\mathcal{P},x}(S \cup \{i\}) - v_{f,\mathcal{P},x}(S)\bigr),$$

with the combinatorial factor $c(s) = \frac{s!\,(d-s-1)!}{d!}$, and the marginal value operator

$$v_{f,\mathcal{P},x}(S) = \mathbb{E}_{z \sim \mathcal{P}}\bigl[f(x_S, z_{[d] \setminus S})\bigr].$$
Alternatively, the construction utilizes two explicit tensors:
- a Modified Weighted Coalitional Tensor $W$, encoding weight signs and factorial coefficients over feature–coalition pairs,
- a Marginal-Value Tensor $M$, collecting all coalitional marginal expectations $v_{f,\mathcal{P},x}(S)$.

The full contraction

$$\Phi = W \cdot M$$

over the suitable mode pairings (feature–coalition modes) yields the MST (Marzouk et al., 24 Oct 2025).
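To make this contraction concrete, here is a minimal brute-force NumPy sketch (illustrative only, not the paper's sparse TN construction; the names `build_W` and `marginal_values` are ours) that materializes a dense analogue of $W$ and $M$ for a toy model under a product-of-Bernoulli distribution and recovers SHAP values as a single matrix–vector contraction:

```python
import itertools
import math
import numpy as np

def build_W(d):
    """Dense analogue of the weighted coalitional tensor: phi = W @ M.
    W[i, S] = +c(|S|-1) if i in S, else -c(|S|), with c(s) = s!(d-s-1)!/d!."""
    c = lambda s: math.factorial(s) * math.factorial(d - s - 1) / math.factorial(d)
    W = np.zeros((d, 2 ** d))
    for bits in range(2 ** d):
        S = [j for j in range(d) if (bits >> j) & 1]
        for i in range(d):
            W[i, bits] = c(len(S) - 1) if i in S else -c(len(S))
    return W

def marginal_values(f, x, probs):
    """M[S] = E_{z~P}[f(x_S, z outside S)] for every coalition S,
    under a product-of-Bernoulli distribution with parameters `probs`."""
    d = len(x)
    M = np.zeros(2 ** d)
    for bits in range(2 ** d):
        out = [j for j in range(d) if not (bits >> j) & 1]
        for z in itertools.product([0, 1], repeat=len(out)):
            pt, w = list(x), 1.0
            for j, zj in zip(out, z):
                pt[j] = zj
                w *= probs[j] if zj else 1.0 - probs[j]
            M[bits] += w * f(pt)
    return M

# Toy linear model f(x) = 2*x0 + 3*x1 with independent Bernoulli(1/2) features.
f = lambda x: 2 * x[0] + 3 * x[1]
x, probs = [1, 1], [0.5, 0.5]
phi = build_W(2) @ marginal_values(f, x, probs)
# For a linear model with independent features, phi_i = w_i * (x_i - E[x_i]).
```

The sign pattern in `build_W` comes from reindexing the Shapley sum over all coalitions $T$: a coalition containing $i$ contributes $+c(|T|-1)\,v(T)$, one excluding $i$ contributes $-c(|T|)\,v(T)$. The cost here is exponential in $d$; the tensor-network point is precisely that structured $W$ and $M$ avoid this blow-up.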
2. Coalitional Properties and Encoded Contributions
The MST encodes, for all inputs $x \in \mathcal{X}$, features $i \in [d]$, and outputs, the complete set of coalitional SHAP contributions. Contraction along the coalition index recovers the classic Shapley coefficients $\phi_i(f, \mathcal{P}, x)$, where, in the MST formalism, the exponential sum over coalitions is replaced by the contraction $W \cdot M$: $W$ determines the sign and combinatorial scaling, while $M$ provides the expected model outcome for every possible coalition (Marzouk et al., 24 Oct 2025).
3. Algorithmic Construction and Complexity Analysis
For general TNs, constructing the MST decomposes into the following steps:
- Formation of $W$ as a sparse "coalitional–weight" TT with small, structured cores. This step is parallelizable in polylogarithmic time using polynomially many processors.
- Assembly of $M$ via "router" tensors, which control the switching of each feature between the model and distribution channels; each router is sparse, and this step is parallelizable in polylogarithmic time per feature.
- Single-mode contraction over all coalitional indices to produce the final MST.
The computational complexity for general TNs is dominated by the optimal contraction ordering problem, and, in the absence of special structure, is #P-hard [(Marzouk et al., 24 Oct 2025), Prop. 4.1].
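The sensitivity of cost to contraction order can be seen even on a three-matrix chain: $(AB)C$ and $A(BC)$ differ in FLOP count by orders of magnitude when the boundary dimensions are thin. A minimal NumPy illustration (generic, not the paper's TN machinery) using `np.einsum_path` to search for a good ordering:

```python
import numpy as np

rng = np.random.default_rng(0)
# Thin-fat-thin chain: contracting B with C first is far cheaper than A with B.
A = rng.standard_normal((2, 100))
B = rng.standard_normal((100, 100))
C = rng.standard_normal((100, 2))

# einsum_path returns a contraction ordering plus a printable cost report.
path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')

# The precomputed path can be reused for the actual contraction.
result = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)
```

For general tensor networks, finding the optimal ordering is itself a hard combinatorial problem, which is why the #P-hardness above is stated for unrestricted TN structure.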
4. Efficient Parallel MST Computation in Tensor Train Models
When both $f$ and $\mathcal{P}$ are presented as Tensor Trains, the MST itself admits a TT representation of length $d$, whose $i$-th core combines the $i$-th cores of $f$ and $\mathcal{P}$ with the coalitional weight factors. The computation of SHAP values for a given input $x$ then reduces to right-contraction of this TT with the one-hot encoding of $x$. Each core multiplication is a matrix operation in NC (parallelizable), and the resulting matrix-chain contraction yields overall complexity in NC, i.e., polylogarithmic parallel time with polynomially many processors [(Marzouk et al., 24 Oct 2025), Theorem 4.2, Prop. 4.3].
Consequently, MST leads to provably efficient, parallelizable SHAP evaluation for TT-structured models, providing strict complexity guarantees unattainable for arbitrary black-box networks.
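As a minimal illustration of this evaluation pattern (generic TT contraction against one-hot inputs, not the paper's specific MST cores), selecting an index on each core's middle mode leaves a chain of small matrices, multiplied left to right:

```python
import numpy as np

def tt_eval(cores, x):
    """Contract a tensor train with the one-hot encoding of index tuple x.
    Each core has shape (r_prev, n_i, r_next); selecting x[i] on the middle
    mode leaves a matrix chain, contracted here left to right."""
    v = np.ones((1,))
    for G, xi in zip(cores, x):
        v = v @ G[:, xi, :]
    return v.item()

rng = np.random.default_rng(1)
cores = [rng.standard_normal((1, 2, 3)),
         rng.standard_normal((3, 2, 3)),
         rng.standard_normal((3, 2, 1))]

# Cross-check against the fully materialized tensor (boundary modes summed out).
full = np.einsum('aib,bjc,ckd->ijk', cores[0], cores[1], cores[2])
```

The loop above is sequential, but since each step is a small matrix product, the chain can be multiplied in a balanced binary tree of depth $O(\log d)$, which is the source of the NC bound.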
5. Extensions to Structured Models via TT Reductions
Many popular model classes reduce in NC to Tensor Trains, and thus inherit the tractable SHAP computation of TT-MST:
- Decision trees and ensembles: Each path is encoded as a DFA; the set product is a TN reducible to a TT [(Marzouk et al., 24 Oct 2025), Theorem 5.1]. Therefore, SHAP explanations in tree models, when cast via TT, are in NC.
- Linear models and linear RNNs: Second-order linear RNNs are equivalent to stationary TTs of bounded bond-dimension. This subsumes linear regression as a trivial subclass.
- Binarized Neural Networks (BNNs): Parameterized complexity analysis shows that SHAP computation becomes tractable when network width (optionally combined with sparsity) is bounded, and remains hard when only depth is restricted. This isolates width (with optional sparsity) as the critical bottleneck for SHAP computation in BNNs, while depth alone does not guarantee tractability [(Marzouk et al., 24 Oct 2025), Theorems 6.1.1–6.1.3].
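The general flavor of such reductions can be illustrated with a generic TT-SVD (a standard sequential-SVD decomposition, not the paper's DFA-product construction): the truth table of a small decision tree compresses into a tensor train with small bond dimensions. All function names here are ours.

```python
import numpy as np

def tt_svd(T, tol=1e-10):
    """Decompose a d-way tensor into TT cores via sequential truncated SVDs."""
    shape = T.shape
    d = len(shape)
    cores, r_prev = [], 1
    mat = T.reshape(shape[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int((s > tol * s[0]).sum()))  # numerical rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = (s[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Recontract TT cores back into the full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=(-1, 0))
    return out.squeeze(axis=(0, -1))

# Truth table of a depth-2 decision tree on three binary features:
# if x0 == 1 predict x1, else predict x2.
T = np.array([[[float(k if i == 0 else j) for k in (0, 1)]
               for j in (0, 1)] for i in (0, 1)])
cores = tt_svd(T)
ranks = [G.shape[2] for G in cores[:-1]]  # internal TT bond dimensions
```

Materializing the full truth table is of course exponential in $d$; the NC reductions cited above construct bounded-rank TTs directly from the tree or automaton structure without ever forming the dense tensor.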
6. Relation to Feature Discarding and Global Interpretations
The MST framework is naturally compatible with the aggregation principle described for global SHAP over the product-of-marginals ("extended") distribution. Aggregate SHAP values computed over this domain permit theoretically sound feature selection and safe discarding:
- If the aggregate SHAP value of the $i$-th feature over the extended support is small, there exists a function $g$ not depending on the $i$-th feature such that $f$ and $g$ are close in $L^2$, with a quantitative bound: whenever the aggregate SHAP value falls below a threshold $\epsilon$, the $L^2$ distance between $f$ and $g$ is correspondingly controlled [(Bhattacharjee et al., 29 Mar 2025), Theorem 2].
- In practice, the MST structure can be leveraged for robust global interpretability and efficient feature pruning: aggregate attributions are computed via parallelizable tensor contractions, with the guarantee that a small aggregate measure corresponds to a negligible per-feature effect in the model (Marzouk et al., 24 Oct 2025, Bhattacharjee et al., 29 Mar 2025).
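The discarding principle can be checked numerically on a toy model. The brute-force sketch below (illustrative names, exponential cost, product-of-Bernoulli distribution assumed) computes the aggregate SHAP of a weakly relevant feature and compares it with the $L^2$ gap left after averaging that feature out:

```python
import itertools
import math
import numpy as np

def shap_values(f, x, probs):
    """Exact SHAP at x under a product-of-Bernoulli distribution (brute force)."""
    d = len(x)
    def v(S):
        out = [j for j in range(d) if j not in S]
        total = 0.0
        for z in itertools.product([0, 1], repeat=len(out)):
            pt, w = list(x), 1.0
            for j, zj in zip(out, z):
                pt[j] = zj
                w *= probs[j] if zj else 1.0 - probs[j]
            total += w * f(pt)
        return total
    phi = np.zeros(d)
    for i in range(d):
        rest = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(rest, r):
                c = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[i] += c * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy model in which feature 2 has a tiny coefficient.
f = lambda x: 2.0 * x[0] + 3.0 * x[1] + 0.01 * x[2]
probs = [0.5, 0.5, 0.5]
pts = list(itertools.product([0, 1], repeat=3))
weight = lambda x: math.prod(p if xi else 1 - p for xi, p in zip(x, probs))

# Aggregate SHAP of feature 2 over the product-of-marginals support.
agg = sum(weight(x) * abs(shap_values(f, x, probs)[2]) for x in pts)

# Discard feature 2 by averaging it out: g no longer depends on x2.
g = lambda x: 0.5 * (f([x[0], x[1], 0]) + f([x[0], x[1], 1]))
l2 = math.sqrt(sum(weight(x) * (f(x) - g(x)) ** 2 for x in pts))
```

Here both quantities come out small and of the same order, matching the qualitative content of the bound: a feature with negligible aggregate attribution can be marginalized away with negligible change to the model.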
7. Significance and Broader Implications
The Marginal SHAP Tensor supplies a unified, algebraic and computational framework for SHAP value analysis and computation beyond the classical tabular or ad hoc settings:
- It reduces the exponential coalitional sum underlying SHAP values to a single structured tensor contraction, making the problem tractable for TT-type and similarly structured models.
- The construction generalizes efficiently to classes reducible to TTs, providing new parallel algorithms for SHAP in decision trees, tree ensembles, linear models, and linear RNNs.
- Parameterized complexity results for BNNs localize the computational hardness to network width, demonstrating that width and not depth is the limiting factor for tractable attributions in TN-aligned representations.
- These results advance understanding of the computational–statistical tradeoffs in explainable AI, clarifying the boundaries of efficient SHAP computation and the formal justification for global feature importance aggregation in both synthetic and practical modeling scenarios (Marzouk et al., 24 Oct 2025, Bhattacharjee et al., 29 Mar 2025).