Kernel Interpretability Study
- Kernel interpretability is the process of extracting human-readable insights from kernel-based models using techniques like feature-kernel correspondence and information-theoretic optimality.
- The study demonstrates reformulating kernel methods as weighted linear models and applying sparse kernel expansions and prototype learning to achieve feature-level interpretability.
- Advanced attribution methods and programmable kernel networks extend interpretability to deep architectures, enabling clear, actionable insights and robust performance guarantees.
Kernel interpretability addresses the challenge of extracting human-understandable insight from kernel-based methods in both classical machine learning and modern deep learning architectures. Approaches range from rigorous mathematical frameworks that establish feature-kernel correspondences and information-theoretic optimality, to practical algorithms enabling feature-level attributions, prototype summarization, clustering explanations, and transparent model decompositions. The field now integrates techniques for exact and approximate equivalence, sparse kernel expansions, post-hoc attributions, and logic-based symbolic interfaces, covering SVMs, kernel PCA, clustering, RL, and deep models.
1. Feature Subspace–Kernel Correspondence and Information-Theoretic Optimality
In the kernel feature space with inner product $\langle\cdot,\cdot\rangle$, every finite-dimensional linear subspace is associated with a unique symmetric positive-definite projection kernel, constructed from any zero-mean basis of the subspace together with its covariance. The induced integral operator yields exact orthogonal projection onto that subspace; thus, there is a one-to-one correspondence between finite-rank kernels and feature subspaces.
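The subspace–kernel correspondence can be checked concretely in a finite-dimensional feature space. The sketch below (plain NumPy; the basis `B` and the dimensions are illustrative choices, not from the paper) builds the projector associated with a subspace and verifies that the induced finite-rank kernel acts as an exact orthogonal projection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Explicit finite-dimensional feature space R^6; a rank-3 subspace S
# spanned by the columns of B (a stand-in for a zero-mean basis).
d, r = 6, 3
B = rng.standard_normal((d, r))

# Orthogonal projector onto span(B); (B^T B) plays the role of the
# basis covariance in the finite-rank construction.
P = B @ np.linalg.solve(B.T @ B, B.T)

# Finite-rank projection kernel k_S(x, x') = <P x, x'> on raw features.
def k_S(x, xp):
    return x @ P @ xp

# Idempotence and symmetry: the induced operator projects onto S
# exactly, and applying it twice changes nothing.
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)
assert np.linalg.matrix_rank(P) == r

# The kernel is symmetric, and a vector already in S is left unchanged.
a, b = rng.standard_normal(d), rng.standard_normal(d)
assert np.isclose(k_S(a, b), k_S(b, a))
v_in = B @ rng.standard_normal(r)
assert np.allclose(P @ v_in, v_in)
```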
To quantify kernel informativeness, an information-theoretic H-score is introduced for individual features and, equivalently, for feature subspaces. The score captures the normalized separation of class-conditional means and, via HGR maximal-correlation theory, its maximization leads to the maximal-correlation kernel, which is optimal for prediction error among all projection kernels.
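The normalized-separation idea can be illustrated with the correlation ratio, a classical quantity closely related to what the score measures (a didactic stand-in, not the paper's exact definition):

```python
import numpy as np

rng = np.random.default_rng(1)

def correlation_ratio(f, y):
    """Var_Y(E[f | Y]) / Var(f): the normalized separation of
    class-conditional means (eta^2), closely tied to HGR
    maximal correlation. A stand-in for the H-score idea."""
    f = f - f.mean()
    classes, counts = np.unique(y, return_counts=True)
    between = sum(
        (counts[i] / len(y)) * f[y == c].mean() ** 2
        for i, c in enumerate(classes)
    )
    return between / f.var()

y = rng.integers(0, 2, 4000)
informative = y + 0.05 * rng.standard_normal(4000)  # near-deterministic in y
noise = rng.standard_normal(4000)                   # independent of y

# Informative features score near 1, uninformative ones near 0.
assert correlation_ratio(informative, y) > 0.9
assert correlation_ratio(noise, y) < 0.05
```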
In balanced binary SVMs, kernels built from the maximal-correlation function lead to minimal prediction error, recovering the MAP classifier. The Fisher kernel is also shown to coincide with a maximal-correlation kernel in exponential-family generative models, establishing equivalence and optimality in local neighborhoods (Xu et al., 2023).
2. Re-expressing Kernel Solutions for Feature-level Interpretability
Traditional kernel machines do not offer a direct map from predictions to the original feature space. The interpretable kernels framework proves that for data matrices with more features than observations ($p > n$, full row rank), any kernel ridge regression or convex penalized kernel model can be algebraically reformulated as a weighted linear model in the original features with an anisotropic penalty,
$$\min_{\beta}\ \|y - X\beta\|^2 + \lambda\, \beta^{\top} \Omega\, \beta,$$
where the penalty matrix $\Omega$ encodes the geometry of the original kernel and $\beta$ has a direct feature interpretation. In the $n > p$ regime, a least-squares approximation recovers approximate interpretability, with fidelity governed by the fraction of the kernel that the linearization accounts for.
Extended to kernel logistic and Poisson regression, the method enables penalized GLMs with explicit anisotropic shrinkage. Computational cost is dominated by eigendecompositions. For correlated predictors, feature interpretations remain valid, though the penalty induces non-trivial shrinkage patterns. Thus, kernel methods admit interpretable coefficients whenever the problem admits an explicit linearization in the original space (Groenen et al., 21 Aug 2025).
3. Interpretable Kernel Prototyping and Feature Selection
Prototype-based interpretability is formalized in Interpretable Multiple-Kernel Prototype Learning (IMKPL). Prototypes are learned via block-coordinate minimization of an objective that integrates sparse reconstruction, class-homogeneity regularization, explicit feature selection via the kernel-combination weights, locality separation, and interpretability constraints.
Quantification employs dedicated metrics for prototype purity and discriminative usage; empirical evaluation demonstrates that IMKPL consistently improves interpretability and discriminative grouping over baseline kernel-dictionary methods, with substantially smaller prototype dictionaries and better feature-selection capacity. When Gaussian base kernels coincide with data dimensions, the selected combination coefficients yield direct discriminative feature selection and minimal prototype overlap. The ability to re-weight or drop base kernels enables simultaneous discovery of locally separated RKHS embeddings, one-class prototype purity, and sparse discriminative feature subsets (Hosseini et al., 2019).
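The one-base-kernel-per-dimension setting makes the feature-selection mechanism transparent: zeroing a combination weight de-selects that feature entirely. A minimal sketch (weights and dimensions invented for illustration; this is the mechanism, not the IMKPL algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)

def dim_kernel(X, Xp, m, gamma=1.0):
    """Gaussian base kernel that sees only feature m."""
    d = X[:, m, None] - Xp[None, :, m]
    return np.exp(-gamma * d ** 2)

def combined_kernel(X, Xp, beta):
    """Weighted combination of per-dimension base kernels."""
    return sum(b * dim_kernel(X, Xp, m) for m, b in enumerate(beta))

X = rng.standard_normal((8, 3))
beta = np.array([0.7, 0.3, 0.0])   # weight 0 drops feature 2 entirely

X_shift = X.copy()
X_shift[:, 2] += rng.standard_normal(8)   # perturb only the dropped feature

K1 = combined_kernel(X, X, beta)
K2 = combined_kernel(X_shift, X_shift, beta)
assert np.allclose(K1, K2)   # kernel is invariant to the de-selected feature
```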
4. Attribution Methods via Kernel-based Feature Weighting
Additive Feature Attribution (AFA) methods such as SHAP, LIME, and their alternatives are unified as solutions to a constrained, kernel-weighted least-squares game in the feature-coalition space: for any coalition kernel $w(\cdot)$,
$$\min_{\phi}\ \sum_{S \subseteq N} w(S)\Big(v(S) - \phi_0 - \sum_{i \in S} \phi_i\Big)^2 \quad \text{s.t.}\quad \sum_{i \in N} \phi_i = v(N) - v(\emptyset).$$
Novel kernels are proposed: the LS prenucleolus corresponds to a flat kernel, while linearly and exponentially growing kernels emphasize coalition locality. SHAP is recovered as a concave kernel in coalition size, which theoretically emphasizes singleton and almost-full perturbations and thus may be less aligned with interpretive locality than LIME. This analytic framework enables explicit tuning of bias-variance-locality trade-offs and selection of attribution methods according to kernel properties (Hiraki et al., 2024).
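The game-theoretic view can be made concrete for a toy three-player game. The sketch below solves the efficiency-constrained weighted least-squares problem via its KKT system; the flat and linearly growing kernels follow the taxonomy above, and for an additive game any symmetric coalition kernel must return the per-player worths (a sanity check, not the papers' experiments):

```python
import numpy as np
from itertools import combinations

def attribute(v, n, weight):
    """Solve min_phi sum_S weight(|S|) (v(S) - v(empty) - sum_{i in S} phi_i)^2
    subject to efficiency: sum_i phi_i = v(N) - v(empty)."""
    players = list(range(n))
    rows, w, t = [], [], []
    for size in range(1, n):              # proper, non-empty coalitions
        for S in combinations(players, size):
            z = np.zeros(n)
            z[list(S)] = 1.0
            rows.append(z)
            w.append(weight(size))
            t.append(v(set(S)) - v(set()))
    A, W = np.array(rows), np.diag(w)
    total = v(set(players)) - v(set())
    # KKT system for equality-constrained weighted least squares.
    M = np.block([[2 * A.T @ W @ A, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.concatenate([2 * A.T @ W @ np.array(t), [total]])
    return np.linalg.solve(M, rhs)[:n]

# Additive game: v(S) = sum of per-player worths, so attributions must
# recover those worths under ANY symmetric coalition-size kernel.
c = np.array([1.0, 2.0, 3.0])
v = lambda S: sum(c[i] for i in S)

flat = attribute(v, 3, lambda s: 1.0)            # flat (LS prenucleolus) kernel
growing = attribute(v, 3, lambda s: float(s))    # linearly growing kernel
assert np.allclose(flat, c) and np.allclose(growing, c)
```

Non-additive games are where the kernels disagree: changing `weight` then shifts how coalition locality is traded off, which is exactly the tuning knob the framework exposes.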
Weighted least-squares extension to higher-order interactions yields KernelSHAP-IQ, which is proven exact for Shapley values (SV) and pairwise Shapley interaction indices (SII), and conjectured exact for higher orders. Via iterative residual fitting and sampling strategies, KernelSHAP-IQ delivers efficient, high-fidelity interaction attributions for real models, outperforming prior methods (Fumagalli et al., 2024).
5. Sparse and Programmable Kernel Networks for Interpretability
Sparse Pre-image Kernel Machine (SPKM) directly addresses the interpretability deficit of classical representer-theorem-based kernel learning. The SPKM paradigm posits models as sparse expansions over learned, potentially sparse basis vectors $z_j$ in the original space:
$$f(x) = \sum_{j=1}^{m} \alpha_j\, k(z_j, x), \qquad m \ll n.$$
Dual sparsity (few nonzero expansion coefficients) and primal sparsity (basis vectors sparse in coordinates) yield prototype-style explanations, direct feature-level insights, and highly compact models. Theoretical Rademacher bounds guarantee generalization parity with traditional kernel machines. Empirical studies confirm interpretability with competitive accuracy, substantial speedup, and utility as task-aware landmarks for Nyström approximations (Huusari et al., 2021).
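The expansion form is easy to sketch. Below, a noise-free target is generated from a three-landmark expansion and the coefficients are recovered by least squares; the landmarks are held fixed at the ground truth to keep the sketch convex, whereas SPKM additionally *learns* the landmarks, which is what yields prototype-style explanations (all values here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf(Z, X, gamma=0.5):
    """Gaussian kernel matrix between landmark rows Z and data rows X."""
    d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Ground truth: a sparse pre-image expansion with 3 landmarks in R^2.
Z_true = rng.standard_normal((3, 2))
a_true = np.array([1.5, -2.0, 0.7])
X = rng.standard_normal((100, 2))
y = a_true @ rbf(Z_true, X)

# Fit the expansion coefficients alpha by least squares on the 3 landmarks.
K = rbf(Z_true, X)                       # 3 x 100
alpha = np.linalg.lstsq(K.T, y, rcond=None)[0]

assert np.allclose(alpha, a_true)        # exact recovery (noise-free)
assert np.allclose(alpha @ rbf(Z_true, X), y)
```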
Programmable regression networks operate modularly, decomposing observed signals into optimally recovered modes by solving a product-space norm minimization—equivalent to saddle-point games and kriging. Nonlinear recomposition steps (pooling, graph cuts) operate on energy/variance criteria with explicit meaning; all stages admit theoretical guarantees and provable min-max optimality. Assemblies structurally mimic CNNs, but every step has interpretable recovery, decomposition, and pooling functions (Owhadi et al., 2019).
6. Interpretability in Deep Kernel and RKHS-based Architectures
Kernel interpretability extends beyond classical kernels into CNNs and RL. For CNNs, constrained optimization is used to construct modified inputs that preserve the activation of a target kernel while suppressing all others, achieving crisp disentanglement of mixed-feature regions. Performance metrics indicate superiority over traditional attribution methods in fidelity and specificity, while multiple solutions reveal robust sub-features. Limitations include tuning sensitivity and the need for more expressive regularizers (Zhuang et al., 2021).
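The activation-preserving construction can be sketched with linear "filters": gradient steps suppress all non-target responses while a projection restores the target activation after each step. Everything here (dimensions, step size, linearity of the filters) is illustrative, not the paper's CNN setup:

```python
import numpy as np

rng = np.random.default_rng(7)
d, k, target = 10, 4, 0          # 10-dim input, 4 unit-norm linear filters
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

x = rng.standard_normal(d)
r0 = W[target] @ x               # the activation we must preserve

others = np.delete(W, target, axis=0)
eta = 0.1
for _ in range(3000):
    # Gradient step on sum of squared non-target responses ...
    x = x - eta * others.T @ (others @ x)
    # ... then project back onto the constraint {w_target . x = r0}.
    x = x + (r0 - W[target] @ x) * W[target]

# Target activation is preserved exactly; all other filters are silenced.
assert np.isclose(W[target] @ x, r0)
assert np.max(np.abs(others @ x)) < 1e-3
```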
Neurosymbolic methods (NeSyFOLD-G) construct symbolic rule sets from last-layer kernel groupings in CNNs via cosine similarity, binarization, and logic program induction, subsequently mapping to semantic concepts using segmentation masks. The result is a compact, semantic, human-readable rule set that reduces explainer complexity by 30–60% with no sacrifice in accuracy, and admits full symbolic justification of network decisions (Padalkar et al., 2023).
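A simplified stand-in for the grouping step is greedy cosine-similarity grouping of filter activation vectors (threshold and data invented for illustration; the full pipeline then binarizes group activations and induces a logic program):

```python
import numpy as np

rng = np.random.default_rng(5)

def cosine_groups(acts, threshold=0.9):
    """Greedily group activation vectors whose cosine similarity
    exceeds `threshold` -- a toy version of kernel grouping."""
    norms = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    groups, assigned = [], set()
    for i in range(len(acts)):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, len(acts)):
            if j not in assigned and norms[i] @ norms[j] > threshold:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups

base = rng.standard_normal(16)
near1 = base + 0.01 * rng.standard_normal(16)     # near-duplicate filters
near2 = base + 0.01 * rng.standard_normal(16)
other = rng.standard_normal(16)
other -= (other @ base) / (base @ base) * base    # orthogonal to base
acts = np.stack([near1, near2, other])

groups = cosine_groups(acts)
assert groups == [[0, 1], [2]]   # duplicates merge; the unrelated filter stays alone
```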
For RL, attribution-weighted actor-critic methods (RSA2C) embed actor, value, and advantage critics in RKHSs with operator-valued and scalar kernels, respectively, and compute state-value attributions via RKHS–SHAP (kernel mean embedding for on-manifold and conditional mean embedding for off-manifold expectations). Attributions are directly used for Mahalanobis-gated gradient modulation, with provable non-asymptotic convergence bounds and empirical confirmation of efficiency, stability, and rich attribution maps for state-dimension impacts (Li et al., 4 Dec 2025).
Unified training and attribution frameworks (ExPLAIND) employ path kernels—kernel expansions tracking gradient trajectories over training steps and parameters—yielding exact equivalence between trained networks and kernel machines, and enabling step- and parameter-wise influence scores that facilitate pruning, mechanistic analysis, and unified model/data/training attributions. Application to grokking phenomena shows alignment and memorization phases explicable via kernel component metrics (Eichin et al., 26 May 2025).
7. Kernel Selection, Identifiability, and Recommendations
For Gaussian process models, additive Matérn mixtures collapse in behavior to their least smooth (lowest-regularity) component, as demonstrated both theoretically and empirically. Individual mixture parameters—weights, scales, smoothness—are unidentifiable except through their microergodic combination. In separable multi-output kernels, only the relative correlations (up to scale) are identifiable. Empirical studies show no predictive advantage of mixtures over pure least-smooth kernels. Practitioners should select kernel structures with care, avoiding over-parameterized mixtures if interpretability or identifiability of sub-kernel parameters is desired (Chen et al., 2023).
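The collapse to the least-smooth component is visible directly in the kernels' behavior at the origin, which governs sample-path roughness. A minimal check with an equal-weight Matérn-1/2 + Matérn-5/2 mixture (weights chosen for illustration):

```python
import numpy as np

def matern12(h):               # Matern nu=1/2: continuous, not differentiable
    return np.exp(-np.abs(h))

def matern52(h):               # Matern nu=5/2: twice differentiable
    a = np.sqrt(5) * np.abs(h)
    return (1 + a + a ** 2 / 3) * np.exp(-a)

w = 0.5                        # equal-weight additive mixture
def mixture(h):
    return w * matern12(h) + (1 - w) * matern52(h)

# Near the origin, k(0) - k(h) is ~ w*|h| from the rough component but
# only O(h^2) from the smooth one: the mixture's local behavior is set
# by the least smooth term, no matter the weight on the smooth one.
h = 1e-4
inc = mixture(0.0) - mixture(h)
assert abs(inc / h - w) < 1e-3                     # linear decay, slope ~ w
assert (matern52(0.0) - matern52(h)) / h < 1e-3    # smooth part negligible
```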
8. Kernel PCA Feature Importance and Interpretability
Kernel PCA Interpretable Gradient (KPCA-IG) computes feature-level importances by analyzing the sensitivity of nonlinear principal components to perturbations of the original variables. For each feature, the squared norm of the gradient of the projection onto the leading components is averaged over the data, yielding an interpretable ranking. Empirical results confirm that KPCA-IG-selected features retain (or improve) clustering quality and explained variance, with computational cost linear in the number of features and cubic in sample size, enabling high-dimensional analysis of genomics data and reliable biomarker selection (Briscik et al., 2023).
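The gradient-based importance idea can be sketched for an RBF kernel. The snippet below uses uncentered KPCA for brevity and is a simplified illustration of the mechanism, not the exact KPCA-IG implementation; the data are constructed so that only feature 0 carries signal:

```python
import numpy as np

rng = np.random.default_rng(6)
gamma = 0.5

# Feature 0 carries the signal; features 1-2 are near-constant noise.
n = 60
X = np.column_stack([rng.standard_normal(n) * 2.0,
                     rng.standard_normal(n) * 0.01,
                     rng.standard_normal(n) * 0.01])

# Kernel PCA: eigendecomposition of the (uncentered, for brevity) RBF kernel.
D2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-gamma * D2)
vals, vecs = np.linalg.eigh(K)
alpha = vecs[:, ::-1][:, :2] / np.sqrt(vals[::-1][:2])   # top 2 components

# Gradient of the projection p_j(x) = sum_i alpha_ij k(x, x_i) w.r.t. x,
# evaluated at each training point; importance = mean squared gradient.
imp = np.zeros(X.shape[1])
for t in range(n):
    diff = X[t] - X                                       # n x d
    kx = np.exp(-gamma * (diff ** 2).sum(-1))
    grads = alpha.T @ (-2 * gamma * diff * kx[:, None])   # 2 x d
    imp += (grads ** 2).sum(0)
imp /= n

# The signal-carrying feature dominates the ranking.
assert imp[0] > imp[1] and imp[0] > imp[2]
```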