Sparse GP MIL in Digital Pathology
- The paper introduces SGPMIL, a probabilistic MIL framework that leverages sparse Gaussian processes to generate uncertainty-aware attention maps for digital pathology.
- It employs variational inference with a learnable feature scaling term, enabling efficient and robust bag-level and instance-level classification.
- Empirical results on benchmarks like CAMELYON16 and PANDA demonstrate improved classification performance and faster computation compared to traditional methods.
Sparse Gaussian Process Multiple Instance Learning (SGPMIL) is a probabilistic attention-based framework designed for Multiple Instance Learning (MIL) settings, particularly in domains where only coarse, bag-level labels are available and instance-level annotations are absent. MIL is prominent in digital pathology, where input images are partitioned into large sets of patches (instances), but only slide-level diagnostic outcomes ("bags") are available. SGPMIL leverages sparse Gaussian Processes (SGP) to model uncertainty over attention scores, resulting in principled uncertainty quantification, more reliable instance relevance estimation, and interpretability at both bag and instance levels (Lolos et al., 11 Jul 2025).
1. Problem Formulation and Model Architecture
SGPMIL operates within the standard MIL framework: the training set consists of bags $\{(\mathbf{X}_b, y_b)\}_{b=1}^{B}$, where $\mathbf{X}_b = \{\mathbf{x}_{b,1}, \dots, \mathbf{x}_{b,N_b}\}$ with $\mathbf{x}_{b,i} \in \mathbb{R}^d$ represents the $N_b$ instances of bag $b$. Only bag-level labels $y_b$ are observed. The core MIL assumption is $y_b = \max_i y_{b,i}$, where $y_{b,i}$ denote the latent instance labels.
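The MIL assumption above can be illustrated in a few lines (a toy sketch; `bag_label` and the latent labels are illustrative, not part of SGPMIL's code):

```python
import numpy as np

# Standard MIL assumption: a bag is positive iff at least one of its
# (latent) instance labels is positive, i.e. y_b = max_i y_{b,i}.
def bag_label(instance_labels):
    return int(np.max(instance_labels))

# e.g. a bag of four patches where only the third is (latently) tumorous
# is still a positive bag at the slide level.
positive_bag = np.array([0, 0, 1, 0])
negative_bag = np.array([0, 0, 0, 0])
```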
SGPMIL defines a latent function $f$, operating on a linear projection of instance embeddings, to generate unnormalized attention scores. $f$ is assigned a Gaussian process prior,
$$f \sim \mathcal{GP}\big(m(\cdot),\, k(\cdot, \cdot)\big),$$
with mean function $m$ (zero by default) and RBF kernel
$$k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\!\Big(-\tfrac{1}{2}(\mathbf{x} - \mathbf{x}')^\top \Lambda^{-1} (\mathbf{x} - \mathbf{x}')\Big),$$
parameterized by amplitude $\sigma$, lengthscale matrix $\Lambda$, and jitter $\epsilon$.
To ensure scalability, SGPMIL employs $M$ inducing points $\mathbf{Z}$ and their latent values $\mathbf{U} = f(\mathbf{Z})$. The joint GP prior over $\mathbf{F}$ (function values at all instances) and $\mathbf{U}$ is Gaussian:
$$p(\mathbf{F}, \mathbf{U}) = \mathcal{N}\left( \begin{bmatrix}\mathbf{F}\\ \mathbf{U}\end{bmatrix} \,\middle|\, \begin{bmatrix} m_{\mathbf{X}} \\ m_{\mathbf{Z}} \end{bmatrix}, \begin{bmatrix} K_{XX} & K_{XZ} \\ K_{ZX} & K_{ZZ} \end{bmatrix} \right),$$
where the kernel matrices are defined as $[K_{XX}]_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, $[K_{XZ}]_{ij} = k(\mathbf{x}_i, \mathbf{z}_j)$, and $K_{ZX} = K_{XZ}^\top$.
The conditional at the instance locations is:
$$p(\mathbf{F} \mid \mathbf{U}) = \mathcal{N}\big(\mathbf{F} \,\big|\, m_{\mathbf{X}} + K_{XZ} K_{ZZ}^{-1}(\mathbf{U} - m_{\mathbf{Z}}),\; K_{XX} - K_{XZ} K_{ZZ}^{-1} K_{ZX}\big).$$
SGPMIL augments the prior mean with a learnable linear-affine feature scaling term,
$$m_{\mathbf{X}} = \mathbf{H}\mathbf{a} + b,$$
with $\mathbf{H}$ the matrix of projected features and $\mathbf{a}, b$ learnable, enabling adaptation to the scale of the learned representations.
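A minimal NumPy sketch of these building blocks, assuming an isotropic RBF kernel and the affine mean form above (function and variable names are illustrative, not from the SGPMIL reference implementation):

```python
import numpy as np

def rbf_kernel(A, B, amplitude=1.0, lengthscale=1.0):
    # k(x, x') = amplitude^2 * exp(-||x - x'||^2 / (2 * lengthscale^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return amplitude**2 * np.exp(-0.5 * sq / lengthscale**2)

def conditional_mean(H, Z, U, a, b, amplitude=1.0, lengthscale=1.0, jitter=1e-6):
    # Affine feature-scaling mean m(.) = (.) @ a + b at both instance
    # locations H and inducing locations Z.
    m_X = H @ a + b
    m_Z = Z @ a + b
    K_XZ = rbf_kernel(H, Z, amplitude, lengthscale)
    K_ZZ = rbf_kernel(Z, Z, amplitude, lengthscale) + jitter * np.eye(len(Z))
    # E[F | U] = m_X + K_XZ K_ZZ^{-1} (U - m_Z)
    return m_X + K_XZ @ np.linalg.solve(K_ZZ, U - m_Z)
```

A sanity check on the formula: when the inducing values $\mathbf{U}$ coincide with the prior mean $m_{\mathbf{Z}}$, the correction term vanishes and the conditional mean reduces to $m_{\mathbf{X}}$.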
2. Variational Inference and Optimization
SGPMIL employs variational inference to approximate the posterior over latent GP values. The variational posterior is
$$q(\mathbf{U}) = \mathcal{N}(\mathbf{U} \mid \mathbf{m}, \mathbf{S}),$$
leading to a marginal over $\mathbf{F}$ computed as
$$q(\mathbf{F}) = \int p(\mathbf{F} \mid \mathbf{U})\, q(\mathbf{U})\, d\mathbf{U},$$
yielding a closed-form Gaussian with mean $m_{\mathbf{X}} + K_{XZ} K_{ZZ}^{-1}(\mathbf{m} - m_{\mathbf{Z}})$ and covariance $K_{XX} - K_{XZ} K_{ZZ}^{-1}(K_{ZZ} - \mathbf{S}) K_{ZZ}^{-1} K_{ZX}$.
Training maximizes the Evidence Lower Bound (ELBO)
$$\mathcal{L} = \sum_{b} \mathbb{E}_{q(\mathbf{f}_b)}\big[\log p(y_b \mid \mathbf{f}_b)\big] - \mathrm{KL}\big(q(\mathbf{U}) \,\|\, p(\mathbf{U})\big),$$
where $\mathbf{f}_b$ are the latent attention variables of bag $b$. For each bag $b$, the expectation is computed via Monte Carlo sampling,
$$\mathbb{E}_{q(\mathbf{f}_b)}\big[\log p(y_b \mid \mathbf{f}_b)\big] \approx \frac{1}{S} \sum_{s=1}^{S} \log p\big(y_b \mid \mathbf{f}_b^{(s)}\big),$$
with each likelihood term implemented as cross-entropy under the downstream classification network.
Optimization utilizes stochastic variational inference with the local reparameterization trick for efficient sampling, drawing one bag and $S$ attention samples per minibatch. All GP parameters (inducing locations $\mathbf{Z}$, variational parameters $\mathbf{m}, \mathbf{S}$, kernel hyperparameters $\sigma, \Lambda$) and the classifier parameters are updated via AdamW.
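The two ELBO terms can be sketched as follows. This is a simplified illustration, not SGPMIL's training code: a Bernoulli likelihood on a crude bag score (the mean of the sampled latent scores) stands in for the attention-pooled downstream classifier, and the variational covariance is taken diagonal as in the paper:

```python
import numpy as np

def kl_to_prior(m, S_diag, K_ZZ):
    # Closed-form KL( N(m, diag(S_diag)) || N(0, K_ZZ) ) between Gaussians.
    Kinv = np.linalg.inv(K_ZZ)
    tr = (np.diag(Kinv) * S_diag).sum()          # trace(K_ZZ^{-1} S)
    quad = m @ Kinv @ m                          # m^T K_ZZ^{-1} m
    logdet = np.linalg.slogdet(K_ZZ)[1] - np.log(S_diag).sum()
    return 0.5 * (tr + quad - len(m) + logdet)

def elbo_mc(y, mu, var, m, S_diag, K_ZZ, S=32, rng=None):
    # Monte Carlo estimate of E_q[log p(y | f)] minus the KL term.
    rng = rng or np.random.default_rng(0)
    f = mu + np.sqrt(var) * rng.standard_normal((S, len(mu)))  # f^(s) ~ q(f)
    p = 1.0 / (1.0 + np.exp(-f.mean(axis=1)))    # toy per-sample bag score
    loglik = np.log(np.where(y == 1, p, 1 - p)).mean()
    return loglik - kl_to_prior(m, S_diag, K_ZZ)
```

One useful check on `kl_to_prior`: when the variational posterior equals the prior (zero mean, unit diagonal, identity $K_{ZZ}$), the KL term is exactly zero.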
3. Attention Mechanism and Uncertainty Quantification
SGPMIL generates uncertainty-aware attention maps by treating the GP posterior at each instance $i$ as $q(f_i) = \mathcal{N}(\mu_i, \sigma_i^2)$. $S$ samples $f_i^{(s)}$ are drawn and mapped through a sigmoid:
$$a_i^{(s)} = \mathrm{sigmoid}\big(f_i^{(s)}\big).$$
Normalization is performed using the relaxed constraint
$$\tilde{a}_i^{(s)} = \frac{a_i^{(s)}}{\sum_j a_j^{(s)}},$$
rather than a hard softmax, permitting multiple patches with high attention per bag.
The bag embedding for each sample is computed as $\mathbf{g}_b^{(s)} = \sum_i \tilde{a}_i^{(s)} \mathbf{h}_i$, then classified into $\hat{y}_b^{(s)}$. At inference, the mean attention $\bar{a}_i = \frac{1}{S} \sum_s \tilde{a}_i^{(s)}$ and the sample variance quantify uncertainty. Heatmaps use $\bar{a}_i$; uncertainty maps use the variance. At the bag level, the standard deviation of predicted probabilities across samples correlates with misclassification likelihood.
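The sampling, sigmoid normalization, and pooling steps can be sketched compactly (an illustrative sketch under the diagonal-posterior assumption; names are not from the reference implementation):

```python
import numpy as np

def sample_attention(mu, var, H, S=16, rng=None):
    # mu, var: per-instance posterior mean/variance of the latent scores.
    # H: (N, d) matrix of instance features to pool into bag embeddings.
    rng = rng or np.random.default_rng(0)
    f = mu + np.sqrt(var) * rng.standard_normal((S, len(mu)))  # f^(s)
    a = 1.0 / (1.0 + np.exp(-f))                 # sigmoid scores in (0, 1)
    a = a / a.sum(axis=1, keepdims=True)         # sum-normalize (no softmax)
    bags = a @ H                                 # one bag embedding per sample
    # Mean attention drives heatmaps; the variance drives uncertainty maps.
    return a.mean(axis=0), a.var(axis=0), bags
```

Because each sampled attention vector is normalized to sum to one, the mean attention map also sums to one, while high per-instance variance flags patches where the model's relevance estimate is unreliable.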
4. Computational Efficiency and Scalability
SGPMIL achieves scalability to large datasets via sparse approximation and computational optimizations:
- For $N$ instances per bag, $M$ inducing points, and $d$-dimensional features:
- Exact GP complexity is $\mathcal{O}(N^3)$.
- SGPMIL with diagonal covariance:
- Kernel-times-vector products $K_{XZ}\mathbf{v}$: $\mathcal{O}(NM)$
- Inversion of $K_{ZZ}$: $\mathcal{O}(M^3)$ (cached per epoch)
- Local reparameterization sampling: $\mathcal{O}(NS)$ for $S$ attention samples
- Amortized per-bag cost is $\mathcal{O}(NM^2 + M^3)$; memory usage is $\mathcal{O}(NM + M^2)$.
Efficiency is further improved by:
- A diagonal approximation to the variational covariance $\mathbf{S}$, avoiding full Cholesky decompositions.
- The feature-scaling linear mean term, which accelerates kernel hyperparameter adaptation.
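A back-of-the-envelope cost model makes the scaling advantage concrete (a toy operation count under the complexities listed above, not measured runtimes):

```python
# Toy cost model: operation counts for the dominant terms only.
def exact_gp_cost(N):
    # O(N^3): inverting the full N x N kernel matrix K_XX.
    return N**3

def sgpmil_cost(N, M, S):
    # O(NM^2): projections through K_XZ, O(M^3): (cached) K_ZZ inverse,
    # O(NS): local-reparameterization sampling with diagonal covariance.
    return N * M**2 + M**3 + N * S

# e.g. a slide with 10,000 patches, 64 inducing points, 16 samples:
# the sparse cost is orders of magnitude below the exact-GP cost.
sparse = sgpmil_cost(10_000, 64, 16)
exact = exact_gp_cost(10_000)
```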
5. Empirical Performance and Evaluation
SGPMIL demonstrates competitive to superior performance on prominent digital pathology benchmarks—CAMELYON16, TCGA-NSCLC, PANDA (6-class), and BRACS (3-class):
- For bag-level classification, SGPMIL matches or outperforms state-of-the-art deterministic attention-based MIL methods; on PANDA and BRACS it achieves 2–3% gains in ACC/AUC.
- Instance-level evaluation computes ROC AUC, ACC, and ACE using attention mean as positive-patch probability against pixel-level tumor masks. On CAMELYON16, SGPMIL attains instance-AUC 0.973 and instance-ACE 0.051, best among all methods tested.
Ablation analysis (Table 1) shows that the learnable mean (LM/feature scaling), sigmoid (vs. softmax) post-activation, and diagonal covariance jointly provide the fastest and most accurate configuration (bag-AUC 0.986, instance-AUC 0.973).
| Configuration | Bag-AUC | Instance-AUC | Relative Speed |
|---|---|---|---|
| LM + sigmoid + diagonal | 0.986 | 0.973 | |
6. Context, Advantages, and Future Directions
SGPMIL provides a Bayesian treatment of instance attention, leading to calibrated uncertainty for both instance and bag/slide prediction. Feature-scaling of the GP mean and diagonal-covariance structure enable runtimes comparable to deterministic attention-based MIL. Improved instance localization and correlation between confidence estimates and accuracy are notable benefits.
Present limitations include focus on unimodal image patches. Potential extensions are integration of clinical or genomic covariates, use of multi-scale or learned kernels to relax single-scale RBF assumptions, and exploration of multi-layer GPs (deep GP) or non-Gaussian likelihoods, e.g., Poisson for count data.
In summary, SGPMIL synthesizes sparse Gaussian processes, learnable feature scaling, and efficient variational inference to deliver a fast, stable, and uncertainty-aware attention-based MIL model exhibiting state-of-the-art bag and instance-level performance in computational pathology (Lolos et al., 11 Jul 2025).