Interpretable VAMP-Net for MTB Drug Resistance
- The paper introduces VAMP-Net, a hybrid deep learning model that combines a permutation-invariant set attention transformer with a quality-aware 1D-CNN to accurately predict drug resistance in MTB.
- It employs an adaptive gating fusion mechanism to integrate biological and technical features, achieving superior performance with AUC scores close to 97%.
- The framework enhances interpretability with attention analyses and integrated gradients, facilitating insights into epistatic interactions and variant-level quality for clinical applications.
The Interpretable Variant-Aware Multi-Path Network (VAMP-Net) is a supervised deep learning framework for robust genomic prediction of drug resistance in Mycobacterium tuberculosis (MTB). It implements two complementary machine learning pathways: a permutation-invariant Set Attention Transformer to capture epistatic interactions between genomic loci, and a quality-aware 1D Convolutional Neural Network (CNN) for adaptive assessment of variant-level quality metrics. These are fused via an adaptive gating mechanism for final resistance classification, yielding high predictive accuracy while providing interpretable outputs at both the biological and technical levels (Boutorh et al., 25 Dec 2025).
1. Permutation-Invariant Set Attention Transformer: Path-1
Path-1 processes each MTB isolate as an unordered set of genomic variant tokens $\{v_1, \dots, v_n\}$, with each token represented as ChromPos_Ref>Alt (e.g., “761139_C>A”). Tokens are embedded via a shared embedding matrix $E$ to yield $X \in \mathbb{R}^{L \times d}$ for padded length $L$, with no positional encoding, enforcing strict permutation invariance: $f(\pi(X)) = f(X)$ for all permutations $\pi$.
The model utilizes multi-head Set Attention Blocks (SAB) with $h$ heads. Each SAB computes:

$$\mathrm{SAB}(X) = \mathrm{LayerNorm}\big(H + \mathrm{FFN}(H)\big), \qquad H = \mathrm{LayerNorm}\big(X + \mathrm{MHA}(X, X, X)\big)$$

For unmasked attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V$$

With padding-masked attention, a mask $M$ is introduced, with $M_{ij} = -\infty$ where position $j$ is padding and $0$ otherwise, modifying attention to:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + M\right) V$$

SABs are stacked with layer-norm and residual connections (no causal masking). After the SABs, outputs are globally pooled over the non-padded positions, e.g. by mean pooling:

$$z_1 = \frac{1}{|S|} \sum_{i \in S} X^{(\mathrm{out})}_i$$

where $S$ is the set of non-padded token positions and $X^{(\mathrm{out})}$ is the final SAB output.
This pathway enables sensitive modeling of multilocus epistatic architectures by eschewing input order, a crucial feature when variant call sets exhibit arbitrary ordering.
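The order-invariance property described above can be checked directly: a Set Attention Block without positional encoding is permutation-equivariant, and mean pooling makes the pooled embedding permutation-invariant. The sketch below is a minimal illustration under assumed hyperparameters (embedding size 32, 4 heads); it is not the paper's implementation.

```python
# Minimal sketch of Path-1's permutation-invariant set attention.
# Sizes (d_model=32, n_heads=4, vocab=100) are illustrative assumptions.
import torch
import torch.nn as nn

class SAB(nn.Module):
    """One Set Attention Block: multi-head self-attention + FFN, residuals, layer norm."""
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 2 * d_model), nn.ReLU(),
                                 nn.Linear(2 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, pad_mask=None):
        # pad_mask: (B, L) bool, True at padded positions (excluded from attention).
        h, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))

torch.manual_seed(0)
emb = nn.Embedding(100, 32)               # shared token embedding, no positional encoding
sab = SAB().eval()

tokens = torch.tensor([[5, 17, 42, 8]])   # one isolate's variant set
perm   = tokens[:, [2, 0, 3, 1]]          # same set, different order
z  = sab(emb(tokens)).mean(dim=1)         # global mean pooling
zp = sab(emb(perm)).mean(dim=1)
print(torch.allclose(z, zp, atol=1e-5))   # pooled output is order-invariant
```

Because variant call sets arrive in arbitrary order, this invariance means the model's prediction cannot depend on how the VCF happened to be sorted.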
2. Quality-Aware 1D-CNN: Path-2
Path-2 processes variant-level quality features per isolate. Inputs are matrices $X \in \mathbb{R}^{8 \times L}$ over the padded variant axis, with the 8 channels comprising GT, DP, DPF, COV_REF, COV_ALT, FRS, GT_CONF, and GT_CONF_PERCENTILE. Missing values are imputed and features min–max scaled.
Each convolutional block applies a 1D convolution followed by batch normalization, a nonlinearity, and dropout:

$$h = \mathrm{Dropout}\big(\sigma(\mathrm{BN}(\mathrm{Conv1D}(x)))\big)$$

Activation functions use ReLU (or optionally GELU), with kernel size, dropout rate, and L2 regularization treated as tuned hyperparameters. The best configuration (Model A) uses 3 convolutional layers followed by flattening or pooling to generate the quality embedding $z_2$.
This pathway affords adaptive modeling of sequencing evidence, tracking technical variables that influence confidence in each variant call.
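A small sketch of such a quality-aware CNN follows. The 8 channel names match the text; the depth, width, and pooling choice are assumptions for illustration, not the paper's Model A configuration.

```python
# Illustrative sketch of Path-2: a 1D-CNN over per-variant quality channels
# (GT, DP, DPF, COV_REF, COV_ALT, FRS, GT_CONF, GT_CONF_PERCENTILE).
# Depth/width/pooling are assumed, not taken from the paper.
import torch
import torch.nn as nn

class QualityCNN(nn.Module):
    def __init__(self, in_ch=8, width=16, k=3, p_drop=0.2):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.Dropout(p_drop),
            )
        self.net = nn.Sequential(block(in_ch, width),
                                 block(width, width),
                                 block(width, width))   # 3 conv layers, as in Model A
        self.pool = nn.AdaptiveAvgPool1d(1)             # collapse variant axis

    def forward(self, x):                               # x: (B, 8, L), min-max scaled
        return self.pool(self.net(x)).squeeze(-1)       # quality embedding z2: (B, width)

x = torch.rand(4, 8, 50)            # 4 isolates, 8 quality channels, 50 variant positions
z2 = QualityCNN().eval()(x)
print(z2.shape)
```

Pooling over the variant axis keeps the output size fixed regardless of how many variants an isolate carries.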
3. Fusion and Classification Architecture
VAMP-Net integrates both pathways using a gated amplification module, selected as the optimal fusion strategy. A sigmoid gate $g = \sigma(W_g z_2 + b_g)$ computed from the quality embedding modulates the SAB output:

$$z = z_1 \odot (1 + g)$$
Alternatives such as suppression and bipolar adaptive gating were evaluated, but amplification yielded superior accuracy.
The fused vector passes through fully-connected layers with dropout and ReLU, producing a final logit with output probability $p = \sigma(\mathrm{logit})$. Classification uses weighted binary cross-entropy to address class imbalance, with L2 regularization:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \big[ w_{+}\, y_i \log p_i + w_{-}\,(1 - y_i) \log(1 - p_i) \big] + \lambda \lVert \theta \rVert_2^2$$
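The fusion-and-classification head can be sketched as follows. The gate parameterization (a single sigmoid layer amplifying $z_1$ elementwise) and the layer sizes are assumptions; the weighted loss uses PyTorch's `pos_weight` mechanism as a stand-in for the paper's class weighting.

```python
# Sketch of gated-amplification fusion + weighted BCE, under assumed
# parameterizations (single-layer sigmoid gate, illustrative layer sizes).
import torch
import torch.nn as nn

class GatedFusionHead(nn.Module):
    def __init__(self, d1=32, d2=16, hidden=32):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(d2, d1), nn.Sigmoid())
        self.clf = nn.Sequential(nn.Linear(d1, hidden), nn.ReLU(),
                                 nn.Dropout(0.3), nn.Linear(hidden, 1))

    def forward(self, z1, z2):
        g = self.gate(z2)               # gate in (0, 1), from quality embedding
        z = z1 * (1.0 + g)              # amplification: the gate can only boost z1
        return self.clf(z).squeeze(-1)  # final logit

torch.manual_seed(0)
head = GatedFusionHead().eval()
z1, z2 = torch.randn(4, 32), torch.randn(4, 16)
logit = head(z1, z2)
y = torch.tensor([1., 0., 1., 1.])
# Weighted BCE: pos_weight upweights the positive (resistant) class.
loss = nn.functional.binary_cross_entropy_with_logits(
    logit, y, pos_weight=torch.tensor(2.0))
print(logit.shape, loss.item() >= 0.0)
```

Because the amplification gate is bounded in $(1, 2)$, it can strengthen but never erase the biological signal from Path-1, which is one intuition for why it outperformed suppression-style gates.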
4. Interpretability Mechanisms
VAMP-Net provides dual interpretability layers:
- Attention Weight Analysis (Path-1): Extracts the self-attention matrices $A^{(\ell, h)}$ from each layer $\ell$ and head $h$, averages over heads and samples, ranks pairwise interactions, and defines epistatic networks. Variant hubs with high summed attention signal significant genetic interactions.
- Integrated Gradients (Path-1): Computes per-variant saliency for token embedding $x_i$ given baseline $x'$:

  $$\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1} \frac{\partial f\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha$$
Averaging over the test set highlights critical resistance loci (notably rpoB).
- Gradient-based Feature Importance and Ablation (Path-2): Saliency maps enable ranking and test-time ablation, assessing channel relevance by drop in AUC or accuracy.
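The integrated-gradients computation above is typically approximated with a Riemann sum over the straight-line path from baseline to input. The sketch below is an assumed minimal implementation (not the paper's code); a useful sanity check is the completeness axiom, which holds exactly for a linear model: the attributions sum to $f(x) - f(x')$.

```python
# Minimal integrated-gradients sketch (Riemann-sum approximation of the
# path integral). Implementation details here are assumptions.
import torch

def integrated_gradients(f, x, baseline, steps=64):
    # Straight-line path from baseline to x, evaluated at `steps` points.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)      # (steps, *x.shape)
    path.requires_grad_(True)
    f(path).sum().backward()                        # gradients at every path point
    avg_grad = path.grad.mean(dim=0)                # path-averaged gradient
    return (x - baseline) * avg_grad                # per-dimension attribution

torch.manual_seed(0)
w = torch.randn(6)
f = lambda x: (x * w).sum(dim=-1)                   # toy scalar model
x, base = torch.randn(6), torch.zeros(6)
ig = integrated_gradients(f, x, base)
# Completeness: attributions sum to f(x) - f(baseline) (exact for linear f).
print(torch.allclose(ig.sum(), f(x) - f(base), atol=1e-4))
```

In VAMP-Net's setting, $f$ would be the resistance logit and $x$ a variant-token embedding, so averaging these attributions over the test set surfaces loci such as rpoB.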
This interpretability design enables auditable correlation between genotype, variant confidence, and the final resistance call, facilitating biological inference and technical quality control.
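The Path-2 ablation protocol can be sketched as zeroing one quality channel at a time and measuring the change in a model score. The model and scoring function below are toy stand-ins (the paper uses the trained CNN and AUC/accuracy); the channel list follows the text.

```python
# Sketch of test-time channel ablation for Path-2 (assumed protocol:
# zero one channel, re-score, record the drop). Toy model and score.
import torch

CHANNELS = ["GT", "DP", "DPF", "COV_REF", "COV_ALT", "FRS",
            "GT_CONF", "GT_CONF_PERCENTILE"]

def ablation_drops(model, x, score_fn):
    base = score_fn(model(x))
    drops = {}
    for i, name in enumerate(CHANNELS):
        xa = x.clone()
        xa[:, i, :] = 0.0                  # knock out one quality channel
        drops[name] = base - score_fn(model(xa))
    return drops

torch.manual_seed(0)
model = torch.nn.Conv1d(8, 1, kernel_size=1, bias=False)  # toy stand-in model
x = torch.rand(16, 8, 20)                                  # (isolates, channels, variants)
drops = ablation_drops(model, x, lambda out: out.mean().item())
most_influential = max(drops, key=lambda k: abs(drops[k]))
print(len(drops), most_influential in CHANNELS)
```

Ranking channels by the magnitude of the score drop gives the channel-relevance ordering described above.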
5. Empirical Performance and Robustness
Evaluation used CRyPTIC MTB isolates for Rifampicin (RIF) and Rifabutin (RFB). Model A with gated amplification fusion achieved:
| Drug | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| RIF | 0.952 | 0.951 | 0.960 | 0.955 | 0.969 |
| RFB | 0.939 | 0.952 | 0.955 | 0.954 | 0.968 |
Baseline comparisons consisted of an MLP and a 1D-CNN trained on a binary SNP matrix after feature selection, yielding AUC 0.85–0.87. VAMP-Net's SAB pathway outperformed both by approximately 10 AUC points.
Additional ablation demonstrated minor improvements from padding-masked attention (AUC = 0.969, balanced accuracy 0.945) over unmasked SAB (AUC 0.967). Early stopping and stable cross-validation splits confirmed reproducible gains, though no formal p-values were reported.
A plausible implication is that fusion of biologically and technically grounded pathways produces generalizable, high-precision clinical resistance prediction not achievable with single-modal baselines.
6. Context, Significance, and Future Directions
VAMP-Net establishes a new paradigm for interpretable, actionable genomics in resistance prediction, combining state-of-the-art accuracy (AUC ≈ 0.97) with comprehensive dual-layer auditability. The SAB pathway enables detection and visualization of epistatic genomic networks, while the quality-aware CNN contextually weighs input confidence on a per-drug basis.
This suggests a generalizable framework for genotype-to-phenotype models where variant call quality and nonlinear genetic interactions are critical. A plausible implication is that further extension of permutation-invariant attention and adaptive gating may benefit other clinical genomics domains. The architecture is designed with modularity to accommodate additional input modalities or alternative fusion mechanisms.
No indicators of controversy were reported regarding this approach in the referenced literature (Boutorh et al., 25 Dec 2025). The combination of permutation-invariant modeling, set attention, and interpretable feature attribution mechanisms positions VAMP-Net as a reference model for computational resistance prediction pipelines, especially where interpretability at both genetic and technical levels is essential for clinical adoption.