PSM: Purifier & Smoothness Mapper for Robust MLLMs
- PSM is a plug-and-play architectural wrapper for multimodal large language models that enhances certified robustness against ℓ2 adversarial attacks using feature-space smoothing.
- It integrates a lightweight diffusion-based Purifier and a residual Smoothness Mapper to denoise inputs and adjust feature representations without modifying the core model.
- Empirical evaluations demonstrate that PSM raises feature cosine similarity and task accuracy while reducing attack success rates, confirming its effectiveness as an adversarial defense mechanism.
The Purifier and Smoothness Mapper (PSM) is a plug-and-play architectural wrapper designed for multimodal LLMs (MLLMs) to provably enhance their robustness against ℓ2-bounded adversarial attacks. Operating in the context of Feature-space Smoothing (FS), PSM achieves robustness by maximizing the Gaussian robustness score, thereby tightening theoretical lower bounds on the preservation of feature-space information between clean and adversarial examples. Critically, PSM improves certified robustness without requiring retraining or modification of the core MLLM or its encoders (Xia et al., 22 Jan 2026).
1. Feature-space Smoothing and Certified Robustness
Feature-space Smoothing (FS) provides a theoretical framework for certifying how much an input perturbation can distort the encoded feature vectors in an MLLM. Given a normalized feature encoder $f_e$ with $\|f_e(x)\|_2 = 1$, FS defines a smoothed encoder

$$\bar{f}(x) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f_e(x + \epsilon)\big].$$

This Gaussian smoothing at the feature level, rather than the prediction level, enables the derivation of a Feature Cosine Similarity Bound (FCSB). For input perturbations $\delta$ with $\|\delta\|_2 \le \epsilon$, the following holds:

$$\cos\big(\bar{f}(x), \bar{f}(x + \delta)\big) \;\ge\; \Phi\Big(\Phi^{-1}\big(r(x)\big) - \tfrac{\epsilon}{\sigma}\Big),$$

where $\Phi$ is the standard normal CDF, and $r(x)$ is the Gaussian robustness score

$$r(x) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[\cos\big(f_e(x + \epsilon), \bar{f}(x)\big)\big].$$

A higher $r(x)$ yields a tighter lower bound on feature-space similarity, certifying robustness.
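A minimal numerical sketch of the FCSB certificate, using the standard-normal CDF $\Phi$ and its inverse (the function name and the example values of `r`, `eps`, and `sigma` are illustrative, not taken from the paper):

```python
from statistics import NormalDist

def fcsb_lower_bound(r, eps, sigma):
    """Certified lower bound on cos(f_bar(x), f_bar(x + delta)) for any
    perturbation with ||delta||_2 <= eps, given Gaussian robustness score r."""
    nd = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}
    return nd.cdf(nd.inv_cdf(r) - eps / sigma)

# A higher score r tightens the certificate; a larger eps loosens it.
b1 = fcsb_lower_bound(r=0.90, eps=16 / 255, sigma=0.25)
b2 = fcsb_lower_bound(r=0.99, eps=16 / 255, sigma=0.25)
```

Since the bound is monotone increasing in $r$, maximizing the Gaussian robustness score (as PSM is trained to do) directly tightens the certificate.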
2. Architectural Design of the Purifier and Smoothness Mapper
The PSM module is constructed from two neural components operating around a frozen encoder:
- Purifier ($P$): A lightweight, one-step diffusion network (e.g., a guided-diffusion U-Net) trained to denoise inputs contaminated by Gaussian noise with standard deviation $\sigma$.
- Smoothness Mapper ($M$): A residual network acting on feature tensors $z \in \mathbb{R}^{T \times C}$ ($T$ tokens, $C$ channels), composed of several blocks. Each block implements:
- Noise-aware LayerNorm and FiLM conditioning on $\sigma$, applying a per-channel scale and shift.
- Parallel processing via (1) multi-head self-attention (block 1 only), (2) depth-wise convolution, and (3) a channel-wise MLP.
- Noise-adaptive residuals, modulated by learnable functions of $\sigma$.
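A simplified NumPy sketch of one Mapper block, keeping only the LayerNorm + FiLM and channel-wise MLP branches (the attention and depth-wise convolution branches are omitted, and all parameter names and shapes are illustrative assumptions, not the paper's):

```python
import numpy as np

def film(h, sigma, w_scale, w_shift):
    # FiLM conditioning on the noise level: per-channel scale and shift.
    return h * (1.0 + sigma * w_scale) + sigma * w_shift

def mapper_block(z, sigma, params):
    """One residual Smoothness Mapper block (sketch): LayerNorm + FiLM,
    a channel-wise MLP, and a noise-adaptive residual gate that vanishes
    at sigma = 0, so the block reduces to the identity on clean features."""
    h = (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + 1e-5)
    h = film(h, sigma, params["w_scale"], params["w_shift"])
    h = np.tanh(h @ params["w1"]) @ params["w2"]   # channel-wise MLP
    gate = np.tanh(params["alpha"] * sigma)        # learnable function of sigma
    return z + gate * h                            # noise-adaptive residual

# Toy usage: T = 4 tokens, C = 8 channels, random parameters.
rng = np.random.default_rng(0)
C = 8
params = {
    "w_scale": rng.normal(size=C), "w_shift": rng.normal(size=C),
    "w1": rng.normal(size=(C, 2 * C)) * 0.1, "w2": rng.normal(size=(2 * C, C)) * 0.1,
    "alpha": 1.0,
}
z = rng.normal(size=(4, C))
out = mapper_block(z, 0.25, params)
```

Note that at $\sigma = 0$ the residual gate is exactly zero, which is the behavior the Mapper's identity loss encourages.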
The architecture can be depicted as:
| Input | Module | Output |
|---|---|---|
| $x + \epsilon$ | Purifier ($P$) | $\hat{x} = P(x + \epsilon)$ |
| $\hat{x}$ | Encoder ($f_e$) | $z = f_e(\hat{x})$ |
| $z$ | Mapper ($M$) | $z + M(z, \sigma)$ |
Inference involves Monte Carlo smoothing: adding Gaussian noise, passing through the Purifier $P$, encoding by $f_e$, mapping by $M$, and averaging over $n$ samples.
3. Training Objectives and Loss Functions
Both Purifier and Mapper are optimized for enhanced feature robustness and statistical consistency:
- Purifier losses:
- Reconstruction (MSE): $\mathcal{L}_{\mathrm{rec}} = \|P(x + \epsilon) - x\|_2^2$
- Purifier robustness: $\mathcal{L}_{\mathrm{rob}}^{P} = 1 - \cos\big(f_e(P(x + \epsilon)), f_e(x)\big)$
- Total: $\mathcal{L}_{P} = \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{rob}}\,\mathcal{L}_{\mathrm{rob}}^{P}$
- Mapper losses:
- Mapper robustness: $\mathcal{L}_{\mathrm{rob}}^{M} = 1 - \cos\big(\tilde{z}, f_e(x)\big)$, where $\tilde{z} = z + M(z, \sigma)$.
- Statistical matching: aligns the mean and variance of mapped noisy features with those of clean features.
- Identity (for $\sigma = 0$): penalizes $\|M(z, 0)\|$ so that the Mapper reduces to the identity in the noise-free case.
- Total: $\mathcal{L}_{M} = \mathcal{L}_{\mathrm{rob}}^{M} + \lambda_{\mathrm{stat}}\,\mathcal{L}_{\mathrm{stat}} + \lambda_{\mathrm{id}}\,\mathcal{L}_{\mathrm{id}}$
Typical hyperparameters: noise level $\sigma = 0.25$ with $n = 4$ Monte Carlo samples at inference.
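The purifier objective above can be sketched in NumPy as follows (a minimal sketch; the robustness weight `lam` and the stand-in `P`/`f_e` callables are hypothetical, and the paper's exact weighting may differ):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two flattened feature vectors.
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def purifier_loss(x, x_noisy, P, f_e, lam=1.0):
    """Sketch: MSE reconstruction of the clean input plus a feature-space
    robustness term pulling purified features toward the clean features."""
    x_pur = P(x_noisy)
    l_rec = float(np.mean((x_pur - x) ** 2))    # reconstruction (MSE)
    l_rob = 1.0 - cosine(f_e(x_pur), f_e(x))    # purifier robustness
    return l_rec + lam * l_rob

# With a perfect purifier (recovers x exactly) the loss vanishes.
x = np.ones(16)
perfect = purifier_loss(x, x + 0.1, P=lambda v: x, f_e=lambda v: v)
```

The Mapper's robustness term has the same cosine form, applied to the mapped features $\tilde{z}$ instead of the purified-image features.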
4. Implementation and Deployment
The PSM is inserted as a wrapper without modifying the main MLLM encoder. At inference:
```python
import numpy as np

def FS_PSM_smooth(x, f_e, P, M, n_smooth=4, sigma=0.25):
    # Monte Carlo feature smoothing: noise -> purify -> encode -> map -> average.
    z_accum = 0
    for _ in range(n_smooth):
        eps = np.random.normal(0, sigma, x.shape)
        x_noisy = x + eps
        x_pur = P(x_noisy)           # Purifier denoises the noisy input
        z = f_e(x_pur)               # frozen encoder
        z_mapped = z + M(z, sigma)   # residual Smoothness Mapper
        z_accum += z_mapped
    return z_accum / n_smooth
```
5. Empirical Evaluation and Performance
Extensive experiments were conducted on LLaVA-1.5-7B, OpenFlamingo-9B, and CLIP-L14, using image captioning, image classification, and VQA tasks. White-box attacks included AttackVLM, M-Attack, and FOA, with perturbation budgets up to $32/255$ for stress tests.
Key metrics:
- FCS: Cosine similarity between clean and adversarial feature vectors
- ACC: Task accuracy
- ASR: Attack success rate
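For concreteness, the three metrics can be computed as follows (a sketch; in particular, ASR here counts samples that were classified correctly on clean inputs but flipped under attack, which is one common convention and may differ from the paper's exact definition):

```python
import numpy as np

def fcs(z_clean, z_adv):
    """Feature Cosine Similarity between clean and adversarial features."""
    z_clean = z_clean / (np.linalg.norm(z_clean) + 1e-12)
    z_adv = z_adv / (np.linalg.norm(z_adv) + 1e-12)
    return float(z_clean @ z_adv)

def acc(preds, labels):
    """Task accuracy: fraction of predictions matching the labels."""
    return float(np.mean(np.array(preds) == np.array(labels)))

def asr(clean_preds, adv_preds, labels):
    """Attack success rate: fraction of correctly classified samples
    whose prediction is flipped by the attack."""
    clean_preds, adv_preds, labels = map(np.array, (clean_preds, adv_preds, labels))
    correct = clean_preds == labels
    if correct.sum() == 0:
        return 0.0
    return float(np.mean(adv_preds[correct] != labels[correct]))
```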
Sample results (FOA attack):
| Setting | FCS | ACC | ASR |
|---|---|---|---|
| Org. (LLaVA-7B, caption) | 0.388 | 1% | 94% |
| FS only | 0.652 | 87% | 1% |
| FS+PSM (vanilla) | -- | 87% | 1% |
| FS+PSM (FARE encoder) | 0.789 | 39% | 20% |
Ablation studies (CLIP-B16, classification, FOA) further show that both Purifier and Mapper are essential for maximal gains, increasing certified radius and accuracy while reducing ASR.
6. Comparative Analysis and Significance
FS alone significantly reduces ASR (from ≈90% to 1–2%) across all tested MLLMs and tasks. When combined with PSM, further increases in FCS and ACC are observed. Notably, the plug-and-play approach allows stacking on top of encoders already adversarially trained (e.g., FARE or TeCoA), yielding consistent robustness improvements without retraining those encoders. All modifications are strictly external to the MLLM: neither the core encoder nor LLM weights change.
A plausible implication is that PSM establishes a practical route for certifiable robustness in large models, where architectural or full-model retraining is infeasible. By maximizing the Gaussian robustness score, PSM offers both theoretical guarantees and empirical improvements, confirming the effectiveness of feature-space smoothing as a paradigm for adversarial defense (Xia et al., 22 Jan 2026).