PSM: Purifier & Smoothness Mapper for Robust MLLMs
- PSM is a plug-and-play architectural wrapper for multimodal large language models that enhances certified robustness against ℓ2 adversarial attacks using feature-space smoothing.
- It integrates a lightweight diffusion-based Purifier and a residual Smoothness Mapper to denoise inputs and adjust feature representations without modifying the core model.
- Empirical evaluations demonstrate that PSM raises feature cosine similarity and task accuracy while reducing attack success rates, confirming its effectiveness as an adversarial defense mechanism.
The Purifier and Smoothness Mapper (PSM) is a plug-and-play architectural wrapper designed for multimodal LLMs (MLLMs) to provably enhance their robustness against ℓ2-bounded adversarial attacks. Operating in the context of Feature-space Smoothing (FS), PSM achieves robustness by maximizing the Gaussian robustness score, thereby tightening theoretical lower bounds on the preservation of feature-space information between clean and adversarial examples. Critically, PSM improves certified robustness without requiring retraining or modification of the core MLLM or its encoders (Xia et al., 22 Jan 2026).
1. Feature-space Smoothing and Certified Robustness
Feature-space Smoothing (FS) provides a theoretical framework for certifying how much an input perturbation can distort the encoded feature vectors in an MLLM. Given a normalized feature encoder $f_e$ with $\|f_e(x)\|_2 = 1$, FS defines a smoothed encoder

$$\bar{f}(x) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f_e(x + \epsilon)\big].$$

This Gaussian smoothing at the feature level, rather than the prediction level, enables the derivation of a Feature Cosine Similarity Bound (FCSB). For input perturbations $\delta$ with $\|\delta\|_2 \le \epsilon$, the following holds:

$$\cos\big(\bar{f}(x), \bar{f}(x + \delta)\big) \;\ge\; \Phi\Big(\Phi^{-1}\big(r(x)\big) - \tfrac{\epsilon}{\sigma}\Big),$$

where $\Phi$ is the standard normal CDF, and $r(x)$ is the Gaussian robustness score

$$r(x) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[\cos\big(f_e(x + \epsilon), \bar{f}(x)\big)\big].$$

A higher $r(x)$ yields a tighter lower bound on feature-space similarity, certifying robustness.
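A minimal numerical sketch of the FCSB certificate, using the standard-normal CDF $\Phi$ and its inverse (the function name and the example values of `r`, `eps`, and `sigma` are illustrative, not taken from the paper):

```python
from statistics import NormalDist

def fcsb_lower_bound(r, eps, sigma):
    """Certified lower bound on cos(f_bar(x), f_bar(x + delta)) for any
    perturbation with ||delta||_2 <= eps, given Gaussian robustness score r."""
    nd = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}
    return nd.cdf(nd.inv_cdf(r) - eps / sigma)

# A higher score r tightens the certificate; a larger eps loosens it.
b1 = fcsb_lower_bound(r=0.90, eps=16 / 255, sigma=0.25)
b2 = fcsb_lower_bound(r=0.99, eps=16 / 255, sigma=0.25)
```

Since the bound is monotone increasing in $r$, maximizing the Gaussian robustness score (as PSM is trained to do) directly tightens the certificate.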
2. Architectural Design of the Purifier and Smoothness Mapper
The PSM module is constructed from two neural components operating around a frozen encoder:
- Purifier ($P$): A lightweight, one-step diffusion network (e.g., a guided-diffusion U-Net) trained to denoise inputs contaminated by Gaussian noise with standard deviation $\sigma$.
- Smoothness Mapper ($M$): A residual network acting on feature tensors $z \in \mathbb{R}^{T \times C}$ ($T$ tokens, $C$ channels), composed of several blocks. Each block implements:
- Noise-aware LayerNorm and FiLM conditioning on $\sigma$, applying a per-channel scale and shift.
- Parallel processing via (1) multi-head self-attention (block 1 only), (2) depth-wise convolution, and (3) a channel-wise MLP.
- Noise-adaptive residuals, modulated by learnable functions of $\sigma$.
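A simplified NumPy sketch of one Mapper block, keeping only the LayerNorm + FiLM and channel-wise MLP branches (the attention and depth-wise convolution branches are omitted, and all parameter names and shapes are illustrative assumptions, not the paper's):

```python
import numpy as np

def film(h, sigma, w_scale, w_shift):
    # FiLM conditioning on the noise level: per-channel scale and shift.
    return h * (1.0 + sigma * w_scale) + sigma * w_shift

def mapper_block(z, sigma, params):
    """One residual Smoothness Mapper block (sketch): LayerNorm + FiLM,
    a channel-wise MLP, and a noise-adaptive residual gate that vanishes
    at sigma = 0, so the block reduces to the identity on clean features."""
    h = (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + 1e-5)
    h = film(h, sigma, params["w_scale"], params["w_shift"])
    h = np.tanh(h @ params["w1"]) @ params["w2"]   # channel-wise MLP
    gate = np.tanh(params["alpha"] * sigma)        # learnable function of sigma
    return z + gate * h                            # noise-adaptive residual

# Toy usage: T = 4 tokens, C = 8 channels, random parameters.
rng = np.random.default_rng(0)
C = 8
params = {
    "w_scale": rng.normal(size=C), "w_shift": rng.normal(size=C),
    "w1": rng.normal(size=(C, 2 * C)) * 0.1, "w2": rng.normal(size=(2 * C, C)) * 0.1,
    "alpha": 1.0,
}
z = rng.normal(size=(4, C))
out = mapper_block(z, 0.25, params)
```

Note that at $\sigma = 0$ the residual gate is exactly zero, which is the behavior the Mapper's identity loss encourages.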
The architecture can be depicted as:
| Input | Module | Output |
|---|---|---|
| $x + \epsilon$ | Purifier ($P$) | $\hat{x} = P(x + \epsilon)$ |
| $\hat{x}$ | Encoder ($f_e$) | $z = f_e(\hat{x})$ |
| $z$ | Mapper ($M$) | $z + M(z, \sigma)$ |
Inference involves Monte Carlo smoothing: adding Gaussian noise, passing through the Purifier $P$, encoding by $f_e$, mapping by $M$, and averaging over $n$ samples.
3. Training Objectives and Loss Functions
Both Purifier and Mapper are optimized for enhanced feature robustness and statistical consistency:
- Purifier losses:
- Reconstruction (MSE): $\mathcal{L}_{\mathrm{rec}} = \|P(x + \epsilon) - x\|_2^2$
- Purifier robustness: $\mathcal{L}_{\mathrm{rob}}^{P} = 1 - \cos\big(f_e(P(x + \epsilon)), f_e(x)\big)$
- Total: $\mathcal{L}_{P} = \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{rob}}\,\mathcal{L}_{\mathrm{rob}}^{P}$
- Mapper losses:
- Mapper robustness: $\mathcal{L}_{\mathrm{rob}}^{M} = 1 - \cos\big(\tilde{z}, f_e(x)\big)$, where $\tilde{z} = z + M(z, \sigma)$.
- Statistical matching: aligns the mean and variance of mapped noisy features with those of clean features.
- Identity (for $\sigma = 0$): penalizes $\|M(z, 0)\|$ so that the Mapper reduces to the identity in the noise-free case.
- Total: $\mathcal{L}_{M} = \mathcal{L}_{\mathrm{rob}}^{M} + \lambda_{\mathrm{stat}}\,\mathcal{L}_{\mathrm{stat}} + \lambda_{\mathrm{id}}\,\mathcal{L}_{\mathrm{id}}$
Typical hyperparameters: noise level $\sigma = 0.25$ with $n = 4$ Monte Carlo samples at inference.
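The purifier objective above can be sketched in NumPy as follows (a minimal sketch; the robustness weight `lam` and the stand-in `P`/`f_e` callables are hypothetical, and the paper's exact weighting may differ):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two flattened feature vectors.
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def purifier_loss(x, x_noisy, P, f_e, lam=1.0):
    """Sketch: MSE reconstruction of the clean input plus a feature-space
    robustness term pulling purified features toward the clean features."""
    x_pur = P(x_noisy)
    l_rec = float(np.mean((x_pur - x) ** 2))    # reconstruction (MSE)
    l_rob = 1.0 - cosine(f_e(x_pur), f_e(x))    # purifier robustness
    return l_rec + lam * l_rob

# With a perfect purifier (recovers x exactly) the loss vanishes.
x = np.ones(16)
perfect = purifier_loss(x, x + 0.1, P=lambda v: x, f_e=lambda v: v)
```

The Mapper's robustness term has the same cosine form, applied to the mapped features $\tilde{z}$ instead of the purified-image features.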
4. Implementation and Deployment
The PSM is inserted as a wrapper without modifying the main MLLM encoder. At inference:
```python
import numpy as np

def FS_PSM_smooth(x, f_e, P, M, n_smooth=4, sigma=0.25):
    # Monte Carlo feature smoothing: noise -> purify -> encode -> map -> average.
    z_accum = 0
    for _ in range(n_smooth):
        eps = np.random.normal(0, sigma, x.shape)
        x_noisy = x + eps
        x_pur = P(x_noisy)           # Purifier denoises the noisy input
        z = f_e(x_pur)               # frozen encoder
        z_mapped = z + M(z, sigma)   # residual Smoothness Mapper
        z_accum += z_mapped
    return z_accum / n_smooth
```
5. Empirical Evaluation and Performance
Extensive experiments were conducted on LLaVA-1.5-7B, OpenFlamingo-9B, and CLIP-L14, using image captioning, image classification, and VQA tasks. White-box attacks included AttackVLM, M-Attack, and FOA, with perturbation budgets up to $32/255$ for stress tests.
Key metrics:
- FCS: Cosine similarity between clean and adversarial feature vectors
- ACC: Task accuracy
- ASR: Attack success rate
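For concreteness, the three metrics can be computed as follows (a sketch; in particular, ASR here counts samples that were classified correctly on clean inputs but flipped under attack, which is one common convention and may differ from the paper's exact definition):

```python
import numpy as np

def fcs(z_clean, z_adv):
    """Feature Cosine Similarity between clean and adversarial features."""
    z_clean = z_clean / (np.linalg.norm(z_clean) + 1e-12)
    z_adv = z_adv / (np.linalg.norm(z_adv) + 1e-12)
    return float(z_clean @ z_adv)

def acc(preds, labels):
    """Task accuracy: fraction of predictions matching the labels."""
    return float(np.mean(np.array(preds) == np.array(labels)))

def asr(clean_preds, adv_preds, labels):
    """Attack success rate: fraction of correctly classified samples
    whose prediction is flipped by the attack."""
    clean_preds, adv_preds, labels = map(np.array, (clean_preds, adv_preds, labels))
    correct = clean_preds == labels
    if correct.sum() == 0:
        return 0.0
    return float(np.mean(adv_preds[correct] != labels[correct]))
```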
Sample results (FOA attack):
| Setting | FCS | ACC | ASR |
|---|---|---|---|
| Org. (LLaVA-7B, caption) | 0.388 | 1% | 94% |
| FS only | 0.652 | 87% | 1% |
| FS+PSM (vanilla) | -- | 87% | 1% |
| FS+PSM (FARE encoder) | 0.789 | 39% | 20% |
Ablation studies (CLIP-B16, classification, FOA) further show that both Purifier and Mapper are essential for maximal gains, increasing certified radius and accuracy while reducing ASR.
6. Comparative Analysis and Significance
FS alone significantly reduces ASR (from ≈90% to 1–2%) across all tested MLLMs and tasks. When combined with PSM, further increases in FCS and ACC are observed. Notably, the plug-and-play approach allows stacking on top of encoders already adversarially trained (e.g., FARE or TeCoA), yielding consistent robustness improvements without retraining those encoders. All modifications are strictly external to the MLLM: neither the core encoder nor LLM weights change.
A plausible implication is that PSM establishes a practical route for certifiable robustness in large models, where architectural or full-model retraining is infeasible. By maximizing the Gaussian robustness score, PSM offers both theoretical guarantees and empirical improvements, confirming the effectiveness of feature-space smoothing as a paradigm for adversarial defense (Xia et al., 22 Jan 2026).