PSM: Purifier & Smoothness Mapper for Robust MLLMs

Updated 24 January 2026
  • PSM is a plug-and-play architectural wrapper for multimodal large language models that enhances certified robustness against ℓ2 adversarial attacks using feature-space smoothing.
  • It integrates a lightweight diffusion-based Purifier and a residual Smoothness Mapper to denoise inputs and adjust feature representations without modifying the core model.
  • Empirical evaluations demonstrate that PSM increases cosine similarity scores and task accuracy while reducing attack success rates, confirming its effectiveness as an adversarial defense mechanism.

The Purifier and Smoothness Mapper (PSM) is a plug-and-play architectural wrapper designed for multimodal LLMs (MLLMs) to provably enhance their robustness against $\ell_2$-bounded adversarial attacks. Operating in the context of Feature-space Smoothing (FS), PSM achieves robustness by maximizing the Gaussian robustness score $\hat S$, thereby tightening theoretical lower bounds on the preservation of feature-space information between clean and adversarial examples. Critically, PSM improves certified robustness without requiring retraining or modification of the core MLLM or its encoders (Xia et al., 22 Jan 2026).

1. Feature-space Smoothing and Certified Robustness

Feature-space Smoothing (FS) provides a theoretical framework for certifying how much an input perturbation can distort the encoded feature vectors in an MLLM. Given a normalized feature encoder $f_e:\mathbb R^d\to\mathbb R^D$ with $\|f_e(x)\|_2=1$, FS defines a smoothed encoder:

$$\hat f_e(x) = \mathbb E_{\varepsilon\sim\mathcal N(0,I)}[f_e(x+\varepsilon)]$$

This Gaussian smoothing at the feature level, rather than the prediction level, enables the derivation of a Feature Cosine Similarity Bound (FCSB). For input perturbations $\delta$ with $\|\delta\|_2 \leq \epsilon$, the following holds:

$$\cos(\hat f_e(x+\delta), f_e(x)) \geq 2\Phi(\Phi^{-1}(\hat S(x)) - \epsilon) - 1$$

where $\Phi$ is the standard normal CDF and $\hat S(x)$ is the Gaussian robustness score:

$$\hat S(x) = \mathbb E_{\varepsilon}\left[\frac{1 + \cos(f_e(x+\varepsilon), f_e(x))}{2}\right]$$

A higher $\hat S(x)$ yields a tighter lower bound on feature-space similarity, certifying robustness.
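The FCSB stated above is a one-line computation given $\hat S(x)$ and $\epsilon$. A minimal sketch, using only the standard library (the function name `fcsb_lower_bound` is illustrative, not from the paper):

```python
from statistics import NormalDist

def fcsb_lower_bound(s_hat: float, eps: float) -> float:
    """Lower bound on cos(f_hat_e(x + delta), f_e(x)) for ||delta||_2 <= eps,
    given the Gaussian robustness score s_hat in (0, 1), per the FCSB above."""
    nd = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}
    return 2.0 * nd.cdf(nd.inv_cdf(s_hat) - eps) - 1.0
```

The monotonicity is visible directly: raising `s_hat` (or shrinking `eps`) tightens the certified cosine bound, which is exactly why PSM is trained to maximize $\hat S$.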

2. Architectural Design of the Purifier and Smoothness Mapper

The PSM module is constructed from two neural components operating around a frozen encoder $f_e$:

  • Purifier ($\mathcal P$): A lightweight, one-step diffusion network (e.g., a guided-diffusion U-Net at $256\times256$) trained to denoise inputs contaminated by Gaussian noise with standard deviation $\sigma$.
  • Smoothness Mapper ($\mathcal M$): A residual network acting on feature tensors of shape $(L, D)$ ($L$ tokens, $D$ channels), composed of $k=3$ blocks. Each block implements:
    • Noise-aware LayerNorm and FiLM($\sigma$); FiLM conditions a per-channel scale and shift on the noise level.
    • Parallel processing via (1) multi-head self-attention (block 1 only), (2) depth-wise convolution, and (3) a channel-wise MLP.
    • Noise-adaptive residuals, modulated by learnable functions of $\sigma$.
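The noise-aware FiLM conditioning in each Mapper block can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: `w_gamma` and `w_beta` stand in for the learned conditioning parameters, and the linear dependence on $\sigma$ is an assumption chosen so that the modulation reduces to the identity at $\sigma=0$ (consistent with the identity loss in Section 3), not the paper's exact parameterization:

```python
import numpy as np

def layer_norm(z, eps=1e-5):
    # Normalize each token's D-dimensional feature vector.
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def film_modulate(z, sigma, w_gamma, w_beta):
    """Noise-aware FiLM: per-channel scale/shift conditioned on sigma.
    z: (L, D) feature tensor; w_gamma, w_beta: (D,) hypothetical learned weights."""
    gamma = 1.0 + sigma * w_gamma   # per-channel scale, identity at sigma = 0
    beta = sigma * w_beta           # per-channel shift, zero at sigma = 0
    return gamma * layer_norm(z) + beta
```

At $\sigma=0$ the block applies plain LayerNorm, so the Mapper can learn to be a near-identity on clean features while adapting its correction strength to the injected noise level.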

The architecture can be depicted as:

| Input | Module | Output |
|---|---|---|
| $x$ | Purifier ($\mathcal P$) | $x_\mathrm{pur}$ |
| $x_\mathrm{pur}$ | Encoder ($f_e$) | $z$ |
| $z$ | Mapper ($\mathcal M$) | $z_\mathrm{sm}$ |

Inference involves Monte Carlo smoothing: adding Gaussian noise, passing through $\mathcal P$, encoding with $f_e$, mapping with $\mathcal M$, and averaging over samples.

3. Training Objectives and Loss Functions

Both the Purifier and the Mapper are optimized for enhanced feature robustness and statistical consistency:

  • Purifier losses:

    • Reconstruction (MSE):

    $$\ell_{\rm mse} = \mathbb E_{x,\varepsilon} \| x - \mathcal P(x+\varepsilon) \|_2^2$$

    • Purifier robustness:

    $$\ell^{\mathcal P}_{\rm rb} = \mathbb E_{x,\varepsilon} [\cos(f_e(\mathcal P(x+\varepsilon)), f_e(x))]$$

    • Total:

    $$\mathcal L_{\mathcal P} = \ell_{\rm diff} + \lambda_1 \ell^{\mathcal P}_{\rm rb} + \lambda_2 \ell_{\rm mse}$$

    where $\ell_{\rm diff}$ denotes the diffusion network's standard denoising objective.

  • Mapper losses:

    • Mapper robustness:

    $$\ell^{\mathcal M}_{\rm rb} = \mathbb E_{x,\varepsilon} [\cos(z_\mathrm{sm}, f_e(x))]$$

    where $z_\mathrm{sm} = f_e(\mathcal P(x+\varepsilon)) + \mathcal M(f_e(\mathcal P(x+\varepsilon)), \sigma)$.

    • Statistical matching:

    $$\ell_{\rm stats} = \mathbb E \sum_{d=1}^D \left[(\mu_{z_{\rm sm}}^{(d)}-\mu_z^{(d)})^2 + (\sigma_{z_{\rm sm}}^{(d)}-\sigma_z^{(d)})^2\right]$$

    • Identity (for $\sigma=0$):

    $$\ell_{\rm id} = \mathbb E\| \mathcal M(z,0) \|_2^2$$

    • Total:

    $$\mathcal L_{\mathcal M} = \ell^{\mathcal M}_{\rm rb} + \lambda_3 \ell_{\rm stats} + \lambda_4 \ell_{\rm id}$$
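The Mapper objective above can be sketched numerically for a batch of feature vectors. This is an illustrative NumPy version, not the paper's implementation: `mapper_loss` is a hypothetical helper, and the robustness term is negated here under the assumption that cosine similarity should be maximized by a gradient-descent minimizer:

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity along the feature dimension.
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den

def mapper_loss(z_sm, z_clean, m_zero, lam3=100.0, lam4=0.25):
    """Batched sketch of L_M = l_rb + lam3 * l_stats + lam4 * l_id.
    z_sm, z_clean: (N, D) smoothed/clean features; m_zero: (N, D) output of M(z, 0)."""
    l_rb = -np.mean(cos_sim(z_sm, z_clean))              # maximize feature similarity
    l_stats = np.sum((z_sm.mean(0) - z_clean.mean(0))**2  # match per-channel mean...
                     + (z_sm.std(0) - z_clean.std(0))**2)  # ...and std of clean features
    l_id = np.mean(np.sum(m_zero**2, axis=-1))            # M(z, 0) should vanish
    return l_rb + lam3 * l_stats + lam4 * l_id
```

With $z_\mathrm{sm} = z$ and $\mathcal M(z,0)=0$, the statistical and identity terms vanish and only the (maximal) similarity term remains, matching the intended optimum.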

Typical hyperparameters: $\sigma=0.25$, $n_\text{smooth}=4$ at inference, $k=3$ mapper blocks, and loss weights $\lambda_1 = \lambda_2 = \lambda_4 = 0.25$, $\lambda_3 = 100$.

4. Implementation and Deployment

The PSM is inserted as a wrapper without modifying the main MLLM encoder. At inference:

def FS_PSM_smooth(x, f_e, P, M, n_smooth=4, sigma=0.25):
    # Monte Carlo feature smoothing around the frozen encoder f_e.
    z_accum = 0
    for _ in range(n_smooth):
        eps = sample_normal(0, sigma, x.shape)  # i.i.d. Gaussian noise
        x_noisy = x + eps
        x_pur = P(x_noisy)           # denoise with the Purifier
        z = f_e(x_pur)               # encode the purified input
        z_mapped = z + M(z, sigma)   # residual correction by the Mapper
        z_accum += z_mapped
    return z_accum / n_smooth        # smoothed feature z_smooth
No retraining or modification of the MLLM weights is required during deployment. The result is a smoothed feature $z_\text{smooth}$ with an empirically and theoretically higher $\hat S$, and thus a tighter FCSB.
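The Gaussian robustness score itself can be estimated with the same Monte Carlo machinery. A minimal NumPy sketch, assuming a generic callable encoder `f_e` that returns a 1-D feature vector (`estimate_s_hat` is an illustrative helper, not part of the paper's code):

```python
import numpy as np

def estimate_s_hat(x, f_e, sigma=0.25, n=64, seed=None):
    """Monte Carlo estimate of S_hat(x) = E[(1 + cos(f_e(x + eps), f_e(x))) / 2]."""
    rng = np.random.default_rng(seed)
    z_clean = f_e(x)
    z_clean = z_clean / np.linalg.norm(z_clean)  # normalize the reference feature
    total = 0.0
    for _ in range(n):
        z = f_e(x + rng.normal(0.0, sigma, size=x.shape))
        cos = float(z @ z_clean) / np.linalg.norm(z)
        total += (1.0 + cos) / 2.0               # map cosine from [-1, 1] to [0, 1]
    return total / n
```

Plugging this estimate into the FCSB of Section 1 yields the certified cosine similarity under an $\ell_2$ perturbation budget; PSM's Purifier and Mapper are precisely the components that push this estimate upward.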

5. Empirical Evaluation and Performance

Extensive experiments were conducted on LLaVA-1.5-7B, OpenFlamingo-9B, and CLIP-L14, using image captioning, image classification, and VQA tasks. White-box attacks included AttackVLM, M-Attack, and FOA, with $\|\epsilon\|_\infty = 16/255$ (and $32/255$ for stress tests).

Key metrics are feature cosine similarity (FCS), task accuracy (ACC), and attack success rate (ASR). Sample results under the FOA attack:

| Setting | FCS | ACC | ASR |
|---|---|---|---|
| Org. (LLaVA-7B, caption) | 0.388 | 1% | 94% |
| FS only | 0.652 | 87% | 1% |
| FS+PSM (vanilla) | -- | 87% | 1% |
| FS+PSM (FARE encoder) | 0.789 | 39% | 20% |

Ablation studies (CLIP-B16, classification, FOA) further show that both Purifier and Mapper are essential for maximal gains, increasing the certified radius $\mathcal R$ and accuracy while reducing ASR.

6. Comparative Analysis and Significance

FS alone significantly reduces ASR (from ≈90% to 1–2%) across all tested MLLMs and tasks. When combined with PSM, further increases in FCS and ACC are observed. Notably, the plug-and-play approach allows stacking on top of encoders already adversarially trained (e.g., FARE or TeCoA), yielding consistent robustness improvements without retraining those encoders. All modifications are strictly external to the MLLM: neither the core encoder nor LLM weights change.

A plausible implication is that PSM establishes a practical route for certifiable robustness in large models, where architectural or full-model retraining is infeasible. By maximizing the Gaussian robustness score, PSM offers both theoretical guarantees and empirical improvements, confirming the effectiveness of feature-space smoothing as a paradigm for adversarial defense (Xia et al., 22 Jan 2026).
