MMD Guidance for Distribution Alignment
- MMD Guidance is a training-free, distribution-matching method that steers generative models by injecting empirical MMD gradients into the sampling process.
- It integrates gradient corrections into reverse diffusion, using latent-space computations to achieve robust adaptation and improved sample fidelity.
- The approach supports various kernel choices and prompt-aware extensions, offering efficient, reference-driven distribution alignment in generative modeling.
Maximum Mean Discrepancy (MMD) Guidance is a training-free, distribution-matching methodology that steers generative models to align their outputs with a small reference dataset. It operates by injecting gradients of the empirical Maximum Mean Discrepancy—a kernel-based non-parametric statistical distance—into the inference procedure, most notably within reverse diffusion samplers, without updating model weights. MMD Guidance is suitable for unconditional, conditional, and prompt-aware sampling, achieves robust adaptation from limited reference data, and is computationally efficient in modern latent diffusion architectures (Sani et al., 13 Jan 2026).
1. Definition of the MMD Objective and Its Gradient
Let $\{x_i\}_{i=1}^{n}$ denote a batch of generated samples and $\{y_j\}_{j=1}^{m}$ a fixed reference set. Given a positive semi-definite kernel $k(\cdot,\cdot)$ (e.g. Gaussian RBF), the empirical squared Maximum Mean Discrepancy is
$$\widehat{\mathrm{MMD}}^2\big(\{x_i\},\{y_j\}\big) \;=\; \frac{1}{n^2}\sum_{i,i'} k(x_i, x_{i'}) \;-\; \frac{2}{nm}\sum_{i,j} k(x_i, y_j) \;+\; \frac{1}{m^2}\sum_{j,j'} k(y_j, y_{j'}).$$
The gradient with respect to a generated sample $x_i$ is given by
$$\nabla_{x_i}\widehat{\mathrm{MMD}}^2 \;=\; \frac{2}{n^2}\sum_{i'} \nabla_{x_i} k(x_i, x_{i'}) \;-\; \frac{2}{nm}\sum_{j} \nabla_{x_i} k(x_i, y_j).$$
For a Gaussian RBF kernel $k(x,y) = \exp\!\big(-\lVert x - y\rVert^2/(2\sigma^2)\big)$, the gradient specializes to
$$\nabla_{x_i}\widehat{\mathrm{MMD}}^2 \;=\; \frac{2}{\sigma^2}\left[\frac{1}{nm}\sum_{j} (x_i - y_j)\,k(x_i, y_j) \;-\; \frac{1}{n^2}\sum_{i'} (x_i - x_{i'})\,k(x_i, x_{i'})\right].$$
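As a concrete illustration, the estimator and its Gaussian-RBF gradient can be sketched in NumPy (function names are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def rbf(a, b, sigma):
    # Pairwise Gaussian RBF kernel matrix: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(x, y, sigma):
    # Biased empirical estimate of squared MMD between batches x (n, d) and y (m, d).
    n, m = len(x), len(y)
    return (rbf(x, x, sigma).sum() / n**2
            - 2.0 * rbf(x, y, sigma).sum() / (n * m)
            + rbf(y, y, sigma).sum() / m**2)

def grad_mmd2(x, y, sigma):
    # Analytic gradient of mmd2 with respect to each generated sample x_i.
    n, m = len(x), len(y)
    Kxx, Kxy = rbf(x, x, sigma), rbf(x, y, sigma)
    dxx = x[:, None, :] - x[None, :, :]   # x_i - x_{i'}
    dxy = x[:, None, :] - y[None, :, :]   # x_i - y_j
    return (2.0 / sigma**2) * ((Kxy[..., None] * dxy).sum(1) / (n * m)
                               - (Kxx[..., None] * dxx).sum(1) / n**2)
```

The cross term pulls each $x_i$ toward nearby references, while the self term mildly repels generated samples from one another, discouraging mode collapse onto a single reference.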
2. Integration of MMD Guidance into Diffusion Sampling
During the reverse diffusion process (e.g., DDPM/DDIM), each denoising step is modified by a small MMD gradient correction. At diffusion timestep $t$, after the standard denoising update produces $x_i^{(t-1)}$, each sample is corrected as
$$x_i^{(t-1)} \;\leftarrow\; x_i^{(t-1)} \;-\; \lambda\,\nabla_{x_i}\widehat{\mathrm{MMD}}^2\big(\{x_i^{(t-1)}\}_{i=1}^{n},\, \{y_j\}_{j=1}^{m}\big),$$
where $\lambda$ is a guidance strength (typically constant or slowly decaying across timesteps). This can be performed directly in pixel space or, more efficiently, in the latent space of a pretrained Variational Autoencoder (VAE), as used in Latent Diffusion Models (LDMs):
- Encode reference samples into the latent space: $z_j = \mathcal{E}(y_j)$, where $\mathcal{E}$ is the VAE encoder.
- Sample initial latents $z_i^{(T)} \sim \mathcal{N}(0, I)$.
- For $t = T$ down to $1$, perform the standard denoising step and apply the MMD gradient correction in latent space.
- Decode the final latents with the VAE decoder to obtain the generated samples.
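The steps above can be sketched as follows. The denoising step is a stand-in (a simple shrinkage map); in practice it would be the pretrained LDM's reverse-diffusion update acting on VAE latents, and the helper functions mirror the MMD definitions from Section 1:

```python
import numpy as np

def rbf(a, b, sigma):
    # Pairwise Gaussian RBF kernel matrix between batches a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(x, y, sigma):
    # Biased empirical estimate of squared MMD.
    n, m = len(x), len(y)
    return (rbf(x, x, sigma).sum() / n**2
            - 2.0 * rbf(x, y, sigma).sum() / (n * m)
            + rbf(y, y, sigma).sum() / m**2)

def grad_mmd2(x, y, sigma):
    # Analytic gradient of mmd2 w.r.t. each sample in x (Gaussian RBF kernel).
    n, m = len(x), len(y)
    Kxx, Kxy = rbf(x, x, sigma), rbf(x, y, sigma)
    dxx = x[:, None, :] - x[None, :, :]
    dxy = x[:, None, :] - y[None, :, :]
    return (2.0 / sigma**2) * ((Kxy[..., None] * dxy).sum(1) / (n * m)
                               - (Kxx[..., None] * dxx).sum(1) / n**2)

def sample_with_mmd_guidance(denoise_step, z_ref, n, d, T=50, lam=0.2, sigma=2.0, seed=0):
    # z: batch of latents initialized from the standard normal prior.
    z = np.random.default_rng(seed).normal(size=(n, d))
    for t in range(T, 0, -1):
        z = denoise_step(z, t)                     # standard reverse-diffusion update
        z = z - lam * grad_mmd2(z, z_ref, sigma)   # MMD correction toward references
    return z
```

Setting `lam = 0` recovers unguided sampling, which gives a direct sanity check that the correction moves samples toward the reference distribution.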
3. Kernel Choices and Prompt-Conditioned Extensions
The reference-alignment objective can be tailored via kernel selection:
- Gaussian RBF: $k(x,y) = \exp\!\big(-\lVert x - y\rVert^2/(2\sigma^2)\big)$, with bandwidth $\sigma$ chosen by grid search or set proportional to the latent scale.
- Polynomial kernel: $k(x,y) = (x^\top y + c)^d$, with offset $c \ge 0$ and degree $d \in \mathbb{N}$.
- Product kernel (for prompt-aware conditional generation): $k\big((x,p),(y,q)\big) = k_{\text{prompt}}(p,q)\,k_{\text{vis}}(x,y)$, where $k_{\text{prompt}}$ measures prompt similarity (e.g. cosine or RBF in CLIP embedding space) and $k_{\text{vis}}$ is the visual kernel.
- Guidance strength $\lambda$: default values differ between latent-space and pixel-space guidance.
Prompt-aware adaptation weights the cross-term gradient by prompt similarity, ensuring samples are steered towards references matching the semantic intent.
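A product kernel of this form might look as follows; this is a sketch that assumes precomputed prompt embeddings, stubbing the CLIP-style similarity with cosine similarity on arbitrary vectors:

```python
import numpy as np

def product_kernel(x, y, p, q, sigma):
    """Prompt-weighted visual kernel: k((x,p),(y,q)) = k_prompt(p,q) * k_vis(x,y).

    x: (n, d) generated latents, p: (n, e) their prompt embeddings;
    y: (m, d) reference latents, q: (m, e) reference prompt embeddings.
    """
    # Visual part: Gaussian RBF between latents.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    k_vis = np.exp(-d2 / (2.0 * sigma**2))
    # Prompt part: cosine similarity between embeddings, clipped to [0, 1]
    # so the product only downweights (never negates) the visual kernel.
    pn = p / np.linalg.norm(p, axis=1, keepdims=True)
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    k_prompt = np.clip(pn @ qn.T, 0.0, 1.0)
    return k_prompt * k_vis
```

References whose prompts are dissimilar to a sample's prompt then contribute little to that sample's cross-term gradient, which is exactly the weighting described above.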
4. Computational Efficiency and Latent-Space Implementation
Operating in the latent space of a pretrained LDM confers multiple advantages:
- Reduced dimensionality accelerates kernel computation and gradient evaluation.
- Semantic compression yields better MMD estimation on structured features.
- Memory overhead is minimal, since reference encodings are reused.
- Runtime overhead for batch sizes up to 500 samples and 50 timesteps is only $10$– on consumer-class GPUs.
Bandwidth selection and guidance strength can be optimized via a grid search on held-out reference data, targeting metrics such as Fréchet Distance (FD) and Kernel Distance (KD).
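One simple way to build such a bandwidth grid, assumed here rather than taken from the paper, is the median heuristic: anchor $\sigma$ at the median pairwise distance among reference latents and scan a few multiples of it:

```python
import numpy as np

def median_bandwidth(z_ref):
    # Median of pairwise Euclidean distances among reference latents.
    d = np.sqrt(((z_ref[:, None, :] - z_ref[None, :, :]) ** 2).sum(-1))
    return np.median(d[np.triu_indices(len(z_ref), k=1)])

def bandwidth_grid(z_ref, multipliers=(0.5, 1.0, 2.0, 4.0)):
    # Candidate bandwidths for a grid search scored by FD/KD on held-out data.
    s0 = median_bandwidth(z_ref)
    return [m * s0 for m in multipliers]
```

This keeps the grid scaled to the actual spread of the reference encodings, so the same multipliers transfer across datasets and latent spaces.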
5. Experimental Evaluation
Experimental benchmarks demonstrate that MMD Guidance achieves consistent and substantial improvements in distributional alignment and sample fidelity:
- Synthetic GMMs: recovers desired mixture modes and proportions with as few as $50$–$200$ references, achieving the lowest FD and KD against classifier-guidance (CG) and classifier-free guidance (CFG) baselines.
- Mode-proportion correction: can preferentially reproduce new mode proportions when the reference set is Dirichlet-reweighted.
- Real-world image adaptation (FFHQ, CelebA-HQ): aligns generated samples to user-defined characteristics using $500$ references, reducing FD from $1221$ (no guidance) to $693$ (MMD guidance).
- Prompt-aware stylized image generation: on Stable Diffusion XL and PixArt, guidance reduces FD, KD, Relative Reconstruction Kernel Error (RRKE), and increases coverage; adaptation occurs without network retraining.
- Reference set sensitivity: performance improves quickly with $50$–$100$ references and plateaus, with robustness to kernel and guidance strength variations.
6. Advantages, Practical Considerations, and Significance
MMD Guidance is fully training-free and provides direct distribution-aware alignment. It leverages the low-variance, consistent estimation properties of MMD, particularly under small reference sets, and its gradients are efficiently computable via modern hardware. MMD gradients can be injected into any generative process for which a differentiable kernel can be evaluated over its latent or output space, without requiring model finetuning or additional training steps.
The framework generalizes to various adaptation scenarios:
- Prompt-aware adaptation in conditional generative models, steering samples jointly with reference prompts.
- Distribution correction or style transfer using limited reference data.
- Domain adaptation in generative modeling pipelines.
The principal advantage over classifier-based guidance methods is the direct minimization of RKHS distance to the true reference distribution, rather than a surrogate (e.g., classifier likelihood). This yields superior coverage, mode fidelity, and robustness to overfitting (Sani et al., 13 Jan 2026).
7. Summary Table: MMD Guidance Properties
| Property | Description | Reference Scenario |
|---|---|---|
| Training-free | No update to model weights; operates at inference | Domain adaptation, style |
| Differentiable MMD | Empirical estimate and gradient in pixel or latent space | Any kernel metric |
| Kernel flexibility | Gaussian RBF, polynomial, product kernels for conditional | Prompt-aware LDM |
| Reference-efficient | Robust with small reference sets; minimal overfitting | Few-shot user adaptation |
| Low computational overhead | Latent-space implementation adds $10$– runtime cost | Stable Diffusion XL |
| Direct distribution matching | Aligns to reference data's empirical distribution in RKHS | Synthetic/real domains |
| Sample fidelity | Preserves generative quality while achieving alignment | FFHQ, CelebA experiments |
MMD Guidance formalizes distributional adaptation as direct kernel-mean matching, circumventing limitations of classifier- or discriminator-based guidance frameworks, and is widely applicable to various generation architectures.