
MMD Guidance for Distribution Alignment

Updated 20 January 2026
  • MMD Guidance is a training-free, distribution-matching method that steers generative models by injecting empirical MMD gradients into the sampling process.
  • It integrates gradient corrections into reverse diffusion, using latent-space computations to achieve robust adaptation and improved sample fidelity.
  • The approach supports various kernel choices and prompt-aware extensions, offering efficient, reference-driven distribution alignment in generative modeling.

Maximum Mean Discrepancy (MMD) Guidance is a training-free, distribution-matching methodology that steers generative models to align their outputs with a small reference dataset. It operates by injecting gradients of the empirical Maximum Mean Discrepancy—a kernel-based non-parametric statistical distance—into the inference procedure, most notably within reverse diffusion samplers, without updating model weights. MMD Guidance is suitable for unconditional, conditional, and prompt-aware sampling, achieves robust adaptation from limited reference data, and is computationally efficient in modern latent diffusion architectures (Sani et al., 13 Jan 2026).

1. Definition of the MMD Objective and Its Gradient

Let $\{x_i\}_{i=1}^B$ denote a batch of generated samples and $\{y_j\}_{j=1}^{N_r}$ a fixed reference set. Given a positive semi-definite kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (e.g. a Gaussian RBF), the empirical squared Maximum Mean Discrepancy is

$$\widehat{\mathrm{MMD}^2}(\widehat{P},\widehat{Q}) = \frac{1}{B^2} \sum_{i,i'=1}^{B} k(x_i,x_{i'}) + \frac{1}{N_r^2} \sum_{j,j'=1}^{N_r} k(y_j,y_{j'}) - \frac{2}{B N_r} \sum_{i=1}^{B} \sum_{j=1}^{N_r} k(x_i,y_j)$$

The gradient with respect to a generated sample $x_i$ is given by

$$\nabla_{x_i} \widehat{\mathrm{MMD}^2} = \frac{2}{B^2} \sum_{i'=1}^{B} \nabla_{x_i} k(x_i, x_{i'}) - \frac{2}{B N_r} \sum_{j=1}^{N_r} \nabla_{x_i} k(x_i, y_j)$$

For a Gaussian RBF kernel $k(u,v) = \exp(-\|u-v\|^2 / (2\sigma^2))$, the gradient specializes to

$$\nabla_{x_i} \widehat{\mathrm{MMD}^2} = -\frac{2}{\sigma^2 B^2} \sum_{i'=1}^{B} k(x_i, x_{i'})(x_i - x_{i'}) + \frac{2}{\sigma^2 B N_r} \sum_{j=1}^{N_r} k(x_i, y_j)(x_i - y_j)$$
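As a minimal NumPy illustration (not the paper's implementation; batch sizes, array shapes, and the bandwidth value are assumptions), the estimator and its analytic RBF gradient can be written as:

```python
import numpy as np

def rbf_kernel(u, v, sigma):
    """Gaussian RBF kernel matrix: k(u_i, v_j) = exp(-||u_i - v_j||^2 / (2 sigma^2))."""
    sq_dist = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(x, y, sigma):
    """Empirical squared MMD between generated batch x (B, d) and references y (N_r, d)."""
    B, Nr = len(x), len(y)
    return (rbf_kernel(x, x, sigma).sum() / B**2
            + rbf_kernel(y, y, sigma).sum() / Nr**2
            - 2.0 * rbf_kernel(x, y, sigma).sum() / (B * Nr))

def mmd2_grad(x, y, sigma):
    """Analytic gradient of mmd2 with respect to each generated sample x_i (RBF kernel)."""
    B, Nr = len(x), len(y)
    kxx = rbf_kernel(x, x, sigma)   # (B, B)
    kxy = rbf_kernel(x, y, sigma)   # (B, N_r)
    # Within-batch term: -2/(sigma^2 B^2) * sum_i' k(x_i, x_i') (x_i - x_i')
    g = -(2.0 / (sigma**2 * B**2)) * (kxx[:, :, None] * (x[:, None, :] - x[None, :, :])).sum(1)
    # Cross term: +2/(sigma^2 B N_r) * sum_j k(x_i, y_j) (x_i - y_j)
    g += (2.0 / (sigma**2 * B * Nr)) * (kxy[:, :, None] * (x[:, None, :] - y[None, :, :])).sum(1)
    return g
```

The analytic gradient can be validated against central finite differences of `mmd2`, which is a useful sanity check before plugging it into a sampler.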

2. Integration of MMD Guidance into Diffusion Sampling

During the reverse diffusion process (e.g., DDPM/DDIM), each denoising step is modified by a small MMD gradient correction. At diffusion timestep $t$, the update for each sample $x_t^{(i)}$ is

$$x_{t-1}^{(i)} = \mathrm{Sampler}(x_t^{(i)}, t, \epsilon_\theta) - \lambda_t \, \nabla_{x_t^{(i)}} \widehat{\mathrm{MMD}^2}(\{x_t^{(i)}\}, \{y_j\})$$

where $\lambda_t$ is a guidance strength (typically constant or slowly decaying across timesteps). This can be performed directly in pixel space or—more efficiently—in the latent space of a pretrained Variational Autoencoder (VAE), as used in Latent Diffusion Models (LDMs):

  1. Encode reference samples into the latent space: $z_j^{(r)} = \mathcal{E}(x_j^{(r)})$.
  2. Sample $z_T^{(i)} \sim \mathcal{N}(0, I)$.
  3. For $t = T$ down to $1$, perform the standard denoising step and apply the MMD gradient correction.
  4. Decode final latent to obtain the generated sample.
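The four steps above might be sketched as follows. Here `denoise_step` and `decode` are hypothetical stand-ins for the pretrained denoiser update and the VAE decoder (not names from the paper), and the RBF gradient helper is repeated inline so the sketch is self-contained:

```python
import numpy as np

def rbf_mmd_grad(x, y, sigma):
    """Gradient of the empirical squared MMD w.r.t. each row of x (Gaussian RBF)."""
    B, Nr = len(x), len(y)
    kxx = np.exp(-((x[:, None] - x[None]) ** 2).sum(-1) / (2 * sigma**2))
    kxy = np.exp(-((x[:, None] - y[None]) ** 2).sum(-1) / (2 * sigma**2))
    g = -(2 / (sigma**2 * B**2)) * (kxx[:, :, None] * (x[:, None] - x[None])).sum(1)
    g += (2 / (sigma**2 * B * Nr)) * (kxy[:, :, None] * (x[:, None] - y[None])).sum(1)
    return g

def mmd_guided_sample(z_ref, denoise_step, decode, T=50, B=8, lam=1e-4, sigma=1.5, seed=0):
    """Latent-space MMD guidance around a generic reverse-diffusion sampler.

    z_ref: encoded reference latents (N_r, d), i.e. step 1 already applied;
    denoise_step(z, t) and decode(z) are placeholders for the model's
    denoiser update and VAE decoder.
    """
    d = z_ref.shape[1]
    z = np.random.default_rng(seed).normal(size=(B, d))   # step 2: z_T ~ N(0, I)
    for t in range(T, 0, -1):                             # step 3: reverse diffusion
        z = denoise_step(z, t)                            # standard denoising update
        z = z - lam * rbf_mmd_grad(z, z_ref, sigma)       # MMD gradient correction
    return decode(z)                                      # step 4: decode latents
```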

3. Kernel Choices and Prompt-Conditioned Extensions

The reference-alignment objective can be tailored via kernel selection:

  • Gaussian RBF: $k(u,v) = \exp(-\|u - v\|^2 / (2\sigma^2))$, with bandwidth $\sigma$ chosen from a grid (e.g. $\{1.25, 1.5, 2.0\}$) or proportional to the latent scale.
  • Polynomial kernel: $k(u,v) = (c + \langle u, v \rangle)^d$, with $d \in \{2, 3, 4\}$.
  • Product kernel (for prompt-aware conditional generation): $k([p,z],[p',z']) = k_p(p,p') \, k_z(z,z')$, where $k_p$ measures prompt similarity (e.g. cosine or RBF in CLIP embedding space) and $k_z$ is the visual kernel.
  • Guidance strength $\lambda_t$: default values are $\lambda_t \approx 10^{-4}$ for latent-space guidance and $\lambda_t \approx 10^{-2}$ for pixel-space guidance.

Prompt-aware adaptation weights the cross-term gradient by prompt similarity, ensuring samples are steered towards references matching the semantic intent.
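A product kernel of this form could be sketched as below. This is illustrative only: the cosine prompt kernel is one of the options mentioned above, and the embedding shapes are assumptions, with the prompt embeddings standing in for CLIP features.

```python
import numpy as np

def prompt_aware_kernel(p, z, p_ref, z_ref, sigma):
    """Product kernel k([p, z], [p', z']) = k_p(p, p') * k_z(z, z').

    p, p_ref: prompt embeddings (e.g. CLIP), one row per sample;
    z, z_ref: visual latents. The prompt-similarity factor down-weights
    references whose prompts do not match the sample's semantic intent.
    """
    # k_p: cosine similarity between prompt embeddings
    pn = p / np.linalg.norm(p, axis=1, keepdims=True)
    rn = p_ref / np.linalg.norm(p_ref, axis=1, keepdims=True)
    kp = pn @ rn.T
    # k_z: Gaussian RBF on the visual latents
    sq = ((z[:, None] - z_ref[None]) ** 2).sum(-1)
    kz = np.exp(-sq / (2 * sigma**2))
    return kp * kz
```

Because the product factorizes, its gradient with respect to $z$ is just the visual-kernel gradient scaled per reference by the prompt similarity $k_p$.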

4. Computational Efficiency and Latent-Space Implementation

Operating in the latent space $\mathcal{Z}$ of a pretrained LDM confers multiple advantages:

  • Reduced dimensionality accelerates kernel computation and gradient evaluation.
  • Semantic compression yields better MMD estimation on structured features.
  • Memory overhead is minimal, since reference encodings are reused.
  • Runtime overhead for batch sizes up to 500 samples and 50 timesteps is only 10–15% on consumer-class GPUs.

Bandwidth selection and guidance strength can be optimized via a grid search on held-out reference data, targeting metrics such as Fréchet Distance (FD) and Kernel Distance (KD).
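Such a grid search might look like the following sketch, where the Fréchet distance is simplified to diagonal covariances and `generate` is a hypothetical wrapper around the full guided sampler (both are assumptions for illustration):

```python
import numpy as np
from itertools import product

def frechet_distance_diag(a, b):
    """Frechet distance between Gaussians fitted to sample sets a and b,
    simplified to diagonal covariances for this sketch."""
    mu_a, mu_b = a.mean(0), b.mean(0)
    va, vb = a.var(0), b.var(0)
    return float(((mu_a - mu_b) ** 2).sum() + (va + vb - 2 * np.sqrt(va * vb)).sum())

def grid_search(generate, held_out, sigmas=(1.25, 1.5, 2.0), lams=(1e-5, 1e-4, 1e-3)):
    """Pick the (sigma, lambda) pair whose guided samples minimize FD on a
    held-out reference split; generate(sigma, lam) runs the guided sampler."""
    return min(product(sigmas, lams),
               key=lambda sl: frechet_distance_diag(generate(*sl), held_out))
```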

5. Experimental Evaluation

Experimental benchmarks demonstrate that MMD Guidance achieves consistent and substantial improvements in distributional alignment and sample fidelity:

  • Synthetic GMMs: recovers desired mixture modes and proportions with as few as $50$–$200$ references, achieving the lowest FD and KD against classifier-guidance (CG) and classifier-free guidance (CFG) baselines.
  • Mode-proportion correction: can preferentially reproduce new mode proportions when the reference set is Dirichlet-reweighted.
  • Real-world image adaptation (FFHQ, CelebA-HQ): aligns generated samples to user-defined characteristics using $500$ references, reducing FD from $1221$ (no guidance) to $693$ (MMD guidance).
  • Prompt-aware stylized image generation: on Stable Diffusion XL and PixArt, guidance reduces FD, KD, Relative Reconstruction Kernel Error (RRKE), and increases coverage; adaptation occurs without network retraining.
  • Reference set sensitivity: performance improves quickly with $50$–$100$ references and plateaus, with robustness to kernel and guidance strength variations.

6. Advantages, Practical Considerations, and Significance

MMD Guidance is fully training-free and provides direct distribution-aware alignment. It leverages the low-variance, consistent estimation properties of MMD, particularly under small reference sets, and its gradients are efficiently computable via modern hardware. MMD gradients can be injected into any generative process for which a differentiable kernel can be evaluated over its latent or output space, without requiring model finetuning or additional training steps.

The framework generalizes to various adaptation scenarios:

  • Prompt-aware adaptation in conditional generative models, steering samples jointly with reference prompts.
  • Distribution correction or style transfer using limited reference data.
  • Domain adaptation in generative modeling pipelines.

The principal advantage over classifier-based guidance methods is the direct minimization of RKHS distance to the true reference distribution, rather than a surrogate (e.g., classifier likelihood). This yields superior coverage, mode fidelity, and robustness to overfitting (Sani et al., 13 Jan 2026).

7. Summary Table: MMD Guidance Properties

| Property | Description | Reference Scenario |
|---|---|---|
| Training-free | No update to model weights; operates at inference | Domain adaptation, style transfer |
| Differentiable MMD | Empirical estimate and gradient in pixel or latent space | Any kernel metric |
| Kernel flexibility | Gaussian RBF, polynomial, and product kernels for conditional generation | Prompt-aware LDM |
| Reference-efficient | Robust with $O(100)$ samples; minimal overfitting | Few-shot user adaptation |
| Low computational overhead | Latent-space implementation adds 10–15% runtime cost | Stable Diffusion XL |
| Direct distribution matching | Aligns to the reference data's empirical distribution in RKHS | Synthetic/real domains |
| Sample fidelity | Preserves generative quality while achieving alignment | FFHQ, CelebA-HQ experiments |

MMD Guidance formalizes distributional adaptation as direct kernel-mean matching, circumventing limitations of classifier- or discriminator-based guidance frameworks, and is widely applicable to various generation architectures.

