Sliced Maximum Mean Discrepancy (MMD)
- Sliced MMD is a kernel-based measure that compares high-dimensional distributions via one-dimensional projections, enhancing computational efficiency and privacy.
- It employs unbiased Monte Carlo estimators and efficient algorithms, achieving sample complexity and per-slice costs that are largely independent of the ambient dimension.
- Applications in generative modeling and domain adaptation benefit from its rigorous metric properties and convergence guarantees, making it a practical tool for distribution matching.
Sliced Maximum Mean Discrepancy (MMD) is a family of probability divergences that combine maximum mean discrepancy, a kernel-based measure of disparity between probability distributions, with slicing and (optionally) smoothing strategies for enhanced computational scalability, statistical efficiency, and privacy control. Sliced MMD variants are constructed by projecting high-dimensional distributions onto one-dimensional subspaces, computing univariate MMDs between the projected measures, and aggregating these quantities over many directions. This approach admits unbiased Monte Carlo estimators, enjoys favorable sample complexity independent of the ambient dimension, and under certain choices of kernels, enables fast algorithms via sorting and analytic reductions. Recent work establishes metric and topological guarantees for both vanilla and smoothed variants, as well as strong empirical and theoretical properties in generative modeling, domain adaptation, and privacy-preserving applications (Rakotomamonjy et al., 2021, Hertrich et al., 2023, Hagemann et al., 2023, Kolouri et al., 2020).
1. Formal Definitions and Kernel Reductions
For $d \ge 1$ and a characteristic or conditionally positive definite kernel $k \colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, the (squared) univariate MMD between probability distributions $\mu, \nu$ on $\mathbb{R}$ is
$$\mathrm{MMD}_k^2(\mu, \nu) = \mathbb{E}_{x, x' \sim \mu}[k(x, x')] - 2\,\mathbb{E}_{x \sim \mu,\, y \sim \nu}[k(x, y)] + \mathbb{E}_{y, y' \sim \nu}[k(y, y')],$$
equivalently $\mathrm{MMD}_k(\mu, \nu) = \lVert \mathbb{E}_{x \sim \mu}[\varphi(x)] - \mathbb{E}_{y \sim \nu}[\varphi(y)] \rVert_{\mathcal{H}_k}$, where $\varphi$ maps $\mathbb{R}$ into an RKHS $\mathcal{H}_k$ induced by $k$.
For each direction $\theta \in \mathbb{S}^{d-1}$ (the unit sphere in $\mathbb{R}^d$), define the pushforward (slice) $P^\theta_\# \mu$ of a measure $\mu$ under the measurable projection $P^\theta(x) = \langle \theta, x \rangle$. The sliced MMD is obtained by averaging the 1D MMD over directions:
$$\mathrm{SMMD}_k^2(\mu, \nu) = \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[\mathrm{MMD}_k^2(P^\theta_\# \mu,\, P^\theta_\# \nu)\big].$$
For the Gaussian-smoothed variant, convolve each 1D projection with $\mathcal{N}(0, \sigma^2)$, the Gaussian measure with variance $\sigma^2$, giving
$$\mathrm{GSMMD}_{k,\sigma}^2(\mu, \nu) = \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[\mathrm{MMD}_k^2\big(P^\theta_\# \mu * \mathcal{N}(0, \sigma^2),\, P^\theta_\# \nu * \mathcal{N}(0, \sigma^2)\big)\big].$$
This construction subsumes the unsmoothed case as $\sigma \to 0$ (Rakotomamonjy et al., 2021).
Sliced MMD can be recovered as a generalized sliced probability metric (GSPM) for an appropriate choice of slice family and inner 1D metric (Kolouri et al., 2020). In particular, for Riesz kernels $k(x, y) = -\lVert x - y \rVert^r$ with $r \in (0, 2)$, the full $d$-dimensional MMD coincides, up to a constant depending only on $d$ and $r$, with its sliced version via spherical integration, enabling exact reduction to univariate settings (Hertrich et al., 2023). For $r = 1$ (the energy distance), this yields
$$\mathrm{MMD}_k^2(\mu, \nu) = c_d\, \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[\mathrm{MMD}_k^2(P^\theta_\# \mu,\, P^\theta_\# \nu)\big], \qquad c_d = \frac{\sqrt{\pi}\,\Gamma\!\big(\tfrac{d+1}{2}\big)}{\Gamma\!\big(\tfrac{d}{2}\big)}.$$
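The sliced construction above is easy to sketch numerically. The following is a minimal illustration (not the reference implementation from the cited papers) using the Riesz kernel $k(x, y) = -|x - y|$, for which the 1D squared MMD is the energy distance; all function names and sample sizes are chosen for exposition:

```python
import numpy as np

def energy_mmd_1d(x, y):
    """Biased (V-statistic) estimate of the squared MMD between two 1D
    samples for the Riesz kernel k(a, b) = -|a - b| (energy distance):
    2 E|x - y| - E|x - x'| - E|y - y'|."""
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * xy - xx - yy

def sliced_energy_mmd(X, Y, num_slices=256, rng=None):
    """Monte Carlo estimate of the sliced squared MMD: project onto random
    directions on the unit sphere and average the 1D quantity."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    total = 0.0
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)   # uniform direction on S^{d-1}
        total += energy_mmd_1d(X @ theta, Y @ theta)
    return total / num_slices
```

For identical samples the estimate is exactly zero, while a shifted sample yields a strictly positive value, consistent with the metric properties discussed below.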
2. Metric, Topological, and Convergence Properties
Sliced MMD and its Gaussian-smoothed variant are bona fide metrics on $\mathcal{P}(\mathbb{R}^d)$, the space of Borel probability measures on $\mathbb{R}^d$, under standard assumptions on the base kernel (e.g., characteristic kernels) (Rakotomamonjy et al., 2021). They satisfy:
- Non-negativity
- Symmetry
- Triangle inequality
- Identity of indiscernibles
Moreover, sliced MMD metrizes weak convergence: for any sequence $(\mu_n)$ of Borel probability measures, $\mathrm{SMMD}_k(\mu_n, \mu) \to 0$ if and only if $\mu_n \to \mu$ in the weak sense on $\mathcal{P}(\mathbb{R}^d)$ (Rakotomamonjy et al., 2021).
For Riesz kernels, the equivalence between sliced and full-dimensional MMDs implies that statistical and topological properties, including identities for weak metrization, pass unaltered to the sliced case (Hertrich et al., 2023, Kolouri et al., 2020).
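These metric axioms can be checked empirically on the 1D energy distance, whose square root is the MMD metric for the Riesz kernel with $r = 1$. A small sanity check (sample sizes and variable names chosen for illustration only):

```python
import numpy as np

def energy_dist(x, y):
    """1D energy distance 2 E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic);
    its square root is the MMD metric for the kernel k(a, b) = -|a - b|."""
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * xy - xx - yy

rng = np.random.default_rng(0)
a = rng.standard_normal(300)
b = rng.standard_normal(300) + 1.0
c = rng.standard_normal(300) * 2.0

d_ab = np.sqrt(energy_dist(a, b))
d_ba = np.sqrt(energy_dist(b, a))
d_ac = np.sqrt(energy_dist(a, c))
d_cb = np.sqrt(energy_dist(c, b))

assert np.isclose(d_ab, d_ba)         # symmetry
assert energy_dist(a, a) < 1e-12      # identity of indiscernibles (empirical)
assert d_ab <= d_ac + d_cb + 1e-12    # triangle inequality
```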
3. Computational Algorithms and Sample Complexity
For empirical measures with $n$ i.i.d. samples, sliced MMD admits Monte Carlo approximations by averaging over $P$ random directions. The expected approximation error decays as $O(P^{-1/2})$ in the number of directions, independently of $d$, and the sample complexity remains $O(n^{-1/2})$, also independent of $d$ (Rakotomamonjy et al., 2021). For Riesz/energy kernels, per-slice gradients can be computed in $O(N \log N)$ for $N$ support points, exploiting 1D sorting. The mean squared gradient estimation error over $P$ slices scales as $O(1/P)$ (Hertrich et al., 2023, Hagemann et al., 2023).
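The sub-quadratic per-slice cost comes from replacing the $O(N^2)$ pairwise-distance matrix with sorting and prefix sums. A sketch of this reduction for the 1D energy kernel (function names are illustrative, not taken from the cited codebases):

```python
import numpy as np

def mean_abs_diff_cross(x, y):
    """Mean of |x_i - y_j| over all pairs in O((n+m) log m) via sorting
    and prefix sums, instead of the O(nm) pairwise matrix."""
    y = np.sort(y)
    c = np.concatenate(([0.0], np.cumsum(y)))     # prefix sums of sorted y
    pos = np.searchsorted(y, x)                   # y[:pos] < x_i <= y[pos:]
    below = x * pos - c[pos]                      # sum of (x_i - y_j), y_j < x_i
    above = (c[-1] - c[pos]) - x * (len(y) - pos) # sum of (y_j - x_i), y_j >= x_i
    return (below + above).mean() / len(y)

def energy_mmd_1d_fast(x, y):
    """Squared energy MMD between 1D samples in O(n log n)."""
    return (2.0 * mean_abs_diff_cross(x, y)
            - mean_abs_diff_cross(x, x)
            - mean_abs_diff_cross(y, y))

# sanity check against the O(n^2) pairwise formula
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = rng.standard_normal(250) + 0.5
brute = (2 * np.abs(x[:, None] - y[None, :]).mean()
         - np.abs(x[:, None] - x[None, :]).mean()
         - np.abs(y[:, None] - y[None, :]).mean())
assert np.isclose(energy_mmd_1d_fast(x, y), brute)
```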
The table summarizes complexity for $n$ samples and $P$ directions:

| Sliced MMD Variant | Per-Direction Complexity | Overall MC Complexity |
|---|---|---|
| General kernel (unsmoothed) | $O(n^2)$ | $O(P n^2)$ |
| Riesz/energy kernel | $O(n \log n)$ | $O(P n \log n)$ |
| Gaussian-smoothed | $O(n^2)$ | $O(P n^2)$ |
For practical applications, $P$ and $n$ are typically selected to balance variance and computational cost, with a moderate number of directions often sufficient to maintain low slicing error (Hertrich et al., 2023, Hagemann et al., 2023).
4. Smoothing Parameter and Privacy
The smoothing parameter $\sigma$ in the Gaussian-smoothed variant modulates both the magnitude of the divergence and the privacy properties:
- The divergence is monotone non-increasing in $\sigma$: as $\sigma$ increases, $\mathrm{GSMMD}_{k,\sigma}$ decreases (Rakotomamonjy et al., 2021).
- As $\sigma \to 0$, the smoothed variant recovers the unsmoothed sliced MMD.
Gaussian smoothing with variance $\sigma^2$ on each slice is equivalent to adding $\mathcal{N}(0, \sigma^2)$ noise to the projected statistics, directly implementing the classical Gaussian mechanism from differential privacy. The privacy level (in the $(\varepsilon, \delta)$-DP sense) improves as $\sigma$ increases, with $\varepsilon$ decreasing in $\sigma / \Delta$, where $\Delta$ is the sensitivity of the 1D projection (Rakotomamonjy et al., 2021). Thus, the privacy-utility tradeoff is explicitly controlled by $\sigma$: a larger $\sigma$ gives stronger privacy but smaller divergence values.
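This equivalence can be made concrete: adding $\mathcal{N}(0, \sigma^2)$ noise to each projected value draws a sample from the Gaussian-convolved slice. A small sketch (noise scale, sample sizes, and names are arbitrary choices for illustration) also exhibits the monotone shrinkage of the divergence in $\sigma$:

```python
import numpy as np

def energy_mmd_1d(x, y):
    """Squared MMD with the energy kernel k(a, b) = -|a - b| (V-statistic)."""
    return (2 * np.abs(x[:, None] - y[None, :]).mean()
            - np.abs(x[:, None] - x[None, :]).mean()
            - np.abs(y[:, None] - y[None, :]).mean())

def smoothed_sliced_mmd(X, Y, sigma, num_slices, rng):
    """Sliced MMD on Gaussian-smoothed projections: adding N(0, sigma^2)
    noise to each projected value samples from the convolved slice, i.e.
    the Gaussian mechanism applied to the 1D statistics."""
    d = X.shape[1]
    total = 0.0
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        px = X @ theta + sigma * rng.standard_normal(len(X))
        py = Y @ theta + sigma * rng.standard_normal(len(Y))
        total += energy_mmd_1d(px, py)
    return total / num_slices

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
Y = rng.standard_normal((300, 5)) + 2.0
d_raw = smoothed_sliced_mmd(X, Y, sigma=0.0, num_slices=64, rng=rng)
d_smooth = smoothed_sliced_mmd(X, Y, sigma=5.0, num_slices=64, rng=rng)
assert d_smooth < d_raw   # stronger smoothing => smaller divergence
```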
5. Applications: Gradient Flows and Generative Modeling
Sliced MMD is widely used as a statistical distance in generative modeling, distribution matching, and unsupervised domain adaptation. For instance, particle-based and neural network-based MMD flows can be trained using sliced MMD objectives. For Riesz kernels, the MMD flow can be fully reduced to a sequence of univariate energy distance computations, enabling competitive performance and substantial computational speedups compared to full MMD (Hertrich et al., 2023).
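As a bare-bones illustration of such a flow (a particle sketch, not the blockwise neural scheme of Hertrich et al.), particles can be moved by explicit Euler steps along the negative Monte Carlo gradient of the squared sliced energy MMD toward a fixed target sample; all step sizes and slice counts below are arbitrary:

```python
import numpy as np

def sliced_energy_grad(X, Y, num_slices, rng):
    """Monte Carlo gradient of the squared sliced energy MMD with respect
    to the particle positions X (target sample Y is fixed)."""
    n, d = X.shape
    m = Y.shape[0]
    grad = np.zeros_like(X)
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        px, py = X @ theta, Y @ theta
        # derivative of -1/n^2 sum|px_a - px_b| + 2/(nm) sum|px_a - py_j|
        g = ((2.0 / (n * m)) * np.sign(px[:, None] - py[None, :]).sum(axis=1)
             - (2.0 / n**2) * np.sign(px[:, None] - px[None, :]).sum(axis=1))
        grad += g[:, None] * theta[None, :]   # chain rule back through theta
    return grad / num_slices

def mmd_flow(X0, Y, steps, lr, num_slices, seed=0):
    """Explicit Euler discretization of the particle flow."""
    rng = np.random.default_rng(seed)
    X = X0.copy()
    for _ in range(steps):
        X -= lr * sliced_energy_grad(X, Y, num_slices, rng)
    return X

rng0 = np.random.default_rng(1)
X0 = rng0.standard_normal((100, 2))          # initial particles
Y = rng0.standard_normal((100, 2)) + 3.0     # fixed target sample
X = mmd_flow(X0, Y, steps=300, lr=20.0, num_slices=8)
assert np.all(np.abs(X.mean(axis=0) - Y.mean(axis=0)) < 0.5)
```

Because the energy-kernel gradient only involves signs of pairwise differences, each step is bounded, which keeps the explicit Euler scheme stable for moderate step sizes.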
Implementing blockwise generator training by subdividing the MMD flow into short blocks and learning each with a neural network, one achieves Fréchet Inception Distance (FID) scores competitive with or surpassing other MMD- and Stein-flow-based generators on datasets including MNIST, FashionMNIST, CIFAR10, and CelebA. Substantial speedups in per-gradient evaluation over the full $d$-dimensional MMD are observed for large numbers of support points (Hertrich et al., 2023).
In domain adaptation, integrating SMMD as the alignment term enables privacy-preserving transfer without sacrificing target-domain classification accuracy, even at high noise levels that confer strong DP guarantees (Rakotomamonjy et al., 2021).
6. Theoretical Guarantees and Gradient-Flow Convergence
Sliced MMD and its smoothed variants admit rigorous convergence guarantees for both optimization and probabilistic matching. In the context of gradient flows in Wasserstein space, the squared sliced MMD provides a geodesically convex functional. Discrete particle flows with MC slicing inherit global convergence properties under mild regularity conditions on the base kernel and slice family (Kolouri et al., 2020, Hagemann et al., 2023).
For conditional and joint generative modeling, error bounds are established for the approximation of posterior distributions, quantifying how generator error in the sliced MMD metric controls the divergence of conditional marginals (Hagemann et al., 2023). Under compact support and Hölder continuity, the expected conditional MMD scales as a fractional power of the generator error.
7. Empirical Observations and Practical Considerations
Empirical studies confirm several salient properties:
- Sample complexity and projection complexity rates are independent of the ambient dimension for both vanilla and smoothed sliced MMD (Rakotomamonjy et al., 2021).
- Smoothing has only minor impact on separation power for practical kernel choices, with negligible performance loss in real tasks.
- Monte Carlo slicing error falls as $O(P^{-1/2})$ in the number of directions $P$, matching the $O(n^{-1/2})$ rate in the number of samples (Rakotomamonjy et al., 2021, Hertrich et al., 2023).
- Sliced MMD flows outperform or match traditional MMD- or optimal transport-based methods in both convergence and visual quality across synthetic and real datasets (Hertrich et al., 2023, Kolouri et al., 2020).
- Use of local projections and multiscale flow schedules enhances efficiency for high-dimensional data such as images (Hagemann et al., 2023).
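The $O(P^{-1/2})$ slicing-error rate noted above is easy to observe empirically. A small experiment (sizes, seeds, and slice counts are arbitrary) compares the spread of the Monte Carlo estimate across repeats for few versus many directions; with the samples held fixed, the only randomness is the choice of directions:

```python
import numpy as np

def energy_mmd_1d(x, y):
    """Squared energy MMD between 1D samples (V-statistic)."""
    return (2 * np.abs(x[:, None] - y[None, :]).mean()
            - np.abs(x[:, None] - x[None, :]).mean()
            - np.abs(y[:, None] - y[None, :]).mean())

def sliced_mmd(X, Y, P, rng):
    """Average the 1D energy MMD over P random directions."""
    vals = []
    for _ in range(P):
        t = rng.standard_normal(X.shape[1])
        t /= np.linalg.norm(t)
        vals.append(energy_mmd_1d(X @ t, Y @ t))
    return np.mean(vals)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
Y = rng.standard_normal((100, 10))
Y[:, 0] += 1.5   # distributions differ only along the first coordinate

# Spread of the estimator over repeated runs: 16x more slices should cut
# the spread by roughly 4x, consistent with the P^{-1/2} rate.
spread = {P: np.std([sliced_mmd(X, Y, P, rng) for _ in range(20)])
          for P in (4, 64)}
assert spread[64] < spread[4]
```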
In summary, Sliced Maximum Mean Discrepancy and its Gaussian-smoothed and Riesz-kernel variants provide a flexible, metrically sound, and computationally efficient framework for comparing high-dimensional probability measures, supporting gradient-flow methods, privacy-preserving learning, and scalable distributional alignment (Rakotomamonjy et al., 2021, Hertrich et al., 2023, Hagemann et al., 2023, Kolouri et al., 2020).