
Sliced Maximum Mean Discrepancy (MMD)

Updated 19 January 2026
  • Sliced MMD is a kernel-based measure that compares high-dimensional distributions via one-dimensional projections, enhancing computational efficiency and privacy.
  • It employs unbiased Monte Carlo estimators and efficient algorithms, achieving sample complexity and per-slice costs that are largely independent of the ambient dimension.
  • Applications in generative modeling and domain adaptation benefit from its rigorous metric properties and convergence guarantees, making it a practical tool for distribution matching.

Sliced Maximum Mean Discrepancy (MMD) is a family of probability divergences that combines maximum mean discrepancy, a kernel-based measure of disparity between probability distributions, with slicing and (optionally) smoothing strategies for improved computational scalability, statistical efficiency, and privacy control. Sliced MMD variants are constructed by projecting high-dimensional distributions onto one-dimensional subspaces, computing univariate MMDs between the projected measures, and aggregating these quantities over many directions. This approach admits unbiased Monte Carlo estimators, enjoys sample complexity independent of the ambient dimension, and, for certain kernel choices, enables fast algorithms via sorting and analytic reductions. Recent work establishes metric and topological guarantees for both vanilla and smoothed variants, as well as strong empirical and theoretical properties in generative modeling, domain adaptation, and privacy-preserving applications (Rakotomamonjy et al., 2021, Hertrich et al., 2023, Hagemann et al., 2023, Kolouri et al., 2020).

1. Formal Definitions and Kernel Reductions

For $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ and a characteristic or conditionally positive definite kernel $k:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$, the (univariate) MMD between distributions $\mu', \nu'$ on $\mathbb{R}$ is

$$\mathrm{MMD}(\mu',\nu') = \left\| \mathbb{E}_{X\sim\mu'}[\varphi(X)] - \mathbb{E}_{Y\sim\nu'}[\varphi(Y)] \right\|_H,$$

where $\varphi$ maps $\mathbb{R}$ into the RKHS $H$ induced by $k$.

For each $u\in\mathbb{S}^{d-1}$ (the unit sphere), define the pushforward (slice) $\mathcal{R}_u \mu(A) = \mu(\{x\in\mathbb{R}^d : u^\top x \in A\})$ for measurable $A\subset\mathbb{R}$. The sliced MMD is obtained by averaging the 1D MMD over directions:

$$\mathrm{Sliced\text{-}MMD}(\mu,\nu) = \mathbb{E}_{u\sim\mathrm{Unif}(\mathbb{S}^{d-1})}\left[\mathrm{MMD}(\mathcal{R}_u\mu,\, \mathcal{R}_u\nu)\right].$$

For the Gaussian-smoothed variant, convolve each 1D projection with $\mathcal{N}_\sigma$, the Gaussian measure with variance $\sigma^2$, giving

$$G_\sigma\mathrm{SMMD}(\mu,\nu) = \mathbb{E}_{u\sim\mathrm{Unif}(\mathbb{S}^{d-1})}\left[ \mathrm{MMD}(\mathcal{R}_u\mu * \mathcal{N}_\sigma,\, \mathcal{R}_u\nu * \mathcal{N}_\sigma) \right].$$

This construction subsumes the unsmoothed case as $\sigma\to 0$ (Rakotomamonjy et al., 2021).
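To make the construction concrete, here is a minimal NumPy sketch of the Monte Carlo sliced MMD, assuming a Gaussian base kernel; the function names, bandwidth, and number of directions are illustrative choices, not values prescribed in the cited papers.

```python
import numpy as np

def mmd2_1d(x, y, bandwidth=1.0):
    """Biased (V-statistic) squared MMD between 1D samples, Gaussian kernel."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def sliced_mmd(X, Y, n_dirs=100, bandwidth=1.0, rng=None):
    """Monte Carlo sliced MMD: average the 1D MMD over random directions."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_dirs):
        u = rng.standard_normal(X.shape[1])
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        total += np.sqrt(max(mmd2_1d(X @ u, Y @ u, bandwidth), 0.0))
    return total / n_dirs
```

Each direction costs $O(N^2)$ kernel evaluations in this generic form; for Riesz kernels the per-slice cost drops to $O(N\log N)$ via sorting, as discussed in Section 3.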

Sliced MMD can be recovered as a generalized sliced probability metric (GSPM) for an appropriate choice of slice family and inner 1D metric (Kolouri et al., 2020). In particular, for Riesz kernels $k(x,y) = -\|x-y\|^r$ with $r\in(0,2)$, the full $d$-dimensional MMD coincides with its sliced version via spherical integration, enabling exact reduction to univariate settings (Hertrich et al., 2023). For $r=1$ (the energy distance), this yields

$$\mathrm{MMD}_K^2(\mu,\nu) = \mathbb{E}_{\xi\sim\mathrm{Unif}(\mathbb{S}^{d-1})}\left[\mathrm{MMD}_{-|\cdot|}^2(P_{\xi\sharp}\mu,\, P_{\xi\sharp}\nu)\right].$$

2. Metric, Topological, and Convergence Properties

Sliced MMD and its Gaussian-smoothed variant are bona fide metrics on $\mathcal{P}(\mathbb{R}^d)$ under standard assumptions on the base kernel (e.g., characteristic kernels) (Rakotomamonjy et al., 2021). They satisfy:

  • Non-negativity
  • Symmetry
  • Triangle inequality
  • Identity of indiscernibles

Moreover, $G_\sigma\mathrm{SMMD}$ metrizes weak convergence: for any sequence $(\mu_n)$ of Borel probability measures,

$$G_\sigma\mathrm{SMMD}(\mu_n, \mu) \to 0 \iff \mu_n \Rightarrow \mu$$

in the weak sense on $\mathbb{R}^d$ (Rakotomamonjy et al., 2021).

For Riesz kernels, the equivalence between sliced and full-dimensional MMDs implies that statistical and topological properties, including identities for weak metrization, pass unaltered to the sliced case (Hertrich et al., 2023, Kolouri et al., 2020).

3. Computational Algorithms and Sample Complexity

For empirical measures with $n$ i.i.d. samples, sliced MMD admits Monte Carlo approximations by averaging over $L$ random directions:

$$\widehat{G}_\sigma\mathrm{SMMD} = \frac{1}{L} \sum_{\ell=1}^L \mathrm{MMD}\left(\mathcal{R}_{u_\ell}\hat{\mu}_n * \mathcal{N}_\sigma,\, \mathcal{R}_{u_\ell}\hat{\nu}_n * \mathcal{N}_\sigma\right).$$

The expected error decays as $O(1/\sqrt{L})$ in the number of directions, independently of $d$, and the sample complexity remains $O(n^{-1/2})$, also independent of $d$ (Rakotomamonjy et al., 2021). For Riesz/energy kernels, per-slice gradients can be computed in $O((M+N)\log(M+N))$ time for $M, N$ support points by exploiting 1D sorting. The mean squared gradient estimation error over $P$ slices scales as $O(\sqrt{d/P})$ (Hertrich et al., 2023, Hagemann et al., 2023).
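The sorting-based reduction for the energy kernel rests on the fact that pairwise absolute-difference sums over sorted 1D samples collapse to weighted linear sums. The sketch below (an illustration of the idea, not the authors' implementation; `_pair_abs_sum` and `energy_mmd2_1d` are hypothetical helper names) computes the 1D squared energy MMD in $O((M+N)\log(M+N))$:

```python
import numpy as np

def _pair_abs_sum(z):
    """Sum of |z_i - z_j| over i < j, in O(n log n) via sorting.

    For sorted z, sum_{i<j} (z_j - z_i) = sum_i (2i - n + 1) * z_i.
    """
    z = np.sort(z)
    n = len(z)
    return np.dot(2 * np.arange(n) - n + 1, z)

def energy_mmd2_1d(x, y):
    """Squared MMD with kernel k(s,t) = -|s-t| (energy distance), 1D samples.

    Cross-term sum_{i,j} |x_i - y_j| is recovered from the pooled pair sum
    minus the within-sample pair sums.
    """
    m, n = len(x), len(y)
    cross = _pair_abs_sum(np.concatenate([x, y])) - _pair_abs_sum(x) - _pair_abs_sum(y)
    return (2 * cross / (m * n)
            - 2 * _pair_abs_sum(x) / m**2
            - 2 * _pair_abs_sum(y) / n**2)
```

A sliced estimator then simply applies `energy_mmd2_1d` to the projections $Xu_\ell$, $Yu_\ell$ over $L$ random directions; the result agrees with the brute-force $O(N^2)$ double sum.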

The following table summarizes the computational complexity:

| Sliced MMD Variant | Per-Direction Complexity | Overall MC Complexity |
|---|---|---|
| General kernel (unsmoothed) | $O(N^2)$ | $O(LN^2)$ |
| Riesz/energy kernel | $O(N\log N)$ | $O(LN\log N)$ |
| Gaussian-smoothed | $O(N^2)$ | $O(LN^2)$ |

For practical applications, $L$ and $P$ are typically selected to balance variance and computational cost, with $P\approx d$ often sufficient to maintain low slicing error (Hertrich et al., 2023, Hagemann et al., 2023).

4. Smoothing Parameter and Privacy

The smoothing parameter $\sigma$ in $G_\sigma\mathrm{SMMD}$ modulates both the magnitude of the divergence and its privacy properties:

  • The divergence is monotone non-increasing in $\sigma$: as $\sigma$ increases, $G_\sigma\mathrm{SMMD}(\mu,\nu)$ decreases (Rakotomamonjy et al., 2021).
  • $\lim_{\sigma\to 0} G_\sigma\mathrm{SMMD}(\mu,\nu) = \mathrm{Sliced\text{-}MMD}(\mu,\nu)$.

Gaussian smoothing with variance $\sigma^2$ on each slice is equivalent to adding noise to the projected statistics, directly implementing the classical Gaussian mechanism from differential privacy. The privacy level (in the $(\epsilon,\delta)$-DP sense) improves as $\sigma$ increases, with $\epsilon \approx \Delta/\sigma$, where $\Delta$ is the sensitivity of the 1D projection (Rakotomamonjy et al., 2021). The privacy-utility tradeoff is thus explicitly controlled by $\sigma$: a larger $\sigma$ gives stronger privacy but smaller divergence values.
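The noise interpretation of smoothing can be sketched as follows; `smoothed_projection` is a hypothetical helper, and the sensitivity remark is the generic Gaussian-mechanism argument rather than a construction taken from the paper.

```python
import numpy as np

def smoothed_projection(X, u, sigma, rng=None):
    """Project samples onto direction u, then add N(0, sigma^2) noise.

    Adding Gaussian noise to each projected sample is the empirical
    analogue of convolving the slice R_u(mu) with N_sigma, and coincides
    with applying the Gaussian mechanism to the projected data.
    """
    rng = np.random.default_rng(rng)
    return X @ u + sigma * rng.standard_normal(len(X))
```

With projections clipped to a bounded range, a 1D summary such as the projected mean has finite sensitivity $\Delta$, and $\epsilon \approx \Delta/\sigma$ as stated above: larger $\sigma$ buys privacy at the cost of a smaller divergence value.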

5. Applications: Gradient Flows and Generative Modeling

Sliced MMD is widely used as a statistical distance in generative modeling, distribution matching, and unsupervised domain adaptation. For instance, particle-based and neural network-based MMD flows can be trained using sliced MMD objectives. For Riesz kernels, the MMD flow can be fully reduced to a sequence of univariate energy distance computations, enabling competitive performance and substantial computational speedups compared to full MMD (Hertrich et al., 2023).

Blockwise generator training, which subdivides the MMD flow into short blocks and learns each block with a neural network, achieves Fréchet Inception Distance (FID) scores competitive with or surpassing other MMD- and Stein-flow-based generators on datasets including MNIST, FashionMNIST, CIFAR10, and CelebA. Speedups of over $100\times$ in per-gradient evaluation are observed for $d = 100$ (Hertrich et al., 2023).

In domain adaptation, integrating $G_\sigma\mathrm{SMMD}$ as the alignment term enables privacy-preserving transfer without sacrificing target-domain classification accuracy, even at high noise levels that confer strong DP guarantees (Rakotomamonjy et al., 2021).

6. Theoretical Guarantees and Gradient-Flow Convergence

Sliced MMD and its smoothed variants admit rigorous convergence guarantees for both optimization and probabilistic matching. In the context of gradient flows in Wasserstein space, the squared sliced MMD provides a geodesically convex functional. Discrete particle flows with MC slicing inherit global convergence properties under mild regularity conditions on the base kernel and slice family (Kolouri et al., 2020, Hagemann et al., 2023).

For conditional and joint generative modeling, error bounds are established for the approximation of posterior distributions, quantifying how generator error in the sliced MMD metric controls the divergence of conditional marginals (Hagemann et al., 2023). Under compact support and Hölder continuity, the expected conditional MMD scales as a fractional power of the generator error.

7. Empirical Observations and Practical Considerations

Empirical studies confirm several salient properties:

  • Sample complexity and projection complexity rates are independent of the ambient dimension for both vanilla and smoothed sliced MMD (Rakotomamonjy et al., 2021).
  • Smoothing effects have minor impact on separation power for practical kernel choices, with negligible performance loss in real tasks.
  • Monte Carlo slicing error falls as $O(1/\sqrt{L})$ for direction averaging and as $O(\sqrt{d/P})$ for gradient estimation (Rakotomamonjy et al., 2021, Hertrich et al., 2023).
  • Sliced MMD flows outperform or match traditional MMD- or optimal transport-based methods in both convergence and visual quality across synthetic and real datasets (Hertrich et al., 2023, Kolouri et al., 2020).
  • Use of local projections and multiscale flow schedules enhances efficiency for high-dimensional data such as images (Hagemann et al., 2023).

In summary, Sliced Maximum Mean Discrepancy and its Gaussian-smoothed and Riesz-kernel variants provide a flexible, metrically sound, and computationally efficient framework for comparing high-dimensional probability measures, supporting gradient-flow methods, privacy-preserving learning, and scalable distributional alignment (Rakotomamonjy et al., 2021, Hertrich et al., 2023, Hagemann et al., 2023, Kolouri et al., 2020).
