Sliced Maximum Mean Discrepancy (MMD)
- Sliced MMD is a kernel-based measure that compares high-dimensional distributions via one-dimensional projections, enhancing computational efficiency and privacy.
- It employs unbiased Monte Carlo estimators and efficient algorithms, achieving sample complexity and per-slice costs that are largely independent of the ambient dimension.
- Applications in generative modeling and domain adaptation benefit from its rigorous metric properties and convergence guarantees, making it a practical tool for distribution matching.
Sliced Maximum Mean Discrepancy (MMD) is a family of probability divergences that combine maximum mean discrepancy, a kernel-based measure of disparity between probability distributions, with slicing and (optionally) smoothing strategies for enhanced computational scalability, statistical efficiency, and privacy control. Sliced MMD variants are constructed by projecting high-dimensional distributions onto one-dimensional subspaces, computing univariate MMDs between the projected measures, and aggregating these quantities over many directions. This approach admits unbiased Monte Carlo estimators, enjoys favorable sample complexity independent of the ambient dimension, and under certain choices of kernels, enables fast algorithms via sorting and analytic reductions. Recent work establishes metric and topological guarantees for both vanilla and smoothed variants, as well as strong empirical and theoretical properties in generative modeling, domain adaptation, and privacy-preserving applications (Rakotomamonjy et al., 2021, Hertrich et al., 2023, Hagemann et al., 2023, Kolouri et al., 2020).
1. Formal Definitions and Kernel Reductions
For $d \ge 1$ and a characteristic or conditionally positive definite kernel $k \colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, the (squared) univariate MMD between probability distributions $\mu, \nu$ on $\mathbb{R}$ is
$$\mathrm{MMD}_k^2(\mu, \nu) = \mathbb{E}_{x, x' \sim \mu}[k(x, x')] - 2\,\mathbb{E}_{x \sim \mu,\, y \sim \nu}[k(x, y)] + \mathbb{E}_{y, y' \sim \nu}[k(y, y')],$$
equivalently $\mathrm{MMD}_k(\mu, \nu) = \lVert \mathbb{E}_{x \sim \mu}[\varphi(x)] - \mathbb{E}_{y \sim \nu}[\varphi(y)] \rVert_{\mathcal{H}_k}$, where $\varphi$ maps $\mathbb{R}$ into an RKHS $\mathcal{H}_k$ induced by $k$.
For each direction $\theta \in \mathbb{S}^{d-1}$ (the unit sphere in $\mathbb{R}^d$), define the pushforward (slice) $P^\theta_\# \mu$ of a measure $\mu$ under the measurable projection $P^\theta(x) = \langle \theta, x \rangle$. The sliced MMD is obtained by averaging the 1D MMD over directions:
$$\mathrm{SMMD}_k^2(\mu, \nu) = \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[\mathrm{MMD}_k^2(P^\theta_\# \mu,\, P^\theta_\# \nu)\big].$$
For the Gaussian-smoothed variant, convolve each 1D projection with $\mathcal{N}(0, \sigma^2)$, the Gaussian measure with variance $\sigma^2$, giving
$$\mathrm{GSMMD}_{k,\sigma}^2(\mu, \nu) = \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[\mathrm{MMD}_k^2\big(P^\theta_\# \mu * \mathcal{N}(0, \sigma^2),\, P^\theta_\# \nu * \mathcal{N}(0, \sigma^2)\big)\big].$$
This construction subsumes the unsmoothed case as $\sigma \to 0$ (Rakotomamonjy et al., 2021).
Sliced MMD can be recovered as a generalized sliced probability metric (GSPM) for an appropriate choice of slice family and inner 1D metric (Kolouri et al., 2020). In particular, for Riesz kernels $k(x, y) = -\lVert x - y \rVert^r$ with $r \in (0, 2)$, the full $d$-dimensional MMD coincides, up to a constant depending only on $d$ and $r$, with its sliced version via spherical integration, enabling exact reduction to univariate settings (Hertrich et al., 2023). For $r = 1$ (the energy distance), this yields
$$\mathrm{MMD}_k^2(\mu, \nu) = c_d\, \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[\mathrm{MMD}_k^2(P^\theta_\# \mu,\, P^\theta_\# \nu)\big], \qquad c_d = \frac{\sqrt{\pi}\,\Gamma\!\big(\tfrac{d+1}{2}\big)}{\Gamma\!\big(\tfrac{d}{2}\big)}.$$
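The sliced construction above is easy to sketch numerically. The following is a minimal illustration (not the reference implementation from the cited papers) using the Riesz kernel $k(x, y) = -|x - y|$, for which the 1D squared MMD is the energy distance; all function names and sample sizes are chosen for exposition:

```python
import numpy as np

def energy_mmd_1d(x, y):
    """Biased (V-statistic) estimate of the squared MMD between two 1D
    samples for the Riesz kernel k(a, b) = -|a - b| (energy distance):
    2 E|x - y| - E|x - x'| - E|y - y'|."""
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * xy - xx - yy

def sliced_energy_mmd(X, Y, num_slices=256, rng=None):
    """Monte Carlo estimate of the sliced squared MMD: project onto random
    directions on the unit sphere and average the 1D quantity."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    total = 0.0
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)   # uniform direction on S^{d-1}
        total += energy_mmd_1d(X @ theta, Y @ theta)
    return total / num_slices
```

For identical samples the estimate is exactly zero, while a shifted sample yields a strictly positive value, consistent with the metric properties discussed below.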
2. Metric, Topological, and Convergence Properties
Sliced MMD and its Gaussian-smoothed variant are bona fide metrics on $\mathcal{P}(\mathbb{R}^d)$, the space of Borel probability measures on $\mathbb{R}^d$, under standard assumptions on the base kernel (e.g., characteristic kernels) (Rakotomamonjy et al., 2021). They satisfy:
- Non-negativity
- Symmetry
- Triangle inequality
- Identity of indiscernibles
Moreover, sliced MMD metrizes weak convergence: for any sequence $(\mu_n)$ of Borel probability measures, $\mathrm{SMMD}_k(\mu_n, \mu) \to 0$ if and only if $\mu_n \to \mu$ in the weak sense on $\mathcal{P}(\mathbb{R}^d)$ (Rakotomamonjy et al., 2021).
For Riesz kernels, the equivalence between sliced and full-dimensional MMDs implies that statistical and topological properties, including identities for weak metrization, pass unaltered to the sliced case (Hertrich et al., 2023, Kolouri et al., 2020).
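These metric axioms can be checked empirically on the 1D energy distance, whose square root is the MMD metric for the Riesz kernel with $r = 1$. A small sanity check (sample sizes and variable names chosen for illustration only):

```python
import numpy as np

def energy_dist(x, y):
    """1D energy distance 2 E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic);
    its square root is the MMD metric for the kernel k(a, b) = -|a - b|."""
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * xy - xx - yy

rng = np.random.default_rng(0)
a = rng.standard_normal(300)
b = rng.standard_normal(300) + 1.0
c = rng.standard_normal(300) * 2.0

d_ab = np.sqrt(energy_dist(a, b))
d_ba = np.sqrt(energy_dist(b, a))
d_ac = np.sqrt(energy_dist(a, c))
d_cb = np.sqrt(energy_dist(c, b))

assert np.isclose(d_ab, d_ba)         # symmetry
assert energy_dist(a, a) < 1e-12      # identity of indiscernibles (empirical)
assert d_ab <= d_ac + d_cb + 1e-12    # triangle inequality
```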
3. Computational Algorithms and Sample Complexity
For empirical measures with $n$ i.i.d. samples, sliced MMD admits Monte Carlo approximations by averaging over $P$ random directions. The expected approximation error decays as $O(P^{-1/2})$ in the number of directions, independently of $d$, and the sample complexity remains $O(n^{-1/2})$, also independent of $d$ (Rakotomamonjy et al., 2021). For Riesz/energy kernels, per-slice gradients can be computed in $O(N \log N)$ for $N$ support points, exploiting 1D sorting. The mean squared gradient estimation error over $P$ slices scales as $O(1/P)$ (Hertrich et al., 2023, Hagemann et al., 2023).
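The sub-quadratic per-slice cost comes from replacing the $O(N^2)$ pairwise-distance matrix with sorting and prefix sums. A sketch of this reduction for the 1D energy kernel (function names are illustrative, not taken from the cited codebases):

```python
import numpy as np

def mean_abs_diff_cross(x, y):
    """Mean of |x_i - y_j| over all pairs in O((n+m) log m) via sorting
    and prefix sums, instead of the O(nm) pairwise matrix."""
    y = np.sort(y)
    c = np.concatenate(([0.0], np.cumsum(y)))     # prefix sums of sorted y
    pos = np.searchsorted(y, x)                   # y[:pos] < x_i <= y[pos:]
    below = x * pos - c[pos]                      # sum of (x_i - y_j), y_j < x_i
    above = (c[-1] - c[pos]) - x * (len(y) - pos) # sum of (y_j - x_i), y_j >= x_i
    return (below + above).mean() / len(y)

def energy_mmd_1d_fast(x, y):
    """Squared energy MMD between 1D samples in O(n log n)."""
    return (2.0 * mean_abs_diff_cross(x, y)
            - mean_abs_diff_cross(x, x)
            - mean_abs_diff_cross(y, y))

# sanity check against the O(n^2) pairwise formula
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = rng.standard_normal(250) + 0.5
brute = (2 * np.abs(x[:, None] - y[None, :]).mean()
         - np.abs(x[:, None] - x[None, :]).mean()
         - np.abs(y[:, None] - y[None, :]).mean())
assert np.isclose(energy_mmd_1d_fast(x, y), brute)
```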
The table summarizes complexity for $n$ samples and $P$ directions:

| Sliced MMD Variant | Per-Direction Complexity | Overall MC Complexity |
|---|---|---|
| General kernel (unsmoothed) | $O(n^2)$ | $O(P n^2)$ |
| Riesz/energy kernel | $O(n \log n)$ | $O(P n \log n)$ |
| Gaussian-smoothed | $O(n^2)$ | $O(P n^2)$ |
For practical applications, $P$ and $n$ are typically selected to balance variance and computational cost, with a moderate number of directions often sufficient to maintain low slicing error (Hertrich et al., 2023, Hagemann et al., 2023).
4. Smoothing Parameter and Privacy
The smoothing parameter $\sigma$ in the Gaussian-smoothed variant modulates both the magnitude of the divergence and the privacy properties:
- The divergence is monotone non-increasing in $\sigma$: as $\sigma$ increases, $\mathrm{GSMMD}_{k,\sigma}$ decreases (Rakotomamonjy et al., 2021).
- As $\sigma \to 0$, the smoothed variant recovers the unsmoothed sliced MMD.
Gaussian smoothing with variance $\sigma^2$ on each slice is equivalent to adding $\mathcal{N}(0, \sigma^2)$ noise to the projected statistics, directly implementing the classical Gaussian mechanism from differential privacy. The privacy level (in the $(\varepsilon, \delta)$-DP sense) improves as $\sigma$ increases, with $\varepsilon$ decreasing in $\sigma / \Delta$, where $\Delta$ is the sensitivity of the 1D projection (Rakotomamonjy et al., 2021). Thus, the privacy-utility tradeoff is explicitly controlled by $\sigma$: a larger $\sigma$ gives stronger privacy but smaller divergence values.
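This equivalence can be made concrete: adding $\mathcal{N}(0, \sigma^2)$ noise to each projected value draws a sample from the Gaussian-convolved slice. A small sketch (noise scale, sample sizes, and names are arbitrary choices for illustration) also exhibits the monotone shrinkage of the divergence in $\sigma$:

```python
import numpy as np

def energy_mmd_1d(x, y):
    """Squared MMD with the energy kernel k(a, b) = -|a - b| (V-statistic)."""
    return (2 * np.abs(x[:, None] - y[None, :]).mean()
            - np.abs(x[:, None] - x[None, :]).mean()
            - np.abs(y[:, None] - y[None, :]).mean())

def smoothed_sliced_mmd(X, Y, sigma, num_slices, rng):
    """Sliced MMD on Gaussian-smoothed projections: adding N(0, sigma^2)
    noise to each projected value samples from the convolved slice, i.e.
    the Gaussian mechanism applied to the 1D statistics."""
    d = X.shape[1]
    total = 0.0
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        px = X @ theta + sigma * rng.standard_normal(len(X))
        py = Y @ theta + sigma * rng.standard_normal(len(Y))
        total += energy_mmd_1d(px, py)
    return total / num_slices

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
Y = rng.standard_normal((300, 5)) + 2.0
d_raw = smoothed_sliced_mmd(X, Y, sigma=0.0, num_slices=64, rng=rng)
d_smooth = smoothed_sliced_mmd(X, Y, sigma=5.0, num_slices=64, rng=rng)
assert d_smooth < d_raw   # stronger smoothing => smaller divergence
```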
5. Applications: Gradient Flows and Generative Modeling
Sliced MMD is widely used as a statistical distance in generative modeling, distribution matching, and unsupervised domain adaptation. For instance, particle-based and neural network-based MMD flows can be trained using sliced MMD objectives. For Riesz kernels, the MMD flow can be fully reduced to a sequence of univariate energy distance computations, enabling competitive performance and substantial computational speedups compared to full MMD (Hertrich et al., 2023).
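As a bare-bones illustration of such a flow (a particle sketch, not the blockwise neural scheme of Hertrich et al.), particles can be moved by explicit Euler steps along the negative Monte Carlo gradient of the squared sliced energy MMD toward a fixed target sample; all step sizes and slice counts below are arbitrary:

```python
import numpy as np

def sliced_energy_grad(X, Y, num_slices, rng):
    """Monte Carlo gradient of the squared sliced energy MMD with respect
    to the particle positions X (target sample Y is fixed)."""
    n, d = X.shape
    m = Y.shape[0]
    grad = np.zeros_like(X)
    for _ in range(num_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        px, py = X @ theta, Y @ theta
        # derivative of -1/n^2 sum|px_a - px_b| + 2/(nm) sum|px_a - py_j|
        g = ((2.0 / (n * m)) * np.sign(px[:, None] - py[None, :]).sum(axis=1)
             - (2.0 / n**2) * np.sign(px[:, None] - px[None, :]).sum(axis=1))
        grad += g[:, None] * theta[None, :]   # chain rule back through theta
    return grad / num_slices

def mmd_flow(X0, Y, steps, lr, num_slices, seed=0):
    """Explicit Euler discretization of the particle flow."""
    rng = np.random.default_rng(seed)
    X = X0.copy()
    for _ in range(steps):
        X -= lr * sliced_energy_grad(X, Y, num_slices, rng)
    return X

rng0 = np.random.default_rng(1)
X0 = rng0.standard_normal((100, 2))          # initial particles
Y = rng0.standard_normal((100, 2)) + 3.0     # fixed target sample
X = mmd_flow(X0, Y, steps=300, lr=20.0, num_slices=8)
assert np.all(np.abs(X.mean(axis=0) - Y.mean(axis=0)) < 0.5)
```

Because the energy-kernel gradient only involves signs of pairwise differences, each step is bounded, which keeps the explicit Euler scheme stable for moderate step sizes.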
Implementing blockwise generator training by subdividing the MMD flow into short blocks and learning each with a neural network, one achieves Fréchet Inception Distance (FID) scores competitive with or surpassing other MMD- and Stein-flow-based generators on datasets including MNIST, FashionMNIST, CIFAR10, and CelebA. Substantial speedups in per-gradient evaluation over the full $d$-dimensional MMD are observed for large numbers of support points (Hertrich et al., 2023).
In domain adaptation, integrating SMMD as the alignment term enables privacy-preserving transfer without sacrificing target-domain classification accuracy, even at high noise levels that confer strong DP guarantees (Rakotomamonjy et al., 2021).
6. Theoretical Guarantees and Gradient-Flow Convergence
Sliced MMD and its smoothed variants admit rigorous convergence guarantees for both optimization and probabilistic matching. In the context of gradient flows in Wasserstein space, the squared sliced MMD provides a geodesically convex functional. Discrete particle flows with MC slicing inherit global convergence properties under mild regularity conditions on the base kernel and slice family (Kolouri et al., 2020, Hagemann et al., 2023).
For conditional and joint generative modeling, error bounds are established for the approximation of posterior distributions, quantifying how generator error in the sliced MMD metric controls the divergence of conditional marginals (Hagemann et al., 2023). Under compact support and Hölder continuity, the expected conditional MMD scales as a fractional power of the generator error.
7. Empirical Observations and Practical Considerations
Empirical studies confirm several salient properties:
- Sample complexity and projection complexity rates are independent of the ambient dimension for both vanilla and smoothed sliced MMD (Rakotomamonjy et al., 2021).
- Smoothing has only minor impact on separation power for practical kernel choices, with negligible performance loss in real tasks.
- Monte Carlo slicing error falls as $O(P^{-1/2})$ in the number of directions $P$, matching the $O(n^{-1/2})$ rate in the number of samples (Rakotomamonjy et al., 2021, Hertrich et al., 2023).
- Sliced MMD flows outperform or match traditional MMD- or optimal transport-based methods in both convergence and visual quality across synthetic and real datasets (Hertrich et al., 2023, Kolouri et al., 2020).
- Use of local projections and multiscale flow schedules enhances efficiency for high-dimensional data such as images (Hagemann et al., 2023).
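The $O(P^{-1/2})$ slicing-error rate noted above is easy to observe empirically. A small experiment (sizes, seeds, and slice counts are arbitrary) compares the spread of the Monte Carlo estimate across repeats for few versus many directions; with the samples held fixed, the only randomness is the choice of directions:

```python
import numpy as np

def energy_mmd_1d(x, y):
    """Squared energy MMD between 1D samples (V-statistic)."""
    return (2 * np.abs(x[:, None] - y[None, :]).mean()
            - np.abs(x[:, None] - x[None, :]).mean()
            - np.abs(y[:, None] - y[None, :]).mean())

def sliced_mmd(X, Y, P, rng):
    """Average the 1D energy MMD over P random directions."""
    vals = []
    for _ in range(P):
        t = rng.standard_normal(X.shape[1])
        t /= np.linalg.norm(t)
        vals.append(energy_mmd_1d(X @ t, Y @ t))
    return np.mean(vals)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
Y = rng.standard_normal((100, 10))
Y[:, 0] += 1.5   # distributions differ only along the first coordinate

# Spread of the estimator over repeated runs: 16x more slices should cut
# the spread by roughly 4x, consistent with the P^{-1/2} rate.
spread = {P: np.std([sliced_mmd(X, Y, P, rng) for _ in range(20)])
          for P in (4, 64)}
assert spread[64] < spread[4]
```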
In summary, Sliced Maximum Mean Discrepancy and its Gaussian-smoothed and Riesz-kernel variants provide a flexible, metrically sound, and computationally efficient framework for comparing high-dimensional probability measures, supporting gradient-flow methods, privacy-preserving learning, and scalable distributional alignment (Rakotomamonjy et al., 2021, Hertrich et al., 2023, Hagemann et al., 2023, Kolouri et al., 2020).