Quantile Isometry & Functional Slicing

Updated 12 November 2025

Quantile isometry and functional slicing are mathematical frameworks that embed probability measures on the real line into $L^2(0,1)$ via their quantile functions, so that Wasserstein distances become $L^2$ norms.
They underpin the double-sliced Wasserstein distance, which replaces nested optimal transport solves with sorting-based one-dimensional computations.
By extending classical sliced-Wasserstein approaches with functional projections, these methods enable stable, scalable optimal transport analysis in high-dimensional and meta-measure settings.

Quantile isometry and functional slicing constitute the core mathematical innovations underpinning scalable optimal transport (OT) for probability measures and, in particular, for meta-measures (measures over measures). These concepts connect the structure of 1D Wasserstein spaces to $L^2(0,1)$ function spaces through the geometry of quantile functions, enabling efficient computation of metrics such as the Wasserstein over Wasserstein (WoW) and its double-sliced surrogates. This article delineates the formal underpinnings, algorithmic realizations, and empirical characteristics of these methods, as developed in the context of the double-sliced Wasserstein (DSW) distance (Piening et al., 26 Sep 2025).

1. Quantile Isometry in One Dimension

The 2-Wasserstein distance $W_2$ on the real line between probability measures $\mu, \nu \in P_2(\mathbb{R})$ (those with finite second moment) admits the canonical formulation

$W_2(\mu,\nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathbb{R}^2} (x - y)^2 \, d\gamma(x, y) \right)^{1/2},$

where $\Gamma(\mu, \nu)$ is the set of all couplings of $\mu$ and $\nu$. For such measures, the quantile (inverse-CDF) function $Q_\mu : (0, 1) \rightarrow \mathbb{R}$ assigns to $s$ the smallest $x$ with $\mu((-\infty, x]) \geq s$. Because $\int_0^1 Q_\mu(s)^2 \, ds$ equals the second moment of $\mu$, which is finite, $Q_\mu$ lies in $L^2(0,1)$.

The critical isometry theorem holds:

$W_2(\mu, \nu) = \| Q_\mu - Q_\nu \|_{L^2(0,1)} = \left( \int_0^1 |Q_\mu(s) - Q_\nu(s)|^2 \, ds \right)^{1/2}.$

Thus, the mapping $q: P_2(\mathbb{R}) \to L^2(0,1)$ given by $\mu \mapsto Q_\mu$ is an isometric embedding for $W_2$. The uniqueness and monotonicity of optimal couplings in one dimension (monotone rearrangement) ensure that this embedding exactly captures transport geometry.
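As a worked instance of this identity, consider two one-dimensional Gaussians $\mu = \mathcal{N}(m_1, \sigma_1^2)$ and $\nu = \mathcal{N}(m_2, \sigma_2^2)$. This example is a standard closed form added for illustration and is not taken from the cited paper; $\Phi^{-1}$ denotes the standard normal quantile function, which satisfies $\int_0^1 \Phi^{-1}(s)\,ds = 0$ and $\int_0^1 \Phi^{-1}(s)^2\,ds = 1$.

```latex
\begin{aligned}
Q_\mu(s) &= m_1 + \sigma_1 \Phi^{-1}(s), \qquad Q_\nu(s) = m_2 + \sigma_2 \Phi^{-1}(s), \\
W_2^2(\mu, \nu) &= \int_0^1 \big( (m_1 - m_2) + (\sigma_1 - \sigma_2)\, \Phi^{-1}(s) \big)^2 \, ds \\
&= (m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2 .
\end{aligned}
```

The cross term vanishes because $\Phi^{-1}$ integrates to zero on $(0,1)$, so the distance separates into a mean part and a scale part.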

The practical implication is that computing $W_2$ in 1D reduces to an $L^2$ norm between quantile functions; for empirical measures this amounts to sorting the samples, which is a key enabler for scalable OT algorithms.
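The sorting shortcut can be sketched in a few lines of NumPy. This is a minimal illustration under a simplifying assumption that is not in the source: both empirical measures have the same number of samples with uniform weights, so their quantile functions are step functions over the same grid. The function name `w2_1d` is hypothetical.

```python
import numpy as np

def w2_1d(x, y):
    """W2 between two equal-size, uniformly weighted empirical measures on R.

    Sorting yields the values of the empirical quantile functions on the
    intervals ((i-1)/n, i/n], so the L2(0,1) norm of their difference is
    the root mean squared difference of the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return float(np.sqrt(np.mean((x - y) ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Translating every sample by 3 shifts the quantile function by 3,
# so the distance is 3 up to floating-point rounding.
print(w2_1d(x, x + 3.0))
```

Unequal sample counts or non-uniform weights would require evaluating both quantile functions on a merged grid of breakpoints, which this sketch deliberately leaves out.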

2. Functional Slicing in Banach Spaces

Extending to general separable Banach spaces $(E, \|\cdot\|)$, with $E^*$ the continuous dual, slicing is formalized as follows. For $v \in E^*$, the linear projection $\pi_v(x) = \langle v, x \rangle$ defines the push-forward $\pi_{v\sharp}\mu \in P_2(\mathbb{R})$ for any $\mu \in P_2(E)$. For a probability measure $\xi$ on $E^*$, the $\xi$-sliced Wasserstein distance is

$SW(\mu, \nu; \xi) = \left( \int_{E^*} W_2^2(\pi_{v\sharp}\mu, \pi_{v\sharp}\nu) d\xi(v) \right)^{1/2}.$

If $\operatorname{supp} \xi$ is not contained in a proper subspace of $E^*$, this defines a metric on $P_2(E)$. Along each direction $v$, the comparison reduces to a 1D problem computable by quantiles; thus, slicing facilitates scalable, projection-based OT on high- or infinite-dimensional spaces.

This framework generalizes the classical sliced-Wasserstein approach (uniform integration over $S^{d-1}$ in $\mathbb{R}^d$) and provides a template for slicing in functional spaces using probability distributions (e.g., Gaussian processes) over $E^*$.
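For the finite-dimensional special case $E = \mathbb{R}^d$ with $\xi$ uniform on the sphere, a Monte Carlo estimate of the sliced distance can be sketched as follows. The function name `sliced_w2`, the default number of directions, and the restriction to equal-size point clouds with uniform weights are assumptions made for this illustration.

```python
import numpy as np

def sliced_w2(X, Y, n_dirs=256, seed=0):
    """Monte Carlo sliced W2 between equal-size point clouds of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal shapes (n, d)")
    rng = np.random.default_rng(seed)
    # Normalised Gaussian vectors are uniformly distributed on S^{d-1}.
    U = rng.normal(size=(n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # Column j holds the projections onto direction j; sorting each column
    # gives the empirical quantile function of the projected measure.
    PX = np.sort(X @ U.T, axis=0)
    PY = np.sort(Y @ U.T, axis=0)
    # Averaging over samples gives W2^2 per direction; averaging over
    # directions approximates the integral against the uniform measure.
    return float(np.sqrt(np.mean((PX - PY) ** 2)))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
t = np.array([1.0, 2.0, 2.0])
# For a pure translation by t the exact value is |t| / sqrt(d).
print(sliced_w2(X, X + t, n_dirs=4000), np.linalg.norm(t) / np.sqrt(3))
```

Fixing the seed makes repeated evaluations use the same directions, which keeps comparisons between pairs of point clouds consistent at the price of a fixed Monte Carlo bias.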

3. Double-Sliced Wasserstein Distance for Meta-Measures

Comparing meta-measures $\alpha, \beta \in P_2(P_2(\mathbb{R}^d))$ (probability measures over measures) necessitates more structure. The double-sliced Wasserstein (DSW) metric operates by:

Outer (Euclidean) slice: For $u \in S^{d-1}$, the induced 1D meta-measure $\alpha_u = \pi_{u\sharp}\alpha \in P_2(P_2(\mathbb{R}))$ is obtained by replacing each measure in the support of $\alpha$ with its 1D projection along $u$.
Inner (functional) slice: For $v \in L^2(0,1)$ and a 1D meta-measure $\gamma$, the push-forward $\pi_{v\sharp}(\gamma) = \operatorname{Law}(\langle v, Q_\mu \rangle)$ with $\mu \sim \gamma$ is the distribution on $\mathbb{R}$ of the quantile function $Q_\mu$ projected onto the direction $v$.

The DSW metric is

$\mathrm{DSW}(\alpha, \beta) = \left( \int_{u \in S^{d-1}} \left( \int_{v \in L^2(0,1)} W_2^2(\pi_{v\sharp} \alpha_u, \pi_{v\sharp} \beta_u) d\rho(v) \right) d\sigma(u) \right)^{1/2},$

where $\sigma$ is the uniform measure on the sphere and $\rho$ is a probability law (e.g., a Gaussian process prior) on $L^2(0,1)$. DSW thus combines spatial (Euclidean) projection with functional (quantile space) slicing.
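The inner slice can be made concrete by discretising quantile functions on a grid. The sketch below evaluates each empirical quantile function at midpoints of $(0,1)$ and approximates $\langle v, Q_\mu \rangle$ by a midpoint rule; the helper names `quantile_on_grid` and `functional_slice`, the grid, and the quadrature rule are assumptions of this illustration rather than details from the cited paper.

```python
import numpy as np

def quantile_on_grid(samples, m=64):
    """Evaluate the empirical quantile function at m midpoints of (0, 1)."""
    x = np.sort(np.asarray(samples, dtype=float))
    s = (np.arange(m) + 0.5) / m
    # Generalised inverse: smallest sample whose empirical CDF value is >= s.
    idx = np.clip(np.ceil(s * x.size).astype(int) - 1, 0, x.size - 1)
    return x[idx]

def functional_slice(atoms, v):
    """Map each 1D atomic measure mu to the scalar <v, Q_mu>.

    `atoms` is a list of 1D sample arrays and `v` holds the values of the
    direction on the same midpoint grid; the integral over (0, 1) is
    approximated by a midpoint rule.
    """
    v = np.asarray(v, dtype=float)
    Q = np.stack([quantile_on_grid(a, m=v.size) for a in atoms])
    return Q @ v / v.size

atoms = [np.arange(64.0), np.arange(64.0) + 5.0]
# With the constant direction v = 1 the projection is the mean of each atom,
# because the integral of a quantile function equals the mean.
print(functional_slice(atoms, np.ones(64)))
```

The resulting scalars, one per atomic measure, form an empirical distribution on $\mathbb{R}$, which is what the inner $W_2$ in the DSW formula compares.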

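For discrete meta-measures supported on finitely many atomic measures, driving DSW to zero is equivalent to driving the original WoW metric to zero:

$\mathrm{DSW}(\alpha, \beta) \rightarrow 0 \iff W_2(\alpha, \beta; P_2(\mathbb{R}^d)) \rightarrow 0,$

where $W_2(\cdot, \cdot; P_2(\mathbb{R}^d))$ denotes the WoW distance, that is, the 2-Wasserstein distance between meta-measures with $W_2$ on $P_2(\mathbb{R}^d)$ as ground metric. This supports the use of DSW as a surrogate for WoW on practical, finitely supported data representations.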

4. Algorithmic Structure and Computational Complexity

The computation of DSW proceeds as follows:

Sample $N_u$ directions $u_i \sim \sigma$ on $S^{d-1}$.
For each $u_i$, calculate the induced 1D meta-measures $\alpha_{u_i}$ and $\beta_{u_i}$.
For each $u_i$, sample $N_v$ directions $v_{i,j} \sim \rho$ in $L^2(0,1)$.
For each $v_{i,j}$, project each 1D atomic measure $\mu$ to the scalar $\langle v_{i,j}, Q_\mu \rangle$.
Compute $W_2$ between the resulting empirical distributions via quantile sorting.
Aggregate via a two-level Monte Carlo average.
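The steps above can be sketched end to end in NumPy. This is an illustrative sketch under several assumptions that do not come from the source: both meta-measures have the same number $N$ of atoms, every atom has the same number $n$ of points with uniform weights, and the law $\rho$ is replaced by a random cosine series with decaying coefficients (a simple Gaussian random function standing in for a Gaussian process prior). The name `dsw` and all default parameters are hypothetical.

```python
import numpy as np

def dsw(A, B, n_u=32, n_v=32, n_modes=8, seed=0):
    """Monte Carlo double-sliced W2 between two discrete meta-measures.

    A, B: arrays of shape (N, n, d), i.e. N atomic measures with n points
    in R^d each, all uniformly weighted (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("this sketch assumes equal shapes (N, n, d)")
    N, n, d = A.shape
    rng = np.random.default_rng(seed)
    # Midpoint grid on (0, 1) and a cosine basis for the random directions.
    s = (np.arange(n) + 0.5) / n
    k = np.arange(n_modes)
    basis = np.cos(np.pi * k[:, None] * s[None, :])  # shape (n_modes, n)
    decay = 1.0 / (1.0 + k)                          # assumed coefficient decay
    total = 0.0
    for _ in range(n_u):
        # Outer slice: project every atom onto a random unit direction u.
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        # Sorting each projected atom gives its quantile function on the grid.
        QA = np.sort(A @ u, axis=1)                  # shape (N, n)
        QB = np.sort(B @ u, axis=1)
        # Inner slice: draw n_v random functions v and form <v, Q_mu>.
        V = (rng.normal(size=(n_v, n_modes)) * decay) @ basis  # (n_v, n)
        PA = QA @ V.T / n                            # shape (N, n_v)
        PB = QB @ V.T / n
        # 1D W2^2 per direction v by sorting over atoms, averaged over v.
        total += np.mean((np.sort(PA, axis=0) - np.sort(PB, axis=0)) ** 2)
    return float(np.sqrt(total / n_u))

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 30, 3))
print(dsw(A, A), dsw(A, A + 1.0))
```

Each inner iteration only sorts and multiplies arrays, so no linear program or pairwise distance matrix between atoms is ever formed; the choice of $\rho$ and of the grid resolution remains a modelling decision.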

The computational cost is $O(N_u N_v n \log n)$, where $n$ is the maximum support size of the atomic measures. In practice, $N_u N_v \ll N^2$ (with $N$ the number of meta-measure atoms), leading to considerable computational savings over the $O(N^3 \log N)$ and $O(N^2 n \log n)$ costs required for direct WoW on discrete meta-measures. The method avoids calculation of high-order moments and high-dimensional LPs, relying solely on quantile evaluation and random projections.

5. Empirical Properties and Practical Significance

Numerical experiments demonstrate:

For shape classification based on local distance distributions, DSW matches WoW in KNN accuracy while reducing computation time by orders of magnitude on large meshes.
In OTDD (Optimal Transport Dataset Distance) settings for batches of images, DSW correlates with the ground-truth OTDD (Pearson/Spearman $>0.9$ with $10^4$ projections), outperforming moment-based approaches that are unstable for high-order moments.
In generative point-cloud testing, DSW captures distributional phenomena (mode collapse, sensitivity to noise) similarly to OT-NNA, but with linear rather than cubic or quadratic cost in batch or point resolution.
For image patch distribution matching, DSW aligns with both Euclidean Wasserstein and KID metrics and maintains robustness under various image transformations and sampling artifacts.

These results establish DSW as a scalable, reliable surrogate for high-dimensional and meta-measure OT, without requiring parametric assumptions or higher-order statistics. The adoption of quantile-based isometry ensures numerical stability even with limited projection samples and irregular supports.

Quantile isometry and functional slicing generalize classical sliced-Wasserstein techniques (Bonneel et al. 2015) and intersect with recent measures for datasets and distributions, including s-OTDD (Nguyen et al. 2025) and point-cloud OT (Piening & Beinert 2025). Functional slicing (integration over infinite-dimensional $L^2$ directions) parallels developments in random projections on Hilbert spaces (Han 2023).

A plausible implication is that the isometric embedding of probability measures into $L^2$ quantile space enables further dimensionality reduction and kernelization strategies for OT beyond Euclidean settings. The restriction to $p=2$ (quadratic cost) matters for the construction: the one-dimensional identity $W_p(\mu, \nu) = \| Q_\mu - Q_\nu \|_{L^p(0,1)}$ holds for every $p \geq 1$, but only for $p=2$ is the target a Hilbert space, so that slicing directions $v$ live in the same space $L^2(0,1)$ and act through its inner product.

Furthermore, the avoidance of high-order moments circumvents numerical instability endemic to earlier sliced meta-OT methods—particularly for empirical measures with heavy tails or irregular support—while still providing separation and stability guarantees for discrete cases.

Table: Sliced Wasserstein vs. Double-Sliced (Functional) Wasserstein

Metric	Slicing Domain	Direction Distribution
Sliced Wasserstein	$S^{d-1}$ in $\mathbb{R}^d$	Uniform (Haar)
DSW	$S^{d-1} \times L^2(0,1)$	Sphere $\times$ GP on $L^2$

This tabulation emphasizes the concatenated, two-level projection structure distinguishing DSW from traditional methods.

Overall, quantile isometry and functional slicing establish a theoretically sound and computationally efficient framework for optimal transport comparisons of both data and meta-data distributions, with direct impact on scalable learning and distributional geometry in high dimensions (Piening et al., 26 Sep 2025).

References (1)

Slicing Wasserstein Over Wasserstein Via Functional Optimal Transport (2025)