
Kernel Quantile Discrepancies (KQDs)

Updated 4 December 2025
  • Kernel Quantile Discrepancies (KQDs) are statistical distances that embed quantiles in reproducing kernel Hilbert spaces to provide a finer representation of probability distributions.
  • They extend traditional kernel mean embedding methods by incorporating generalized quantiles, capturing higher-order distributional differences and subsuming sliced Wasserstein distances.
  • Efficient estimation procedures with near-linear computational cost make KQDs a robust alternative for two-sample testing and high-dimensional generative model evaluation.

Kernel Quantile Discrepancies (KQDs) are a family of statistical distances between probability distributions, constructed via the kernel quantile embedding (KQE) operator in reproducing kernel Hilbert spaces (RKHS). Extending the classical kernel mean embedding paradigm exemplified by maximum mean discrepancy (MMD), KQDs incorporate generalized quantiles as distributional features, yielding probability metrics under weaker kernel conditions, admitting efficient estimation, and subsuming kernelized forms of sliced Wasserstein distances (Naslidnyk et al., 26 May 2025).

1. Kernel Quantile Embedding Operator

Let $X$ denote a Hausdorff, separable, σ-compact Borel space and $k \colon X \times X \rightarrow \mathbb{R}$ a continuous, measurable, and separating kernel, with associated RKHS $\mathcal{H}$ and feature map $\psi(x) = k(x, \cdot)$. The unit sphere in $\mathcal{H}$, $S_\mathcal{H} = \{u \in \mathcal{H} : \|u\|_\mathcal{H} = 1\}$, indexes the projections of $\psi(X)$ onto one-dimensional subspaces.

For a direction $u \in S_\mathcal{H}$, define the one-dimensional pushforward measure $u\#P$ by $(u\#P)(B) = P(\{x : u(x) \in B\})$. The usual quantile of level $\alpha \in [0,1]$ is

$$\rho_{u\#P}^{\,\alpha} = \inf\{z \in \mathbb{R} : (u\#P)((-\infty, z]) \geq \alpha\}.$$

The kernel quantile embedding of $P$ along direction $u$ and level $\alpha$ is then

$$\rho_P^{\alpha,u} := \rho_{u\#P}^{\,\alpha}\, u \in \mathcal{H}.$$

Equivalently, this is the $\alpha$-quantile of the pushforward $\psi\#P$ in $\mathcal{H}$ along $u$, leveraging the canonical representation $u(x) = \langle u, \psi(x) \rangle_\mathcal{H}$.
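In practice, the embedding along a single direction reduces to an ordinary one-dimensional quantile of the projected sample. A minimal sketch of this reduction, assuming (as illustrative choices) an RBF kernel and a direction of the form $u = \psi(z)/\|\psi(z)\|_\mathcal{H}$:

```python
import numpy as np

def directional_quantile(proj, alpha):
    """Order-statistic estimator [u(x_{1:n})]_{ceil(alpha * n)} of the
    alpha-quantile of the projected sample u(x_1), ..., u(x_n)."""
    s = np.sort(np.asarray(proj))
    idx = max(int(np.ceil(alpha * len(s))) - 1, 0)  # 1-based order statistic
    return s[idx]

# Illustrative setup (not fixed by the source): RBF kernel on R^2 and the
# direction u = psi(z) / ||psi(z)||_H; here ||psi(z)||_H = sqrt(k(z, z)) = 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
z = np.zeros(2)
u_vals = np.exp(-0.5 * np.sum((x - z) ** 2, axis=1))  # u(x_i) = k(x_i, z)
median_coeff = directional_quantile(u_vals, 0.5)  # scalar coefficient of rho_P^{0.5,u}
```

The returned scalar multiplies the unit direction $u$ to give the embedding element in $\mathcal{H}$.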

2. Definitions and Properties of Kernel Quantile Discrepancies

For a probability measure $\nu$ with full support on $[0,1]$ (quantile levels) and $\gamma$ with full support on $S_\mathcal{H}$ (RKHS directions), define the directional quantile distance of order $p \geq 1$ between distributions $P, Q$ as

$$\tau_p(P, Q; \nu, u) = \left( \int_0^1 \left\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \right\|_\mathcal{H}^p \, \nu(d\alpha) \right)^{1/p}.$$

There are two aggregations:

  • Expected KQD (e-KQD$_p$):

$$e\text{-KQD}_p(P, Q; \nu, \gamma) = \left( \mathbb{E}_{u \sim \gamma}\left[ \tau_p(P, Q; \nu, u)^p \right] \right)^{1/p}.$$

  • Supremum KQD (sup-KQD$_p$):

$$\sup\text{-KQD}_p(P, Q; \nu) = \left( \sup_{u \in S_\mathcal{H}} \tau_p(P, Q; \nu, u)^p \right)^{1/p}.$$

Provided the kernel assumptions and the support conditions on $\nu$ and $\gamma$ hold, both e-KQD$_p$ and sup-KQD$_p$ are metrics on the space of probability measures over $X$. The injectivity of $P \mapsto \{\rho_P^{\alpha,u}\}$ is established by an RKHS Cramér–Wold theorem, ensuring KQDs are distinguishing (zero KQD implies equality in law).
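Since $\rho_P^{\alpha,u} - \rho_Q^{\alpha,u} = (\rho_{u\#P}^{\,\alpha} - \rho_{u\#Q}^{\,\alpha})\,u$ with $\|u\|_\mathcal{H} = 1$, the RKHS norm inside $\tau_p$ reduces to a scalar quantile difference. A plain Monte Carlo sketch of e-KQD$_p^p$ from precomputed projections (the grid approximation of $\nu$ and the function name are illustrative assumptions):

```python
import numpy as np

def e_kqd_p(proj_x, proj_y, p=2, alphas=None):
    """Monte Carlo estimate of e-KQD_p^p from precomputed projections.

    proj_x, proj_y: arrays of shape (n_directions, n_samples) holding the
    projected samples u_j(x_i) and u_j(y_i) for unit-norm directions u_j.
    nu is taken uniform on [0, 1], approximated on a grid of levels."""
    if alphas is None:
        alphas = np.linspace(0.05, 0.95, 19)
    qx = np.quantile(proj_x, alphas, axis=1)  # directional quantiles per level
    qy = np.quantile(proj_y, alphas, axis=1)
    # ||rho_P^{a,u} - rho_Q^{a,u}||_H = |quantile difference| since ||u||_H = 1
    return float(np.mean(np.abs(qx - qy) ** p))
```

The averages over directions and levels jointly approximate $\mathbb{E}_{u\sim\gamma}\,\tau_p^p$.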

3. Relationships to MMD and Sliced Wasserstein Distances

The classical maximum mean discrepancy is defined as

$$\mathrm{MMD}(P, Q) = \|\mu_P - \mu_Q\|_\mathcal{H},$$

with $\mu_P$ the kernel mean embedding. MMD is sensitive only to differences in first moments $\mathbb{E}_P[u] - \mathbb{E}_Q[u]$.

The KQD framework extends this, as any mean-characteristic kernel is also quantile-characteristic, so KQDs distinguish all pairs that MMD can, and more. Notably, with centered (mean-subtracted) KQEs and $p = 2$,

$$\widetilde{e\text{-KQD}}{}_2^2(P, Q) = \mathrm{MMD}^2(P, Q) + e\text{-KQD}_2^2(P, Q) - \mathbb{E}_{u \sim \gamma}\left[ \left( \mathbb{E}_P[u] - \mathbb{E}_Q[u] \right)^2 \right],$$

positioning the centered KQD as a sum of the classical MMD and a kernelized quantile-Wasserstein term.

If $\nu$ is the uniform measure on $[0,1]$, then e-KQD$_p$ coincides with a kernelized expected $p$-sliced Wasserstein distance; for $X = \mathbb{R}^d$, $k(x, y) = x^\top y$, and $\gamma$ uniform on the sphere, this recovers the standard expected sliced Wasserstein distance. Similarly, sup-KQD$_p$ under these conditions is a kernelized max-sliced Wasserstein distance.
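Under the linear-kernel specialization, computing the expected sliced distance amounts to averaging one-dimensional $p$-Wasserstein distances over random unit vectors; for equal sample sizes, sorting the projections realizes the 1D optimal coupling. A sketch under those assumptions (function name and direction count are illustrative):

```python
import numpy as np

def expected_sliced_wp(x, y, n_dirs=100, p=2, seed=0):
    """Expected p-sliced Wasserstein^p between equal-size empirical samples:
    the linear-kernel instance of e-KQD_p^p with gamma uniform on the sphere."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_dirs, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions
    px, py = x @ theta.T, y @ theta.T   # projected samples, shape (n, n_dirs)
    px.sort(axis=0)
    py.sort(axis=0)                     # sorted projections = 1D quantiles
    return float(np.mean(np.abs(px - py) ** p))
```

For unequal sample sizes one would interpolate quantile functions instead of matching sorted values directly.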

4. Estimation Procedures and Computational Complexity

For empirical KQEs, the order-statistic estimator

$$\rho_{u\#P_n}^{\,\alpha} = \left[ u(x_{1:n}) \right]_{\lceil \alpha n \rceil}$$

satisfies $\left\| \rho_{P_n}^{\alpha,u} - \rho_P^{\alpha,u} \right\|_\mathcal{H} = O(n^{-1/2})$ with high probability for fixed $\alpha, u$.

Empirical e-KQD$_p$ is estimated by a finite Monte Carlo approximation of $\gamma$ with directions $u_1, \ldots, u_\ell$. For $p = 1$, with probability $1 - \delta$:

$$\left| e\text{-KQD}_1(P_n, Q_n; \nu, \gamma_\ell) - e\text{-KQD}_1(P, Q; \nu, \gamma) \right| \leq C(\delta)\left( \ell^{-1/2} + n^{-1/2} \right).$$

A scalable estimator is implemented using "Gaussian directions": $\gamma$ is taken as the projection onto $S_\mathcal{H}$ of a centered Gaussian measure $N(0, C)$ in $\mathcal{H}$, with $C[f](x) = \int_X k(x, y) f(y)\, \xi(dy)$ and landmarks $z_j \sim \xi$. Sampled functions $f$ are normalized to $u = f / \|f\|_\mathcal{H}$, and for $\ell$ directions with $m$ landmarks each, the main computational costs per direction are:

  • $\mathcal{O}(nm)$ for evaluating $f$ on $n$ points,
  • $\mathcal{O}(m^2)$ for computing $\|f\|_\mathcal{H}$,
  • $\mathcal{O}(n \log n)$ for sorting.

With $\ell = m = \log n$, the total cost of e-KQD estimation is $\mathcal{O}(n \log^2 n)$, near-linear in the sample size.
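Putting the pieces together, the following is a sketch of the Gaussian-directions estimator. Several choices here are illustrative assumptions rather than details fixed by the source: the RBF kernel, the landmark measure $\xi = N(0, I)$, the grid approximation of $\nu$, and approximating a draw $f \sim N(0, C)$ by $f = m^{-1/2}\sum_j \varepsilon_j k(\cdot, z_j)$ with $\varepsilon_j \sim N(0,1)$ (whose covariance matches $C$ in expectation):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """RBF kernel matrix k(a_i, b_j) (illustrative kernel choice)."""
    d2 = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-d2 / (2.0 * ls**2))

def e_kqd_gaussian_dirs(x, y, n_dirs=8, m=8, p=2, alphas=None, seed=0):
    """Near-linear sketch of e-KQD_p^p with Gaussian directions."""
    rng = np.random.default_rng(seed)
    if alphas is None:
        alphas = np.linspace(0.05, 0.95, 19)
    total = 0.0
    for _ in range(n_dirs):
        z = rng.normal(size=(m, x.shape[1]))      # landmarks z_j ~ xi
        eps = rng.normal(size=m) / np.sqrt(m)     # Gaussian coefficients
        norm_f = np.sqrt(eps @ rbf(z, z) @ eps)   # ||f||_H, O(m^2)
        ux = rbf(x, z) @ eps / norm_f             # u(x_i), O(nm)
        uy = rbf(y, z) @ eps / norm_f
        qx = np.quantile(ux, alphas)              # O(n log n) via sorting
        qy = np.quantile(uy, alphas)
        total += np.mean(np.abs(qx - qy) ** p)
    return float(total / n_dirs)
```

With $\ell = m = \log n$ the per-direction costs above reproduce the stated $\mathcal{O}(n \log^2 n)$ total.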

5. Empirical Evaluation and Comparative Performance

KQD-based distances were empirically assessed in nonparametric two-sample testing with permutation thresholds. Baselines were classical MMD with quadratic complexity, fast MMD approximations (MMD-lin, MMD-Multi), and kernel max-sliced Wasserstein.

Key findings across several datasets include:

  • In a "power-decay" setting (multivariate Gaussian shift with increasing dimension), near-linear e-KQD retains power longer than MMD-Multi.
  • In one-dimensional tests (Laplace vs. Gaussian, same first two moments, polynomial kernel not mean-characteristic), MMD fails to distinguish; KQDs succeed.
  • On high-dimensional image data (Galaxy MNIST, CIFAR-10 vs. CIFAR-10.1), e-KQD$_2$ and sup-KQD outperform fast MMD at comparable computational cost; the centered $O(n^2)$ KQD matches full MMD performance.
  • Type I error is maintained at the 5% level by permutation.

This suggests that KQDs, especially the efficient e-KQD estimator, offer a competitive and in several cases superior alternative to both classical MMD and fast MMD approximations, with rigorous metric and convergence guarantees (Naslidnyk et al., 26 May 2025).
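The permutation calibration used in these tests can be sketched independently of the chosen statistic; `permutation_pvalue` and `mean_gap` below are illustrative names, and any two-sample discrepancy estimate (such as an e-KQD estimator) can be plugged in as `stat`:

```python
import numpy as np

def permutation_pvalue(x, y, stat, n_perm=200, seed=0):
    """Permutation p-value for a two-sample statistic: pool the samples,
    reshuffle the group labels, and compare permuted statistics with the
    observed one."""
    rng = np.random.default_rng(seed)
    n = len(x)
    pooled = np.concatenate([x, y])
    observed = stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)  # reject H0 at level 0.05 if <= 0.05

# Illustrative statistic: absolute difference of sample means.
mean_gap = lambda a, b: abs(a.mean() - b.mean())
```

Because the threshold is computed from the permuted null distribution, Type I error is controlled at the nominal level regardless of which discrepancy is used.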

6. Significance and Theoretical Implications

KQDs demonstrate that quantiles in RKHS can serve as a finer-grained representation of distributions than the mean function, circumventing the requirement for mean-characteristic kernels for metric properties. By unifying the kernel MMD and sliced Wasserstein paradigms, KQDs provide a general family of probability metrics with both theoretical rigor and practical efficiency. A plausible implication is that KQD-based methodologies may serve as the foundation for new nonparametric tests and generative model benchmarks where first-moment distinctions are insufficient.

7. Summary Table of Key Properties

| Property | KQDs (e-KQD, sup-KQD) | MMD |
|---|---|---|
| Representation | RKHS quantiles along directions | RKHS mean |
| Metric under kernel conditions | Weaker (quantile-characteristic suffices) | Stronger (mean-characteristic required) |
| Recovers kernelized slices | Yes (sliced Wasserstein limits) | No |
| Time complexity (fastest) | $\mathcal{O}(n \log^2 n)$ | $\mathcal{O}(n^2)$ (U-statistic) / $\mathcal{O}(n)$ (MMD-lin) |
| Empirical distinguishing power | Higher in several regimes | Fails for some non-mean differences |

These results position KQDs as a versatile and computationally attractive class of probability metrics, offering both theoretical generality and empirical power beyond MMD in a range of challenging regimes (Naslidnyk et al., 26 May 2025).

