
Wasserstein Dependency Measures

Updated 19 February 2026
  • Distance-maximizing Wasserstein dependency measures form a rigorous framework that quantifies statistical dependence using optimal transport metrics, with clear thresholds for independence and maximal dependence.
  • They leverage convex duality and efficient computational algorithms, such as linear programming and entropic regularization, to estimate dependence in high-dimensional settings.
  • Applications span reinforcement learning, robust statistics, and Bayesian nonparametrics, offering actionable insights for representation learning and model evaluation.

Distance-maximizing Wasserstein dependency measures constitute a rigorous framework in which statistical dependence between random variables, groups, or more general random structures is quantified via optimal transport metrics. By formulating dependence as a distance—either from independence or from maximal dependence—in an appropriate metric geometry, these measures provide a conceptually unified and metrically sensitive approach to characterizing, estimating, and optimizing statistical relationships. These frameworks are instantiated in a diversity of contexts, including classical dependence assessment, unsupervised representation learning, reinforcement learning, robust statistics, Bayesian nonparametrics, and the geometric analysis of random measures.

1. Core Definitions and Formalism

Let $(X, Y)$ be random variables with joint law $P_{XY}$ on a Polish space $\mathcal X \times \mathcal Y$, and marginals $P_X$, $P_Y$. The $p$-Wasserstein distance between probability measures $\mu, \nu$ on a metric space $(\mathcal M, d)$ is

$$W_{d,p}(\mu, \nu) = \left[ \inf_{\pi \in \Gamma(\mu, \nu)} \mathbb E_{(x, y) \sim \pi}\, d(x, y)^p \right]^{1/p},$$

where $\Gamma(\mu, \nu)$ is the set of couplings with the given marginals.
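For discrete measures the infimum over couplings is a finite linear program. A minimal sketch, assuming NumPy and SciPy are available (the helper name `wasserstein_p` is ours, not from the cited papers):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_p(x, y, mu, nu, p=2):
    """p-Wasserstein distance between discrete measures sum_i mu_i δ_{x_i}
    and sum_j nu_j δ_{y_j} on the real line, solved as a transport LP."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    # Cost matrix C[i, j] = d(x_i, y_j)^p, flattened for linprog.
    C = np.abs(x[:, None] - y[None, :]) ** p
    # Equality constraints: coupling row sums = mu, column sums = nu.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p)

# Two point masses: W_p(δ_0, δ_3) = 3 for every p.
print(round(wasserstein_p([0.0], [3.0], [1.0], [1.0], p=2), 6))  # 3.0
```

The same LP structure underlies the distance-to-independence and distance-to-maximal-dependence constructions below, with the joint law and a reference coupling as the two marginals.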

Distance to Independence and Maximal Dependence

  • Distance to Independence: the Wasserstein distance between the joint law and the product of its marginals,

$$D_{\perp}(X, Y) = W_{d,p}\!\left( P_{XY}, \, P_X \otimes P_Y \right),$$

which is zero if and only if $X$ and $Y$ are independent (Catalano et al., 7 Oct 2025).

  • Distance to Maximal Dependence: Given a set $\mathcal C_{\max}$ of “maximally dependent” couplings (e.g., the laws of $(X, T(X))$ for measurable maps $T$), the distance is

$$D_{\max}(X, Y) = \inf_{Q \in \mathcal C_{\max}} W_{d,p}\!\left( P_{XY}, \, Q \right),$$

and can be normalized as

$$I(X, Y) = 1 - \frac{D_{\max}(X, Y)}{D_{\max}^{\perp}}, \qquad D_{\max}^{\perp} := \inf_{Q \in \mathcal C_{\max}} W_{d,p}\!\left( P_X \otimes P_Y, \, Q \right),$$

which yields $I = 1$ at maximal dependence (e.g., $Y = T(X)$) and $I = 0$ at independence (Catalano et al., 7 Oct 2025).

  • Kantorovich–Rubinstein Duality: For $p = 1$, the Wasserstein distance admits the dual formulation

$$W_{d,1}(\mu, \nu) = \sup_{\mathrm{Lip}(f) \le 1} \left[ \int f \, d\mu - \int f \, d\nu \right],$$

with the supremum taken over $1$-Lipschitz functions $f : \mathcal M \to \mathbb R$.
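On a finite space the distance to independence is itself a small transport LP between the joint law and the product of its marginals. The sketch below illustrates this, assuming NumPy and SciPy are available and taking the $L^1$ ground metric on the plane as an arbitrary choice; the helper name `distance_to_independence` is ours, not from the cited papers.

```python
import numpy as np
from scipy.optimize import linprog

def distance_to_independence(P, xs, ys):
    """W_1 distance between a finite joint law P[i, j] = P(X = xs[i], Y = ys[j])
    and the product of its marginals, with L1 ground metric on the plane
    (an illustrative choice; the framework allows any ground metric)."""
    P = np.asarray(P, float)
    px, py = P.sum(axis=1), P.sum(axis=0)
    Q = np.outer(px, py)                        # the independence law P_X ⊗ P_Y
    pts = np.array([(x, y) for x in xs for y in ys], float)
    # Ground cost between support points of the two measures.
    C = np.abs(pts[:, None, :] - pts[None, :, :]).sum(-1)
    n = len(pts)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0        # coupling row i sums to P[i]
        A_eq[n + i, i::n] = 1.0                 # coupling column i sums to Q[i]
    b_eq = np.concatenate([P.ravel(), Q.ravel()])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Perfect dependence Y = X on {0, 1}: strictly positive distance.
P_dep = [[0.5, 0.0], [0.0, 0.5]]
# Independent uniform joint: distance is zero.
P_ind = [[0.25, 0.25], [0.25, 0.25]]
print(distance_to_independence(P_dep, [0, 1], [0, 1]) > 0)           # True
print(abs(distance_to_independence(P_ind, [0, 1], [0, 1])) < 1e-8)   # True
```

The perfectly dependent joint sits at transport cost $0.5$ from its independence surrogate here, while the independent joint sits at cost zero, matching the characterization above.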

2. Theoretical Properties and Characterizations

  • Independence and Maximal Dependence:
    • $W_{d,p}(P_{XY}, P_X \otimes P_Y) = 0$ if and only if $P_{XY} = P_X \otimes P_Y$ (independence).
    • Normalized indices (e.g., the normalized distance to maximal dependence, and the index $\mathcal I_W$ for random measures) attain $1$ exactly at maximal dependence and $0$ at independence (Catalano et al., 2021, Catalano et al., 7 Oct 2025).
  • Metric Invariance:
    • Wasserstein distances are invariant under isometries of the ground metric. In copula-based settings, indices can be made invariant under monotone reparametrizations of the marginals (Catalano et al., 7 Oct 2025).
  • Robustness:
    • Joint distances (e.g., $W_{d,p}(P_{XY}, P_X \otimes P_Y)$) are Lipschitz continuous under weak convergence of probability measures; they respond linearly to small contamination with independent noise (Catalano et al., 7 Oct 2025).
  • Sample Complexity:
    • For empirical plug-in estimators, the distance to independence achieves the parametric rate $n^{-1/2}$ in low dimensions; rates degrade with the ambient dimension $d$ as $n^{-1/d}$ (Catalano et al., 7 Oct 2025).
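The one-dimensional rate is easy to observe empirically: the average $W_1$ between two independent empirical measures of the same law should shrink by roughly $4\times$ when the sample size grows $16\times$. A rough sketch using SciPy's exact 1-D `wasserstein_distance` (the Gaussian law, seed, and sample sizes are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)

def mean_w1(n, reps=50):
    # Average W_1 between two independent empirical measures of N(0, 1).
    return np.mean([wasserstein_distance(rng.normal(size=n), rng.normal(size=n))
                    for _ in range(reps)])

errs = {n: mean_w1(n) for n in (100, 1600)}
print(errs[100] / errs[1600])   # ≈ 4, consistent with the n^(-1/2) rate
```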

3. Optimization and Computation

Convex and Dual Formulations

Distance-maximizing Wasserstein dependency measures often reduce to convex or saddle-point problems:

  • Piecewise-Algebraic Structure: For discrete variables, the distance to independence is the solution to a bilevel linear program, whose minimizer has an explicit piecewise algebraic structure governed by the geometry of the Lipschitz polytope and its dual unit ball. For the case of a Segre variety independence model, the explicit solution is computable via duality and polyhedral geometry (Çelik et al., 2020).
  • Numerical Quadrature: For dependent random measures, the Wasserstein index of dependence $\mathcal I_W$ reduces to a one-dimensional quadrature on tail integrals of the Lévy measures, making it practical in high-dimensional settings (vectors of $d$ random measures) (Catalano et al., 2021).

Efficient Algorithms

  • Linear Programming: For discrete measures with finite support, Wasserstein distances are computable via network-flow LPs; complexity is $O(n^3 \log n)$ for $n$ atoms (Catalano et al., 7 Oct 2025).
  • Entropic Regularization: Sinkhorn’s algorithm solves regularized OT with complexity $\widetilde O(n^2 / \varepsilon^2)$ to reach $\varepsilon$-accuracy (Paty et al., 2019, Catalano et al., 7 Oct 2025).
  • Frank–Wolfe with Eigen-Decomposition: For subspace-robust Wasserstein distances, maximization over low-rank projections is carried out using Frank–Wolfe and eigen-decomposition, keeping computation tractable in moderate ambient dimensions (Paty et al., 2019).
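Sinkhorn's algorithm itself fits in a few lines: alternately rescale the Gibbs kernel $K = e^{-C/\varepsilon}$ until the coupling's marginals match. A minimal NumPy sketch ($\varepsilon$ and the iteration count are chosen for illustration; production code should work in the log domain for numerical stability):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, iters=2000):
    """Entropy-regularized OT (Sinkhorn's algorithm): alternating scaling of
    K = exp(-C / eps) until the coupling's marginals match mu and nu."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    pi = u[:, None] * K * v[None, :]           # regularized optimal coupling
    return np.sum(pi * C), pi

# Transport uniform mass on {0, 1, 2} to {0.5, 1.5, 2.5}: optimal cost is 0.5.
x = np.array([0.0, 1.0, 2.0]); y = x + 0.5
mu = nu = np.ones(3) / 3
C = np.abs(x[:, None] - y[None, :])
cost, _ = sinkhorn(mu, nu, C)
print(round(cost, 2))   # close to 0.5 (the entropy term biases it slightly upward)
```

Smaller $\varepsilon$ tightens the approximation to the unregularized distance at the cost of slower convergence and worse conditioning, which is the bias-variance tradeoff noted in Section 7.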

4. Specialized Distance-Maximizing Constructions

Wasserstein Distance Maximizing Intrinsic Control

In reinforcement learning, the “Wasserstein Distance Maximizing Intrinsic Control” (WIC) objective,

$$\max_{\pi} \; \mathbb E_{z \sim p(z)} \left[ W_1\!\left( \delta_{s_0}, \, \rho^{\pi_z} \right) \right],$$

formalizes the goal of skill learning by maximizing the expected Wasserstein distance between the start-state Dirac measure and the skill-conditioned state visitation measure. This differs from mutual information–based methods (VIC/DIAYN) by explicitly incentivizing covering maximal state-space distance, rather than mere distinguishability. The dual is estimated via a neural 1-Lipschitz critic, which provides a shaped reward for policy optimization. Empirical results in various Atari environments demonstrate that WIC yields coverage and returns superior to KL-based diversity baselines, especially in environments with sparse or exploratory tasks (Durugkar et al., 2021).

Subspace Robust Wasserstein Distances

Robustness and interpretability in high dimensions are enhanced by the subspace-robust Wasserstein (SRW) distance:

$$S_k^2(\mu, \nu) = \min_{\pi \in \Gamma(\mu, \nu)} \sum_{\ell = 1}^{k} \lambda_\ell(V_\pi), \qquad V_\pi = \mathbb E_{(x, y) \sim \pi}\!\left[ (x - y)(x - y)^\top \right],$$

where $V_\pi$ is the second-moment matrix of displacements under the coupling $\pi$ and $\lambda_1 \ge \lambda_2 \ge \cdots$ are its ordered eigenvalues. By focusing on the $k$ largest variance directions, SRW down-weights noise and reveals the intrinsic dimensionality of dependence. It induces an increasing and concave map $k \mapsto S_k^2(\mu, \nu)$, with an observable “elbow” at the effective subspace dimension of the dependency (Paty et al., 2019).
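The eigenvalue objective is easy to probe for a fixed coupling. The sketch below evaluates $\sum_{\ell \le k} \lambda_\ell(V_\pi)$ under the identity coupling of paired samples — only the inner objective, not the full minimization over couplings that SRW requires — for synthetic data whose displacements live in a 2-dimensional subspace of $\mathbb R^{10}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
X = rng.normal(size=(n, d))
# Displace points only inside the 2-dim subspace spanned by e1, e2, plus tiny noise.
B = np.zeros((2, d)); B[0, 0] = B[1, 1] = 1.0
Y = X + rng.normal(size=(n, 2)) @ (3.0 * B) + 0.01 * rng.normal(size=(n, d))

# Second-moment matrix of displacements under the paired (identity) coupling.
V = (X - Y).T @ (X - Y) / n
lam = np.sort(np.linalg.eigvalsh(V))[::-1]     # eigenvalues, largest first

# S_k^2-style objective: sum of the k largest eigenvalues of V_pi.
curve = np.cumsum(lam)
print(np.round(curve[:4], 2))  # grows for k = 1, 2 and flattens ("elbow") after k = 2
```

Because the displacements are essentially rank-2, the cumulative eigenvalue curve stops growing after $k = 2$, which is exactly the elbow SRW uses to read off the effective dependence dimension.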

Wasserstein Dependency Measures for Representation Learning

For unsupervised representation learning, replacing the KL-divergence in mutual information objectives with the Wasserstein distance yields the Wasserstein Dependency Measure (WDM):

$$\mathcal I_{\mathcal W}(X; Y) = \mathcal W_1\!\left( P_{XY}, \, P_X \otimes P_Y \right).$$

Practical estimation leverages 1-Lipschitz neural critics with gradient penalty. Unlike KL-based bounds, which suffer from exponential sample complexity in high mutual information, WDM-based estimation provides stable and complete representation learning, as confirmed in multi-factor synthetic and image-based benchmark tasks (Ozair et al., 2019).
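In place of a trained 1-Lipschitz neural critic, any fixed 1-Lipschitz function already certifies a lower bound on the WDM via Kantorovich–Rubinstein duality. A toy sketch (the critic $f(x, y) = -|x - y| / \sqrt 2$ is hand-picked for this example, not taken from the paper; shuffling $y$ simulates samples from $P_X \otimes P_Y$):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000
x = rng.normal(size=n)
y = x + 0.1 * rng.normal(size=n)          # strongly dependent pair
y_shuf = rng.permutation(y)               # shuffled y: approximate P_X ⊗ P_Y samples

# f(x, y) = -|x - y| / sqrt(2) has Euclidean gradient norm <= 1, so it is
# 1-Lipschitz and its mean gap lower-bounds W_1(P_XY, P_X ⊗ P_Y).
f = lambda a, b: -np.abs(a - b) / np.sqrt(2)
wdm_lower = f(x, y).mean() - f(x, y_shuf).mean()
print(wdm_lower > 0.5)   # True: this single critic already certifies dependence
```

A learned critic searches over a much richer Lipschitz family and can only improve this bound; the gradient penalty mentioned above is one practical way to keep that search approximately inside the Lipschitz ball.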

5. Wasserstein Dependence Indices for Random Measures

The Wasserstein index of dependence for random measures, $\mathcal I_W$, is defined for infinitely active completely random vectors (CRVs) on a Polish space, parameterized by Lévy measures. Given $\nu$ (the joint Lévy measure), $\nu_{\mathrm{co}}$ (comonotonic dependence, i.e., concentrated on the diagonal), and $\nu_{\perp}$ (independence, i.i.d. components), the index is

$$\mathcal I_W = \frac{d_{\mathcal W}(\nu, \nu_{\perp})}{d_{\mathcal W}(\nu_{\mathrm{co}}, \nu_{\perp})},$$

where $d_{\mathcal W}$ is an extended Wasserstein metric between possibly infinite-mass Lévy measures, allowing mass to be coupled to and from the null vector. $\mathcal I_W$ uniquely characterizes independence ($\mathcal I_W = 0$) and maximal dependence ($\mathcal I_W = 1$), and is jointly sensitive to all $d$ random measures without reduction to pairwise metrics. Numerical computation involves one-dimensional quadratures over transformations of the Lévy measure’s marginal and sum distributions, enabling practical criterion-based tuning of dependence in prior specification and fair comparison of Bayesian nonparametric models (Catalano et al., 2021).

6. Practical Applications and Empirical Results

Applications of distance-maximizing Wasserstein dependency measures include:

  • Skill discovery and exploration in RL: Wasserstein-based objectives provide effective intrinsic control mechanisms, outperforming mutual-information approaches in diverse environments (Durugkar et al., 2021).
  • Robust high-dimensional inference: Subspace robust distances stabilize dependency measurement and clustering under noise and irrelevant features (Paty et al., 2019).
  • Representation learning: Wasserstein dependency objectives yield more expressive features for downstream prediction, especially when standard mutual information maximization underfits due to insufficient sample complexity (Ozair et al., 2019).
  • Random measure modeling in Bayesian nonparametrics: The Wasserstein index of dependence enables prior specification and model comparison on a common dependence scale, confirmed by posterior comparisons when priors are matched for $\mathcal I_W$ (Catalano et al., 2021).
  • Algebraic and geometric study of dependence: In discrete models, the explicit polyhedral and algebraic structure of the distance-to-independence functional allows detailed combinatorial and analytic investigation of dependency (Çelik et al., 2020).
  • General statistical dependency measure: Wasserstein-based indices systematically generalize classical correlation by quantifying both weak and strong, linear and nonlinear, dependence across a broad range of metric spaces (Catalano et al., 7 Oct 2025).

7. Limitations and Open Directions

Challenges include:

  • The need to solve nontrivial saddle point problems or to navigate the combinatorial complexity of polyhedral structures in large finite spaces.
  • The bias-variance tradeoff in entropic regularization and the selection of regularization parameters.
  • Strict 1-Lipschitzness is hard to enforce in deep neural critics; improved regularization techniques are an active area of work.
  • Dependence indices are sensitive to the choice of ground metric $d$, which must be defensibly specified for each application domain.
  • Computation of the full algebraic structure of dependency measures becomes demanding at scale, and further work seeks to optimize such algorithms and assess empirical degeneracy in practice (Paty et al., 2019, Çelik et al., 2020, Ozair et al., 2019, Catalano et al., 2021, Catalano et al., 7 Oct 2025).

Distance-maximizing Wasserstein dependency measures thus provide a unified, flexible, and metrically faithful approach to quantifying and optimizing dependence in diverse statistical and machine learning contexts, underpinned by principled, computable convex duality and distinct geometric and probabilistic interpretations.
