
Projected Wasserstein Gradient Flows

Updated 27 January 2026
  • Projected Wasserstein gradient flows constrain the evolution of probability measures to finite-dimensional statistical manifolds, mitigating the curse of dimensionality.
  • They realize the projection via linear subspaces, neural parameterizations, or explicit model families, each equipped with metrics consistent with the Wasserstein geometry for robust, scalable inference.
  • The approach admits convergence guarantees and error bounds, making it effective for high-dimensional Bayesian inverse problems, filtering, and deep generative modeling.

Projected Wasserstein gradient flows are a class of numerical and analytical methods that address the curse of dimensionality and model-manifold constraints in the computation of Wasserstein gradient flows (WGFs). These methods restrict the evolution of probability measures (originally governed by infinite-dimensional PDEs on the space of measures with finite second moment, equipped with the 2-Wasserstein metric) to finite-dimensional submanifolds or parametric families. The projection is achieved by subspace restriction, by neural parameterization, or by explicit projection onto a statistical model family, and is equipped with Riemannian or information-geometric metrics consistent with the original Wasserstein structure. Projected Wasserstein gradient flows have become central in scalable Bayesian inference, filtering on nonlinear manifolds, discrete random variable estimation, and neural network–assisted transport and mean-field models.

1. Theoretical Formulation: Geometry, Projection, and Flow Construction

The underlying principle of projected Wasserstein gradient flows is to formulate the evolution of measures as steepest descent flows of an energy functional E[\rho] under the 2-Wasserstein metric, but constrained to a finite-dimensional statistical manifold or parametric family. In the absence of constraints, the standard gradient flow of the KL energy,

E[\rho] = D_{KL}(\rho \| \pi) = \int \rho(x)\left(\log\rho(x) - \log\pi(x)\right)\,dx,

generates the continuity equation,

\partial_t\rho_t + \nabla\cdot(\rho_t v_t) = 0, \qquad v_t(x) = \nabla \log \pi(x) - \nabla \log \rho_t(x).
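
Before any projection, the unconstrained flow above can be simulated with interacting particles whose score \nabla\log\rho_t is estimated from the ensemble by kernel density estimation. A minimal 1-D sketch; the Gaussian target, bandwidth, and step size are illustrative choices, not taken from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target pi = N(2, 1), so grad log pi(x) = -(x - 2).
def grad_log_pi(x):
    return -(x - 2.0)

def kde_score(x, h=0.3):
    # grad_x log rho(x_i) = sum_j grad K_h(x_i - x_j) / sum_j K_h(x_i - x_j)
    diff = x[:, None] - x[None, :]          # pairwise differences (N, N)
    K = np.exp(-0.5 * (diff / h) ** 2)      # Gaussian kernel (normalizer cancels)
    dK = -(diff / h ** 2) * K               # derivative of the kernel in x_i
    return dK.sum(axis=1) / K.sum(axis=1)

x = rng.normal(-3.0, 0.5, size=500)         # particles initialized far from pi
for _ in range(200):
    # velocity field v = grad log pi - grad log rho, explicit Euler step
    x = x + 0.05 * (grad_log_pi(x) - kde_score(x))

print(x.mean())  # ensemble mean drifts toward the target mean 2
```

In high dimension the KDE step is exactly what degrades, which motivates the projections below.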

The projection may be realized in several forms:

  • (a) Linear Subspace Projection: A low-rank subspace of \mathbb{R}^d is selected via a Fisher-information–informed eigenproblem, and the dynamics are projected onto this subspace using P_r = \Psi_r\Psi_r^\top, where \Psi_r collects the principal directions.
  • (b) Parametric/Neural Family Projection: The transport map T_\theta is parameterized (e.g., by neural networks or basis expansions), and the flow for \rho_t is induced via the pushforward \rho_\theta = T_{\theta\sharp}\rho_{\mathrm{ref}}. The gradient flow is pulled back to the parameter space using either an exact or a relaxed pullback Wasserstein metric, as in

\hat{G}(\theta)_{ij} = \int [\partial_{\theta_i}T_\theta(z)]^\top [\partial_{\theta_j}T_\theta(z)]\, d\lambda(z).

  • (c) Model Manifold Projection (e.g., Discrete Families): After an unconstrained Wasserstein flow step, projection is carried out under the W_2 metric or a surrogate (such as MMD for infinite support), ensuring the updated measure stays within the model family.
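
For intuition on the pullback metric in (b), the Gram matrix \hat{G}(\theta) can be estimated by Monte Carlo over the reference law. A sketch for a hypothetical location-scale map T_\theta(z) = \theta_0 + \theta_1 z with \lambda = N(0, 1), for which the matrix happens to be the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)                # reference samples, lambda = N(0, 1)

# Hypothetical map T_theta(z) = theta0 + theta1 * z; its parameter Jacobian
# has columns dT/dtheta0 = 1 and dT/dtheta1 = z.
J = np.stack([np.ones_like(z), z], axis=1)  # (N, 2)

# Monte Carlo estimate of G_hat(theta)_ij = E_lambda[dT/dtheta_i * dT/dtheta_j]
G = J.T @ J / len(z)
print(G)  # approximately the 2x2 identity for this family
```

For a neural T_\theta the Jacobian columns come from automatic differentiation, but the estimator has the same form.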

2. Algorithmic Implementations and Computational Workflow

Implementation varies depending on the chosen projection scheme:

Linear Subspace (pWGD) Implementation (Wang et al., 2021):

  • At regular intervals, the Fisher matrix H is computed from particle gradients, and its principal eigenvectors \{\psi_i\}_{i=1}^r are used to define a projection basis \Psi_r.
  • Particles \{x_i^k\} are projected to reduced coordinates \{w_i^k = \Psi_r^\top x_i^k\}.
  • In the reduced space, the density \rho_k^r is approximated by KDE, and gradients are computed as

\nabla_w \log \rho_k^r(w) = \frac{\sum_j \nabla_w K_h(w - w_j)}{\sum_j K_h(w - w_j)}.

  • Each iteration updates particles via

w_i^{k+1} = w_i^k + \alpha_k\left[\nabla_w \log \pi^r(w_i^k) - \nabla_w \log \rho_k^r(w_i^k)\right],

then lifts them back to full space as x_i^{k+1} = \Psi_r w_i^{k+1} + x_i^{k,\perp}.
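
The pWGD loop above can be sketched end to end. This is a toy instance under simplifying assumptions: a Gaussian-like target that is informative only in the first two coordinates, a gradient outer-product surrogate for the Fisher matrix, and recomputation of the subspace at every iteration rather than "at regular intervals":

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, N = 20, 2, 300

# Toy target: Gaussian with extra precision on the first r coordinates,
# mean 1 there and 0 elsewhere (only r directions are informative).
prec = np.ones(d); prec[:r] = 5.0
mean = np.zeros(d); mean[:r] = 1.0

def grad_log_pi(x):                          # (N, d) particle gradients
    return -prec * (x - mean)

def kde_score(w, h=0.3):                     # KDE score in reduced coords (N, r)
    diff = w[:, None, :] - w[None, :, :]
    K = np.exp(-0.5 * (diff ** 2).sum(-1) / h ** 2)
    dK = -(diff / h ** 2) * K[:, :, None]
    return dK.sum(axis=1) / K.sum(axis=1)[:, None]

x = rng.normal(size=(N, d))
for _ in range(200):
    g = grad_log_pi(x)
    H = g.T @ g / N                          # surrogate Fisher matrix
    Psi = np.linalg.eigh(H)[1][:, -r:]       # top-r eigenvectors, (d, r)
    w = x @ Psi                              # project particles
    step = 0.02 * (g @ Psi - kde_score(w))   # reduced-space pWGD update
    x = x + step @ Psi.T                     # lift; orthogonal part unchanged

print(x[:, :r].mean(axis=0))  # approaches the informative mean (1, 1)
```

The KDE runs in r = 2 dimensions regardless of d, which is the point of the projection.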

Neural Parameterization and Pullback Metric (Zuo et al., 2024, Jin et al., 2024):

  • The transport map T_\theta (a neural network) pushes forward a reference law p_r.
  • The gradient flow is pulled back to parameter space using the Gram (information) matrix G_W:

d\theta/dt = -G_W(\theta)^{-1} \nabla_\theta F(\theta),

discretized as

\theta_{n+1} = \theta_n - \tau\, G_W(\theta_n)^{-1} \nabla_\theta F(\theta_n).

Here, \nabla_\theta F is computed via Monte Carlo expectations over p_r.
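
A minimal instance of this natural-gradient update, using a hypothetical location-scale map T_\theta(z) = \theta_0 + \theta_1 z (so \rho_\theta = N(\theta_0, \theta_1^2)), a Monte Carlo Gram matrix, and the closed-form KL gradient available for this family (in general \nabla_\theta F is itself a Monte Carlo estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0                  # target pi = N(mu, sigma^2)
theta = np.array([0.0, 1.0])          # T_theta(z) = theta[0] + theta[1] * z

z = rng.normal(size=50_000)           # reference samples p_r = N(0, 1)
J = np.stack([np.ones_like(z), z], axis=1)
G = J.T @ J / len(z)                  # Monte Carlo pullback Gram matrix
                                      # (theta-independent for this map)
for _ in range(200):
    # closed-form gradient of F(theta) = KL(N(theta0, theta1^2) || pi)
    grad = np.array([(theta[0] - mu) / sigma ** 2,
                     -1.0 / theta[1] + theta[1] / sigma ** 2])
    theta = theta - 0.1 * np.linalg.solve(G, grad)

print(theta)  # approaches (mu, sigma) = (3, 2)
```

For this family G is (approximately) the identity, reflecting the known flatness of the Wasserstein metric in location-scale coordinates; for deep networks G_W must be re-estimated and regularized at each step.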

Projection in Discrete Families (Cheng et al., 2019, Corenflos et al., 2023):

  • A Wasserstein (or surrogate) gradient flow step is performed, followed by projection onto a parametric family (e.g., Bernoulli or Gaussian) by minimizing Wasserstein distance or an alternative discrepancy (such as kernel MMD).
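
In one dimension, the W_2 projection onto the Gaussian family has a closed form via the comonotone (sorted-quantile) coupling: the optimal mean is the sample mean, and the optimal scale is the inner product of the sample quantiles with standard normal quantiles. A sketch with an illustrative exponential source measure standing in for the post-flow-step empirical measure:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
samples = rng.exponential(scale=2.0, size=n)  # measure produced by a flow step

# 1-D W2 projection onto {N(m, s^2)} via the comonotone (sorted) coupling:
# m* = E[X], and s* = E[X_(i) Z_(i)], since the normal quantiles Z satisfy
# E[Z] = 0 and E[Z^2] = 1.
xs = np.sort(samples)
zq = np.sort(rng.standard_normal(n))          # Monte Carlo normal quantiles
m = xs.mean()
s = np.mean(xs * zq)
print(m, s)  # m near 2; s below the sample std (Cauchy-Schwarz)
```

Note that the W_2-optimal scale s* is not the sample standard deviation unless the source is itself Gaussian, which is what distinguishes this projection from plain moment matching.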

3. Statistical and Computational Properties

Projected Wasserstein flows provide rigorous statistical and computational guarantees:

  • Curse of Dimensionality Mitigation: For kernel density estimation in high dimension d, the MSE of the log-density gradient is O(N^{-4/(4+d)}), which is reduced to O(N^{-4/(4+r)}) after projection to r \ll d dimensions (Wang et al., 2021). The effective sample complexity is thereby reduced exponentially in d - r.
  • Error Bounds: The KL divergence of the posterior approximation via optimal-profile projection satisfies

D_{KL}(\pi \| \pi^{\mathrm{proj}}) \leq \frac{\gamma}{2} \sum_{i>r} \lambda_i

in terms of decaying Fisher-matrix eigenvalues.

  • Convergence: Under \mu-strong log-concavity, both full and projected flows exhibit exponential decay of KL divergence:

D_{KL}(\rho_t \| \pi) \leq e^{-2\mu t} D_{KL}(\rho_0 \| \pi), \qquad D_{KL}(\rho_t^r \| \pi^r) \leq e^{-2\mu^r t} D_{KL}(\rho_0^r \| \pi^r).

For neural-projected flows and the parameterized Wasserstein gradient flow (PWGF) (Jin et al., 2024), explicit W_2-error and Polyak–Łojasiewicz–based exponential convergence bounds are proven.

  • Well-posedness and Consistency: Neural projections with ReLU networks are shown to yield global uniqueness and non-collapse of the map mesh. The truncation error is O(m^{-p}) in the number of neurons m, with p = 1, 2 depending on the parameterization (Zuo et al., 2024).

4. Connections to Discrete and Model-Constrained Flows

Projected Wasserstein gradient flows generalize the projection principle to non-Euclidean and manifold-constrained settings:

  • The Straight-Through (ST) estimator for discrete random variables is an instantiation of the projected WGF, where the process consists of a Wasserstein step in the ambient continuous space and projection onto the probability simplex (Cheng et al., 2019). For Bernoulli and Poisson families, projection may be achieved via W_2 or kernel MMD, with closed-form or low-variance estimators for the relevant gradients.
  • In Gaussian filtering, the innovation step of the Bayesian filter is reformulated as a WGF of the KL energy projected onto the Gaussian family, yielding explicit ODEs for the mean and covariance (Corenflos et al., 2023). Higher-order moment or mixture models can be handled by extending the projection manifold to mixtures.
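
For a Gaussian target \pi = N(\mu, \Sigma), the projected-flow moment ODEs close in explicit linear form, dm/dt = -\Sigma^{-1}(m - \mu) and dP/dt = 2I - \Sigma^{-1}P - P\Sigma^{-1}, whose fixed point is (\mu, \Sigma). A forward-Euler sketch of this linear-Gaussian reduction (the specific target and step size are illustrative):

```python
import numpy as np

# Illustrative Gaussian target pi = N(mu, Sigma); projecting the KL flow
# onto Gaussians N(m, P) gives closed moment ODEs in this linear case:
#   dm/dt = -Sigma^{-1} (m - mu)
#   dP/dt = 2 I - Sigma^{-1} P - P Sigma^{-1}
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

m, P = np.zeros(2), np.eye(2)
dt = 0.01
for _ in range(5000):                 # forward Euler integration
    m = m + dt * (-Sigma_inv @ (m - mu))
    P = P + dt * (2 * np.eye(2) - Sigma_inv @ P - P @ Sigma_inv)

print(m, P)  # the fixed point recovers the target moments (mu, Sigma)
```

For nonlinear models the expectations defining the ODEs no longer close and are approximated by quadrature or sampling.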

5. Applications and Empirical Performance

Projected Wasserstein gradient flows are applied across high-dimensional Bayesian inference, variational filtering, and scientific ML:

  • Bayesian Inverse Problems: For linear/nonlinear inverse problems and PDE-constrained inference, pWGD achieves accuracy and convergence rates independent of the ambient dimension d due to the rapid decay of Fisher matrix eigenvalues (typically r \approx 8–10) (Wang et al., 2021).
  • Parametric and Neural Model Approximation: Neural-projected schemes accurately resolve Fokker–Planck, porous-medium, and aggregation flows in dimensions up to d = 30 using deep normalizing flows or shallow MLPs, with substantial speedup over PDE-based methods (Zuo et al., 2024, Jin et al., 2024).
  • Filter and Variational Inference: The variational Gaussian filter based on projected Wasserstein flow demonstrates robust performance on problems with multi-modal or non-Gaussian posteriors (Corenflos et al., 2023).
  • Discrete Latent Variable Learning: pWGF-based estimators outperform or match ST and related methods, especially for infinite-support discrete families, with lower variance and rapid convergence (Cheng et al., 2019).

Typical experiments report:

  • Exponential decay of KL-proxy and step-norm residuals at convergence rates matching strong log-concavity.
  • Maintenance of credible-set coverage and sample spread on high-dimensional real datasets (COVID-19 social-mobility example at d = 96) (Wang et al., 2021).
  • Efficient scalability with respect to both N (particles/samples) and the number of compute cores.

6. Open Problems and Frontiers

Current projected Wasserstein gradient flow methodology leaves several directions open:

  • Extension to stochastic and adaptive projections, including online selection of subspaces and adaptive model families.
  • Exploration of implicit pullback metrics and higher-order integration for sharper error guarantees in neural-parametric settings.
  • Characterization of projection-induced bias and convergence in non-log-concave or sharply multimodal posterior landscapes.
  • Unification with information geometric flows and interacting particle systems in large-scale scientific and engineering inference.

7. Summary Table: Core Forms of Projected Wasserstein Gradient Flow

| Method | Projection Target | Key Computational Step |
|---|---|---|
| Subspace (pWGD) | Linear subspace \mathbb{R}^r | Fisher eigenvectors; low-rank KDE |
| Neural parametrization | Neural manifold (NN map) | Gram matrix G_W, natural gradient |
| Model family (ST/pWGF) | Simplex, Poisson, Gaussian, mixture | W_2, MMD, or moment-matching projection |
| Gaussian filtering | Gaussian (mean/cov) | Moment ODEs under Wasserstein geometry |

Projected Wasserstein gradient flows provide a principled, geometrically consistent means to reduce infinite-dimensional variational transport to scalable, practical algorithms for Bayesian inference, scientific computation, and deep generative modeling, with convergence guarantees and statistical-optimality properties grounded in the underlying geometry and projection schemes (Wang et al., 2021, Zuo et al., 2024, Jin et al., 2024, Cheng et al., 2019, Corenflos et al., 2023).
