Koopman Spectral Wasserstein Gradient Descent

Updated 28 December 2025

KSWGD is a training-free, particle-based generative modeling method that leverages Koopman spectral techniques and optimal transport to drive particles toward unknown target distributions.
It approximates the inverse Langevin generator via a data-driven, finite-rank spectral surrogate, ensuring constant dissipation rates and accelerated convergence even in high dimensions.
Empirical results show that KSWGD achieves full support coverage and linear KL decay across benchmarks like S¹ uniform sampling, quadruple well, and MNIST latent generation.

Koopman Spectral Wasserstein Gradient Descent (KSWGD) is a training-free, particle-based generative modeling method that unites operator-theoretic spectral analysis with variational optimal transport. It uses trajectory or time-series data to approximate the infinitesimal generator of overdamped Langevin dynamics through Koopman spectral techniques, then drives particles deterministically along a preconditioned Wasserstein gradient flow toward an unknown target distribution. This yields accelerated convergence without explicit knowledge of the target potential and without neural network training (Xu et al., 21 Dec 2025).

1. Definition and Theoretical Framework

KSWGD targets generative sampling problems where only samples (possibly arranged temporally) from an unknown distribution $\pi(x) \propto \exp(-V(x))$ are available. The goal is to transport an initial empirical measure $\mu_0$ toward $\pi$ by discretizing the $\chi^2$-Wasserstein gradient flow:

$\partial_t \mu_t = \operatorname{div} \left( \mu_t \nabla \kappa(\rho_t) \right), \quad \rho_t = \frac{d\mu_t}{d\pi}$

Here, $\kappa$ corresponds to the inverse Langevin generator $L^{-1}$. The key innovation of KSWGD is to approximate $\kappa$ in a fully data-driven, finite-rank manner using Koopman spectral methods (such as Extended Dynamic Mode Decomposition, EDMD), resulting in a preconditioned flow with a constant dissipation rate even in high dimensions. The approach rests on the same mathematical foundation as Laplacian-Adjusted Wasserstein Gradient Descent (LAWGD) but removes the need for access to the target potential or for score network training.

The algorithm proceeds as follows:

Estimate the leading $r$ eigenpairs of the generator from time-ordered trajectory pairs.
Build a truncated spectral surrogate for the inverse Langevin operator.
Update $M$ particles along this Koopman-preconditioned Wasserstein gradient flow.

2. Mathematical Formulation and Algorithm

Wasserstein Gradient Flow and Spectral Preconditioning

The evolution of $\mu_t$ under the $\chi^2$-Wasserstein gradient flow in $(\mathcal{P}_2(\mathbb{R}^d), W_2)$ is characterized by the velocity field

$v_t(x) = -2\nabla(\rho_t(x) - 1) = -2\nabla \frac{d\mu_t}{d\pi}(x)$

with preconditioning (as in LAWGD) by $L^{-1}$ leading to:

$\partial_t \mu_t = \operatorname{div} \left( \mu_t \nabla K_\pi(\rho_t) \right), \quad K_\pi = L^{-1} = \sum_{i=1}^\infty \frac{\varphi_i(\cdot)\varphi_i(\cdot)}{\lambda_i}$

where $\{(\lambda_i, \varphi_i)\}$ are eigenpairs of the Langevin generator $L = -\Delta + \nabla V \cdot \nabla$ (self-adjoint on $L^2_\pi$).
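As a small sanity check of this definition, the sketch below verifies an eigenpair relation numerically in one dimension. It assumes a toy target that is not part of the source, $V(x) = x^2/2$ (so $\pi$ is the standard Gaussian), for which the probabilists' Hermite polynomials $He_k$ satisfy $L\,He_k = k\,He_k$. The helper names are illustrative only.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import herme2poly


def hermite_eigenfunction(k):
    """Probabilists' Hermite polynomial He_k as a power-series Polynomial."""
    coeffs = [0.0] * k + [1.0]          # selects He_k in the HermiteE basis
    return Polynomial(herme2poly(coeffs))


def apply_generator(phi):
    """Apply L = -d^2/dx^2 + V'(x) d/dx for the assumed V(x) = x^2 / 2."""
    x = Polynomial([0.0, 1.0])          # V'(x) = x
    return -phi.deriv(2) + x * phi.deriv(1)


xs = np.linspace(-3.0, 3.0, 61)
residuals = []
for k in range(5):
    phi = hermite_eigenfunction(k)
    # For an eigenpair (k, He_k), L phi - k * phi vanishes identically.
    residuals.append(float(np.max(np.abs(apply_generator(phi)(xs) - k * phi(xs)))))
print(residuals)
```

The eigenvalue $\lambda_0 = 0$ belongs to the constant function, which is why sums over $1/\lambda_i$ are taken over the nonzero part of the spectrum.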

Koopman Spectral Approximation

Because the Koopman (backward Kolmogorov) generator satisfies $A = -L$, KSWGD applies EDMD to trajectory pairs $(z_j, z_j^+)$ to estimate the leading $r$ empirical eigenpairs $\{(\hat{\lambda}_i, \hat{\varphi}_i)\}_{i=1}^r$ of $-\hat{A}_N$, yielding the truncated inverse:

$\widehat{K}_r f = \sum_{i=1}^r \frac{1}{\hat{\lambda}_i} \langle f, \hat{\varphi}_i \rangle_\pi \hat{\varphi}_i$
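The estimation step can be sketched with a basic EDMD least-squares fit. This is a minimal illustration under stated assumptions, not the authors' implementation: the data come from a one-dimensional Ornstein-Uhlenbeck process with a fixed lag `dt`, the dictionary is the monomials $\{1, x, x^2\}$, and generator eigenvalues are recovered from Koopman eigenvalues through $\mu = e^{-\lambda\,dt}$. The function name `edmd_generator_eigenpairs` is hypothetical.

```python
import numpy as np


def edmd_generator_eigenpairs(z, z_next, dictionary, dt, rank):
    """EDMD estimate of the leading nonzero eigenpairs of L = -A.

    z, z_next  : arrays of shape (N,), snapshot pairs separated by time dt
    dictionary : callable mapping (N,) states to an (N, n) feature matrix
    """
    psi, psi_next = dictionary(z), dictionary(z_next)
    n_samples = len(z)
    gram = psi.T @ psi / n_samples             # empirical E[psi(z) psi(z)^T]
    cross = psi.T @ psi_next / n_samples       # empirical E[psi(z) psi(z^+)^T]
    koopman = np.linalg.pinv(gram) @ cross     # psi(z^+) ~ psi(z) @ koopman
    mu, vecs = np.linalg.eig(koopman)
    mu, vecs = mu.real, vecs.real              # reversible dynamics: real spectrum expected
    lam = -np.log(np.clip(mu, 1e-12, None)) / dt   # mu = exp(-lam * dt)
    order = np.argsort(lam)
    keep = order[1:rank + 1]                   # drop the constant mode (lam ~ 0)
    modes = vecs[:, keep]
    # Normalise each eigenfunction to unit empirical L^2(pi) norm.
    modes = modes / np.sqrt(np.mean((psi @ modes) ** 2, axis=0))
    return lam[keep], (lambda x: dictionary(x) @ modes)


rng = np.random.default_rng(0)
dt, n_pairs = 0.5, 50_000
z = rng.standard_normal(n_pairs)               # stationary samples of pi = N(0, 1)
decay = np.exp(-dt)
z_next = decay * z + np.sqrt(1.0 - decay**2) * rng.standard_normal(n_pairs)

monomials = lambda x: np.stack([np.ones_like(x), x, x**2], axis=1)
lam_hat, phi_hat = edmd_generator_eigenpairs(z, z_next, monomials, dt, rank=2)
print(lam_hat)   # should land near the exact values 1 and 2 for this toy process
```

For this toy process the monomial dictionary spans an invariant subspace, so the only error is sampling noise. With a general dictionary the estimate also carries a projection error.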

Discrete Particle Update

Given $M$ particles $\{x_t^{(i)}\}$, a single KSWGD step of size $h$ is:

$x_{t+1}^{(i)} = x_t^{(i)} - \frac{h}{M} \sum_{j=1}^M \nabla_1 K_{\widehat K_r}(x_t^{(i)}, x_t^{(j)})$

where

$K_{\widehat K_r}(x, y) = \sum_{k=1}^r \frac{\hat{\varphi}_k(x) \hat{\varphi}_k(y)}{\hat{\lambda}_k}$

and $\nabla_1$ denotes gradient with respect to the first argument.
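The update rule can be written in a few lines of NumPy. The sketch below is illustrative and assumes the eigenfunctions and their gradients are available as callables; it uses the analytic Hermite modes of a standard Gaussian target ($\lambda = 1, 2$) rather than learned ones. Because the surrogate kernel is a finite sum of products, the sum over $j$ is computed once per mode here; a direct pairwise evaluation gives the same result.

```python
import numpy as np


def kswgd_step(x, lam, phi, grad_phi, h):
    """One explicit step x_i <- x_i - (h / M) * sum_j grad_1 K(x_i, x_j).

    x        : (M, d) particle positions
    lam      : (r,) positive eigenvalue estimates
    phi      : callable, (M, d) -> (M, r) eigenfunction values
    grad_phi : callable, (M, d) -> (M, r, d) eigenfunction gradients
    """
    mode_means = phi(x).mean(axis=0)           # (1/M) sum_j phi_k(x_j), shape (r,)
    weights = mode_means / lam                 # shape (r,)
    # sum_k weights_k * grad phi_k(x_i): the kernel sum factorises over modes.
    drift = np.einsum("k,ikd->id", weights, grad_phi(x))
    return x - h * drift


# Assumed toy target pi = N(0, 1) with normalised Hermite modes x and (x^2 - 1)/sqrt(2).
lam = np.array([1.0, 2.0])
phi = lambda x: np.concatenate([x, (x**2 - 1.0) / np.sqrt(2.0)], axis=1)
grad_phi = lambda x: np.stack([np.ones_like(x), np.sqrt(2.0) * x], axis=1)

rng = np.random.default_rng(1)
particles = 3.0 + 0.5 * rng.standard_normal((200, 1))
for _ in range(400):
    particles = kswgd_step(particles, lam, phi, grad_phi, h=0.1)
print(particles.mean(), (particles**2).mean())
```

With only two retained modes, the flow matches the first two moments of the target and nothing more, which illustrates how the rank $r$ limits what the surrogate can correct.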

Pseudocode

The methodology splits naturally into Offline and Online phases:

Step	Description	Computational Cost
Offline (Spectral Est.)	Dictionary selection, data matrix formation, EDMD eigenproblem, extraction of leading $(\hat{\lambda}_i, \hat{\varphi}_i)$	$O(n^3)$ (for dictionary size $n$)
Online (Updates)	Iteratively update particles using Koopman-preconditioned flow	$O(M^2 r d)$ per iteration

Auxiliary notes:

Computing eigenfunction gradients $\nabla \hat{\varphi}_k(x)$ may require analytic expressions or finite-difference approximations.
The bias-variance trade-off is controlled by rank $r$ (truncation) and step size $h$.
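When no analytic gradient is available, a central-difference approximation is one option. The sketch below is a generic finite-difference routine, not something prescribed by the source; `toy_phi` is a hypothetical smooth stand-in for a learned eigenfunction, chosen so the result can be compared with a known gradient.

```python
import numpy as np


def finite_difference_gradient(phi, x, step=1e-5):
    """Central-difference gradient of a scalar function phi at points x.

    phi : callable mapping (M, d) points to (M,) values
    x   : (M, d) evaluation points
    Returns an (M, d) array; costs 2 * d evaluations of phi.
    """
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for axis in range(x.shape[1]):
        shift = np.zeros(x.shape[1])
        shift[axis] = step
        grad[:, axis] = (phi(x + shift) - phi(x - shift)) / (2.0 * step)
    return grad


# Hypothetical smooth stand-in for a learned eigenfunction.
toy_phi = lambda x: x[:, 0] ** 2 + np.sin(x[:, 1])
points = np.array([[0.5, 0.0], [-1.0, 2.0]])
numeric = finite_difference_gradient(toy_phi, points)
analytic = np.stack([2.0 * points[:, 0], np.cos(points[:, 1])], axis=1)
print(np.max(np.abs(numeric - analytic)))
```

The cost grows linearly with the dimension $d$, so analytic or automatic differentiation of the dictionary is usually preferable when it is available.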

3. Theoretical Properties

Spectral Preconditioning and Dissipation

The preconditioned flow with truncated spectral surrogate $K_r$ satisfies:

$\partial_t \mu_t = \operatorname{div} (\mu_t \nabla K_r \rho_t),\quad K_r = \sum_{i=1}^r \varphi_i \langle \cdot, \varphi_i \rangle / \lambda_i$

with dissipation identity:

$\frac{d}{dt} KL(\mu_t\,\|\,\pi) = -\|\Pi_r(\rho_t - 1)\|^2_{L^2_\pi}$

where $\Pi_r$ is the projection onto the retained $r$-dimensional eigenspace.
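The right-hand side of the dissipation identity can be estimated directly from particles. The sketch below assumes the retained modes are orthonormal in $L^2_\pi$ and orthogonal to constants, so that $\langle \rho_t - 1, \varphi_k \rangle_\pi = \mathbb{E}_{\mu_t}[\varphi_k]$ and the squared norm is a sum of squared particle averages. The Hermite modes of a standard Gaussian target are again used as an assumed example.

```python
import numpy as np


def projected_dissipation(x, phi):
    """Particle estimate of ||Pi_r (rho - 1)||^2 in L^2(pi).

    Assumes the r modes returned by phi are orthonormal in L^2(pi) and
    orthogonal to constants, so <rho - 1, phi_k>_pi = E_mu[phi_k].
    """
    coefficients = phi(x).mean(axis=0)     # E_mu[phi_k] for each mode k
    return float(np.sum(coefficients**2))


# Normalised Hermite modes for the assumed toy target pi = N(0, 1).
phi = lambda x: np.stack([x, (x**2 - 1.0) / np.sqrt(2.0)], axis=1)

at_target = np.array([-1.0, 1.0])      # first two moments match N(0, 1)
shifted = np.array([1.0, 1.0])         # mean 1, second moment 1
print(projected_dissipation(at_target, phi), projected_dissipation(shifted, phi))
```

The first particle set is far from Gaussian yet yields zero projected dissipation, because the two retained modes only see the first two moments. This is the residual that the tail bound $\eta_r$ accounts for.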

Under mild regularity and a tail bound $\eta_r$ , the ideal convergence rate is:

$KL(\mu_t\,\|\,\pi) \le KL(\mu_0\,\|\,\pi) e^{-t} + \eta_r^2(1-e^{-t})$

Data-Driven Error Bounds

If the Koopman spectral approximation error meets:

$|\langle f_t, \delta_r(f_t) \rangle_\pi| \le \epsilon_r \|f_t\|_{L^2_\pi}^2$

then

$KL(\mu_t\,\|\,\pi) \le e^{-(1-\epsilon_r)t} KL(\mu_0\,\|\,\pi) + \frac{\eta_r^2}{1-\epsilon_r} (1 - e^{-(1-\epsilon_r)t})$
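The shape of this bound follows from a short Grönwall-type argument. As a sketch, assume (this intermediate inequality is not stated in the source) that the perturbed dissipation yields a linear differential inequality for $y(t) = KL(\mu_t\,\|\,\pi)$ with rate $a = 1-\epsilon_r > 0$ and forcing $b = \eta_r^2$:

$y'(t) \le -a\,y(t) + b \;\Longrightarrow\; \frac{d}{dt}\left(e^{at} y(t)\right) \le b\,e^{at} \;\Longrightarrow\; y(t) \le e^{-at} y(0) + \frac{b}{a}\left(1 - e^{-at}\right)$

Substituting $a = 1-\epsilon_r$ and $b = \eta_r^2$ reproduces the bound above. Setting $\epsilon_r = 0$ recovers the ideal rate, and as $t \to \infty$ the bound tends to the floor $\eta_r^2/(1-\epsilon_r)$.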

Discrete-time convergence in the Approximate Gradient Flow (AGF) setting yields, for step size $h$:

$KL(\mu_{t+1}\,\|\,\pi) \le (1-\alpha h) KL(\mu_t\,\|\,\pi) + h\beta + O(h^2),~~ \text{with}~ \alpha = 1-\epsilon_r,~\beta = \eta_r^2$

implying geometric decay at rate $1-\alpha h$ per step toward a bias floor of $\beta/\alpha$, up to an $O(h)$ discretization correction.
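The behaviour of this recursion is easy to see numerically. The sketch below iterates it with the $O(h^2)$ term dropped; the values of $\epsilon_r$, $\eta_r$, $h$ and the initial KL are hypothetical numbers chosen only for illustration.

```python
def kl_recursion(kl0, alpha, beta, h, steps):
    """Iterate e_{t+1} = (1 - alpha*h) e_t + h*beta, dropping the O(h^2) term."""
    values = [kl0]
    for _ in range(steps):
        values.append((1.0 - alpha * h) * values[-1] + h * beta)
    return values


# Hypothetical numbers chosen only for illustration.
eps_r, eta_r, h = 0.1, 0.2, 0.05
alpha, beta = 1.0 - eps_r, eta_r**2
trace = kl_recursion(kl0=2.0, alpha=alpha, beta=beta, h=h, steps=400)
floor = beta / alpha
print(trace[-1], floor)
```

The gap to the floor shrinks by the factor $1-\alpha h$ at every step, so halving $h$ roughly doubles the number of iterations needed to reach a given accuracy.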

Feynman–Kac Perspective

Setting $U \equiv 0$ in the Feynman–Kac formula gives

$v(x, t) = \mathbb{E}[f(X_t) \mid X_0 = x]$

so the Koopman semigroup encodes unweighted expectations of observables along trajectories, which underlines the probabilistic foundation of KSWGD sampling. An extension to $U \neq 0$ would cover conditional or rare-event inference.
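This expectation can be checked by Monte Carlo in a case where it is known in closed form. The sketch below assumes the one-dimensional Ornstein-Uhlenbeck process $dX = -X\,dt + \sqrt{2}\,dW$ (a toy choice, not from the source), whose transition law is Gaussian, and the observable $f(x) = x$, for which $v(x, t) = x e^{-t}$.

```python
import numpy as np


def koopman_expectation(f, x0, t, n_samples, rng):
    """Monte Carlo estimate of v(x0, t) = E[f(X_t) | X_0 = x0] for the OU process."""
    decay = np.exp(-t)
    # Exact OU transition for dX = -X dt + sqrt(2) dW:
    # X_t | X_0 = x0 is Gaussian with mean x0 * e^{-t} and variance 1 - e^{-2t}.
    x_t = decay * x0 + np.sqrt(1.0 - decay**2) * rng.standard_normal(n_samples)
    return f(x_t).mean()


rng = np.random.default_rng(2)
x0, t = 1.5, 0.7
estimate = koopman_expectation(lambda x: x, x0, t, n_samples=200_000, rng=rng)
exact = x0 * np.exp(-t)      # f(x) = x is an eigenfunction with eigenvalue 1
print(estimate, exact)
```

The exponential decay of $v$ in $t$ is the semigroup counterpart of the eigenvalue relation $L\varphi = \lambda\varphi$ that the spectral surrogate relies on.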

4. Experimental Validation and Benchmarking

KSWGD's empirical performance was examined across several generative modeling settings:

Task/Dataset	Particles	Koopman Method	Key Metric(s)
$S^1$ uniform sampling	700	kernel-EDMD (RBF/poly)	KL decay, movement rate, coverage
Quadruple well	500	SDMD (neural dict.)	KL, well coverage, movement rate
MNIST (latent)	64	CNN+EDMD dict. learning	Visual sample, KL, $\chi^2$ divergence
Allen–Cahn SPDE	150	EDMD (poly features)	Visual fidelity, distributional prediction

Baselines examined include DMPS (diffusion maps), DDPM, VAE, RealNVP, WGAN-GP. Notable empirical results:

On $S^1$ uniform sampling and the quadruple well, KSWGD achieved full support coverage in $<500$ iterations, while DMPS required $\gtrsim 2000$.
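Empirical KL decay was linear (in the optimization sense of a geometric rate, that is, a straight line on a log scale), consistent with the exponential rate predicted in Section 3.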
On MNIST latent codes, KSWGD produced discernible digit samples; DMPS failed under the same conditions.
On Allen–Cahn SPDE, KSWGD matched or surpassed DDPM, VAE, normalizing flows, and GANs in latent space sample quality.

5. Computational and Practical Considerations

Key computational and modeling constraints include:

Offline spectral estimation incurs $O(n^3)$ eigendecomposition, while each online step scales as $O(M^2 r d)$ .
Gradients of eigenfunctions $\nabla \hat{\varphi}_k(x)$ may demand basis-specific analytic or finite-difference evaluation.
Bias-variance trade-offs hinge on truncation rank $r$ (reducing $\eta_r$ at increased computational cost) and step size $h$ (smaller $h$ reduces $O(h)$ discretization bias but necessitates longer runs).
Quality of latent autoencoders or dictionary selection directly bounds generative accuracy in high-dimensional settings.
Assumptions of ergodicity and self-adjointness of the generator are essential; extensions to non-reversible dynamics necessitate additional research.

6. Connections, Scope, and Limitations

KSWGD combines the spectral guarantees of LAWGD with the data-driven practicality of EDMD, providing a theoretically justified and computationally accessible recipe for generative particle-based sampling. It eliminates the need for explicit potential evaluation or neural-network-based score learning. Current limitations include the need for high-quality spectral/dictionary approximations, computational scaling with rank and particle count, and the restriction to detailed-balance (reversible) dynamical systems. Future work may address extensions to irreversible (non-detailed-balance) generators and autonomous dictionary adaptation (Xu et al., 21 Dec 2025).


References (1)

Generative Modeling through Spectral Analysis of Koopman Operator (2025)
