Koopman Spectral Wasserstein Gradient Descent

Updated 28 December 2025
  • KSWGD is a training-free, particle-based generative modeling method that leverages Koopman spectral techniques and optimal transport to drive particles toward unknown target distributions.
  • It approximates the inverse Langevin generator via a data-driven, finite-rank spectral surrogate, ensuring constant dissipation rates and accelerated convergence even in high dimensions.
  • Empirical results show that KSWGD achieves full support coverage and linear KL decay across benchmarks like S¹ uniform sampling, quadruple well, and MNIST latent generation.

Koopman Spectral Wasserstein Gradient Descent (KSWGD) is a training-free, particle-based generative modeling method that unites operator-theoretic spectral analysis with variational optimal transport theory. KSWGD uses trajectory or time-series data to approximate the infinitesimal generator of overdamped Langevin dynamics through Koopman spectral techniques. It then drives particles deterministically along a preconditioned Wasserstein gradient flow toward an unknown target distribution, achieving accelerated convergence without explicit knowledge of the target potential and without neural network training (Xu et al., 21 Dec 2025).

1. Definition and Theoretical Framework

KSWGD targets generative sampling problems where only samples (possibly arranged temporally) from an unknown distribution $\pi(x) \propto \exp(-V(x))$ are available. The goal is to transform an initial empirical measure $\mu_0$ to approximate $\pi$ by discretizing the $\chi^2$–Wasserstein gradient flow:

$$\partial_t \mu_t = \operatorname{div} \left( \mu_t \nabla \kappa(\rho_t) \right), \quad \rho_t = \frac{d\mu_t}{d\pi}$$

Here, $\kappa$ corresponds to the inverse Langevin generator $L^{-1}$. The key innovation of KSWGD is to approximate $\kappa$ in a fully data-driven and finite-rank manner using Koopman spectral methods (such as Extended Dynamic Mode Decomposition, EDMD), resulting in a preconditioned flow with constant dissipation rate even in high dimensions. This approach operationalizes the same mathematical foundation as Laplacian-Adjusted Wasserstein Gradient Descent (LAWGD) but circumvents the need for target potential access or score network training.

The algorithm proceeds as follows:

  • Estimate the leading $r$ eigenpairs of the generator from time-ordered trajectory pairs.
  • Build a truncated spectral surrogate for the inverse Langevin operator.
  • Update $M$ particles along this Koopman-preconditioned Wasserstein gradient flow.

2. Mathematical Formulation and Algorithm

Wasserstein Gradient Flow and Spectral Preconditioning

The evolution of $\mu_t$ under the $\chi^2$–Wasserstein gradient flow in $(\mathcal{P}_2(\mathbb{R}^d), W_2)$ is characterized by the velocity field

$$v_t(x) = -2\nabla(\rho_t(x) - 1) = -2\nabla \frac{d\mu_t}{d\pi}(x)$$

with preconditioning (as in LAWGD) by $L^{-1}$ leading to:

$$\partial_t \mu_t = \operatorname{div} \left( \mu_t \nabla K_\pi(\rho_t) \right), \quad K_\pi = L^{-1} = \sum_{i=1}^\infty \frac{\varphi_i(\cdot)\varphi_i(\cdot)}{\lambda_i}$$

where $\{(\lambda_i, \varphi_i)\}$ are eigenpairs of the Langevin generator $L = -\Delta + \nabla V \cdot \nabla$ (self-adjoint on $L^2_\pi$).
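
As a concrete instance of this eigen-decomposition, consider the one-dimensional standard Gaussian target with $V(x) = x^2/2$ (an illustrative choice, not an example taken from the source). In that case $L = -\frac{d^2}{dx^2} + x\frac{d}{dx}$, and its eigenfunctions are the probabilists' Hermite polynomials $He_k$ with eigenvalues $\lambda_k = k$. The short check below verifies $L\,He_k = k\,He_k$ numerically; the helper name `apply_generator` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import HermiteE


def apply_generator(poly, x):
    """Apply L = -d^2/dx^2 + x d/dx (the case V(x) = x^2 / 2) to a polynomial at points x."""
    return -poly.deriv(2)(x) + x * poly.deriv(1)(x)


x = np.linspace(-3.0, 3.0, 13)
residuals = []
for k in range(1, 5):
    he_k = HermiteE.basis(k)  # probabilists' Hermite polynomial He_k
    # Eigen-relation L He_k = k He_k, so this residual should vanish up to rounding.
    residuals.append(np.max(np.abs(apply_generator(he_k, x) - k * he_k(x))))

max_residual = max(residuals)
print(max_residual)
```

The constant function $He_0$ has eigenvalue $0$ and is left out of the loop, since a zero eigenvalue cannot be inverted in a spectral sum weighted by $1/\lambda_i$.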

Koopman Spectral Approximation

Recognizing that the Koopman (backward Kolmogorov) generator satisfies $A = -L$, KSWGD applies EDMD to trajectory pairs $(z_j, z_j^+)$ to identify the leading $r$ empirical eigenpairs $\{(\hat{\lambda}_i, \hat{\varphi}_i)\}_{i=1}^r$ of $-\hat{A}_N$, yielding the truncated inverse:

$$\widehat{K}_r f = \sum_{i=1}^r \frac{1}{\hat{\lambda}_i} \langle f, \hat{\varphi}_i \rangle_\pi \hat{\varphi}_i$$
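
The spectral estimation step can be illustrated with a minimal EDMD sketch. Everything specific here is an assumption made for illustration: the data come from a one-dimensional Ornstein-Uhlenbeck process (target $N(0,1)$) sampled with its exact transition law, the dictionary is the monomials $[1, x, x^2]$, and the lag `tau` and sample count are arbitrary. The source does not prescribe these choices.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.5          # lag time between z_j and z_j^+ (illustrative value)
n_pairs = 50_000

# Snapshot pairs from the OU process dX = -X dt + sqrt(2) dW, whose target is N(0, 1).
# The exact transition law is used here instead of a numerical SDE integrator.
z = rng.standard_normal(n_pairs)
z_plus = np.exp(-tau) * z + np.sqrt(1.0 - np.exp(-2.0 * tau)) * rng.standard_normal(n_pairs)


def dictionary(x):
    """Monomial dictionary [1, x, x^2] evaluated at each sample, shape (N, 3)."""
    return np.stack([np.ones_like(x), x, x**2], axis=1)


# EDMD: least-squares Koopman matrix K with dictionary(z) @ K ~= dictionary(z_plus).
koopman_matrix, *_ = np.linalg.lstsq(dictionary(z), dictionary(z_plus), rcond=None)

# Eigenvalues mu_i of K approximate exp(-lambda_i * tau), so lambda_i = -log(mu_i) / tau.
mu = np.linalg.eigvals(koopman_matrix)
lam_hat = np.sort(-np.log(np.abs(mu)) / tau)

# Weights 1 / lambda_i of the truncated inverse; the constant mode (lambda ~ 0) is skipped.
inverse_weights = 1.0 / lam_hat[1:]
print(lam_hat, inverse_weights)
```

For this process the generator eigenvalues are $0, 1, 2, \dots$, so the estimates should land near $0$, $1$, and $2$ up to sampling error. The eigenvectors of the Koopman matrix (not computed above) would give the dictionary coefficients of the estimated eigenfunctions.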

Discrete Particle Update

Given $M$ particles $\{x_t^{(i)}\}$, a single KSWGD step of size $h$ is:

$$x_{t+1}^{(i)} = x_t^{(i)} - \frac{h}{M} \sum_{j=1}^M \nabla_1 K_{\widehat K_r}(x_t^{(i)}, x_t^{(j)})$$

where

$$K_{\widehat K_r}(x, y) = \sum_{k=1}^r \frac{\hat{\varphi}_k(x) \hat{\varphi}_k(y)}{\hat{\lambda}_k}$$

and $\nabla_1$ denotes gradient with respect to the first argument.
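
The update rule above can be sketched in a few lines. To keep the example self-contained, the eigenpairs are assumed to be known in closed form rather than estimated: for a one-dimensional standard Gaussian target they are the normalized Hermite polynomials with $\lambda_k = k$, and only $r = 2$ modes are kept. The function names (`phi`, `grad_phi`, `kswgd_step`) and all numerical values are hypothetical.

```python
import numpy as np

# Assumed eigenpairs for the standard Gaussian target (V(x) = x^2 / 2, d = 1):
# phi_1 = x, phi_2 = (x^2 - 1) / sqrt(2), with eigenvalues lambda = 1, 2.
lam = np.array([1.0, 2.0])


def phi(x):
    """Eigenfunction values, shape (M, r)."""
    return np.stack([x, (x**2 - 1.0) / np.sqrt(2.0)], axis=1)


def grad_phi(x):
    """Eigenfunction derivatives, shape (M, r)."""
    return np.stack([np.ones_like(x), np.sqrt(2.0) * x], axis=1)


def kswgd_step(x, h):
    """One update x_i <- x_i - (h / M) * sum_j grad_1 K(x_i, x_j) for the rank-r kernel."""
    # grad_1 K(x_i, x_j) = sum_k grad_phi_k(x_i) * phi_k(x_j) / lambda_k, shape (M, M)
    grad_kernel = (grad_phi(x) / lam) @ phi(x).T
    return x - h * grad_kernel.mean(axis=1)


particles = np.linspace(2.0, 4.0, 200)  # initial particles, far from the target
for _ in range(500):
    particles = kswgd_step(particles, h=0.05)

print(particles.mean(), particles.var())
```

With only two modes retained, the flow can correct just the first two moments: the particle mean is driven toward $0$ and the variance toward $1$, while the shape of the initial distribution is otherwise preserved. Capturing finer structure would require a larger rank $r$.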

Pseudocode

The methodology splits naturally into Offline and Online phases:

| Step | Description | Computational Cost |
| --- | --- | --- |
| Offline (Spectral Est.) | Dictionary selection, data matrix formation, EDMD eigenproblem, obtain leading $(\hat{\lambda}_i, \hat{\varphi}_i)$ | $O(n^3)$ (for dictionary size $n$) |
| Online (Updates) | Iteratively update particles using Koopman-preconditioned flow | $O(M^2 r d)$ per iteration |

Auxiliary notes:

  • Computing eigenfunction gradients $\nabla \hat{\varphi}_k(x)$ may require analytic expressions or finite-difference approximations.
  • The bias-variance trade-off is controlled by rank $r$ (truncation) and step size $h$.
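
The two options for eigenfunction gradients can be compared directly when an eigenfunction is represented as a linear combination of dictionary functions. The sketch below assumes a monomial dictionary $[1, x, x^2]$ and a hypothetical coefficient vector `coeffs`; neither is specified in the source.

```python
import numpy as np


def dictionary(x):
    """Monomial dictionary [1, x, x^2], shape (M, 3)."""
    return np.stack([np.ones_like(x), x, x**2], axis=1)


def dictionary_grad(x):
    """Analytic derivatives [0, 1, 2x] of the dictionary, shape (M, 3)."""
    return np.stack([np.zeros_like(x), np.ones_like(x), 2.0 * x], axis=1)


# Hypothetical EDMD eigenvector: coefficients of phi_hat(x) = (x^2 - 1) / sqrt(2).
coeffs = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0)

x = np.linspace(-2.0, 2.0, 9)
eps = 1e-5

# Analytic route: differentiate the dictionary, then contract with the coefficients.
analytic = dictionary_grad(x) @ coeffs
# Finite-difference route: central difference of the eigenfunction itself.
central_fd = (dictionary(x + eps) @ coeffs - dictionary(x - eps) @ coeffs) / (2.0 * eps)

max_gap = np.max(np.abs(analytic - central_fd))
print(max_gap)
```

The analytic route needs one derivative formula per dictionary element, whereas the finite-difference route works for any dictionary at the price of two extra evaluations per coordinate and a step-size parameter `eps`.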

3. Theoretical Properties

Spectral Preconditioning and Dissipation

The preconditioned flow with truncated spectral surrogate $K_r$ satisfies:

$$\partial_t \mu_t = \operatorname{div} (\mu_t \nabla K_r \rho_t),\quad K_r = \sum_{i=1}^r \varphi_i \langle \cdot, \varphi_i \rangle / \lambda_i$$

with dissipation identity:

$$\frac{d}{dt} \mathrm{KL}(\mu_t\,\|\,\pi) = -\|\Pi_r(\rho_t - 1)\|^2_{L^2_\pi}$$

where $\Pi_r$ is projection onto the retained $r$-dimensional eigenspace.

Under mild regularity and a tail bound $\eta_r$, the ideal convergence rate is:

$$\mathrm{KL}(\mu_t\,\|\,\pi) \le \mathrm{KL}(\mu_0\,\|\,\pi) e^{-t} + \eta_r^2(1-e^{-t})$$
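
One way to see where a bound of this form comes from is the Grönwall-type argument sketched below. It assumes that the tail bound means $\|(I - \Pi_r)(\rho_t - 1)\|_{L^2_\pi}^2 \le \eta_r^2$ for all $t$ (an interpretation, since the precise tail condition is not stated here), and it uses the standard inequality $\mathrm{KL}(\mu\,\|\,\pi) \le \chi^2(\mu\,\|\,\pi) = \|\rho - 1\|_{L^2_\pi}^2$ together with the orthogonality of $\Pi_r$ in $L^2_\pi$.

```latex
\begin{aligned}
\frac{d}{dt}\mathrm{KL}(\mu_t \,\|\, \pi)
  &= -\|\Pi_r(\rho_t - 1)\|_{L^2_\pi}^2
   = -\|\rho_t - 1\|_{L^2_\pi}^2 + \|(I - \Pi_r)(\rho_t - 1)\|_{L^2_\pi}^2 \\
  &\le -\mathrm{KL}(\mu_t \,\|\, \pi) + \eta_r^2 , \\
\mathrm{KL}(\mu_t \,\|\, \pi)
  &\le e^{-t}\,\mathrm{KL}(\mu_0 \,\|\, \pi) + \eta_r^2 \left(1 - e^{-t}\right).
\end{aligned}
```

The last line follows from integrating the differential inequality $y' \le -y + \eta_r^2$ with $y(0) = \mathrm{KL}(\mu_0\,\|\,\pi)$.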

Data-Driven Error Bounds

If the Koopman spectral approximation error meets:

$$|\langle f_t, \delta_r(f_t) \rangle_\pi| \le \epsilon_r \|f_t\|_{L^2_\pi}^2$$

then

$$\mathrm{KL}(\mu_t\,\|\,\pi) \le e^{-(1-\epsilon_r)t}\, \mathrm{KL}(\mu_0\,\|\,\pi) + \frac{\eta_r^2}{1-\epsilon_r} (1 - e^{-(1-\epsilon_r)t})$$

Discrete-time convergence in the Approximate Gradient Flow (AGF) setting yields, for step size $h$:

$$\mathrm{KL}(\mu_{t+1}\,\|\,\pi) \le (1-\alpha h)\, \mathrm{KL}(\mu_t\,\|\,\pi) + h\beta + O(h^2),~~ \text{with}~ \alpha = 1-\epsilon_r,~\beta = \eta_r^2$$

implying geometric decay toward a bias floor of $\beta/\alpha$, up to an $O(h)$ discretization correction.
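
The behavior of this recursion can be checked numerically. The sketch below iterates the bound with the $O(h^2)$ term dropped, treating the inequality as an equality; the function name and all parameter values are illustrative and do not come from the source.

```python
def kl_upper_bound_sequence(kl0, eps_r, eta_r, h, n_steps):
    """Iterate b_{t+1} = (1 - alpha*h) * b_t + h*beta, dropping the O(h^2) term."""
    alpha = 1.0 - eps_r
    beta = eta_r**2
    bounds = [kl0]
    for _ in range(n_steps):
        bounds.append((1.0 - alpha * h) * bounds[-1] + h * beta)
    return bounds


# Illustrative values only; none of these numbers come from the source.
bounds = kl_upper_bound_sequence(kl0=2.0, eps_r=0.1, eta_r=0.2, h=0.05, n_steps=400)
bias_floor = 0.2**2 / (1.0 - 0.1)  # beta / alpha
print(bounds[-1], bias_floor)
```

The sequence contracts by a factor $(1 - \alpha h)$ per step toward the fixed point $\beta/\alpha = \eta_r^2/(1-\epsilon_r)$, which matches the long-time limit of the continuous-time bound.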

Feynman–Kac Perspective

With $U \equiv 0$ in the Feynman–Kac formula,

$$v(x, t) = \mathbb{E}[f(X_t) \mid X_0 = x]$$

the Koopman semigroup encodes unconditional observable expectations, underlining the probabilistic foundation of KSWGD sampling. Extension to $U \neq 0$ would encompass conditional or rare-event inference.
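
The expectation $v(x, t)$ can be estimated by Monte Carlo when trajectories are cheap to sample. The sketch below assumes a one-dimensional Ornstein-Uhlenbeck process with its exact transition law (an illustrative stand-in, not a system from the source) and the observable $f(x) = x$, for which the exact answer is $e^{-t} x$. The function name `koopman_expectation` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def koopman_expectation(f, x0, t, n_samples=200_000):
    """Monte Carlo estimate of E[f(X_t) | X_0 = x0] for dX = -X dt + sqrt(2) dW."""
    # Exact OU transition: X_t = exp(-t) * x0 + sqrt(1 - exp(-2t)) * xi, xi ~ N(0, 1).
    noise = rng.standard_normal(n_samples)
    x_t = np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * noise
    return f(x_t).mean()


x0, t = 1.5, 0.7
estimate = koopman_expectation(lambda x: x, x0, t)
exact = np.exp(-t) * x0  # f(x) = x is a generator eigenfunction with lambda = 1
print(estimate, exact)
```

Because $f(x) = x$ is an eigenfunction here, the semigroup simply rescales it by $e^{-\lambda t}$, which is the relation between Koopman eigenvalues and generator eigenvalues used in the spectral estimation step.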

4. Experimental Validation and Benchmarking

KSWGD's empirical performance was examined across several generative modeling settings:

| Task/Dataset | Particles | Koopman Method | Key Metric(s) |
| --- | --- | --- | --- |
| $S^1$ uniform sampling | 700 | kernel-EDMD (RBF/poly) | KL decay, movement rate, coverage |
| Quadruple well | 500 | SDMD (neural dict.) | KL, well coverage, movement rate |
| MNIST (latent) | 64 | CNN+EDMD dict. learning | Visual sample, KL, $\chi^2$ divergence |
| Allen–Cahn SPDE | 150 | EDMD (poly features) | Visual fidelity, distributional prediction |

Baselines examined include DMPS (diffusion maps), DDPM, VAE, RealNVP, WGAN-GP. Notable empirical results:

  • On $S^1$ uniform sampling and the quadruple well, KSWGD achieved full support coverage in $<500$ iterations, while DMPS required $\gtrsim 2000$.
  • Empirical KL decay was linear, validating theoretical predictions.
  • On MNIST latent codes, KSWGD produced discernible digit samples; DMPS failed under comparable conditions.
  • On Allen–Cahn SPDE, KSWGD matched or surpassed DDPM, VAE, normalizing flows, and GANs in latent space sample quality.

5. Computational and Practical Considerations

Key computational and modeling constraints include:

  • Offline spectral estimation incurs $O(n^3)$ eigendecomposition, while each online step scales as $O(M^2 r d)$.
  • Gradients of eigenfunctions $\nabla \hat{\varphi}_k(x)$ may demand basis-specific analytic or finite-difference evaluation.
  • Bias-variance trade-offs hinge on truncation rank $r$ (reducing $\eta_r$ at increased computational cost) and step size $h$ (smaller $h$ reduces $O(h)$ discretization bias but necessitates longer runs).
  • Quality of latent autoencoders or dictionary selection directly bounds generative accuracy in high-dimensional settings.
  • Assumptions of ergodicity and self-adjointness of the generator are essential; extensions to non-reversible dynamics necessitate additional research.

6. Connections, Scope, and Limitations

KSWGD synthesizes the spectral guarantees of LAWGD with the data-driven practicality of EDMD, providing a theoretically justified and computationally accessible recipe for generative particle-based sampling. It eliminates the need for explicit potential evaluation or neural-network-based score learning. Current limitations include the need for high-quality spectral/dictionary approximations, computational cost that scales with rank and particle count, and the restriction to reversible (detailed-balance) dynamical systems. Future work may address extensions to irreversible (non-detailed-balance) generators and autonomous dictionary adaptation (Xu et al., 21 Dec 2025).

References (1)
