Sequential Monte Carlo approximations of Wasserstein--Fisher--Rao gradient flows

Published 6 Jun 2025 in stat.ME, math.NA, stat.CO, and stat.ML | (2506.05905v1)

Abstract: We consider the problem of sampling from a probability distribution $\pi$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback--Leibler divergence from $\pi$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback--Leibler divergence from $\pi$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein--Fisher--Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein--Fisher--Rao flow of the Kullback--Leibler divergence which empirically outperforms the current state-of-the-art. We study tempered versions of these PDEs obtained by replacing the target distribution with a geometric mixture of initial and target distribution and show that these do not lead to a convergence speed up.


Summary

The paper presents a novel algorithm that combines Langevin dynamics and importance sampling to approximate WFR gradient flows for improved convergence.
The paper reformulates sampling as a gradient flow optimization problem over probability distributions by minimizing the KL divergence with respect to the target.
Empirical results demonstrate that the SMC-WFR method outperforms traditional approaches, especially for complex, multimodal, or non-log-concave targets.

Introduction

The paper "Sequential Monte Carlo Approximations of Wasserstein--Fisher--Rao Gradient Flows" (2506.05905) treats sampling from a probability distribution as an optimization problem over the space of probability distributions: minimizing the Kullback--Leibler (KL) divergence from a target distribution $\pi$. The authors study several partial differential equations (PDEs) whose solutions minimize this divergence, focusing on those derived from the Wasserstein--Fisher--Rao (WFR) geometry, and propose a new algorithm that uses importance sampling and Sequential Monte Carlo (SMC) to approximate the WFR flow. In their experiments it outperforms the current state of the art.

Gradient Flow PDEs for Sampling

Sampling from a known probability distribution is reformulated as a gradient flow problem: the KL divergence from $\pi$ is minimized over the space of probability distributions, and the choice of geometry on that space determines the flow. Variational inference and Langevin-based algorithms fit this perspective. In the Wasserstein space, equipped with the Wasserstein-2 distance, the gradient flow of the KL divergence is the Fokker-Planck equation, the PDE governing Langevin dynamics. The Fisher--Rao (FR) gradient flow instead corresponds to birth-death dynamics, in which mass is created or removed rather than moved. The WFR gradient flow combines the two geometries and has better convergence properties than either alone.
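For reference, the three flows of the KL divergence can be written in their standard form as below. This is a sketch in assumed notation ($\mu_t$ for the law at time $t$, $\pi$ for the target); the paper's own scaling or sign conventions may differ.

```latex
% Assumed notation: \mu_t is the distribution at time t, \pi the target,
% and KL(\mu_t \| \pi) = \int \log(\mu_t / \pi) \, d\mu_t.
\begin{align*}
\text{Wasserstein:} \quad & \partial_t \mu_t
  = \nabla \cdot \Big( \mu_t \nabla \log \frac{\mu_t}{\pi} \Big), \\
\text{Fisher--Rao:} \quad & \partial_t \mu_t
  = -\mu_t \Big( \log \frac{\mu_t}{\pi} - \mathrm{KL}(\mu_t \,\|\, \pi) \Big), \\
\text{WFR:} \quad & \partial_t \mu_t
  = \nabla \cdot \Big( \mu_t \nabla \log \frac{\mu_t}{\pi} \Big)
  - \mu_t \Big( \log \frac{\mu_t}{\pi} - \mathrm{KL}(\mu_t \,\|\, \pi) \Big).
\end{align*}
```

The first equation is the Fokker-Planck equation of the Langevin diffusion. The second only reweights mass, and the KL term keeps $\mu_t$ normalized. The third is the sum of the two right-hand sides.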

Figure 1: Evolution of mean, variance, and KL along different PDE flows in the 1D Gaussian case with $\mu_0(x) = \mathcal{N}(x; 0, 1)$ and $\pi(x) = \mathcal{N}(x; 20, 0.1)$.
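In the Gaussian setting of Figure 1, the pure Wasserstein and pure Fisher--Rao flows stay Gaussian and have well-known closed forms, so their mean, variance, and KL can be computed directly. The sketch below does this under two assumptions that are not stated in the text: the second argument of $\mathcal{N}$ is read as a variance, and the flows use unit time scaling. The function names are illustrative, and the WFR flow is left out because it needs a numerical ODE solve.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1D Gaussians with variances v1, v2."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def wasserstein_flow(t, m0, v0, m, v):
    """Mean and variance at time t of the Fokker-Planck (Langevin) flow.

    The associated SDE is the Ornstein-Uhlenbeck process
    dX = -(X - m) / v dt + sqrt(2) dW, started from N(m0, v0).
    """
    mean = m + (m0 - m) * np.exp(-t / v)
    var = v + (v0 - v) * np.exp(-2.0 * t / v)
    return mean, var

def fisher_rao_flow(t, m0, v0, m, v):
    """Mean and variance at time t of the Fisher-Rao flow.

    The solution is the geometric mixture mu_0^a * pi^(1 - a) with
    a = exp(-t), which for Gaussians interpolates the natural parameters.
    """
    a = np.exp(-t)
    precision = a / v0 + (1.0 - a) / v
    var = 1.0 / precision
    mean = var * (a * m0 / v0 + (1.0 - a) * m / v)
    return mean, var

if __name__ == "__main__":
    # Setting of Figure 1, reading 0.1 as a variance (assumption).
    m0, v0, m, v = 0.0, 1.0, 20.0, 0.1
    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
        mw, vw = wasserstein_flow(t, m0, v0, m, v)
        mf, vf = fisher_rao_flow(t, m0, v0, m, v)
        print(t, kl_gauss(mw, vw, m, v), kl_gauss(mf, vf, m, v))
```

Printing the two KL columns side by side gives a quick way to compare how fast each flow approaches the target for a given initialization.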

Convergence Properties

The flows differ in how quickly, and under which assumptions, they converge to $\pi$. The WFR flow converges under less restrictive conditions than the pure Wasserstein or Fisher--Rao flows. It combines the transport of mass from the Wasserstein flow with the reweighting of mass from the Fisher--Rao flow, which accelerates convergence for a wider class of target distributions, especially those that are multimodal or not log-concave.

Approximation Algorithm

The proposed algorithm uses the structure of SMC methods to approximate the WFR flow. Each iteration alternates between the Wasserstein component, approximated by a Langevin dynamics step, and the Fisher--Rao component, approximated by importance sampling. This achieves faster convergence without the instability commonly seen in birth-death processes. Resampling is used to maintain sample diversity and counteract weight degeneracy.
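The alternation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The 1D Gaussian target, the step size, resampling at every step, and in particular the estimate of the current density as a mixture of the Gaussian Langevin kernels are all assumptions made for this example. The weight exponent $1 - e^{-\gamma}$ comes from solving the Fisher--Rao flow exactly over a step of length $\gamma$.

```python
import numpy as np

def log_target(x):
    # Unnormalised log-density of an illustrative target N(2, 0.5^2) (assumption).
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def grad_log_target(x):
    return -(x - 2.0) / 0.5 ** 2

def smc_wfr_sketch(n_particles=1000, n_steps=200, step=0.01, seed=0):
    """Alternate a Langevin move with an importance-reweighting step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)  # particles from mu_0 = N(0, 1)
    for _ in range(n_steps):
        # Wasserstein part: one unadjusted Langevin step.
        drift = x + step * grad_log_target(x)
        x_new = drift + np.sqrt(2.0 * step) * rng.normal(size=n_particles)

        # Estimate the log-density of the moved particles as an equally
        # weighted mixture of the Gaussian Langevin kernels (assumption).
        diff = x_new[:, None] - drift[None, :]
        log_k = -diff ** 2 / (4.0 * step) - 0.5 * np.log(4.0 * np.pi * step)
        top = log_k.max(axis=1, keepdims=True)
        log_mu = top[:, 0] + np.log(np.exp(log_k - top).sum(axis=1))
        log_mu -= np.log(n_particles)

        # Fisher-Rao part: importance weights (pi / mu)^(1 - exp(-step)).
        # The normalising constant of pi cancels when weights are normalised.
        log_w = (1.0 - np.exp(-step)) * (log_target(x_new) - log_mu)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()

        # Multinomial resampling at every step, for simplicity.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = x_new[idx]
    return x

if __name__ == "__main__":
    samples = smc_wfr_sketch()
    print(samples.mean(), samples.std())
```

The pairwise kernel evaluation costs $O(N^2)$ per step for $N$ particles, which is the main price of estimating the density ratio from the particles themselves.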

Figure 2: Comparison of the evolution of mean, variance, and KL along the exact PDE flows and along their SMC approximations.

Tempered PDEs for Sampling

The paper critically examines the common practice of incorporating tempered dynamics in PDE flows to potentially enhance convergence. By introducing time-varying targets, the authors demonstrate that these tempered PDEs offer no speed advantage over their standard counterparts. In fact, for some applications, they could introduce biases that counteract efficient sampling.
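Following the abstract, the time-varying target is a geometric mixture of the initial and target distributions. One way to write it is shown below; the schedule symbol $\lambda_t$ and its boundary conditions are assumed notation for this sketch, not taken from the paper.

```latex
% Assumed notation: \lambda_t is a tempering schedule increasing from 0 to 1.
\[
  \pi_t(x) \;\propto\; \mu_0(x)^{1 - \lambda_t} \, \pi(x)^{\lambda_t},
  \qquad \lambda_0 = 0, \quad \lambda_t \uparrow 1 .
\]
% Tempered Wasserstein flow: the fixed target \pi is replaced by \pi_t.
\[
  \partial_t \mu_t = \nabla \cdot \Big( \mu_t \nabla \log \frac{\mu_t}{\pi_t} \Big).
\]
```

At $\lambda_t = 0$ the flow targets $\mu_0$ itself and at $\lambda_t = 1$ it targets $\pi$, so the tempered flow chases a moving target rather than descending directly toward $\pi$.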

Figure 3: Convergence of the approximations to $\pi$ given by the tempered and standard PDE flows when the target is a Gaussian mixture.

Sequential Monte Carlo Samplers

The paper connects SMC samplers and gradient flows in algorithm design as well as in theory. The building blocks of SMC (importance weighting, resampling, and Markov moves) provide a general framework for approximating different gradient flows. The proposed SMC-WFR algorithm is one instance of this framework: it targets $\pi$ while balancing computational cost against accuracy.
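A standard ingredient of SMC samplers is to monitor the effective sample size (ESS) of the weights and to resample only when it drops too low. The sketch below shows that generic mechanism with systematic resampling. The function names and the threshold of one half are illustrative choices, and the summary does not say which resampling scheme the paper uses.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights; ranges from 1 to N."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(weights, rng):
    """Return N ancestor indices drawn by systematic resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = w.size
    # One uniform draw, then N evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    idx = np.searchsorted(cumulative, positions, side="right")
    return np.minimum(idx, n - 1)

def maybe_resample(particles, weights, rng, threshold=0.5):
    """Resample only if ESS falls below threshold * N (hypothetical helper)."""
    particles = np.asarray(particles)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = w.size
    if effective_sample_size(w) < threshold * n:
        idx = systematic_resample(w, rng)
        return particles[idx], np.full(n, 1.0 / n)
    return particles, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = rng.normal(size=8)
    weights = rng.random(8) ** 4  # deliberately uneven weights
    print(effective_sample_size(weights))
    print(maybe_resample(particles, weights, rng))
```

Resampling adaptively avoids the extra Monte Carlo noise that resampling at every step introduces when the weights are already close to uniform.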

Conclusion

Overall, the paper significantly enhances the utility of WFR flows in practical sampling tasks. By unifying the SMC framework with WFR gradients, the authors provide a scalable and theoretically sound method to approximate complex distributional flows. Future research can explore other combinations of optimal transport and information geometries to further enhance the efficiency and applicability of sampling algorithms in high-dimensional spaces.
