- The paper presents a novel algorithm that combines Langevin dynamics and importance sampling to approximate Wasserstein--Fisher--Rao (WFR) gradient flows for improved convergence.
- The paper reformulates sampling as a gradient flow optimization problem over probability distributions by minimizing the KL divergence with respect to the target.
- Empirical results demonstrate that the SMC-WFR method outperforms traditional approaches, especially for complex, multimodal, or non-log-concave targets.
Sequential Monte Carlo Approximations of Wasserstein--Fisher--Rao Gradient Flows
Introduction
The paper "Sequential Monte Carlo Approximations of Wasserstein--Fisher--Rao Gradient Flows" (2506.05905) presents a novel approach to sampling from a probability distribution by framing it as an optimization problem over the space of probability distributions. This involves minimizing the Kullback-Leibler (KL) divergence from a target distribution π. The authors explore several partial differential equations (PDEs) derived from the Wasserstein--Fisher--Rao (WFR) geometry and propose a new algorithm that uses importance sampling and Sequential Monte Carlo (SMC) methods to approximate the WFR flow, showing superior empirical performance.
Gradient Flow PDEs for Sampling
The central task of sampling from a known probability distribution is reformulated as a gradient flow problem: find the curve of distributions that descends the KL divergence to π as steeply as the chosen geometry allows. Traditional methods such as variational inference and Langevin-based algorithms fit this perspective when the space of distributions is equipped with the Wasserstein-2 distance. Gradient descent in this space is the Wasserstein gradient flow, whose evolution is described by the Fokker-Planck equation. Under the Fisher--Rao (FR) geometry, the gradient flow instead corresponds to birth-death dynamics, which reweight mass rather than transport it. The WFR gradient flow combines the two geometries, offering enhanced convergence properties.
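For reference, the three flows of the KL divergence can be written in their standard form as below. This is a sketch in generic notation, which may differ from the paper's own (for example in how the two components of the WFR flow are scaled relative to each other).

```latex
\begin{align*}
\text{Wasserstein (Fokker--Planck):}\quad
  & \partial_t \mu_t = \nabla \cdot \Big( \mu_t \nabla \log \frac{\mu_t}{\pi} \Big), \\
\text{Fisher--Rao (birth-death):}\quad
  & \partial_t \mu_t = -\mu_t \Big( \log \frac{\mu_t}{\pi}
      - \int \log \frac{\mu_t}{\pi} \, \mathrm{d}\mu_t \Big), \\
\text{WFR:}\quad
  & \partial_t \mu_t = \nabla \cdot \Big( \mu_t \nabla \log \frac{\mu_t}{\pi} \Big)
      - \mu_t \Big( \log \frac{\mu_t}{\pi}
      - \int \log \frac{\mu_t}{\pi} \, \mathrm{d}\mu_t \Big).
\end{align*}
```

The divergence term moves mass through space, while the reaction term grows or shrinks mass in place according to how far the log-ratio sits from its average.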


Figure 1: Evolution of mean, variance, and KL along different PDE flows in the 1D Gaussian case with μ0(x)=N(x;0,1) and π(x)=N(x;20,0.1).
Convergence Properties
The convergence properties of these flows are crucial. The WFR flow, in particular, converges under less restrictive conditions than the pure Wasserstein or Fisher--Rao flows. It combines the transport and diffusion of the Wasserstein flow with the mass reweighting of the Fisher--Rao flow, which accelerates convergence for a wider class of target distributions, especially those that are multimodal or not log-concave.
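In the 1D Gaussian setting used for Figure 1, the pure Wasserstein and pure Fisher--Rao flows have closed forms, which makes the difference in convergence speed easy to inspect. The sketch below relies on two standard facts: the Wasserstein flow is the law of an Ornstein-Uhlenbeck diffusion, and the Fisher--Rao flow is the geometric mixture μ_t ∝ μ_0^{e^{-t}} π^{1-e^{-t}}. The function names are illustrative, and reading the second argument of N(x; 20, 0.1) as a variance is an assumption.

```python
import numpy as np

def kl_gauss(m, v, m_pi, v_pi):
    """KL( N(m, v) || N(m_pi, v_pi) ) for 1D Gaussians; v and v_pi are variances."""
    return 0.5 * (np.log(v_pi / v) + (v + (m - m_pi) ** 2) / v_pi - 1.0)

def wasserstein_flow(t, m0, v0, m_pi, v_pi):
    """Mean and variance of the Langevin (Ornstein-Uhlenbeck) law at time t."""
    m = m_pi + (m0 - m_pi) * np.exp(-t / v_pi)
    v = v_pi + (v0 - v_pi) * np.exp(-2.0 * t / v_pi)
    return m, v

def fisher_rao_flow(t, m0, v0, m_pi, v_pi):
    """Mean and variance of mu_t proportional to mu_0^a * pi^(1 - a), a = exp(-t)."""
    a = np.exp(-t)
    precision = a / v0 + (1.0 - a) / v_pi  # precisions interpolate linearly
    m = (a * m0 / v0 + (1.0 - a) * m_pi / v_pi) / precision
    return m, 1.0 / precision

# Assumed reading of Figure 1: mu_0 = N(0, 1), pi = N(20, variance 0.1).
m0, v0, m_pi, v_pi = 0.0, 1.0, 20.0, 0.1
for t in (0.0, 0.1, 0.5, 1.0, 5.0):
    kl_w = kl_gauss(*wasserstein_flow(t, m0, v0, m_pi, v_pi), m_pi, v_pi)
    kl_fr = kl_gauss(*fisher_rao_flow(t, m0, v0, m_pi, v_pi), m_pi, v_pi)
    print(f"t={t:4.1f}  KL along W flow: {kl_w:10.4g}  KL along FR flow: {kl_fr:10.4g}")
```

The WFR flow has no equally simple expression here, so it is omitted from this sketch.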
Approximation Algorithm
The proposed algorithm exploits the structure of SMC methods to approximate the WFR flow. It alternates between the Wasserstein component, approximated via Langevin dynamics, and the Fisher--Rao component, approximated through importance sampling. This achieves faster convergence without the instability commonly seen in birth-death processes. Resampling is incorporated to maintain sample diversity and counteract weight degeneracy.
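The alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-mode target, the step size, the function names, and in particular the use of a Gaussian kernel density estimate for the current particle density in the importance weights are all choices made here for the example. The weight exponent 1 - exp(-step) comes from the exact Fisher--Rao solution over one step, μ_{t+γ} ∝ μ_t (π/μ_t)^{1-e^{-γ}}.

```python
import numpy as np

def log_target(x):
    # Unnormalised log-density of an illustrative mixture with modes at -3 and +3.
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def grad_log_target(x):
    # Analytic gradient of log_target: -x + 3 * tanh(3x).
    return -x + 3.0 * np.tanh(3.0 * x)

def log_kde(x, particles, bandwidth):
    # Log of a Gaussian kernel density estimate of the particle law (an assumption
    # of this sketch, used as a stand-in for the unknown density mu).
    d = (x[:, None] - particles[None, :]) / bandwidth
    log_k = -0.5 * d ** 2 - np.log(bandwidth * np.sqrt(2.0 * np.pi))
    top = log_k.max(axis=1, keepdims=True)
    return (top + np.log(np.exp(log_k - top).mean(axis=1, keepdims=True))).ravel()

def smc_wfr_sketch(n_particles=500, n_steps=200, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)  # initial law mu_0 = N(0, 1)
    for _ in range(n_steps):
        # Wasserstein part: one unadjusted Langevin step.
        noise = rng.normal(size=n_particles)
        x = x + step * grad_log_target(x) + np.sqrt(2.0 * step) * noise
        # Fisher-Rao part: weights proportional to (pi / mu)^(1 - exp(-step)).
        h = 1.06 * x.std() * n_particles ** (-0.2)  # Silverman-style bandwidth
        log_w = (1.0 - np.exp(-step)) * (log_target(x) - log_kde(x, x, h))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Multinomial resampling returns the particles to uniform weights.
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
    return x

samples = smc_wfr_sketch()
print("fraction of particles in the right-hand mode:", np.mean(samples > 0))
```

Because the weights are normalised, the target only needs to be known up to a constant. Resampling at every step keeps the sketch short; an adaptive scheme would resample only when the weights become uneven.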


Figure 2: Comparison of the evolution of the mean, variance, and KL for the exact PDE flows and their SMC approximations.
Tempered PDEs for Sampling
The paper critically examines the common practice of incorporating tempered dynamics in PDE flows to potentially enhance convergence. By introducing time-varying targets, the authors demonstrate that these tempered PDEs offer no speed advantage over their standard counterparts. In fact, for some applications, they could introduce biases that counteract efficient sampling.
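As a concrete picture of what a tempered flow looks like, a common construction replaces the fixed target with a geometric path between the initial distribution and π. The form below is an assumption about the kind of tempering meant, written in generic notation with a hypothetical schedule λ_t; the paper's exact definition may differ.

```latex
\begin{align*}
\pi_t &\propto \mu_0^{\,1-\lambda_t}\, \pi^{\lambda_t},
  \qquad \lambda_0 = 0, \quad \lambda_t \uparrow 1, \\
\partial_t \mu_t &= \nabla \cdot \Big( \mu_t \nabla \log \frac{\mu_t}{\pi_t} \Big)
  \qquad \text{(tempered Wasserstein flow)}.
\end{align*}
```

At every time t the flow chases the intermediate target π_t rather than π itself, so its speed is tied to the schedule λ_t.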


Figure 3: Convergence to π of the approximations given by the tempered and standard PDE flows for a Gaussian mixture target.
Sequential Monte Carlo Samplers
The connection between SMC samplers and gradient flows is expanded beyond theory into practical algorithmic designs. SMC methods provide a robust framework for approximating diverse gradient flows. The proposed SMC-WFR algorithm embodies this connection, ensuring convergence towards the target distribution while balancing computational cost and accuracy.
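Two standard building blocks of SMC samplers are the effective sample size (ESS), which measures weight degeneracy, and a low-variance resampling scheme. The sketch below shows generic versions of both; the function names are illustrative, and nothing here is specific to the paper's algorithm.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights; ranges from 1 to N."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(weights, rng):
    """Return ancestor indices drawn by systematic resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = w.size
    # One uniform draw, then n evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(w)
    cdf[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cdf, positions, side="right")

rng = np.random.default_rng(0)
w = np.array([0.7, 0.1, 0.1, 0.1])
if effective_sample_size(w) < 0.5 * w.size:  # a common, but arbitrary, threshold
    ancestors = systematic_resample(w, rng)
    print("resampled ancestors:", ancestors)
```

Resampling only when the ESS falls below a threshold avoids adding resampling noise when the weights are already close to uniform.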
Conclusion
Overall, the paper significantly enhances the utility of WFR flows in practical sampling tasks. By unifying the SMC framework with WFR gradients, the authors provide a scalable and theoretically sound method to approximate complex distributional flows. Future research can explore other combinations of optimal transport and information geometries to further enhance the efficiency and applicability of sampling algorithms in high-dimensional spaces.