Variational Flow Matching (VFM)

Updated 25 January 2026
  • Variational Flow Matching (VFM) is a probabilistic framework that recasts flow matching as a variational inference problem to model transitions between simple and complex distributions.
  • It learns a time-dependent vector field via a variational posterior, enabling effective interpolation and control across Euclidean, discrete, and geometric domains.
  • VFM extends to multimodal and controlled generation, demonstrating practical benefits in applications such as graph generation, robot manipulation, and vector-quantized image synthesis.

Variational Flow Matching (VFM), sometimes called Variational Flow Policy (VFP) in control settings, is a probabilistic framework for generative modeling and control that recasts classical flow matching (FM) as a variational inference problem. By learning a time-dependent vector field via a variational approximation to pathwise posteriors, VFM generalizes conditional flow matching (CFM) and enables natural extensions to discrete and geometric domains, multimodal transport, uncertainty quantification, and controlled or equivariant generation. VFM has been instantiated in a wide range of domains, including graph and tabular data, vector-quantized images, Riemannian manifolds, robot manipulation, and generative flow networks.

1. Core Mathematical Formulation

Let $p_0$ be a simple source distribution and $p_1$ a complex data (target) distribution, and define an interpolation path $x_t$ between source and target:

$$x_t = \alpha_t x_1 + \sigma_t x_0, \qquad t \in [0,1], \quad x_0 \sim p_0, \quad x_1 \sim p_1.$$

The model learns a time-dependent velocity field $v_\theta(t, x)$ (or $v_\theta(x, t)$) that drives a flow from $p_0$ to $p_1$ along the ODE

$$\frac{d}{dt} x_t = v_\theta(t, x_t).$$

In standard flow matching, the target velocity is the conditional expectation over endpoints:

$$u_t(x) = \mathbb{E}_{p_t(x_1 \mid x)}\big[u_t(x \mid x_1)\big],$$

where $u_t(x \mid x_1)$ is the known conditional velocity of the interpolation (e.g., $(x_1 - x)/(1 - t)$ for optimal-transport paths).
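As a concrete sketch (plain Python on scalar samples; the helper names `interpolate` and `cond_velocity_ot` are illustrative, not from the papers), the linear interpolation and the known conditional velocity along the OT path look like:

```python
# Minimal sketch of the flow-matching building blocks on scalars.

def interpolate(x0, x1, t, alpha=lambda t: t, sigma=lambda t: 1.0 - t):
    # x_t = alpha_t * x1 + sigma_t * x0 (linear / OT schedule by default)
    return alpha(t) * x1 + sigma(t) * x0

def cond_velocity_ot(x, x1, t):
    # Known conditional velocity u_t(x | x1) = (x1 - x) / (1 - t) for OT paths
    return (x1 - x) / (1.0 - t)

x0, x1, t = 0.0, 2.0, 0.25
xt = interpolate(x0, x1, t)      # 0.25 * 2.0 + 0.75 * 0.0 = 0.5
v = cond_velocity_ot(xt, x1, t)  # (2.0 - 0.5) / 0.75 = 2.0, i.e. x1 - x0
```

For the linear schedule $\alpha_t = t$, $\sigma_t = 1 - t$, the conditional velocity $(x_1 - x_t)/(1-t)$ equals $x_1 - x_0$ everywhere along the path, which the example reproduces.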

VFM posits a parameterized variational posterior $q_t^\theta(x_1 \mid x)$ and defines the learned field as

$$v_\theta(t, x) = \mathbb{E}_{q_t^\theta(x_1 \mid x)}\big[u_t(x \mid x_1)\big].$$

The VFM loss is the expected negative log-likelihood of the variational posterior under the true joint distribution:

$$\mathcal{L}_{\mathrm{VFM}}(\theta) = -\mathbb{E}_{t, x_1, x}\big[\log q_t^\theta(x_1 \mid x)\big].$$

This objective is equivalent to minimizing $\mathbb{E}_{t,x}\,\mathrm{KL}\big(p_t(x_1 \mid x) \,\|\, q_t^\theta(x_1 \mid x)\big)$ (Eijkelboom et al., 2024; Nasution et al., 30 Nov 2025; Guzmán-Cordero et al., 6 Jun 2025; Zaghen et al., 18 Feb 2025).
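For intuition, when the variational family is a fixed-variance Gaussian, the negative log-likelihood reduces, up to an additive constant, to squared error on the endpoint, so Gaussian VFM recovers plain endpoint regression. A small sketch (the helper name `gaussian_nll` is illustrative):

```python
import math

def gaussian_nll(x1, mu, sigma=1.0):
    # -log N(x1; mu, sigma^2); up to an additive constant this is
    # (x1 - mu)^2 / (2 sigma^2), i.e. endpoint regression by squared error
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (x1 - mu) ** 2 / (2.0 * sigma ** 2)

# The difference of NLLs isolates the squared-error term:
gap = gaussian_nll(1.0, 0.0) - gaussian_nll(1.0, 1.0)  # = (1 - 0)^2 / 2 = 0.5
```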

2. Methodological Extensions: Discrete, Multimodal, and Geometric Cases

Discrete and Categorical Data

For discrete or categorical domains (e.g., graphs), VFM instantiates the variational posterior as a factorized categorical distribution:

$$q_t^\theta(x_1 \mid x) = \prod_{d=1}^D \mathrm{Cat}\big(x_1^d \mid \theta_t^d(x)\big).$$

The loss simplifies to the cross-entropy between predicted and true code indices or labels. The induced vector field is a linear interpolation in the probability simplex:

$$v_t^{\theta, d}(x) = \frac{\theta_t^d(x) - x^d}{1 - t}.$$

This principle underlies methods such as CatFlow, which achieves state-of-the-art results in molecular and graph generation (Eijkelboom et al., 2024).
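A minimal sketch of this simplex velocity (illustrative helper name, plain Python). When both $\theta_t^d(x)$ and $x^d$ lie on the simplex, the velocity components sum to zero, so the flow stays on the simplex:

```python
def categorical_velocity(theta_d, x_d, t):
    # Per-component simplex velocity v_t^{theta,d}(x) = (theta_t^d(x) - x^d) / (1 - t)
    return [(th - xi) / (1.0 - t) for th, xi in zip(theta_d, x_d)]

# One-hot current state, predicted categorical parameters:
v = categorical_velocity([0.7, 0.2, 0.1], [1.0, 0.0, 0.0], t=0.5)
# v = [-0.6, 0.4, 0.2]; components sum to zero, so x + dt * v stays on the simplex
```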

Multimodal Flows and Latent Variables

Standard FM and VFM may collapse multimodal transport onto a mean path. Several VFM variants introduce latent variables $z$ to represent mode-specific flow directions. For example, Variational Rectified Flow Matching (V-RFM) models the velocity field as a function of both the input and a latent $z$ drawn from a learnable posterior (Guo et al., 13 Feb 2025; Zhai et al., 3 Aug 2025):

$$\ell(\theta, \phi) = \mathbb{E}_{(x_0, x_1, t)}\Big[\mathbb{E}_{z \sim q_\phi(z \mid \cdot)}\big\|v_\theta(x_t, t, z) - (x_1 - x_0)\big\|^2 + \mathrm{KL}\big[q_\phi(z \mid \cdot) \,\|\, p(z)\big]\Big].$$

This enables learning multiple plausible velocity directions at each location, which is critical for highly multimodal tasks such as complex robot manipulation (Zhai et al., 3 Aug 2025).
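A scalar sketch of this per-sample objective, assuming a one-dimensional latent with a standard-normal prior; `v_pred`, `mu_z`, and `logvar_z` stand in for network outputs and are hypothetical names:

```python
import math

def kl_std_normal(mu, logvar):
    # KL[N(mu, exp(logvar)) || N(0, 1)] for one latent dimension
    return 0.5 * (math.exp(logvar) + mu * mu - 1.0 - logvar)

def vrfm_loss(v_pred, x0, x1, mu_z, logvar_z):
    # ||v_theta(x_t, t, z) - (x1 - x0)||^2 + KL term (scalar case)
    return (v_pred - (x1 - x0)) ** 2 + kl_std_normal(mu_z, logvar_z)
```

The KL term vanishes exactly when the latent posterior matches the prior (`mu = 0`, `logvar = 0`), so the objective then reduces to the usual rectified-flow regression loss.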

Geometric and Riemannian Domains

RG-VFM generalizes VFM to Riemannian manifolds, employing a Riemannian Gaussian as the variational posterior with a geometry-respecting metric (Zaghen et al., 18 Feb 2025):

$$q_t^\theta(x_1 \mid x) \propto \exp\left(-\frac{d_\mathcal{M}\big(x_1, \mu_t^\theta(x)\big)^2}{2\sigma(x)^2}\right).$$

On homogeneous manifolds with closed-form geodesics, the loss becomes

$$\mathcal{L}_{\mathrm{RG\text{-}VFM}} = \mathbb{E}_{t, x_1, x}\,\big\|\log_{x_1}\big(\mu_t^\theta(x)\big)\big\|^2_{\mathbf{g}}.$$

This approach preserves geometric consistency and enables generative modeling on spheres, hyperbolic spaces, and other manifolds.
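On the unit sphere these ingredients have closed forms; a small sketch (illustrative helper names, plain Python) of the logarithmic map and the resulting squared-geodesic loss term:

```python
import math

def sphere_log(p, q):
    # Logarithmic map log_p(q) on the unit sphere: tangent vector at p
    # whose norm equals the geodesic distance arccos(<p, q>).
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)
    if theta < 1e-12:
        return [0.0] * len(p)
    u = [b - dot * a for a, b in zip(p, q)]   # component of q tangent to the sphere at p
    norm_u = math.sqrt(sum(c * c for c in u))
    return [theta * c / norm_u for c in u]

def rgvfm_loss_term(x1, mu):
    # ||log_{x1}(mu)||^2 = squared geodesic distance between x1 and mu
    v = sphere_log(x1, mu)
    return sum(c * c for c in v)
```

For orthogonal points the geodesic distance is $\pi/2$, so the loss term is $(\pi/2)^2$; for coincident points it is zero, matching the Euclidean intuition of a squared residual.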

3. Algorithmic Implementation and Training Procedures

A generic VFM training pipeline consists of:

  1. Sampling an endpoint $x_1 \sim p_1$, a base sample $x_0 \sim p_0$, and a time $t \sim \mathrm{Uniform}(0,1)$.
  2. Computing the interpolated state $x_t$ (Euclidean, geodesic, or problem-specific interpolation).
  3. For geometry-aware cases: computing geodesics, logarithmic and exponential maps as needed.
  4. Evaluating the variational posterior $q_t^\theta(x_1 \mid x_t)$, often parameterized by a neural network.
  5. Calculating the appropriate loss (e.g., cross-entropy, mean squared error in the Riemannian metric, or Bregman divergence for exponential family posteriors).
  6. Backpropagating and updating parameters.
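The steps above can be sketched end-to-end on a toy 1-D problem with a fixed-variance Gaussian posterior whose mean is a linear model in $(x_t, t)$; the model form, distributions, and hyperparameters here are illustrative only:

```python
import random

# Toy 1-D instantiation of steps 1-6: Gaussian posterior with fixed variance,
# mean given by a linear model with parameters w.

def posterior_mean(w, xt, t):
    return w[0] * xt + w[1] * t + w[2]

def train(steps=4000, lr=0.02, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        x1 = rng.gauss(3.0, 0.1)         # step 1: endpoint from p1
        x0 = rng.gauss(0.0, 1.0)         # step 1: base sample from p0
        t = rng.random()                 # step 1: t ~ Uniform(0, 1)
        xt = t * x1 + (1.0 - t) * x0     # step 2: interpolated state
        mu = posterior_mean(w, xt, t)    # step 4: variational posterior mean
        err = mu - x1                    # step 5: Gaussian NLL -> squared error
        grad = [err * xt, err * t, err]  # step 6: gradient of 0.5 * err^2
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

w = train()  # posterior_mean(w, xt, t) now predicts the endpoint near 3.0
```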

Sampling from a trained VFM model generally involves integrating the learned ODE defined by $v_\theta$ (or an SDE if a score term is learned) from $t = 0$ to $t = 1$, starting from $x_0 \sim p_0$ (Nasution et al., 30 Nov 2025; Guzmán-Cordero et al., 6 Jun 2025; Zaghen et al., 18 Feb 2025).
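A minimal forward-Euler integrator for this sampling ODE (illustrative; real implementations typically use adaptive or higher-order solvers):

```python
def sample_ode(v_theta, x0, n_steps=100):
    # Forward Euler for dx/dt = v_theta(t, x) on t in [0, 1]
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v_theta(i * dt, x)
    return x

# With a field that always points at a fixed target x1 = 2.0 along the OT path,
# integration transports x0 = 0.0 to (approximately) 2.0:
x_final = sample_ode(lambda t, x: (2.0 - x) / (1.0 - t), 0.0)
```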

4. Connections to Score-Based, Stochastic, and Flow-Based Models

VFM unifies deterministic continuous normalizing flows (CNFs), stochastic score-based (diffusion) models, and optimal control frameworks. The variational score,

$$s_t^\theta(x) = \mathbb{E}_{q_t^\theta}\big[\nabla_x \log p_t(x \mid x_1)\big]$$

enables constructing SDE-based samplers:

$$dx = \left(v_t^\theta(x) + \frac{g_t^2}{2} s_t^\theta(x)\right) dt + g_t\, dw.$$

The reweighted VFM objective yields a likelihood bound for the induced stochastic model (Eijkelboom et al., 2024; Nasution et al., 30 Nov 2025). This alignment with variational inference principles extends across domains, including generative flow networks (GFNs), where VFM generalizes trajectory balance and allows control-variated gradient estimators for variance reduction (Zimmermann et al., 2022).
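A sketch of the corresponding Euler–Maruyama discretization (illustrative helper names; with $g_t = 0$ it reduces to the deterministic ODE sampler):

```python
import math
import random

def sample_sde(v, s, g, x0, n_steps=200, seed=0):
    # Euler-Maruyama for dx = (v_t(x) + 0.5 * g_t^2 * s_t(x)) dt + g_t dW
    rng = random.Random(seed)
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        drift = v(t, x) + 0.5 * g(t) ** 2 * s(t, x)
        x = x + drift * dt + g(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x
```

Setting `g` to zero recovers the deterministic flow; a nonzero `g` trades determinism for the stochastic sampling regime described above.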

5. Practical Applications and Empirical Results

VFM and its extensions have demonstrated strong empirical performance in several domains.

  • Graph and Molecular Generation: CatFlow leverages VFM with categorical posteriors and achieves the lowest MMD scores and the highest validity and uniqueness on molecular tasks (e.g., 99.8% validity, 99.95% uniqueness, FCD 0.44 on QM9) (Eijkelboom et al., 2024).
  • Tabular Data Synthesis: Exponential-Family VFM (EF-VFM) extends VFM to mixed continuous/discrete variables and achieves state-of-the-art shape and trend errors, as well as improved α-precision and Wasserstein distance on synthetic benchmarks (Guzmán-Cordero et al., 6 Jun 2025, Nasution et al., 30 Nov 2025).
  • Vector-Quantized Image Generation: Purrception adapts VFM to VQ latents, enabling temperature control of categorical posteriors, and outperforms continuous and discrete flow-matching baselines in convergence speed and sample quality (e.g., FID = 4.72 vs. best-in-class models at comparable training steps) (Matişan et al., 1 Oct 2025).
  • Robot Manipulation: VFP policies with multimodal latent and MoE decoders achieve a 49% relative improvement in success rate over prior flow-based and diffusion policy baselines, at lower inference cost (14 ms/action, single ODE step) (Zhai et al., 3 Aug 2025).
  • Riemannian Generative Modeling: RG-VFM, when applied to data on curved manifolds (e.g., checkerboards on spheres), ensures norm-consistent sampling and sharper feature recovery compared to Euclidean and vanilla FM baselines (Zaghen et al., 18 Feb 2025).
  • Controlled and Equivariant Generation: VFM supports property-conditional and symmetry-respecting generation for both discrete and continuous molecular data, achieving high validity, uniqueness, and state-of-the-art conditional MAE for properties like polarizability (e.g., MAE=2.05 vs 2.76 for EDM) without retraining (Eijkelboom et al., 23 Jun 2025).

6. Extensions, Limitations, and Theoretical Insights

VFM extensions include:

  • Exponential-Family Parameterization: Any exponential family can be used for $q_t^\theta$, yielding Bregman-divergence-based losses that generalize mean-squared error and cross-entropy (Guzmán-Cordero et al., 6 Jun 2025).
  • Geometry-Awareness: Riemannian generalizations require exponential/logarithmic maps and add computational cost, especially in high dimensions (Zaghen et al., 18 Feb 2025).
  • Score-Based SDEs: VFM can interpolate between deterministic ODE flows and stochastic SDE sampling, controlling the utility–privacy trade-off and exactness of marginal recovery.
  • Variance Reduction: In GFlowNets, VFM provides a unified family of objectives combining forward/reverse KLs, admits learned or leave-one-out control variates, and justifies the trajectory-balance technique as a variance-reduced KL estimator (Zimmermann et al., 2022).

Known limitations include additional computational cost for geometric or multimodal flows, difficulty generalizing to highly singular manifolds without trustworthy geodesic approximations, and performance that depends on the choice of base distribution and variational family (Zaghen et al., 18 Feb 2025; Guzmán-Cordero et al., 6 Jun 2025; Zhai et al., 3 Aug 2025).

7. Summary Table: VFM Variants and Domains

| Variant | Posterior Family | Domain/Support | Empirical Highlights |
|---|---|---|---|
| CatFlow | Factorized categorical | Graphs, molecules (discrete) | SOTA on QM9, fast convergence |
| TabbyFlow/EF-VFM | Exponential family | Tabular (mixed data) | Best shape/trend/Wasserstein errors |
| Purrception | Factorized categorical | VQ-latent images | Fast convergence, competitive FID, uncertainty quantification |
| RG-VFM | Riemannian Gaussian | Spheres/manifolds | Manifold consistency, sharper features |
| V-RFM | Gaussian with latent $z$ | Images, high-dim vision | Multimodal flows, FID gains |
| VFP/MoE | Latent + mixture of experts | Control, manipulation | +49% success on multimodal tasks |
| Equivariant cVFM | Group-equivariant Gaussian | Molecules (3D, joint) | Accurate property control (low MAE), symmetry |
