
Flow Matching Models in Generative Modeling

Updated 4 February 2026
  • Flow Matching Models (FMs) are continuous-time generative models that deterministically transport samples using learned ODE-based vector fields.
  • They unify continuous normalizing flows and score-based diffusion models, enabling simulation-free training and high-quality synthesis.
  • Recent advances include conditional and latent extensions, variance-reduction techniques, and accelerated sampling via distillation and optimal transport.

Flow Matching Models (FMs) are a class of continuous-time generative models that learn time-dependent vector fields to deterministically transport samples from a noise distribution to a complex target distribution by integrating ordinary differential equations (ODEs). This approach generalizes and unifies concepts from continuous normalizing flows (CNFs) and score-based diffusion models, enabling efficient, simulation-free generative modeling across images, audio, molecules, time series, functions, structured data, and more. FM models provide state-of-the-art sample quality, strong theoretical guarantees, and scalable computational performance, and admit a large number of extensions for diverse tasks and data modalities (Holderrieth et al., 2 Jun 2025, Lipman et al., 2024).

1. Mathematical Formulation and Theoretical Foundations

In the continuous-time FM paradigm, the key object is a time-dependent vector field $f_\theta(t, x)$, parameterized by a neural network, that transports samples along a trajectory from a simple source distribution $p_0$ (usually Gaussian) to a target data distribution $p_1$. The core dynamical equation is:

$$\frac{d}{dt}\,x(t) = f_\theta(t, x(t)), \quad x(0) \sim p_0.$$

Given a suitable $f_\theta$, the pushforward of $p_0$ at $t=1$ approximates $p_1$. The FM training loss is the expected mean squared error between the neural vector field and the true "ideal" velocity field $v(t, x)$ along reference trajectories, typically defined via a coupling between $(x_0, x_1)$:

$$L(\theta) = \mathbb{E}_{t \sim U[0,1],\, x_0 \sim p_0,\, x_1 \sim p_1} \Bigl\| v\bigl(t, x(t)\bigr) - f_\theta\bigl(t, x(t)\bigr) \Bigr\|^2,$$

with $x(t) = (1-t)x_0 + t x_1$ for linear interpolation (Holderrieth et al., 2 Jun 2025, Lipman et al., 2024).
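For the linear interpolant, the conditional velocity is simply $x_1 - x_0$, so the loss can be estimated in a few lines. The array shapes and the toy constant "model" below are illustrative assumptions, not part of any reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(model, x0, x1, t):
    """Monte Carlo estimate of the linear-interpolant FM loss.

    x0, x1 : (batch, dim) samples from p0 and p1
    t      : (batch, 1) times drawn uniformly from [0, 1]
    For the linear path x(t) = (1 - t) x0 + t x1, the conditional
    velocity target is simply x1 - x0.
    """
    xt = (1.0 - t) * x0 + t * x1       # point on the interpolant
    target_v = x1 - x0                 # conditional velocity
    pred_v = model(t, xt)              # network prediction
    return np.mean(np.sum((pred_v - target_v) ** 2, axis=1))

# Toy "model" that already knows the optimal constant answer for a
# shifted-Gaussian target: p1 = p0 + mu, so E[x1 - x0] = mu everywhere.
mu = np.array([2.0, -1.0])
model = lambda t, x: np.broadcast_to(mu, x.shape)

x0 = rng.standard_normal((256, 2))
x1 = rng.standard_normal((256, 2)) + mu
t = rng.uniform(size=(256, 1))
loss = cfm_loss(model, x0, x1, t)
```

Even the optimal constant predictor retains irreducible loss here, because the conditional target $x_1 - x_0$ is random; this residual variance is exactly what variance-reduction variants such as ExFM (Section 3) target.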

The FM framework subsumes both CNFs and score-based diffusion models:

  • In CNFs, samples are generated by integrating an invertible ODE.
  • In diffusion models, stochastic differential equations (SDEs) are used, often introducing noise at each step. FM can be derived as the deterministic, zero-noise limit of diffusion models, or by directly matching expected path increments along interpolants (Holderrieth et al., 2 Jun 2025).
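The zero-noise connection can be made explicit via the probability-flow ODE: for a forward diffusion SDE $dx = f(t, x)\,dt + g(t)\,dW_t$, the deterministic ODE

$$\frac{dx}{dt} = f(t, x) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x)$$

shares the same marginals $p_t$ as the SDE; identifying its drift with a learned vector field recovers an FM-style model from a trained score function.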

Key theoretical properties include existence/uniqueness of flows under Lipschitz conditions, mass conservation via the continuity equation, and statistical consistency of conditional flow matching (CFM) estimators for the true marginal velocity (the "Marginalization Trick") (Lipman et al., 2024).
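Concretely, the marginalization trick expresses the marginal velocity as a posterior-weighted average of conditional velocities,

$$v(t, x) = \int v(t, x \mid x_1)\, \frac{p_t(x \mid x_1)\, p_1(x_1)}{p_t(x)}\, dx_1,$$

so regressing the network onto conditional velocities yields, in expectation, the same gradient as regressing onto the intractable marginal $v$.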

2. Algorithms, Couplings, and Losses

Conditional Flow Matching (CFM)

Since the ground-truth marginal velocity is typically inaccessible, FM deploys Conditional Flow Matching by introducing couplings between the source and target samples (i.e., $(x_0, x_1)$ pairs). The most common choices are:

  • Independent CFM (I-CFM): $x_0 \sim p_0$ and $x_1 \sim p_1$ drawn independently (product coupling).
  • Optimal Transport CFM (OT-CFM): the coupling $\pi^*$ that minimizes the quadratic transport cost, inducing straight (geodesic) interpolants for efficient flows.

CFM regresses the network velocity onto conditional path velocities, which are known in closed form for the chosen interpolant, yielding unbiased gradients for the marginal FM objective (Lipman et al., 2024, Holderrieth et al., 2 Jun 2025).
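In practice, the OT coupling is often approximated per minibatch by solving an assignment problem on pairwise squared distances and pairing samples accordingly. A sketch, where the batch size and the use of `linear_sum_assignment` are illustrative choices:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def minibatch_ot_pairs(x0, x1):
    """Pair each x0[i] with an x1[j] so the total squared
    transport cost over the minibatch is minimized."""
    # Pairwise squared Euclidean cost matrix, shape (B, B).
    cost = np.sum((x0[:, None, :] - x1[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return x0[rows], x1[cols]

rng = np.random.default_rng(1)
x0 = rng.standard_normal((64, 2))
x1 = rng.standard_normal((64, 2)) + 3.0
a, b = minibatch_ot_pairs(x0, x1)
ot_cost = np.sum((a - b) ** 2)
indep_cost = np.sum((x0 - x1) ** 2)   # product-coupling pairing
```

The OT pairing can never cost more than the independent pairing, since the identity assignment is always feasible; the straighter paired interpolants are what make OT-CFM flows cheaper to integrate.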

Sampling Procedure

After training, samples are drawn by solving the generative ODE forward from $t=0$ to $t=1$, starting from $x_0 \sim p_0$. Black-box ODE solvers (e.g., RK4, Dormand–Prince) may be used for integration. With $N$ steps of size $\Delta t = 1/N$, the classical RK4 update takes the form $x_{n+1} = x_n + (\Delta t/6)(k_1 + 2k_2 + 2k_3 + k_4)$, with $k_1, \ldots, k_4$ the standard RK4 increments (Holderrieth et al., 2 Jun 2025).
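A minimal fixed-step RK4 sampler for the generative ODE might look as follows; the step count and the constant toy velocity field are assumptions for illustration:

```python
import numpy as np

def rk4_sample(f, x0, n_steps=20):
    """Integrate dx/dt = f(t, x) from t=0 to t=1 with classical RK4."""
    x, t = x0.astype(float), 0.0
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt / 2 * k1)
        k3 = f(t + dt / 2, x + dt / 2 * k2)
        k4 = f(t + dt, x + dt * k3)
        x = x + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return x

# For a linear interpolant between a fixed pair (x0, x1), the
# conditional velocity is the constant x1 - x0, so integrating
# from x0 must land exactly on x1.
x0 = np.array([0.0, 0.0])
x1 = np.array([1.0, -2.0])
out = rk4_sample(lambda t, x: x1 - x0, x0)
```

In a real model, `f` would be the trained network $f_\theta$; the straighter the learned paths, the fewer steps (neural forward passes) are needed for a given accuracy.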

Model Architecture and Implementation

Neural parameterization of $f_\theta$ typically relies on U-Nets or MLPs with time (and possibly label) embeddings; time-sampling and loss-weighting strategies (e.g., power-law schedules) are flexible (Lipman et al., 2024). Training is simulation-free: backpropagation requires no ODE adjoint methods, in contrast to classical CNFs.
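Time conditioning is often implemented with a sinusoidal embedding concatenated to the network input. The dimension and frequency base below are illustrative defaults, not values prescribed by the FM papers:

```python
import numpy as np

def time_embedding(t, dim=8, base=10000.0):
    """Sinusoidal embedding of scalar times t in [0, 1].

    Returns an array of shape (len(t), dim) whose columns are
    sin/cos features at geometrically spaced frequencies, i.e. the
    usual transformer-style positional encoding applied to time.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)   # (half,) frequencies
    angles = t * freqs                          # (batch, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

emb = time_embedding([0.0, 0.5, 1.0], dim=8)
```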

3. Variants and Extensions

| Variant/Extension | Focus/Application Areas | Key Ideas/Mechanisms |
|---|---|---|
| Explicit Flow Matching (ExFM) (Ryzhakov et al., 2024) | Variance reduction, theory | Analytically integrates out velocity noise, producing lower-variance, unbiased gradients |
| Functional Flow Matching (FFM) (Kerrigan et al., 2023) | Infinite-dimensional function spaces | Defines probability paths on function spaces, learns vector fields via analytical conditionals |
| Switched Flow Matching (SFM) (Zhu et al., 2024) | Multimodal, non-diffeomorphic mappings | Uses a mixture of conditional ODEs ("switches") to overcome ODE singularity limits |
| Local Flow Matching (LFM) (Xu et al., 2024) | Training efficiency, blockwise learning | Splits the flow into short local steps with a separate model for each, reducing total training cost |
| Latent-CFM (Samaddar et al., 7 May 2025) | Structured, manifold, multimodal data | Incorporates pretrained latent variables or GMMs/VAEs into the coupling to improve sample efficiency |
| Functional/Sequence Flows (Wei et al., 2024) | Pathwise or streamwise modeling | Introduces GP-modeled "streams" to reduce marginal vector-field variance; supports time series/multimodal data |
| Flow on Manifolds/Lie Groups (Sherry et al., 1 Apr 2025) | Non-Euclidean (e.g., SO(3), SE(3)) data | Uses geodesic or exponential-map interpolants for group-equivariant flows |
| Federated FM (Wang et al., 25 Sep 2025) | Decentralized/privacy settings | Local or global OT couplings across clients; semi-dual OT for global straightness/privacy |

ExFM provides unbiased, analytically denoised velocity targets, lowering the estimator variance and improving convergence and sample sharpness, with exact solutions for Gaussian cases (Ryzhakov et al., 2024). FFM generalizes FM to infinite-dimensional Hilbert spaces, crucial in scientific computing and stochastic PDE contexts (Kerrigan et al., 2023). SFM addresses the ODE singularity issue in multimodal distributions by introducing latent "switch" variables to enable multiple ODEs, yielding non-intersecting, locally-optimal flows (Zhu et al., 2024). LFM trains sub-models incrementally across intermediate marginals, improving training efficiency for high-dimensional or computationally constrained settings (Xu et al., 2024).

Hybridization with diffusion models is possible: Diff2Flow transfers pretrained diffusion priors to FM, accelerating finetuning and leveraging existing diffusion architectures (Schusterbauer et al., 2 Jun 2025).

4. Applications, Sampling Efficiency, and Empirical Results

FM models achieve empirical performance competitive with or superior to diffusion and CNF counterparts in high-dimensional image, video, tabular, time-series, and molecular generation.

Sampling efficiency and quality are a function of coupling/path choice (OT vs. VP), ODE solver (adaptive vs. fixed-step), and whether deterministic (ODE) or stochastic (SDE) flows are used.

5. Advances in Sampling Acceleration and Distillation

The main computational bottleneck of FM models is the need to solve ODEs with many neural forward passes. Recent advances include:

  • Flow Generator Matching (FGM): Distills a pretrained multi-step FM into a one-step neural generator, preserving (and even sometimes surpassing) the sample quality of the original teacher, with 50×–300× inference speedups. On CIFAR-10, one-step FGM achieves FID 3.08 (vs. 3.67 original) (Huang et al., 2024).
  • Switched FM (SFM): By introducing a switch variable that partitions the data, SFM removes the ODE singularity barrier, yielding lower-curvature, straight, and efficient transports even under multimodal data and optimal-transport couplings (Zhu et al., 2024).
  • OAT-FM: Builds a second-order (acceleration-based) optimal transport theory yielding minimal-action trajectories, further straightening FM paths and reducing both energy and FID at a fixed NFE. This paradigm allows two-phase training: pretrain with FM, then fine-tune with OAT-FM for improved straightness (Yue et al., 29 Sep 2025).
  • Diff2Flow: Enables direct FM finetuning from pretrained diffusion models by aligning interpolant paths, rescaling time, and constructing compatible velocity fields—accelerating convergence and improving quality (Schusterbauer et al., 2 Jun 2025).
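The teacher-student distillation idea can be sketched generically. This is a simplified regression-to-endpoint sketch under an assumed toy teacher, not the actual FGM objective:

```python
import numpy as np

def teacher_endpoint(f, x0, n_steps=50):
    """Teacher: many Euler steps of the learned ODE over t in [0, 1]."""
    x, dt = x0.astype(float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * f(i * dt, x)
    return x

# Toy teacher velocity: constant drift toward a shifted target,
# so the exact endpoint map is x0 -> x0 + mu.
mu = np.array([1.5, -0.5])
f = lambda t, x: np.broadcast_to(mu, x.shape)

rng = np.random.default_rng(2)
x0 = rng.standard_normal((128, 2))
targets = teacher_endpoint(f, x0)

# One-step "student": fit an affine map x0 -> A x0 + b by least
# squares on (x0, teacher-endpoint) pairs, standing in for a
# neural generator trained with an MSE distillation loss.
X = np.hstack([x0, np.ones((128, 1))])
W, *_ = np.linalg.lstsq(X, targets, rcond=None)
student = lambda x: np.hstack([x, np.ones((len(x), 1))]) @ W
```

The student replaces the teacher's many neural forward passes with one, which is the source of the reported 50×–300× speedups; actual distillation objectives match distributions or velocity fields rather than raw endpoints.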

6. Biases, Limitations, and Theoretical Considerations

While population-level FM objectives can recover gradient (OT) fields, empirical FM with finite samples almost never produces a gradient field; this introduces rotational components (curl) and increases total kinetic energy above the OT minimum, leading to energetically suboptimal flows (Lim, 18 Dec 2025). The choice of coupling, path, and source distribution determines both tail behavior and kinetic energy concentration—Gaussian sources yield exponential tails; heavy-tailed sources yield polynomial tails. Mitigation strategies include explicit curl penalties, Input-Convex Network parameterizations, or architectural bias toward gradient flows.
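The rotational-component claim can be checked numerically in 2D with a finite-difference curl; the probe point and the two example fields below are illustrative assumptions:

```python
import numpy as np

def curl_2d(vx, vy, x, y, h=1e-4):
    """Scalar curl dv_y/dx - dv_x/dy via central differences."""
    dvy_dx = (vy(x + h, y) - vy(x - h, y)) / (2 * h)
    dvx_dy = (vx(x, y + h) - vx(x, y - h)) / (2 * h)
    return dvy_dx - dvx_dy

# Gradient field v = grad(0.5 * (x^2 + y^2)) = (x, y): curl-free.
grad_curl = curl_2d(lambda x, y: x, lambda x, y: y, 0.3, -0.7)

# Pure rotation v = (-y, x): constant curl of 2, transports mass
# on circles while adding kinetic energy above the OT minimum.
rot_curl = curl_2d(lambda x, y: -y, lambda x, y: x, 0.3, -0.7)
```

A curl penalty of this form, averaged over sampled points, is one way to bias a learned field toward the gradient structure of the OT solution.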

Singularities arise when flows must "split" mass; standard ODE theory precludes such solutions. SFM bypasses this via switching, and mini-batch OTs help further (Zhu et al., 2024).

Adaptation to new data distributions or tasks (e.g., fine-tuning) can lead to suboptimal paths or instability if naive approaches are used. Gradual Fine-Tuning (GFT) interpolates drifts between pretrained and target distributions in a temperature-annealed way, providing theoretical guarantees on convergence while preserving efficient, straight paths (Thorkelsdottir et al., 30 Jan 2026).
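The drift-interpolation idea behind gradual fine-tuning can be sketched as a convex combination of pretrained and target drifts with an annealed mixing weight; this linear mixing is an illustrative simplification, not the exact GFT scheme:

```python
import numpy as np

def annealed_drift(f_pre, f_tgt, lam):
    """Convex combination of two vector fields: lam=0 gives the
    pretrained drift, lam=1 the target drift; lam is annealed
    over the course of fine-tuning."""
    return lambda t, x: (1.0 - lam) * f_pre(t, x) + lam * f_tgt(t, x)

f_pre = lambda t, x: -x          # drift toward the origin
f_tgt = lambda t, x: 2.0 - x     # drift toward a shifted mean

x = np.array([0.0])
mid = annealed_drift(f_pre, f_tgt, 0.5)(0.0, x)
```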

7. Conditional and Structured Extensions

FMs have been extended to myriad conditional and structured settings; several of these are summarized in the variants table of Section 3.

8. Concluding Remarks and Ongoing Directions

Flow Matching Models offer a unifying, simulation-free, and highly extensible approach to generative AI via ODE-based mass transport, theoretically grounded in optimal transport, SDE/ODE analysis, and regression of time-dependent vector fields. Advances in coupling strategies, variance-reduction losses, structured extensions (latent/streamwise), and acceleration/distillation have made FM a central architecture for state-of-the-art synthesis in vision, audio, tabular, and scientific domains. Open directions include scalable OAT-based solvers, single-step closed-form sampling, dual/gradient-regularization for optimality, integration with advanced architectures (e.g., foundation models), and fine-tuning under data shifts or privacy constraints (Holderrieth et al., 2 Jun 2025, Lipman et al., 2024, Yue et al., 29 Sep 2025, Thorkelsdottir et al., 30 Jan 2026).
