Preconditioned Score and Flow Matching

Published 2 Mar 2026 in cs.LG, cs.AI, and cs.CV | (2603.02337v1)

Abstract: Flow matching and score-based diffusion train vector fields under intermediate distributions $p_t$, whose geometry can strongly affect their optimization. We show that the covariance $Σ_t$ of $p_t$ governs optimization bias: when $Σ_t$ is ill-conditioned, and gradient-based training rapidly fits high-variance directions while systematically under-optimizing low-variance modes, leading to learning that plateaus at suboptimal weights. We formalize this effect in analytically tractable settings and propose reversible, label-conditional \emph{preconditioning} maps that reshape the geometry of $p_t$ by improving the conditioning of $Σ_t$ without altering the underlying generative model. Rather than accelerating early convergence, preconditioning primarily mitigates optimization stagnation by enabling continued progress along previously suppressed directions. Across MNIST latent flow matching, and additional high-resolution datasets, we empirically track conditioning diagnostics and distributional metrics and show that preconditioning consistently yields better-trained models by avoiding suboptimal plateaus.

Abstract PDF Upgrade to Chat

Summary

The paper presents preconditioning techniques for continuous-time generative models to mitigate geometric bottlenecks in optimization.
It demonstrates that normalizing flow-based preconditioning stabilizes training by isotropizing intermediate distributions and lowering condition numbers.
Empirical results on datasets like MNIST and LSUN validate the method’s enhanced convergence and improved sample fidelity.

Preconditioned Score and Flow Matching: Conditioning, Geometry, and Optimization in Continuous-Time Generative Models

Introduction and Motivation

Continuous-time generative modeling frameworks such as flow matching and score-based diffusion fundamentally rely on learning vector fields to transport a simple base distribution (e.g., Gaussian) to a complex target distribution. Training these models involves a regression problem over intermediate distributions $p_t$ , whose geometry can critically impede optimization. This paper demonstrates that the covariance structure $\Sigma_t$ of these distributions governs a strong optimization bias: if $\Sigma_t$ becomes ill-conditioned, gradient-based methods rapidly fit directions of high variance but suppress optimization along low-variance directions, resulting in suboptimal convergence and a plateauing of training loss before model capacity is exhausted.

The authors provide a formal analytic study of such bottlenecks using tractable cases (Gaussian, Gaussian mixture) and introduce preconditioning frameworks—reversible label-conditional maps that isotropize $p_t$ —to address these obstacles without altering the underlying generative model's capacity or architecture. This approach substantially mitigates optimization stagnation and enables superior trained models.

Figure 1: Standard flow matching versus preconditioned transport, outlining the direct path from reference to data manifold and the two-stage (preconditioned) process that reshapes data geometry for improved convergence.

Geometric Conditioning and Optimization Barriers

The core insight is that both flow and score matching reduce to a sequence of least-squares regression problems parameterized by $t$ , with each step sampling from a distribution $p_t$ interpolating between base and target. When the target is highly anisotropic or multimodal, the covariance $\Sigma_t$ for intermediate $t$ becomes increasingly ill-conditioned. As $t \to 1$ , $\Sigma_t$ inherits the bad conditioning of the data space, which forces SGD and its variants into regimes where only high-variance principal directions are fit efficiently.

Detailed spectral analysis in the linear Gaussian case illustrates that the eigenvalues of $\Sigma_t$ take the form:

$\sigma_i(t) = (1-t)^2 + t^2 \lambda_i,$

where $\lambda_i$ are the eigenvalues of the target covariance. The condition number $\kappa(\Sigma_t)$ increases with $t$ , so as training proceeds towards the data manifold, optimization becomes progressively hampered in suppressed directions.

Figure 2: Growth in condition number of $\Sigma_t$ and spectral disparities along interpolation, visualizing the transition from isotropic to highly ill-conditioned distributions.

The effect is not remedied by increased model capacity, architectural modifications, or data volume, and is exacerbated in multimodal cases (Gaussian mixtures), where convergence becomes dominated by the hardest-to-fit component.

Preconditioned Flow and Score Matching

Drawing an analogy with preconditioning for linear solvers, the authors propose to invertibly map the target data to a space where its geometry is more isotropic—specifically, to "whiten" the data with a learned (normalizing flow or flow-based) preconditioner $\mathcal{P}$ . The generative training is performed in this preconditioned latent space and then mapped back to the original data manifold. Theoretical guarantees demonstrate that this transformation renders the regression problems better conditioned (condition number close to 1), enabling optimization to proceed without stagnation even in previously suppressed directions.

Figure 3: Comparison of 2D Gaussian transport with and without preconditioning; preconditioning yields lower MMD, prevents stagnation, and achieves dramatically improved conditioning of the transport operator.

The preconditioning operator can be constructed in several ways:

Normalizing flow-based preconditioning: mapping the target to an approximately Gaussian distribution via invertible, tractable flows.
Low-capacity flow-based preconditioning: using a lightweight flow matching model to push forward the target data into a more isotropic configuration, then applying standard flow matching in this transformed space.

These approaches are algorithmically simple and compatible with any core flow or score matching implementation.

Empirical Validation

The paper presents comprehensive experiments on both low-dimensional tasks (e.g., Swiss roll, Checkerboard to Swiss Roll transport) and complex image datasets (MNIST, LSUN Churches, Oxford Flowers-102, AFHQ Cats).

Swiss Roll Transport: Direct flow matching without preconditioning suffers from misaligned, distorted, and poorly covering sample trajectories.

Figure 4: Flow matching without preconditioning, showing ground-truth Swiss roll samples and poor transport by the plain model.

By contrast, preconditioned approaches result in more isotropic intermediate representations and significantly improved generative alignment:

Figure 5: Preconditioning via normalizing flow straightens the Swiss roll, permitting easier learning of Gaussian transport.

Figure 6: Even a low-accuracy, low-capacity flow used as a preconditioner can substantially improve the downstream learned transport.

Checkerboard to Swiss Roll: Preconditioning achieves not only improved target coverage but also prevents premature stagnation, as reflected in the monotonic decrease of distributional metrics (e.g., MMD).

Figure 7: With preconditioning, transport flows are smoother, target alignment is enhanced, and final distributional errors are significantly lower.

MNIST and High-Resolution Images: When flow matching is performed in the latent space of a VAE with isotropic preconditioning, there is a substantial qualitative and quantitative improvement in sample fidelity (as measured by FID), with normalizing flow preconditioning yielding the lowest FID.

Figure 8: MNIST generations under different preconditioning regimes, highlighting improvements in sample clarity, diversity, and overall FID.

For complex datasets, preconditioned flows yield sharper, more coherent structure and better spatial consistency across generations.

Figure 9: LSUN Churches—preconditioning delivers improved global structure and spatial organization.

Figure 10: Oxford Flowers—preconditioning enhances visual coherence and sample consistency.

Figure 11: AFHQ Cats—preconditioned sampling produces more realistic sample morphologies and textures.

Condition number diagnostics confirm that preconditioned intermediates dramatically suppress ill-conditioning throughout the transport process.

Figure 12: Preconditioning reduces the condition number of $\Sigma_t$ at all $t$ , leading to improved numerical stability and convergence on MNIST.

Implications and Future Directions

The results provide an explicit demonstration that optimization geometry, not only model capacity, governs the practical learnability and sample quality of continuous-time generative models. Preconditioning operates as an architectural and algorithmic layer that can be flexibly composed with arbitrary downstream generative models. This framework is theoretically sound, compatible with high-capacity neural architectures, and yields measurable empirical gains.

Bold claim: The research shows that, contrary to some current practice, loss plateaus and slow convergence in flow/diffusion-based generation are not indicative of model or data limitations, but rather, a manifestation of geometric conditioning bottlenecks—which can be counteracted entirely by geometric preconditioning transforms.

Further directions include the development of adaptive, time-dependent preconditioners, analysis of SGD in nonlinear network regimes via local linearizations, and extending these preconditioning techniques to other generative frameworks beyond ODE-based flows.

Conclusion

This study introduces a general, theoretically motivated framework for preconditioning flow and score matching objectives in continuous-time generative models. By directly addressing the conditioning of intermediate distributions, preconditioning transforms—realized via invertible normalizing flows or lightweight flows—enable efficient, robust optimization and higher-quality sample generation on both synthetic and high-resolution real datasets. The findings indicate that careful attention to problem geometry is fundamental for advancing the training and application of modern generative models, with immediate impact on both methodological understanding and practical implementation across generative modeling in machine learning.

(2603.02337)

Markdown Report Issue