A unified perspective on fine-tuning and sampling with diffusion and flow models

Published 30 Apr 2026 in stat.ML, cs.LG, and math.OC | (2605.00229v1)

Abstract: We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density, a formulation that subsumes both sampling from unnormalized densities and reward fine-tuning of pre-trained models. This problem can be approached from a stochastic optimal control (SOC) perspective, using adjoint-based or score matching methods, or from a non-equilibrium thermodynamics perspective. We provide a unified framework encompassing these approaches and make three main contributions: (i) bias-variance decompositions revealing that Adjoint Matching/Sampling and Novel Score Matching have finite gradient variance, while Target and Conditional Score Matching do not; (ii) norm bounds on the lean adjoint ODE that theoretically support the effectiveness of adjoint-based methods; and (iii) adaptations of the CMCD and NETS loss functions, along with novel Crooks and Jarzynski identities, to the exponential tilting setting. We validate our analysis with reward fine-tuning experiments on Stable Diffusion 1.5 and 3.


Authors (3)

Summary

The paper introduces a unified framework for diffusion and flow models using exponential tilting to bridge reward-based fine-tuning and energy-based sampling.
It provides a rigorous bias-variance analysis showing that adjoint-based methods and Novel Score Matching have finite gradient variance, whereas Target and Conditional Score Matching have infinite gradient variance when weighted to match the path KL divergence.
It also adapts the CMCD and NETS thermodynamic losses to the tilted setting, deriving new Crooks and Jarzynski identities, and validates the analysis with reward fine-tuning experiments on text-to-image models (Stable Diffusion 1.5 and 3).

Unified Framework for Fine-Tuning and Sampling with Diffusion and Flow Models

Problem Formulation and Theoretical Foundations

The paper introduces a unified perspective for training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density, $p^*(x) \propto p_{\text{base}}(x) \exp(r(x))$, where $r$ is the tilting function (a reward in the fine-tuning case). This formulation covers both reward-based fine-tuning of pre-trained models and sampling from unnormalized densities. It therefore connects two established families of generative modeling problems: one seeks to improve sample quality through reward optimization, and the other targets inference in energy-based models.
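Exponential tilting can be checked numerically on a one-dimensional toy case. The sketch below assumes a standard normal base density and a hypothetical linear reward $r(x) = a x$, for which completing the square gives $p^* = \mathcal{N}(a, 1)$ exactly. It estimates the tilted mean with self-normalized importance sampling, a generic Monte Carlo technique chosen for illustration; it is not the paper's training method.

```python
import math
import random

def tilted_mean_snis(reward, n=200_000, seed=0):
    """Self-normalized importance sampling (SNIS) estimate of E_{p*}[X],
    where p*(x) ∝ p_base(x) * exp(reward(x)) and p_base = N(0, 1) (assumed).

    Only the unnormalized weights exp(reward(x)) are needed: the unknown
    normalizing constant Z cancels in the ratio below.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # samples from the base
    log_ws = [reward(x) for x in xs]              # log-weights: log exp(r(x)) = r(x)
    shift = max(log_ws)                           # subtract the max for numerical stability
    ws = [math.exp(lw - shift) for lw in log_ws]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

# Hypothetical linear reward r(x) = a * x: tilting N(0, 1) gives N(a, 1).
a = 1.0
print(tilted_mean_snis(lambda x: a * x))  # close to a = 1.0
```

Importance sampling from the base degrades quickly as the tilt grows or the dimension increases, which is one motivation for training a generative model to sample $p^*$ directly.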

The theoretical backbone draws on stochastic optimal control (SOC) and non-equilibrium thermodynamics. The SOC formulation recasts the tilting as an entropy-regularized control problem, typically solved with memoryless SDEs and either adjoint-state or score matching methods. The thermodynamic approach frames the problem as a generalized free-energy minimization, adapting classical results such as the Jarzynski equality and the Crooks fluctuation theorem to this setting and yielding tilted versions of existing loss functions such as CMCD and NETS.
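The entropy-regularized view rests on a standard variational identity: $\log Z = \max_q \{\mathbb{E}_q[r(X)] - \mathrm{KL}(q \,\|\, p_{\text{base}})\}$, with the maximum attained at $q = p^*$. SOC formulations lift this identity to path space. The sketch below checks it in one dimension under hypothetical choices: $p_{\text{base}} = \mathcal{N}(0,1)$, $r(x) = a x$ (so $\log Z = a^2/2$), and Gaussian candidates $q = \mathcal{N}(m, s^2)$. It minimizes the negated objective over a grid.

```python
import math

def free_energy_objective(m, s, a):
    """J(q) = E_q[-r(X)] + KL(q || p_base) for q = N(m, s^2), p_base = N(0, 1), r(x) = a * x.

    The minimum of J over all q equals -log Z and is attained at q = p* = N(a, 1).
    """
    expected_neg_reward = -a * m
    kl = 0.5 * (s * s + m * m - 1.0 - math.log(s * s))  # closed-form Gaussian KL
    return expected_neg_reward + kl

# Grid search over means in [-3, 3] and standard deviations in [0.1, 3.0].
a = 1.5
grid = [(m / 10, s / 10) for m in range(-30, 31) for s in range(1, 31)]
best = min(grid, key=lambda ms: free_energy_objective(ms[0], ms[1], a))
print(best, -free_energy_objective(*best, a), a * a / 2)  # minimizer (a, 1); both values equal log Z
```

Because the optimum is the tilted distribution itself, any method that solves this problem exactly (in one step or over paths) samples from $p^*$.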

Algorithmic Analysis: Bias-Variance Decomposition

A core contribution is a bias-variance decomposition covering a broad range of algorithms, including adjoint-based methods (Adjoint Matching/Sampling) and score-matching variants (Target, Conditional, and Novel Score Matching). Notably:

Adjoint Matching/Sampling and Novel Score Matching have finite gradient variance, so they are well-posed for gradient-based optimization. The adjoint-based methods are further supported by norm bounds on the lean adjoint ODE, derived under convexity assumptions on the base density, which theoretically underpin their empirical success.
Target and Conditional Score Matching have infinite gradient variance when their losses are weighted to match the path KL divergence. Since this weighting is what makes the loss probabilistically interpretable, such an objective cannot be minimized in practice with these methods. This negative result cautions against applying them naively with path-KL weights.
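The lean adjoint ODE mentioned above can be illustrated in one dimension. In the Adjoint Matching formulation (stated here as an assumption), the lean adjoint $\tilde a(t)$ solves $\frac{d}{dt}\tilde a(t) = -\tilde a(t)\,\partial_x b(X_t, t)$ backward in time from a terminal condition set by the reward gradient at $X_1$; the sign convention $\tilde a(1) = -\nabla r(X_1)$ below is also assumed. For a hypothetical linear drift $b(x,t) = \kappa(t)\,x$ the solution is $\tilde a(t) = \tilde a(1)\exp(\int_t^1 \kappa(s)\,ds)$, and Grönwall's inequality gives the generic bound $|\tilde a(t)| \le |\tilde a(1)| \exp(\int_t^1 |\partial_x b|\,ds)$. This is not the paper's bound; it only shows the kind of norm control such results provide.

```python
import math

def lean_adjoint_backward(grad_r_x1, kappa, n_steps=1000):
    """Integrate the 1-D lean adjoint ODE  da/dt = -a * db/dx  backward from t = 1 to t = 0.

    Assumes a hypothetical linear base drift b(x, t) = kappa(t) * x, so db/dx = kappa(t)
    and the ODE does not depend on the trajectory. Assumed terminal condition:
    a(1) = -grad r(X_1). Returns (t, a(t)) pairs from t = 1 down to t = 0.
    """
    h = 1.0 / n_steps
    a = -grad_r_x1
    path = [(1.0, a)]
    for j in range(n_steps):
        t = 1.0 - j * h
        # Explicit Euler step backward in time: a(t - h) ≈ a(t) + h * kappa(t) * a(t)
        a = a + h * kappa(t) * a
        path.append((1.0 - (j + 1) * h, a))
    return path

def kappa(t):
    return 1.0  # hypothetical constant drift coefficient

# Toy reward r(x) = -(x - c)^2 / 2, so grad r(x) = -(x - c); here X_1 = 0.5, c = 2.0.
c, x1 = 2.0, 0.5
grad_r = -(x1 - c)
path = lean_adjoint_backward(grad_r, kappa)

# Closed form for constant kappa = 1: a(t) = a(1) * exp(1 - t).
a_at_0 = path[-1][1]
print(a_at_0, -grad_r * math.exp(1.0))

# Grönwall-type bound |a(t)| <= |a(1)| * exp(int_t^1 |kappa(s)| ds) holds along the path.
assert all(abs(a) <= abs(grad_r) * math.exp(1.0 - t) + 1e-9 for t, a in path)
```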

The analysis clarifies which algorithms are appropriate for reward fine-tuning and for sampling from unnormalized densities under exponential tilting. In particular, the finite-variance results for the adjoint-based methods and Novel Score Matching (NSM) provide theoretical justification for their use in large-scale generative modeling.
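For reference, decompositions of this kind build on a standard identity. For a random vector $Y$ (for example a stochastic gradient or a regression target) used to estimate a fixed quantity $c$, the expected squared error splits into a squared bias and a variance term; "finite gradient variance" means the second term is finite. The paper's per-method decompositions are more specific than this generic form.

$$
\mathbb{E}\big[\|Y - c\|^2\big]
= \underbrace{\big\|\mathbb{E}[Y] - c\big\|^2}_{\text{squared bias}}
+ \underbrace{\mathbb{E}\big[\|Y - \mathbb{E}[Y]\|^2\big]}_{\text{variance}}
$$

The identity follows by expanding $Y - c = (Y - \mathbb{E}[Y]) + (\mathbb{E}[Y] - c)$: the cross term $2\,\mathbb{E}\big[\langle Y - \mathbb{E}[Y],\, \mathbb{E}[Y] - c\rangle\big]$ vanishes because $\mathbb{E}[Y - \mathbb{E}[Y]] = 0$.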

Thermodynamic Loss Adaptation and Identity Derivation

Building on recent advances in non-equilibrium thermodynamics, the paper adapts the CMCD and NETS loss functions to the exponential tilting framework. The derivations yield new analogs of the Crooks fluctuation theorem and the Jarzynski equality, offering principled estimators of free-energy differences and a principled basis for sampling in the exponentially tilted regime.
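A Jarzynski-type estimator can be illustrated with annealed importance sampling (AIS), a classical discrete-time counterpart of the Jarzynski equality. Along the path $p_\beta \propto p_{\text{base}} \exp(\beta r)$, $\beta: 0 \to 1$, the accumulated log-weight plays the role of minus the work, and its exponential average is an unbiased estimate of $Z = \mathbb{E}_{p_{\text{base}}}[e^{r(X)}]$. This sketch is not the paper's tilted identity or its CMCD/NETS losses; the standard normal base, the linear reward $r(x) = 2x$ (so $\log Z = 2$), the linear annealing schedule, and the Metropolis step size are all assumptions.

```python
import math
import random

def log_base(x):
    # Unnormalized log-density of the standard normal base: log p_base(x) + const
    return -0.5 * x * x

def reward(x, a=2.0):
    # Hypothetical linear reward; the tilted target is N(a, 1) and log Z = a^2 / 2
    return a * x

def ais_log_z(n_particles=2000, n_steps=100, step_size=1.0, seed=0):
    """AIS estimate of log Z = log E_{p_base}[exp(r(X))], annealing beta from 0 to 1.

    The log-weight sum_k (beta_k - beta_{k-1}) * r(x_{k-1}) acts as minus the work in a
    discrete-time Jarzynski identity, E[exp(log_w)] = Z (with kT = 1).
    """
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]
    log_ws = []
    for _ in range(n_particles):
        x = rng.gauss(0.0, 1.0)  # exact sample from p_base (beta = 0)
        log_w = 0.0
        for k in range(1, n_steps + 1):
            # Weight update at the current state, before moving it
            log_w += (betas[k] - betas[k - 1]) * reward(x)
            # One random-walk Metropolis step that leaves p_{beta_k} invariant
            prop = x + step_size * rng.gauss(0.0, 1.0)
            log_acc = (log_base(prop) + betas[k] * reward(prop)) - (
                log_base(x) + betas[k] * reward(x))
            if math.log(1.0 - rng.random()) < log_acc:
                x = prop
        log_ws.append(log_w)
    # Log-mean-exp of the log-weights, shifted for numerical stability
    m = max(log_ws)
    return m + math.log(sum(math.exp(lw - m) for lw in log_ws) / n_particles)

print(ais_log_z(), 2.0 ** 2 / 2)  # estimate vs. exact log Z
```

With a single annealing step the estimator reduces to plain importance sampling from the base (Zwanzig's free-energy perturbation); the intermediate steps keep the weights from degenerating when the tilt is large.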

Explicit loss formulas are given, including KL-CMCD, Log-Variance CMCD, and PINN-style NETS losses tailored to the tilted distribution. These results extend established path-space variational inference algorithms and apply to both reward-based fine-tuning and energy-based sampling.
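The difference between KL-type and log-variance-type objectives can be seen in a one-step toy version, where a model density $q$ stands in for the model's path measure and $p_{\text{base}} e^{r}$ for the unnormalized target. Both losses are built from the log importance weight $\log w = \log p_{\text{base}} + r - \log q$: the reverse KL is $\log Z - \mathbb{E}_q[\log w]$, the log-variance loss is $\mathrm{Var}[\log w]$, and both vanish exactly when $q = p^*$. This is a sketch under assumptions (standard normal base, hypothetical reward $r(x) = 2x$, Gaussian model $q = \mathcal{N}(m, 1)$), not the path-space CMCD losses themselves.

```python
import random
import statistics

A = 2.0  # hypothetical reward slope: r(x) = A * x, base N(0, 1), so p* = N(A, 1) and log Z = A^2 / 2

def log_weight(x, m):
    """log[p_base(x) * exp(r(x)) / q(x)] for the model q = N(m, 1); Gaussian normalizers cancel."""
    log_p_base = -0.5 * x * x
    log_q = -0.5 * (x - m) ** 2
    return log_p_base + A * x - log_q

def losses(m, n=100_000, seed=0):
    rng = random.Random(seed)
    log_ws = [log_weight(rng.gauss(m, 1.0), m) for _ in range(n)]  # samples from the model q
    kl = A * A / 2 - statistics.fmean(log_ws)  # reverse KL(q || p*) = log Z - E_q[log w]
    log_var = statistics.variance(log_ws)      # log-variance loss: Var_q[log w]
    return kl, log_var

for m in [0.0, 1.0, 2.0]:
    print(m, losses(m))  # analytic values: KL = (A - m)^2 / 2, log-variance = (A - m)^2
```

A practical difference: the log-variance loss is invariant to shifting $\log w$ by a constant, so it never needs $\log Z$ and can be evaluated on samples from a distribution other than $q$, whereas evaluating the KL value here uses the known toy value $\log Z = 2$.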

Empirical Validation: Reward Fine-Tuning of Diffusion Models

The experiments focus on reward fine-tuning with Adjoint Matching on text-to-image models: Stable Diffusion 1.5 (a diffusion model) and Stable Diffusion 3 (a flow model). Key findings:

Trade-offs between per-prompt diversity (DreamSim variance) and quality metrics (ImageReward, CLIPScore, HPSv2, Aesthetic Score) are explored systematically. The experiments compare inference noise schedules (memoryless vs. default DDIM) and reward multipliers, and the observed differences are consistent with the theoretical predictions.
The norm bounds on the adjoint state are reflected in the gradient variance observed in practice, and the choice of schedule and noise level shifts the Pareto-optimal trade-offs. Ablations on the initial variance $\sigma_0$ show no conclusive effect, in line with the theoretical analysis: for Adjoint Matching, the bounds do not depend on $\sigma_0$.
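The reward-multiplier trade-off has a closed-form toy analog. Assuming, as is common in reward fine-tuning, that a multiplier $\lambda$ enters as $p^* \propto p_{\text{base}} \exp(\lambda r)$, and taking a hypothetical Gaussian base $\mathcal{N}(0,1)$ with quadratic reward $r(x) = -(x - c)^2/2$, the tilted distribution stays Gaussian. Its variance (a crude stand-in for diversity) shrinks while its expected reward (a stand-in for quality) grows as $\lambda$ increases. None of the paper's actual metrics (DreamSim, ImageReward, and so on) are modeled here.

```python
def tilted_gaussian(lam, c):
    """Closed form for p*(x) ∝ N(x; 0, 1) * exp(lam * r(x)) with r(x) = -(x - c)^2 / 2.

    Completing the square gives p* = N(lam * c / (1 + lam), 1 / (1 + lam)).
    Returns (mean, variance, expected reward under p*).
    """
    mean = lam * c / (1.0 + lam)
    var = 1.0 / (1.0 + lam)
    expected_reward = -0.5 * (var + (mean - c) ** 2)  # E[-(X - c)^2 / 2]
    return mean, var, expected_reward

for lam in [0.0, 0.5, 1.0, 4.0, 16.0]:
    mean, var, er = tilted_gaussian(lam, c=2.0)
    print(f"lambda={lam:5.1f}  variance={var:.3f}  expected reward={er:.3f}")
```

Sweeping $\lambda$ traces a one-parameter diversity-versus-reward frontier; in the real experiments the schedule and noise choices shift where a method lands on such a frontier.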

Although the evaluation centers on text-to-image models, the framework itself is not image-specific and could apply to other domains (e.g., protein design, DNA sequence generation); this remains to be tested empirically.

Implications and Future Directions

This unified framework establishes a rigorous bridge between SOC, thermodynamics, and score matching within the context of exponential tilting. Practical implications include:

Algorithm selection for reward fine-tuning and sampling should be guided by bias-variance characteristics. Methods with finite gradient variance (Adjoint Matching/Sampling and Novel Score Matching) are preferable for large-scale applications that require stable optimization.
Theoretical guarantees offer principled routes for algorithmic design, supporting further development of efficient samplers and reward-based fine-tuning schemes across diverse generative models.
The adapted thermodynamic losses and the new Crooks and Jarzynski identities offer principled alternatives to heuristic sampling and RL-based reward optimization, with potential impact on scientific modeling and large-scale generative tasks.

Future research should extend the empirical evaluation to additional domains (e.g., molecular generative modeling) and systematically explore thermodynamics-based algorithms. Extending the bias-variance analysis to further algorithm families is another important direction, which could sharpen our understanding of stability and reliability in diffusion model training.

Conclusion

This work provides a comprehensive theoretical and algorithmic synthesis of fine-tuning and sampling with diffusion and flow models under exponential tilting. It identifies which algorithms have finite gradient variance, adapts thermodynamic identities to the tilting regime, and validates the analysis empirically on text-to-image models. The unified approach treats SOC, score matching, and thermodynamics as mutually reinforcing paradigms, offering concrete guidance for algorithm selection and design in generative modeling.
