Formal relationship between noise-expectation and gradient-expectation objectives
Determine the formal relationship between the noise-expectation and gradient-expectation training objectives for diffusion policies in online reinforcement learning whose target is the Boltzmann distribution over actions induced by a Q-function (a policy proportional to exp(Q(s, a)/λ) for some temperature λ > 0), and ascertain whether the two objectives can be synthesized into a single, more general formulation.
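To make the two objectives concrete, the sketch below assumes the common reading of these terms. Both are posterior-expectation identities for the score of the noised Boltzmann target. The assumptions are Gaussian forward noising x_t = alpha·a + sigma·eps and a target p(a) ∝ exp(Q(a)/λ). Under these assumptions, the noise-expectation form is ∇ log p_t(x_t) = -E[eps | x_t]/sigma. The gradient-expectation form, which holds under mild regularity (vanishing boundary terms), is ∇ log p_t(x_t) = E[∇_a log p(a) | x_t]/alpha. Both expectations are taken over the same posterior q(a | x_t) ∝ p(a)·N(x_t; alpha·a, sigma²). The source does not confirm this reading. The function `boltzmann_noised_scores`, its parameters, and the importance-sampling scheme are hypothetical illustrations, not the authors' method. The check uses a 1-D toy Q, chosen so that the exact noised score is known.

```python
import numpy as np


def boltzmann_noised_scores(x, alpha, sigma, q_fn, grad_q, lam=1.0, n=100_000, seed=0):
    """Estimate the noised-target score at x in two ways (hypothetical helper).

    Assumed setup (illustrative, not from the source):
      target      p(a) ∝ exp(Q(a) / lam)
      noising     x_t = alpha * a + sigma * eps,  eps ~ N(0, 1)
    Posterior samples from q(a | x_t) ∝ p(a) N(x_t; alpha * a, sigma^2) are
    approximated by self-normalized importance sampling. The proposal is
    a ~ N(x_t / alpha, (sigma / alpha)^2), i.e. the Gaussian likelihood term
    viewed as a density in a, so the weights are ∝ exp(Q(a) / lam).
    """
    rng = np.random.default_rng(seed)
    a = x / alpha + (sigma / alpha) * rng.standard_normal(n)
    log_w = q_fn(a) / lam
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # self-normalized importance weights

    # Noise-expectation form: score = -E[eps | x_t] / sigma (needs only Q values).
    eps = (x - alpha * a) / sigma
    score_noise = -np.sum(w * eps) / sigma

    # Gradient-expectation form: score = E[grad_a Q(a) / lam | x_t] / alpha (needs grad Q).
    score_grad = np.sum(w * grad_q(a) / lam) / alpha
    return score_noise, score_grad


def q_fn(a):
    # Toy Q with lam = 1, so p(a) = N(1, 1).
    return -0.5 * (a - 1.0) ** 2


def grad_q(a):
    return -(a - 1.0)


# With alpha = 0.8 and sigma = 0.6, p_t = N(0.8, 0.8**2 + 0.6**2) = N(0.8, 1).
x, alpha, sigma = 0.5, 0.8, 0.6
noise_est, grad_est = boltzmann_noised_scores(x, alpha, sigma, q_fn, grad_q)
exact = -(x - alpha * 1.0) / (alpha**2 + sigma**2)  # analytically 0.3
print(noise_est, grad_est, exact)
```

In this toy setting both Monte Carlo estimates approach the same exact score. The two forms differ in what they require, since one uses only Q values and the other uses ∇_a Q, and in their estimator variance. They do not differ in their target. Whether this shared-posterior view is the formal relationship the problem asks for remains open.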
References
Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation.
— Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies
(2601.08136 - Li et al., 13 Jan 2026) in Abstract