Identify critic parameterizations and compute-allocation strategies
Identify the appropriate parameterization for reinforcement learning critics. Determine how to allocate variable test-time compute to integration or iterative computation so as to best exploit the interplay between fast inner-loop adaptation and slow outer-loop weight updates.
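As a rough illustration of what "allocating variable test-time compute to integration" could mean, the sketch below reads out a value estimate by Euler-integrating a velocity field over a configurable number of steps. Everything here is an assumption made for illustration: `toy_velocity` is a hand-written stand-in for a learned network, and the names `integrate_value` and `exact_value` are hypothetical. It is not the parameterization studied in the cited paper, only a minimal picture of the compute-versus-accuracy trade-off that the problem statement refers to.

```python
import math


def toy_velocity(z, t, target):
    """Hypothetical stand-in for a learned velocity field (assumption).

    It pulls the running estimate z toward `target`; t is unused here
    but kept so the signature resembles a time-conditioned field.
    """
    return target - z


def integrate_value(z0, target, num_steps):
    """Euler-integrate dz/dt = toy_velocity(z, t) from t=0 to t=1.

    `num_steps` is the test-time compute knob: more steps cost more
    function evaluations and track the underlying ODE more closely.
    """
    z = z0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        z = z + dt * toy_velocity(z, k * dt, target)
    return z


def exact_value(z0, target):
    """Closed-form solution of dz/dt = target - z at t=1, for comparison."""
    return target + (z0 - target) * math.exp(-1.0)


if __name__ == "__main__":
    reference = exact_value(0.0, 1.0)
    for steps in (1, 4, 16, 64):
        estimate = integrate_value(0.0, 1.0, steps)
        print(steps, estimate, abs(estimate - reference))
```

In this toy setting the integration error shrinks as `num_steps` grows, which is the kind of budget decision the open question asks about. How such a knob should interact with the critic's parameterization and with weight updates is exactly what remains unresolved.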
References
But what is the right parameterization for critics and how to spend variable computation are open questions for critics.
— What Does Flow Matching Bring To TD Learning?
(2603.04333 - Agrawalla et al., 4 Mar 2026) in Section 6, Discussion and Perspectives on Future Work