Theoretical guarantees for action chunking Q-learning with arbitrary off-policy data
Establish formal guarantees for Q-learning with action chunking critics trained on arbitrary off-policy datasets, not only on datasets collected by an action chunking policy. Specifically, characterize the conditions under which convergence and near-optimality hold, and quantify any bias that arises when the data does not originate from an action chunking policy.
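To make the object of study concrete, the following is a minimal sketch of the chunked backup that such guarantees would concern. The notation is assumed for illustration and is not taken from the quoted source: h is the chunk length, \gamma the discount factor, a_{t:t+h} = (a_t, ..., a_{t+h-1}) an action chunk, and (s_t, a_{t:t+h}, r_{t:t+h}, s_{t+h}) a length-h segment drawn from the dataset.

```latex
% Assumed notation (illustrative, not from the source):
%   h      chunk length,  \gamma  discount factor,
%   a_{t:t+h} = (a_t, \dots, a_{t+h-1})  an action chunk.
% Chunked critic target built from one dataset segment:
\[
  y_t \;=\; \sum_{i=0}^{h-1} \gamma^{i}\, r_{t+i}
        \;+\; \gamma^{h} \max_{a'_{t+h:t+2h}} Q\bigl(s_{t+h},\, a'_{t+h:t+2h}\bigr),
\]
% Regression objective over the dataset \mathcal{D}:
\[
  \min_{Q}\;
  \mathbb{E}_{(s_t,\, a_{t:t+h},\, r_{t:t+h},\, s_{t+h}) \sim \mathcal{D}}
  \Bigl[\bigl(Q(s_t, a_{t:t+h}) - y_t\bigr)^{2}\Bigr].
\]
```

One plausible reading of the bias in question, stated here as an assumption rather than a result: the target treats the segment as if the chunk had been executed open-loop from s_t. If the behavior policy instead chose each action in reaction to the intermediate states, then in a stochastic environment the rewards and s_{t+h} observed alongside a given chunk need not follow the distribution that open-loop execution of that chunk would induce. In a deterministic environment the two coincide.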
References
However, theoretical guarantees of action chunking Q-learning, especially on arbitrary off-policy data, are still an open problem as existing analysis (e.g., in \citet{li2025reinforcement}) only considers the case where the data is collected by an action chunking policy.
— Decoupled Q-Chunking
(2512.10926 - Li et al., 11 Dec 2025) in Section 1 (Introduction)