Convergence of A3C’s asynchronous updates
Prove that the lock-free asynchronous parameter updates of the Asynchronous Advantage Actor–Critic (A3C) algorithm converge in the parallel-worker setting, or derive rigorous conditions under which they do.
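The update pattern at issue can be made concrete with a toy sketch. A3C's workers read a shared parameter vector, compute gradients from possibly stale copies, and write updates back without locks (in the style of Hogwild!-type asynchronous SGD). The sketch below, a hedged illustration on a simple quadratic objective rather than an actual actor-critic, shows why convergence is nontrivial: reads and writes interleave with no synchronization, so each update may be applied to parameters that have already changed. All names and numerical values here are illustrative assumptions.

```python
import threading
import numpy as np

# Shared parameter vector, updated by all workers without any lock.
theta = np.zeros(4)
# Optimum of the toy objective f(theta) = ||theta - target||^2.
target = np.array([1.0, -2.0, 0.5, 3.0])

def worker(steps: int, lr: float) -> None:
    # Each worker repeatedly: (1) reads the shared parameters (possibly
    # stale relative to other workers' writes), (2) computes the gradient
    # 2 * (theta - target), and (3) writes the update back in place with
    # no synchronization -- the lock-free pattern used by A3C's workers.
    for _ in range(steps):
        grad = 2.0 * (theta - target)      # gradient from a stale read
        theta[:] = theta - lr * grad       # unsynchronized in-place write

threads = [threading.Thread(target=worker, args=(2000, 0.01)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# On this convex quadratic the races are benign and theta approaches target;
# the open problem is establishing analogous guarantees for A3C's
# nonconvex, policy-gradient setting.
```

On a strongly convex objective like this one, each unsynchronized step is still a contraction toward the optimum, which is why the races are harmless here; the difficulty the open problem points at is that A3C's objective is nonconvex and its gradients are policy-dependent, so this argument does not carry over.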
References
Rigorous convergence theory for A3C's lock-free asynchronous parameter updates remains an open problem; existing analyses of asynchronous stochastic approximation \citep{qu2020async} address classical asynchrony (different state-action pairs updated at different times on a single trajectory), not the parallel-worker gradient setting.
— A Survey of Reinforcement Learning For Economics
(2603.08956 - Rawat, 9 Mar 2026) in Subsection “Actor-Critic Methods (2000)” (A3C footnote)