Scaling ETR to Large LLMs
Determine the computational efficiency, convergence stability, and task performance of Elastic Trust Regions (ETR) when applied to large-scale language models (e.g., models with 10B or more parameters).
References
The computational efficiency, convergence stability, and performance of ETR’s adaptive thresholding when scaled to such sizes remain unexamined.
— ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization
(2601.03723 - Zhang et al., 7 Jan 2026) in Section: Limitations