Applicability of ETR to Open-Ended Tasks
Determine whether ETR's signal-quality proxies remain applicable and effective in open-ended tasks such as open-domain dialogue and creative writing, where outputs are highly diverse and evaluation criteria are ambiguous. The proxies are advantage magnitude, which drives micro-level trust-region adjustment, and group pass rate, which drives macro-level adjustment.
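The two proxies can be made concrete with a minimal Python sketch. It assumes GRPO-style group-normalized advantages and a success cutoff for computing the pass rate. The function names, the cutoff values and the example rewards are all hypothetical, and the code does not reproduce ETR's actual trust-region update. It only illustrates two ways the proxies may become fragile in open-ended tasks. First, the pass rate of continuous reward-model scores depends on an arbitrary cutoff. Second, group normalization can turn near-identical scores into large advantages.

```python
import statistics
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages (GRPO-style assumption): (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


def group_pass_rate(rewards: List[float], threshold: float = 0.5) -> float:
    """Fraction of rollouts whose reward meets a hypothetical success cutoff.

    With binary verifier rewards, any cutoff in (0, 1] gives the same answer.
    With continuous reward-model scores, the result depends on the cutoff.
    """
    return sum(r >= threshold for r in rewards) / len(rewards)


if __name__ == "__main__":
    binary = [1.0, 0.0, 0.0, 1.0]        # verifiable task: pass/fail rewards
    scores = [0.62, 0.55, 0.48, 0.71]    # open-ended task: illustrative RM scores
    near_tie = [0.50, 0.51, 0.49, 0.50]  # almost indistinguishable outputs

    # Macro-level proxy: the pass rate is stable for binary rewards but
    # shifts with the cutoff for continuous scores.
    for t in (0.5, 0.6, 0.7):
        print(t, group_pass_rate(binary, t), group_pass_rate(scores, t))

    # Micro-level proxy: normalization amplifies tiny score differences.
    print([round(a, 2) for a in group_advantages(near_tie)])
```

With the binary rewards the pass rate stays at 0.5 for every cutoff. The illustrative scores instead yield 0.75, 0.5 and 0.25 at cutoffs 0.5, 0.6 and 0.7. The near-tie group still produces advantages of about ±1.41, even though its scores differ by only 0.01.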
References
The applicability of ETR's assessment proxies to scenarios with high output diversity and ambiguous evaluation criteria remains unexplored.
— ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization
(2601.03723 - Zhang et al., 7 Jan 2026) in Section: Limitations