Applicability of ETR to Open-Ended Tasks

Determine whether ETR’s signal-quality proxies—micro-level clipping adjustment based on advantage magnitude and macro-level adjustment based on group pass rate—remain applicable and effective in open-ended tasks such as open-domain dialogue and creative writing, where outputs are highly diverse and evaluation criteria are ambiguous.

Background

ETR is evaluated on structured mathematical reasoning tasks where outcomes are objectively verifiable, enabling clear advantage computation and group pass-rate variance estimation. The method’s design leverages these outcome statistics to adjust clipping bounds dynamically.
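To make the two outcome statistics concrete, the following is a minimal, hypothetical sketch of how a group of verifiable (binary) rewards yields per-sample advantage magnitudes (micro signal) and a group pass rate (macro signal), and how these might modulate a clipping bound. The function names, scaling factors, and the specific widening rule are illustrative assumptions, not the paper’s actual formulas.

```python
# Hypothetical sketch: ETR-style signal-quality proxies for one prompt group.
# Assumptions (not from the paper): GRPO-style group-normalized advantages,
# and a clip bound that widens with advantage magnitude (micro) and with
# pass-rate variance p*(1-p), which peaks at p = 0.5 (macro).
from statistics import mean, pstdev

def group_stats(rewards):
    """Group-relative advantages and pass rate for binary rewards."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    advantages = [(r - mu) / (sigma + 1e-8) for r in rewards]
    pass_rate = sum(1 for r in rewards if r > 0) / len(rewards)
    return advantages, pass_rate

def elastic_clip(base_eps, advantage, pass_rate,
                 micro_scale=0.5, macro_scale=0.5):
    """Illustrative elastic clipping bound (scales are arbitrary)."""
    micro = 1.0 + micro_scale * abs(advantage)
    macro = 1.0 + macro_scale * 4.0 * pass_rate * (1.0 - pass_rate)
    return base_eps * micro * macro

# Example: a group of 4 rollouts, two correct and two incorrect.
advs, p = group_stats([1, 1, 0, 0])  # advs ≈ [1, 1, -1, -1], p = 0.5
bound = elastic_clip(0.2, advs[0], p)  # wider than the base bound of 0.2
```

Both statistics are well defined here precisely because correctness is binary and objectively verifiable, which is the property the open question above puts in doubt.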

The paper explicitly states that ETR has not been extended to open-ended tasks like dialogue or creative writing, where evaluation criteria are ambiguous and outputs are highly diverse. It remains unresolved whether ETR’s proxies for signal quality (advantage magnitude and group variance via pass rate) generalize to such settings, which is essential for broader applicability beyond math reasoning.
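A small illustration of why generalization is non-obvious: if open-ended outputs are scored with continuous judge scores rather than binary pass/fail outcomes (an assumption for this sketch, not a setup from the paper), the "pass rate" proxy becomes dependent on an arbitrary threshold, so the macro signal is no longer uniquely defined.

```python
# Hypothetical: continuous judge scores for a group of open-ended outputs.
# With no objective notion of "correct", the pass rate depends entirely on
# where the threshold is drawn.
scores = [0.42, 0.55, 0.61, 0.48]  # e.g. scalar scores from an LLM judge

def pass_rate(scores, threshold):
    """Fraction of outputs scored at or above an arbitrary threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Nearby thresholds give very different macro signals for the same group:
# pass_rate(scores, 0.5) -> 0.5, pass_rate(scores, 0.6) -> 0.25
```

The same ambiguity affects the micro signal: advantage magnitude inherits the scale and noise of the judge’s scoring, which is exactly the generalization question left open.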

References

The applicability of ETR’s assessment proxies to scenarios with high output diversity and ambiguous evaluation criteria remains unexplored.

ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization  (2601.03723 - Zhang et al., 7 Jan 2026) in Section: Limitations