Recovering the policy competence radius without policy evaluation

Determine whether the policy competence radius R, defined as the effective radius around a given state within which a learned goal-conditioned value function provides a clear learning signal for extracting a goal-conditioned policy, can be recovered or effectively approximated in practice without evaluating an extracted policy.

Background

The paper defines an "effective radius", termed the policy competence radius R, as the region of states for which the value function learned via temporal-difference updates provides a sufficiently strong signal-to-noise ratio to guide policy extraction. This radius becomes critical in long-horizon, sparse-reward settings, where value approximation errors compound with distance.
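
To make the signal-to-noise intuition concrete, the following is a minimal diagnostic sketch, not taken from the paper: it estimates such a radius by checking, at increasing ground-truth distances, whether the mean value gap between neighboring distance bins still exceeds the within-bin noise. The names value_fn, probe_goals, true_dists, and snr_cutoff are hypothetical, and the oracle distances the sketch requires are exactly what is unavailable in practice, which is why recovering R is hard.

```python
import numpy as np

def empirical_competence_radius(value_fn, state, probe_goals, true_dists,
                                snr_cutoff=1.0):
    """Crude diagnostic for a competence radius: bin probe goals by
    oracle distance from `state` and report the largest distance at
    which the value function still separates neighboring bins by more
    than the within-bin noise (a simple signal-to-noise criterion)."""
    values = np.array([value_fn(state, g) for g in probe_goals])
    dists = np.round(np.asarray(true_dists))
    bins = np.unique(dists)                    # sorted distance bins
    radius = 0.0
    for near, far in zip(bins[:-1], bins[1:]):
        v_near = values[dists == near]
        v_far = values[dists == far]
        signal = v_near.mean() - v_far.mean()  # value should decay with distance
        noise = 0.5 * (v_near.std() + v_far.std()) + 1e-8
        if signal / noise < snr_cutoff:        # signal lost in approximation noise
            break
        radius = float(far)
    return radius
```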

Because the true value approximation error is unknown, R is not known a priori. The authors note that it is not clear whether R can be recovered or approximated without evaluating an extracted policy. Their framework sidesteps this by imposing a test-time reachability constraint using a tunable threshold, but a principled method to estimate R directly would benefit hierarchical goal-conditioned RL.
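
As a concrete illustration of such a constraint, here is a minimal sketch; the names filter_reachable_subgoals, value_fn, and threshold are hypothetical, and it assumes a sign convention in which higher V(s, g) means g is easier to reach (e.g., V(s, g) approximately equal to -distance under a -1 per-step sparse reward). Candidate subgoals are filtered by a tunable value cutoff before being handed to the policy.

```python
def filter_reachable_subgoals(value_fn, state, candidate_subgoals, threshold):
    """Test-time reachability constraint: keep only candidate subgoals
    whose estimated goal-conditioned value from the current state clears
    a tunable threshold, so the policy is only asked to pursue subgoals
    the value function is likely competent for."""
    return [g for g in candidate_subgoals
            if value_fn(state, g) >= threshold]
```

Raising the threshold keeps subgoals well inside the competence radius at the cost of shorter subgoal hops; the open question is whether the threshold could instead be set from a direct estimate of R rather than tuned by hand.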

References

This quantity is not known a priori because we do not have access to the true value approximation error, and it is not clear if it can be recovered or effectively approximated in practice without evaluating an extracted policy.

Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion (2602.02722, Haramati et al., 2 Feb 2026) in Appendix, Section "The Benefits of Decoupling Training Goal Distributions Across the Hierarchy"