Effect of sparse goal rewards on one-step FB in OGBench

Investigate whether, in state-based OGBench domains with sparse goal-conditioned indicator rewards, the reward structure induces a single backward representation and thereby makes these domains challenging for the one-step forward–backward (FB) method compared to baselines that learn goal-conditioned distance functions.

Background

In the reported experiments, one-step FB underperforms some baselines on certain state-based OGBench tasks while performing strongly in other domains. The authors hypothesize that sparse, indicator-style goal rewards may collapse the backward representation to a single direction, eliminating the diversity those representations need to encode distinct goals.

If sparse rewards do induce a single backward representation, this could limit the expressivity of one-step FB during zero-shot adaptation, whereas methods explicitly learning goal-conditioned distances (e.g., HILP, ICVF) might be better suited to such tasks. Validating or refuting this conjecture would clarify when one-step FB is preferable.
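The mechanism behind the conjecture can be illustrated with a toy calculation. The sketch below assumes the standard FB task encoding z_r = E_{s~ρ}[r(s) B(s)] from the forward–backward literature; the variable names, state space, and collapsed-B construction are illustrative assumptions, not taken from the paper. With an indicator reward 1{s = g}, z_r is proportional to B(g), so if B collapses to a (near-)constant map, every goal-reaching task receives essentially the same task embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, d = 100, 8                     # toy discrete state space, embedding dim
rho = np.full(n_states, 1.0 / n_states)  # state sampling distribution

# Hypothetical backward embeddings B(s), one row per state.
B_diverse = rng.normal(size=(n_states, d))
# A collapsed B: every state maps to the same vector.
B_collapsed = np.tile(rng.normal(size=(1, d)), (n_states, 1))

def task_embedding(B, reward, rho):
    """FB-style task encoding z_r = E_{s~rho}[ r(s) B(s) ]."""
    return (rho * reward) @ B

def indicator_reward(goal, n):
    """Sparse goal-conditioned indicator reward r(s) = 1{s = goal}."""
    r = np.zeros(n)
    r[goal] = 1.0
    return r

# With an indicator reward, z_r reduces to rho(g) * B(g):
g = 17
z = task_embedding(B_diverse, indicator_reward(g, n_states), rho)
assert np.allclose(z, rho[g] * B_diverse[g])

# Under a collapsed B, two different goals yield identical task embeddings,
# so a policy conditioned on z cannot distinguish them.
z1 = task_embedding(B_collapsed, indicator_reward(3, n_states), rho)
z2 = task_embedding(B_collapsed, indicator_reward(60, n_states), rho)
assert np.allclose(z1, z2)
```

Under a diverse B the two goal embeddings differ, which is exactly the expressivity the conjecture says sparse rewards fail to induce during training.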

References

"We conjecture that the state-based OGBench domains are challenging for one-step FB because the sparse reward function (goal-conditioned indicator rewards) induces a single backward representation."

Can We Really Learn One Representation to Optimize All Rewards?  (2602.11399 - Zheng et al., 11 Feb 2026) in Section 5.3 (Comparing One-Step FB to Prior Unsupervised RL Methods)