Zero-shot adaptation performance of non-hierarchical baselines

Determine how standard model-free reinforcement learning baselines, such as Rainbow DQN and goal-conditioned DQN, perform zero-shot adaptation to previously mastered goals when the agent starts from a novel initial state. These baselines lack both hierarchical options and an abstract world model for planning; the task is to rigorously characterize their capabilities and limitations under these conditions.

Background

The paper demonstrates that AgentOWL can achieve previously mastered goals from novel starting states by composing existing hierarchical options and planning within its abstract world model. This zero-shot capability is showcased in Private Eye by adding an option to navigate back to the original starting state and then composing learned options without additional training.
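The composition described above can be sketched as graph search over an abstract state graph whose edges are learned options. This is a minimal illustration of the idea, not the paper's implementation; the graph, state names, and option names below are all invented for the example:

```python
from collections import deque

def plan_with_options(abstract_graph, start, goal):
    """Breadth-first search over abstract states, where each edge is a
    learned option. Returns the shortest option sequence reaching `goal`
    from `start`, or None if the goal is unreachable."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for option, next_state in abstract_graph.get(state, []):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, plan + [option]))
    return None

# Illustrative abstract graph: a "go_to_start" option links a novel
# initial state back to the original start, after which previously
# learned options compose to reach the goal (cf. the Private Eye setup).
graph = {
    "novel_start": [("go_to_start", "original_start")],
    "original_start": [("opt_a", "room_1")],
    "room_1": [("opt_b", "goal_room")],
}
print(plan_with_options(graph, "novel_start", "goal_room"))
# → ['go_to_start', 'opt_a', 'opt_b']
```

The key property this captures is that no new training is needed: adding one edge (the return-to-start option) makes every previously mastered goal reachable by recombining existing options.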

In contrast, the authors note uncertainty about how baseline agents lacking hierarchical options and abstract world-model planning would handle zero-shot adaptation. The baselines discussed, Rainbow DQN and goal-conditioned DQN, have neither the compositional skill hierarchy nor the planning capabilities that AgentOWL leverages. Clarifying their performance would help delineate the benefits and limitations of hierarchical, model-based approaches relative to standard model-free methods when generalizing to novel situations.
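For contrast, a model-free goal-conditioned agent can only be queried zero-shot through its value function: it picks the greedy action for the commanded goal from whatever state it finds itself in, with no planning or option composition. The tabular sketch below (illustrative only; the paper's baselines are deep Q-networks, and the corridor environment is invented here) shows that protocol:

```python
class GoalConditionedQ:
    """Minimal tabular goal-conditioned Q-learner.
    Q-values are indexed by (state, goal, action)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.q = {}
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def value(self, s, g, a):
        return self.q.get((s, g, a), 0.0)

    def act(self, s, g):
        # Greedy action for the commanded goal. This is the entire
        # zero-shot interface: no planning, no extra training.
        return max(self.actions, key=lambda a: self.value(s, g, a))

    def update(self, s, g, a, r, s2):
        best = max(self.value(s2, g, b) for b in self.actions)
        key = (s, g, a)
        self.q[key] = self.value(s, g, a) + self.alpha * (
            r + self.gamma * best - self.value(s, g, a))

# Train on a 5-state corridor (states 0..4) with goal state 4,
# sweeping all (state, action) pairs deterministically.
agent = GoalConditionedQ(actions=[-1, +1])
for _ in range(50):
    for s in range(5):
        for a in agent.actions:
            s2 = min(max(s + a, 0), 4)
            agent.update(s, 4, a, 1.0 if s2 == 4 else 0.0, s2)

# Zero-shot query from state 2 (any state works once Q has converged,
# because the state was covered during training).
print(agent.act(2, 4))  # → 1
```

The limitation this makes concrete: the agent generalizes only to the extent that its value function covers the novel initial state; unlike AgentOWL, it has no mechanism to compose a "return to familiar territory" behavior with previously learned skills.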

References

It is unclear, on the other hand, how other baselines would perform zero-shot adaptation to novel situations without hierarchical options to compose sub-options and an abstract world model to plan on.

Joint Learning of Hierarchical Neural Options and Abstract World Model (2602.02799 - Piriyakulkij et al., 2 Feb 2026), in Experimental Results, Zero-shot generalization to novel situations