Merging pixel-derived abstract world models with hierarchical option training

Develop an integrated approach that jointly learns abstract symbolic world models directly from pixel inputs (e.g., VisualPredicator, ExoPredicator, or predicate learning from vision) together with hierarchical neural option training, enabling end-to-end acquisition of neurosymbolic world models and compositional skills from raw visual observations.
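A minimal wiring sketch of such an integration, assuming three hypothetical components: a CNN "predicate grounder" mapping pixels to soft predicate truth values, a learned abstract transition model over those predicates, and option-conditioned low-level policies that share the same abstract state. All names, architectures, and losses below are illustrative assumptions, not the method of the cited paper or of VisualPredicator/ExoPredicator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_PRED, N_OPT, N_ACT, B = 16, 4, 6, 8  # illustrative sizes

class PredicateGrounder(nn.Module):
    """Maps an 84x84 grayscale frame to N_PRED soft truth values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, N_PRED), nn.Sigmoid())

    def forward(self, pixels):
        return self.net(pixels)  # (B, N_PRED) in [0, 1]

class AbstractModel(nn.Module):
    """Predicts the next abstract state given (abstract state, option)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PRED + N_OPT, 64), nn.ReLU(),
            nn.Linear(64, N_PRED), nn.Sigmoid())

    def forward(self, z, opt_onehot):
        return self.net(torch.cat([z, opt_onehot], dim=-1))

class OptionPolicy(nn.Module):
    """Low-level action logits from abstract state, conditioned on option."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PRED + N_OPT, 64), nn.ReLU(), nn.Linear(64, N_ACT))

    def forward(self, z, opt_onehot):
        return self.net(torch.cat([z, opt_onehot], dim=-1))

grounder, model, policy = PredicateGrounder(), AbstractModel(), OptionPolicy()
params = [*grounder.parameters(), *model.parameters(), *policy.parameters()]
optim = torch.optim.Adam(params, lr=1e-4)

# Dummy transition batch (obs_t, option, action, obs_t+1); a real agent
# would collect these by executing options in the environment.
obs_t, obs_tp1 = torch.rand(B, 1, 84, 84), torch.rand(B, 1, 84, 84)
opt_oh = F.one_hot(torch.randint(0, N_OPT, (B,)), N_OPT).float()
act = torch.randint(0, N_ACT, (B,))

optim.zero_grad()
z_t, z_tp1 = grounder(obs_t), grounder(obs_tp1)
# World-model consistency: the abstract model must predict the grounder's
# next abstract state (target detached, as in latent-space world models).
model_loss = F.binary_cross_entropy(model(z_t, opt_oh), z_tp1.detach())
# Stand-in option loss (behavior cloning here; an RL objective in practice)
# makes the shared predicates useful for control as well as prediction.
policy_loss = F.cross_entropy(policy(z_t, opt_oh), act)
(model_loss + policy_loss).backward()
optim.step()
```

The design point illustrated is that the abstract world model and the option policies consume the same learned predicates, so gradients from both prediction and control jointly shape the pixel-to-symbol grounding end to end.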

Background

AgentOWL currently assumes symbolic input states (extracted via OCAtari) to facilitate sample-efficient abstract world modeling with PoE-World and to keep the focus on the skill-learning problem. This design choice avoids the complexities of representation learning from pixels but limits applicability in settings that provide only raw visual observations.
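For contrast, a toy illustration of what such symbolic input looks like: an object-centric state (a list of typed objects with positions, of the kind OCAtari-style extraction provides) over which abstract predicates are trivial to evaluate. The class and predicate below are hypothetical, not AgentOWL's or PoE-World's actual representations.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    category: str
    x: int
    y: int

def above(a: Obj, b: Obj) -> bool:
    """Example hand-written abstract predicate over object-centric state."""
    return a.y < b.y  # screen coordinates grow downward

state = [Obj("Player", 76, 180), Obj("Ball", 80, 95)]
player, ball = state
print(above(ball, player))  # True: the ball is above the player
```

Learning from pixels removes this given object structure, which is exactly what pixel-to-symbol methods must recover before option training can proceed.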

Recent work aims to learn abstract symbolic world models directly from pixels, offering a path to bridge perception and abstract decision-making. The authors explicitly state that combining such pixel-to-symbol methods with hierarchical option training remains an open direction, highlighting a key integration challenge for end-to-end neurosymbolic agents.

References

There is ongoing effort to learn abstract symbolic world models directly from pixels, but merging that line of work with option training remains open.

Joint Learning of Hierarchical Neural Options and Abstract World Model (arXiv:2602.02799, Piriyakulkij et al., 2 Feb 2026), in Section: Limitations and Future Direction