Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time
The paper "Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time" presents a methodological advance in solving Partially Observable Markov Decision Processes (POMDPs). It integrates temporal logical reasoning with POMDP planning to improve decision-making under uncertainty by leveraging macro-actions.
Methodology
The authors propose using Linear Temporal Logic (LTL) fragments based on Event Calculus (EC) to generate persistent macro-actions, which substantially reduce inference time while maintaining robust performance. Central to the methodology is the integration of these macro-actions with POMDP solvers such as Partially Observable Monte Carlo Planning (POMCP) and Determinized Sparse Partially Observable Tree (DESPOT). Inductive Logic Programming (ILP) is used to learn the macro-actions from a small number of execution traces, removing the need for complex manual heuristics.
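The trace-to-rule direction can be illustrated with a toy stand-in for the ILP step. The sketch below mines a "while feature holds, do action" rule for each boolean state feature by simple frequency counting; the function name, the feature strings, and the frequency criterion are all illustrative assumptions, since the paper's actual learner induces logical EC theories rather than counting co-occurrences.

```python
from collections import Counter

# Toy stand-in for the paper's ILP step: from execution traces, extract the
# action most frequently taken while each boolean state feature holds.
# Real ILP over Event Calculus learns logical rules; this frequency
# heuristic only illustrates the trace -> rule direction.

def mine_persistent_rules(traces, min_support=2):
    """traces: iterable of (features, action) pairs, where features is a
    set of strings. Returns {feature: action} rules, each read as
    'while feature holds, keep doing action'."""
    counts = {}  # feature -> Counter of actions taken while it held
    for features, action in traces:
        for f in features:
            counts.setdefault(f, Counter())[action] += 1
    rules = {}
    for f, c in counts.items():
        action, support = c.most_common(1)[0]
        if support >= min_support:  # keep only sufficiently supported rules
            rules[f] = action
    return rules
```

A frequency threshold (`min_support`) plays the role that coverage constraints play in a real ILP system: rules must explain enough of the observed traces to be kept.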
This approach extends the authors' previous methodology, using Answer Set Programming (ASP) to learn EC theories that provide a more expressive representation of POMDP dynamics through temporal abstraction. The learned EC theories serve as interpretable heuristics that guide exploration in MCTS-based planners, improving computational efficiency and extending the effective planning horizon.
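The guidance mechanism can be sketched as follows: a persistent macro-action repeats a base action while a learned temporal condition (in the spirit of an EC "holds" predicate) remains true, and a rollout policy prefers applicable macro-actions over uniform random exploration. This is a minimal sketch under stated assumptions; the class and function names (`MacroAction`, `biased_rollout`, `step`) are illustrative, not the paper's API.

```python
import random

class MacroAction:
    """A base action persisted while a learned condition holds."""
    def __init__(self, action, persist_condition):
        self.action = action
        self.persist = persist_condition  # callable: state -> bool

    def next_action(self, state):
        # Return the base action while the condition holds, else None.
        return self.action if self.persist(state) else None

def biased_rollout(state, macros, base_actions, step, max_depth, discount=0.95):
    """Monte-Carlo rollout that prefers applicable macro-actions.

    step: callable (state, action) -> (next_state, reward, done),
    standing in for the POMDP simulator used by an MCTS planner.
    Returns the discounted return of the rollout."""
    total, gamma = 0.0, 1.0
    for _ in range(max_depth):
        action = None
        for m in macros:
            action = m.next_action(state)
            if action is not None:
                break
        if action is None:  # no macro applies: fall back to random exploration
            action = random.choice(base_actions)
        state, reward, done = step(state, action)
        total += gamma * reward
        gamma *= discount
        if done:
            break
    return total
```

Because the macro-action commits to one action over many simulated steps, each rollout covers more of the horizon with fewer branching decisions, which is one intuition for the reported reduction in inference time.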
Experimental Findings
The paper demonstrates the efficacy of the proposed method in the Pocman and Rocksample benchmark scenarios. In these domains, learned macro-actions exhibited notable computational efficiency improvements compared to both handcrafted and neural network-derived heuristics.
In Rocksample, temporal heuristics proved more expressive, capturing directional constraints tied to rock-sampling tasks, while remaining computationally advantageous. In Pocman, persistent macro-actions effectively guided agents over extended planning horizons, outperforming local heuristics and, at times, even handcrafted ones.
In comparison to state-of-the-art neural architectures for large-scale POMDPs, the proposed methodology offered superior performance in average discounted returns and generalization capabilities, alongside notable reductions in computational and data requirements.
Implications and Future Work
The implications of this work are twofold: practically, it offers a more scalable and interpretable approach to POMDP planning, reducing computational overhead while improving policy performance; theoretically, it underscores the potential of combining symbolic reasoning with probabilistic methods in complex decision-making environments, promoting interpretability and transparency in AI systems.
Future developments may involve expanding this framework to accommodate more intricate logical representations, such as full Linear Temporal Logic, and applying it to a broader spectrum of real-world tasks, including robotics and autonomous systems. The exploration of continuous domains and further refinement of logical predicate definitions derived from environment models present promising avenues for extending the capabilities and applicability of this methodology.
This paper makes a significant contribution to the ongoing efforts to bridge the gap between symbolic logic and automated decision-making in partially observable environments, offering insightful pathways for future research and development in AI planning strategies.