Discrete Hierarchical Planning (DHP)
- Discrete Hierarchical Planning is a framework that decomposes complex planning tasks into tractable subproblems using discrete, hierarchical abstractions.
- It integrates techniques from hybrid generative models, hierarchical MDPs, and symbolic planning to effectively manage sparse rewards and large state spaces.
- Empirical results demonstrate enhanced sample efficiency and rapid skill transfer in robotics and AI, validating the approach for long-horizon control.
Discrete Hierarchical Planning (DHP) is a class of planning algorithms and modeling frameworks that leverage hierarchical task, skill, or mode abstractions at a discrete level, enabling scalable, sample-efficient solutions for long-horizon decision-making, planning, and control. DHP synthesizes ideas from hybrid generative models, hierarchical MDPs, discrete skill libraries, automated planning languages, and hierarchical reinforcement learning. Approaches in this domain have demonstrably advanced planning in robotics, reinforcement learning, and classical AI, especially in environments with sparse rewards, large or unstructured state spaces, and multi-level temporal abstraction.
1. Foundations and Formal Structures
DHP systems are characterized by multi-level architectures where planning at a higher level unfolds over discrete abstractions—such as options, intentions, modes, or compound tasks—while lower levels handle continuous execution or primitive actions. The separation of concerns enables tractability and interpretability in otherwise intractable or high-dimensional systems.
Discrete hierarchical abstraction can be instantiated in multiple ways:
- Hybrid generative models: e.g., recurrent Switching Linear Dynamical Systems (rSLDS), where continuous dynamics are governed by discrete mode variables (Collis et al., 2024).
- Hierarchical MDPs and options: Construction of abstract MDPs where each “state” is a discrete region (mode, intention, or skill) and transitions are governed by learned or specified adjacency, reward, and uncertainty models (Collis et al., 2024, Ha et al., 2020, Morere et al., 2019, Guestrin et al., 2012, Sharma et al., 4 Feb 2025).
- Hierarchical Task Networks (HTN): Hierarchically decomposed planning languages (HDDL, HDDL 2.1) enable symbolic DHP by specifying abstract and primitive tasks, decomposition methods, subtask ordering, and constraints (Höller et al., 2019, Pellier et al., 2022).
- Hierarchical RL planners and advantage estimators: Binary reachability, subgoal trees, and discrete branching structures for compositional plans and return estimation (Sharma et al., 4 Feb 2025).
The essential structural property across frameworks is the explicit or implicit constraint that higher-level nodes control when and how low-level controllers, skills, or actions are invoked, thus supporting both temporal abstraction and sample-efficient exploration.
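The separation of concerns described above can be sketched as a minimal two-level loop on a toy 1-D chain: the high level proposes discrete subgoals, and the low level emits primitive unit steps until each subgoal is reached. All names and the `chunk` horizon below are illustrative assumptions, not any specific paper's method.

```python
# Toy two-level planner/executor: high level picks discrete subgoals,
# low level executes primitive actions toward the current subgoal.

def high_level_policy(state, goal, chunk=5):
    """Choose the next subgoal: move at most `chunk` cells toward the goal."""
    step = max(-chunk, min(chunk, goal - state))
    return state + step

def low_level_policy(state, subgoal):
    """Primitive action: one unit step toward the subgoal."""
    return state + (1 if subgoal > state else -1)

state, goal, trace = 0, 12, []
while state != goal:
    subgoal = high_level_policy(state, goal)   # temporal abstraction
    while state != subgoal:                    # option runs to termination
        state = low_level_policy(state, subgoal)
        trace.append(state)
```

The high level here reasons only about subgoal choice; how each subgoal is reached is entirely the low level's concern, which is the tractability and interpretability benefit the section describes.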
2. Model Architectures and Learning Principles
DHP approaches are unified by their reliance on discrete, symbolic, or categorical structures at higher planning levels, linked via learned or engineered transition, reward, and feasibility models. Notable realizations include:
- rSLDS Planning: The rSLDS parameterizes both the continuous state transitions,

$$x_{t+1} = A_{z_t} x_t + b_{z_t} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Sigma_{z_t}),$$

and the discrete mode transitions via a softmax over the current continuous state,

$$p(z_{t+1} = k \mid x_t) \propto \exp\!\big(R_k^{\top} x_t + r_k\big).$$

Model learning uses conjugate matrix-normal-inverse-Wishart priors and Laplace-Variational EM, alternately inferring latent states and updating parameters (Collis et al., 2024).
- Low-Dimensional Latent Planning: Latent variable models encode high-dimensional observations into compact latent states and plan discrete "intentions" by simulating latent transitions. Planning proceeds in this space using particle filtering and reward shaping (Ha et al., 2020).
- Discrete Option Construction and Skill Libraries: Hierarchical abstraction operates by identifying (and recursively constructing) abstract actions (“skills” or “options”) with local precondition-effect structure, enabling backward planning and rapid skill reuse (Morere et al., 2019).
- Recursive Subgoal Trees and Reachability: A policy decomposes a long-horizon goal into a binary tree of subgoals, with each node corresponding to a reachability test over a finite, fixed horizon. Tree-shaped return estimators favor both completeness and plan brevity (Sharma et al., 4 Feb 2025).
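The rSLDS structure in the first bullet above can be sketched as a short forward simulation: mode-dependent linear maps for the continuous state, and a state-dependent softmax gate for the discrete mode. The matrices, gate parameters, and noise scale below are invented toy values, not a learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-mode, 2-D switching system (illustrative parameters only).
A = np.array([[[0.95, 0.0], [0.0, 0.95]],   # mode 0: contract toward origin
              [[1.0, 0.1], [-0.1, 1.0]]])   # mode 1: slow rotation
b = np.array([[0.0, 0.0], [0.1, 0.0]])      # mode-dependent offsets
R = np.array([[1.0, 0.0], [-1.0, 0.0]])     # softmax gate weights (K x D)
r = np.array([0.0, 0.0])                    # softmax gate biases

def step(x, z):
    """One rSLDS step: sample the next discrete mode from a softmax over x,
    then propagate the continuous state under that mode's linear map."""
    logits = R @ x + r
    p = np.exp(logits - logits.max())
    p /= p.sum()
    z_next = int(rng.choice(len(p), p=p))
    x_next = A[z_next] @ x + b[z_next] + 0.01 * rng.standard_normal(2)
    return x_next, z_next

x, z = np.array([1.0, 0.0]), 0
traj = [(x.copy(), z)]
for _ in range(50):
    x, z = step(x, z)
    traj.append((x.copy(), z))
```

The piecewise-linear regions induced by the gate are exactly the discrete high-level behavioral units that the planner treats as abstract states.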
3. Planning Algorithms and Execution Mechanisms
Central to DHP is the decomposition of complex planning into tractable subproblems via discrete abstraction:
- High-Level Discrete MDP Planning: Given discrete modes (options, intentions, skills), a high-level Bayesian MDP is constructed in which actions select target modes or subgoals. Planning minimizes cumulative costs and includes information-theoretic exploration bonuses (parameter and state information gain), leading to active uncertainty reduction (Collis et al., 2024). The objective may be written as

$$\pi^{*} = \arg\min_{\pi} \; \mathbb{E}_{\pi}\!\Big[\sum_{t} c(s_t, a_t) - \beta_{\theta}\,\mathrm{IG}_{\theta}(s_t, a_t) - \beta_{s}\,\mathrm{IG}_{s}(s_t, a_t)\Big],$$

where the IG terms reward expected parameter and state information gain.
- Low-Level Controllers: Primitive actions and fine-grained continuous control are encapsulated in controllers such as LQRs (one per ordered mode pair, with precomputed Riccati gains cached for efficient deployment), neural feedback policies conditioned on intention embeddings, or primitive action models (Collis et al., 2024, Ha et al., 2020).
- Discrete Hierarchical Backward Planning: In symbolic (HTN or skill-based) regimes, the planner regresses from goal specifications through skill effects, recursively generating subplans to achieve preconditions. Aggressive hierarchy is enforced by bounding recursion depth or favoring long abstract skills (Morere et al., 2019).
- Tree-Structured Plan Expansion: Recursive binary decomposition, as in hierarchical RL, builds a planning tree where each subtask must be feasible under the lower-level policy in a bounded number of steps. Achievement is verified by an explicit reachability check rather than value approximation (Sharma et al., 4 Feb 2025).
- Distributed Planning in Hierarchical MDPs: In multi-agent or factored settings, message-passing algorithms coordinate local plans via reward-sharing over tree-structured decompositions, yielding globally consistent solutions with reuse of flows and value functions among isomorphic subproblems (Guestrin et al., 2012).
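The per-mode LQR controllers with cached Riccati gains can be sketched as follows; the dynamics matrices and cost weights are toy placeholders, and the simple fixed-point iteration stands in for whatever Riccati solver an actual implementation would use.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR feedback gain K (for u = -K x), obtained by
    iterating the Riccati recursion to a fixed point."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# One gain per mode (or ordered mode pair), computed once and cached.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator dynamics
B = np.array([[0.0], [0.1]])
K = dlqr_gain(A, B, np.eye(2), np.array([[1.0]]))
```

Caching such gains makes low-level execution a cheap matrix-vector product at deployment time, which is the efficiency argument behind precomputing them.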
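The backward regression through skill effects and preconditions can be illustrated on a toy pick-and-place domain; the skill names, facts, and depth bound are all hypothetical.

```python
# Minimal backward (regression) planner over skills annotated with
# precondition and effect sets (illustrative domain only).

SKILLS = {
    "pick":  {"pre": {"reachable"}, "eff": {"holding"}},
    "move":  {"pre": set(),         "eff": {"reachable"}},
    "place": {"pre": {"holding"},   "eff": {"placed"}},
}

def backward_plan(goal, state, depth=0):
    """Regress from the goal: pick a skill achieving an unmet goal fact,
    recursively plan its preconditions, then append the skill itself."""
    if depth > 10:                      # bounded recursion depth
        return None
    missing = goal - state
    if not missing:
        return []
    fact = next(iter(missing))
    for name, skill in SKILLS.items():
        if fact in skill["eff"]:
            sub = backward_plan(skill["pre"], state, depth + 1)
            if sub is not None:
                reached = state | set().union(*(SKILLS[s]["eff"] for s in sub))
                rest = backward_plan(goal, reached | skill["eff"], depth + 1)
                if rest is not None:
                    return sub + [name] + rest
    return None

plan = backward_plan({"placed"}, set())
```

Starting from an empty world state, regression first satisfies `holding`, which in turn requires `reachable`, yielding the plan `["move", "pick", "place"]`.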
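The tree-structured plan expansion with explicit reachability checks can be sketched on a 1-D line world, where a midpoint heuristic stands in for a learned subgoal proposer and a step-count bound stands in for the lower-level policy's feasibility test.

```python
# Recursive binary subgoal decomposition with a bounded-horizon
# reachability check (toy 1-D world; names are illustrative).

HORIZON = 4  # max primitive steps the low-level policy is trusted for

def reachable(a, b):
    """Toy reachability check: low-level policy covers <= HORIZON steps."""
    return abs(b - a) <= HORIZON

def plan(start, goal, depth=0, max_depth=10):
    """Split (start, goal) into a binary tree of subgoals until every
    leaf segment passes the reachability test; return the waypoints."""
    if reachable(start, goal):
        return [start, goal]
    if depth >= max_depth:
        raise RuntimeError("goal not decomposable within depth budget")
    mid = (start + goal) // 2          # toy subgoal proposal: the midpoint
    left = plan(start, mid, depth + 1, max_depth)
    right = plan(mid, goal, depth + 1, max_depth)
    return left + right[1:]            # splice, dropping duplicate midpoint

waypoints = plan(0, 20)
```

Every consecutive pair of waypoints is individually feasible for the low-level policy, so the long-horizon task is achieved without any single long-horizon value estimate.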
4. Temporal Abstraction, Exploration Strategies, and Task Discovery
DHP supports temporal abstraction and sample-efficient exploration by identifying, validating, and exploiting discrete subgoals and skill boundaries:
- Subgoal/Option Discovery: Discrete modes or intentions are mapped to polyhedral regions in continuous or latent space; each transition is associated with a temporally extended "option" or skill whose completion triggers re-planning (Collis et al., 2024, Ha et al., 2020). Targets are chosen via gradient ascent in parameterized softmax transition models.
- Curriculum and Skill Refinement: New abstract skills and their success conditions are directly learned from successful trajectories, forming hierarchical DAGs for recursive skill application. Curriculum learning schedules goal complexity to ensure skill sets expand as needed (Morere et al., 2019).
- Information-Theoretic and Intrinsic Exploration: Planning objectives include information-gain bonuses (KL-divergence over Dirichlet counts, transition entropy), and exploration agents may be intrinsically rewarded for high reconstruction error under contrastive or variational models, thereby generating new, informative training examples not reliant on expert data (Collis et al., 2024, Sharma et al., 4 Feb 2025).
- Advantage and Return Estimation: Specialized estimators (e.g., the “min-tree” return) ensure that shorter, complete plans are favored and that partial solutions are not rewarded. The min-tree operator is a contraction and admits stable policy-gradient updates (Sharma et al., 4 Feb 2025).
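The Dirichlet-count information-gain bonus mentioned above has a closed form: the expected KL divergence between the posterior after one more observed transition and the current posterior. Below is a sketch using `scipy.special` for the log-gamma and digamma terms; how the bonus is weighted into the planning objective is an implementation choice of each system.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(a, b):
    """Closed-form KL( Dir(a) || Dir(b) )."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return (gammaln(a.sum()) - gammaln(a).sum()
            - gammaln(b.sum()) + gammaln(b).sum()
            + ((a - b) * (digamma(a) - digamma(a.sum()))).sum())

def expected_info_gain(alpha):
    """Expected KL between the Dirichlet posterior after one additional
    observed transition and the current posterior (parameter IG)."""
    alpha = np.asarray(alpha, float)
    probs = alpha / alpha.sum()          # posterior-predictive next mode
    gains = [dirichlet_kl(alpha + np.eye(len(alpha))[k], alpha)
             for k in range(len(alpha))]
    return float(probs @ gains)
```

A rarely visited transition (small counts) yields a larger expected gain than a well-characterized one, so the bonus drives the planner toward active uncertainty reduction.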
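The preference for short, complete plans can be illustrated with a toy min-tree return: a plan's value is the minimum over its subtree values, discounted per level of depth, so one infeasible leaf zeroes the whole plan and needless splits pay extra discount. This is an illustrative formulation, not the exact estimator of the cited work.

```python
def min_tree_return(node, gamma=0.95):
    """node is either a float leaf (reachability confidence in [0, 1]) or a
    (left, right) pair of subtrees; value = discounted min over children."""
    if isinstance(node, tuple):
        left, right = node
        return gamma * min(min_tree_return(left, gamma),
                           min_tree_return(right, gamma))
    return node

shallow = (1.0, 1.0)                 # one split, both halves reachable
deep = ((1.0, 1.0), (1.0, 1.0))     # same plan with a needless extra split
broken = (1.0, 0.0)                  # one infeasible leaf poisons the plan
```

Because the operator takes a min rather than a sum, partially complete plans receive no credit, matching the "no partial solutions" property described above.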
5. Representational Formalisms and Expressivity
DHP is realized in both statistical and symbolic planning formalisms:
- PDDL/HDDL, HTN Extensions: Languages such as HDDL and HDDL 2.1 enable explicit encoding of hierarchical tasks, methods, and primitive actions, with partial or total ordering, variable-constraint logic, and (in HDDL 2.1) durative actions, numeric fluents, and complex temporal constraints. These models undergird symbolic planners for domains with concurrency, multi-agent coordination, and hybrid temporal structure (Höller et al., 2019, Pellier et al., 2022).
- Latent Variable, CVAE, and RSSM Implementations: For high-dimensional or unstructured domains (visual planning), latent state representations are constructed via variational methods, and reachability is evaluated as cosine similarity in a compact state or transition space, avoiding direct value regression and reducing sample complexity (Sharma et al., 4 Feb 2025, Ha et al., 2020).
- Hybrid Models and Polyhedral Partitioning: In rSLDS and related models, piecewise-linear regions of state space correspond to discrete high-level behavioral units, supporting both model-based planning and model-free control (Collis et al., 2024).
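The cosine-similarity reachability test over latent embeddings can be sketched in a few lines; the embedding vectors and threshold below are placeholders for a learned encoder's outputs and a tuned hyperparameter.

```python
import numpy as np

def cosine_reachable(z_state, z_goal, threshold=0.9):
    """Score reachability as cosine similarity between latent embeddings
    instead of regressing a value function (a sketch under assumed
    embeddings)."""
    sim = float(z_state @ z_goal /
                (np.linalg.norm(z_state) * np.linalg.norm(z_goal)))
    return sim >= threshold, sim

ok, sim = cosine_reachable(np.array([1.0, 0.1]), np.array([1.0, 0.0]))
```

Because the check is a bounded similarity rather than an unbounded value estimate, it sidesteps the regression targets that typically inflate sample complexity.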
6. Empirical Results and Theoretical Guarantees
DHP frameworks achieve marked improvements in both sample efficiency and planning quality:
- Continuous Mountain Car: rSLDS-based DHP achieves near-complete state-space coverage within 10k steps (vs. 20% coverage without IG bonuses) and solves the sparse-reward goal within a handful of episodes, outperforming SAC and standard actor-critic methods, which fail to solve it within 20 episodes (Collis et al., 2024).
- Long-Horizon Visual Navigation: DHP delivers a 99% success rate and a 71-step average episode length on 25-room maze planning from visual observations, compared to 82% and 158 steps for the best prior method (Sharma et al., 4 Feb 2025).
- Symbolic Planning and Robotic Transfer: Hierarchical planners with effect/condition skill annotations solve environments with very large state spaces, with plan lengths reduced from 73 primitive steps to a short sequence of abstract skills and planning time cut from seconds to milliseconds; skills trained in simulation transfer directly to real-robot manipulation (Morere et al., 2019).
- Distributed and Factored MDPs: Message passing in hierarchical MDPs scales planning to large, multi-agent or multi-room settings, reusing cached flows and message tables among repeated classes and instances (Guestrin et al., 2012).
- Theoretical Guarantees: The min-tree and related operators are contraction mappings, ensuring stable convergence of value and policy iterates in tree-structured hierarchical RL (Sharma et al., 4 Feb 2025).
7. Impact, Limitations, and Forward Directions
DHP represents a crosscutting advance in both practical AI planning and the theory of hierarchical control. The integration of discrete abstraction with learned and engineered models addresses the curse of dimensionality and long-horizon credit assignment. However, limitations remain, including sensitivity to representation quality (latent spaces, adjacencies), the need for robust continuous dynamics models, and restrictions inherited from the expressivity of underlying planning languages.
Future directions include combining DHP with richer temporal and symbolic reasoning (e.g., hold-between, numeric fluents in HDDL 2.1 (Pellier et al., 2022)), extending reachability estimation to text or high-level specification spaces, and deploying DHP variants in real-time, safety-critical control for robotic and multi-agent domains.
References:
- (Collis et al., 2024) Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control
- (Ha et al., 2020) Distilling a Hierarchical Policy for Planning and Control via Representation and Reinforcement Learning
- (Morere et al., 2019) Learning to Plan Hierarchically from Curriculum
- (Guestrin et al., 2012) Distributed Planning in Hierarchical Factored MDPs
- (Höller et al., 2019) HDDL -- A Language to Describe Hierarchical Planning Problems
- (Pellier et al., 2022) HDDL 2.1: Towards Defining an HTN Formalism with Time
- (Sharma et al., 4 Feb 2025) DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents