Feasibility-Guided Planning over Multi-Specialized Locomotion Policies

Published 8 Feb 2026 in cs.RO | (2602.07932v1)

Abstract: Planning over unstructured terrain presents a significant challenge in the field of legged robotics. Although recent works in reinforcement learning have yielded various locomotion strategies, planning over multiple experts remains a complex issue. Existing approaches encounter several constraints: traditional planners are unable to integrate skill-specific policies, whereas hierarchical learning frameworks often lose interpretability and require retraining whenever new policies are added. In this paper, we propose a feasibility-guided planning framework that successfully incorporates multiple terrain-specific policies. Each policy is paired with a Feasibility-Net, which learns to predict feasibility tensors from local elevation maps and task vectors. This integration allows classical planning algorithms to derive optimal paths. Through both simulated and real-world experiments, we demonstrate that our method efficiently generates reliable plans across diverse and challenging terrains, while consistently aligning with the capabilities of the underlying policies.


Summary

The paper presents a novel framework that fuses multi-specialized RL policies with Feasibility-Net models for explicit feasibility estimation.
It uses a sliding window to convert elevation maps into directional feasibility tensors, enabling dynamic policy selection via graph-based planning.
The framework achieves a 100% success rate in simulation at every policy training stage and 70% success in real-world trials, demonstrating robust, adaptable navigation.

Feasibility-Guided Planning Framework for Multi-Specialized Locomotion Policies

Motivation and Problem Formulation

Robotic navigation over unstructured terrain is fundamentally constrained by the diversity and specialization of the underlying locomotion policies. As RL and MPC produce a growing number of terrain-specific strategies, classical planners become inadequate: they rely on generic costmaps and hard thresholds, which fail to capture what each specialized policy can actually traverse. Hierarchical RL approaches can chain and modulate skills, but they lose interpretability and must be completely retrained to incorporate a new policy, which limits their scalability. The presented framework addresses the need for a scalable, interpretable, and adaptable planning approach by incorporating multi-specialized locomotion policies through policy-specific feasibility representations.

Figure 1: Feasibility-guided planning enables optimal path selection and policy switching over mixed terrain via policy-specific feasibility representations.

Framework Architecture and Joint Training Paradigm

The core of the methodology is the concurrent optimization of locomotion policies and their paired Feasibility-Net models. Each policy specializes in a distinct terrain configuration and is trained with a PPO actor-critic architecture. Feasibility-Net uses supervised regression to predict the policy's velocity-tracking reward, expressed as a normalized feasibility score. These predictions are modulated by a variational autoencoder (VAE) branch that models the terrain height distribution, enabling out-of-distribution (OOD) detection and more robust deployment on unseen terrain. The joint loss combines feasibility prediction and distribution modeling, so both are learned from the same rollouts without a separate data pipeline, which simplifies integrating new policies.
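As a rough illustration of the joint objective, the NumPy sketch below computes a normalized feasibility target from velocity tracking and combines a feasibility regression term with a standard VAE loss (reconstruction plus KL divergence). The exponential tracking-reward form, the tolerance `sigma`, the weights `beta` and `w_vae`, and the function names are assumptions for illustration, not the paper's exact formulation; in a real training loop these terms would be computed on network outputs inside an autodiff framework.

```python
import numpy as np

def tracking_feasibility(v_cmd, v_actual, sigma=0.25):
    """Hypothetical feasibility target: an exponential velocity-tracking reward in (0, 1].

    v_cmd, v_actual: arrays of shape (N, 2) with planar velocity commands and the
    velocities the policy actually achieved. sigma is an assumed tracking tolerance.
    """
    err = np.sum((v_cmd - v_actual) ** 2, axis=-1)
    return np.exp(-err / sigma)

def joint_loss(f_pred, f_target, h_recon, h_true, mu, logvar, beta=1e-3, w_vae=1.0):
    """Feasibility regression plus VAE terrain-modeling loss (weights are assumptions)."""
    feas_mse = np.mean((f_pred - f_target) ** 2)
    recon = np.mean((h_recon - h_true) ** 2)
    # KL divergence between N(mu, exp(logvar)) and a standard normal prior, batch-averaged
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=-1))
    return feas_mse + w_vae * (recon + beta * kl)
```

Perfect tracking yields a target of 1.0, and the target decays smoothly as tracking error grows, which keeps the regression target bounded regardless of terrain.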

Figure 2: Overview of the feasibility-aware planning framework, where policies and Feasibility-Net models are jointly trained using shared environment rollouts, allowing integrated feasibility prediction and terrain modeling.

Elevation Map Transformation and Directional Feasibility Representation

Deployment hinges on transforming elevation maps into directional feasibility tensors. A sliding window extracts local heightmap patches, each patch is paired with a synthetic velocity command vector, and the patches are systematically rotated to produce feasibility estimates for multiple directions. The resulting tensors encode how well each policy can move along each discrete direction at each location, weighted by terrain familiarity through OOD weights derived from the VAE reconstruction error.
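The sliding-window transformation can be sketched as follows. This is a minimal illustration under several assumptions: `predict` and `recon_error` are hypothetical stand-ins for a trained Feasibility-Net and its VAE branch, only four directions are produced (via `np.rot90`, whereas the paper may use a finer discretization), the window size and stride are arbitrary, and the OOD weight `exp(-error / tau)` is an assumed form.

```python
import numpy as np

def feasibility_tensor(elev, predict, recon_error, win=9, stride=1, tau=0.05):
    """Build an (H', W', 4) directional feasibility tensor from an elevation map.

    predict(patch, cmd) -> float in [0, 1] and recon_error(patch) -> float are
    hypothetical callables. Direction index k corresponds to rotating the patch
    by 90*k degrees so that heading lines up with the fixed forward command.
    """
    cmd = np.array([1.0, 0.0])  # synthetic forward velocity command (assumed)
    H, W = elev.shape
    rows = range(0, H - win + 1, stride)
    cols = range(0, W - win + 1, stride)
    out = np.zeros((len(rows), len(cols), 4))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            patch = elev[r:r + win, c:c + win]
            # OOD weight shrinks as the VAE reconstruction error grows (assumed form)
            w_ood = np.exp(-recon_error(patch) / tau)
            for k in range(4):
                out[i, j, k] = w_ood * predict(np.rot90(patch, k), cmd)
    return out

# Toy usage: a flat map with a 0.3 m step; the stand-in predictor penalizes height range.
def toy_predict(patch, cmd):
    return float(np.exp(-5.0 * (patch.max() - patch.min())))

elev = np.zeros((20, 20))
elev[:, 10:] = 0.3
tensor = feasibility_tensor(elev, toy_predict, lambda patch: 0.0)
print(tensor.shape)  # (12, 12, 4)
```

Rotating the patch rather than the command lets one forward-facing network serve every direction, which is one plausible reading of the rotation step described above.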

Figure 3: The sliding window methodology converts elevation maps into feasibility tensors by extracting local patches and generating directional feasibility predictions.

Feasibility Tensor Fusion and Interpretability in Graph-Based Planning

Planning across multiple specialists is enabled by fusing the individual feasibility tensors with an element-wise maximum, yielding a unified representation that captures the best available capability at each location and direction. The planning cost is the inverse of this fused feasibility; because it is positive, it meets the non-negative edge-weight requirement of Dijkstra's algorithm, which is used for optimal pathfinding because direction-dependent terrain costs make heuristic admissibility (needed by heuristic searches such as A*) hard to guarantee. Each move is then assigned the policy with the highest individual feasibility score (argmax), so every policy-selection decision remains directly traceable.
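The fusion and search steps can be sketched on a 4-connected grid. The sketch assumes each policy's tensor has shape (H, W, 4), with direction index k matching the hypothetical `MOVES` table; that an edge's cost is the inverse feasibility of leaving the source cell in that direction; and that feasibility below `eps` marks a move as impassable. The names `fuse`, `plan`, and `MOVES` are illustrative, not from the paper.

```python
import heapq
import numpy as np

# Grid moves indexed to match the direction axis of the tensors (assumed convention).
MOVES = [(0, 1), (-1, 0), (0, -1), (1, 0)]

def fuse(tensors):
    """Element-wise max over policies; argmax gives the policy to run per cell and direction."""
    stack = np.stack(tensors)  # (P, H, W, D)
    return stack.max(axis=0), stack.argmax(axis=0)

def plan(fused, policy_idx, start, goal, eps=1e-6):
    """Dijkstra with edge cost 1 / feasibility; returns [(from, to, policy), ...] or None."""
    H, W, _ = fused.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for k, (dr, dc) in enumerate(MOVES):
            nr, nc = r + dr, c + dc
            f = fused[r, c, k]
            if not (0 <= nr < H and 0 <= nc < W) or f < eps:
                continue  # off-map, or infeasible for every policy
            nd = d + 1.0 / f
            if nd < dist[nr, nc]:
                dist[nr, nc] = nd
                prev[(nr, nc)] = ((r, c), int(policy_idx[r, c, k]))
                heapq.heappush(pq, (nd, (nr, nc)))
    if not np.isfinite(dist[goal]):
        return None
    path, node = [], goal
    while node != start:
        parent, pol = prev[node]
        path.append((parent, node, pol))
        node = parent
    return path[::-1]
```

Because the policy label is stored alongside each edge, the returned path doubles as a switching schedule, and adding a new specialist only means appending its tensor to the list passed to `fuse`.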

Figure 4: Multi-policy feasibility tensor fusion combines policy-specific representations into a unified cost function, enabling transparent graph search and policy selection.

Dynamic Adaptation and Skill Evolution in Simulation

The system adapts dynamically as policy capabilities evolve. As policy skills mature, the planner revises its trajectories, consistently choosing shorter, more efficient routes that reflect the improved feasibility predictions. Quantitatively, the success rate stays at 100% across all training stages, while Success weighted by Path Length (SPL) rises from 0.19 to 0.51 as policy proficiency improves, indicating that the planner remains closely aligned with what the policies can actually do.
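To make the SPL numbers concrete, the sketch below uses the standard SPL definition from embodied-navigation benchmarks: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is binary success, l_i the reference shortest-path length, and p_i the length actually traveled. How the paper defines the reference length l_i is an assumption here. The formula shows why SPL can sit well below 1 even at 100% success: every detour shrinks the per-episode ratio.

```python
import numpy as np

def spl(success, shortest, taken):
    """Success weighted by Path Length: mean of S_i * l_i / max(p_i, l_i)."""
    s = np.asarray(success, dtype=float)
    l = np.asarray(shortest, dtype=float)
    p = np.asarray(taken, dtype=float)
    return float(np.mean(s * l / np.maximum(p, l)))

# Two successful episodes: one with a path twice the reference length, one optimal.
print(spl([1, 1], [10.0, 10.0], [20.0, 10.0]))  # 0.75
```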

Figure 5: Feasibility-guided planning adapts routing as locomotion skills evolve, selecting progressively more direct paths on stepped terrain.

Multi-Terrain and Policy Coordination: Comparative Performance

Comprehensive simulation studies evaluate the framework in single- and mixed-terrain environments. Four specialized policies (steps, gaps, bridge, valley) are trained together with their respective Feasibility-Net models. The feasibility-guided framework achieves near-optimal performance on every terrain, matching the success rates of each domain's specialist and reaching 98.60% success on mixed terrain, whereas the generalist baseline policies fail completely on terrain outside their training domains.

Figure 6: Training and evaluation terrain configurations for the steps, gaps, bridge, and valley specialists.

Real-World Validation and Robustness

The framework was validated in a physical mixed-terrain course whose LiDAR-scanned elevation map was transformed into feasibility tensors. Real-world deployment on a Unitree A1 quadruped achieved 70% overall success across four terrain types, significantly outperforming generalist policies, which failed on gaps and performed poorly on the bridge and valley configurations. Policy switching was interpretable and effective; the main sources of failure were imprecise localization and the assumption that switching between policies incurs zero cost, both of which point to areas for future improvement.
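The first step of that real-world pipeline, turning LiDAR returns into an elevation map, can be sketched as a simple rasterization. The grid resolution, extent, origin, and the choice to keep the highest point per cell are illustrative assumptions, not the paper's mapping setup; cells without returns are left as NaN so downstream code can treat them as unknown.

```python
import numpy as np

def elevation_map(points, res=0.05, size=(4.0, 4.0), origin=(0.0, 0.0)):
    """Rasterize an (N, 3) point cloud into a height grid, keeping the highest z per cell.

    res, size, origin, and max-height aggregation are illustrative assumptions.
    Cells that receive no points stay NaN.
    """
    H = int(round(size[0] / res))
    W = int(round(size[1] / res))
    grid = np.full((H, W), np.nan)
    ix = np.floor((points[:, 0] - origin[0]) / res).astype(int)
    iy = np.floor((points[:, 1] - origin[1]) / res).astype(int)
    keep = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W)  # drop points outside the grid
    for i, j, z in zip(ix[keep], iy[keep], points[keep, 2]):
        if np.isnan(grid[i, j]) or z > grid[i, j]:
            grid[i, j] = z
    return grid
```

The resulting grid is the kind of input the sliding-window transformation expects, after any NaN cells are filled or masked.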

Figure 7: Real-world planning on mixed terrains (steps, gaps, bridge, valley).

Figure 8: Transformation of a real elevation map into policy-specific feasibility tensors, enabling optimal policy selection and path planning based on terrain characteristics.

Implications and Prospects for Future Research

The feasibility-guided coordination paradigm addresses the interpretability, adaptability, and scalability shortcomings of classical planners and hierarchical RL-based planning. By decoupling policy specialization from planning logic and grounding feasibility in direct estimates of policy proficiency, the framework allows new skills to be integrated without retraining or reward engineering, keeps planning decisions transparent, and improves robustness to environmental variation through OOD modeling. Future research directions include refined switching-cost models, tighter integration with localization, a broader range of locomotion modalities, and deployment in highly dynamic or multi-agent scenarios. Feasibility-estimation-based planning is a promising platform for modular skill composition in adaptive robotic navigation and may influence both theoretical approaches to RL-based planning and practical robotic system design.

Conclusion

This framework introduces a scalable and interpretable planning methodology for navigation across heterogeneous unstructured terrain by fusing multiple policy-specific feasibility representations. Joint training of each policy with its Feasibility-Net yields robust terrain specialization and adaptability at deployment time. Experiments in simulation and the real world show higher success rates, dynamic path optimization, and transparent policy selection compared with baseline generalist approaches. The modularity and extensibility of the feasibility-guided planner make it a compelling foundation for future advances in adaptive legged locomotion and terrain-aware navigation.
