PUMA: Perception-driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour

Published 22 Jan 2026 in cs.RO, cs.AI, and cs.LG | (2601.15995v1)

Abstract: Parkour tasks for quadrupeds have emerged as a promising benchmark for agile locomotion. While human athletes can effectively perceive environmental characteristics to select appropriate footholds for obstacle traversal, endowing legged robots with similar perceptual reasoning remains a significant challenge. Existing methods often rely on hierarchical controllers that follow pre-computed footholds, thereby constraining the robot's real-time adaptability and the exploratory potential of reinforcement learning. To overcome these challenges, we present PUMA, an end-to-end learning framework that integrates visual perception and foothold priors into a single-stage training process. This approach leverages terrain features to estimate egocentric polar foothold priors, composed of relative distance and heading, guiding the robot in active posture adaptation for parkour tasks. Extensive experiments conducted in simulation and real-world environments across various discrete complex terrains demonstrate PUMA's exceptional agility and robustness in challenging scenarios.


Summary

The paper introduces a unified, perception-driven RL framework using egocentric polar foothold priors for high-agility quadruped parkour.
The system fuses depth and proprioceptive data through a CNN-GRU pipeline and employs a multi-critic architecture to enhance velocity tracking and maneuver execution.
Results demonstrate state-of-the-art sim-to-real transfer with a 100% success rate in real-world tests, highlighting the importance of Probability Annealing Selection (PAS) and geometry-driven control.

Perception-Driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour (PUMA): Technical Analysis

Introduction

The PUMA framework addresses the persistent challenge of robust, agile navigation for quadruped robots executing parkour-style maneuvers on discrete, complex terrains. In contrast to traditional hierarchical methods that decouple foothold planning from low-level control, PUMA proposes an end-to-end, perception-driven architecture that fuses onboard depth sensing with proprioceptive signals and integrates geometric reasoning through egocentric foothold priors. The resulting system achieves state-of-the-art sim-to-real transfer and robust adaptive locomotion, and it exploits terrain affordances such as inclined walls to extend the robot's locomotion capability.

Methodology

Unified Single-Stage Learning Pipeline

PUMA employs a unified, single-stage RL pipeline based on an asymmetric actor-critic architecture. Proprioceptive and exteroceptive (depth image) observations are combined and processed by a CNN, followed by temporal modeling with GRU layers and a multi-headed MLP that estimates egocentric foothold priors, base velocity, and latent terrain features. These estimates are concatenated with the current observation vector and passed to the policy network, which is optimized with PPO. All neural modules are trained concurrently, with no separate pre-training stage or hierarchical controller.

Figure 1: Overview of the PUMA training framework, detailing the fusion of proprioception and depth, egocentric foothold prior estimation, concatenated policy input, and multi-critic reward structure with PAS strategy for input annealing.
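To make the data flow of the estimator concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: all dimensions are hypothetical, block-average pooling stands in for the learned CNN, single linear heads stand in for the multi-headed MLP, and training (PPO and the estimator losses) is omitted. It shows fused features passing through a GRU cell, three output heads (foothold prior, base velocity, latent), and their concatenation with the observation to form the actor input.

```python
import math
import random

# Hypothetical sizes chosen for illustration; the paper's actual dimensions are not reproduced here.
PROPRIO_DIM, HIDDEN_DIM = 6, 16
FOOTHOLD_DIM, VEL_DIM, LATENT_DIM = 4, 3, 8

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def encode_depth(depth, pool=4):
    """Stand-in for the learned CNN: block-average pooling (assumes sides divisible by pool)."""
    feats = []
    for i in range(0, len(depth), pool):
        for j in range(0, len(depth[0]), pool):
            block = [depth[r][c] for r in range(i, i + pool) for c in range(j, j + pool)]
            feats.append(sum(block) / len(block))
    return feats

class ToyGRUCell:
    """Bias-free GRU cell on Python lists: h' = (1 - z) * h + z * h_tilde."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = sigmoid(add(matvec(self.W["z"], x), matvec(self.U["z"], h)))  # update gate
        r = sigmoid(add(matvec(self.W["r"], x), matvec(self.U["r"], h)))  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a) for a in add(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, h_tilde)]

class EstimatorSketch:
    """Fused features -> GRU -> linear heads for foothold prior, base velocity, and latent."""
    def __init__(self, in_dim, rng):
        self.gru = ToyGRUCell(in_dim, HIDDEN_DIM, rng)
        self.heads = {
            "foothold": rand_matrix(FOOTHOLD_DIM, HIDDEN_DIM, rng),
            "velocity": rand_matrix(VEL_DIM, HIDDEN_DIM, rng),
            "latent": rand_matrix(LATENT_DIM, HIDDEN_DIM, rng),
        }
        self.h = [0.0] * HIDDEN_DIM  # recurrent state carried across control steps

    def __call__(self, proprio, depth):
        fused = proprio + encode_depth(depth)  # concatenate proprioception and depth features
        self.h = self.gru.step(fused, self.h)
        return {name: matvec(W, self.h) for name, W in self.heads.items()}

def policy_input(obs, est):
    """Concatenate the current observation with the estimator outputs for the actor."""
    return obs + est["foothold"] + est["velocity"] + est["latent"]

rng = random.Random(0)
depth = [[1.0] * 8 for _ in range(8)]  # toy 8x8 depth image -> 4 pooled features
proprio = [0.1] * PROPRIO_DIM
estimator = EstimatorSketch(PROPRIO_DIM + 4, rng)
est = estimator(proprio, depth)
x = policy_input(proprio, est)
print(len(x))  # 6 + 4 + 3 + 8 = 21
```

In a single-stage setup like the one described, the estimator heads and the policy would receive gradients in the same training loop rather than in separate phases.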

Egocentric Polar Foothold Prior

Central to PUMA is the egocentric polar foothold prior, which replaces explicit Cartesian target following with relative distance and heading representations for the front feet. This design simplifies the regression problem and allows the policy to exploit terrain affordances, such as inclined walls, for high-agility maneuvers (e.g., wall-assisted jumps). The prior guides velocity tracking by encoding footholds as a vector of four scalars: the distance from each forefoot to the expected foothold point (two values) and the heading errors to the current and subsequent footholds (two values).
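The four-scalar prior can be computed from world-frame positions roughly as follows. This is a sketch under assumptions not confirmed by the summary: positions are planar (x, y), both forefeet are measured to a single shared expected point, and heading errors are taken from the base position relative to the base yaw. The function names are hypothetical.

```python
import math

def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def heading_error(base_xy, base_yaw, target_xy):
    """Signed egocentric angle from the base heading to the bearing of target_xy."""
    bearing = math.atan2(target_xy[1] - base_xy[1], target_xy[0] - base_xy[0])
    return wrap_angle(bearing - base_yaw)

def polar_foothold_prior(base_xy, base_yaw, fl_xy, fr_xy, expected_xy, current_xy, next_xy):
    """Hypothetical four-scalar prior: [d_FL, d_FR, heading err. to current, to next]."""
    return [
        math.dist(fl_xy, expected_xy),  # front-left foot to expected foothold point
        math.dist(fr_xy, expected_xy),  # front-right foot to expected foothold point
        heading_error(base_xy, base_yaw, current_xy),
        heading_error(base_xy, base_yaw, next_xy),
    ]

prior = polar_foothold_prior(
    base_xy=(0.0, 0.0), base_yaw=math.pi / 4,
    fl_xy=(0.5, 0.25), fr_xy=(0.5, -0.25),
    expected_xy=(1.5, 0.25), current_xy=(1.5, 0.25), next_xy=(1.0, 1.0),
)
print(prior)
```

Because every quantity is relative to the robot, the same prior values describe the same local situation wherever the obstacle sits in the world frame, which is one plausible reason such a representation is easier to regress than absolute Cartesian targets.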

Probability Annealing Selection (PAS)

To address unstable foothold regression early in training, PUMA introduces PAS, a curriculum-based annealing approach that probabilistically selects between ground-truth and predicted priors as actor inputs. The probability of using the estimator's prediction increases gradually over training, which stabilizes convergence.
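A minimal sketch of the PAS selection rule follows. The linear ramp, the per-call sampling, and the parameter names `start_step` and `anneal_steps` are assumptions for illustration; the summary states only that the probability of using the prediction increases over training.

```python
import random

def pas_probability(step, start_step, anneal_steps):
    """Assumed linear ramp of the probability of feeding the prediction to the actor (0 -> 1)."""
    if step <= start_step:
        return 0.0
    return min(1.0, (step - start_step) / anneal_steps)

def select_prior(ground_truth, predicted, step, rng, start_step=0, anneal_steps=10_000):
    """Return the predicted prior with probability p(step), otherwise the ground truth."""
    p = pas_probability(step, start_step, anneal_steps)
    return predicted if rng.random() < p else ground_truth

rng = random.Random(0)
gt = [1.0, 1.0, 0.0, 0.0]
pred = [0.9, 1.1, 0.05, -0.02]
# Halfway through annealing, roughly half of the selections should use the prediction.
mid = sum(select_prior(gt, pred, 5_000, rng) is pred for _ in range(1_000)) / 1_000
print(mid)
```

Early on the actor sees only privileged ground truth, so a poor estimator cannot derail policy learning; by the end it sees only predictions, matching what is available on hardware.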

Multi-Critic Reinforcement Structure

A central architectural element is PUMA's multi-critic (MuC) framework, in which separate value networks are optimized for distinct reward groups (task, foothold, and regularization). Each critic estimates the value of its reward group, from which a per-group advantage is computed; the normalized advantages are then weighted and summed for the policy update. Empirical analysis shows that MuC is essential for balancing the competing objectives of velocity tracking and dynamic terrain exploitation, especially during phase transitions in complex maneuvers.

Figure 3: Temporal evolution of body velocity and total contact forces during a complete parkour jump; PUMA (orange) maintains superior velocity tracking and impulse generation compared to single-critic baselines.
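The advantage combination in a multi-critic setup can be sketched as below. This assumes Generalized Advantage Estimation (GAE) per reward group over a single non-terminating trajectory, per-group standardization, and hypothetical group weights; the summary does not specify the advantage estimator or the weights PUMA uses.

```python
import statistics

def gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    """GAE for one reward group on one trajectory without terminations."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        v_next = values[t + 1] if t + 1 < len(values) else next_value
        delta = rewards[t] + gamma * v_next - values[t]  # TD error from this group's critic
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def normalize(xs, eps=1e-8):
    """Standardize advantages to zero mean and unit (population) std."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]

def combine_advantages(groups, weights):
    """groups: name -> (rewards, values, next_value); returns the weighted sum of normalized advantages."""
    total = None
    for name, (rewards, values, next_value) in groups.items():
        adv = normalize(gae(rewards, values, next_value))
        w = weights[name]
        total = [w * a for a in adv] if total is None else [s + w * a for s, a in zip(total, adv)]
    return total

groups = {
    "task": ([1.0, 0.5, 0.0, 1.0], [0.8, 0.6, 0.4, 0.7], 0.5),
    "foothold": ([0.0, 0.2, 0.9, 0.1], [0.3, 0.3, 0.5, 0.2], 0.1),
    "regularization": ([-0.1, -0.2, -0.1, -0.3], [-0.5, -0.4, -0.3, -0.2], -0.1),
}
weights = {"task": 1.0, "foothold": 0.5, "regularization": 0.2}  # hypothetical weights
combined = combine_advantages(groups, weights)
print(combined)
```

Normalizing each group before summing keeps a large-magnitude reward group (for example, a dense regularization penalty) from dominating a sparse but important one, such as a foothold reward earned only at contact events.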

Terrain Curriculum and Domain Randomization

PUMA's training curriculum spans procedurally generated terrains (wall-assisted gaps, surmounting platforms, and stepping stones), with difficulty increasing through gap width, height, and inclination.

Figure 2: Terrain progression from left to right, with increasing roughness and inclination to systematically expand policy capability.
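One way to realize such a difficulty ramp is to map a discrete terrain level to interpolated parameters and to move each robot between levels based on how far it gets. The sketch below uses a promotion/demotion rule common in legged-robot RL curricula; the parameter ranges, thresholds, and level count are hypothetical and are not taken from the paper.

```python
# Hypothetical ranges (easiest, hardest) for illustration only.
RANGES = {
    "gap_width_m": (0.2, 1.0),
    "platform_height_m": (0.1, 0.8),
    "wall_incline_deg": (0.0, 75.0),
}
NUM_LEVELS = 10

def terrain_params(level):
    """Linearly interpolate each difficulty parameter between its easiest and hardest value."""
    frac = level / (NUM_LEVELS - 1)
    return {k: lo + frac * (hi - lo) for k, (lo, hi) in RANGES.items()}

def update_level(level, distance_m, terrain_length_m, up=0.8, down=0.3):
    """Promote after traversing most of the terrain, demote after covering little of it."""
    if distance_m > up * terrain_length_m:
        return min(level + 1, NUM_LEVELS - 1)
    if distance_m < down * terrain_length_m:
        return max(level - 1, 0)
    return level

print(terrain_params(0))
print(update_level(3, 9.0, 10.0))
```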

Extensive domain randomization of both sensor modalities and physical parameters, together with modeling of asynchronous input delays, underpins successful sim-to-real transfer.
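The following sketch shows one way to model asynchronous sensor delays and randomize physics and depth noise in simulation. The class and function names, the per-reset delay sampling, and every numeric range are hypothetical; the paper's actual randomization settings are not reproduced here. Using separate delay buffers lets the depth stream lag by a different amount than proprioception.

```python
import random
from collections import deque

class DelayedSensor:
    """Returns observations delayed by a number of control steps sampled at each reset."""
    def __init__(self, max_delay_steps, rng):
        self.max_delay = max_delay_steps
        self.rng = rng
        self.buffer = deque(maxlen=max_delay_steps + 1)
        self.delay = 0

    def reset(self, first_obs):
        self.delay = self.rng.randint(0, self.max_delay)  # inclusive on both ends
        self.buffer.clear()
        for _ in range(self.max_delay + 1):
            self.buffer.append(first_obs)  # pre-fill so early reads are valid

    def step(self, obs):
        self.buffer.append(obs)  # newest observation at the right end
        return self.buffer[-1 - self.delay]

def sample_physics(rng):
    """Hypothetical physical randomization ranges, resampled per episode."""
    return {
        "friction": rng.uniform(0.4, 1.2),
        "added_base_mass_kg": rng.uniform(-1.0, 2.0),
        "motor_strength_scale": rng.uniform(0.9, 1.1),
    }

def add_depth_noise(depth_row, rng, sigma=0.02):
    """Additive Gaussian noise on depth readings (hypothetical noise model)."""
    return [d + rng.gauss(0.0, sigma) for d in depth_row]

rng = random.Random(0)
proprio_sensor = DelayedSensor(max_delay_steps=1, rng=rng)  # short proprioceptive lag
depth_sensor = DelayedSensor(max_delay_steps=4, rng=rng)    # longer, independent depth lag
proprio_sensor.reset([0.0])
depth_sensor.reset([1.0])
physics = sample_physics(rng)
print(physics)
```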

Experimental Results

Quantitative Performance

PUMA achieves the highest success rate (SR) and traverse rate (TR) among the evaluated methods in both simulated and physical environments, across multiple challenging terrain settings. Notably, the ablation studies reveal:

Foothold prior is critical: Removing the prior or its relative distance terms degrades SR on wall-assisted and surmounting terrains by roughly 30 to 80 percentage points.
MuC is essential on complex terrain: Single-critic architectures (w/o MuC) exhibit catastrophic failure modes, primarily due to inadequate estimation of the foothold-related reward landscape.
Polar prior superior to Cartesian: Egocentric polar priors outperform both explicit and implicit Cartesian foothold objectives in regression accuracy (lower MSE) and in final maneuver success rates.

Training Efficiency and PAS

PAS enables stable and rapid convergence by avoiding premature reliance on an immature foothold estimator. Shortening or eliminating the annealing period induces policy collapse or slows training considerably.

Figure 5: PUMA outperforms single-critic and alternative prior architectures in foothold reward accumulation (left), and PAS reduces the number of samples needed to converge (right).

Real-World Sim2Real Transfer

The PUMA policy, trained exclusively in simulation, achieved a 100% success rate in real-world tests on a DeepRobotics Lite3 quadruped across wall-assisted gaps up to 80°, surmounting tasks up to 0.7 m high, and stepping-stone traversals. Baselines and ablation variants suffered from instability, missed contacts, or insufficient body-posture adaptation.

Figure 4: Demonstrated parkour performance on discrete and highly inclined real-world terrains for wall-assisted gap, stepping stones, and platform surmounting tasks.

Analysis of Failure Cases

Failures are typically attributed to erroneous foothold-prior estimates under proprioceptive or perceptual noise, inadequate postural adaptation (especially in the yaw-only and Cartesian prior variants), and an inability to coordinate timing and force during contact-rich transitions.

Figure 6: Failure modes include roll misadaptation, misorientation, erroneous foothold estimation, and low-force contacts impeding dynamic maneuvers.

Theoretical and Practical Implications

The proposed egocentric polar foothold prior is a minimalist, geometry-driven representation that supports robust generalization, improves regression tractability, and decouples spatial aspects of locomotion. The single-stage, end-to-end design removes brittle dependencies on explicit plan following, enabling real-time reactivity and improved exploration in RL. In these experiments, the multi-critic architecture clearly outperforms single-critic baselines on multi-objective, contact-rich tasks, and the PAS curriculum proves critical for stabilizing simultaneous estimator and policy learning.

Practically, PUMA considerably reduces pipeline complexity, sensor fidelity requirements, and sim-to-real transfer barriers for dynamic agile locomotion on quadrupedal robots. These advances are directly applicable to field robotics, search-and-rescue, and infrastructure inspection domains.

Future Directions

While PUMA demonstrates robust geometric reasoning, it does not explicitly incorporate semantic terrain understanding, material adaptation, or dynamic environment perception. Integrating learned semantic priors, visual affordance predictors, and temporal scene understanding remains a pertinent direction. Furthermore, extending the approach to adaptive gaits, deformable terrain contacts, multi-agent coordination, and closed-loop visual feedback for hazard anticipation could further amplify real-world applicability.

Conclusion

PUMA introduces a compact, perception-driven paradigm for encoding motion guidance via egocentric foothold priors within a unified, end-to-end learning framework, empirically validated on demanding parkour benchmarks and sim-to-real transfer tasks. The work substantiates the necessity of reward-space decomposition via multi-critic RL and curriculum-based estimator input blending. PUMA sets a new reference point for future research on perception-guided, robust and adaptive legged locomotion (2601.15995).

