LimX Oli Humanoid Robot
- Oli is a full-size 31-DoF humanoid platform featuring advanced actuation, real-time proprioceptive and exteroceptive sensing, and integrated onboard computation.
- It is benchmarked on velocity tracking and motion imitation tasks using data-efficient and planner-guided reinforcement learning frameworks such as PvP and FastStair.
- Empirical evaluations show superior dynamic control performance and agile stair-climbing, establishing Oli as a cutting-edge platform for humanoid learning research.
The LimX Oli humanoid robot is a full-size, 31-degree-of-freedom (DoF) robotic platform designed for advanced whole-body control and dynamic locomotion in complex environments. It is distinguished by its comprehensive actuation, proprioceptive and exteroceptive sensing modalities, onboard computational capabilities, and its prominent role in benchmarking data-efficient reinforcement learning (RL) and planner-guided motor skill frameworks. Oli is the primary physical platform for empirical evaluation in both the PvP contrastive SRL paradigm (Yuan et al., 15 Dec 2025) and the FastStair planner-guided locomotion system (Liu et al., 15 Jan 2026), highlighting its relevance for state-of-the-art humanoid learning research.
1. Mechanical and Sensing Specifications
LimX Oli stands 1.65 m tall, weighing 55 kg, with a kinematic structure that supports 31 independently actuated joints: 6 per leg, 7 per arm, 3 at the waist, and 2 in the head. Each joint employs a brushless DC motor, harmonic-drive reduction, and low-backlash gearboxes. Precise torque tracking is achieved via onboard joint-level PID control loops. The platform provides high-resolution rotary encoders for joint position and velocity estimation, motor current-based torque sensing, and base angular velocity and acceleration from an IMU.
Proprioceptive observations available at runtime include:
- Joint positions and velocities
- Base angular velocity (3D)
- Estimated gravity vector (3D)
- Commanded planar velocity
- Previous action, gait clock signals, and real-time local elevation maps computed from a torso-mounted Intel RealSense D435i depth camera.
Privileged (simulation-only) observations include root linear velocity, root pose, per-link pose/velocity data, external contact forces, terrain features, and model-based planner suggestions.
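The runtime observation listed above can be assembled into a single flat vector; the following sketch illustrates this, with all dimensions (joint count aside) being illustrative assumptions rather than Oli's exact interface:

```python
import numpy as np

N_JOINTS = 31  # Oli's 31-DoF kinematic structure

def build_proprio_obs(q, qd, base_ang_vel, gravity, cmd_vel, prev_action, gait_clock):
    """Concatenate the runtime proprioceptive signals into one flat vector."""
    parts = [
        np.asarray(q),            # joint positions          (31,)
        np.asarray(qd),           # joint velocities         (31,)
        np.asarray(base_ang_vel), # base angular velocity    (3,)
        np.asarray(gravity),      # estimated gravity vector (3,)
        np.asarray(cmd_vel),      # commanded planar velocity(3,)
        np.asarray(prev_action),  # previous action          (31,)
        np.asarray(gait_clock),   # gait phase clock, e.g. (sin, cos)
    ]
    return np.concatenate(parts)

obs = build_proprio_obs(
    q=np.zeros(N_JOINTS), qd=np.zeros(N_JOINTS),
    base_ang_vel=np.zeros(3), gravity=np.array([0.0, 0.0, -1.0]),
    cmd_vel=np.zeros(3), prev_action=np.zeros(N_JOINTS),
    gait_clock=np.array([0.0, 1.0]),
)
assert obs.shape == (31 + 31 + 3 + 3 + 3 + 31 + 2,)  # 104-dim in this sketch
```

The privileged simulation-only signals would be appended to this vector to form the privileged state used during training.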
Onboard computation is distributed across an NVIDIA Jetson Orin NX (terrain reconstruction) and a Rockchip RK3588 SBC (policy inference, low-level joint control), communicating over UDP at up to 100 Hz.
2. Control Tasks and Evaluation Frameworks
Oli is evaluated primarily on two representative control tasks:
- Velocity Tracking (LimX-Oli-31dof-Velocity):
- The robot tracks a commanded planar velocity, resampled every 10 s, within fixed bounds on longitudinal velocity (m/s), lateral velocity (m/s), and yaw rate (rad/s).
- Reward is computed via Gaussian exponentiation of velocity and rotational errors, penalizing deviations in base height, action smoothness, joint power, and command limits.
- Key metrics: overall discounted return, velocity-tracking error, and action smoothness (second-order action difference).
- Motion Imitation (LimX-Oli-31dof-Mimic):
- Oli imitates one of 20 human-motion clips (maximum length 4,300 frames). Rewards emphasize joint-position tracking (exponentiated error, weight 2.0), foot alignment, and waist pitch, and penalize excessive actuator activity.
- Evaluation metrics include global imitation return and joint alignment error.
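The Gaussian exponentiation of tracking errors mentioned above can be sketched as follows; the temperature and the relative weighting of the angular term are illustrative assumptions, not the benchmark's exact coefficients:

```python
import numpy as np

def velocity_tracking_reward(v_cmd, v_base, w_cmd, w_base, sigma=0.25):
    """Gaussian-exponentiated tracking reward: exp(-error^2 / sigma)
    for the planar linear velocity and the yaw rate separately."""
    lin_err = np.sum((np.asarray(v_cmd) - np.asarray(v_base)) ** 2)
    ang_err = (w_cmd - w_base) ** 2
    r_lin = np.exp(-lin_err / sigma)
    r_ang = np.exp(-ang_err / sigma)
    return r_lin + 0.5 * r_ang  # relative weight is an assumption

# Perfect tracking yields the maximum reward under this sketch:
r = velocity_tracking_reward([1.0, 0.0], [1.0, 0.0], 0.3, 0.3)
assert abs(r - 1.5) < 1e-9
```

Penalty terms (base height, action smoothness, joint power, command limits) would be subtracted from this shaping term in the full reward.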
These tasks serve as benchmarks for state representation learning (SRL) integration with RL, supporting modular evaluation via the SRL4Humanoid framework.
3. PvP Contrastive Representation Learning on Oli
PvP (Proprioceptive-Privileged contrastive learning) is instantiated on Oli to address sample inefficiency and partial observability in RL. States are partitioned into:
- Proprioceptive state: only observations available on the real robot
- Privileged state: proprioceptive data augmented with simulation-only signals
A positive pair is formed from the privileged state and a copy of it zero-masked at the privileged entries. Both vectors are passed through a shared MLP encoder (512-256-128, ELU activations) and a predictor head (128-128-128). PvP optimizes a symmetric negative-cosine-similarity loss with stop-gradient regularization, $\mathcal{L} = \tfrac{1}{2} D(p_1, \mathrm{sg}(z_2)) + \tfrac{1}{2} D(p_2, \mathrm{sg}(z_1))$, where $D$ denotes negative cosine similarity and $\mathrm{sg}$ the stop-gradient operator.
The learned encoder is reused in the PPO policy network, ensuring compact, task-relevant feature extraction. PvP loss updates are interval-triggered every 50 RL steps to avoid early representation collapse, a practice that ablation studies show to be essential.
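The symmetric negative-cosine loss with stop-gradient can be sketched in a few lines of numpy; `sg()` here is a no-op stand-in for the stop-gradient (in an autodiff framework it would be a detach), and the vectors stand in for encoder/predictor outputs:

```python
import numpy as np

def neg_cosine(p, z):
    """Negative cosine similarity between predictor output p and encoder output z."""
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return -float(np.dot(p, z))

def pvp_loss(p1, z2, p2, z1):
    """SimSiam-style symmetric loss: each predictor output is matched
    against the other branch's stop-gradient encoder output."""
    sg = lambda x: x  # stop-gradient stand-in; numpy tracks no gradients
    return 0.5 * neg_cosine(p1, sg(z2)) + 0.5 * neg_cosine(p2, sg(z1))

v = np.array([1.0, 0.0, 0.0])
assert abs(pvp_loss(v, v, v, v) - (-1.0)) < 1e-9  # identical views give -1
```

The loss is minimized (at -1) when both views' representations align, which is the collapse-avoiding alignment objective the interval-triggered updates regulate.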
4. Planner-Guided Stair-Climbing and Control Integration
Oli achieves high-agility stair climbing through the FastStair framework, which integrates a GPU-parallel model-based Divergent Component of Motion (DCM) foothold planner into RL:
- At each swing phase, the DCM-based planner generates admissible foothold candidates by minimizing a quadratic cost subject to VHIP stair dynamics and terrain constraints.
- Rewards combine linear velocity tracking, planner-guided foothold error, posture, energy use, and stumble penalties.
- The RL training is multi-staged: a base policy prioritizing foothold safety (Stage 1), speed-specialized expert fine-tuning (Stage 2), and unification via Low-Rank Adaptation (LoRA) (Stage 3), which injects trainable low-rank adapters into frozen actor-network weights.
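The foothold-selection step above can be sketched as a quadratic scoring of candidates around a nominal DCM-derived target; the cost weights and the binary feasibility mask are illustrative assumptions, not FastStair's exact formulation:

```python
import numpy as np

def select_foothold(candidates, dcm_target, feasible_mask, Q=np.eye(2)):
    """Pick the feasible candidate minimizing a quadratic cost around the
    nominal DCM-derived foothold.

    candidates: (N, 2) xy positions; dcm_target: (2,); feasible_mask: (N,) bool.
    """
    diffs = candidates - dcm_target
    costs = np.einsum("ni,ij,nj->n", diffs, Q, diffs)  # quadratic cost per candidate
    costs = np.where(feasible_mask, costs, np.inf)     # terrain constraints
    return candidates[np.argmin(costs)]

cands = np.array([[0.30, 0.10], [0.28, 0.00], [0.35, -0.05]])
target = np.array([0.30, 0.00])
best = select_foothold(cands, target, feasible_mask=np.array([True, True, True]))
assert np.allclose(best, [0.28, 0.00])  # closest feasible candidate wins
```

In the full system this scoring runs GPU-parallel over many candidates per swing phase, with VHIP stair dynamics shaping both the target and the feasibility test.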
The control loop operates at 100 Hz, integrating exteroceptive elevation maps and proprioceptive signals for real-time operation.
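The Stage 3 LoRA unification can be illustrated with the standard low-rank update form; the rank, scale, and layer sizes below are assumptions for illustration, since the source elides the exact hyperparameters:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Forward pass through a LoRA-adapted layer:
    y = x @ (W + (alpha / r) * B @ A)^T, with W frozen and A, B trainable."""
    delta = (alpha / r) * (B @ A)  # low-rank update, same shape as W
    return x @ (W + delta).T

d_out, d_in, rank = 8, 6, 4
W = np.zeros((d_out, d_in))   # frozen base weight (zeros for illustration)
A = np.zeros((rank, d_in))    # adapters initialized so the update is zero
B = np.zeros((d_out, rank))
x = np.ones((1, d_in))
assert np.allclose(lora_forward(x, W, A, B), 0.0)  # adapters start as identity change
```

Because only `A` and `B` are trained, the speed-specialized experts can be merged into one policy without disturbing the frozen Stage 1/2 weights.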
5. Empirical Results and Key Performance Indicators
Quantitative results on Oli demonstrate substantial gains from these frameworks:
| Task | PPO | PPO+VAE | PPO+SPR | PPO+SimSiam | PPO+PvP |
|---|---|---|---|---|---|
| Velocity | 0.61±0.04 | 0.63±0.05 | 0.66±0.03 | 0.67±0.04 | 0.85±0.02 |
| Imitation | 0.65±0.03 | 0.60±0.06 | 0.72±0.04 | 0.74±0.03 | 0.78±0.02 |
- In velocity tracking, PPO+PvP reaches an average return of 0.85 after 200k steps, versus ≈ 0.61 for vanilla PPO. In motion imitation, PPO+PvP converges to ≈ 0.78 (PPO alone ≈ 0.65).
- PvP reduces the action-jerk penalty 40% faster than PPO and achieves the lowest joint-tracking error in imitation.
- For stair ascent, Oli maintains commanded speeds up to 1.65 m/s and traverses a 33-step spiral staircase (17 cm rise per step) in 12 s with zero falls over repeated trials. The system is robust to step-height variation and high-speed commands (over 70% success at 2.0 m/s, and over 80% at speeds up to 1.5 m/s).
- A plausible implication is that the combination of high-frequency control, proprioceptive-rich SRL, and planner-guided curriculum yields superior whole-body agility and stability in challenging terrain.
6. Best Practices, Generalization, and Practical Guidelines
Insights distilled from the above frameworks include:
- Sim-only privileged signals (e.g., link-COM, contact forces, terrain) should be leveraged as pseudo-augmentations for contrastive representation, superseding hand-crafted augmentation schemes.
- The SRL loss should be applied only to the policy encoder to prevent degradation of value estimation; sharing it with critic encoders impairs both velocity and imitation performance.
- PvP interval updates every 30–100 steps outperform per-update schedules for low-diversity data.
- The SRL loss weight is most robust in the range [0.1, 1.0] for the afforded task diversity.
- SRL4Humanoid’s modular toolkit allows adaptable encoder architectures (network widths from 512 down to 128 generalize to state spaces of 20–50 dimensions).
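The interval-triggered update schedule recommended above can be sketched as a simple gating of the SRL loss by RL step count; the step counts are illustrative assumptions:

```python
def srl_update_steps(total_steps, interval=50):
    """Return the RL steps at which the contrastive SRL loss is optimized.
    Applying it only every `interval` steps (rather than every update)
    is the schedule credited with avoiding early representation collapse."""
    return [t for t in range(1, total_steps + 1) if t % interval == 0]

steps = srl_update_steps(200, interval=50)
assert steps == [50, 100, 150, 200]
```

In practice the interval (30–100 per the guideline above) would be tuned against data diversity: sparser updates for low-diversity data.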
These principles provide foundational guidance for extending data-efficient, robust whole-body control to other humanoid platforms with diverse morphologies.
7. Research Significance and Outlook
Oli embodies a benchmark platform for rigorous, data-efficient humanoid learning and planner-guided control. It demonstrates that SimSiam-style contrastive SRL, when harmonized with high-frequency RL and GPU-accelerated model-based planning, bridges the gap between sample-efficient representation learning and the real-time performance demands of dynamic human-scale robots. Ongoing research using the SRL4Humanoid framework and modular LoRA adapter integration on Oli suggests continued advancements in generalizable, high-performance humanoid motor skill acquisition.