Joint Torque Perturbation Injection
- The paper presents JT-SPI, a method that injects state-dependent torque perturbations to expand the range of dynamics discrepancies during simulation.
- It utilizes an MLP to generate zero-mean perturbations applied to nominal joint torques, thereby improving robustness against unmodeled disturbances.
- Empirical results show JT-SPI achieves 100% zero-shot success on real hardware, outperforming traditional domain randomization and ERFI methods.
Joint torque space perturbation injection (JT-SPI) is a methodology for improving the robustness of learned control policies for legged robots, particularly in the context of sim-to-real transfer for humanoid locomotion. Unlike standard domain randomization, which varies simulation parameters within a finite set, JT-SPI introduces direct, state-dependent perturbations to the joint torque inputs during simulation. This technique exposes control policies to a much broader and more abstract class of reality gaps, including those not easily parameterizable in standard simulators, thereby enabling superior generalization and resilience to unmodeled disturbances (Cha et al., 9 Apr 2025).
1. Mathematical Foundations
JT-SPI operates by perturbing the nominal joint torques generated by a learned policy for each control step in simulation. The key components are:
- The policy produces normalized actions $a_t \in [-1, 1]^n$ for the robot's $n$ joints, mapping to torques via element-wise scaling with the torque limits $\tau_{\max}$: $\tau_t = a_t \odot \tau_{\max}$.
- A multi-layer perceptron (MLP) $f_\phi$ injects a zero-mean, state-dependent perturbation $\delta\tau_t = \tilde{\tau}_{\max}\, f_\phi(s_t)$, where $\tilde{\tau}_{\max}$ is the maximum perturbation magnitude and $s_t$ is the privileged simulator state.
- This perturbation can be modeled as sampling from a conditional distribution $\delta\tau_t \sim p_\phi(\cdot \mid s_t)$, where the covariance is implicitly encoded by the MLP's output structure and scaling. Layer biases are set to zero, ensuring $f_\phi(0) = 0$, and inputs are normalized by their running standard deviation.
- The forward dynamics are modified to apply $\tau_t + \delta\tau_t$ in place of $\tau_t$, with the joint-space equation $M(q)\ddot{q} + C(q, \dot{q})\dot{q} + g(q) = \tau_t + \delta\tau_t + J_c^\top F_c$, where $M$ is the joint-space inertia matrix, $C\dot{q}$ the Coriolis/centrifugal terms, $g$ gravity, and $J_c^\top F_c$ the contact wrenches.
Effectively, JT-SPI substitutes the simulator’s parametric error model with a sample from a broad family of nonlinear, state-dependent torque errors.
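The perturbation structure above can be sketched as follows. This is a minimal NumPy stand-in; the class and function names (`PerturbationMLP`, `perturbed_torque`) and the layer shapes are illustrative assumptions, not identifiers from the paper:

```python
import numpy as np

def xavier(shape, rng):
    # Xavier/Glorot uniform initialization for one weight matrix
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    return rng.uniform(-limit, limit, size=shape)

class PerturbationMLP:
    """Zero-bias MLP: ReLU hidden layer, tanh output in [-1, 1]^n.

    With no biases, a zero (normalized) state maps to a zero perturbation.
    """
    def __init__(self, state_dim, n_joints, hidden, rng):
        self.W1 = xavier((state_dim, hidden), rng)
        self.W2 = xavier((hidden, n_joints), rng)

    def __call__(self, s):
        h = np.maximum(s @ self.W1, 0.0)   # ReLU, no bias term
        return np.tanh(h @ self.W2)        # bounded output, f(0) = 0

def perturbed_torque(a, s, mlp, tau_max, tau_tilde_max):
    tau = a * tau_max                      # nominal policy torque
    delta = tau_tilde_max * mlp(s)         # state-dependent, zero-mean perturbation
    return tau + delta
```

Because the tanh output is bounded in $[-1, 1]$, the injected perturbation is always bounded by the chosen maximum magnitude.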
2. Training Protocol and Implementation
JT-SPI is incorporated into standard on-policy reinforcement learning pipelines such as PPO. The procedure comprises:
- Use parallelized simulation environments, with perturbations injected into half the environments and the other half left unmodified to prevent overfitting to the perturbed domain.
- At each training rollout, sample new MLP weights $\phi$ via Xavier initialization; $\phi$ is not learned, but randomly resampled per episode.
- Each simulation step applies:
  - The nominal policy torque $\tau_t = a_t \odot \tau_{\max}$ as above.
  - If in a perturbed environment: evaluate $\delta\tau_t = \tilde{\tau}_{\max}\, f_\phi(s_t)$ using the privileged observation $s_t$, which includes normalized simulator state quantities (base pose, velocities, joint angles and velocities, input torque, contact wrenches).
  - The robot is then simulated with the total torque $\tau_t + \delta\tau_t$.
- Policy and value updates follow standard PPO with an additional adversarial motion-prior (AMP) loss and a small gradient penalty, aggregated as $L = L_{\text{PPO}} + w_{\text{AMP}} L_{\text{AMP}} + w_{\text{GP}} L_{\text{GP}}$.
Perturbation and network parameters are typically: a maximum joint-torque perturbation $\tilde{\tau}_{\max}$ (in Nm), a maximum base-force perturbation (in N), a ReLU MLP with tanh output and zero biases, and $\phi$ resampled per episode.
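The per-step rollout logic above can be sketched as follows. The half-perturbed environment split and per-environment perturbation networks follow the paper; the function names, shapes, and the simple first-half/second-half split are illustrative assumptions:

```python
import numpy as np

def make_env_masks(num_envs):
    # Perturb half of the parallel environments; leave the rest nominal
    # to prevent the policy from overfitting to the perturbed domain.
    mask = np.zeros(num_envs, dtype=bool)
    mask[: num_envs // 2] = True
    return mask

def rollout_step(actions, states, mlps, mask, tau_max, tau_tilde_max):
    """One simulation step: nominal torques plus per-env perturbations.

    actions: (E, n) normalized policy actions; states: (E, d) privileged
    observations; mlps: one (resampled-per-episode) perturbation net per env.
    """
    tau = actions * tau_max                 # nominal policy torques
    delta = np.zeros_like(tau)
    for i in np.flatnonzero(mask):          # perturbed environments only
        delta[i] = tau_tilde_max * mlps[i](states[i])
    return tau + delta
```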
3. Comparative Evaluation
Performance was benchmarked against:
- Domain Randomization (DR): Standard randomization of simulation parameters (masses, inertias, friction, damping, actuator properties).
- ERFI: State-independent random force-injection (baseline from Campanaro et al. 2024).
- JT-SPI: As described, with state-dependent torque perturbations.
Key comparative findings:
| Scenario | DR | ERFI | JT-SPI |
|---|---|---|---|
| Nominal, target 0.4 m/s | low tracking error | comparable to DR | comparable to DR |
| Actuator gap (stiffness 250) | 0/3 success (all fall) | 3/3 stable, <0.05 m/s error | 3/3 stable, <0.03 m/s error |
| Contact gap (soft ground) | 0/3 success (all fall) | 0/3 | 3/3, ≈0.38 m/s achieved |
| Real robot, uneven floor | 2/3, RMS ≈0.06 m/s | 0/3 | 3/3, RMS ≈0.04 m/s |
Perturbing only half the environments was found to be critical: perturbing all environments led to policy pathologies, while proper input normalization and zero biases in $f_\phi$ were necessary for stable learning.
JT-SPI achieves $3/3$ (100%) zero-shot success on real hardware, compared to $2/3$ for DR and $0/3$ for ERFI, indicating a significant expansion in the set of reality gaps handled (Cha et al., 9 Apr 2025).
4. Practical Implementation Guidelines
JT-SPI is compatible with physics engines such as MuJoCo or IsaacGym. Direct torque injection occurs via the control interface (ctrl vector). For floating-base robots, base forces may also be perturbed, though base moment perturbations are often omitted.
Efficient parallelization is advised: privileged observation normalization and MLP evaluation should be vectorized across environments and, if possible, directly implemented on the GPU (e.g., through custom PyTorch operators in IsaacGym).
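A batched evaluation of per-environment perturbation MLPs can be vectorized with `einsum`, as in this CPU NumPy stand-in for the GPU implementation described above (the stacked-weight layout is an illustrative assumption):

```python
import numpy as np

def batched_perturbation(S, W1, W2, tau_tilde_max):
    """Evaluate E independent zero-bias perturbation MLPs in one pass.

    S:  (E, d) normalized privileged states, one row per environment.
    W1: (E, d, h) stacked first-layer weights (one MLP per environment).
    W2: (E, h, n) stacked output-layer weights.
    Returns (E, n) torque perturbations bounded by tau_tilde_max.
    """
    H = np.maximum(np.einsum("ed,edh->eh", S, W1), 0.0)  # batched ReLU layer
    return tau_tilde_max * np.tanh(np.einsum("eh,ehn->en", H, W2))
```

The same einsum expressions translate directly to `torch.einsum` on GPU tensors, which is the natural route in an IsaacGym pipeline.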
Recommended hyperparameters include:
- Maximum joint-torque perturbation $\tilde{\tau}_{\max}$ (Nm)
- Maximum base-force perturbation (N)
- MLP with ReLU activations, tanh output, and zero biases
- PPO learning rate, minibatch size $98,304$, $5,000$ updates
- AMP loss weight, gradient penalty $0.002$
- Privileged state channels normalized by running standard deviation (zero maps to zero)
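The last item, normalization by running standard deviation only (no mean subtraction, so zero maps to zero), can be sketched as follows; the class name and incremental-update form are illustrative assumptions:

```python
import numpy as np

class RunningStdNormalizer:
    """Normalize each channel by its running std only, so zero maps to zero.

    Skipping mean subtraction preserves f_phi(0) = 0 for the zero-bias
    perturbation MLP fed by these normalized channels.
    """
    def __init__(self, dim, eps=1e-8):
        self.mean_sq = np.zeros(dim)   # running estimate of E[x^2]
        self.count = 0
        self.eps = eps

    def update(self, x):
        self.count += 1
        self.mean_sq += (x ** 2 - self.mean_sq) / self.count

    def __call__(self, x):
        return x / (np.sqrt(self.mean_sq) + self.eps)
```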
5. Key Empirical Results
JT-SPI-trained policies demonstrate the following:
- Across all tested dynamics perturbations (including actuator stiffness and soft ground), JT-SPI policies exhibit higher success rates and lower command-tracking errors than those trained with DR or ERFI.
- Equivalent nominal performance: under standard simulation conditions, all methods achieve similar performance, with comparable tracking error at a $0.4$ m/s target velocity.
- Superior resilience to unmodeled disturbances: JT-SPI policies yield more stable center-of-mass trajectories and contact force profiles.
- In sim-to-real transfer to TOCABI humanoid hardware on an irregular laboratory floor, JT-SPI yields $3/3$ success (RMS error ≈0.04 m/s), whereas DR and ERFI attain $2/3$ and $0/3$, respectively.
6. Limitations and Potential Extensions
Current limitations of JT-SPI include:
- The perturbation network weights $\phi$ are sampled randomly per episode; meta-learning adversarial or worst-case $f_\phi$ via adversarial RL (cf. RARL) is a potential extension.
- The model assumes zero-mean perturbations; real hardware may present biased errors, which could be incorporated by learning bias terms.
- Only joint torques and base forces (but not base moments) are perturbed; extending to contact-frame perturbations or state-channel noise may further enrich the modeled reality gaps.
- Safety under very large $\tilde{\tau}_{\max}$ is not guaranteed over long horizons; scheduled annealing of the perturbation magnitude during training could mitigate this risk.
This suggests that further extension of JT-SPI could yield even broader robustness, particularly by integrating learned bias, adversarial perturbation, or contact-channel noise modeling.
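One plausible form of the scheduled-annealing idea mentioned above is a curriculum that ramps the perturbation magnitude linearly from zero to its final value early in training. This is a hypothetical sketch, not a schedule specified in the paper:

```python
def annealed_tau_tilde(step, total_steps, tau_tilde_final, warmup_frac=0.5):
    """Linearly ramp the perturbation magnitude from 0 to tau_tilde_final
    over the first warmup_frac of training, then hold it constant."""
    ramp = min(1.0, step / (warmup_frac * total_steps))
    return ramp * tau_tilde_final
```

A decaying or cosine schedule toward the end of training would be an equally reasonable variant if late-training stability is the main concern.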
7. Broader Implications
By allowing arbitrary, state-dependent distortions in torque space, JT-SPI systematically enlarges the diversity of dynamics discrepancies to which policies are exposed during training. This method enables control policies to generalize beyond the finite families of simulated parameter uncertainty typically considered in domain randomization, thus achieving higher rates of zero-shot real-world success in complex humanoid locomotion tasks (Cha et al., 9 Apr 2025).