Delta Action Modeling (ASAP)
- Delta action modeling in ASAP is a direct action-space correction method that learns a residual policy to align simulated actions with real-world dynamics.
- The approach leverages a two-stage workflow where a nominal policy is first trained in simulation and then fine-tuned with real-world data to reduce tracking errors.
- Quantitative results demonstrate that ASAP’s delta action framework consistently outperforms conventional techniques in sim-to-sim and sim-to-real transfers.
Delta action modeling, as operationalized within the ASAP framework (Aligning Simulation and Real-World Physics), is a direct action-space compensation method for mitigating the dynamics gap between simulation and real-world control in agile humanoid robotics. Rather than tuning simulation parameters or randomizing environmental properties, a learned residual action policy outputs corrections that, when added to standard simulated actions, induce simulated transitions that closely reproduce real-world robot trajectories. This approach enables whole-body policies with high agility and coordination, surpassing conventional system identification, domain randomization, and delta dynamics learning in both sim-to-sim and sim-to-real transfer regimes (He et al., 3 Feb 2025).
1. Conceptual Foundations of Delta Action Modeling
Delta action modeling is predicated on the observation that even high-fidelity simulators exhibit systematic discrepancies from real-world robot dynamics, especially for agile, highly-actuated humanoid behaviors. Instead of correcting for these mismatches at the level of physical parameters (as in system identification), or relying on aggressive domain randomization (which can yield overly conservative policies), the delta modeling approach operates directly in the action space. Formally, a corrective (residual) action policy $\pi^{\Delta}$ is learned such that
$a_t^{\mathrm{corr}} = a_t + \pi^{\Delta}(s_t, a_t),$
where $a_t$ is the nominal simulated action and $\pi^{\Delta}(s_t, a_t)$ is the residual produced for the state-action pair $(s_t, a_t)$. When applying $a_t^{\mathrm{corr}}$ in simulation, the subsequent transition is trained to match the real-world next state as closely as possible. This structure allows the original policy to remain robust to unmodeled dynamics after fine-tuning within this residual-corrected simulator.
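The corrected simulation step can be sketched in a few lines. This is a minimal illustration, assuming generic callables `f_sim`, `pi`, and `pi_delta` as hypothetical stand-ins for the simulator transition, the nominal policy, and the residual policy:

```python
import numpy as np

def corrected_step(f_sim, pi, pi_delta, s_t):
    """Step the simulator with the nominal action plus the learned residual.

    f_sim:    simulator transition, s_{t+1} = f_sim(s_t, a_t)  (hypothetical)
    pi:       nominal policy, a_t = pi(s_t)                    (hypothetical)
    pi_delta: residual policy, delta_a_t = pi_delta(s_t, a_t)  (hypothetical)
    """
    a_t = pi(s_t)                      # nominal simulated action
    delta_a = pi_delta(s_t, a_t)       # learned action-space correction
    return f_sim(s_t, a_t + delta_a)   # transition under the corrected action
```

The residual enters only through the action argument, which is what keeps the correction independent of the simulator's internal parameters.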
2. Mathematical Architecture and Training Protocol
Let $\pi$ denote the nominal policy pre-trained entirely in simulation, producing actions $a_t = \pi(s_t)$. The delta action policy $\pi^{\Delta}$ is parameterized as a multilayer perceptron. Its input is the concatenation of $s_t$ and $a_t$ (or a reduced subset for the ankle-only model), giving a total input dimension of 82 (resp. 63). The network comprises two shared hidden layers (width 256, ReLU activations; layer normalization before each), a linear output layer matching the action dimension, and a scaled tanh for range limiting. L2 weight regularization is applied.
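The forward pass of such a network can be sketched in plain numpy. The residual scale (`scale=0.1`) and the random initialization below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def delta_policy_forward(x, params, scale=0.1):
    """Forward pass of the delta action MLP (sketch).

    x:      concatenated (s_t, a_t), shape (82,) for the full model.
    params: weight matrices / biases; randomly initialized here.
    scale:  hypothetical bound on residual magnitude via scaled tanh.
    """
    h = x
    for W, b in [(params["W1"], params["b1"]), (params["W2"], params["b2"])]:
        h = layer_norm(h)               # layer normalization before each hidden layer
        h = np.maximum(W @ h + b, 0.0)  # ReLU
    out = params["Wo"] @ h + params["bo"]
    return scale * np.tanh(out)         # bounded residual action

params = {
    "W1": rng.normal(0, 0.05, (256, 82)),  "b1": np.zeros(256),
    "W2": rng.normal(0, 0.05, (256, 256)), "b2": np.zeros(256),
    "Wo": rng.normal(0, 0.05, (23, 256)),  "bo": np.zeros(23),
}
```

The scaled tanh guarantees the residual stays within a fixed band, so the corrected action can never stray arbitrarily far from the nominal one.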
The reward for delta-model training at each step incorporates the squared error in state prediction as well as a penalty on the residual action norm:
$r_t = -\lVert s_{t+1}^{\mathrm{sim}} - s_{t+1}^{\mathrm{real}} \rVert^2 - \lambda \lVert \pi^{\Delta}(s_t, a_t) \rVert^2,$
where $\lambda$ weights the action penalty.
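A reward of this shape is straightforward to compute; the sketch below assumes a hypothetical penalty weight `w_action` and flat state vectors:

```python
import numpy as np

def delta_training_reward(s_next_sim, s_next_real, delta_a, w_action=0.01):
    """Per-step reward for delta-model training (sketch).

    Penalizes the squared error between the simulated next state (under the
    corrected action) and the logged real next state, plus the squared norm
    of the residual action. `w_action` is a hypothetical weight.
    """
    state_err = float(np.sum((s_next_sim - s_next_real) ** 2))
    action_pen = float(np.sum(delta_a ** 2))
    return -(state_err + w_action * action_pen)
```

The action-norm term discourages the residual from drifting into large compensations that would mask, rather than correct, the dynamics gap.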
3. Two-Stage ASAP Workflow
Stage 1: Nominal Policy Training in Simulation
- Human motion data is retargeted (via TRAM [wang2025tram]) and cleaned for robot feasibility using MaskedMimic in IsaacGym, yielding physical motion tracks for imitation learning.
- The policy is trained with PPO using actor–critic MLPs (2×256), domain randomization on friction, PD gains, and delay, and a phase curriculum. Actions are target joint angles for a low-level PD controller. The reward prioritizes reference tracking and regularizes joint and base motion.
Stage 2: Delta Action Learning and Fine-Tuning
- The sim-trained policy is deployed on a Unitree G1 robot. Full proprioception (base position, velocities, joint states) is logged at 100 Hz via MoCap.
- The delta policy is trained using PPO in IsaacGym, replaying real-world action/state pairs, initializing the simulator at each logged real state $s_t$, and maximizing reward for sim-to-real state alignment.
- Once converged, the trained delta policy $\pi^{\Delta}$ is integrated into the simulator. The nominal policy is then fine-tuned in this residual-corrected simulation, ensuring robustness to real-world discrepancies.
- The policy is deployed in the real environment with the learned residual for closed-loop evaluation.
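The replay logic of Stage 2 can be illustrated with a toy one-dimensional system. Here the "real" dynamics differ from the simulator by a constant actuation deficit, so the ideal residual simply cancels the bias; both dynamics functions are hypothetical stand-ins, not the ASAP models:

```python
# Toy illustration of Stage 2 replay: the simulator is reset to each logged
# real state s_t, stepped with the corrected action, and compared against
# the logged real next state.
sim_step  = lambda s, a: s + a         # simulator transition (toy)
real_step = lambda s, a: s + a - 0.2   # "real" dynamics with an action deficit

# Logged real-world rollout under a constant nominal action a_t = 0.5
states_real = [0.0]
for _ in range(5):
    states_real.append(real_step(states_real[-1], 0.5))

# Replay in simulation, initializing at each logged real state s_t
delta_a = -0.2   # the residual a trained delta policy should converge to
errors = []
for t in range(5):
    s_next_sim = sim_step(states_real[t], 0.5 + delta_a)
    errors.append(abs(s_next_sim - states_real[t + 1]))
```

With the correct residual, every replayed simulator step lands exactly on the logged real next state, which is the alignment objective the PPO reward encodes.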
Pseudocode for these stages is provided in the ASAP manuscript (see Technical Report, Sec. 5) (He et al., 3 Feb 2025).
4. Quantitative Evaluation and Empirical Performance
Evaluation metrics include global mean per-joint position error (g-MPJPE), root-relative MPJPE, acceleration error ($E_{\mathrm{acc}}$), velocity error ($E_{\mathrm{vel}}$), and success rate (percentage of rollouts remaining within 0.5 m of the reference). Closed-loop performance was examined in sim-to-sim settings (IsaacGym-to-IsaacSim, IsaacGym-to-Genesis) and sim-to-real on the Unitree G1.
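The two headline metrics are simple to compute from tracked keypoints. The array layouts below (metres, shape `(T, J, 3)` for joints and `(N, T, 3)` for root trajectories) are assumptions for illustration:

```python
import numpy as np

def mpjpe_mm(pred, ref):
    """Mean per-joint position error in millimetres.

    pred, ref: joint positions in metres, shape (T, J, 3) (assumed layout).
    """
    return 1000.0 * float(np.mean(np.linalg.norm(pred - ref, axis=-1)))

def success_rate(root_pred, root_ref, threshold_m=0.5):
    """Fraction of rollouts whose root stays within `threshold_m` of the
    reference over the entire rollout; inputs shape (N, T, 3)."""
    dist = np.linalg.norm(root_pred - root_ref, axis=-1)        # (N, T)
    return float(np.mean(np.all(dist <= threshold_m, axis=-1)))
```

Note that success rate is an all-or-nothing per-rollout criterion: a single excursion beyond 0.5 m marks the whole rollout as a failure.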
Table: Representative Results for Sim-to-Sim and Sim-to-Real Transfer
| Scenario | Method | Succ (%) | g-MPJPE (mm) | MPJPE (mm) |
|---|---|---|---|---|
| IsaacGym→IsaacSim | Vanilla | 100 | 107 | 45.4 |
| IsaacGym→IsaacSim | ASAP | 100 | 106 | 44.3 |
| IsaacGym→Genesis | Vanilla | 100 | 140 | 70.1 |
| IsaacGym→Genesis | ASAP | 100 | 125 | 73.5 |
| Unitree G1 (Kick) | Vanilla | – | 61.2 | 43.5 |
| Unitree G1 (Kick) | ASAP | – | 50.2 | 40.1 |
ASAP’s delta action approach consistently yields lower tracking errors and higher success rates. In the LeBron "Silencer" out-of-distribution test, ASAP achieves $47.5$ mm MPJPE versus $55.3$ mm for the vanilla policy, and reduces g-MPJPE from $159.0$ mm (vanilla) to $112.0$ mm.
Ablations confirm that naïve action-noise injection, fixed-point, or gradient-based compensation underperform relative to RL-trained delta action models. Residual actions learned via delta modeling yield direct improvements in tracking and agility for both in-distribution and OOD motions.
5. Comparative Analysis with Preceding Sim-to-Real Techniques
Delta action modeling differs fundamentally from:
- System Identification (SysID): Instead of exhaustive search or experiment-driven parameter fitting, delta action modeling obviates explicit actuator or contact parameter selection, and does not require torque sensing.
- Domain Randomization (DR): Large-scale randomization yields robust but often overly conservative policies. Delta action modeling retains agility by focusing correction on the minimal action-space residuals required for sim-to-real consistency.
- Delta Dynamics Learning: Modeling residuals in the state transition domain may suffer from accumulating prediction errors; direct action-space correction avoids this compounding and leverages RL objectives to target rollout accuracy.
6. Strengths, Limitations, and Extension Directions
Strengths
- Direct action-space correction bridges sim-to-real mismatch without high-dimensional parameter tuning.
- The residual correction is low-dimensional and data-efficient, requiring fewer real-world samples than full-dynamics identification.
- Integrating a frozen delta action model during fine-tuning prevents instability due to drifting compensation or overfitting.
Limitations
- Training full 23-DoF delta models currently faces thermal and hardware limitations on real robots, partially addressed by focusing on the ankle joints.
- Dependence on optical motion capture for ground truth state acquisition; more scalable alternatives could include markerless vision or onboard estimation.
- Data demand for residual learning could be reduced through meta-learning or online adaptation methodologies.
Future Extensions
- Application of delta action models to other legged platforms, such as quadrupeds or exoskeletons.
- Integration with probabilistic dynamics models or learned priors for hybrid sim-to-real adaptation.
- Reduction in real-world data requirements via few-shot learning techniques.
7. Role within the Broader Reinforcement Learning and Robotics Landscape
Delta action modeling in ASAP exemplifies a contemporary focus on sim-to-real transfer via direct function approximation in action space, rather than indirect adjustment at the parameter or transition level. This enables rapid deployment of expressive and highly agile humanoid skills. The methodology can be seen as a generalizable corrective strategy, potentially complementary to techniques that promote action smoothness, such as the action-predictive ASAP formulation for RL oscillation reduction (Kwak et al., 26 Jan 2026), but uniquely tailored for high fidelity dynamical consistency between synthetic and physical platforms (He et al., 3 Feb 2025).