ContactNets: Data-Driven Contact Dynamics
- ContactNets are data-driven frameworks that model rigid-body contact dynamics using smooth implicit functions for signed distances and contact-frame Jacobians.
- They implement mechanics-inspired loss functions optimized via differentiable quadratic programs to enforce complementarity and friction constraints.
- ContactNets demonstrate superior performance in robotics simulations by reducing penetration errors and integrating seamlessly with control and vision systems.
ContactNets are a family of data-driven frameworks for modeling rigid-body contact dynamics using smooth, implicit representations of inter-body signed distance and contact-frame Jacobians. These methodologies provide physically consistent predictions for systems exhibiting discontinuous contact behavior such as impact, stiction, and friction, and integrate seamlessly with robotics simulation, control, and planning environments by learning representations compatible with complementarity and maximum dissipation principles (Pfrommer et al., 2020, Sun et al., 2023).
1. Mathematical Foundations
Central to ContactNets is the implicit learning of key geometric and physical quantities defining the contact manifold:
- Inter-body Signed Distance: The contact gap for up to $m$ potential contacts is modeled as a smooth, differentiable function
$$\phi_\theta(\mathbf{q}) \in \mathbb{R}^m,$$
where $\mathbf{q}$ is the generalized configuration and $\theta$ parameterizes either direct geometric attributes or neural network weights (Pfrommer et al., 2020).
- Contact-Frame Jacobians: For each contact $i$, a learned Jacobian stacks normal and tangential rows,
$$J_{\theta,i}(\mathbf{q}) = \begin{bmatrix} J_{n,i}(\mathbf{q}) \\ J_{t,i}(\mathbf{q}) \end{bmatrix},$$
yielding $J_{\theta,i}(\mathbf{q})\,\mathbf{v}$, the contact-frame relative velocity, capturing both normal and tangential directions.
- Complementarity and Friction Laws: Each contact $i$ enforces the complementarity condition
$$0 \le \phi_{\theta,i}(\mathbf{q}) \perp \lambda_{n,i} \ge 0,$$
and Coulomb friction via the friction-cone constraint
$$\lVert \lambda_{t,i} \rVert_2 \le \mu_i\, \lambda_{n,i},$$
with sliding impulses opposing the relative tangential velocity (maximum dissipation).
- Integration into Discrete-Time Dynamics: The time-stepping rigid-body update with contact impulses $\lambda$ is
$$M(\mathbf{q})(\mathbf{v}' - \mathbf{v}) = \Delta t\, F(\mathbf{q}, \mathbf{v}, \mathbf{u}) + J_\theta(\mathbf{q})^\top \lambda,$$
where $M(\mathbf{q})$ is the inertia matrix and $F$ collects applied non-contact forces (Sun et al., 2023).
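As a minimal sketch of how this discrete update interacts with complementarity, consider a point mass above a flat floor with gap $\phi(z) = z$; the frictionless normal impulse then has a closed-form solution. This is a toy model for illustration, not the papers' implementation:

```python
def step(z, v, m=1.0, g=9.81, dt=0.01):
    """One semi-implicit Euler step for a point mass above a floor at z = 0.

    The gap function is phi(z) = z and the contact Jacobian is J = 1, so the
    complementarity problem 0 <= phi' perp lambda >= 0 reduces to a scalar
    clamp on the normal impulse (inelastic impact model).
    """
    v_free = v - g * dt                  # velocity after the contact-free step
    z_free = z + v_free * dt             # gap the free step would produce
    # Smallest non-negative impulse keeping the next-step gap non-negative:
    lam = max(0.0, -m * z_free / dt)     # normal impulse, N*s
    v_next = v_free + lam / m
    z_next = z + v_next * dt
    return z_next, v_next, lam
```

In-contact steps clamp the mass exactly onto the floor, while airborne steps produce zero impulse, mirroring the complementarity condition above.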
2. Loss Function and Optimization
ContactNets adopt a mechanics-inspired loss, minimized over contact impulses via a differentiable quadratic program (QP):
$$\mathcal{L}(\theta) = \min_{\lambda}\; \ell_{\text{pred}}(\lambda;\theta) + \ell_{\text{comp}}(\lambda;\theta) + \ell_{\text{pen}}(\lambda;\theta) + \ell_{\text{diss}}(\lambda;\theta),$$
with component costs:
- $\ell_{\text{pred}}$: prediction error between predicted and observed contact impulse;
- $\ell_{\text{comp}}$: spurious-force penalty for impulse activation in non-contacting states;
- $\ell_{\text{pen}}$: penalty for non-penetration violation under a forward-Euler step;
- $\ell_{\text{diss}}$: maximum-dissipation violation in friction (Pfrommer et al., 2020).
The QP is tractable and differentiable (e.g., via OSQP, osqpth), enabling end-to-end training through sensitivity analysis. For cyclic frameworks relying on vision-dynamics feedback, additional loss terms penalize penetration, complementarity violation, and dissipation error (Sun et al., 2023).
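To make the structure of this objective concrete, here is a one-dimensional toy: a point mass falling toward a flat floor with gap $\phi(z) = z$, where the inner minimization over the scalar impulse $\lambda \ge 0$ is a tiny QP with a closed-form solution. The penalty forms and weights are illustrative assumptions, not the papers' exact terms:

```python
def contactnets_loss_1d(z, v, v_obs_next, m=1.0, g=9.81, dt=0.01, w_comp=1.0):
    """Toy mechanics-inspired loss for one frictionless contact.

    Minimizes, over the normal impulse lam >= 0, a quadratic cost combining
    (i) prediction error between the impulse-driven next velocity and the
    observed one, and (ii) a spurious-force penalty (z * lam)^2 that punishes
    impulses applied while the gap is open.
    """
    v_free = v - g * dt
    # cost(lam) = (v_free + lam/m - v_obs_next)^2 + w_comp * (z * lam)^2
    #           = a*lam^2 + b*lam + c
    a = 1.0 / m**2 + w_comp * z**2
    b = 2.0 * (v_free - v_obs_next) / m
    c = (v_free - v_obs_next) ** 2
    lam = max(0.0, -b / (2.0 * a))       # constrained scalar QP, closed form
    return a * lam**2 + b * lam + c, lam
```

An observed impact (mass stops at the floor) is explained with zero loss by a positive impulse, while free flight yields zero impulse; a wrong learned gap would leave residual loss, which is the gradient signal ContactNets exploit.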
3. Network Architectures and Training
ContactNets employ different architectures according to geometric prior knowledge:
- ContactNets Polytope: Minimal-parameter model where $\theta$ directly encodes geometric priors (e.g., the corner locations of a cube), and gap/Jacobian calculations are explicit.
- ContactNets Deep: Augments the above with a residual MLP mapping $\mathbf{q}$ to gap corrections: $\phi_\theta(\mathbf{q}) = \phi_{\text{poly}}(\mathbf{q}) + \phi_{\text{MLP}}(\mathbf{q})$.
- End-to-End Baseline: Four-layer ReLU MLP trained directly on contact impulse regression.
Training commonly uses AdamW (with learning rates tuned separately for ContactNets and for the baseline), no weight decay for ContactNets, and explicit normalization/orthogonality regularizers on the Jacobians. The entire dataset is often used as a single batch per epoch, and stopping is based on validation-loss plateauing (Pfrommer et al., 2020).
For instance-agnostic settings, ContactNets use an MLP of 4–6 layers (width 128–256) for each of the gap and Jacobian functions, initialized from coarse, vision-derived priors (e.g., BundleSDF signed-distance fields) (Sun et al., 2023).
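A minimal sketch of the polytope-style gap parameterization, assuming a flat floor at $z = 0$ and treating the body-frame corner locations as the learnable parameters $\theta$ (the cube dimensions and pose below are illustrative):

```python
import numpy as np

def polytope_gaps(p, R, corners):
    """Polytope-style gap function against a floor at z = 0.

    corners : (m, 3) body-frame vertex positions (the learnable geometry),
    p : (3,) world position, R : (3, 3) rotation matrix. The gap for each
    potential contact is the world-frame height of its vertex, so the
    Jacobian w.r.t. the pose is available analytically.
    """
    world = corners @ R.T + p    # vertices in the world frame, shape (m, 3)
    return world[:, 2]           # signed distance of each vertex to the floor

# Hypothetical 10 cm cube resting flat on the floor: the four bottom-face
# corners have zero gap, the four top-face corners a gap of 0.1 m.
half = 0.05
corners = np.array([[sx * half, sy * half, sz * half]
                    for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
gaps = polytope_gaps(np.array([0.0, 0.0, half]), np.eye(3), corners)
```

In the Deep variant, an MLP correction would simply be added to the returned gaps, leaving this analytic backbone intact.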
4. Empirical Performance
ContactNets have demonstrated compelling empirical results in high-fidelity contact tasks. On a dataset of 570 tosses of a 10 cm cube, tracked at 148 Hz:
| Model | Pos Error (cm), 32 tosses | Rot Error (deg), 32 tosses | Penetration (%), 32 tosses | Pos Error (cm), 256 tosses | Rot Error (deg), 256 tosses | Penetration (%), 256 tosses |
|---|---|---|---|---|---|---|
| ContactNets Polytope | 1.5 | 5.2 | 2.0 | 0.8 | 3.1 | 1.8 |
| ContactNets Deep | 2.0 | 6.0 | 3.1 | 0.9 | 3.5 | 2.7 |
| End-to-End | 4.5 | 12.4 | 15.0 | 3.5 | 10.2 | 20.5 |
Notably, both ContactNets variants converge after only 32 tosses, whereas the direct end-to-end baseline requires approximately 256. The baseline exhibits significant ground penetration (up to 20% of the cube width) and rotational drift, while ContactNets maintains physical plausibility with sub-6% penetrations even in low-data regimes. ContactNets Polytope outperforms Deep when geometric priors are highly informative, while Deep is advantageous for non-polytope or curved objects (Pfrommer et al., 2020).
5. Integration with Simulation, Control, and Vision
At inference, learned gap and Jacobian functions directly replace geometric/collision modules in simulators such as MuJoCo, Bullet, or Drake without modification to controllers or planner interfaces. For contact-aware planners (e.g., complementarity-based LQR), the learned models are directly compatible (Pfrommer et al., 2020).
In instance-agnostic, vision-integrated systems, ContactNets refines both geometry and dynamics: the learned signed-distance function produces a 3D point cloud by extracting the zero-level set, which is then fed into vision modules for pose tracking refinement (e.g., via ICP and reprojection-based alignment), achieving an end-to-end loop for unsupervised geometry and dynamics learning without CAD priors (Sun et al., 2023).
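A crude sketch of the zero-level-set extraction step, using dense grid sampling in place of a proper marching-cubes pass, with an analytic sphere SDF standing in for the learned network (grid bounds and tolerance are illustrative):

```python
import numpy as np

def zero_level_points(sdf, lo=-0.2, hi=0.2, n=40, eps=5e-3):
    """Sample an SDF on a regular grid and keep points near its zero level.

    sdf : any vectorized callable mapping an (N, 3) array of points to N
    signed-distance values (e.g., a trained MLP). Returns a (k, 3) point
    cloud usable for ICP-style pose refinement against depth observations.
    """
    axis = np.linspace(lo, hi, n)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    pts = grid.reshape(-1, 3)
    return pts[np.abs(sdf(pts)) < eps]

# Analytic sphere of radius 0.1 m standing in for the learned SDF.
cloud = zero_level_points(lambda x: np.linalg.norm(x, axis=1) - 0.1)
```

In the actual pipeline a surface mesher would replace the thresholding, but the interface is the same: the learned implicit geometry is queried densely and only its zero level is handed to the vision modules.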
6. Extensions, Practical Limits, and Computational Aspects
ContactNets are amenable to several extensions:
- Learning elastic restitution functions alongside signed distance;
- Modeling continuous sub-contact forces with additional neural penalty networks;
- Integrating vision-based object representations, such as via latent keypoint spaces or SDFs;
- Scaling to multi-body scenarios by increasing the potential contact count $m$.
The computational load is dominated by per-sample QP solves during training (typically a few GPU-hours), with test-time cost comparable to traditional rigid-body engines. Practical hyperparameter choices include learning rates and weight decay tuned per experiment, two hidden layers of 256 units, and a regularizer weight of 0.3 for Jacobian normalization. Non-penetration is enforced both as a hard constraint and as a loss penalty (Pfrommer et al., 2020, Sun et al., 2023).
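The exact form of the Jacobian-normalization regularizer is not spelled out above; one plausible sketch, offered as an assumption rather than the papers' definition, penalizes deviation of each contact frame's Gram matrix from the identity:

```python
import numpy as np

def jacobian_regularizer(J, weight=0.3):
    """Penalty encouraging orthonormal rows in a contact-frame Jacobian block.

    J : (3, n) learned rows (normal + two tangents) for one contact. The
    penalty || J J^T - I ||_F^2 discourages degenerate or rescaled contact
    frames, which would otherwise let the network trade impulse magnitude
    against frame scaling during training.
    """
    gram = J @ J.T
    return weight * np.sum((gram - np.eye(J.shape[0])) ** 2)
```

An orthonormal frame incurs zero penalty, while uniformly rescaling the rows is penalized quadratically, which is the behavior a normalization regularizer needs.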
7. Significance and Applications
ContactNets represent a principled advance in learning contact dynamics by embedding complementarity and friction constraints into the learning objective while leveraging geometric neural priors. This approach resolves a longstanding gap in modeling nonsmooth phenomena such as stick-slip, impact, and stiction in a data-efficient, differentiable manner. ContactNets have been shown to generalize both to cases with strong shape priors (cube, polytope) and to instance-agnostic settings jointly learning shape, dynamics, and pose from RGBD video (Pfrommer et al., 2020, Sun et al., 2023). These capabilities facilitate integration into robotic planning, control pipelines, and vision-based system identification without reliance on accurate CAD data or motion capture.