SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
Abstract: Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigids. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain rooted in rigid-body abstractions, producing mismatched geometry, fragile soft dynamics, and motion primitives poorly suited for cloth interaction. We posit that simulation fails not for being synthetic, but for being ungrounded. To address this, we introduce SIM1, a physics-aligned real-to-sim-to-real data engine that grounds simulation in the physical world. Given limited demonstrations, the system digitizes scenes into metric-consistent twins, calibrates deformable dynamics through elastic modeling, and expands behaviors via diffusion-based trajectory generation with quality filtering. This pipeline transforms sparse observations into scaled synthetic supervision with near-demonstration fidelity. Experiments show that policies trained on purely synthetic data achieve parity with real-data baselines at a 1:15 equivalence ratio, while delivering 90% zero-shot success and 50% generalization gains in real-world deployment. These results validate physics-aligned simulation as scalable supervision for deformable manipulation and a practical pathway for data-efficient policy learning.
Explain it Like I'm 14
What this paper is about (in simple terms)
Teaching robots to handle soft, floppy things—like T‑shirts, towels, or cloth—is much harder than teaching them to pick up solid objects. The shapes bend and change, and tiny differences in how you grab them really matter. Collecting real robot practice for these tasks takes a lot of time and money.
This paper shows a way to make very realistic “practice worlds” (simulations) that match the real world closely enough that a robot can train only in simulation and then work on a real robot right away. The big idea: ground the simulator in real physics and real measurements so the fake practice is useful for the real job.
What questions the researchers asked
They focused on three simple questions:
- Can a robot policy (its learned behavior) trained only in simulation work as well as one trained on real‑world data?
- Does training with lots of diverse simulated experiences make the robot better at handling new, slightly different situations?
- Is simulated data an efficient “data scaler”? In other words, how many simulated examples equal one real demonstration?
How they did it (step by step, with easy analogies)
The team built a “real‑to‑sim‑to‑real” pipeline. Think of it like creating a high‑quality digital twin of the real setup, practicing in that twin, and then sending the skills back to the real robot.
1) Build a faithful digital twin of the real scene
- They scan real clothes (like a T‑shirt) using a professional 3D scanner to get a super accurate 3D model—down to tiny wrinkles.
- They load the actual robot’s design into the simulator so its movements and limits match the real robot.
- They recreate the table and room setup to the right size and position.
Why this matters: If the digital world is the wrong size or shape, the robot learns the wrong moves—like practicing with a different bat size before a game.
2) Use physics that keep soft materials realistic and stable
- Regular game/physics engines are great at hard objects, but soft things like cloth can “cheat” and stretch too much in fast simulations.
- The team added a special “strain guard” to the cloth physics: if any piece of the cloth stretches too far, virtual elastic forces pull it back. This keeps the cloth behaving like real fabric during robot contact.
- They calibrate the simulator by making the real robot and the simulated robot perform the same motions and then adjust the simulator’s settings (like friction and stiffness) until the cloth looks and moves the same in both.
Analogy: It’s like tuning a guitar by ear so that a note played on the simulator sounds like the real one.
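To make the "strain guard" idea concrete, here is a minimal Python sketch. It is a simplified stand-in, not the paper's solver: it moves points back directly (a position-projection approach) instead of adding elastic forces, and the function name `limit_edge_strain` and the 5% default stretch limit are assumptions chosen for illustration.

```python
import numpy as np

def limit_edge_strain(positions, edges, rest_lengths, max_stretch=1.05, iterations=10):
    """Shorten any cloth edge that is longer than max_stretch * its rest length.

    positions:    (N, D) array of particle positions
    edges:        list of (i, j) index pairs
    rest_lengths: rest length of each edge, in the same order as edges
    All parameter values here are illustrative assumptions.
    """
    x = np.asarray(positions, dtype=float).copy()
    for _ in range(iterations):
        for (i, j), rest in zip(edges, rest_lengths):
            delta = x[j] - x[i]
            length = np.linalg.norm(delta)
            limit = max_stretch * rest
            if length > limit:
                # Move both endpoints toward each other by half the excess each.
                correction = 0.5 * (length - limit) * delta / length
                x[i] += correction
                x[j] -= correction
    return x

# Usage: an edge with rest length 1 stretched to length 2 is pulled back to the limit.
stretched = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
fixed = limit_edge_strain(stretched, edges=[(0, 1)], rest_lengths=[1.0], max_stretch=1.1)
print(np.linalg.norm(fixed[1] - fixed[0]))
```

Edges that are within the limit are left untouched, so the guard only acts when the cloth starts to over-stretch.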
3) Create lots of realistic training demos automatically
- They split human‑guided examples into two parts: the grasps (where the robot actually holds the cloth) and the in‑between moves.
- They reuse the good grasps (so contact is valid) and use a “diffusion” model to fill in the in‑between motions smoothly. Diffusion here acts like a smart “fill‑in‑the‑blanks” system that makes natural‑looking motions.
- They automatically filter out bad simulations (for example, if the cloth penetrates the table or the motion looks odd) using simple rules and a video‑based checker.
- They render each good example with varied lighting, textures, and camera angles so the robot sees many different looks—kind of like practicing on many different fields.
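Two of the steps above, splitting a demo into grasp and in-between parts and rejecting simulations where the cloth goes through the table, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame `gripper_closed` flags, the function names, and the 2 mm tolerance are hypothetical and are not taken from the paper.

```python
from itertools import groupby

def split_by_grasp(gripper_closed):
    """Split a demo into runs of 'grasp' and 'transit' frames.

    gripper_closed: one boolean per frame (assumed signal).
    Returns a list of (label, start, end) with end exclusive.
    """
    segments = []
    index = 0
    for closed, run in groupby(gripper_closed):
        length = len(list(run))
        label = "grasp" if closed else "transit"
        segments.append((label, index, index + length))
        index += length
    return segments

def penetrates_table(cloth_heights, table_height, tolerance=0.002):
    """Rule-based check: True if any cloth point sits below the table surface.

    Heights are in meters; the tolerance is an assumed value.
    """
    return any(z < table_height - tolerance for z in cloth_heights)

# Usage with a six-frame toy demo.
print(split_by_grasp([False, False, True, True, True, False]))
print(penetrates_table([0.75, 0.70], table_height=0.75))
```

In a pipeline like the one described, the "transit" runs would be the parts handed to the motion generator, while the "grasp" runs are reused as recorded.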
What they found and why it matters
- Training only in simulation worked: On a real T‑shirt folding task, policies trained purely on synthetic (simulated) data reached very high success rates (up to around 90% in zero‑shot deployment), close to or matching policies trained on real data.
- Simulation improves generalization: When they changed the real‑world setup (different lighting, textures, minor position shifts), the simulation‑trained policies handled these changes better—up to about 50% better in some tests—because they had already practiced with lots of variety in sim.
- Simulation scales efficiently: About 15 simulated examples provided roughly the same training value as 1 real example for in‑domain tasks (and roughly 5:1 for some generalization cases). This means you can “grow” your dataset cheaply and quickly in sim.
- It works from scratch: Even when starting with no pretraining, the synthetic data allowed the robot to learn the folding task, while limited real data alone didn’t.
- Key ingredients matter: Accurate 3D scans and the stabilized cloth physics were crucial. Without them, the simulated cloth behaved unrealistically and learning didn’t transfer well.
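The "about 15 to 1" figure can be turned into quick back-of-the-envelope arithmetic. The sketch below assumes the ratio behaves linearly, which is a simplification: the summary reports it as an approximate figure for one task, and the function names are made up for illustration.

```python
def real_equivalent(num_synthetic, ratio=15.0):
    """Approximate number of real demos that num_synthetic simulated demos are worth."""
    return num_synthetic / ratio

def synthetic_needed(num_real, ratio=15.0):
    """Approximate number of simulated demos needed to stand in for num_real real demos."""
    return num_real * ratio

# Usage: 3000 simulated demos at 15:1, and at the 5:1 ratio mentioned for generalization.
print(real_equivalent(3000))           # 200.0
print(real_equivalent(3000, ratio=5))  # 600.0
print(synthetic_needed(100))           # 1500.0
```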
Why this is important
- Cuts cost and speeds up training: You can gather most of your robot’s practice in a computer instead of spending tons of time collecting real demos.
- Safer and more flexible: Robots can train on risky or delicate tasks in simulation first.
- Opens doors to more soft‑object skills: Beyond T‑shirts, this approach can help with towels, bags, bedding, packaging, or even soft materials in healthcare.
A simple wrap‑up
This paper shows that if you ground a simulator in real measurements and physics—build accurate digital twins, stabilize the cloth behavior, and generate lots of realistic, varied practice—you can train robots to handle soft, wiggly objects using only synthetic data and still succeed on real robots right away. It’s a practical recipe for teaching robots tricky tasks without needing huge amounts of expensive real‑world data.
Note on limitations: Material tuning still needs expert adjustment for each new cloth type. In the future, automating that calibration and expanding to more fabrics and tasks would make this even more powerful.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The paper leaves several concrete issues unresolved that future work can address:
- Material parameter estimation and calibration
- No automated pipeline to infer cloth physical parameters (e.g., density, , , anisotropy, damping) from data; calibration is expert-guided and qualitative (visual matching), making it non-scalable and non-reproducible.
- Lack of quantitative calibration metrics (e.g., edge-length distributions, drape contours, wrinkle statistics, force–displacement curves) and benchmarks to assess sim–real dynamic fidelity.
- No validation that the tuned parameters correspond to physically plausible values or generalize across tasks and contact regimes.
- Solver modeling and physical fidelity
- The proposed strain-constrained AVBD-style solver does not model key cloth properties (anisotropy along warp/weft, bending/shear coupling, viscoelasticity/hysteresis, thickness) and may not capture real woven/knit behavior.
- No quantitative comparison to established soft-body solvers (e.g., PBD/FEM/MPM) on accuracy, stability, and convergence under identical conditions with ground truth references.
- Absence of analysis on whether the strain-limiting mechanism injects non-physical energy or biases deformation modes, and how this affects learned policies.
- Missing robustness study under challenging contact regimes (stick–slip, high friction contrast, rapidly changing contact patches, self-contact entanglement).
- Unclear scalability and real-time performance: no reporting of mesh resolution, iteration counts, time-steps, FPS, or CPU/GPU utilization; unclear how performance degrades with higher-resolution cloths or multi-garment scenes.
- Contact and gripper interaction modeling
- Gripper compliance, surface micro-geometry, pressure-dependent friction, and adhesion effects are not modeled or validated; stick–slip transitions are not handled systematically.
- No study of how texture/appearance randomization aligns with physically meaningful contact/friction variations (i.e., appearance–physics coupling is unmodeled).
- Real-to-sim scene digitization and initialization
- The garment is scanned on a mannequin, yielding a static, watertight mesh; there is no method to recover rest-shape, thickness, or warp/weft directions needed for accurate cloth simulation.
- No approach to initialize crumpled or arbitrary starting states from real images; how to bridge from scanned rest geometry to realistic draped or wrinkled initial conditions remains open.
- Practical constraints (cost, time, failure modes) and robustness of high-precision scanning for diverse fabrics (e.g., glossy, thin, dark, patterned) are not characterized.
- Data generation and motion synthesis
- Trajectory synthesis reuses grasp configurations from demonstrations; the system does not learn/plan novel grasp points for new garments or unseen geometries.
- Diffusion-based motion generation is only used between fixed interaction segments; a fully closed-loop synthesis that reasons about deformable state feedback is not explored.
- Validity filtering relies on heuristic “vibe coding” thresholds and a binary video discriminator; the false positive/negative rates, failure modes, and sensitivity to rendering artifacts are not reported.
- No physics-constrained trajectory generator ensuring contact feasibility and collision-free motions without reliance on post-hoc filtering.
- Evaluation scope and generalization
- Real-world validation is limited to a single representative task (T-shirt folding) and one robot/gripper; claims of generality are not supported by multi-task, multi-robot, or single-arm evaluations.
- Domain shifts tested are modest (e.g., ±8 cm translation, ±15° rotation, ±5° camera elevation); robustness to larger pose variations, highly crumpled initial states, and substantial viewpoint changes is untested.
- Limited garment diversity in real deployment (e.g., one “highly dissimilar” polo): no systematic coverage of materials (knit vs woven), thickness (towels/jeans), stretchiness, or topological complexities (sleeves inside-out, ties/strings).
- No analysis of failure cases (where/why the policy fails) or sensitivity to environmental factors (humidity, static charge, surface compliance).
- Synthetic-to-real “equivalence ratio”
- The 1:15 real-to-synthetic equivalence is reported for one task and setup; it is unknown how this ratio varies across tasks, garments, robots, solvers, or data regimes (especially ultra-low data).
- No confidence intervals or statistical tests on success rates; only 30 trials per configuration may be insufficient for tight uncertainty bounds.
- Policy learning details and sensing
- The policy architecture, observation/action spaces, training regimes, and regularization are insufficiently specified for reproducibility; impact of choices on sim-to-real remains unclear.
- Only RGB head-camera observations are considered; how to integrate and simulate tactile/force sensing (critical for deformable manipulation) is an open direction.
- Appearance and sensor realism
- Rendering randomization focuses on materials, lighting, and camera pose; there is no modeling of camera artifacts (lens distortion, rolling shutter, noise, motion blur) or depth sensor noise that affect real perception.
- No study of the effect of photorealism vs. stylization on transfer, nor of the necessary level of fidelity to maintain performance.
- Broader applicability and scaling
- Automation of the full R2S2R loop at scale (scanning many garments, calibrating per-item materials, generating task-specific trajectories) is not addressed; cost–benefit and throughput are unknown.
- Extension to other deformables (e.g., sponges, ropes, bags, soft food, wet cloth) and multi-physics settings (fluids, granular materials) is untested.
- Integration of small real-world fine-tuning or active data collection to close residual gaps is not explored (e.g., hybrid R2S2R+on-policy corrections).
- Reproducibility and release
- The availability and completeness of code, assets, and datasets are unclear; details on how to reproduce the calibration, solver settings, and data generation at scale are missing.
- Precise camera and robot calibration procedures (and their tolerances) are not documented, limiting replicability.
These gaps suggest concrete next steps: automate material parameter recovery with quantitative metrics and benchmarks; validate the solver against physically grounded references; incorporate richer contact/gripper models; expand real-world evaluations across tasks, garments, and robots with larger and more challenging domain shifts; develop grasp discovery and physics-constrained motion synthesis; simulate sensor artifacts and tactile feedback; and report comprehensive performance, cost, and reproducibility details.
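The point about missing confidence intervals can be made concrete with a Wilson score interval, a standard choice for success rates from a small number of trials. The sketch below is illustrative only: the count of 27 successes out of 30 is a hypothetical example of a 90% success rate, not a figure reported by the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    return center - half, center + half

# Usage: a hypothetical 27/30 result.
low, high = wilson_interval(27, 30)
print(f"{low:.3f} to {high:.3f}")
```

For this hypothetical 27/30 case the interval spans roughly 0.74 to 0.97, which shows how wide the uncertainty remains at 30 trials per configuration.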
Practical Applications
Practical Applications of the Physics-Aligned R2S2R Simulator for Deformable Manipulation
Below are actionable, real-world applications derived from the paper’s findings and innovations. They are grouped into Immediate Applications (deployable now with reasonable effort) and Long-Term Applications (requiring further R&D, scaling, or integration). Each item notes sectors, potential tools/products/workflows, and key assumptions or dependencies that affect feasibility.
Immediate Applications
The following applications can be implemented today using the paper’s R2S2R workflow: metric-accurate scanning, a deformation-stable solver, calibrated real-to-sim matching, and diffusion-based trajectory synthesis with automated validity filtering.
- Physics-aligned synthetic data generation to replace or augment real demonstrations
- Sectors: Robotics, Software/AI, Research/Academia
- Use case: Generate synthetic deformable-object demonstrations (e.g., folding, flattening) to train imitation or VLA policies with a ~1:15 real-to-synthetic data equivalence, reducing real data collection needs.
- Tools/workflows: “R2S2R Data Engine” pipeline; scanning (e.g., EinScan), URDF-based robot twin import, AVBD-based solver, diffusion trajectory generator, video discriminator filter, Blender rendering, LeRobot integration.
- Assumptions/dependencies: Access to a high-precision scanner and robot CAD/URDF; manual-in-the-loop parameter calibration; seed teleop demos (e.g., 100–200); GPU for sim/render (e.g., RTX 4090).
- Factory/warehouse garment manipulation policy training (folding, bagging, sorting)
- Sectors: Logistics, Apparel, E-commerce returns, Laundry services, Retail operations
- Use case: Train and deploy folding/packing policies for standardized garments/linens using synthetic data from scanned SKUs; reduce downtime and operator training.
- Tools/workflows: SKU-specific digital twins; batch synthetic dataset generation with domain randomization across lighting, textures, and layouts; on-robot zero-shot deployment; QA via video-based validity scores.
- Assumptions/dependencies: Consistent SKUs and fixtures; per-item calibration for materials/friction; dual-arm or capable single-arm robots with appropriate grippers; line integration.
- Hospitality and healthcare linen handling (folding, stacking, repositioning)
- Sectors: Hospitality (hotels), Healthcare (hospital linens)
- Use case: Automate folding and placement workflows for standardized linens to reduce repetitive labor and improve hygiene compliance.
- Tools/workflows: Room- and table-level asset digitization; batch training across linen sizes/textures; simulation-based safety validation; limited on-site trials.
- Assumptions/dependencies: Sterility and cleaning standards; robust grippers compatible with linens; site-specific calibration for surfaces and lighting.
- Academic benchmark and dataset creation for deformable manipulation
- Sectors: Academia, Open-source communities
- Use case: Build curated, reproducible datasets and benchmarks for cloth manipulation (beyond rigid-object datasets), enabling fair comparison of policies/solvers.
- Tools/workflows: Public release of scanned meshes, calibrated scenes, and synthetic data recipes; baseline policies trained from scratch using LeRobot format; ablation-ready pipelines.
- Assumptions/dependencies: Licensing for scanned assets; compute for rendering and diffusion training; course/lab adoption.
- Simulation-first QA and regression testing for deformable policies before deployment
- Sectors: Robotics vendors and integrators
- Use case: Validate policy changes against physics-aligned digital twins to catch regressions (e.g., slippage, over-stretch events) before field deployment.
- Tools/workflows: “Sim-validated CI” where a test suite of scenes/rules (state-based filters + video discriminator) gate releases; parameter “diffs” tracked with visual comparisons.
- Assumptions/dependencies: Reliable scene digitization and consistent camera calibration; policy outputs interfaced to the simulator.
- Training services and internal tooling for deformable data-at-scale
- Sectors: Robotics/AI services, System integrators
- Use case: Offer “Data-as-a-Service” that takes a client’s garments/linens, scans and calibrates them, generates large-scale synthetic training sets, and delivers deployable policies.
- Tools/workflows: Scanning kit + calibration UI; parameter libraries per fabric class; cloud-based synthetic data generation with distributed rendering.
- Assumptions/dependencies: Client onboarding process; per-client IP agreements; turnaround times tied to calibration complexity.
- Product prototyping for home robots on constrained tasks (e.g., towel/t-shirt folding)
- Sectors: Consumer robotics, Startups
- Use case: Rapidly iterate on consumer proof-of-concept behaviors (folding a small set of items) using synthetic training; demonstrate zero-shot deployment viability.
- Tools/workflows: Starter asset bundle (common towels/tees) + home table/camera scans; simulation-driven policy tuning; small-scale real trials.
- Assumptions/dependencies: Limited generalization to diverse households unless expanded scanning/randomization; safety and reliability standardization.
- Educational modules on deformable dynamics and sim-to-real
- Sectors: Higher education, Professional training
- Use case: Teach hands-on courses/labs on R2S2R, covering scanning, solver design, calibration, motion synthesis, and policy training for deformables.
- Tools/workflows: Course kits; notebooks/scripts; sandbox robots or simulated twins.
- Assumptions/dependencies: Access to scanning and modest compute; simplified assets for classroom pace.
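The "Sim-validated CI" workflow listed above (state-based filters plus a video discriminator gating releases) could look roughly like the following sketch. Everything specific here is an assumption for illustration: the `EpisodeResult` fields, the 0-to-1 discriminator score, and both thresholds are hypothetical and not described in the paper.

```python
from dataclasses import dataclass

@dataclass
class EpisodeResult:
    scene: str                  # name of the digital-twin test scene
    passed_state_rules: bool    # outcome of rule-based checks (e.g., no penetration)
    discriminator_score: float  # hypothetical validity score in [0, 1]

def release_gate(results, min_pass_rate=0.9, min_score=0.5):
    """Return True if enough simulated episodes pass both checks to allow a release."""
    if not results:
        return False  # no evidence, so do not release
    passed = sum(
        1 for r in results
        if r.passed_state_rules and r.discriminator_score >= min_score
    )
    return passed / len(results) >= min_pass_rate

# Usage: nine good episodes and one failure meet a 90% pass-rate threshold.
suite = [EpisodeResult(f"scene_{i}", True, 0.8) for i in range(9)]
suite.append(EpisodeResult("scene_9", False, 0.9))
print(release_gate(suite))
```

Returning False on an empty suite is a deliberate choice so that a misconfigured test run cannot pass the gate by accident.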
Long-Term Applications
These applications are plausible extensions but require additional research, broader validation, automation of calibration, or integration with specialized hardware/standards.
- General-purpose home assistance with deformables (dressing, bed-making, laundry pipelines)
- Sectors: Consumer robotics, Elder care, Assistive tech
- Use case: Robots that handle varied clothing, bedding, and everyday deformables robustly across homes without per-item scans.
- Tools/workflows: Onboard perception to infer material properties online; automated parameter estimation; incremental learning from user feedback.
- Assumptions/dependencies: Strong generalization to unseen fabrics/geometries; safety, privacy, and cost constraints; broader perception-action integration beyond folding.
- Dressing assistance and medical textile manipulation (e.g., draping, bandage handling)
- Sectors: Healthcare, Rehabilitation
- Use case: Assistive robots to help with dressing; precise handling of sterile drapes and bandages.
- Tools/workflows: Sterile-compatible robots; fine-grained force/torque control; safety-critical simulators with validated soft-body contact models for medical fabrics.
- Assumptions/dependencies: Regulatory approval; verified biomechanical and infection-control constraints; high-fidelity sensors and compliant grippers.
- Flexible object assembly and cable harness routing in manufacturing
- Sectors: Electronics/Auto manufacturing, Robotics
- Use case: Extend R2S2R to cables, wires, gaskets, and flexible tubing where contact-rich, deformable dynamics matter.
- Tools/workflows: Solver extensions for 1D rods/strings (Cosserat-like models) integrated into the pipeline; scanned harness layouts; trajectory synthesis under hard/soft constraints.
- Assumptions/dependencies: Accurate contact/friction models for slender deformables; efficient real-time solvers; validated safety in dense assemblies.
- Soft food handling and flexible packaging
- Sectors: Food processing, Logistics
- Use case: Manipulate soft items (dough, pastry sheets, flexible packaging films) with physics-aligned policies.
- Tools/workflows: Material-class parameter libraries; hygienic grippers; simulators that capture viscoelastic/plastic behaviors.
- Assumptions/dependencies: Solvers beyond cloth-like elasticity (e.g., MPM/FEM hybrid); strict compliance with food safety standards.
- Apparel digital twins for design-to-automation workflows
- Sectors: Fashion/Apparel, CAD/PLM software
- Use case: Use design-stage garment twins to predict downstream handling/packaging processes and optimize SKUs for automation.
- Tools/workflows: CAD-to-physics-aligned twin conversion; “automation-readiness” scoring; what-if simulations for materials/cuts.
- Assumptions/dependencies: Interoperability between garment CAD and simulators; robust material parameter inference; vendor ecosystem alignment.
- Certification and policy frameworks for synthetic data in robot safety validation
- Sectors: Policy/Regulators, Industry standards bodies
- Use case: Define standards for when synthetic, physics-aligned data can replace or reduce real-world tests in certification and safety cases.
- Tools/workflows: Auditable calibration logs; conformance tests (pass-rate thresholds, physical plausibility metrics); scenario coverage metrics derived from domain randomization.
- Assumptions/dependencies: Consensus on benchmarks and metrics; clear traceability from real-to-sim-to-real; sector-specific risk assessments.
- Automated, self-calibrating deformable solvers and parameter estimation
- Sectors: Robotics software, Simulation platforms
- Use case: Learn material and contact parameters from a handful of real interactions, removing expert-in-the-loop tuning.
- Tools/workflows: Bayesian/gradient-based parameter inference; vision-force fusion; active learning to probe informative interactions.
- Assumptions/dependencies: Reliable sensing of forces and cloth state; robust identifiability in the presence of noise; compute-efficient inference.
- Cross-robot, cross-site transfer with minimal local setup
- Sectors: System integration, Enterprise robotics
- Use case: Deploy policies trained once to multiple facilities and robot models with minor adjustments, using site-level digital twins.
- Tools/workflows: Standardized scene digitization protocols; hardware abstraction layers; automated camera/kinematic calibration routines.
- Assumptions/dependencies: Consistent kinematic and end-effector capabilities; robust sim-parameter portability; scalable asset libraries.
- Sustainable textile recycling and sorting automation
- Sectors: Recycling, Circular economy
- Use case: Robustly manipulate and sort used garments of unknown types/shapes to improve recycling throughput and quality.
- Tools/workflows: Online adaptation to unfamiliar items; vision-only parameter guesses refined via interaction; domain-randomized simulation to pretrain generalists.
- Assumptions/dependencies: Significant generalization beyond scanned assets; handling of contaminants and wear; throughput and reliability requirements.
- High-fidelity, real-time deformable interaction modules for AR/VR and digital humans
- Sectors: Media/Entertainment, Gaming, Virtual try-on
- Use case: Leverage stabilized, physically plausible cloth dynamics for interactive experiences and try-on that also align with real-world handling.
- Tools/workflows: Real-time AVBD-derived solvers optimized for GPUs; simplified calibration workflows for consumer devices.
- Assumptions/dependencies: Balancing real-time performance with physical accuracy; content pipeline integration; acceptable approximations for interactivity.
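As a toy illustration of the automated parameter estimation item above, the sketch below recovers a single stiffness value from force–displacement measurements by least squares. It assumes a linear spring model (F = k·x), which is far simpler than any cloth model; the function name and the data are hypothetical and only show the shape of the fitting problem.

```python
def fit_stiffness(displacements, forces):
    """Closed-form least-squares estimate of k in F = k * x."""
    if len(displacements) != len(forces):
        raise ValueError("inputs must have the same length")
    denom = sum(x * x for x in displacements)
    if denom == 0:
        raise ValueError("need at least one non-zero displacement")
    return sum(x * f for x, f in zip(displacements, forces)) / denom

# Usage: made-up measurements (meters, newtons) with slight noise.
x = [0.01, 0.02, 0.03, 0.04]
f = [1.02, 1.97, 3.05, 3.96]
print(fit_stiffness(x, f))
```

A realistic version would replace the spring with a simulator rollout and the closed-form solve with the Bayesian or gradient-based inference the item mentions.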
Key Cross-Cutting Assumptions and Dependencies
- Asset digitization quality: Metric-accurate 3D scans and properly aligned URDFs are foundational; poor scans undermine dynamics and contact alignment.
- Material and contact calibration: Current pipeline needs expert-guided tuning per item/material; automating this is an open area for scalability.
- Compute requirements: Diffusion-based synthesis, rendering, and physics require GPU resources; throughput depends on hardware and parallelization.
- Seed demonstrations: Although synthetic data scales well, initial seed demos (teleop) are needed to bootstrap trajectory decomposition and diffusion training.
- Task scope and generalization: Validated primarily on garment tasks (e.g., T-shirt folding); extending to other deformables (cables, food, films) will require solver/model adaptations.
- Hardware/gripper capabilities: Success depends on robot dexterity, compliance, and grasp reliability; some applications may need specialized end-effectors.
- Safety and certification: For regulated domains (healthcare, consumer), simulation-derived evidence must be tied to accepted standards and robust real-world tests.
Glossary
- appearance randomization: Rendering-time variability of visual factors (e.g., materials, lighting, cameras) to improve robustness. "Valid trajectories are rendered in Blender~\citep{blender} with appearance randomization of materials, lighting, and camera parameters."
- Augmented Vertex Block Descent (AVBD): An augmented cloth/soft-body optimization method that stabilizes deformation by adding constraints to VBD. "We develop a deformation-stable solver inspired by the Augmented Vertex Block Descent (AVBD) formulation~\citep{Giles2025}, extending the Newton–VBD solver~\citep{10.1145/3658179}."
- bidirectionally synchronized simulation infrastructure: A control and simulation setup where real and simulated executions are kept in lockstep for calibration. "A bidirectionally synchronized simulation infrastructure replaces identical dual-arm executions in simulation and aligns deformation behaviors through visual calibration."
- Blender: An open-source 3D creation and rendering suite used here for photorealistic dataset generation. "Valid trajectories are rendered in Blender~\citep{blender} with appearance randomization of materials, lighting, and camera parameters."
- bimanual platform: A robot with two coordinated arms for dexterous tasks. "The robot used in this study is the ARX ACONE robot, a bimanual platform designed for dexterous manipulation tasks."
- conditional diffusion forcing: A diffusion-based sequence modeling approach that reconstructs trajectories from corrupted inputs under conditioning. "We employ conditional diffusion forcing~\citep{NEURIPS2024_2aee1c41}, where a transformer sequence model reconstructs trajectories from partially corrupted tokens."
- contact-rich dynamics: Physical interactions dominated by complex, sustained contacts that strongly influence system evolution. "While this paradigm shows promise in rigid-object settings, deformable manipulation intensifies the hunger for data, as its evolving geometry and contact-rich dynamics demand substantially broader state and visual coverage."
- cycle path tracing: Photorealistic rendering via path tracing (Cycles) to generate realistic images. "Multiple variations are generated per trajectory using cycle path tracing to produce photorealistic RGB images synchronized with trajectory timestamps."
- deformation-stable solver: A physics solver designed to maintain stable, realistic soft-body deformation during interaction. "We develop a deformation-stable solver inspired by the Augmented Vertex Block Descent (AVBD) formulation~\citep{Giles2025}, extending the Newton–VBD solver~\citep{10.1145/3658179}."
- deformable manipulation: Robotic manipulation of objects whose shape changes under force (e.g., cloth). "However, this paradigm breaks down in deformable manipulation."
- diffusion-based motion framework: A trajectory-generation framework using diffusion models to synthesize realistic motions. "We enhance simulation fidelity and data utility through metric-accurate scene digitization, a deformation-stabilized solver with physics-based calibration, and a diffusion-based motion framework coupled with filtering to generate high-quality manipulation data."
- domain shifts: Changes in environment or data distribution between training and deployment. "Simulated data matches real episodes under equal budgets and surpasses them when scaled, especially under domain shifts."
- draping: The way a cloth hangs or conforms under gravity and contact. "The renderings are visually compared with real executions, allowing experts to assess discrepancies in draping, folding, and contact behavior."
- elastic modeling: Modeling material response using elasticity theory to calibrate deformable dynamics. "Given limited demonstrations, the system digitizes scenes into metric-consistent twins, calibrates deformable dynamics through elastic modeling, and expands behaviors via diffusion-based trajectory generation with quality filtering."
- FEM (Finite Element Method): A numerical method for simulating deformable materials by discretizing into elements. "Generic deformable solvers (e.g., FEM~\citep{zienkiewicz2005finite}, VBD~\citep{10.1145/3658179}, etc.) are not designed for rigid–soft interaction and exhibit unrealistic dynamics due to particle motion lag."
- isomorphic teleoperation: A setup where the operator’s controls map directly and consistently to the simulated/real robot. "Real-world and simulated data collection via kinesthetic teaching and isomorphic teleoperation on Arx ACONE and Arx X5."
- kinesthetic teaching: Teaching by physically guiding the robot’s end-effectors through the desired motions. "In the real world, we adopt kinesthetic teaching, in which the operator directly guides the robot's end-effectors by hand"
- Lagrange multiplier: A variable used to enforce constraints in optimization-based physics solvers. "and is the Lagrange multiplier accumulating constraint forces."
- LeRobot format: A dataset format used to store robot observations, states, and actions for imitation learning. "The final dataset combines rendered observations with robot states and actions in the LeRobot format~\citep{lerobot2024} for imitation learning."
- LiDAR: A sensor that measures distances with laser light to produce dense 3D point clouds. "Multi-view RGB images and LiDAR scans are captured and fused to generate a dense point cloud."
- Material Point Method (MPM): A hybrid particle–grid method for simulating continuum materials such as cloth or soft bodies. "The digitization of deformable objects employs MPM~\citep{chenhu2026empm}, spring-mass models~\citep{jiang2025phystwin}, and platforms such as GarmentLab~\citep{lu2024garmentlab}, which integrate multiple physics engines."
- metric-accurate: Having correct real-world scale and measurements in the digital model. "For geometric alignment, high-precision 3D scans are reconstructed into metric-accurate, textured meshes, producing simulation-ready digital representations of real-world scenes."
- Newton–VBD solver: A VBD-based solver accelerated with Newton methods for cloth simulation. "We develop a deformation-stable solver inspired by the Augmented Vertex Block Descent (AVBD) formulation~\citep{Giles2025}, extending the Newton–VBD solver~\citep{10.1145/3658179}."
- particle-state optimization: Optimizing per-particle states (positions, velocities) to achieve accurate deformation. "While existing solvers achieve accurate offline deformation through particle-state optimization, they are not designed for the real-time requirements of embodied manipulation where rigid–soft interaction must be updated dynamically during control."
- penalty stiffness: The stiffness parameter used in penalty methods to enforce constraints in optimization. "where is the penalty stiffness parameter at Newton iteration , and is the Lagrange multiplier accumulating constraint forces."
- PBD (Position-Based Dynamics): A fast, constraint-based method for simulating deformables with position corrections. "VBD suffers from unrealistic stretching~\citep{10.1145/3658179}, while PBD and FEM involve trade-offs between accuracy and efficiency~\citep{MULLER2007109,10.1145/566654.566623}."
- Poisson reconstruction: A surface reconstruction method from oriented point clouds. "The resulting point cloud is then processed through surface refinement (e.g., Poisson reconstruction~\citep{10.1145/2487228.2487237}) followed by mesh post-processing, including hole filling, smoothing, and remeshing to obtain a clean, watertight mesh suitable for simulation."
- Poisson's ratio: A material property describing lateral contraction relative to axial stretching. "Physical parameters list (density, Young's modulus, Poisson's ratio, friction, restitution, relaxation) cannot be recovered to their true physical values through direct optimization."
- real-to-sim (R2S): Aligning simulated environments with real-world observations to ground simulation in reality. "Real-to-sim (R2S) alignment is thus foundational for deformable manipulation, prioritizing correspondence between simulated and physical dynamics over superficial realism or asset import~\citep{tian2025interndata,yin2026geniesim30}."
- real-to-sim-to-real (R2S2R): A pipeline that digitizes real scenes, generates synthetic data in sim, and transfers back to real robots. "SIM1 adopts the real-to-sim-to-real (R2S2R) paradigm to bridge geometry, dynamics, and motion across stages."
- restitution: A parameter controlling bounciness in collisions (coefficient of restitution). "Physical parameters list (density, Young's modulus, Poisson's ratio, friction, restitution, relaxation) cannot be recovered to their true physical values through direct optimization."
- rigid-body abstractions: Simulation models that treat objects as rigid, often inadequate for soft materials. "Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain rooted in rigid-body abstractions, producing mismatched geometry, fragile soft dynamics, and motion primitives poorly suited for cloth interaction."
- rigid–soft coupling: The interaction and mutual constraints between rigid bodies (e.g., grippers) and soft bodies (e.g., cloth). "Achieving reliable S2R transfer for deformable manipulation requires physically consistent rigid–soft coupling, a setting that remains poorly supported by existing simulation engines."
- sim-to-real (S2R): Using simulation-generated data to train models that transfer to the real world. "In the field, sim-to-real (S2R) synthetic data generation has emerged as a compelling strategy for scaling manipulation data~\citep{ gu2023maniskill2,ye2025dex1b,he2025viral,xue2025openingsimtorealdoorhumanoid,deng2025graspvlagraspingfoundationmodel}."
- sim-to-real gap: The performance and fidelity discrepancy between simulation and real-world deployment. "We introduce SIM1, which minimizes the sim-to-real gap through a physics-aligned R2S2R paradigm, enabling synthetic data to serve as high-fidelity training data for direct deployment in deformable manipulation."
- simulated twin: A simulation instance mirroring the real robot/environment for synchronized evaluation. "The robot’s joint states are streamed to the simulator so that the simulated twin reproduces identical motions."
- spring-mass models: A deformable simulation model using masses connected by springs to approximate elasticity. "The digitization of deformable objects employs MPM~\citep{chenhu2026empm}, spring-mass models~\citep{jiang2025phystwin}, and platforms such as GarmentLab~\citep{lu2024garmentlab}, which integrate multiple physics engines."
- strain constraint: A constraint limiting maximum stretch in an edge to prevent unrealistic deformation. "The strain constraint is therefore:"
- strain limiting: Techniques to cap material stretch to physically plausible ranges. "Recent solvers~\citep{Giles2025} improve strain limiting but remain isolated from broader pipelines."
- stabilized soft-body solver: A solver designed to maintain stable soft-body behavior under contact and motion. "For dynamical alignment, a stabilized soft-body solver~\citep{Giles2025} enforces physically consistent elastic and bending responses while suppressing excessive deformation, thereby enabling realistic interaction modeling."
- teleoperated simulation: Collecting demonstrations by remotely controlling the simulated robot. "Demonstration data from teleoperated simulation are first decomposed into motion segments and subsequently synthesized via diffusion, with visual randomization used to generate scaled training data that enhances generalization"
- textured meshes: Meshes with mapped surface textures for realistic appearance. "For geometric alignment, high-precision 3D scans are reconstructed into metric-accurate, textured meshes, producing simulation-ready digital representations of real-world scenes."
- Transformer encoder: A transformer-based temporal encoder used for video sequence discrimination. "A ResNet-18 feature extractor~\citep{He_2016_CVPR} and Transformer encoder~\citep{NIPS2017_3f5ee243} aggregate temporal information and output a validity score ."
- Transformer sequence model: A transformer that models sequential data (here, robot trajectories). "We employ conditional diffusion forcing~\citep{NEURIPS2024_2aee1c41}, where a transformer sequence model reconstructs trajectories from partially corrupted tokens."
- URDF (Unified Robot Description Format): A standardized XML format describing robot kinematics, geometry, and dynamics. "Its kinematic structure, joint limits, collision geometries, and visual meshes are defined in a URDF file generated from CAD models (e.g., SolidWorks) provided by the manufacturer."
- Vertex Block Descent (VBD): An optimization-based cloth solver updating vertex blocks iteratively to minimize energy. "(a) After naive VBD~\citep{10.1145/3658179} updates under external forces, edge deformation is monitored and virtual elastic constraints are activated when stretch exceeds a threshold, injecting strain forces that accelerate convergence toward physically plausible cloth configurations."
- vibe coding: Writing code by describing the goal to an AI coding assistant in natural language rather than hand-crafting it; used here to derive filtering thresholds from particle states. "From simulation-derived positive and negative trajectories, we leverage vibe coding to synthesize threshold rules over particle statistics, defining admissible regions that favor positive states and exclude negative ones."
- watertight mesh: A mesh without holes, suitable for robust physics and rendering. "to obtain a clean, watertight mesh suitable for simulation."
- Young's modulus: A material stiffness parameter in elasticity theory. "Physical parameters list (density, Young's modulus, Poisson's ratio, friction, restitution, relaxation) cannot be recovered to their true physical values through direct optimization."
- zero-shot: Deployment without task-specific real-world fine-tuning after training. "while delivering 90% zero-shot success and 50% generalization gains in real-world deployment."
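
As a rough illustration of the "strain constraint" and "strain limiting" entries above, the sketch below caps the stretch of a single cloth edge by projecting its two endpoints back toward each other. This is a generic position-projection step in the spirit of PBD, not the paper's AVBD-based solver; the function name `limit_edge_strain`, the `max_strain` threshold, and the equal-mass assumption are all hypothetical choices made for the example.

```python
import math

def limit_edge_strain(p0, p1, rest_length, max_strain=0.1):
    """Cap the stretch of one edge at (1 + max_strain) * rest_length.

    Assumes both endpoints have equal mass, so each moves by half of the
    excess length and the edge midpoint is preserved. Illustrative only.
    """
    # Edge vector and current length
    d = [b - a for a, b in zip(p0, p1)]
    length = math.sqrt(sum(c * c for c in d))
    max_length = (1.0 + max_strain) * rest_length

    # Constraint inactive: the edge is within the allowed stretch
    if length <= max_length or length == 0.0:
        return list(p0), list(p1)

    # Constraint active: pull each endpoint inward by half of the excess
    excess = length - max_length
    shift = [0.5 * excess * c / length for c in d]
    q0 = [a + s for a, s in zip(p0, shift)]
    q1 = [b - s for b, s in zip(p1, shift)]
    return q0, q1

# An edge with rest length 1.0 stretched to 2.0 is pulled back to length 1.1
q0, q1 = limit_edge_strain((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), rest_length=1.0)
```

In a full cloth solver a projection like this would run over every edge on each iteration, with mass weighting and contact handling that this sketch omits.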