RealAppliance Dataset

Updated 6 December 2025

RealAppliance is a high-fidelity dataset comprising 100 digital appliance assets, each precisely aligned with corresponding real-world manuals.
It features detailed geometric models, modular physical and electronic systems, and programmatic control logic for simulating realistic appliance behavior.
The accompanying RealAppliance-Bench rigorously evaluates multimodal models on tasks such as manual retrieval, part grounding, and closed-loop manipulation planning.

The RealAppliance dataset is an extensive suite of photorealistic, mechanism-rich virtual appliance assets, constructed for the advancement of appliance manipulation research in simulation. Comprising 100 high-fidelity digital appliances, each asset is precisely aligned with real-world user manuals, supporting comprehensive physical, electronic, and logic-level interactivity. Accompanying the dataset, the RealAppliance-Bench benchmark rigorously evaluates multimodal LLMs (MLLMs) and embodied manipulation planners across core tasks such as manual understanding, part grounding, and manipulation planning (Gao et al., 29 Nov 2025).

1. Composition and Scope

RealAppliance encompasses 100 digital assets spanning 14 appliance types drawn from kitchen and laundry domains, including ovens, toasters, air fryers, microwaves, rice cookers, bread machines, washing machines, and more. Each type is instantiated with multiple brand and model variants sourced from authentic product manuals, covering a range of form factors and control modalities (analog, digital, and touch interfaces). Every asset is paired with its own real-world user manual, ensuring one-to-one document alignment.

Category	Examples	Quantity/Type
Kitchen Cooking	Oven, Toaster, Air Fryer	Multiple variants
Food Preparation	Mixer, Blender, Kettle	Multiple variants
Laundry	Washing Machine	Multiple variants

This comprehensive coverage enables cross-device and cross-interface generalization for both embodied agents and document-based models.

2. Asset Design: Models, Mechanisms, and Logic

2.1 Geometric and Visual Fidelity

Assets are authored in Autodesk 3ds Max, using TurboSmooth subdivision to reach polygon densities of 200K–2M triangles per model, which permits close-up, inspection-level detail. Models are exported in Universal Scene Description (USD) format, supporting robust interchange and native compatibility with NVIDIA Isaac Sim. Each asset integrates high-resolution, UV-unwrapped color textures (≥ 4K × 4K) that reproduce panel graphics, logos, scales, and control labels. Dynamic interface elements (screens, touchpads) are assigned to distinct UV regions so that they can be updated in real time. Each asset also ships with USD-encoded level-of-detail (LOD) variants (high- and mid-poly), which let users trade simulation performance against fidelity.

2.2 Modular Physical and Electronic Systems

Physical mechanisms are encapsulated as modular Isaac Sim classes that share standardized interfaces. Implemented types include inner springs (lever return), magnetic attraction (door/lid closure), mechanical triggers (causal action propagation, such as door releases), knob-based countdown drives (mimicking mechanical timers), and safety locks (requiring a secondary action before activation). Electronic subsystems comprise screen displays (dynamic readout textures), touch sensors, interior illumination systems, status LED indicators, and rotary motors (for turntables, beaters, etc.).
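As a rough illustration of what a shared mechanism interface could look like, the sketch below defines a common `step(dt)` contract and two toy mechanisms (a spring return and a knob countdown). It is plain Python rather than Isaac Sim code; the class names, parameters, and the first-order spring model are assumptions made for illustration and are not taken from the dataset.

```python
from abc import ABC, abstractmethod


class Mechanism(ABC):
    """Hypothetical shared interface: every mechanism advances by a time step."""

    @abstractmethod
    def step(self, dt: float) -> None:
        ...


class SpringReturn(Mechanism):
    """Toy lever return: the angle relaxes toward a rest position."""

    def __init__(self, rest_angle: float = 0.0, rate: float = 5.0) -> None:
        self.rest_angle = rest_angle
        self.rate = rate  # assumed relaxation rate in 1/s
        self.angle = rest_angle

    def step(self, dt: float) -> None:
        # Explicit Euler step of d(angle)/dt = -rate * (angle - rest_angle).
        self.angle -= self.rate * (self.angle - self.rest_angle) * dt


class KnobCountdown(Mechanism):
    """Toy mechanical timer: counts down to zero once wound."""

    def __init__(self) -> None:
        self.remaining_s = 0.0

    def wind(self, seconds: float) -> None:
        self.remaining_s = max(0.0, seconds)

    def step(self, dt: float) -> None:
        self.remaining_s = max(0.0, self.remaining_s - dt)


def simulate(mechanisms: list[Mechanism], dt: float, steps: int) -> None:
    """Advance every mechanism through the same interface."""
    for _ in range(steps):
        for mechanism in mechanisms:
            mechanism.step(dt)
```

Because the simulation loop only relies on `step`, new mechanism types (for example a magnetic latch) could be added without changing the loop.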

2.3 Programmatic Control Logic

Each appliance defines a state vector over discrete and continuous domains, commonly including power status, setpoints (temperature, timer), and operational mode. State machines, implemented in Python/C++, map user actions (presses, rotations, openings) to state transitions and mechanism invocation. For continuous operations (e.g., “cooking” or “mixing”), periodic control loops advance internal timers, drive animations, and synchronize visual feedback (screen, light updates).
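The following is a minimal sketch of how such a state vector, action-driven state machine, and periodic control loop might fit together for a toy oven. The action names, state fields, and transition rules are hypothetical and are not taken from the dataset's `program.py` files.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    OFF = "off"
    IDLE = "idle"
    COOKING = "cooking"


@dataclass
class OvenState:
    """Mixed discrete/continuous state vector (fields are illustrative)."""
    mode: Mode = Mode.OFF
    door_open: bool = False
    temperature_c: float = 0.0
    timer_s: float = 0.0
    light_on: bool = False


class ToyOven:
    def __init__(self) -> None:
        self.state = OvenState()

    def handle(self, action: str, value: float = 0.0) -> None:
        """Map a user action to a state transition."""
        s = self.state
        if action == "press_power":
            s.mode = Mode.IDLE if s.mode is Mode.OFF else Mode.OFF
            if s.mode is Mode.OFF:
                s.timer_s, s.light_on = 0.0, False
        elif action == "rotate_temperature" and s.mode is not Mode.OFF:
            s.temperature_c = value
        elif action == "rotate_timer" and s.mode is not Mode.OFF:
            s.timer_s = max(0.0, value)
        elif action == "press_start":
            # Start only from IDLE, with the door closed and a timer set.
            if s.mode is Mode.IDLE and not s.door_open and s.timer_s > 0:
                s.mode, s.light_on = Mode.COOKING, True
        elif action == "open_door":
            s.door_open = True
            if s.mode is Mode.COOKING:  # opening the door interrupts cooking
                s.mode, s.light_on = Mode.IDLE, False
        elif action == "close_door":
            s.door_open = False

    def tick(self, dt: float) -> None:
        """Periodic control loop: advance the timer while cooking."""
        s = self.state
        if s.mode is not Mode.COOKING:
            return
        s.timer_s = max(0.0, s.timer_s - dt)
        if s.timer_s == 0.0:
            s.mode, s.light_on = Mode.IDLE, False
```

In a simulator, `tick` would typically be called once per physics or render step, and the screen texture or indicator light would be refreshed from `state` afterwards.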

3. Alignment with Real Manuals

3.1 Assembly and Data Schema

Manuals are acquired as PDFs containing component diagrams, procedures, and measurements. CAD modeling extracts representative geometry guided by high-quality imagery and schematic data. During virtual assembly, each component node in the model strictly adheres to the manual’s naming conventions as documented in the corresponding parts lists.

A per-appliance JSON mapping file links model components to manual semantics. Each mapping entry specifies the component name, reference manual sections or figures, and precise model node path, e.g.:

{
  "component_name": "Knob_Temperature",
  "manual_sections": [ "Sec 2.1_Start-Up", "Fig 3.2_Control-Panel" ],
  "node_path": "/root/Body/Panel/Knob_Temp"
}

This facilitates unambiguous, programmatic correspondence from manual documentation to digital asset elements.
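A short sketch of how such a mapping might be consumed is given below. It assumes that a mapping file holds a JSON list of entries shaped like the example above; that file-level layout, the second entry (`Door_Handle`), and the helper names are hypothetical.

```python
import json

# The first entry is copied from the example above. The second entry and the
# list-of-entries file layout are assumptions made for illustration.
MAPPING_JSON = """
[
  {
    "component_name": "Knob_Temperature",
    "manual_sections": ["Sec 2.1_Start-Up", "Fig 3.2_Control-Panel"],
    "node_path": "/root/Body/Panel/Knob_Temp"
  },
  {
    "component_name": "Door_Handle",
    "manual_sections": ["Sec 2.1_Start-Up"],
    "node_path": "/root/Body/Door/Handle"
  }
]
"""


def build_index(entries: list[dict]) -> dict[str, dict]:
    """Index mapping entries by component name for direct lookup."""
    return {entry["component_name"]: entry for entry in entries}


def components_in_section(entries: list[dict], section: str) -> list[str]:
    """List the components that a given manual section or figure refers to."""
    return [
        entry["component_name"]
        for entry in entries
        if section in entry["manual_sections"]
    ]


entries = json.loads(MAPPING_JSON)
index = build_index(entries)
```

With this index, a planner that reads "temperature knob" in the manual could resolve `index["Knob_Temperature"]["node_path"]` to the node it needs to actuate in the scene.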

3.2 Alignment Rigour

Alignment accuracy is defined as

$\alpha = \frac{|\text{correct links}|}{|\text{total links}|}$

and by construction, RealAppliance achieves $\alpha = 1.00$ due to exact node naming. The design precludes mismatches between manual terminology and simulated asset structure.
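One way to check this ratio mechanically is sketched below: a link counts as correct when its node path exists in the model's set of node paths. The function name, the input shapes, and this notion of correctness are assumptions for illustration, not the authors' verification procedure.

```python
def alignment_accuracy(links: dict[str, str], model_nodes: set[str]) -> float:
    """Return alpha = |correct links| / |total links|.

    `links` maps a manual component name to a model node path; a link is
    treated as correct when that node path is present in `model_nodes`.
    """
    if not links:
        raise ValueError("at least one link is required")
    correct = sum(1 for path in links.values() if path in model_nodes)
    return correct / len(links)


# Hypothetical example: both manual components resolve to existing nodes.
example_links = {
    "Knob_Temperature": "/root/Body/Panel/Knob_Temp",
    "Door_Handle": "/root/Body/Door/Handle",
}
example_nodes = {
    "/root/Body/Panel/Knob_Temp",
    "/root/Body/Door/Handle",
    "/root/Body/Door",
}
alpha = alignment_accuracy(example_links, example_nodes)  # 2 / 2 = 1.0
```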

4. Dataset Statistics and Organization

The dataset comprises 589 distinct operable components (average ≈ 5.9 per appliance), 979 articulated manipulation tasks (≈ 9.8 per appliance), and 941 “disturbance” steps for evaluating corrective/closed-loop reasoning. Manuals average 766.2 words, and annotated manipulation plans average 7.57 steps each. Part-grounding annotations adopt COCO-style JSON with 2D bounding boxes for each operationally relevant part.

Data organization is modular. The directory for each appliance asset contains the model (model.usd), a textures/ folder, the complete manual (manual.pdf), the component mapping (mapping.json), and an executable program.py holding the control logic. Appliance indices and metadata are stored in a top-level catalog (indices.json).

File layout (abridged):

RealAppliance/
  ├── 001_Oven/
  │    ├── model.usd
  │    ├── textures/
  │    ├── manual.pdf
  │    ├── mapping.json
  │    └── program.py
  └── indices.json
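Given this layout, a small sanity check over a local copy could look like the sketch below. The required-entry list is read off the layout above; the function names and the idea of reporting missing entries per asset are assumptions.

```python
from pathlib import Path

# Entries expected inside every appliance directory, per the layout above.
REQUIRED_ENTRIES = ("model.usd", "textures", "manual.pdf", "mapping.json", "program.py")


def missing_entries(asset_dir: Path) -> list[str]:
    """Return the required files or folders that are absent from one asset."""
    return [name for name in REQUIRED_ENTRIES if not (asset_dir / name).exists()]


def check_dataset(root: Path) -> dict[str, list[str]]:
    """Map each incomplete appliance directory to its missing entries."""
    report: dict[str, list[str]] = {}
    for asset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = missing_entries(asset_dir)
        if missing:
            report[asset_dir.name] = missing
    return report
```

Calling `check_dataset(Path("RealAppliance"))` on a complete copy would return an empty dictionary.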

Train/validation/test splits are not explicitly specified; the collection is intended as a comprehensive zero-shot evaluation resource.

5. RealAppliance-Bench: Benchmark Suite

5.1 Task Definitions and Metrics

RealAppliance-Bench comprises four principal evaluation tasks:

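Manual Page Retrieval: Input: Full manual + query (e.g., “Operating Procedures”). Output: Relevant manual page indices. Metrics: Precision (P) and Recall (R), defined as

$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}$

where TP, FP, and FN count the correctly retrieved, wrongly retrieved, and missed pages, respectively.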
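For concreteness, the retrieval metrics above can be computed from sets of page indices as in the sketch below. Treating pages as integer sets and returning 0.0 when a denominator is zero are assumptions; the benchmark's own scoring script may handle these cases differently.

```python
def precision_recall(predicted: set[int], relevant: set[int]) -> tuple[float, float]:
    """Compute (P, R) for one query from predicted and ground-truth pages."""
    tp = len(predicted & relevant)   # pages retrieved and relevant
    fp = len(predicted - relevant)   # pages retrieved but not relevant
    fn = len(relevant - predicted)   # relevant pages that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical query: the model returns pages 3, 4, 7; pages 3 to 6 are relevant.
p, r = precision_recall({3, 4, 7}, {3, 4, 5, 6})  # TP=2, FP=1, FN=2
```

In this example two of the three returned pages are relevant (P = 2/3) and two of the four relevant pages are found (R = 1/2).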

Open-Loop Manipulation Planning: Input: Task instructions, retrieved manual pages, initial scene image. Output: Sequence of atomic actions from a fixed grammar. Metrics: Completion Rate (CR), Success Rate (SR):

$CR = \frac{N_{\text{plans proposed}}}{N_{\text{tasks}}}, \quad SR = \frac{N_{\text{fully correct plans}}}{N_{\text{tasks}}}$

Plans are evaluated as fully correct only if all atomic steps (action + parameters) match ground truth.
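The exact-match criterion can be sketched as follows. The tuple encoding of atomic actions, the example action names, and the choice to count a missing or empty plan as "not proposed" are assumptions for illustration; the benchmark's fixed action grammar is not reproduced here.

```python
from typing import Optional

# Assumed encoding: each atomic step is a tuple (action, target, *parameters).
Plan = list[tuple]


def completion_and_success(
    predicted: list[Optional[Plan]], reference: list[Plan]
) -> tuple[float, float]:
    """Return (CR, SR) over a set of tasks.

    CR counts tasks for which any non-empty plan was proposed; SR counts
    tasks whose plan matches the ground truth step for step.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and aligned")
    n_tasks = len(reference)
    proposed = sum(1 for plan in predicted if plan)
    correct = sum(1 for plan, truth in zip(predicted, reference) if plan and plan == truth)
    return proposed / n_tasks, correct / n_tasks


# Hypothetical example with four tasks.
reference_plans = [
    [("press", "Button_Power"), ("rotate", "Knob_Temperature", 180)],
    [("open", "Door")],
    [("press", "Button_Start")],
    [("rotate", "Knob_Timer", 10)],
]
predicted_plans = [
    [("press", "Button_Power"), ("rotate", "Knob_Temperature", 180)],  # exact match
    [("open", "Door")],                                                # exact match
    None,                                                              # no plan proposed
    [("rotate", "Knob_Timer", 15)],                                    # wrong parameter
]
cr, sr = completion_and_success(predicted_plans, reference_plans)  # (0.75, 0.5)
```

The fourth task shows why SR can sit far below CR: a plan with a single wrong parameter still counts toward completion but not toward success.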

Appliance Part Grounding: Input: Scene image, target part name. Output: 2D bounding box. Metrics: mean IoU, [email protected].
Closed-Loop Planning Adjustment: Input: Plan, action history, real-time disturbed observations. Output: Next atomic corrective action. Metric: Step-wise success rate:

$\text{SR}_{\text{step}} = \frac{N_{\text{correct adjustment steps}}}{N_{\text{steps}}}$
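The part-grounding and closed-loop metrics above can be sketched in a few lines. The code assumes COCO-style boxes in `[x, y, width, height]` form, consistent with the COCO-style annotations the dataset describes; the function names and the boolean encoding of step outcomes are hypothetical.

```python
def xywh_to_xyxy(box: list[float]) -> tuple[float, float, float, float]:
    """Convert a COCO-style [x, y, width, height] box to corner coordinates."""
    x, y, w, h = box
    return x, y, x + w, y + h


def iou(box_a: list[float], box_b: list[float]) -> float:
    """Intersection over union of two [x, y, width, height] boxes."""
    ax0, ay0, ax1, ay1 = xywh_to_xyxy(box_a)
    bx0, by0, bx1, by1 = xywh_to_xyxy(box_b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(predicted: list[list[float]], truth: list[list[float]]) -> float:
    """Average IoU over paired predicted and ground-truth boxes."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and aligned")
    return sum(iou(p, t) for p, t in zip(predicted, truth)) / len(truth)


def stepwise_success_rate(step_correct: list[bool]) -> float:
    """Fraction of disturbed steps where the corrective action was correct."""
    if not step_correct:
        raise ValueError("at least one step is required")
    return sum(step_correct) / len(step_correct)


# Two 10x10 boxes offset by 5 pixels overlap in a 5x5 region: 25 / 175.
example_iou = iou([0, 0, 10, 10], [5, 5, 10, 10])
```

The example illustrates how quickly IoU drops: a predicted box shifted by half its width and height already scores only about 0.14.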

5.2 Model Performance Summary

Proprietary MLLMs (GPT-5, Gemini 2.5 Pro/Flash) achieve ~87% recall/F1 on manual retrieval but only single-digit percentages for open-loop planning.
Part grounding performance is limited (top IoU ≈ 12%, [email protected] ≈ 8.6%).
Best closed-loop adjustment performance is 31% stepwise SR (Gemini 2.5 Flash).
Embodied-planning baselines (RoboBrain 2.0, ManualPlan, ApBot) underperform on document understanding, but may rival larger models in constrained, low-level tasks.
Full-process success is near-zero, primarily due to error accumulation across pipeline stages.
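The error-accumulation effect can be illustrated with a back-of-the-envelope calculation: if the stages of a pipeline succeed independently, the end-to-end success probability is the product of the per-stage rates. Both the independence assumption and the stage values below are illustrative; 0.87 and 0.31 echo figures quoted above, while 0.05 is a hypothetical stand-in for a "single-digit" planning rate.

```python
from math import prod


def full_process_success(stage_rates: list[float]) -> float:
    """End-to-end success under the assumption of independent stages."""
    if any(not 0.0 <= rate <= 1.0 for rate in stage_rates):
        raise ValueError("stage rates must lie in [0, 1]")
    return prod(stage_rates)


# Illustrative stage rates: retrieval, open-loop planning, closed-loop adjustment.
illustrative_rates = [0.87, 0.05, 0.31]
end_to_end = full_process_success(illustrative_rates)  # roughly 0.013
```

Even with a strong first stage, a single weak stage caps the product, which is consistent with the near-zero full-process success reported above.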

5.3 Analytical Observations

Manual-understanding models exhibit strong document-query accuracy but lack both spatial part identification and robust, long-horizon planning. Spatial localization (part-grounding) is a critical failure point, with most predicted IoUs below 0.05. Adaptation to closed-loop feedback remains unresolved, restricting reliable execution in dynamic or disturbed environments (Gao et al., 29 Nov 2025).

6. Applications, Access, and Licensing

RealAppliance is designed for multimodal LLM and embodied robotics evaluation—facilitating zero-shot/few-shot assessment on integrated document, vision, and planning challenges. The full suite of assets, control scripts, and benchmarks is publicly accessible at https://realappliance.github.io/. Use is governed by an open academic license (MIT-style), permitting unrestricted research applications including but not limited to:

Benchmarking vision-language-action models on realistic, manual-aligned appliance domains
Training and evaluation of reinforcement and imitation learning agents in high-fidelity simulation
Generation of new manipulation datasets via scripted executions within an aligned reality-grounded framework

7. Significance and Relation to Broader Research

RealAppliance addresses longstanding simulation-reality gaps by fusing visually and mechanically faithful assets with program logic and exact manual alignment. It provides a standardized, extensible substrate for end-to-end analysis of agents that must integrate document understanding, perception, and closed-loop interaction, offering unique diagnostic opportunities across the vision-language-action spectrum. The dataset and benchmark facilitate reproducible, system-level evaluation and expose critical research challenges at the intersection of multimodal reasoning and physical embodiment (Gao et al., 29 Nov 2025).

References (1)

Gao et al. (2025). RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals.
