Reflective VLM Planning for Dual-Arm Desktop Cleaning: Bridging Open-Vocabulary Perception and Precise Manipulation

Published 19 Jun 2025 in cs.RO | (2506.17328v1)

Abstract: Desktop cleaning demands open-vocabulary recognition and precise manipulation of heterogeneous debris. We propose a hierarchical framework that integrates reflective Vision-Language Model (VLM) planning with dual-arm execution through a structured scene representation. Grounded-SAM2 provides open-vocabulary detection, while a memory-augmented VLM generates, critiques, and revises manipulation sequences. These sequences are converted into parametric trajectories for five primitives executed by coordinated Franka arms. Evaluated in simulated scenarios, our system achieves 87.2% task completion, a 28.8% improvement over a static VLM and 36.2% over single-arm baselines. Structured memory integration proves crucial for robust, generalizable manipulation while maintaining real-time control performance.
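
The generate, critique, and revise cycle mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the `Memory` class, the `reflective_plan` function, the toy planner and critic, and the primitive names `"sweep"` and `"wipe"` are all hypothetical placeholders chosen for illustration, and a real system would call a VLM where the stubs are.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Plan = List[str]  # a plan is an ordered list of primitive names (hypothetical)


@dataclass
class Memory:
    """Hypothetical structured memory: accumulates critiques across rounds."""
    notes: List[str] = field(default_factory=list)


def reflective_plan(
    generate: Callable[[Memory], Plan],
    critique: Callable[[Plan], List[str]],
    memory: Memory,
    max_rounds: int = 3,
) -> Plan:
    """Generate a plan, critique it, and regenerate until the critic
    reports no issues or max_rounds critiques have been made."""
    plan = generate(memory)
    for _ in range(max_rounds):
        issues = critique(plan)
        if not issues:
            break
        memory.notes.extend(issues)  # remember what went wrong
        plan = generate(memory)      # revise using the updated memory
    return plan


# Toy stand-ins for the VLM planner and critic (assumptions, not from the paper).
def toy_generate(memory: Memory) -> Plan:
    if "sweep before wipe" in memory.notes:
        return ["sweep", "wipe"]
    return ["wipe", "sweep"]


def toy_critique(plan: Plan) -> List[str]:
    if plan.index("wipe") < plan.index("sweep"):
        return ["sweep before wipe"]
    return []


if __name__ == "__main__":
    mem = Memory()
    print(reflective_plan(toy_generate, toy_critique, mem), mem.notes)
```

In this toy run the first plan is rejected, the critique is stored in memory, and the second plan passes. The point of the sketch is only the control flow: the planner sees its own past mistakes through the memory object rather than being queried once with a static prompt.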
