Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners

Published 26 May 2025 in cs.RO and cs.AI (arXiv:2505.20573v2)

Abstract: LLMs have demonstrated strong performance in various robot control tasks. However, their deployment in real-world applications remains constrained. Even state-of-the-art LLMs, such as GPT-o4-mini, frequently produce invalid action plans that violate physical constraints, such as directing a robot to an unreachable location or causing collisions between robots. This issue primarily arises from a lack of awareness of these physical constraints during the reasoning process. To address this issue, we propose a novel framework that integrates reinforcement learning with verifiable rewards (RLVR) to instill knowledge of physical constraints in LLMs, inducing constraint-aware reasoning during plan generation. In this approach, only valid action plans that successfully complete a control task receive positive rewards. We applied our method to two small-scale LLMs: a non-reasoning Qwen2.5-3B-Instruct and a reasoning Qwen3-4B. The experimental results demonstrate that constraint-aware small LLMs substantially outperform large-scale models without constraint grounding on both the BoxNet task and a newly developed BoxNet3D environment built using MuJoCo. This work highlights the effectiveness of grounding even small LLMs with physical constraints to enable scalable and efficient multi-robot control in complex, physically constrained environments.

Summary


The paper "Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners" introduces an innovative framework aimed at enhancing the practical application of large language models (LLMs) in multi-robot control tasks. This research addresses significant challenges faced by current LLMs when integrated into real-world robotics, particularly in generating action plans that adhere to physical constraints such as collision avoidance and reachability.

Summary of the Framework

The proposed solution employs reinforcement learning with verifiable rewards (RLVR) to instill a deep understanding of physical constraints into LLMs, thereby enabling them to generate valid, collision-free action plans. The framework emphasizes the importance of rewarding only those outputs that respect physical constraints, effectively teaching the LLMs to perform constraint-aware reasoning.

Two distinct environments are created for experimentation:
- BoxNet2D: A grid-based environment where robotic arms operate in a 2D plane, navigating objects to target locations.
- BoxNet3D: A more complex setup utilizing MuJoCo simulation to bring realistic 3D physical interactions into play.
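The core of the RLVR setup is a reward that can be checked mechanically: a plan earns a positive reward only if every move respects reachability and collision constraints and the task is completed. The sketch below illustrates this idea for a BoxNet-style 2D grid; the plan representation, the `REACH` workspace map, and the specific validity rules are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical verifiable-reward checker for a grid world in the spirit of RLVR.
# A plan is a list of steps; each step is a list of concurrent
# (arm, from_cell, to_cell) pick-and-place moves.

REACH = {  # cells each robot arm can reach (assumed workspace layout)
    "arm0": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "arm1": {(1, 1), (1, 2), (2, 1), (2, 2)},
}

def verifiable_reward(plan, box_cell, goal_cell):
    """Return 1.0 only if every move is reachable, no two concurrent moves
    target the same cell, and the box ends at the goal; otherwise 0.0."""
    cell = box_cell
    for step in plan:
        targets = [dst for (_, _, dst) in step]
        if len(targets) != len(set(targets)):
            return 0.0  # two arms place into the same cell: collision
        for arm, src, dst in step:
            if src not in REACH[arm] or dst not in REACH[arm]:
                return 0.0  # move leaves the arm's workspace: unreachable
            if src == cell:
                cell = dst  # the box is carried along with this move
    return 1.0 if cell == goal_cell else 0.0

# A valid two-step handoff (arm0 to the shared cell, arm1 to the goal):
plan = [[("arm0", (0, 0), (1, 1))], [("arm1", (1, 1), (2, 2))]]
print(verifiable_reward(plan, (0, 0), (2, 2)))  # → 1.0
```

Because the reward is binary and computed by a deterministic checker rather than a learned critic, the LLM cannot be rewarded for fluent but physically invalid plans, which is what pushes it toward constraint-aware reasoning.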

Experimental Results

The empirical analysis demonstrates that smaller LLMs, when precisely grounded with physical constraints, can outperform even larger state-of-the-art models in multi-robot control tasks. The results show marked improvements in planning success rates and efficiency. For instance, the best-performing grounded LLM achieved a planning success rate of 0.87 compared to a meager 0.33 from an unconstrained counterpart.

Moreover, the RL-trained models generalize better to unseen environments. Experiments with varied robot placements and perturbed object coordinates confirm this generalization, highlighting robust reasoning across diverse scenarios.
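A generalization probe of this kind can be sketched as a simple evaluation loop: jitter the object coordinates, re-run the planner, and measure the fraction of perturbed layouts it still solves. The helper names, the noise model, and the evaluation protocol below are assumptions for illustration, not the paper's exact procedure.

```python
import random

def perturb(coords, scale=0.05, rng=None):
    """Jitter each (x, y) coordinate by uniform noise in [-scale, scale]."""
    rng = rng or random.Random(0)
    return [(x + rng.uniform(-scale, scale),
             y + rng.uniform(-scale, scale)) for x, y in coords]

def success_rate(planner, layouts, trials=10):
    """Fraction of perturbed layouts the planner solves.
    `planner` is any callable mapping a layout to True (solved) / False."""
    wins = 0
    for layout in layouts:
        for _ in range(trials):
            wins += bool(planner(perturb(layout)))
    return wins / (len(layouts) * trials)
```

In this setup a planner that merely memorized the training coordinates would fail under perturbation, so a high success rate on jittered layouts is evidence of genuine constraint-aware reasoning rather than recall.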

Implications and Future Potential

The findings of this paper have significant implications for the future of AI in robotics:
- Enhanced Real-world Applicability: The framework could enable more reliable deployment of LLMs in industrial and commercial sectors where complex robotic systems undertake collaborative tasks while honoring safety protocols.
- Reduced Dependency on Expert Knowledge: It potentially reduces reliance on formal methods that require expert translation, by enabling natural-language-driven planning that inherently respects physical constraints.
- Foundation for Further Research: This approach lays the groundwork for scaling robotic control tasks and integrating richer sensor data, potentially improving the autonomy of LLMs in achieving complex objectives.

Future developments may involve integrating more diverse sets of sensors and employing adaptive learning algorithms to evolve LLM capabilities further. Such expansions could lead to more nuanced applications, exploring new domains like autonomous vehicle coordination and smart city infrastructure management.

Conclusion

The paper offers an insightful look into how deliberate grounding of constraint knowledge within LLMs can enhance their practicality in robotic control tasks. By merging the strengths of reinforcement learning and physical realism, the research provides a pathway towards more robust and reliable AI-driven robotics, marking a significant step forward in navigating the complexities of real-world interactions.
