- The paper introduces an LLM-driven framework that autonomously converts multimodal inputs into detailed building models using a hierarchical planning system.
- It reports a 32% task completion rate in real-world tests, against 0% for baseline models, with component-level success rates of 86.58% for wall creation and 95.12% for window creation.
- The framework reduces manual workload in AEC by applying speculative execution and self-reflective feedback to accurately navigate complex BIM software GUIs.
Overview of "BIMgent: Towards Autonomous Building Modeling via Computer-use Agents"
This paper introduces BIMgent, a framework for automating building modeling through computer-use agents. Existing desktop automation agents are typically general-purpose, which limits their effectiveness in specialized domains such as Architecture, Engineering, and Construction (AEC). The proposed framework uses multimodal LLMs to carry out tasks autonomously in Building Information Modeling (BIM) environments, where open-ended design tasks and the GUI complexity of BIM software make automation difficult.
Technical Implementation
BIMgent operates through a multimodal LLM-driven agentic framework capable of autonomously authoring building models via GUI operations. The framework employs a hierarchy structured into three distinct layers:
- Design Layer: This layer processes multimodal inputs—such as textual design descriptors or 2D sketches—to generate floorplans that feed into downstream modeling tasks.
- Action Planning Layer: This layer consists of a high-level planner and a low-level planner that autonomously decompose the building modeling process into actionable substeps based on software documentation. The summary presents this hierarchical planning as a novel way to handle the complexity of BIM authoring and its multiple interaction paradigms.
- Execution Layer: Here, planned actions are executed utilizing both speculative action sequences and dynamic GUI grounding. Additionally, this layer features self-reflective mechanisms to enhance reliability and accuracy, forming a closed-loop feedback system.
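The three layers above can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the paper's implementation: every name here (`design_layer`, `high_level_plan`, `low_level_plan`, `execute`, `run_pipeline`, `max_retries`) is hypothetical, the planners are replaced by trivial string templates instead of LLM calls, and GUI grounding is reduced to a caller-supplied `run_action` callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Substep:
    description: str      # what the substep should achieve
    actions: List[str]    # speculative sequence of GUI actions, planned up front


def design_layer(prompt: str) -> List[str]:
    """Hypothetical stand-in for the Design Layer: prompt -> floorplan elements."""
    return [f"wall for {prompt}", f"window for {prompt}"]


def high_level_plan(floorplan: List[str]) -> List[str]:
    """Hypothetical high-level planner: one modeling task per floorplan element."""
    return [f"create {element}" for element in floorplan]


def low_level_plan(task: str) -> Substep:
    """Hypothetical low-level planner: expand a task into concrete GUI actions."""
    return Substep(task, [f"select tool: {task}", f"draw: {task}"])


def execute(substep: Substep,
            run_action: Callable[[str], None],
            verify: Callable[[Substep], bool],
            max_retries: int = 2) -> bool:
    """Run the whole action sequence, then check the result (closed loop).

    If verification fails, the sequence is replayed up to max_retries times.
    """
    for _ in range(max_retries + 1):
        for action in substep.actions:
            run_action(action)
        if verify(substep):   # self-reflection step, supplied by the caller
            return True
    return False


def run_pipeline(prompt: str,
                 run_action: Callable[[str], None],
                 verify: Callable[[Substep], bool]) -> Dict[str, bool]:
    """Design -> planning -> execution, returning per-task success flags."""
    floorplan = design_layer(prompt)
    return {task: execute(low_level_plan(task), run_action, verify)
            for task in high_level_plan(floorplan)}
```

In this sketch, `run_action` would wrap whatever mechanism actually drives the BIM software's GUI, and `verify` would wrap the self-reflective check (for example, inspecting a screenshot). Both are left as callbacks because the summary does not specify how they are implemented.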
Empirical Evaluation
BIMgent was evaluated on real-world building tasks that required both generating new designs and reconstructing existing models. It completed 32% of the tasks, compared with a 0% completion rate for baseline models. This indicates that the framework can reduce manual effort while maintaining design intent.
Component-level evaluations also point to reliable handling of detailed operations: the framework achieved success rates of 86.58% for wall creation and 95.12% for window creation, two repetitive yet intricate tasks.
Theoretical and Practical Implications
Theoretically, this work extends the capabilities of computer-use agents to specialized applications with complex GUIs, such as BIM authoring. Practically, BIMgent's success in reducing manual workload while preserving the architectural fidelity of designs points towards its potential deployment in real-world scenarios, particularly within the AEC sector, where efficiency gains are vital.
Future Prospects in AI and AEC
Looking forward, the capabilities demonstrated by BIMgent have broader implications for AI in specialized design fields. More refined multimodal LLMs and better interface interaction strategies will likely improve the efficiency and robustness of such frameworks. Future work could adapt similar methods to other professional domains that require multidisciplinary integration via GUIs, extending automation to complex design and operational workflows.