BIMgent: Towards Autonomous Building Modeling via Computer-use Agents

Published 8 Jun 2025 in cs.AI | (2506.07217v2)

Abstract: Existing computer-use agents primarily focus on general-purpose desktop automation tasks, with limited exploration of their application in highly specialized domains. In particular, the 3D building modeling process in the Architecture, Engineering, and Construction (AEC) sector involves open-ended design tasks and complex interaction patterns within Building Information Modeling (BIM) authoring software, which has yet to be thoroughly addressed by current studies. In this paper, we propose BIMgent, an agentic framework powered by multimodal LLMs, designed to enable autonomous building model authoring via graphical user interface (GUI) operations. BIMgent automates the architectural building modeling process, including multimodal input for conceptual design, planning of software-specific workflows, and efficient execution of the authoring GUI actions. We evaluate BIMgent on real-world building modeling tasks, including both text-based conceptual design generation and reconstruction from existing building design. The design quality achieved by BIMgent was found to be reasonable. Its operations achieved a 32% success rate, whereas all baseline models failed to complete the tasks (0% success rate). Results demonstrate that BIMgent effectively reduces manual workload while preserving design intent, highlighting its potential for practical deployment in real-world architectural modeling scenarios. Project page: https://tumcms.github.io/BIMgent.github.io/


Summary

The paper introduces an LLM-driven framework that autonomously converts multimodal inputs into detailed building models using a hierarchical planning system.
It reports a 32% task completion rate in real-world tests, with success rates up to 95.12% for intricate operations like wall and window creation.
The framework reduces manual workload in AEC by applying speculative execution and self-reflective feedback to accurately navigate complex BIM software GUIs.

Overview of "BIMgent: Towards Autonomous Building Modeling via Computer-use Agents"

This paper introduces BIMgent, a framework for automating building modeling through computer-use agents. Existing computer-use agents mostly target general-purpose desktop automation, and their application to specialized domains such as Architecture, Engineering, and Construction (AEC) remains little explored. BIMgent uses multimodal LLMs to carry out modeling tasks autonomously inside Building Information Modeling (BIM) authoring software, where open-ended design tasks and complex GUI interaction patterns make automation difficult.

Technical Implementation

BIMgent operates through a multimodal LLM-driven agentic framework capable of autonomously authoring building models via GUI operations. The framework employs a hierarchy structured into three distinct layers:

Design Layer: This layer processes multimodal inputs—such as textual design descriptors or 2D sketches—to generate floorplans that feed into downstream modeling tasks.
Action Planning Layer: It consists of a high-level planner and a low-level planner that autonomously decompose the building modeling process into actionable substeps based on software documentation. This hierarchical planning is novel in tackling BIM authoring's complexity and multiple interaction paradigms.
Execution Layer: Here, planned actions are executed utilizing both speculative action sequences and dynamic GUI grounding. Additionally, this layer features self-reflective mechanisms to enhance reliability and accuracy, forming a closed-loop feedback system.
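The flow through the three layers can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: every name (`design_layer`, `high_level_plan`, `low_level_plan`, `run`, `Step`) is hypothetical, the LLM calls are replaced by fixed stubs, and "speculative" execution is modeled simply as running a whole planned step sequence before a single verification check, with a retry when the check fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One low-level GUI action (hypothetical schema, not from the paper)."""
    description: str


def design_layer(prompt: str) -> List[str]:
    """Stand-in for the design layer: map an input to floorplan elements.

    A real system would call a multimodal LLM; this stub returns fixed elements.
    """
    return ["wall", "window"]


def high_level_plan(elements: List[str]) -> List[str]:
    """Stand-in for the high-level planner: one subtask per element."""
    return [f"create {element}" for element in elements]


def low_level_plan(subtask: str) -> List[Step]:
    """Stand-in for the low-level planner: expand a subtask into GUI steps."""
    return [Step(f"select tool for: {subtask}"), Step(f"draw: {subtask}")]


def run(
    prompt: str,
    execute: Callable[[Step], None],
    verify: Callable[[str], bool],
    max_retries: int = 2,
) -> List[Tuple[str, bool]]:
    """Plan, execute, and verify each subtask in a closed loop."""
    log: List[Tuple[str, bool]] = []
    for subtask in high_level_plan(design_layer(prompt)):
        steps = low_level_plan(subtask)
        succeeded = False
        for _ in range(max_retries + 1):
            # Run the whole planned sequence first, then check once.
            for step in steps:
                execute(step)
            # Reflection stand-in: did the subtask take effect?
            if verify(subtask):
                succeeded = True
                break
        log.append((subtask, succeeded))
    return log
```

In this sketch `execute` would wrap the actual mouse and keyboard actions and `verify` would inspect a screenshot or the model state; both are injected as callables so the control loop can be exercised without any BIM software.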

Empirical Evaluation

BIMgent was evaluated on real-world building modeling tasks covering both text-based conceptual design generation and reconstruction from existing building designs. Its operations achieved a 32% success rate, whereas all baseline models failed to complete the tasks (0% success rate). The authors also report that the resulting design quality was reasonable and that the framework reduces manual effort while preserving design intent.

Component-level evaluation points to reliable handling of individual operations: the framework achieved success rates of 86.58% for wall creation and 95.12% for window creation, both repetitive yet intricate tasks.
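One possible way to think about the gap between these component-level rates and the 32% end-to-end rate is that a full modeling task chains many operations, and every one of them has to succeed. The snippet below is only an illustrative calculation under assumptions the source does not state: that steps succeed independently and share the same per-step success probability. The function name `chain_success` is hypothetical.

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n independent steps all succeed.

    Assumes each step succeeds with the same probability p and that
    steps are independent; both are simplifying assumptions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n
```

Because the result shrinks geometrically with `n`, even high per-operation reliability can produce a much lower completion rate over a long workflow. The paper's actual task lengths and failure modes are not given in this summary, so the sketch should not be read as an explanation of the reported figures.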

Theoretical and Practical Implications

Theoretically, this work extends computer-use agents to specialized applications with complex GUIs, such as BIM authoring. Practically, BIMgent's ability to reduce manual workload while preserving design intent points to potential deployment in real-world architectural modeling, particularly within the AEC sector, where efficiency gains matter.

Future Prospects in AI and AEC

Looking forward, the capabilities demonstrated by BIMgent have broader implications for AI in specialized design fields. More capable multimodal LLMs and better interface interaction strategies will likely improve the efficiency and robustness of such frameworks. Future work could adapt similar methods to other professional domains that depend on complex GUI-based software, extending automation to further design and operational workflows.
