Generating Physically Stable and Buildable Brick Structures from Text

Published 8 May 2025 in cs.CV | (2505.05469v2)

Abstract: We introduce BrickGPT, the first approach for generating physically stable interconnecting brick assembly models from text prompts. To achieve this, we construct a large-scale, physically stable dataset of brick structures, along with their associated captions, and train an autoregressive LLM to predict the next brick to add via next-token prediction. To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints. Our experiments show that BrickGPT produces stable, diverse, and aesthetically pleasing brick structures that align closely with the input text prompts. We also develop a text-based brick texturing method to generate colored and textured designs. We show that our designs can be assembled manually by humans and automatically by robotic arms. We release our new dataset, StableText2Brick, containing over 47,000 brick structures of over 28,000 unique 3D objects accompanied by detailed captions, along with our code and models at the project website: https://avalovelace1.github.io/BrickGPT/.


Summary

The paper introduces BrickGPT, an autoregressive LLM fine-tuned to generate sequences of brick placements that form physically stable structures.
It uses rejection sampling and physics-aware rollback during inference to discard invalid bricks and recover from unstable designs.
Experiments show it outperforms baselines, producing more stable, buildable designs that align more closely with the text prompt.


The paper presents a method for generating interconnecting brick structures from text descriptions, with a focus on physical stability and buildability. Building on autoregressive LLMs, it introduces BrickGPT, whose designs can be assembled by hand or by robotic arms.

Overview of BrickGPT

BrickGPT's core component is an autoregressive LLM fine-tuned to predict the sequence of bricks needed to build a stable design from a text prompt. The model builds a structure one brick at a time: each new brick is produced by ordinary next-token prediction, and the generation process is constrained so that the result stays physically stable and buildable.

Dataset and Training

The research is built on the StableText2Brick dataset, which contains over 47,000 physically stable brick structures derived from over 28,000 unique 3D objects (Figure 1). Each structure is paired with a detailed caption. The model is trained on these pairs to predict the size, position, and orientation of each successive brick.

Figure 1: StableText2Brick Dataset.

Methodology

Autoregressive Model Training

The approach reformulates brick design generation as a text prediction task. A fine-tuned LLaMA-3.2-1B-Instruct model generates a design as a sequence of brick placements. Each brick structure is serialized into a compact text format that records every brick's size and position, so the model can be trained and sampled with standard next-token prediction (Figure 2).

Figure 2: Method overview including tokenization, model fine-tuning, and prediction validation.
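
The idea of serializing a structure into text can be illustrated with a minimal sketch. The line format used here, `HxW (x,y,z)`, the `Brick` class, and the helper names are assumptions made for illustration; the summary does not specify the exact token format. Orientation is assumed to be expressed by swapping the two footprint dimensions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """Hypothetical brick record: footprint size and grid position, in studs."""
    h: int  # footprint extent along x
    w: int  # footprint extent along y
    x: int
    y: int
    z: int  # layer index, 0 = ground


# Assumed line format: "HxW (x,y,z)", one brick per line.
_LINE = re.compile(r"^(\d+)x(\d+) \((\d+),(\d+),(\d+)\)$")


def to_text(bricks):
    """Serialize bricks in build order, one line per brick."""
    return "\n".join(f"{b.h}x{b.w} ({b.x},{b.y},{b.z})" for b in bricks)


def from_text(text):
    """Parse the text form back into Brick records, rejecting malformed lines."""
    bricks = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            raise ValueError(f"malformed brick line: {line!r}")
        bricks.append(Brick(*map(int, match.groups())))
    return bricks


structure = [Brick(2, 4, 0, 0, 0), Brick(2, 4, 0, 2, 1)]
encoded = to_text(structure)
```

Because each brick is a short line of plain text, an off-the-shelf LLM tokenizer can handle the sequence, and a malformed line is easy to detect when parsing a model's output.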

Ensuring Structural Stability

A critical requirement for BrickGPT is that designs not only match the prompt but are also physically viable. During inference, physics-aware checks roll back predictions that would result in an unstable structure. Stability itself is assessed with a force model that accounts for the forces acting on each brick (Figure 3).

Figure 3: Force Model illustrating various forces considered in stability assessments.
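
The force model itself is not detailed in this summary, so the sketch below is not that model. It shows a much weaker, purely geometric stand-in: a check that every brick is linked to the ground layer through overlapping footprints on adjacent layers. Passing this check is necessary for stability but not sufficient. The tuple layout `(h, w, x, y, z)` and the function names are assumptions for illustration.

```python
from collections import deque


def footprint(brick):
    """Set of (x, y) stud cells covered by a brick given as (h, w, x, y, z)."""
    h, w, x, y, _z = brick
    return {(x + i, y + j) for i in range(h) for j in range(w)}


def connected_to_ground(bricks):
    """True if every brick reaches a layer-0 brick via stud connections.

    Two bricks are treated as connected when they sit on adjacent layers
    and their footprints overlap. This is a connectivity proxy only; it
    ignores weight, leverage, and friction limits.
    """
    cells = [footprint(b) for b in bricks]
    n = len(bricks)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adjacent_layers = abs(bricks[i][4] - bricks[j][4]) == 1
            if adjacent_layers and cells[i] & cells[j]:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Breadth-first search starting from all ground-layer bricks.
    seen = {i for i, b in enumerate(bricks) if b[4] == 0}
    queue = deque(seen)
    while queue:
        i = queue.popleft()
        for j in neighbours[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


tower = [(2, 4, 0, 0, 0), (2, 4, 0, 0, 1)]
floating = [(2, 4, 0, 0, 0), (2, 2, 5, 5, 1)]
```

Here `connected_to_ground(tower)` holds, while `floating` fails because its second brick touches nothing. A structure can pass this check and still collapse, for example a long cantilever, which is why a force-based analysis is needed.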

Model Inference and Optimization

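Inference combines two safeguards. A validity check applies rejection sampling to each predicted brick: a brick that violates assembly constraints, such as one that is malformed, out of bounds, or colliding with bricks already placed, is discarded and resampled. Physics-aware rollback handles instability: when the structure is found to be physically unstable, generation rolls back to an earlier stable state and resumes from there. A design is accepted only once it satisfies both sets of constraints.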
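
A minimal sketch of how per-brick rejection sampling and rollback can fit together in one loop is shown below. Everything here is an assumption for illustration: a random proposer stands in for the LLM, `first_unsupported` is a crude support test standing in for the physics analysis, and the grid size and retry limits are arbitrary.

```python
import random

GRID = 20  # assumed size of the cubic build volume, in studs


def cells(brick):
    """3D cells occupied by a brick given as (h, w, x, y, z)."""
    h, w, x, y, z = brick
    return {(x + i, y + j, z) for i in range(h) for j in range(w)}


def is_valid(brick, placed):
    """Validity check: inside the build volume and no collision."""
    h, w, x, y, z = brick
    if min(x, y, z) < 0 or x + h > GRID or y + w > GRID or z >= GRID:
        return False
    occupied = set().union(*(cells(b) for b in placed))
    return not (cells(brick) & occupied)


def first_unsupported(placed):
    """Index of the first brick with nothing beneath it, or None.

    Crude stand-in for a physics check: a brick counts as supported if it
    is on the ground or overlaps an earlier brick one layer below.
    """
    for i, brick in enumerate(placed):
        z = brick[4]
        if z == 0:
            continue
        own = {(cx, cy) for cx, cy, _ in cells(brick)}
        below = {(cx, cy) for b in placed[:i] if b[4] == z - 1
                 for cx, cy, _ in cells(b)}
        if not (own & below):
            return i
    return None


def generate(propose, n_bricks, max_rejects=50, max_rollbacks=10):
    placed = []
    for _attempt in range(max_rollbacks + 1):
        while len(placed) < n_bricks:
            for _try in range(max_rejects):
                brick = propose(placed)
                if is_valid(brick, placed):  # rejection sampling
                    placed.append(brick)
                    break
            else:
                break  # sampler kept failing; stop extending
        bad = first_unsupported(placed)
        if bad is None:
            return placed
        placed = placed[:bad]  # roll back to before the first bad brick
    return placed


def random_proposer(seed=0):
    """Hypothetical stand-in for the language model's next-brick sample."""
    rng = random.Random(seed)

    def propose(placed):
        h, w = rng.choice([(1, 2), (2, 1), (2, 2), (2, 4), (4, 2)])
        return (h, w, rng.randrange(-1, 6), rng.randrange(-1, 6),
                rng.randrange(0, 3))

    return propose


structure = generate(random_proposer(), n_bricks=8)
```

The design choice worth noting is the split between a cheap check run on every brick and a more expensive check run on the whole structure: the first filters obvious errors immediately, while the second only triggers a rollback when needed. In this sketch the returned structure always passes the support test, although it may contain fewer than `n_bricks` bricks if the retry limits are exhausted.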

Texturing and Coloring

Beyond structural generation, the authors develop a text-based texturing method that produces colored and textured designs (Figure 4). This extension uses UV mapping and FlashTex for text-based mesh texturing, giving a single brick structure a range of possible visual styles.

Figure 4: Textured and Colored LEGO Generation.

Practical Applications

The generated designs can be built in the real world, both by automated robotic assembly and by manual construction (Figures 5 and 6). Automated assembly uses a dual-arm robotic system, relying on task-distribution frameworks and precise manipulation policies.

Figure 5: Automated Assembly.

Figure 6: Manual Assembly.

Experimental Results

Quantitative analysis shows that the model outperforms existing baselines, producing a higher proportion of stable and valid designs. BrickGPT improves text-structure alignment, stability, and adherence to brick geometry. Techniques such as physics-aware rollback and stitching make the generated designs robust enough to build in the real world.

Conclusion

This research addresses a key challenge in real-world object generation: it advances text-to-3D generation by ensuring that generated brick designs are structurally stable and buildable. Future directions involve scaling the dataset and exploring more complex structures to increase the granularity of generated models. The methodology broadens access to automated design tools and helps bridge digital models and physical objects.
