
BrepLLM Framework Overview

Updated 9 February 2026
  • BrepLLM is a framework that enables native parsing of complex Boundary Representation data by integrating geometric, topological, and linguistic modeling.
  • It employs adaptive UV sampling and hierarchical encoding with dual-tower contrastive pre-training, significantly improving performance on 3D classification and captioning benchmarks.
  • Multi-stage LLM fine-tuning, featuring a Geometry-to-Vision bridge and Mixture-of-Query Experts, facilitates robust instruction tuning on the novel Brep2Text dataset.

Boundary Representation (Brep) models provide precise encoding of 3D geometry and topology in engineering and CAD, but the complexity and structure of Breps have made them challenging to integrate natively with LLMs. BrepLLM is the first framework to enable LLMs to parse and reason directly over raw Brep data, bridging the modality gap between structured 3D geometry and natural language via joint geometric, topological, and linguistic modeling. Leveraging a two-stage pipeline—cross-modal alignment pre-training and multi-stage LLM fine-tuning—BrepLLM achieves state-of-the-art performance on industrial 3D classification and captioning benchmarks, and establishes the first large-scale Brep instruction-tuning dataset (Deng et al., 18 Dec 2025).

1. Native Brep Graph Construction and Feature Representation

BrepLLM begins with an adaptive UV-sampling scheme that converts Boundary Representation data into a graph structure incorporating both geometry and topology. For a Brep with faces $\mathcal{S}$ and edges $\mathcal{C}$, the process is as follows:

  • Graph Nodes and Edges: Each face $S \in \mathcal{S}$ becomes a node; adjacency is established by connecting nodes that share a boundary edge in $\mathcal{C}$.
  • Adaptive UV Sampling: For each parametric face $S$, the sampling density $N_S$ is set by

$$N_S = N_{\min}^{\mathrm{face}} + \frac{A_S - A_{\min}}{A_{\max} - A_{\min}} \left(N_{\max}^{\mathrm{face}} - N_{\min}^{\mathrm{face}}\right),$$

where $A_S$ is the face area. Each sampled point $(u, v)$ is mapped to a 10D feature vector comprising 3D position $P$, normal $n$, mean curvature $H$, visibility mask $V$, face type $t$, and normalized area $a$.

  • Edge Sampling: Each edge $C \in \mathcal{C}$ is sampled with density

$$M_C = M_{\min}^{\mathrm{edge}} + \frac{\ell_C - \ell_{\min}}{\ell_{\max} - \ell_{\min}} \left(M_{\max}^{\mathrm{edge}} - M_{\min}^{\mathrm{edge}}\right),$$

where $\ell_C$ is the edge length, yielding 8D point features: 3D position $Q$, tangent $\tau$, edge type $c$, and normalized length $b$.

This sampling ensures that both fine and coarse geometric structures are represented proportionally to their metric importance within the Brep.
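
As an illustrative sketch (the function and parameter names here are my own, not from the paper), the adaptive face-sampling rule linearly interpolates density between the configured minimum and maximum according to where the face area falls in the observed range:

```python
def face_sample_density(area, a_min, a_max, n_min=8, n_max=64):
    """Adaptive UV sampling density for one face: linearly interpolate
    between n_min and n_max based on the face's normalized area.
    a_min/a_max are the smallest and largest face areas in the model;
    the bounds n_min/n_max are illustrative defaults, not paper values."""
    t = (area - a_min) / (a_max - a_min)  # normalized area in [0, 1]
    return round(n_min + t * (n_max - n_min))
```

The same rule applies to edges with length in place of area; the smallest face receives `n_min` samples and the largest receives `n_max`.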

2. Hierarchical BrepEncoder Architecture

The sampled Brep data are processed by a hierarchical BrepEncoder, comprising three parallel feature extraction branches for each face:

  • Fine-Grained Face Features: PointTransformerV3 is applied to per-face point attributes to yield $F_f \in \mathbb{R}^{32}$.
  • Edge-Conditioned Face Features: An Edge Encoder processes sampled edge attributes, propagating them onto incident faces via an NNConv mechanism, yielding $F_e \in \mathbb{R}^{32}$.
  • Global Topology Features: 2D/1D CNNs embed the faces/edges, followed by two Edge-conditioned Graph Attention (EGATConv) layers, producing $F_t \in \mathbb{R}^{64}$.

All three branch outputs are concatenated to yield a per-face node token $h_i = [F_t^{(i)} \Vert F_e^{(i)} \Vert F_f^{(i)}] \in \mathbb{R}^{128}$, and a global token $h_{\mathrm{cls}} \in \mathbb{R}^{128}$ is produced by global-attention pooling. The result is a variable-length node token sequence and a single global token for downstream processing.
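
A minimal sketch of this token assembly, assuming a single learned scoring vector for the attention pooling (names and the pooling details are illustrative, not from the paper):

```python
import numpy as np

def assemble_tokens(F_t, F_e, F_f, w):
    """Concatenate the three branch outputs per face, then form a global
    token by attention pooling with a learned scoring vector w.
    F_t: (n, 64), F_e: (n, 32), F_f: (n, 32) -> h: (n, 128), h_cls: (128,)"""
    h = np.concatenate([F_t, F_e, F_f], axis=1)  # per-face node tokens
    scores = h @ w                               # (n,) attention logits
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # softmax attention weights
    h_cls = alpha @ h                            # weighted sum -> global token
    return h, h_cls
```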

3. Cross-Modal Pre-training with Contrastive Alignment

To align the structured Brep modality with natural language, BrepLLM employs dual-tower contrastive pre-training, analogous to CLIP:

  • Geometry Tower: The BrepEncoder's global token is projected to a $D$-dimensional embedding $z_{\mathrm{brep},i}$.
  • Text Tower: A frozen CLIP text encoder (ViT-L/14) produces the corresponding text embeddings $z_{\mathrm{text},i}$.
  • InfoNCE Loss: Cosine similarity between geometry and text embeddings across a batch is computed and normalized, producing

$$\mathcal{L}_{\mathrm{CLIP}} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log P_{ii} + \log Q_{ii}\right],$$

where $P_{ij}$ and $Q_{ij}$ are batchwise softmaxes over the similarity scores.

This symmetric loss encourages matched Brep-text pairs to be close in embedding space, while separating mismatched pairs. The BrepEncoder is thereby trained to produce representations semantically aligned with natural language descriptions.
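
The symmetric loss can be sketched in NumPy as follows (a simplified stand-in for the actual training code; the temperature value is an assumption):

```python
import numpy as np

def clip_loss(z_brep, z_text, tau=0.07):
    """Symmetric InfoNCE over a batch of N matched Brep/text embeddings.
    P softmaxes each row (Brep -> text), Q each column (text -> Brep);
    the loss pushes the diagonal (matched pairs) toward probability 1."""
    zb = z_brep / np.linalg.norm(z_brep, axis=1, keepdims=True)
    zt = z_text / np.linalg.norm(z_text, axis=1, keepdims=True)
    S = zb @ zt.T / tau                                   # cosine sims / temperature
    P = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)  # row-wise softmax
    Q = np.exp(S) / np.exp(S).sum(axis=0, keepdims=True)  # column-wise softmax
    n = len(S)
    return -(np.log(np.diag(P)) + np.log(np.diag(Q))).sum() / (2 * n)
```

Perfectly matched pairs give a low loss; permuting one tower's batch (breaking the pairing) drives it up.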

4. Multi-Stage LLM Fine-Tuning

Following cross-modal alignment, BrepLLM undergoes a three-stage fine-tuning regimen to integrate the Brep encoding with a text-generative LLM:

Stage I: Geometry-to-Vision Bridging

  • The frozen BrepEncoder generates node token sequences $\{h_i\}$, projected via a two-layer MLP to match the Q-Former embedding dimension.
  • A BLIP-2-style Q-Former with 32 learnable queries aggregates node embeddings, linearly projecting the output into the LLM's input space.
  • Only the projection MLP is trained; all other modules remain frozen.
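
A single-head sketch of this bridging step, with illustrative projection matrices standing in for the Q-Former's learned weights (the real module uses multi-head attention and stacked layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def qformer_bridge(node_tokens, queries, W_k, W_v, W_out):
    """Stage-I bridge sketch: 32 learnable queries cross-attend to the
    variable-length node token sequence; the aggregated result is then
    projected into the LLM's input embedding space via W_out."""
    K = node_tokens @ W_k                                        # (n, d) keys
    V = node_tokens @ W_v                                        # (n, d) values
    attn = softmax(queries @ K.T / np.sqrt(K.shape[1]), axis=1)  # (32, n)
    return (attn @ V) @ W_out                                    # (32, d_llm)
```

Because attention pools over the node axis, the output is always 32 tokens regardless of how many faces the Brep has.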

Stage II: 3D–Language Alignment Fine-Tuning

  • LoRA adapters are applied to the projection MLP, selected Q-Former sublayers, and a subset of LLM layers; BrepEncoder remains frozen.
  • A standard autoregressive language-modeling objective is used, further aligning the representations with text output.
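
The LoRA adapters can be sketched as a low-rank residual on a frozen weight matrix (standard LoRA; the shapes and scaling here are illustrative, not paper hyperparameters):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """LoRA sketch: the frozen weight W (out, in) is augmented by a
    low-rank update B @ A (B: (out, r), A: (r, in)) scaled by alpha / r.
    Only A and B receive gradients during Stage-II fine-tuning."""
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)
```

With `B` initialized to zero (the usual convention), the adapter starts as an exact no-op on the frozen layer.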

Stage III: Mixture-of-Query Experts (MQE)

  • Introduces lightweight residual query experts and a sparse router that selects the top-$G$ experts based on the aggregated node tokens.
  • The final Q-Former query set is given by $Q_{\mathrm{final}} = Q_{\mathrm{base}} + Q_{\mathrm{res}}$, with only the residual experts and the router updated during training.
  • Ablations confirm that Stage III is the most effective point at which to introduce MQE.
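
A sketch of the MQE routing, under the assumption that the selected experts' residuals are mixed by renormalized softmax weights (the paper may combine them differently):

```python
import numpy as np

def mqe_queries(q_base, experts, router_logits, G=2):
    """Mixture-of-Query Experts sketch: a sparse router scores each
    residual expert, the top-G are mixed by renormalized softmax weights,
    and the mixed residual is added to the frozen base query set.
    q_base: (n_q, d), experts: (E, n_q, d), router_logits: (E,)"""
    top = np.argsort(router_logits)[-G:]           # indices of top-G experts
    w = np.exp(router_logits[top])
    w /= w.sum()                                   # renormalize over the selected
    q_res = np.tensordot(w, experts[top], axes=1)  # weighted residual queries
    return q_base + q_res                          # Q_final = Q_base + Q_res
```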

This curriculum transfers vision–language priors and incorporates geometric diversity for robust Brep understanding and text generation.

5. Brep2Text Dataset for Instruction Tuning

BrepLLM is trained and evaluated on Brep2Text, the first large-scale Brep–language question–answer (QA) dataset:

  • Construction: Based on 134,722 industrial Brep models from Text2CAD, with two tiers of questions automatically generated per model using Qwen-Max: high-level semantic questions and procedural modeling questions.
  • Scale: Yields 269,444 QA pairs in total, with 200 Breps and 400 QA pairs held out as a test set.
  • Quality Control: Automatic filtering for coherence and spot-checks for correctness.

Brep2Text enables direct instruction-tuning and evaluation of models on native Brep input rather than point-cloud or mesh surrogates.

6. Experimental Evaluation and Ablation

BrepLLM achieves state-of-the-art results on both object captioning and generative classification:

| Metric | BrepLLM (Brep, 2.9B) | MiniGPT-3D (point cloud, 7–13B) | Improvement |
|---|---|---|---|
| Qwen-Max (captioning) | 58.89 | 56.58 | +2.31 |
| SBERT similarity | 73.05 | 71.64 | +1.41 |
| SimCSE similarity | 74.46 | 73.13 | +1.33 |
| Human precision (caption) | 81.85% | | |
| Classification avg (%) | 57.05 | 54.90 | +2.15 |

Ablations reveal:

  • Adaptive UV sampling yields +2.05% lift in Stage I, +0.64% end-to-end.
  • Hierarchical features add +2.42% to +2.87% accuracy.
  • Full three-stage fine-tuning curriculum is optimal (57.05% classification accuracy).
  • MQE's effectiveness is maximized when introduced only at Stage III.

BrepLLM's native processing of Brep data contrasts with prior point-cloud or mesh-based pipelines, enabling fine-grained geometric and topological reasoning unavailable to indirect surrogates. The introduction of hierarchical geometric–topological encoding, modality-aligned contrastive learning, multi-stage LLM integration, and the Brep2Text dataset marks the first instance of end-to-end instruction tuning for native Brep understanding, setting a new performance baseline in 3D CAD reasoning and captioning (Deng et al., 18 Dec 2025).

