
Geometry-Aware Molecular LMs

Updated 29 January 2026
  • Geometry information-aware molecular language models integrate textual and 3D structural data to capture molecular properties and enable design.
  • They employ architectures like double-graph GNNs, geometry-aware Transformers, and SE(3)-equivariant tokenizations to maintain spatial invariance.
  • Empirical results show enhanced prediction accuracy and structure generation, though challenges remain in scalability and precision.

A geometry information-aware molecular LLM is a computational architecture that fuses molecular language representations with explicit or implicit three-dimensional (3D) geometric information for tasks such as molecular property prediction, structure generation, and molecular design. These models employ formalisms that ensure invariance and/or equivariance to spatial symmetries (SE(3) or E(3)), encode atomic, bond, and higher-order geometric attributes, and leverage self-supervised or supervised objectives tightly coupled to molecular geometry.

1. Conceptual Foundations and Motivation

The physical, chemical, and biological properties of molecules are directly determined by their 3D structure, comprising bond lengths, bond angles, dihedral angles, and spatial arrangements of functional groups. Traditional LLMs for molecules—built on linear strings (SMILES, SELFIES) or 2D graphs—cannot fully capture geometric phenomena such as conformational strain, steric hindrance, or non-covalent interactions. Geometry information-aware variants address this by explicitly encoding molecular geometry, either during input sequence construction, model design, or both, thereby improving predictive fidelity and enabling direct generation or analysis of 3D structures (Fang et al., 2021).

2. Architectural Paradigms for Geometric Integration

Multiple architectural strategies have been introduced to encode geometric information:

Double-Graph GNNs

ChemRL-GEM uses a double-graph scheme ("GeoGNN") in which atom–bond relations and bond–angle relations are represented as parallel graphs. Message passing alternates between:

  • An atom–bond graph (nodes: atoms; edges: bonds), where node and edge features include atom types, bond types, and bond-length features (expanded by radial basis functions).
  • A bond–angle graph (nodes: bonds; edges: angles), where message passing incorporates bond-angle features (Fang et al., 2021).
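The alternating scheme can be sketched in a few lines of plain Python. The names and update rules below are illustrative toys (a cosine angle message, summed RBF bond features), not ChemRL-GEM's actual equations:

```python
import math

# Toy sketch of GeoGNN-style double-graph message passing (hypothetical
# names and update rules, not ChemRL-GEM's published equations).

def rbf(d, centers, gamma=10.0):
    """Expand a scalar bond length into radial basis features."""
    return [math.exp(-gamma * (d - c) ** 2) for c in centers]

def geognn_step(atom_h, bonds, angles, centers):
    """One alternating pass: bond-angle graph first, then atom-bond graph.

    atom_h : {atom_id: float}            toy scalar atom embeddings
    bonds  : {bond_id: (i, j, length)}   atom-bond graph edges
    angles : [(bond_a, bond_b, theta)]   bond-angle graph edges
    """
    # Bond-angle graph: nodes are bonds; messages carry the angle feature.
    bond_h = {b: sum(rbf(length, centers)) for b, (_, _, length) in bonds.items()}
    for a, b, theta in angles:
        msg = math.cos(theta)            # angle-derived message (toy choice)
        bond_h[a] += msg
        bond_h[b] += msg
    # Atom-bond graph: nodes are atoms; edges are the updated bonds.
    new_atom = dict(atom_h)
    for b, (i, j, _) in bonds.items():
        new_atom[i] += bond_h[b]
        new_atom[j] += bond_h[b]
    return new_atom, bond_h
```

Because bond features are refreshed from the angle graph before atoms aggregate them, atom embeddings see angular context after a single alternation.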

Geometry-Aware Transformers

GeoT introduces geometry-modulated attention, integrating RBF-embedded atomic pairwise distances directly into the multi-head attention mechanism, allowing the model to propagate both topological and global spatial context. The geometry enters as a multiplicative or additive bias in the attention scores, enabling the Transformer to recover both short-range (σ) and long-range (π) chemical interactions (Kwak et al., 2021).
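A minimal single-head NumPy sketch shows how such a distance bias enters the attention scores. The per-basis weight vector `w_geo` and the additive form are assumptions for illustration, not GeoT's exact parameterization:

```python
import numpy as np

# Single-head sketch of geometry-modulated attention: RBF-embedded pairwise
# distances enter the attention scores as an additive bias (illustrative).

def rbf_embed(dist, centers, gamma=10.0):
    """(n, n) distances -> (n, n, k) radial basis features."""
    return np.exp(-gamma * (dist[..., None] - centers) ** 2)

def geometry_attention(x, dist, wq, wk, wv, w_geo, centers):
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores + rbf_embed(dist, centers) @ w_geo   # geometric bias
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ v
```

Because geometry enters only through interatomic distances, the output is unchanged under any rigid rotation or translation of the input coordinates.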

SE(3)-Invariant and Equivariant Tokenizations

Approaches such as Geo2Seq construct SE(3)-invariant discrete sequences suitable for LLMs by canonical graph labeling and transformation of Cartesian coordinates to invariant spherical representations. This guarantees that structures differing only by rigid body motions are mapped to identical sequences, preserving geometric and atomic fidelity (Li et al., 2024).
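The idea can be illustrated with a toy invariant tokenizer that fixes a canonical frame from the first three atoms (assumed non-collinear) and emits spherical coordinates in that frame. Geo2Seq additionally performs canonical graph labeling, which this sketch omits by assuming a fixed atom order:

```python
import numpy as np

# Toy SE(3)-invariant tokenizer in the spirit of Geo2Seq (sketch only):
# canonical frame from the first three atoms, then spherical coordinates.
# Assumes a fixed atom ordering and non-collinear first three atoms.

def invariant_sequence(symbols, coords, decimals=3):
    coords = coords - coords[0]                     # remove translation
    e1 = coords[1] / np.linalg.norm(coords[1])
    v = coords[2] - (coords[2] @ e1) * e1
    e2 = v / np.linalg.norm(v)
    e3 = np.cross(e1, e2)                           # right-handed frame
    local = coords @ np.stack([e1, e2, e3]).T       # remove rotation
    tokens = []
    for s, (x, y, z) in zip(symbols, local):
        r = float(np.sqrt(x * x + y * y + z * z))
        theta = float(np.arctan2(np.hypot(x, y), z))   # polar angle
        phi = float(np.arctan2(y, x))                  # azimuth
        vals = [round(t, decimals) + 0.0 for t in (r, theta, phi)]  # kill -0.0
        tokens.append(s + " " + " ".join(f"{t:.{decimals}f}" for t in vals))
    return " ".join(tokens)
```

Any rigid motion of the input maps to the identical token string, which is the property that makes such sequences safe inputs for a standard LLM.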

Multimodal and Hybrid Architectures

Other models (Chem3DLLM, DeepMoLM, Frag2Seq) combine geometry-encoded molecular representations with additional modalities (images, text, protein context) via cross-attention or fusion layers. These systems utilize losslessly reversible encodings of 3D structures, fragment-based or graph-based vocabulary, and carefully designed adapters to align modalities, often achieving compression and speed benefits (Jiang et al., 14 Aug 2025, Lan et al., 21 Jan 2026, Fu et al., 2024).

Equivariant/E(3)-Compliant Models

GeoMFormer maintains separate invariant and equivariant streams within the same Transformer, using cross-attention mechanisms to mix non-geometric and geometric features while preserving proper transformation rules under SE(3) actions (Chen et al., 2024). EquiLLM further integrates a frozen LLM with E(3)-equivariant neural modules, ensuring that only geometric features participate in directional reasoning while the LLM components process statics and context (Li et al., 16 Feb 2025).
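A minimal two-stream update illustrates how invariant scalars and equivariant vectors can interact without breaking their transformation rules. The gating pattern below is a common equivariant-network idiom, not GeoMFormer's actual layers:

```python
import numpy as np

# Toy two-stream update (illustrative, not GeoMFormer's layers): invariant
# scalars s gate equivariant vectors v, and vector norms feed back into
# the scalars, so each stream keeps its transformation rule.

def dual_stream_step(s, v, w_gate, w_norm):
    """s: (n, c) invariant scalars; v: (n, 3, c) equivariant vectors."""
    gate = np.tanh(s @ w_gate)                          # invariant gate
    v_new = v * gate[:, None, :]                        # scaling stays equivariant
    s_new = s + np.linalg.norm(v_new, axis=1) @ w_norm  # norms are invariant
    return s_new, v_new
```

Rotating the input vectors rotates the output vectors identically and leaves the scalar stream untouched, which is the commutation property the dual-stream design enforces.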

3. Geometric Feature Encoding and SE(3)/E(3) Symmetry

Geometry-aware models encode geometric features such as:

  • Bond lengths (often radial basis embedded)
  • Bond angles and, occasionally, dihedral angles
  • Interatomic distances (sometimes as full matrices or binned features)
  • Spherical coordinates with respect to canonical/equivariant frames
  • Local conformer fingerprints (e.g., E3FP, hash-based tokens)

These features enforce SE(3) invariance (outputs unchanged by global rotation/translation) or equivariance (output vectors co-rotate), vital for physical correctness. Common mathematical constructs include radial basis function expansions, spherical transformations, and axis–angle representation of fragment orientations. In dual-stream or cross-attention schemes, symmetric and antisymmetric components are maintained explicitly in separate feature blocks.
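For concreteness, the first few feature types reduce to standard internal-coordinate computations. A small helper (hypothetical names) whose outputs are invariant under rigid motions:

```python
import numpy as np

# Internal-coordinate helpers for the feature types listed above: bond
# length, bond angle, and signed dihedral from Cartesian coordinates.
# All three are invariant under rigid rotations and translations.

def bond_length(p, i, j):
    return float(np.linalg.norm(p[j] - p[i]))

def bond_angle(p, i, j, k):
    """Angle at atom j, in radians."""
    u, v = p[i] - p[j], p[k] - p[j]
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def dihedral(p, i, j, k, l):
    """Signed dihedral about the j-k bond, in radians."""
    b1, b2, b3 = p[j] - p[i], p[k] - p[j], p[l] - p[k]
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m = np.cross(n1, b2 / np.linalg.norm(b2))
    return float(np.arctan2(m @ n2, n1 @ n2))
```

Note that the dihedral is only E(3)-invariant up to sign: a reflection flips it, which is why chirality-sensitive models track SE(3) (proper rotations) rather than the full E(3) group.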

4. Learning Objectives and Pre-Training Strategies

Geometry information-aware LLMs exploit both standard NLP-style and geometry-specific pretraining objectives:

  • Geometry-level Pretext Tasks: Recovery of masked bond lengths and angles, atom-pair distances, or even full distance matrices. Regression or classification is performed simultaneously with canonical language-modeling objectives (Fang et al., 2021).
  • Multimodal Contrastive Alignment: For aligning different modalities (e.g., SMILES and 3D graphs), contrastive losses are utilized to minimize representation discrepancy, sometimes coupled with weakly supervised or mask-and-predict protocols (Wang et al., 22 Jan 2026, Xiao et al., 2024).
  • Fine-Tuning: Downstream property prediction models are trained atop geometry-aware embeddings, leveraging task-specific heads (e.g., MLPs for classification or regression) and often scaffold splits to minimize data leakage.

Self-supervised learning is widely used to exploit large unlabeled datasets; geometric supervision is extracted from conformer generators (RDKit, DFT, MMFF94) or directly from file formats (XYZ, SDF, PDB).
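A GEM-style composite geometry pretext loss might be sketched as follows; the task keys and weights are placeholders for illustration, not published values:

```python
import numpy as np

# Sketch of a composite geometry pretext loss: masked bond lengths, bond
# angles, and atom-pair distances regressed jointly. Keys and weights are
# placeholders, not the values used in any published model.

GEOMETRY_TASKS = (("bond_len", 1.0), ("bond_angle", 1.0), ("pair_dist", 0.5))

def geometry_pretext_loss(pred, target, tasks=GEOMETRY_TASKS):
    """pred/target: dicts of arrays, one entry per masked-feature task."""
    loss = 0.0
    for key, weight in tasks:
        loss += weight * float(np.mean((pred[key] - target[key]) ** 2))
    return loss
```

During pretraining such a term would be summed with the canonical language-modeling objective, so the encoder is supervised by geometry and text simultaneously.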

5. Empirical Results and Impact on Molecular Modeling

Geometry information-aware molecular LLMs have demonstrated notable empirical gains:

Model | Benchmarks | Major Gains vs SOTA | Notes
ChemRL-GEM | MoleculeNet (12) | Regression RMSE ↓8.8%, AUC ↑3.7% | Atom, bond, angle, distance features
GeoT | QM9, MD17 | MAE competitive with DimeNet++ | Attention maps reflect chemical modes
Geo2Seq (w/ LM) | QM9, GEOM-DRUGS | Atom stability 98.9%, validity 97.1% | Bijective, SE(3)-invariant
Frag2Seq | CrossDocked2020 | Best Vina score (−7.37), 300× speedup | Fragment + geometry sequences
Chem3DLLM | SBDD, QM9 | Vina −7.21, atom stability 99.45% | Multimodal, lossless coding
GeoMFormer | OC20, Molecule3D | SOTA energy and force MAEs | Invariant/equivariant streams

These improvements hold across regression and classification tasks, molecular generation, property control, and structure-based drug design pipelines (Fang et al., 2021, Li et al., 2024, Jiang et al., 14 Aug 2025, Chen et al., 2024).

6. Limitations, Challenges, and Future Directions

  • Scalability: Memory and time complexity can be quadratic in system size (notably for Transformer models without cutoff), which restricts applicability for large macromolecules or materials unless architectural modifications (sparse attention, modular generation) are introduced (Kwak et al., 2021).
  • Precision vs. Performance: Discretization and quantization of geometric features introduce small but nonzero information loss; however, well-designed approaches achieve RMSDs indistinguishable from reference structures for practical resolutions (Li et al., 2024, Jiang et al., 14 Aug 2025).
  • Generalization Beyond Small Molecules: Many pipelines focus on drug-like small molecules; transfer to crystals and proteins has been demonstrated but remains challenging, especially when explicit periodicity, heterogeneity, or non-covalent interactions are critical (Flam-Shepherd et al., 2023, Chen et al., 30 Oct 2025).
  • Integration of External Knowledge: Models such as EquiLLM and GeomCLIP show that LLMs can be interfaced with geometric reasoning modules, introducing new possibilities for cross-modal learning but also new complexities in managing invariants and effective parameter sharing (Li et al., 16 Feb 2025, Xiao et al., 2024).
  • Interpretability: Attention maps, attribution mechanisms, and geometric masking can provide chemical insight, but there remains a gap between model predictions and physically interpretable rationales, particularly for global attributions (Kwak et al., 2021).

Continued advances in hybrid model architectures, multimodal training, and adaptive, scale-aware geometric embedding are anticipated, with ongoing work addressing physical priors (conservation laws, energy constraints), efficiency in high-throughput settings, and integration into real-world chemical design workflows.

