
EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations

Published 21 Jun 2023 in cs.LG, cs.AI, and physics.comp-ph | (2306.12059v3)

Abstract: Equivariant Transformers such as Equiformer have demonstrated the efficacy of applying Transformers to the domain of 3D atomistic systems. However, they are limited to small degrees of equivariant representations due to their computational complexity. In this paper, we investigate whether these architectures can scale well to higher degrees. Starting from Equiformer, we first replace $SO(3)$ convolutions with eSCN convolutions to efficiently incorporate higher-degree tensors. Then, to better leverage the power of higher degrees, we propose three architectural improvements -- attention re-normalization, separable $S2$ activation and separable layer normalization. Putting this all together, we propose EquiformerV2, which outperforms previous state-of-the-art methods on large-scale OC20 dataset by up to $9\%$ on forces, $4\%$ on energies, offers better speed-accuracy trade-offs, and $2\times$ reduction in DFT calculations needed for computing adsorption energies. Additionally, EquiformerV2 trained on only OC22 dataset outperforms GemNet-OC trained on both OC20 and OC22 datasets, achieving much better data efficiency. Finally, we compare EquiformerV2 with Equiformer on QM9 and OC20 S2EF-2M datasets to better understand the performance gain brought by higher degrees.

Citations (83)

Summary

  • The paper demonstrates that efficient eSCN convolutions enable scalable higher-degree tensor representations in equivariant transformers.
  • The study introduces attention re-normalization and separable S2 activation that stabilize training and boost performance in angular-sensitive tasks.
  • Empirical evaluations show EquiformerV2 achieves up to 9% improvements in force predictions and 4% in energy predictions on the OC20 dataset.

EquiformerV2: Advancements in Equivariant Transformers for Higher-Degree Scaling

The study presented in this paper investigates the scalability of equivariant Transformers in the context of 3D atomistic systems. Building on the previous advances of Equiformer, the authors introduce EquiformerV2, which incorporates higher-degree tensor representations by replacing the original SO(3) convolutions with more computationally efficient eSCN convolutions. This substitution lets the architecture scale to larger values of L_max (up to 8), addressing the computational-complexity limitations that previously restricted such models to small degrees.
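The scaling benefit can be made concrete with a back-of-the-envelope sketch. The constants below are illustrative assumptions, not measurements from the paper: a degree-l irreducible representation has 2l + 1 components, full SO(3) tensor products scale roughly as O(L_max^6), and the SO(2) linear layers used by eSCN scale roughly as O(L_max^3).

```python
# Rough cost comparison for one message-passing step (illustrative
# scaling only; not the authors' implementation).

def num_coefficients(l_max: int) -> int:
    # Stacking irreps of degrees 0..l_max gives
    # sum over l of (2l + 1) = (l_max + 1)^2 coefficients.
    return sum(2 * l + 1 for l in range(l_max + 1))

def so3_tensor_product_cost(l_max: int) -> int:
    # Full SO(3) tensor products grow roughly as O(l_max^6).
    return l_max ** 6

def so2_linear_cost(l_max: int) -> int:
    # After rotating edge features to align with the edge axis,
    # eSCN reduces the interaction to SO(2) linear operations,
    # roughly O(l_max^3).
    return l_max ** 3

for l_max in (2, 4, 6, 8):
    speedup = so3_tensor_product_cost(l_max) // so2_linear_cost(l_max)
    print(f"L_max={l_max}: {num_coefficients(l_max)} coefficients, "
          f"~{speedup}x relative cost gap")
```

The gap widens cubically with L_max, which is why the SO(3)-to-SO(2) reduction is what makes L_max = 8 practical at all.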

Key Contributions:

  1. Efficient Incorporation of Higher-Degree Tensors: The integration of eSCN convolutions allows for the handling of higher-degree tensors without the prohibitive computational costs associated with SO(3) convolutions. This is achieved by transforming the complex tensor products into more manageable SO(2) linear operations.
  2. Architectural Improvements:
    • Attention Re-normalization: By adding a layer normalization step in the attention mechanism, the authors stabilize training, especially as the number of input channels increases with higher L_max.
    • Separable S2 Activation and Layer Normalization: These modifications improve the non-linearity across different degrees, enhancing model performance on tasks sensitive to angular information like force prediction.
  3. Empirical Performance: The evaluation on the OC20 dataset demonstrates that EquiformerV2 outperforms state-of-the-art models by up to 9% in force predictions and 4% in energy predictions. It also presents a significant reduction in the DFT calculations needed for accurate adsorption energy predictions, thereby offering a better speed-accuracy trade-off.
  4. Data Efficiency: EquiformerV2 trained only on the OC22 dataset surpasses models like GemNet-OC that are trained on both OC20 and OC22, indicating superior data efficiency and generalization capabilities.
  5. Comparative Analysis Against Baselines: Through experiments on datasets such as QM9 and OC20 S2EF-2M, the study dissects the contributions of higher-degree representations and architectural improvements, revealing that higher-degree information provides a tangible performance boost.
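The separable normalization idea in contribution 2 can be illustrated with a minimal numpy sketch: normalize the invariant (degree-0) features and the equivariant (degree > 0) features with separate statistics, so each part gets scaling appropriate to its geometry. The layout, epsilon, and single shared RMS for higher degrees below are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def separable_layer_norm(x, l_max, eps=1e-6):
    # x: (num_coeffs, channels), where the coefficients of degree l
    # occupy a contiguous block of 2l + 1 rows, degrees 0..l_max.
    assert x.shape[0] == (l_max + 1) ** 2
    out = np.empty_like(x)

    # Degree 0: ordinary layer normalization over the channel axis.
    s = x[0]
    out[0] = (s - s.mean()) / np.sqrt(s.var() + eps)

    # Degrees > 0: divide by one RMS taken over all higher-degree
    # coefficients and channels. Scaling by a single scalar preserves
    # equivariance, since rotations act within each degree-l block.
    hi = x[1:]
    rms = np.sqrt(np.mean(hi ** 2) + eps)
    out[1:] = hi / rms
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(9, 4))          # degrees 0..2 -> 1 + 3 + 5 = 9 rows
y = separable_layer_norm(x, l_max=2)
```

The point of the separation is that rescaling the higher-degree (directional) features never shifts the scalar channel, and vice versa, which matters when higher degrees carry the angular information used for force prediction.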

Implications and Future Directions:

The presented findings highlight the potential of scalable and data-efficient equivariant architectures in applications such as molecular simulations and materials discovery. Beyond the immediate performance improvements, this research underscores the importance of efficient tensor operations in expanding the applicability of machine learning models to more complex atomic interactions.

Future work could further optimize the computational efficiency of such models, possibly through hybrid approaches that combine invariant and equivariant frameworks. The relevance of these models to real-world applications such as protein structure prediction also suggests a broad scope for future exploration.

In conclusion, EquiformerV2 marks a significant step in enhancing the scalability and efficiency of equivariant models, offering insights and methods that are likely to influence ongoing research in the field of quantum mechanical approximations and molecular sciences.
