
Graph-Coarsening for Machine Learning Coarse-grained Molecular Dynamics

Published 22 Jul 2025 in cond-mat.soft (arXiv:2507.16531v1)

Abstract: Coarse-grained (CG) molecular dynamics (MD) simulations can simulate large molecular complexes over extended timescales by reducing degrees of freedom. A critical step in CG modeling is the selection of the CG mapping algorithm, which directly influences both accuracy and interpretability of the model. Despite progress, the optimal strategy for coarse-graining remains a challenging task, highlighting the necessity for a comprehensive theoretical framework. In this work, we present a graph-based coarsening approach to develop CG models. Coarse-grained sites are obtained through edge contractions, where nodes are merged based on a local variational cost metric while preserving key spectral properties of the original graph. Furthermore, we illustrate how Message Passing Atomic Cluster Expansion (MACE) can be applied to generate ML-CG potentials that are not only highly efficient but also accurate. Our approach provides a bottom-up, theoretically grounded computational method for the development of systematically improvable CG potentials.

Summary

  • The paper presents a graph-based coarsening method that automates coarse-grained mapping using spectral similarity to minimize local variation cost.
  • It integrates the MACE architecture with a force-matching protocol to train ML potentials that preserve key structural and thermodynamic properties.
  • Empirical results on benchmark molecules show that the CG models accurately reproduce bond length distributions and radial distribution functions, with low Jensen-Shannon divergence from reference data.

Overview

This paper introduces a theoretically principled, unsupervised graph-based coarsening framework for constructing coarse-grained (CG) molecular dynamics (MD) models, and demonstrates its integration with the Message Passing Atomic Cluster Expansion (MACE) architecture for machine-learned CG potentials. The approach is evaluated on benchmark molecular systems, with a focus on preserving structural and thermodynamic properties while achieving computational efficiency and interpretability.

Theoretical Framework and Methodology

The central challenge addressed is the systematic definition of CG mapping operators, which directly impact the accuracy and transferability of CG models. The authors propose a multilevel graph coarsening algorithm, grounded in spectral graph theory, to automate the selection of CG sites. The molecular system is represented as a weighted graph, where nodes correspond to heavy atoms and edges encode chemical connectivity and spatial proximity. Coarsening is performed via edge contractions, guided by a local variational cost metric that quantifies the spectral distortion induced by merging candidate node sets.
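The graph representation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the distance cutoff, inverse-distance edge weights, and toy coordinates are assumptions chosen for clarity.

```python
# Sketch of a weighted molecular graph over heavy atoms: edges encode chemical
# connectivity (bonds) and spatial proximity, with hypothetical inverse-distance
# weights. The graph Laplacian carries the spectral properties to be preserved.
import numpy as np
import networkx as nx

def build_molecular_graph(positions, bonds, cutoff=4.0):
    """Weighted graph: bonded pairs plus pairs within `cutoff` (angstrom)."""
    G = nx.Graph()
    n = len(positions)
    G.add_nodes_from(range(n))
    for i, j in bonds:                       # chemical connectivity
        d = np.linalg.norm(positions[i] - positions[j])
        G.add_edge(i, j, weight=1.0 / d)
    for i in range(n):                       # spatial proximity
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            if d < cutoff and not G.has_edge(i, j):
                G.add_edge(i, j, weight=1.0 / d)
    return G

# Toy example: four heavy atoms in a chain
pos = np.array([[0.0, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [4.5, 0, 0]])
G = build_molecular_graph(pos, bonds=[(0, 1), (1, 2), (2, 3)])
L = nx.laplacian_matrix(G, weight="weight").toarray()  # spectral structure lives here
```

Edge contractions that change the Laplacian spectrum as little as possible are then the ones with low variational cost.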

Two candidate selection strategies are introduced:

  • Local Variation Neighborhood (LVN): Candidate sets are defined as one-hop neighborhoods, favoring local aggregation.
  • Local Variation Cliques (LVC): Candidate sets are maximal cliques, enabling the preservation of cyclic motifs and ring structures.
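The two candidate-set strategies can be illustrated with networkx primitives. This sketch only generates the candidate sets; the paper's cost-based filtering and selection are not reproduced here.

```python
# LVN vs. LVC candidate generation, sketched on a toy graph: a three-membered
# ring (0-1-2) with a one-atom tail (2-3). LVC recovers the ring as one clique.
import networkx as nx

def lvn_candidates(G):
    """Local Variation Neighborhood: closed one-hop neighborhood of each node."""
    return [frozenset([v]) | frozenset(G.neighbors(v)) for v in G]

def lvc_candidates(G):
    """Local Variation Cliques: maximal cliques, preserving cyclic motifs."""
    return [frozenset(c) for c in nx.find_cliques(G)]

G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(sorted(map(sorted, lvc_candidates(G))))   # ring {0,1,2} and tail {2,3}
```

On cyclic graphs the clique-based candidates keep ring atoms together, which is why LVC is the better fit for aromatic systems such as azobenzene.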

At each coarsening level, contraction sets are greedily selected to minimize the local variation cost, subject to a global spectral similarity constraint. The resulting mapping matrix P is deterministic, interpretable, and preserves key chemical and topological features.
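Once the greedy contraction step has produced a partition of atoms into beads, assembling the mapping matrix P is straightforward. In this sketch the cluster assignments are hand-picked for illustration rather than computed by the variational cost metric, and row-normalized averaging is one common convention for the mapping.

```python
# Assemble a CG mapping matrix P from a node partition: rows are beads, columns
# are atoms, row-normalized so CG coordinates are cluster averages, R = P @ r.
import numpy as np

def mapping_matrix(clusters, n_atoms):
    P = np.zeros((len(clusters), n_atoms))
    for k, members in enumerate(clusters):
        P[k, list(members)] = 1.0 / len(members)  # equal weight per atom
    return P

clusters = [{0, 1, 2}, {3, 4}]          # two beads over five heavy atoms
P = mapping_matrix(clusters, n_atoms=5)
r = np.arange(15.0).reshape(5, 3)       # toy atomic coordinates, shape (5, 3)
R = P @ r                               # CG bead positions, shape (2, 3)
```

Because the partition is deterministic, the same atoms always map to the same bead, which is what makes the mapping reproducible and interpretable.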

For force field parameterization, the MACE architecture is trained using a force-matching loss, where atomic forces from all-atom MD are projected onto the CG space via the coarsening operator. The MACE model, based on the Atomic Cluster Expansion, is equivariant and systematically improvable, enabling accurate modeling of many-body interactions at the CG level.
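The force-matching objective can be sketched as follows. Summing atomic forces within each bead is one standard choice of force projection; the paper's exact projection via the coarsening operator may differ, and the model producing predicted CG forces (MACE in the paper) is left abstract here.

```python
# Hedged sketch of force matching: all-atom reference forces are mapped onto
# the CG beads and compared against the model's predicted CG forces via MSE.
import numpy as np

def project_forces(clusters, f_atom):
    """Map all-atom forces to CG beads by summing over each cluster."""
    return np.stack([f_atom[list(m)].sum(axis=0) for m in clusters])

def force_matching_loss(f_pred, f_ref):
    """Mean squared error between predicted and mapped reference CG forces."""
    return float(np.mean((f_pred - f_ref) ** 2))

clusters = [[0, 1, 2], [3, 4]]
f_atom = np.ones((5, 3))                   # toy all-atom forces
f_ref = project_forces(clusters, f_atom)   # shape (2, 3)
loss = force_matching_loss(np.zeros((2, 3)), f_ref)
```

Minimizing this loss over a dataset of all-atom MD snapshots trains the CG potential to reproduce the projected forces, which is what grounds the bottom-up, systematically improvable character of the resulting model.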

Empirical Evaluation

The framework is validated on three molecular systems of increasing complexity: Aspirin, Azobenzene, and 3-(benzyloxy)pyridin-2-amine (3BPA), using data from the MD17 benchmark. For each system, the following protocol is employed:

  • All-atom, non-hydrogen (noh), and CG representations are generated.
  • The graph coarsening algorithm is applied to obtain CG mappings with specified coarsening ratios.
  • MACE is trained on projected coordinates and forces, with hyperparameters tuned for each resolution.
  • Structural observables (bond length distributions, angles, dihedrals, and radial distribution functions) are compared between CG simulations and CG-mapped ground truth.
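The fidelity metric used in the comparisons above, the Jensen-Shannon divergence between observable distributions, can be computed as in this sketch. The bond-length data, binning, and base-2 convention here are illustrative assumptions, not the paper's settings.

```python
# JSD (base 2, bounded in [0, 1]) between a CG-simulation histogram and a
# CG-mapped reference histogram; values near zero indicate high fidelity.
import numpy as np

def jensen_shannon_divergence(p, q):
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)                     # mixture distribution
    def kl(a, b):
        mask = a > 0                      # 0 * log(0) contributes nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
ref = rng.normal(1.5, 0.05, 10000)        # toy reference bond lengths (angstrom)
sim = rng.normal(1.5, 0.05, 10000)        # toy CG-simulation bond lengths
bins = np.linspace(1.2, 1.8, 61)
p, _ = np.histogram(ref, bins=bins)
q, _ = np.histogram(sim, bins=bins)
jsd = jensen_shannon_divergence(p.astype(float), q.astype(float))
```

Note that SciPy's `scipy.spatial.distance.jensenshannon` returns the JS *distance* (the square root of the divergence), so results from different tools should be compared with care.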

Key findings include:

  • Aspirin: The LVN-based coarsening yields a five-bead CG model that accurately reproduces bond length and angle distributions, as well as the radial distribution function (RDF), with Jensen-Shannon Divergence (JSD) values indicating high fidelity to the reference data.
  • Azobenzene: The LVC-based coarsening preserves aromatic ring topology, and the MACE-CG model captures N=N bond lengths, C-N distances, and dihedral angle distributions with strong agreement to ground truth. RDFs and JSD metrics confirm structural consistency.
  • 3BPA: The LVC approach enables the representation of ring and functional group motifs as distinct beads. Bond length and RDF analyses demonstrate that the CG model retains essential equilibrium properties, with low JSD values.

Across all systems, the CG models generated by the proposed workflow exhibit strong agreement with reference statistics, and the MACE-CG potentials are shown to be both accurate and computationally efficient.

Implementation Considerations

The graph coarsening algorithm is unsupervised, parameter-free apart from the coarsening ratio and edge connectivity, and runs entirely on CPU, offering significant computational advantages over deep learning-based mapping schemes. The deterministic nature of the mapping enhances reproducibility and interpretability, and the method is agnostic to the underlying molecular system, supporting broad applicability.

The MACE architecture is leveraged for its equivariance and systematic improvability, and is trained using standard force-matching protocols. The integration of graph-based coarsening with MACE enables the direct construction of CG potentials without reliance on predefined energy terms or prior potentials.

Limitations and Future Directions

A notable limitation is the absence of learned or adaptive coarsening schemes; the method does not exploit neural network-based graph condensation, which may limit adaptability for highly heterogeneous systems. The LVN strategy can over-aggregate in cyclic graphs, but the LVC approach mitigates this by targeting cliques. The current study is restricted to small organic molecules; extension to larger biomolecular systems and proteins is proposed as future work.

Implications and Outlook

This work provides a rigorous, interpretable, and efficient framework for CG mapping in molecular simulations, addressing a longstanding challenge in the field. The combination of spectral graph coarsening and equivariant ML potentials offers a pathway toward systematically improvable, transferable CG models. The approach is well-suited for integration with existing MD workflows and can facilitate the development of CG models for complex systems where manual mapping is infeasible.

From a theoretical perspective, the use of spectral similarity as a guiding principle for coarsening establishes a foundation for further exploration of graph-theoretic methods in molecular modeling. Practically, the method's computational efficiency and automation are likely to accelerate the adoption of ML-based CG models in chemistry and materials science.

Future developments may include the incorporation of adaptive, data-driven coarsening strategies, extension to heterogeneous and multi-component systems, and integration with top-down experimental constraints. The framework also provides a template for leveraging graph-based representations in other domains of scientific machine learning.
