Machine learning potentials for complex aqueous systems made simple

Published 31 May 2021 in physics.chem-ph | (2106.00048v2)

Abstract: Simulation techniques based on accurate and efficient representations of potential energy surfaces are urgently needed for the understanding of complex aqueous systems such as solid-liquid interfaces. Here, we present a machine learning framework that enables the efficient development and validation of models for complex aqueous systems. Instead of trying to deliver a globally-optimal machine learning potential, we propose to develop models applicable to specific thermodynamic state points in a simple and user-friendly process. After an initial ab initio simulation, a machine learning potential is constructed with minimum human effort through a data-driven active learning protocol. Such models can afterwards be applied in exhaustive simulations to provide reliable answers for the scientific question at hand. We showcase this methodology on a diverse set of aqueous systems with increasing degrees of complexity. The systems chosen here comprise bulk water with different ions in solution, water on a titanium dioxide surface, as well as water confined in nanotubes and between molybdenum disulfide sheets. Highlighting the accuracy of our approach with respect to the underlying ab initio reference, the resulting models are evaluated in detail with an automated validation protocol that includes structural and dynamical properties and the precision of the force prediction of the models. Finally, we demonstrate the capabilities of our approach for the description of water on the rutile titanium dioxide (110) surface to analyze the structure and mobility of water on this surface. Such machine learning models provide a straightforward and uncomplicated but accurate extension of simulation time and length scales for complex systems.

Citations (101)

Summary

The paper introduces a user-friendly active learning framework using committee neural network potentials to develop accurate ML potentials for complex aqueous systems.
The methodology combines Behler-Parrinello NNPs with a committee approach that iteratively selects the configurations with the highest estimated error and adds them to the training set.
The resulting models are validated across diverse systems and extend the accessible simulation times at a lower computational cost than ab initio molecular dynamics (AIMD).

Machine Learning Potentials for Complex Aqueous Systems

Introduction

This paper introduces a machine learning (ML) framework that simplifies the development of machine learning potentials (MLPs) for complex aqueous systems, particularly those involving solid-liquid interfaces. The authors propose an efficient, user-friendly process for creating and validating these models. The aim is to avoid the main limitations of the traditional methods, namely the computational cost of ab initio molecular dynamics (AIMD) and the limited accuracy of force field approaches, and thereby to extend the length and time scales accessible to molecular simulation.

Methodology

The proposed approach uses an active learning framework built on committee neural network potentials (C-NNPs). It minimizes human effort by targeting specific thermodynamic state points rather than a globally optimal potential. The development process starts with an AIMD simulation that generates a reference trajectory. A data-driven active learning protocol then repeatedly selects the configurations with the highest estimated errors, adds them to the training set, and retrains the model, so that accuracy improves with each iteration.
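
The selection loop described above can be sketched on a toy problem. This is a minimal illustration and not the authors' implementation: the one-dimensional `reference` function stands in for an ab initio calculation, the committee members are simple piecewise-linear interpolators rather than neural networks, and all names and parameters (`fit_member`, `n_add`, the 80% subset size) are assumptions made for this example. For simplicity it adds one configuration per iteration.

```python
import math
import random
import statistics


def reference(x):
    """Hypothetical stand-in for an expensive ab initio calculation."""
    return math.sin(3.0 * x) + 0.5 * x


def fit_member(points):
    """Toy model: piecewise-linear interpolation through (x, y) points."""
    pts = sorted(points)

    def predict(x):
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return predict


def train_committee(training_set, n_members, rng):
    """Train every member on its own random subset of the training data."""
    subset_size = max(2, int(0.8 * len(training_set)))
    return [fit_member(rng.sample(training_set, subset_size))
            for _ in range(n_members)]


def disagreement(committee, x):
    """Committee standard deviation, used as the error estimate."""
    return statistics.pstdev([member(x) for member in committee])


def active_learning(pool, n_initial=4, n_add=12, n_members=8, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a few randomly chosen, labelled configurations.
    training_set = [(x, reference(x)) for x in pool[:n_initial]]
    candidates = pool[n_initial:]
    for _ in range(n_add):
        committee = train_committee(training_set, n_members, rng)
        # Select the candidate on which the committee disagrees most.
        worst = max(candidates, key=lambda x: disagreement(committee, x))
        candidates.remove(worst)
        training_set.append((worst, reference(worst)))
    return training_set, train_committee(training_set, n_members, rng)


# The pool plays the role of frames taken from the reference trajectory.
pool = [i / 100.0 for i in range(301)]
training_set, committee = active_learning(pool)
print(len(training_set), "training points selected")
```

The point of the design is that the expensive reference is only evaluated for configurations the current committee is unsure about, so the training set stays small.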

Machine Learning Framework

Base Model Construction: The framework utilizes Behler-Parrinello NNPs to represent potential energy surfaces. These models write the total energy as a sum of atomic contributions, each of which depends on the local environment of the atom within a cutoff radius.
Committee Neural Network Potentials (C-NNPs): Multiple NNPs are trained independently using subsets of the training data. The ‘committee’ approach averages the members' predictions to enhance accuracy and uses the standard deviation among the predictions as an estimate of the model error.
Active Learning Protocol: Starting from a model trained on a small set of configurations, the committee repeatedly evaluates candidate configurations, and those with the largest estimated error are added to the training set before the model is retrained.
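
The committee average and its error estimate can be written out in a few lines. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the member predictions are plain Python lists, and the population form of the standard deviation (division by the number of members) is a choice made here, not a detail confirmed by the summary.

```python
import math
import statistics


def committee_energy(energies):
    """Return (mean, standard deviation) of the members' energy predictions."""
    return statistics.fmean(energies), statistics.pstdev(energies)


def committee_forces(member_forces):
    """Average forces over members and estimate a per-atom uncertainty.

    member_forces[m][a] is the (fx, fy, fz) force on atom a from member m.
    """
    n_members = len(member_forces)
    n_atoms = len(member_forces[0])
    mean_forces, sigma = [], []
    for a in range(n_atoms):
        mean = tuple(
            sum(member_forces[m][a][k] for m in range(n_members)) / n_members
            for k in range(3)
        )
        # Mean squared deviation of the member force vectors from the mean.
        var = sum(
            sum((member_forces[m][a][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_members)
        ) / n_members
        mean_forces.append(mean)
        sigma.append(math.sqrt(var))
    return mean_forces, sigma


# Hypothetical predictions from four committee members (arbitrary units).
e_mean, e_sigma = committee_energy([-1.00, -1.02, -0.98, -1.00])

# Two members, one atom: the members disagree along x only.
f_mean, f_sigma = committee_forces([[(1.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]])
```

A large `f_sigma` for any atom flags the configuration as a candidate for the training set, while the mean values are what a simulation would actually use.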

Applications

The methodology was applied to a variety of aqueous systems of increasing complexity: bulk water with dissolved ions, water on a titanium dioxide surface, water confined in carbon or boron nitride nanotubes, and water confined between molybdenum disulfide sheets. These systems demonstrate the framework's capacity for accurate representation at reduced computational cost compared to AIMD.

Example: Water on Rutile Titanium Dioxide (110)

One specific application involved analyzing the structure and dynamics of water on the rutile titanium dioxide (110) surface. Using their approach, the researchers extended simulation times to 5 nanoseconds while retaining the accuracy of the underlying DFT reference. They concluded that water forms pronounced structured layers at the surface and that the diffusion of water molecules is markedly reduced near the interface.
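
Mobility of this kind is commonly quantified through the mean squared displacement (MSD) and the Einstein relation, MSD(t) = 2 d D t for diffusion in d dimensions. The sketch below shows that standard analysis on a synthetic random walk. It is not the authors' analysis code: the function names, the `axes` argument (for example `(0, 1)` for motion parallel to a surface), and the random-walk data are assumptions for illustration, and the positions are assumed to be unwrapped.

```python
import random


def mean_squared_displacement(positions, lag, axes=(0, 1, 2)):
    """MSD at one lag, averaged over atoms and time origins.

    positions[t][a] is the unwrapped (x, y, z) position of atom a at frame t.
    """
    n_frames = len(positions)
    n_atoms = len(positions[0])
    total = 0.0
    count = 0
    for t0 in range(n_frames - lag):
        for a in range(n_atoms):
            total += sum(
                (positions[t0 + lag][a][k] - positions[t0][a][k]) ** 2
                for k in axes
            )
            count += 1
    return total / count


def diffusion_coefficient(positions, dt, lags, axes=(0, 1, 2)):
    """Least-squares slope of MSD versus time, divided by 2 * len(axes)."""
    times = [lag * dt for lag in lags]
    msd = [mean_squared_displacement(positions, lag, axes) for lag in lags]
    t_mean = sum(times) / len(times)
    m_mean = sum(msd) / len(msd)
    slope = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    return slope / (2 * len(axes))


# Synthetic data: a Gaussian random walk with unit step width per component,
# for which the exact diffusion coefficient is 0.5 in these units.
rng = random.Random(1)
n_frames, n_atoms = 100, 100
positions = [[(0.0, 0.0, 0.0)] * n_atoms]
for _ in range(n_frames - 1):
    positions.append(
        [tuple(c + rng.gauss(0.0, 1.0) for c in p) for p in positions[-1]]
    )

d_bulk = diffusion_coefficient(positions, dt=1.0, lags=range(1, 11))
d_lateral = diffusion_coefficient(positions, dt=1.0, lags=range(1, 11), axes=(0, 1))
```

To resolve mobility as a function of distance from a surface, the same functions could be applied separately to the molecules found in each layer.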

Evaluation and Validation

An automated protocol assesses how well each model reproduces the structural and dynamical properties of the AIMD reference. It compares radial distribution functions (RDFs) as a structural measure, vibrational densities of states (VDOS) as a dynamical measure, and the precision of the predicted forces. Consistent accuracy on these measures across the six diverse systems investigated confirmed the robustness of the developed models.
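
Two of these checks are simple enough to sketch directly: the root-mean-square error (RMSE) of the forces and the RDF. The code below is an illustrative sketch, not the authors' validation protocol. It assumes a single atom type, a cubic periodic box, and `r_max` no larger than half the box length, and the function names and the ideal-gas test data are hypothetical.

```python
import math
import random


def force_rmse(pred, ref):
    """RMSE over all force components; pred[a] and ref[a] are (fx, fy, fz)."""
    sq = [(p - r) ** 2 for fp, fr in zip(pred, ref) for p, r in zip(fp, fr)]
    return math.sqrt(sum(sq) / len(sq))


def radial_distribution(frames, box, r_max, n_bins):
    """g(r) for identical particles in a cubic periodic box of edge `box`."""
    dr = r_max / n_bins
    counts = [0] * n_bins
    n_atoms = len(frames[0])
    for pos in frames:
        for i in range(n_atoms - 1):
            for j in range(i + 1, n_atoms):
                d2 = 0.0
                for k in range(3):
                    d = pos[i][k] - pos[j][k]
                    d -= box * round(d / box)  # minimum-image convention
                    d2 += d * d
                r = math.sqrt(d2)
                if r < r_max:
                    # Each pair contributes to the histogram of both atoms.
                    counts[min(int(r / dr), n_bins - 1)] += 2
    density = n_atoms / box ** 3
    g = []
    for b, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((b + 1) * dr) ** 3 - (b * dr) ** 3)
        # Normalize by the count expected for an ideal gas of equal density.
        g.append(c / (len(frames) * n_atoms * density * shell))
    centers = [(b + 0.5) * dr for b in range(n_bins)]
    return centers, g


# Uniformly random positions mimic an ideal gas, for which g(r) is close to 1.
rng = random.Random(0)
box = 10.0
frames = [
    [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(100)]
    for _ in range(20)
]
centers, g = radial_distribution(frames, box, r_max=5.0, n_bins=10)
rmse = force_rmse([(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)])
```

In a validation setting, the same RDF routine would be run on the AIMD trajectory and on the MLP trajectory, and the two curves compared bin by bin.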

Discussion

The framework’s emphasis on cost-effective generation of state-specific MLPs provides a significant advantage for researchers exploring complex systems without needing extensive manual intervention or adjustment of hyperparameters. The authors also demonstrate the applicability of their workflow across different ML approaches and outline its potential for studying extensive and diverse aqueous systems, thus broadening the accessibility and practical utility of ML in computational chemistry.

Conclusion

The paper presents a comprehensive approach for developing machine learning potentials tailored to complex aqueous environments using an active learning pipeline that is both computationally efficient and user-friendly. This methodology facilitates the accurate description of such systems over extended timeframes, with broad implications for computational materials science, catalysis studies, and beyond. Future developments might explore coupling this framework with more elaborate functional forms or extending it to non-aqueous systems and more complex chemical environments.