ML Potentials for Hydrogen in Iron

Updated 21 January 2026

MLPs for hydrogen in iron are atomic-scale models that decompose total energy locally to replicate DFT accuracy while enabling tractable MD simulations.
They employ frameworks like neural network potentials, tabGAP, and ACE/PACE to capture diverse Fe–H environments under varying pressures and defect conditions.
Key applications include predicting defect energetics, phase stability, and hydrogen embrittlement mechanisms with energy RMSEs often below 10 meV/atom.

Machine-Learning Potentials (MLPs) for Hydrogen in Iron provide atomic-scale models that emulate density functional theory (DFT) accuracy for iron-hydrogen interactions while enabling tractable large-scale molecular dynamics (MD) simulations. These models underpin the study of hydrogen-induced phenomena such as embrittlement, defect energetics, phase stability, and mechanical failure mechanisms. Several methodological frameworks have emerged, including neural network potentials (NNPs), tabulated Gaussian Approximation Potentials (tabGAP), and Atomic Cluster Expansion (ACE)/PACE models, each tailored for various classes of Fe–H environments and optimal for different domains of pressure, defect density, and computational efficiency.

1. Mathematical Frameworks and Descriptor Construction

MLPs for Fe–H systems are grounded in local energy decompositions of the total potential energy, $E_{\rm tot} = \sum_{i=1}^N E_i$, where each $E_i$ is a function of descriptors encoding the atomic environment of atom $i$.

Behler–Parrinello NNPs encode each atom's neighborhood using symmetry functions $G_i^\mu = \{G_i^{(R)}, G_i^{(A)}\}$, constructed from sets of radial and angular basis functions evaluated over the local atomic configuration. The atomic energy contributions are computed by feed-forward neural networks that map these descriptor vectors to energy outputs (Tahmasbi et al., 2023).
DeepMD NNPs utilize hybrid two- and three-body descriptors, leveraging smooth cutoff functions and element-dependent radial and angular basis functions, processed through embedding nets followed by multi-layer dense fitting networks (Zhang et al., 2023).
tabGAP potentials combine short-range Ziegler–Biersack–Littmark (ZBL) repulsive terms with Gaussian process regression across two-body, three-body, and EAM-type descriptors. These employ sparse reference environments, selected via k-means clustering, and kernel functions for regression (Makkonen et al., 26 Nov 2025).
PACE/ACE models expand the local atomic energy in invariant basis functions built from Bessel functions and spherical harmonics, followed by a nonlinear expansion in integer and fractional powers of the atomic density $\rho_i$ (Ito, 28 Dec 2025):

$E_i = \theta_0 + \theta_1\,\rho_i + \theta_2\,\rho_i^2 + \cdots + \theta_9\,\rho_i^{0.125}$

Descriptor choices and parameterization are tuned to maximize the coverage of relevant chemical environments and maintain invariance under permutation, rotation, and reflection.
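To make the descriptor idea concrete, the following is a minimal sketch of a single Behler–Parrinello-style radial symmetry function with a cosine cutoff, $G_i^{(R)} = \sum_{j \neq i} e^{-\eta (r_{ij} - r_s)^2} f_c(r_{ij})$. The function names and the parameters `eta`, `r_s`, and `r_c` are illustrative assumptions; the sketch ignores periodic images and element resolution, and it is not the parameterization used in the cited works.

```python
import numpy as np

def cosine_cutoff(r, r_c):
    """Smooth cutoff: 0.5*(cos(pi*r/r_c) + 1) inside r_c, zero outside."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry_function(positions, i, eta, r_s, r_c):
    """Radial descriptor of atom i for an isolated cluster (no periodic images).

    positions: (n_atoms, 3) Cartesian coordinates.
    eta, r_s, r_c: Gaussian width, Gaussian shift, cutoff radius (assumed values).
    """
    positions = np.asarray(positions, dtype=float)
    # Vectors from atom i to every other atom
    deltas = np.delete(positions, i, axis=0) - positions[i]
    r_ij = np.linalg.norm(deltas, axis=1)
    gaussians = np.exp(-eta * (r_ij - r_s) ** 2)
    return float(np.sum(gaussians * cosine_cutoff(r_ij, r_c)))
```

Because the descriptor depends only on interatomic distances and sums over neighbors, it is unchanged when the cluster is rotated or reflected or when the neighbors are relabeled, which is the invariance requirement stated above.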

2. Dataset Generation and Iterative Sampling Strategies

Extensive DFT databases are essential for training transferable MLPs. These datasets are generated using structured protocols to ensure broad coverage of Fe–H configurations:

Iterative minima hopping (MH) workflows combine provisional NNPs with global structure search to explore potential energy surfaces (PES) at pressures from ambient to $\sim$100 GPa. The resulting low-enthalpy structures (clusters and crystals containing from 4 to 32 or more atoms) are recomputed with DFT and appended to the training pool to improve coverage in subsequent iterations (Tahmasbi et al., 2023).
Concurrent learning approaches (DP-GEN framework) dynamically sample neglected environments, such as defects, strained cells, grain boundaries (GBs), and hydrogen segregation sites, enhancing data diversity and model generalizability (Ito, 28 Dec 2025).
Datasets typically comprise tens of thousands of atomic configurations, including pure Fe, Fe with dilute/high H concentrations, vacancy clusters with H, surfaces, stacking faults, GBs, self-interstitials, and H$_2$/H$_3$ clusters (Makkonen et al., 26 Nov 2025, Zhang et al., 2023).
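The selection step in a concurrent-learning loop can be sketched as follows. An ensemble of models predicts forces for a sampled configuration, and the largest per-atom spread across the ensemble decides whether the configuration is already well described, worth labeling with DFT, or too far from the training data to trust. This is a sketch in the spirit of ensemble-deviation selection; the function names and the thresholds `trust_lo` and `trust_hi` are hypothetical and are not taken from the cited works.

```python
import numpy as np

def max_force_deviation(forces):
    """Largest per-atom ensemble spread of the predicted force vector.

    forces: array of shape (n_models, n_atoms, 3), in eV/Angstrom.
    """
    forces = np.asarray(forces, dtype=float)
    mean = forces.mean(axis=0)                                   # (n_atoms, 3)
    # Mean squared distance of each model's force from the ensemble mean
    spread_sq = ((forces - mean) ** 2).sum(axis=2).mean(axis=0)  # (n_atoms,)
    return float(np.sqrt(spread_sq).max())

def classify_configuration(forces, trust_lo, trust_hi):
    """Label a configuration for the next labeling round (assumed thresholds)."""
    eps = max_force_deviation(forces)
    if eps < trust_lo:
        return "accurate"    # ensemble agrees; no new DFT needed
    if eps < trust_hi:
        return "candidate"   # informative; send to DFT and add to the pool
    return "failed"          # likely unphysical; discard
```

Only configurations labeled as candidates would be passed to DFT, which is how such loops avoid spending reference calculations on environments the model already covers.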

A typical training/validation split is approximately 80–90% training and 10–20% holdout for benchmarking; cross-validation strategies are sometimes omitted if broad independent testing is employed (Makkonen et al., 26 Nov 2025).

3. Model Training Procedures and Performance Metrics

MLPs for Fe–H systems are optimized using weighted least squares loss functions balancing energy and force contributions (occasionally including virial/stress terms):

$\mathcal{L} = w_E \sum_s (E_s^{\rm ML}-E_s^{\rm DFT})^2 + w_F \sum_s \sum_i \| \mathbf{F}_{si}^{\rm ML} - \mathbf{F}_{si}^{\rm DFT}\|^2$

Regularization techniques (e.g., Tikhonov) and automatic weighting schemes enhance model robustness. For tabGAP and similar frameworks, the regression uses system-specific multiples of the regularization parameter $\sigma$, which depend on the species and local chemical environment (Makkonen et al., 26 Nov 2025, Ito, 28 Dec 2025).
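The weighted energy and force loss above can be written out directly. The sketch below is a plain NumPy evaluation of $\mathcal{L}$ for a small batch of structures; the function name and the default weights are assumptions for illustration, and virial/stress terms and regularization are omitted.

```python
import numpy as np

def energy_force_loss(e_ml, e_dft, f_ml, f_dft, w_e=1.0, w_f=1.0):
    """Weighted sum of squared energy and force errors over structures.

    e_ml, e_dft: sequences of total energies, one per structure (eV).
    f_ml, f_dft: sequences of (n_atoms_s, 3) force arrays (eV/Angstrom).
    w_e, w_f: energy and force weights (assumed values).
    """
    energy_term = sum((em - ed) ** 2 for em, ed in zip(e_ml, e_dft))
    # Squared Euclidean norm of the force error, summed over atoms and structures
    force_term = sum(
        float(np.sum((np.asarray(fm) - np.asarray(fd)) ** 2))
        for fm, fd in zip(f_ml, f_dft)
    )
    return w_e * energy_term + w_f * force_term
```

Since a structure contributes one energy but three force components per atom, the ratio of `w_e` to `w_f` controls how strongly the fit is driven by forces, which is why the weights are usually treated as tunable.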

Energy RMSEs for high-fidelity NNP and ACE-based models are typically below 10 meV/atom, with force RMSEs of up to roughly 100 meV/Å. Representative reported values are:

Model	Energy RMSE (meV/atom)	Force RMSE (eV/Å)
tabGAP	$\sim$2.2	$\sim$0.08
DeepMD NNP	4.8	0.072
PACE	5.85	0.0873
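For reference, the two metrics in the table can be computed as below. This is a minimal sketch under one common convention (energy errors normalized per atom, force errors taken over individual Cartesian components); the cited works may use slightly different conventions, and the function names are illustrative. The last helper converts eV/Å to meV/Å, since both units appear in this article.

```python
import numpy as np

def energy_rmse_per_atom(e_ml, e_dft, n_atoms):
    """RMSE of per-atom energy errors, in the units of the inputs (eV/atom)."""
    err = (np.asarray(e_ml, float) - np.asarray(e_dft, float)) / np.asarray(n_atoms, float)
    return float(np.sqrt(np.mean(err ** 2)))

def force_rmse(f_ml, f_dft):
    """RMSE over all Cartesian force components of all structures (eV/Angstrom)."""
    diff = np.concatenate([
        (np.asarray(a, float) - np.asarray(b, float)).ravel()
        for a, b in zip(f_ml, f_dft)
    ])
    return float(np.sqrt(np.mean(diff ** 2)))

def ev_to_mev(x):
    """Convert eV-based quantities to meV-based ones."""
    return 1000.0 * x
```

With this conversion, a force RMSE of 0.072 eV/Å corresponds to 72 meV/Å.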

tabGAP and ACE models outperform classical EAM potentials, not only in energy/force accuracy but also in the faithful reproduction of H point-defect energetics, migration barriers, H–vacancy binding increments, and H–dislocation interactions (Makkonen et al., 26 Nov 2025, Zhang et al., 2023, Ito, 28 Dec 2025).

4. Physical Phenomena Captured: Defects, Phase Stability, and Embrittlement Mechanisms

MLPs enable direct simulation of a broad scope of physical phenomena relevant to Fe–H materials:

Defect energetics: Accurate reproduction of H solution energies in tetrahedral and octahedral sites, vacancy and interstitial formation energies, and migration barriers. For example, tabGAP reproduces the corresponding DFT benchmarks closely; representative values are collected in the table of Section 7 (Makkonen et al., 26 Nov 2025).
Elastic constants and mechanical property trends: tabGAP and DeepMD models replicate the DFT-calculated variation of the elastic constants with H concentration and temperature, capturing the correct slopes of the moduli changes per at.% H (Makkonen et al., 26 Nov 2025, Zhang et al., 2023).
Hydrogen-dislocation binding and segregation: Machine-learned potentials identify the deep binding sites (E1–E2) in screw dislocation cores in close agreement with DFT (Makkonen et al., 26 Nov 2025). PACE also reproduces DFT segregation energetics at edge and screw dislocation cores (Ito, 28 Dec 2025).
Hydrogen embrittlement mechanisms:
- Hydrogen-enhanced decohesion (HEDE) is observed as H accumulates at crack/notch tips, lowering Fe–Fe bond strength and accelerating cleavage (Makkonen et al., 26 Nov 2025, Zhang et al., 2023).
- The hydrogen-enhanced strain-induced vacancy (HESIV) mechanism is evidenced by an increased vacancy concentration arising from H–dislocation interactions (Makkonen et al., 26 Nov 2025).
- Grain boundary segregation induces a ductile-to-brittle transition, with H pinning GB migration and facilitating intergranular nanovoids and crack networks in polycrystals (Zhang et al., 2023, Ito, 28 Dec 2025).
Phase stability and global structure discovery under pressure: Minima hopping with Behler–Parrinello NNPs discovers all experimentally known FeH phases (dhcp, hcp, fcc) and numerous low-enthalpy stacking-fault phases (N1–N8), reproducing equations of state and phonon dispersions consistent with experiment (Tahmasbi et al., 2023).
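The defect energies discussed in this section are differences of supercell total energies. The sketch below assumes two widely used conventions: the H solution energy is referenced to half the energy of an H$_2$ molecule, and the H–vacancy binding energy is defined so that a positive value means attraction. The cited works may define these quantities differently (for example with zero-point corrections), the function names are illustrative, and the numbers in the usage lines are arbitrary placeholders, not results.

```python
def h_solution_energy(e_bulk_with_h, e_bulk, e_h2):
    """E_sol = E(Fe_n H) - E(Fe_n) - 0.5 * E(H2), all in eV."""
    return e_bulk_with_h - e_bulk - 0.5 * e_h2

def h_vacancy_binding_energy(e_vac, e_bulk_with_h, e_vac_with_h, e_bulk):
    """E_b = [E(vac) + E(bulk+H)] - [E(vac+H) + E(bulk)]; positive means bound."""
    return (e_vac + e_bulk_with_h) - (e_vac_with_h + e_bulk)

# Arbitrary placeholder total energies (eV), chosen only to exercise the formulas
e_sol = h_solution_energy(e_bulk_with_h=-103.2, e_bulk=-100.0, e_h2=-6.8)
e_bind = h_vacancy_binding_energy(
    e_vac=-95.0, e_bulk_with_h=-103.2, e_vac_with_h=-98.8, e_bulk=-100.0
)
```

Evaluating the same expressions with total energies from an MLP and from DFT gives the kind of side-by-side comparison reported for these models.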

5. Computational Efficiency, Scalability, and Practical Deployment

MLP construction methodologies have advanced significantly, enabling near-DFT fidelity at practical cost for large-scale MD:

DeepMD-based NNPs achieve $\sim$0.64 μs/atom/step on GPU hardware, a $\sim$43$\times$ speedup over previous CPU-based NNPs. EAM potentials remain faster ($\sim$0.07 μs/atom/step) but are less accurate, while tabGAP and ACE models are intermediate, allowing million-atom simulations at feasible cost (Zhang et al., 2023, Makkonen et al., 26 Nov 2025, Ito, 28 Dec 2025).
tabGAP is slower than EAM but $\sim$200$\times$ faster than many neural-network-based MLPs, and PACE is likewise faster than standard NNPs. Benchmarking shows that the computation time converges for sufficiently large system sizes (Ito, 28 Dec 2025).
Concurrent-learning strategies minimize redundant DFT sampling, yielding compact datasets that speed up training and improve transferability while limiting computational resource requirements (Ito, 28 Dec 2025).
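The per-atom, per-step costs quoted above translate into wall-clock estimates by simple multiplication. The sketch below assumes the cost stays constant with system size and run length, which ignores parallel-scaling and I/O effects; the function name and the example system (one million atoms, one million steps) are illustrative choices.

```python
def wall_time_hours(cost_us_per_atom_step, n_atoms, n_steps):
    """Estimated wall time in hours for a given cost in microseconds/atom/step."""
    seconds = cost_us_per_atom_step * 1e-6 * n_atoms * n_steps
    return seconds / 3600.0

# One million atoms for one million MD steps
hours_deepmd = wall_time_hours(0.64, 1_000_000, 1_000_000)
hours_eam = wall_time_hours(0.07, 1_000_000, 1_000_000)
```

Under these assumptions the run takes about 178 hours at 0.64 μs/atom/step and about 19 hours at 0.07 μs/atom/step, which illustrates why million-atom MLP simulations are feasible but still roughly an order of magnitude more expensive than EAM.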

Models are typically deployed via LAMMPS and DeepMD interfaces, leveraging multi-GPU parallelization.

6. Transferability, Limitations, and Extensions

Transferability: Models trained on diverse datasets (including clusters, crystals, surfaces, GBs, defects, and high-pressure phases) display high transferability across stoichiometric Fe–H bulk, surface, and defect-containing configurations. PACE models retain high fidelity for environments not explicitly seen during training, confirmed by active-learning extrapolation-grading analyses (Tahmasbi et al., 2023, Ito, 28 Dec 2025).
Limitations: For delicate phase ranking, where energy differences between competing structures are comparable to or smaller than the models' energy RMSEs (a few meV/atom), or for explicit simulation of defect diffusion and surface segregation, current NNPs and tabGAP are often insufficiently precise. Extension to defect-rich, surface, or non-equilibrium environments requires dedicated expansion of training pools. Alloying elements (C, Mn, Cr, etc.) are not included in existing binary Fe–H models.
Outlook: Advances in descriptor richness (e.g., SOAP, moment-tensor potentials), deeper neural networks, explicit inclusion of stress components in training, and extension to Fe–C–H or ternary systems are recommended for improved predictive power. Integration with continuum techniques may enable bridging to experimentally relevant strain rates (Ito, 28 Dec 2025).

7. Table of Key Physical Quantities Predicted by Fe–H MLPs

Quantity	DFT	tabGAP	DeepMD NNP	PACE
H solution energy, tetrahedral site (eV)	0.228	0.227	0.267	<0.02 error
H solution energy, octahedral site (eV)	0.33	0.375	0.411	<0.02 error
H migration barrier (eV)	0.09–0.10	0.11–0.15	0.099	<0.03 error
H–vacancy binding (first H, eV)	0.498	0.577	–	<0.03 error
Energy RMSE (meV/atom)	–	2.23	4.8	5.85
Force RMSE (meV/Å)	–	77	72	87

These values exemplify the level of agreement generally achieved with current MLPs for Fe–H systems. All models are benchmarked against DFT/experiment and demonstrate high reliability for key defect and phase properties.

MLPs for hydrogen in iron are now established as indispensable tools for unraveling the atomic-scale mechanisms of hydrogen embrittlement, phase stability, and defect evolution in Fe-based materials. The frameworks outlined above demonstrate rigorous construction, validation, and applicability for large-scale simulations, with ongoing development toward higher transferability and accuracy across a growing range of physical scenarios (Tahmasbi et al., 2023, Makkonen et al., 26 Nov 2025, Ito, 28 Dec 2025, Zhang et al., 2023).