Kinetic Energy Pre-training in OF-DFT
- Kinetic Energy Pre-training is the process of using machine learning to approximate the non-interacting kinetic energy functional (Tₛ[n]) in orbital-free DFT.
- It employs advanced neural architectures, multilayer perceptrons, and kernel regression techniques to capture non-local electron density features for improved chemical accuracy.
- Pre-training procedures incorporate curated datasets, variational gradient regularization, and tailored descriptor engineering to stabilize self-consistent optimizations.
Kinetic Energy Pre-training refers to the process of systematically training machine-learned functionals to predict the non-interacting kinetic energy as a density functional, specifically in the context of orbital-free density functional theory (OF-DFT). This is a fundamental challenge in electronic structure calculations due to the intrinsically non-local and dominant character of the kinetic-energy functional Tₛ[n], compared to exchange-correlation (XC) terms. Successful kinetic energy pre-training enables robust, chemically accurate, and computationally efficient OF-DFT calculations relevant for molecules, atoms, and bulk materials, offering a scalable alternative to conventional Kohn–Sham DFT.
1. Theoretical Foundations and Formulation
The target of kinetic energy pre-training is the non-interacting kinetic energy functional Tₛ[n], entering the Hohenberg–Kohn variational principle as

E[n] = Tₛ[n] + ∫ v_ext(r) n(r) dr + E_H[n] + E_xc[n],

where n(r) is the electron density, v_ext(r) is the external potential, E_H[n] is the classical Coulomb (Hartree) energy, and E_xc[n] is the exchange-correlation functional.

In Kohn–Sham DFT, Tₛ is computed via orbitals,

Tₛ = −½ Σᵢ ⟨φᵢ|∇²|φᵢ⟩,

while OF-DFT seeks a direct and practical approximation in terms of n(r) and its (non)local features, Tₛ[n] ≈ Tₛ[n, ∇n, …]. Associated with Tₛ[n] is the kinetic potential vₛ(r) = δTₛ/δn(r), fundamental for self-consistent density optimization.
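To make the role of the kinetic potential concrete, the following is a minimal 1D sketch of direct density optimization, using the classical Thomas–Fermi + von Weizsäcker functional as a stand-in for a learned Tₛ[n]. The grid, the harmonic external potential, the step size, and the use of the 3D Thomas–Fermi constant in 1D are all illustrative choices, not a production OF-DFT solver:

```python
import numpy as np

h = 0.05
x = np.arange(-8.0, 8.0, h)
v_ext = 0.5 * x**2                           # harmonic external potential
n = np.exp(-x**2)
n /= n.sum() * h                             # normalize trial density to N = 1
c_tf = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)

def energy(n):
    """Total energy: TF + vW kinetic terms plus the external potential term."""
    grad = np.gradient(n, h)
    t_s = (c_tf * n ** (5 / 3) + grad**2 / (8 * n)).sum() * h
    return t_s + (v_ext * n).sum() * h

def kinetic_potential(n):
    """vs = delta T_s / delta n for the TF + vW stand-in functional."""
    grad = np.gradient(n, h)
    lap = np.gradient(grad, h)
    return (5 / 3) * c_tf * n ** (2 / 3) + grad**2 / (8 * n**2) - lap / (4 * n)

e_start = energy(n)
for _ in range(300):                         # projected steepest descent
    v = kinetic_potential(n) + v_ext
    mu = (v * n).sum() * h                   # chemical potential: fixes electron number
    n = np.clip(n - 1e-3 * n * (v - mu), 1e-12, None)
    n /= n.sum() * h                         # re-normalize to N = 1
```

Replacing `kinetic_potential` with the functional derivative of a learned model yields the self-consistent optimization loop the pre-training must stabilize.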
2. Model Architectures and Representation Strategies
Three principal approaches have been established for kinetic energy pre-training:
- Deep Equivariant Neural Architectures: KineticNet employs E(3)-equivariant point convolutions, decomposing convolutional filters into learnable radial functions and spherical harmonics, ensuring full rotational equivariance. The architecture is a compositional stack of encoder (grid-to-atom projection), multi-layer atom–atom interaction, and decoder (atom-to-grid), with nuclear-cusp-adapted filters to handle electron density near nuclei. The use of high angular-momentum channels enables expressivity across chemical environments (Remme et al., 2023).
- Multilayer Perceptrons with Non-local Features: Alternative approaches employ MLPs acting on local and global density-derived convolutional features. Non-locality is introduced through global convolutions with wide kernels, capturing substantial portions of the Lindhard response and enabling chemical accuracy in systems with extended delocalization (Mazo-Sevillano et al., 2023).
- Kernel-based Regression (Gaussian Process Regression): For materials and periodic systems, kinetic energy density averaged over the unit cell is regressed against physically motivated descriptors, notably the terms of the 4th-order gradient expansion and density–effective potential products. Matérn-5/2 kernel GPR is used to construct a global, non-parametric kinetic energy functional, providing transferability and robust error characteristics (Lüder et al., 2024).
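The kernel-based approach can be sketched with scikit-learn's Gaussian process regressor and a Matérn-5/2 kernel. The descriptors and the target below are synthetic stand-ins; in practice each row would hold cell-averaged gradient-expansion terms and the target would be the cell-averaged kinetic energy density:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
# Hypothetical descriptors: [TF term, gradient term, n * v_eff term]
X = rng.uniform(0.1, 2.0, size=(40, 3))
# Synthetic "kinetic energy density" target: TF term plus a gradient correction
y = X[:, 0] + 0.1 * X[:, 1] ** 2

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True, alpha=1e-8)
gpr.fit(X, y)
pred, std = gpr.predict(X[:5], return_std=True)   # near-exact at training points
```

With a tiny noise term (`alpha`), the GP interpolates its training data, and the returned standard deviation gives the non-parametric error characterization mentioned above.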
3. Pre-training Procedures and Regularization Principles
Pre-training requires carefully curated datasets, explicit regularization, and loss functions tailored to the unique properties of Tₛ[n]:
- Data Generation: Models are exposed to both ground-state and off-equilibrium densities by randomizing and sampling molecular geometries. This ensures robustness for self-consistent OF-DFT and the ability to avoid spurious minima (Remme et al., 2023). In materials, a large set of unary, binary, and ternary compounds is sampled, each under multiple strains, to comprehensively span the relevant descriptor space (Lüder et al., 2024).
- Loss Functions and Gradient Regularization: The inclusion of explicit variational gradient regularization is essential. For MLP-based models, a penalty is added for deviations from stationarity of the DFT Lagrangian, enforcing correct functional derivatives (δTₛ/δn) at all reference densities. This mitigates the risk of spurious or unstable potentials in self-consistent optimization, a problem far more acute for Tₛ than for E_xc (Mazo-Sevillano et al., 2023).
- Target Transformations: Pre-processing steps such as subtracting atomic contributions and pointwise scaling of loss terms (e.g., using the Huber loss) are implemented to dampen the dynamic range, smooth the loss landscape, and avoid division-by-zero in potential predictions (Remme et al., 2023).
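A loss combining these ingredients can be sketched as follows. Function and argument names are illustrative; `v_pred`/`v_ref` stand for predicted and reference kinetic potentials on a grid:

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails.

    Damps the dynamic range of energy residuals, as in the target
    transformations described above."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def pretraining_loss(t_pred, t_ref, v_pred, v_ref, lam=0.1):
    """Energy term plus a variational penalty on the functional derivative."""
    energy_term = huber(t_pred - t_ref).mean()
    # Penalize deviations of the predicted kinetic potential from the
    # reference, stabilizing subsequent self-consistent optimization.
    variational_term = np.mean((v_pred - v_ref) ** 2)
    return energy_term + lam * variational_term
```

The relative weight `lam` between energy accuracy and derivative accuracy is a hyperparameter; the papers cited above motivate making the derivative term non-negligible.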
4. Descriptor Engineering and Transferability
The selection and design of descriptors directly impacts transferability and data efficiency:
- Physical Descriptor Sets: In GPR models for bulk materials, descriptors are constructed as cell-averaged variants of the Thomas–Fermi term, scaled gradients, Laplacians, and products with local effective potentials. These are derived from the 4th-order gradient expansion and represent both local and quasi-nonlocal electronic structure features (Lüder et al., 2024).
- Coverage Optimization: Training data must span sufficient diversity, including both high- and low-gradient regimes, to ensure model coverage. The inclusion of unary compounds is critical for extrapolation because these samples span the most extreme regions of the descriptor space, establishing a foundation for accurate prediction on multi-component systems (Lüder et al., 2024).
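A simple illustration of such descriptors, here in 1D (real models use 3D densities and effective potentials), where c_TF, the reduced gradient s, and the reduced Laplacian q follow their standard definitions:

```python
import numpy as np

c_tf = 0.3 * (3 * np.pi**2) ** (2 / 3)       # Thomas-Fermi constant
two_kf = 2 * (3 * np.pi**2) ** (1 / 3)       # 2 k_F prefactor for s

def descriptors(n, h):
    """Cell-averaged, gradient-expansion-inspired descriptors (illustrative)."""
    grad = np.gradient(n, h)
    lap = np.gradient(grad, h)
    tau_tf = c_tf * n ** (5 / 3)                              # TF kinetic energy density
    s = np.abs(grad) / (two_kf * n ** (4 / 3))                # reduced density gradient
    q = lap / (4 * (3 * np.pi**2) ** (2 / 3) * n ** (5 / 3))  # reduced Laplacian
    # Cell averages of the leading terms
    return np.array([tau_tf.mean(), (tau_tf * s**2).mean(), (tau_tf * q).mean()])
```

For a uniform density the gradient terms vanish and only the Thomas–Fermi average survives, which is the low-gradient extreme the unary training samples are meant to cover.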
5. Quantitative Performance and Benchmarking
Model performance is evaluated against multiple criteria relevant to electronic structure calculations:
| Model/System | Test RMSE (Energy) | Reference/Accuracy Metric | Notable Results |
|---|---|---|---|
| KineticNet (molecules) | ≤1 mHa (most systems) | Chemical accuracy: 1 mHa/electron | Accurate dissociation curves; Ne₂ error 10 mHa but no spurious binding (Remme et al., 2023) |
| MLP with Var. Reg. (1D/atoms) | <1 kcal/mol | Energy/density agreement | Densities in close agreement with KS references; chemical accuracy with as few as 6 training samples (Mazo-Sevillano et al., 2023) |
| GPR (bulk materials) | 1–2×10⁻¹⁰ a.u. avg. KED | RMSE; RelErr(B′) ≈13% | Outperforms linear/polynomial models; transfer to binaries/ternaries depends on unaries (Lüder et al., 2024) |
In OF-DFT minimization for small systems, learned functionals attain energy errors ΔE ≤ 0.5 mHa and L₁ density norms within 10⁻³–10⁻² electrons, vastly outperforming classical Thomas–Fermi, von Weizsäcker, and similar baselines by two orders of magnitude (Remme et al., 2023).
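The gap to the classical baselines is easy to reproduce: for the hydrogen 1s density the von Weizsäcker functional is exact by construction (T = 0.5 a.u.) while Thomas–Fermi underestimates substantially. The radial-grid quadrature below is a quick sanity check, not a benchmark protocol:

```python
import numpy as np

r = np.linspace(1e-4, 20.0, 20000)
dr = r[1] - r[0]
n = np.exp(-2.0 * r) / np.pi              # hydrogen 1s density (atomic units)
dn = -2.0 * n                             # analytic radial derivative of n
w = 4.0 * np.pi * r**2 * dr               # radial volume element

t_vw = (dn**2 / (8.0 * n) * w).sum()      # von Weizsaecker: ~0.5 a.u. (exact here)
c_tf = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)
t_tf = (c_tf * n ** (5 / 3) * w).sum()    # Thomas-Fermi: ~0.29 a.u.
```

A learned functional with sub-mHa errors thus improves on these baselines by roughly two orders of magnitude on such one-electron systems, consistent with the table above.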
6. Guidelines and Best Practices for Kinetic-Energy Pre-training
Research to date suggests the following principles are essential for effective pre-training:
- Regularize not only energy accuracy but also functional derivatives using gradient-norm (variational) penalties to prevent loss of self-consistent minimizability (Mazo-Sevillano et al., 2023).
- Embed non-locality in model architectures via explicit equivariant convolutions or global density convolutions; local models fail to capture necessary response (Remme et al., 2023, Mazo-Sevillano et al., 2023).
- Ensure dense sampling of chemically diverse geometries (molecules: bond lengths/angles; solids: strains, stoichiometries), especially focusing on “primitive” compounds to achieve full coverage of the descriptor space (Lüder et al., 2024).
- Carry out unit-appropriate scaling and target shifts to handle sharply varying densities near nuclei and improve numerical stability (Remme et al., 2023).
- Evaluate on both equilibrium and strong off-equilibrium densities to safeguard functional performance in iterative solvers (Remme et al., 2023).
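The last two points can be illustrated with a minimal off-equilibrium density sampler (a hypothetical 1D scheme, assuming a Gaussian reference density; real pipelines perturb molecular geometries or apply strains instead):

```python
import numpy as np

rng = np.random.default_rng(1)
h = 0.05
x = np.arange(-8.0, 8.0, h)
n_ref = np.exp(-x**2)
n_ref /= n_ref.sum() * h                  # reference "equilibrium" density, N = 1

def perturb(n, scale=0.3):
    """Off-equilibrium sample: multiply by a smooth positive bump at a random
    center, then re-normalize so the electron number is preserved."""
    center = rng.uniform(-2.0, 2.0)
    bump = 1.0 + scale * np.exp(-(x - center) ** 2)
    m = n * bump
    return m / (m.sum() * h)

samples = [perturb(n_ref) for _ in range(8)]
```

Evaluating a pre-trained functional on such samples probes whether it remains well-behaved away from the minima it will encounter during iterative density optimization.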
7. Distinctive Challenges and Future Outlook
Pre-training kinetic energy functionals remains substantially more challenging than training for exchange-correlation due to the heightened non-locality and its critical role in the Hamiltonian. Errors in the predicted kinetic potential can severely destabilize the SCF and OF-DFT optimization loops. These difficulties have motivated the adoption of explicit gradient regularization and physically informed, global descriptors.
The field is progressing toward robust, chemically transferable kinetic energy functionals through hybridization of deep learning, kernel methods, and advanced data engineering. Continued advances are anticipated in data-efficient transfer learning (pre-train/fine-tune), architectural optimization, and broader chemical/materials coverage, targeting orbital-free electronic structure for large and complex systems (Remme et al., 2023, Mazo-Sevillano et al., 2023, Lüder et al., 2024).