Non-Conservative Machine Learning Models
- Non-conservative machine learning models are defined by their relaxation of strict conservation laws, enabling them to capture irreversible and dissipative dynamics in complex systems.
- They integrate specialized architectures—such as GENERIC-NODE and NNPhD—that decompose dynamics into reversible and non-reversible components for enhanced physical interpretability and robust forecasts.
- These models find practical use in physics, PDE modeling, molecular dynamics, and fairness optimization, offering computational flexibility and improved adaptability despite potential stability challenges.
Non-conservative machine learning models are those that relax strict conservation principles (such as energy, probability, or fairness robustness) in their fundamental architecture or training protocol. Unlike their conservative counterparts, these models are designed either to explicitly capture physically irreversible, dissipative, or out-of-equilibrium dynamics; to allow for greater expressive or computational flexibility; or to address learning objectives (e.g., fairness) that cannot be robustly enforced by conservative means. Advances in this field, ranging from principled bracket structures for thermodynamically consistent deep models to direct force-predicting neural potentials and fairness-focused anticipatory optimization, have yielded nuanced understandings of the trade-offs intrinsic to non-conservative modeling.
1. Mathematical Foundations and Taxonomy
Non-conservative machine learning models are fundamentally characterized by their treatment of “conservation laws” (energy, entropy, information, fairness guarantees) as auxiliary or soft constraints, as opposed to exact symmetries encoded at the level of parameterization or loss. Mathematically, a canonical distinction arises in dynamical systems with state $x$ and vector field $f$, where $\dot{x} = f(x)$:
- Conservative: $f = -\nabla V$ for some potential $V$, so $\nabla \times f = 0$.
- Non-conservative: $f$ may have $\nabla \times f \neq 0$, and does not admit a scalar potential.
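This distinction can be checked numerically. A minimal sketch (using simple hand-chosen 2D fields, not drawn from any of the cited models): the gradient field $f = -\nabla V$ has vanishing curl, while adding a rotational component does not.

```python
import numpy as np

def curl_2d(fx, fy, x, y, h=1e-5):
    """Scalar curl dF_y/dx - dF_x/dy via central differences."""
    dfy_dx = (fy(x + h, y) - fy(x - h, y)) / (2 * h)
    dfx_dy = (fx(x, y + h) - fx(x, y - h)) / (2 * h)
    return dfy_dx - dfx_dy

# Conservative: f = -grad V with V = 0.5*(x^2 + y^2), i.e. f = (-x, -y)
cons = curl_2d(lambda x, y: -x, lambda x, y: -y, 0.3, -0.7)

# Non-conservative: add a rotational component (-y, x), whose curl is 2
noncons = curl_2d(lambda x, y: -x - y, lambda x, y: -y + x, 0.3, -0.7)

print(cons, noncons)
```

A nonzero numerical curl is exactly the signature that the field cannot be written as the gradient of any scalar potential.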
This distinction propagates through physical, mathematical, and algorithmic contexts:
- In dynamical system modeling, non-conservative models capture irreversible or dissipative effects—i.e., $dE/dt \neq 0$ along trajectories, with entropy production $dS/dt \geq 0$.
- In PDE-informed deep models, non-conservative forms describe systems with non-divergence fluxes or additional elliptic/parabolic contributions not expressible via a conservation law.
- In algorithmic fairness or robustness, non-conservativity refers to forecast-based, forward-looking optimization rather than worst-case robust mechanisms.
Non-conservative learning models can thus be systematically classified by the physical, statistical, or computational principle they depart from, and by whether they address fundamental irreversibility or admit increased model flexibility (Bigi et al., 2024, Giorgini, 3 May 2025, Lee et al., 2021, Almuzaini et al., 2022).
2. Machine Learning in Dissipative and Irreversible Dynamics
A major impetus for non-conservative ML arises in physical systems where irreversibility, dissipation, or entropy production are essential. The metriplectic or GENERIC formalism encodes such systems via a sum of Poisson (reversible) and metric (dissipative) brackets, parameterized as neural modules:
- Canonical evolution: $\dot{x} = L(x)\nabla E(x) + M(x)\nabla S(x)$, with $L$ skew-symmetric, $M$ symmetric positive semi-definite, and degeneracy conditions $L\nabla S = 0$, $M\nabla E = 0$, ensuring $\dot{E} = 0$, $\dot{S} \geq 0$.
- In the “GENERIC-NODE” model, all components—energy $E$, entropy $S$, and the brackets $L$, $M$—are learned as neural architectures, with explicit parameterizations enforcing symmetries and degeneracy conditions (Lee et al., 2021).
- Stochastic extensions rigorously preserve fluctuation–dissipation, ensuring nonequilibrium steady states with nonzero entropy production, distinguishing the approach from penalty- or regularizer-based black-box models.
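The algebraic constraints above can be sketched directly (a minimal numpy illustration of the bracket conditions, not the learned GENERIC-NODE architecture): parameterizing $L$ as skew-symmetric, $M$ as positive semi-definite, and enforcing the degeneracies by projection makes energy conservation and entropy production hold by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)

# Toy energy E = 0.5*|x|^2 and entropy S = sum(x), with their gradients.
gE, gS = x, np.ones(n)

def proj_out(v):
    """Projector onto the orthogonal complement of v."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - np.outer(v, v)

A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
L = A - A.T            # skew-symmetric (reversible bracket)
M = B @ B.T            # symmetric PSD (dissipative bracket)

# Degeneracy conditions: L_deg @ gS = 0 and M_deg @ gE = 0.
PS, PE = proj_out(gS), proj_out(gE)
L_deg = PS @ L @ PS    # projection preserves skew-symmetry
M_deg = PE @ M @ PE    # projection preserves symmetric positive semi-definiteness

xdot = L_deg @ gE + M_deg @ gS
dE_dt = gE @ xdot      # = 0: energy conserved
dS_dt = gS @ xdot      # >= 0: entropy produced
print(dE_dt, dS_dt)
```

The point is that these properties hold for any random $A$, $B$: they are structural, not fitted, which is what distinguishes this family from penalty-based black-box models.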
Empirical experiments reveal that, for forecasting tasks such as a damped oscillator or two-gas container, only structure-preserving non-conservative models robustly extrapolate in both energy and entropy, matching or exceeding black-box and weakly regularized models (Lee et al., 2021).
3. Decomposition and Detection of Non-Conservative Dynamics
Decomposing empirical dynamics into conservative and non-conservative components is not only a modeling tool but also a means for detecting “new physics.” The Neural New-Physics Detector (NNPhD) exemplifies this methodology:
- The force field is split as $F = F_{\text{con}} + F_{\text{non}}$, with the conservative part $F_{\text{con}}$ captured by a Lagrangian Neural Network (LNN) and the non-conservative part $F_{\text{non}}$ by a universal approximator network (UAN) (Liu et al., 2021).
- The training objective blends a force-matching loss with an $L_1$ penalty on the non-conservative component, $\mathcal{L} = \mathcal{L}_{\text{fit}} + \lambda \|F_{\text{non}}\|_1$, where a sharp phase transition at $\lambda = 1$ signals the presence of true non-conservative forces.
- Experiments uncover friction in the damped double pendulum, planetary anomalies attributable to unobserved mass, and gravitational wave reaction in inspiraling binaries—demonstrating the efficacy of this architectural split for both physical interpretability and improved generalization (Liu et al., 2021).
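A toy illustration of this split (a hypothetical linear stand-in for the LNN/UAN pair, not the NNPhD architecture itself): fit a damped-oscillator force $F = -kx - cv$ as a position-only conservative term plus a velocity-only non-conservative term, with an $L_1$ penalty on the latter. A weak penalty recovers the true damping coefficient, while an excessive penalty shrinks the non-conservative part to exactly zero.

```python
import numpy as np

rng = np.random.default_rng(1)
k, c, n = 2.0, 0.5, 2000
x, v = rng.standard_normal(n), rng.standard_normal(n)
F = -k * x - c * v                      # ground-truth damped-oscillator force

def fit(lam, iters=100):
    """Minimize sum((-a*x - b*v - F)^2) + lam*|b| by coordinate descent."""
    y = -F                              # so the model is a*x + b*v ~ y
    a = b = 0.0
    for _ in range(iters):
        a = x @ (y - b * v) / (x @ x)   # closed-form least-squares update for a
        z = v @ (y - a * x) / (v @ v)   # unpenalized update for b ...
        thr = lam / (2 * (v @ v))       # ... then soft-threshold (L1 prox)
        b = np.sign(z) * max(abs(z) - thr, 0.0)
    return a, b

a_small, b_small = fit(lam=1e-3)        # weak penalty: recovers friction c
a_big, b_big = fit(lam=1e6)             # huge penalty: kills F_non entirely
print(a_small, b_small, a_big, b_big)
```

Sweeping `lam` between these regimes is the toy analogue of NNPhD's phase-transition diagnostic: the non-conservative coefficient survives the penalty only when the data genuinely require it.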
Recent developments further generalize data-driven decomposition to the stochastic, multivariate setting using discretized Fokker–Planck operators, k-means Gaussian-mixture score estimation, and matrix splitting (symmetric/antisymmetric) to recover minimal-circulation (irreversible) dynamics required to fit observed autocorrelations (Giorgini, 3 May 2025).
4. Non-Conservative PINNs and PDE Modeling
Physics-Informed Neural Networks (PINNs) have primarily focused on conservative PDE forms due to their favorable properties near discontinuities and shocks. However, many physically relevant PDEs (multiphase, low-Mach, primitive variable formulations) are inherently non-conservative.
Key findings from benchmark studies (Neelan et al., 27 Jun 2025):
- PINNs trained on non-conservative forms (e.g., primitive Euler variables, non-divergence Burgers form) can match the accuracy and stability of conservative-form PINNs in both smooth and shock-dominated regimes, provided:
- Adaptive artificial viscosity is included as a trainable field,
- Gradient-based adaptive weighting is used in the residual loss,
- Rankine–Hugoniot or analogous jump conditions are enforced via the residual.
- For classical benchmarks (Burgers, 1D/2D Euler, shock-tube), both conservative and non-conservative PINNs yield comparable errors and localize shocks to within a small fraction of the true shock speed.
- The major limitation is that vanilla non-conservative solvers (non-PINN) exhibit catastrophic error unless regularization and adaptive weighting are introduced.
Guidelines are to work directly in non-conservative variables if the PDE arises in that form, but to ensure regularization, local viscosity, and adaptive weighting to avoid pathological behavior at shocks (Neelan et al., 27 Jun 2025).
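As a concrete illustration of the two forms (a finite-difference sketch, not a PINN): in smooth regions the conservative Burgers residual term $\partial_x(u^2/2)$ and the non-conservative term $u\,u_x$ agree, since $\partial_x(u^2/2) = u\,u_x$ analytically; the distinction only bites at shocks, which is why the jump-condition and viscosity treatments above are needed.

```python
import numpy as np

# Smooth snapshot u(x) on a periodic grid.
N = 400
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
h = x[1] - x[0]
u = np.sin(x)

def ddx(f):
    """Central difference on a periodic grid."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)

flux_form = ddx(0.5 * u**2)       # conservative: d/dx (u^2 / 2)
advective_form = u * ddx(u)       # non-conservative: u * du/dx

gap = np.max(np.abs(flux_form - advective_form))
print(gap)                         # O(h^2): the forms agree while u is smooth
```

Repeating this with a discontinuous `u` would show the two discretizations disagree at the jump, which is the numerical origin of the shock-speed errors discussed above.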
5. Atomistic and Molecular Modeling: Direct Force Prediction
Direct prediction of vector-valued forces—dispensing with the requirement that they be gradients of any scalar ML potential—has generated both computational acceleration and controversy in atomistic simulation:
- Conservative: Train a potential $E_\theta(x)$ and obtain forces by $F = -\nabla_x E_\theta$—guaranteeing energy conservation, a well-defined optimization landscape, and dynamical stability.
- Non-conservative: Train a network to output $F_\theta(x)$ directly, with no underlying scalar. This typically introduces nonzero curl ($\nabla \times F \neq 0$) and, therefore, non-vanishing work around cycles (Bigi et al., 2024).
- Empirical evidence:
- Non-conservative models can suffer from ill-posed geometry optimization (with no scalar energy to minimize, the optimizer may cycle or fail to converge),
- In molecular dynamics, NVE-ensemble runs show systematic energy drift and explosive heating,
- Thermostatted (NVT) runs can avert blowup only at the cost of severe distortion of the kinetic statistics.
- Performance gains (2–4× reduction in training time) do not offset the risk of unphysical dynamics.
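The energy-drift pathology can be seen directly in the work integral: a sketch with two hypothetical 2D force fields, one conservative and one carrying curl, integrating $\oint F \cdot dl$ around the unit circle.

```python
import numpy as np

def loop_work(F, n=4096):
    """Numerically integrate the work of F around the unit circle."""
    t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    dt = t[1] - t[0]
    x, y = np.cos(t), np.sin(t)
    dx, dy = -np.sin(t), np.cos(t)      # tangent d(position)/dt
    fx, fy = F(x, y)
    return float(np.sum(fx * dx + fy * dy) * dt)

# Conservative: F = -grad(0.5*(x^2 + y^2)) -> zero work on any closed loop.
conservative = loop_work(lambda x, y: (-x, -y))

# Curl-carrying field (curl = 2 everywhere) -> work = curl * enclosed area.
rotational = loop_work(lambda x, y: (-y, x))

print(conservative, rotational)
```

The nonzero loop work of the second field is exactly the mechanism behind NVE energy drift: every closed excursion in configuration space pumps energy into or out of the system.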
Hybrid strategies—two-head models (force + lightweight energy head) with multiple time stepping—reduce the computational cost while guaranteeing conservation for long-term stability (Bigi et al., 2024). This approach is considered essential for robust deployment in MD and structure optimization tasks.
6. Anticipatory and Dynamic Non-Conservative Learning in Fairness and Robustness
In domains outside physics, “non-conservative” describes learning strategies that avoid overly pessimistic, robust-worst-case optimization in favor of anticipatory approaches:
- ABCinML (Almuzaini et al., 2022) mitigates algorithmic bias by forecasting future subgroup and label distributions, blending the current imbalance correction with anticipated shifts through a forecast-based importance-weighting scheme:
- A convex weighting parameter $\alpha \in [0, 1]$ interpolates between purely reactive dynamic fairness and purely anticipatory adjustment at its extremes,
- Outperforms both “static” (robust, conservative) and dynamic retraining (post hoc mitigation) in maximum bias, temporal stability, and bias fluctuation metrics over real-world datasets,
- Minimizes loss of accuracy, with only a minimal change in AUC.
- This perspective generalizes non-conservativity: the learning objective is not static fairness under all possible shifts, but rather an adaptive, forward-looking correction responsive to empirically estimated drift.
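The anticipatory blend can be sketched as follows (illustrative only: `alpha`, the subgroup proportions, and the forecasting step are hypothetical stand-ins, not ABCinML's exact estimator). Importance weights for each subgroup are formed from a convex combination of the current and forecast proportions:

```python
import numpy as np

def anticipatory_weights(p_current, p_forecast, p_target, alpha):
    """Importance weights target/source, where the source distribution is a
    convex blend of current and forecast subgroup proportions."""
    p_blend = (1 - alpha) * np.asarray(p_current) + alpha * np.asarray(p_forecast)
    return np.asarray(p_target) / p_blend

# Subgroup proportions: observed now, forecast for the next step, and target.
p_now = np.array([0.7, 0.3])
p_next = np.array([0.6, 0.4])     # drift toward balance is anticipated
p_goal = np.array([0.5, 0.5])

w_reactive = anticipatory_weights(p_now, p_next, p_goal, alpha=0.0)
w_anticip = anticipatory_weights(p_now, p_next, p_goal, alpha=1.0)
print(w_reactive, w_anticip)
```

When the forecast drift is real, the anticipatory weights apply a milder correction to the minority group than the purely reactive ones, which is the mechanism behind the reduced bias fluctuation reported above.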
A plausible implication is that non-conservative strategies in fairness applications can yield favorable trade-offs between over-cautious robustness and pragmatic, temporally consistent performance.
7. Extensions in Representation Learning and Training Algorithms
Non-conservative generalizations of foundational training paradigms and representations have also emerged:
- Equilibrium Propagation (EP) was extended beyond energy-based, reciprocal (conservative) networks to arbitrary non-reciprocal architectures (Scurria et al., 3 Feb 2026):
- The Jacobian of the network dynamics, $J = \partial f/\partial x$, is decomposed into symmetric (conservative) and antisymmetric (non-conservative) parts, $J = J_{\text{sym}} + J_{\text{anti}}$.
- A local term proportional to the antisymmetric component is added in the nudged learning phase, which guarantees exact gradients for training in fully asymmetric or feedforward networks.
- This extension (Asymmetric EP and Dyadic EP) enables efficient, physically consistent gradient propagation in non-conservative architectures, outperforming previous variational approaches, especially as structural asymmetry increases.
- Empirical results on MNIST demonstrate close-to-optimal test accuracy and robustness to asymmetry, where previous methods collapse.
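The decomposition underlying Asymmetric EP can be sketched for any Jacobian (here a random matrix standing in for $\partial f/\partial x$): the symmetric part is what standard, reciprocal EP handles, while the antisymmetric remainder quantifies how non-reciprocal the network is.

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.standard_normal((5, 5))          # stand-in for the dynamics Jacobian

J_sym = 0.5 * (J + J.T)                  # conservative (reciprocal) part
J_anti = 0.5 * (J - J.T)                 # non-conservative part

# Exact decomposition and the defining symmetries:
assert np.allclose(J_sym + J_anti, J)
assert np.allclose(J_sym, J_sym.T)
assert np.allclose(J_anti, -J_anti.T)

# Degree of non-reciprocity: 0 for a symmetric network, approaching 1 as
# asymmetry grows (the two parts are orthogonal in the Frobenius norm).
asymmetry = np.linalg.norm(J_anti) / np.linalg.norm(J)
print(asymmetry)
```

This `asymmetry` ratio is one natural way to parameterize the regime where standard EP degrades and the antisymmetric correction term becomes essential.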
In time-series learning, architectures such as OscillatorNet (Müller et al., 2020) learn second-order dissipative ODEs directly from partial trajectory data, embedding finite-difference updates and residual connections to capture both conservative and non-conservative coefficients.
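The finite-difference embedding can be sketched as a fixed update rule (a hand-coded damped-oscillator step with assumed coefficients; the actual model would learn `k` and `c` from partial trajectory data):

```python
# Toy damped oscillator x'' = -k x - c x', stepped with the kind of
# finite-difference update an OscillatorNet-style model would embed.
k, c, dt = 4.0, 0.3, 0.01   # stiffness, damping, time step (assumed, not learned)

def step(x_prev, x_curr):
    """Central-difference second-order update with backward-difference velocity."""
    v = (x_curr - x_prev) / dt
    return 2 * x_curr - x_prev + dt**2 * (-k * x_curr - c * v)

# Roll out a dissipative trajectory from x(0) = 1 with zero initial velocity.
traj = [1.0, 1.0]
for _ in range(5000):
    traj.append(step(traj[-2], traj[-1]))

amplitude_late = max(abs(u) for u in traj[-500:])
print(amplitude_late)        # decays well below the initial amplitude of 1
```

Setting `c = 0` in the same rollout would conserve the oscillation amplitude, which is how a learned coefficient of this kind separates the conservative and non-conservative content of a trajectory.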
These developments collectively demonstrate that non-conservative machine learning models are not only theoretically and practically viable but often essential for accurate modeling of irreversible physics, robust learning in shifting environments, or efficient training beyond strict architectural reciprocities. However, careful architectural choices, principled regularization, vigilant monitoring for stability and conservation, and, in many contexts, hybrid “partial conservation” schemes remain critical for safe and interpretable deployment.