Published 20 Apr 2025 in cs.LG, q-bio.PE, and quant-ph | (2504.14728v1)
Abstract: We present a unified geometric framework for modeling learning dynamics in physical, biological, and machine learning systems. The theory reveals three fundamental regimes, each emerging from the power-law relationship $g \propto \kappa^a$ between the metric tensor $g$ in the space of trainable variables and the noise covariance matrix $\kappa$. The quantum regime corresponds to $a = 1$ and describes Schr\"odinger-like dynamics that emerges from a discrete shift symmetry. The efficient learning regime corresponds to $a = \tfrac{1}{2}$ and describes very fast machine learning algorithms. The equilibration regime corresponds to $a = 0$ and describes classical models of biological evolution. We argue that the emergence of the intermediate regime $a = \tfrac{1}{2}$ is a key mechanism underlying the emergence of biological complexity.
The paper establishes a unified framework where learning dynamics across systems are modeled through a power-law relationship between the metric tensor and noise covariance.
It derives distinct regimes—quantum, efficient machine learning, and classical biological evolution—using covariant gradient descent and Fokker-Planck equations.
The framework proposes interpolation between regimes via phase transitions, providing insights into adaptive, hierarchical learning in complex systems.
This paper, "Geometric Learning Dynamics" (2504.14728), introduces a unified geometric framework for modeling learning dynamics across disparate systems including physical, biological, and machine learning contexts. The core idea is that the dynamics of trainable variables in these systems can be understood through the interplay of geometry (defined by a metric tensor g) and noise (characterized by a noise covariance matrix κ) in the space of these variables.
The theory reveals three fundamental dynamical regimes that emerge from a power-law relationship between the metric tensor and the noise covariance: $g \propto \kappa^a$.
Machine Learning Framework: The paper frames a general learning system as having three interacting dynamics: boundary dynamics (data/environment coupling), activation dynamics (signal propagation through non-trainable variables), and learning dynamics (adjustment of trainable variables, q). The learning dynamics is described by a covariant gradient descent equation:
$$\tau_l \frac{dq^\mu}{dt} = -\gamma\, g^{\mu\nu}(q)\, \frac{\partial H(q,x)}{\partial q^\nu}$$
where $\tau_l$ is the learning timescale, $\gamma$ is the learning rate, $g^{\mu\nu}$ is the inverse metric tensor, and $H$ is the loss function. The loss function is modeled as the sum of an average part $U(q)$ and a stochastic component $\phi(q,t)$. This leads to a Langevin equation for the trainable variables:
$$\frac{dq^\mu}{dt} = -\gamma\, g^{\mu\nu}(q)\, \frac{\partial U(q)}{\partial q^\nu} + \eta^\mu(q,t)$$
where the noise $\eta^\mu$ is related to the gradient of the stochastic component of the loss.
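As an illustrative sketch (not the paper's implementation), the covariant Langevin dynamics above can be simulated with Euler-Maruyama steps on a toy quadratic loss. The quadratic $U$, the flat metric, and the choice of noise covariance (proportional to $\gamma\, g^{\mu\nu}$) are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def covariant_langevin_step(q, grad_U, g_inv, gamma, dt, rng):
    """One Euler-Maruyama step of dq/dt = -gamma g^{mu nu} dU/dq^nu + eta."""
    drift = -gamma * g_inv @ grad_U(q)
    # Illustrative noise model: covariance proportional to gamma * g_inv
    noise = rng.multivariate_normal(np.zeros(len(q)), gamma * g_inv * dt)
    return q + drift * dt + noise

# Hypothetical quadratic average loss U(q) = 0.5 q^T A q
A = np.array([[2.0, 0.3], [0.3, 1.0]])
grad_U = lambda q: A @ q
g_inv = np.eye(2)  # flat metric, i.e. the a = 0 limit

q = np.array([1.0, -1.0])
for _ in range(500):
    q = covariant_langevin_step(q, grad_U, g_inv, gamma=0.5, dt=0.05, rng=rng)
# q ends up fluctuating near the minimum at the origin
```

Swapping `g_inv` for a curved, state-dependent metric turns the same loop into the covariant gradient descent of Equation (3)-style dynamics.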
On longer timescales ($\tau \gg \tau_l$), the dynamics is described by a Fokker-Planck equation (Equation 8 in the paper), which explicitly depends on both the metric tensor $g$ and the noise covariance matrix $\kappa$. The noise covariance $\kappa^{\mu\nu}$ is derived from the correlation properties of the stochastic loss function.
Crucially, the relationship between $g$ and $\kappa$ is found to follow a power law, $g_{\mu\nu} \propto (\kappa^a)_{\mu\nu}$, across different machine learning algorithms:
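The three exponents can be made concrete numerically: for a symmetric positive semi-definite noise covariance $\kappa$, the matrix power $\kappa^a$ is computed via eigendecomposition. The specific matrix and helper below are illustrative, not taken from the paper:

```python
import numpy as np

def matrix_power_sym(k, a):
    """Matrix power of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(k)
    return (v * np.clip(w, 0.0, None) ** a) @ v.T

kappa = np.array([[4.0, 1.0], [1.0, 2.0]])  # example noise covariance

g_quantum = matrix_power_sym(kappa, 1.0)    # a = 1: natural-gradient-like
g_efficient = matrix_power_sym(kappa, 0.5)  # a = 1/2: AdaBelief-like
g_classical = matrix_power_sym(kappa, 0.0)  # a = 0: identity, SGD-like
```

By construction `g_quantum` equals `kappa`, `g_classical` is the identity, and `g_efficient` squares back to `kappa`.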
Quantum Physics Regime ($a = 1$): When $g \propto \kappa$ (specifically, $g = \kappa$), the Fokker-Planck equation simplifies to a covariant form (Equation 9). To describe the dynamics on the learning timescale $\tau_l$, the paper applies the principle of stationary entropy production subject to a constraint on the rate of change of the loss function. Under certain assumptions about 'entropic' fluctuations, this variational principle yields Madelung-like equations: a continuity equation for the probability density $P$ and a quantum Hamilton-Jacobi equation for a phase function $\phi$. By defining a complex 'wavefunction' $\psi = \sqrt{P}\exp(-i\phi/\hbar)$, these equations combine into a Schrödinger-like equation. This emergent quantum-like dynamics is shown to be valid if a discrete shift of the phase $\phi$ by the Planck constant $h = 2\pi\hbar$ acts as an unobservable symmetry, which the paper suggests could be interpreted in terms of the number of underlying 'neurons'. This regime corresponds to natural gradient descent in machine learning.
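For reference, the Madelung decomposition can be sketched in its standard textbook form (conventions and symbols $m$, $U$ here are the usual ones, not necessarily the paper's exact Equations):

```latex
% Substituting \psi = \sqrt{P}\, e^{-i\phi/\hbar} into
% i\hbar\,\partial_t\psi = -\tfrac{\hbar^2}{2m}\,\partial_\mu\partial^\mu\psi + U\psi
% and separating imaginary and real parts gives:
\begin{align}
  \frac{\partial P}{\partial t}
    &= \frac{1}{m}\,\partial_\mu\!\left(P\,\partial^\mu\phi\right),
    && \text{(continuity)} \\
  \frac{\partial \phi}{\partial t}
    &= \frac{1}{2m}\,\partial_\mu\phi\,\partial^\mu\phi + U
       - \frac{\hbar^2}{2m}\,\frac{\partial_\mu\partial^\mu\sqrt{P}}{\sqrt{P}}.
    && \text{(quantum Hamilton--Jacobi)}
\end{align}
```

The last term is the quantum potential, which the paper later generalizes to a 'neural' potential depending on $\kappa$.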
Classical Biology Regime ($a = 0$): Biological dynamics, such as evolution, is modeled as a stochastic process involving random jumps (e.g., genetic drift) and acceptance probabilities (e.g., natural selection). Assuming an acceptance probability function like a sigmoid (Equation 15), the dynamics satisfies a detailed balance condition for a quasi-equilibrium state described by a canonical ensemble $P_e \propto \exp(-\beta H)$, where $H$ is the negative log fitness. The expected trajectory of the state resembles a covariant gradient descent (Equation 20), where the metric is proportional to the covariance matrix of random jumps. If this jump covariance $C^{\mu\nu}$ is state-independent, the space can be globally flattened via a coordinate transformation. This flat-space limit corresponds to $a = 0$ ($g \propto \delta$), which is characteristic of stochastic gradient descent. On longer timescales, the dynamics is described by the Fokker-Planck equation in flat space, but potentially with anisotropic diffusion (Equation 23). This regime is argued to capture classical biological evolution, but it does not describe the dynamics on the faster $\tau_l$ timescale in full generality.
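A minimal simulation of this jump-and-acceptance picture, with a hypothetical quadratic negative log fitness $H$: the sigmoid acceptance $A(\Delta H) = 1/(1 + e^{\beta \Delta H})$ satisfies $A(\Delta H)/A(-\Delta H) = e^{-\beta \Delta H}$, which is exactly the detailed-balance condition for $P_e \propto e^{-\beta H}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid_accept(dH, beta):
    """Sigmoid (Glauber-type) acceptance probability for a proposed jump."""
    return 1.0 / (1.0 + np.exp(beta * dH))

def evolve(H, q0, beta, jump_cov, steps, rng):
    """Random jumps (drift) filtered by selection (acceptance)."""
    q = np.array(q0, dtype=float)
    L = np.linalg.cholesky(jump_cov)  # jump covariance sets the effective metric
    for _ in range(steps):
        proposal = q + L @ rng.standard_normal(len(q))
        if rng.random() < sigmoid_accept(H(proposal) - H(q), beta):
            q = proposal
    return q

H = lambda q: 0.5 * np.sum(q**2)  # toy negative log fitness
q_final = evolve(H, [3.0, -3.0], beta=2.0,
                 jump_cov=0.1 * np.eye(2), steps=2000, rng=rng)
```

The chain drifts toward high fitness (low $H$) and then fluctuates around the quasi-equilibrium ensemble, mirroring the equilibration regime.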
The paper generalizes the variational principle (stationary entropy production) to account for the general relationship between g and κ. This leads to a 'neural' Hamilton-Jacobi equation (Equation 26) and a 'neural' potential (Equation 27), which generalizes the quantum potential and explicitly depends on κ.
The $a = \tfrac{1}{2}$ case, $g \propto \sqrt{\kappa}$, is highlighted as corresponding to the AdaBelief algorithm [zhuang2020adabelief]. In this regime, the covariance of the stochastic noise in the Langevin equation becomes proportional to the identity matrix ($\langle \eta^\mu \eta^\nu \rangle_\tau = \gamma^2 \delta^{\mu\nu}$), implying equal standard deviation for each trainable variable during exploration. This behavior is suggested to be efficient when actively exploring the solution space. As directional derivatives decrease and the metric effectively flattens towards $a = 0$ ($g \propto \epsilon\delta$), the fluctuations are suppressed, leading to stabilization near equilibrium, similar to the classical biological regime. The paper refers to the general framework, encompassing the Fokker-Planck and neural Hamilton-Jacobi descriptions, as 'neural physics'.
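The whitening claim can be checked numerically: preconditioning anisotropic noise with $\kappa^{-1/2}$ (the $a = \tfrac{1}{2}$ metric choice) yields an approximately isotropic sample covariance, i.e., equal exploration in every direction. The specific $\kappa$ below is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Anisotropic gradient-noise covariance kappa (example values)
kappa = np.array([[4.0, 1.0], [1.0, 0.5]])
L = np.linalg.cholesky(kappa)
eta = L @ rng.standard_normal((2, 100_000))  # raw noise, covariance ~ kappa

# Precondition with kappa^{-1/2}, the a = 1/2 choice of (inverse) metric
w, v = np.linalg.eigh(kappa)
kappa_inv_sqrt = (v * w ** -0.5) @ v.T
eta_pre = kappa_inv_sqrt @ eta

cov_pre = np.cov(eta_pre)  # ~ identity matrix
```

The same construction with exponent $-1$ instead of $-\tfrac{1}{2}$ would over-suppress high-noise directions, which is the distinction between the $a = \tfrac{1}{2}$ and $a = 1$ regimes.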
Complex Systems and Phase Transitions: The paper argues that complex systems require incorporating multiple regimes. It proposes using a metric that interpolates between regimes; for example, $g = \sqrt{\epsilon^2 + \kappa}$ interpolates between $a = \tfrac{1}{2}$ (large fluctuations) and $a = 0$ (small fluctuations). To include the quantum regime ($a = 1$), a more general metric is suggested: $g = \sqrt{\epsilon^2 + \kappa + \zeta^2 \kappa^2}$. This metric allows for fast variables ($a = 1$), slow variables ($a = 0$), and intermediate variables ($a = \tfrac{1}{2}$). The emergence of intermediate scales is interpreted as a phase transition, particularly when parameters shift from $\epsilon\zeta \gg 1$ to $\epsilon\zeta \ll 1$. This transition opens a range of scales where efficient learning dynamics can occur (Equation 34), potentially explaining the emergence of biological complexity and major evolutionary transitions by enabling hierarchical adaptation. The effectiveness of this geometric learning framework in biology relies on the system's ability to store and retrieve historical fluctuation information, suggesting that biological evolution might involve retaining memory of past mutations.
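The regime structure of the interpolating metric can be verified numerically by measuring the local exponent $a = d\log g / d\log\kappa$ in the scalar (one-variable) case; with $\epsilon\zeta \ll 1$ (the values below are arbitrary), all three plateaus appear:

```python
import numpy as np

def g_interp(kappa, eps, zeta):
    """Scalar interpolating metric g = sqrt(eps^2 + kappa + zeta^2 kappa^2)."""
    return np.sqrt(eps**2 + kappa + zeta**2 * kappa**2)

eps, zeta = 1e-3, 1e-3          # eps * zeta << 1: intermediate regime opens up
kappa = np.logspace(-8, 8, 17)  # sweep kappa across many scales
g = g_interp(kappa, eps, zeta)

# Local exponent a = d log g / d log kappa, since g ~ kappa^a in each regime
a = np.gradient(np.log(g), np.log(kappa))
```

For $\kappa \ll \epsilon^2$ the exponent sits near $0$, for $\epsilon^2 \ll \kappa \ll 1/\zeta^2$ near $\tfrac{1}{2}$, and for $\kappa \gg 1/\zeta^2$ near $1$; taking $\epsilon\zeta \gg 1$ instead collapses the middle plateau, which is the phase transition described above.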
In conclusion, the paper presents a unified view of learning dynamics based on the geometry of the trainable variable space and its relationship to noise. The power law $g \propto \kappa^a$ defines distinct regimes corresponding to quantum mechanics ($a = 1$), efficient machine learning ($a = \tfrac{1}{2}$), and classical biological evolution ($a = 0$). Complex systems are modeled by metrics that combine these regimes, with the emergence of the intermediate $a = \tfrac{1}{2}$ regime potentially linked to biological complexity and evolutionary phase transitions. The framework suggests that biological systems need mechanisms for storing historical fluctuation data to effectively leverage geometric learning. The paper proposes this 'neural physics' framework as a basis for unifying descriptions of physical, biological, and artificial learning systems.