Published 20 Apr 2025 in cs.LG, q-bio.PE, and quant-ph | (2504.14728v1)
Abstract: We present a unified geometric framework for modeling learning dynamics in physical, biological, and machine learning systems. The theory reveals three fundamental regimes, each emerging from the power-law relationship $g \propto \kappa^a$ between the metric tensor $g$ in the space of trainable variables and the noise covariance matrix $\kappa$. The quantum regime corresponds to $a = 1$ and describes Schr\"odinger-like dynamics that emerges from a discrete shift symmetry. The efficient learning regime corresponds to $a = \tfrac{1}{2}$ and describes very fast machine learning algorithms. The equilibration regime corresponds to $a = 0$ and describes classical models of biological evolution. We argue that the emergence of the intermediate regime $a = \tfrac{1}{2}$ is a key mechanism underlying the emergence of biological complexity.
The paper establishes a unified framework where learning dynamics across systems are modeled through a power-law relationship between the metric tensor and noise covariance.
It derives distinct regimes—quantum, efficient machine learning, and classical biological evolution—using covariant gradient descent and Fokker-Planck equations.
The framework proposes interpolation between regimes via phase transitions, providing insights into adaptive, hierarchical learning in complex systems.
This paper, "Geometric Learning Dynamics" (2504.14728), introduces a unified geometric framework for modeling learning dynamics across disparate systems including physical, biological, and machine learning contexts. The core idea is that the dynamics of trainable variables in these systems can be understood through the interplay of geometry (defined by a metric tensor g) and noise (characterized by a noise covariance matrix κ) in the space of these variables.
The theory reveals three fundamental dynamical regimes that emerge from a power-law relationship between the metric tensor and the noise covariance: $g \propto \kappa^a$.
Machine Learning Framework: The paper frames a general learning system as having three interacting dynamics: boundary dynamics (data/environment coupling), activation dynamics (signal propagation through non-trainable variables), and learning dynamics (adjustment of trainable variables, q). The learning dynamics is described by a covariant gradient descent equation:
$$\tau_l \frac{dq^\mu}{dt} = -\gamma\, g^{\mu\nu}(q)\, \frac{\partial H(q,x)}{\partial q^\nu}$$
where $\tau_l$ is the learning timescale, $\gamma$ is the learning rate, $g^{\mu\nu}$ is the inverse metric tensor, and $H$ is the loss function. The loss function is modeled as the sum of an average part $U(q)$ and a stochastic component $\phi(q,t)$. This leads to a Langevin equation for the trainable variables:
$$\frac{dq^\mu}{dt} = -\gamma\, g^{\mu\nu}(q)\, \frac{\partial U(q)}{\partial q^\nu} + \eta^\mu(q,t)$$
where the noise $\eta^\mu$ is related to the gradient of the stochastic component of the loss.
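As an illustrative sketch (not the paper's implementation), the covariant Langevin dynamics above can be simulated with Euler-Maruyama steps on a toy quadratic loss. The quadratic $U$, the flat metric, and the choice of noise covariance (proportional to $\gamma\, g^{\mu\nu}$) are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def covariant_langevin_step(q, grad_U, g_inv, gamma, dt, rng):
    """One Euler-Maruyama step of dq/dt = -gamma g^{mu nu} dU/dq^nu + eta."""
    drift = -gamma * g_inv @ grad_U(q)
    # Illustrative noise model: covariance proportional to gamma * g_inv
    noise = rng.multivariate_normal(np.zeros(len(q)), gamma * g_inv * dt)
    return q + drift * dt + noise

# Hypothetical quadratic average loss U(q) = 0.5 q^T A q
A = np.array([[2.0, 0.3], [0.3, 1.0]])
grad_U = lambda q: A @ q
g_inv = np.eye(2)  # flat metric, i.e. the a = 0 limit

q = np.array([1.0, -1.0])
for _ in range(500):
    q = covariant_langevin_step(q, grad_U, g_inv, gamma=0.5, dt=0.05, rng=rng)
# q ends up fluctuating near the minimum at the origin
```

Swapping `g_inv` for a curved, state-dependent metric turns the same loop into the covariant gradient descent of Equation (3)-style dynamics.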
On longer timescales ($\tau \gg \tau_l$), the dynamics is described by a Fokker-Planck equation (Equation 8 in the paper), which explicitly depends on both the metric tensor $g$ and the noise covariance matrix $\kappa$. The noise covariance $\kappa^{\mu\nu}$ is derived from the correlation properties of the stochastic loss function.
Crucially, the relationship between $g$ and $\kappa$ is found to follow a power law, $g_{\mu\nu} \propto (\kappa^a)_{\mu\nu}$, across different machine learning algorithms:
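The three exponents can be made concrete numerically: for a symmetric positive semi-definite noise covariance $\kappa$, the matrix power $\kappa^a$ is computed via eigendecomposition. The specific matrix and helper below are illustrative, not taken from the paper:

```python
import numpy as np

def matrix_power_sym(k, a):
    """Matrix power of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(k)
    return (v * np.clip(w, 0.0, None) ** a) @ v.T

kappa = np.array([[4.0, 1.0], [1.0, 2.0]])  # example noise covariance

g_quantum = matrix_power_sym(kappa, 1.0)    # a = 1: natural-gradient-like
g_efficient = matrix_power_sym(kappa, 0.5)  # a = 1/2: AdaBelief-like
g_classical = matrix_power_sym(kappa, 0.0)  # a = 0: identity, SGD-like
```

By construction `g_quantum` equals `kappa`, `g_classical` is the identity, and `g_efficient` squares back to `kappa`.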
Quantum Physics Regime ($a = 1$): When $g \propto \kappa$ (specifically, $g = \kappa$), the Fokker-Planck equation simplifies to a covariant form (Equation 9). To describe the dynamics on the learning timescale $\tau_l$, the paper applies the principle of stationary entropy production subject to a constraint on the rate of change of the loss function. Under certain assumptions about 'entropic' fluctuations, this variational principle yields Madelung-like equations: a continuity equation for the probability density $P$ and a quantum Hamilton-Jacobi equation for a phase function $\phi$. By defining a complex 'wavefunction' $\psi = \sqrt{P}\exp(-i\phi/\hbar)$, these equations combine into a Schrödinger-like equation. This emergent quantum-like dynamics is shown to be valid if a discrete shift of the phase $\phi$ by the Planck constant $h = 2\pi\hbar$ acts as an unobservable symmetry, which the paper suggests could be interpreted in terms of the number of underlying 'neurons'. This regime corresponds to natural gradient descent in machine learning.
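For reference, the Madelung decomposition can be sketched in its standard textbook form (conventions and symbols $m$, $U$ here are the usual ones, not necessarily the paper's exact Equations):

```latex
% Substituting \psi = \sqrt{P}\, e^{-i\phi/\hbar} into
% i\hbar\,\partial_t\psi = -\tfrac{\hbar^2}{2m}\,\partial_\mu\partial^\mu\psi + U\psi
% and separating imaginary and real parts gives:
\begin{align}
  \frac{\partial P}{\partial t}
    &= \frac{1}{m}\,\partial_\mu\!\left(P\,\partial^\mu\phi\right),
    && \text{(continuity)} \\
  \frac{\partial \phi}{\partial t}
    &= \frac{1}{2m}\,\partial_\mu\phi\,\partial^\mu\phi + U
       - \frac{\hbar^2}{2m}\,\frac{\partial_\mu\partial^\mu\sqrt{P}}{\sqrt{P}}.
    && \text{(quantum Hamilton--Jacobi)}
\end{align}
```

The last term is the quantum potential, which the paper later generalizes to a 'neural' potential depending on $\kappa$.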
Classical Biology Regime ($a = 0$): Biological dynamics, such as evolution, is modeled as a stochastic process involving random jumps (e.g., genetic drift) and acceptance probabilities (e.g., natural selection). Assuming an acceptance probability function like a sigmoid (Equation 15), the dynamics satisfies a detailed balance condition for a quasi-equilibrium state described by a canonical ensemble $P_e \propto \exp(-\beta H)$, where $H$ is the negative log fitness. The expected trajectory of the state resembles a covariant gradient descent (Equation 20), where the metric is proportional to the covariance matrix of random jumps. If this jump covariance $C^{\mu\nu}$ is state-independent, the space can be globally flattened via a coordinate transformation. This flat-space limit corresponds to $a = 0$ ($g \propto \delta$), which is characteristic of stochastic gradient descent. On longer timescales, the dynamics is described by the Fokker-Planck equation in flat space, but potentially with anisotropic diffusion (Equation 23). This regime is argued to capture classical biological evolution, but it does not describe the dynamics on the faster $\tau_l$ timescale in full generality.
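A minimal simulation of this jump-and-acceptance picture, with a hypothetical quadratic negative log fitness $H$: the sigmoid acceptance $A(\Delta H) = 1/(1 + e^{\beta \Delta H})$ satisfies $A(\Delta H)/A(-\Delta H) = e^{-\beta \Delta H}$, which is exactly the detailed-balance condition for $P_e \propto e^{-\beta H}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid_accept(dH, beta):
    """Sigmoid (Glauber-type) acceptance probability for a proposed jump."""
    return 1.0 / (1.0 + np.exp(beta * dH))

def evolve(H, q0, beta, jump_cov, steps, rng):
    """Random jumps (drift) filtered by selection (acceptance)."""
    q = np.array(q0, dtype=float)
    L = np.linalg.cholesky(jump_cov)  # jump covariance sets the effective metric
    for _ in range(steps):
        proposal = q + L @ rng.standard_normal(len(q))
        if rng.random() < sigmoid_accept(H(proposal) - H(q), beta):
            q = proposal
    return q

H = lambda q: 0.5 * np.sum(q**2)  # toy negative log fitness
q_final = evolve(H, [3.0, -3.0], beta=2.0,
                 jump_cov=0.1 * np.eye(2), steps=2000, rng=rng)
```

The chain drifts toward high fitness (low $H$) and then fluctuates around the quasi-equilibrium ensemble, mirroring the equilibration regime.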
The paper generalizes the variational principle (stationary entropy production) to account for the general relationship between g and κ. This leads to a 'neural' Hamilton-Jacobi equation (Equation 26) and a 'neural' potential (Equation 27), which generalizes the quantum potential and explicitly depends on κ.
The $a = \tfrac{1}{2}$ case, $g \propto \sqrt{\kappa}$, is highlighted as corresponding to the AdaBelief algorithm [zhuang2020adabelief]. In this regime, the covariance of the stochastic noise in the Langevin equation becomes proportional to the identity matrix ($\langle \eta^\mu \eta^\nu \rangle_\tau = \gamma^2 \delta^{\mu\nu}$), implying equal standard deviation for each trainable variable during exploration. This behavior is suggested to be efficient when actively exploring the solution space. As directional derivatives decrease and the metric effectively flattens towards $a = 0$ ($g \propto \epsilon\delta$), the fluctuations are suppressed, leading to stabilization near equilibrium, similar to the classical biological regime. The paper refers to the general framework, encompassing the Fokker-Planck and neural Hamilton-Jacobi descriptions, as 'neural physics'.
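The whitening claim can be checked numerically: preconditioning anisotropic noise with $\kappa^{-1/2}$ (the $a = \tfrac{1}{2}$ metric choice) yields an approximately isotropic sample covariance, i.e., equal exploration in every direction. The specific $\kappa$ below is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Anisotropic gradient-noise covariance kappa (example values)
kappa = np.array([[4.0, 1.0], [1.0, 0.5]])
L = np.linalg.cholesky(kappa)
eta = L @ rng.standard_normal((2, 100_000))  # raw noise, covariance ~ kappa

# Precondition with kappa^{-1/2}, the a = 1/2 choice of (inverse) metric
w, v = np.linalg.eigh(kappa)
kappa_inv_sqrt = (v * w ** -0.5) @ v.T
eta_pre = kappa_inv_sqrt @ eta

cov_pre = np.cov(eta_pre)  # ~ identity matrix
```

The same construction with exponent $-1$ instead of $-\tfrac{1}{2}$ would over-suppress high-noise directions, which is the distinction between the $a = \tfrac{1}{2}$ and $a = 1$ regimes.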
Complex Systems and Phase Transitions: The paper argues that complex systems require incorporating multiple regimes. It proposes using a metric that interpolates between regimes; for example, $g = \sqrt{\epsilon^2 + \kappa}$ interpolates between $a = \tfrac{1}{2}$ (large fluctuations) and $a = 0$ (small fluctuations). To include the quantum regime ($a = 1$), a more general metric is suggested: $g = \sqrt{\epsilon^2 + \kappa + \zeta^2 \kappa^2}$. This metric allows for fast variables ($a = 1$), slow variables ($a = 0$), and intermediate variables ($a = \tfrac{1}{2}$). The emergence of intermediate scales is interpreted as a phase transition, particularly when parameters shift from $\epsilon\zeta \gg 1$ to $\epsilon\zeta \ll 1$. This transition opens a range of scales where efficient learning dynamics can occur (Equation 34), potentially explaining the emergence of biological complexity and major evolutionary transitions by enabling hierarchical adaptation. The effectiveness of this geometric learning framework in biology relies on the system's ability to store and retrieve historical fluctuation information, suggesting that biological evolution might involve retaining memory of past mutations.
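The regime structure of the interpolating metric can be verified numerically by measuring the local exponent $a = d\log g / d\log\kappa$ in the scalar (one-variable) case; with $\epsilon\zeta \ll 1$ (the values below are arbitrary), all three plateaus appear:

```python
import numpy as np

def g_interp(kappa, eps, zeta):
    """Scalar interpolating metric g = sqrt(eps^2 + kappa + zeta^2 kappa^2)."""
    return np.sqrt(eps**2 + kappa + zeta**2 * kappa**2)

eps, zeta = 1e-3, 1e-3          # eps * zeta << 1: intermediate regime opens up
kappa = np.logspace(-8, 8, 17)  # sweep kappa across many scales
g = g_interp(kappa, eps, zeta)

# Local exponent a = d log g / d log kappa, since g ~ kappa^a in each regime
a = np.gradient(np.log(g), np.log(kappa))
```

For $\kappa \ll \epsilon^2$ the exponent sits near $0$, for $\epsilon^2 \ll \kappa \ll 1/\zeta^2$ near $\tfrac{1}{2}$, and for $\kappa \gg 1/\zeta^2$ near $1$; taking $\epsilon\zeta \gg 1$ instead collapses the middle plateau, which is the phase transition described above.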
In conclusion, the paper presents a unified view of learning dynamics based on the geometry of the trainable variable space and its relationship to noise. The power law $g \propto \kappa^a$ defines distinct regimes corresponding to quantum mechanics ($a = 1$), efficient machine learning ($a = \tfrac{1}{2}$), and classical biological evolution ($a = 0$). Complex systems are modeled by metrics that combine these regimes, with the emergence of the intermediate $a = \tfrac{1}{2}$ regime potentially linked to biological complexity and evolutionary phase transitions. The framework suggests that biological systems need mechanisms for storing historical fluctuation data to effectively leverage geometric learning. The paper proposes this 'neural physics' framework as a basis for unifying descriptions of physical, biological, and artificial learning systems.