
Geometric Learning Dynamics

Published 20 Apr 2025 in cs.LG, q-bio.PE, and quant-ph (arXiv:2504.14728v1)

Abstract: We present a unified geometric framework for modeling learning dynamics in physical, biological, and machine learning systems. The theory reveals three fundamental regimes, each emerging from the power-law relationship $g \propto \kappa^a$ between the metric tensor $g$ in the space of trainable variables and the noise covariance matrix $\kappa$. The quantum regime corresponds to $a = 1$ and describes Schr\"odinger-like dynamics that emerges from a discrete shift symmetry. The efficient learning regime corresponds to $a = \tfrac{1}{2}$ and describes very fast machine learning algorithms. The equilibration regime corresponds to $a = 0$ and describes classical models of biological evolution. We argue that the emergence of the intermediate regime $a = \tfrac{1}{2}$ is a key mechanism underlying the emergence of biological complexity.

Summary

  • The paper establishes a unified framework where learning dynamics across systems are modeled through a power-law relationship between the metric tensor and noise covariance.
  • It derives distinct regimes—quantum, efficient machine learning, and classical biological evolution—using covariant gradient descent and Fokker-Planck equations.
  • The framework proposes interpolation between regimes via phase transitions, providing insights into adaptive, hierarchical learning in complex systems.

This paper, "Geometric Learning Dynamics" (2504.14728), introduces a unified geometric framework for modeling learning dynamics across disparate systems, including physical, biological, and machine learning contexts. The core idea is that the dynamics of trainable variables in these systems can be understood through the interplay of geometry (defined by a metric tensor $g$) and noise (characterized by a noise covariance matrix $\kappa$) in the space of these variables.

The theory reveals three fundamental dynamical regimes that emerge from a power-law relationship between the metric tensor and the noise covariance: $g \propto \kappa^a$.

  1. Machine Learning Framework: The paper frames a general learning system as having three interacting dynamics: boundary dynamics (data/environment coupling), activation dynamics (signal propagation through non-trainable variables), and learning dynamics (adjustment of trainable variables, $q$). The learning dynamics is described by a covariant gradient descent equation:

    $$\tau_l \frac{d q^\mu}{d t} = -\gamma\, g^{\mu\nu}(q)\, \frac{\partial H(q, x)}{\partial q^\nu}$$

    where $\tau_l$ is the learning timescale, $\gamma$ is the learning rate, $g^{\mu\nu}$ is the inverse metric tensor, and $H$ is the loss function. The loss function is modeled as the sum of an average part $U(q)$ and a stochastic component $\phi(q, t)$. This leads to a Langevin equation for the trainable variables:

    $$\frac{d q^\mu}{d t} = -\gamma\, g^{\mu\nu}(q)\, \frac{\partial U(q)}{\partial q^\nu} + \eta^\mu(q, t)$$

    where the noise $\eta^\mu$ is related to the gradient of the stochastic component of the loss. On longer timescales ($\tau \gg \tau_l$), the dynamics is described by a Fokker-Planck equation (Equation 8 in the paper), which explicitly depends on both the metric tensor $g$ and the noise covariance matrix $\kappa$. The noise covariance $\kappa_{\mu\nu}$ is derived from the correlation properties of the stochastic loss function. Crucially, the relationship between $g$ and $\kappa$ is found to follow a power-law $g_{\mu\nu} \propto (\kappa^a)_{\mu\nu}$ for different machine learning algorithms:

  2. Quantum Physics Regime ($a=1$): When $g \propto \kappa$ (specifically, $g = \kappa$), the Fokker-Planck equation simplifies to a covariant form (Equation 9). To describe the dynamics on the learning timescale $\tau_l$, the paper applies the principle of stationary entropy production subject to a constraint on the rate of change of the loss function. Under certain assumptions about 'entropic' fluctuations, this variational principle yields Madelung-like equations: a continuity equation for the probability density $P$ and a quantum Hamilton-Jacobi equation for a phase function $\phi$. By defining a complex 'wavefunction' $\psi = \sqrt{P} \exp(-i\phi/\hbar)$, these equations combine to form a Schrödinger-like equation. This emergent quantum-like dynamics is shown to be valid if a discrete shift in the phase $\phi$ by the Planck constant $h = 2\pi\hbar$ acts as an unobservable symmetry, which the paper suggests could be interpreted in terms of the number of underlying 'neurons'. This regime corresponds to efficient natural gradient descent in machine learning.
  3. Classical Biology Regime ($a=0$): Biological dynamics, such as evolution, is modeled as a stochastic process involving random jumps (e.g., genetic drift) and acceptance probabilities (e.g., natural selection). Assuming an acceptance probability function like a sigmoid (Equation 15), the dynamics satisfies a detailed balance condition for a quasi-equilibrium state described by a canonical ensemble $P_e \propto \exp(-\beta H)$, where $H$ is the negative log fitness. The expected trajectory of the state resembles a covariant gradient descent (Equation 20), where the metric is proportional to the covariance matrix of random jumps. If this jump covariance $C^{\mu\nu}$ is state-independent, the space can be globally flattened via a coordinate transformation. This flat-space limit corresponds to $a=0$ ($g \propto \delta$), which is characteristic of stochastic gradient descent. On longer timescales, the dynamics is described by the Fokker-Planck equation in flat space but potentially with anisotropic diffusion (Equation 23). This regime is argued to capture classical biological evolution, but it does not describe the dynamics on the faster $\tau_l$ timescale in its full generality.
  4. Neural Physics / Efficient Learning Regime ($a=1/2$):

    The paper generalizes the variational principle (stationary entropy production) to account for the general relationship between $g$ and $\kappa$. This leads to a 'neural' Hamilton-Jacobi equation (Equation 26) and a 'neural' potential (Equation 27), which generalizes the quantum potential and explicitly depends on $\kappa$. The $a=1/2$ case, $g \propto \sqrt{\kappa}$, is highlighted as corresponding to the AdaBelief algorithm [zhuang2020adabelief]. In this regime, the covariance of the stochastic noise in the Langevin equation becomes proportional to the identity matrix ($\langle \eta^\mu \eta^\nu \rangle_\tau = \gamma^2 \delta^{\mu\nu}$), implying equal standard deviation for each trainable variable during exploration. This behavior is suggested to be efficient when actively exploring the solution space. As directional derivatives decrease and the metric effectively flattens towards $a=0$ ($g \propto \epsilon \delta$), the fluctuations are suppressed, leading to stabilization near equilibrium, similar to the classical biological regime. The paper refers to the general framework, encompassing the Fokker-Planck and neural Hamilton-Jacobi descriptions, as 'neural physics'.

  5. Complex Systems and Phase Transitions: The paper argues that complex systems require incorporating multiple regimes. It proposes using a metric that interpolates between regimes; for example, $g = \sqrt{\epsilon^2 + \kappa}$ interpolates between $a=1/2$ (large fluctuations) and $a=0$ (small fluctuations). To include the quantum regime ($a=1$), a more general metric is suggested: $g = \sqrt{\epsilon^2 + \kappa + \zeta^2 \kappa^2}$. This metric allows for fast variables ($a=1$), slow variables ($a=0$), and intermediate variables ($a=1/2$). The emergence of intermediate scales is interpreted as a phase transition, particularly when parameters shift from $\epsilon \zeta \gg 1$ to $\epsilon \zeta \ll 1$. This transition allows for a range of scales where efficient learning dynamics can occur (Equation 34), potentially explaining the emergence of biological complexity and major evolutionary transitions by enabling hierarchical adaptation. The effectiveness of this geometric learning framework in biology relies on the system's ability to store and retrieve historical fluctuation information, suggesting that biological evolution might involve retaining memory of past mutations.
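
The covariant Langevin dynamics of item 1 can be sketched numerically. The following is a minimal sketch, assuming an illustrative quadratic loss $U(q) = \tfrac{1}{2}\|q\|^2$, a flat metric, and small isotropic noise; these choices are for illustration only and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def covariant_langevin_step(q, grad_U, g_inv, gamma, kappa, dt):
    """One Euler-Maruyama step of dq/dt = -gamma g^{mu nu} dU/dq^nu + eta,
    where the noise eta has covariance kappa."""
    drift = -gamma * g_inv @ grad_U(q)
    eta = rng.multivariate_normal(np.zeros(len(q)), kappa / dt)
    return q + (drift + eta) * dt

# Illustrative quadratic loss U(q) = 0.5 |q|^2 with flat (identity) metric
grad_U = lambda q: q
g_inv = np.eye(2)
kappa = 1e-4 * np.eye(2)

q = np.array([1.0, -1.0])
for _ in range(2000):
    q = covariant_langevin_step(q, grad_U, g_inv, gamma=1.0, kappa=kappa, dt=0.01)
# q has relaxed toward the minimum at the origin, up to residual noise
```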
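The Madelung construction in item 2 is easy to verify numerically: for any density $P$ and phase $\phi$, the wavefunction $\psi = \sqrt{P}\exp(-i\phi/\hbar)$ reproduces $P$ as $|\psi|^2$ and is invariant under the discrete shift $\phi \to \phi + 2\pi\hbar$. The Gaussian density and quadratic phase below are arbitrary illustrative choices, not from the paper.

```python
import numpy as np

hbar = 1.0
x = np.linspace(-5.0, 5.0, 1001)
P = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # illustrative probability density
phi = 0.5 * x**2                              # illustrative phase function

psi = np.sqrt(P) * np.exp(-1j * phi / hbar)
prob = np.abs(psi)**2                         # recovers P exactly

# Shifting the phase by h = 2*pi*hbar leaves psi unchanged: the unobservable
# discrete symmetry behind the Schrodinger-like description
psi_shifted = np.sqrt(P) * np.exp(-1j * (phi + 2 * np.pi * hbar) / hbar)
```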
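The jump-and-accept dynamics of item 3 can be illustrated with a one-dimensional chain: random Gaussian jumps accepted with sigmoid probability $\sigma(-\beta\,\Delta H)$ satisfy detailed balance for $P_e \propto \exp(-\beta H)$. The quadratic $H$, inverse temperature, and jump size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jump_chain(H, q0, beta, jump_std, n_steps):
    """Random jumps accepted with probability sigmoid(-beta * dH); this rule
    satisfies detailed balance for the canonical ensemble exp(-beta * H)."""
    q, samples = q0, []
    for _ in range(n_steps):
        q_new = q + rng.normal(0.0, jump_std)
        if rng.random() < sigmoid(-beta * (H(q_new) - H(q))):
            q = q_new
        samples.append(q)
    return np.array(samples)

H = lambda q: 0.5 * q**2   # toy negative log fitness
samples = jump_chain(H, q0=0.0, beta=2.0, jump_std=0.5, n_steps=50_000)
var_est = samples[5_000:].var()   # should approach 1/beta = 0.5
```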
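The isotropic-noise property of the $a=1/2$ regime in item 4 can be checked directly: preconditioning stochastic noise of covariance $\kappa$ by $g^{-1}$ with $g = \kappa^{1/2}$ (matrix square root) whitens the updates to covariance $\gamma^2 \delta^{\mu\nu}$. The specific $\kappa$ below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

kappa = np.array([[4.0, 1.0],
                  [1.0, 2.0]])      # illustrative anisotropic noise covariance

# Matrix square root g = kappa^{1/2} via eigendecomposition
w, V = np.linalg.eigh(kappa)
g = V @ np.diag(np.sqrt(w)) @ V.T
g_inv = np.linalg.inv(g)

gamma = 0.1
noise = rng.multivariate_normal(np.zeros(2), kappa, size=200_000)
updates = gamma * noise @ g_inv.T   # eta^mu = gamma (g^{-1})^{mu nu} noise_nu
emp_cov = updates.T @ updates / len(updates)
# emp_cov is close to gamma^2 * identity: equal fluctuations per variable
```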
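The interpolating metric in item 5 can be probed by computing the effective exponent $a_{\mathrm{eff}} = d\log g / d\log\kappa$ for a scalar $\kappa$; in the $\epsilon\zeta \ll 1$ phase it passes through all three regimes as $\kappa$ grows. The parameter values below are illustrative.

```python
import numpy as np

eps, zeta = 1e-3, 1e-3   # illustrative parameters with eps * zeta << 1

def g(kappa):
    """Interpolating metric g = sqrt(eps^2 + kappa + zeta^2 * kappa^2)."""
    return np.sqrt(eps**2 + kappa + zeta**2 * kappa**2)

def a_eff(kappa, h=1e-6):
    """Numerical log-derivative d log g / d log kappa."""
    return (np.log(g(kappa * (1 + h))) - np.log(g(kappa))) / np.log(1 + h)

a_small = a_eff(1e-10)   # kappa << eps^2:             a ~ 0   (equilibration)
a_mid   = a_eff(1e-1)    # eps^2 << kappa << 1/zeta^2: a ~ 1/2 (efficient learning)
a_large = a_eff(1e10)    # kappa >> 1/zeta^2:          a ~ 1   (quantum)
```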

In conclusion, the paper presents a unified view of learning dynamics based on the geometry of the trainable variable space and its relationship to noise. The power-law $g \propto \kappa^a$ defines distinct regimes corresponding to quantum mechanics ($a=1$), efficient machine learning ($a=1/2$), and classical biological evolution ($a=0$). Complex systems are modeled by metrics that combine these regimes, with the emergence of the intermediate $a=1/2$ regime potentially linked to biological complexity and evolutionary phase transitions. The framework suggests that biological systems need mechanisms for storing historical fluctuation data to effectively leverage geometric learning. The paper proposes this 'neural physics' framework as a basis for unifying descriptions of physical, biological, and artificial learning systems.
