
GRSD: Coarse-Grained Dynamics in Deep Learning

Updated 27 December 2025
  • GRSD is a coarse-grained dynamical framework that models the flow of error energy across spectral scales during gradient descent training in deep neural networks.
  • It employs logarithmic resolution shells and renormalization group techniques to analyze energy conservation and spectral transfer processes.
  • The framework elucidates conditions that yield universal power-law scaling, with applications in ResNets, Transformers, and structured state-space models.

The Generalized Resolution–Shell Dynamics (GRSD) framework provides a coarse-grained dynamical theory for the learning dynamics of deep neural networks, modeling the flow of error energy across spectral scales during gradient-based training. In GRSD, the spectral evolution of the network is analyzed in terms of "logarithmic resolution shells," capturing how learning distributes and transports error energy among modes of different resolution. Renormalizable shell dynamics within this theory underpin the emergence of power-law scaling, but such power laws arise only when specific structural and statistical conditions are met during training. The GRSD methodology offers a unified language for characterizing and predicting the universal scaling behaviors observed empirically in deep learning systems (Zhang, 20 Dec 2025).

1. Logarithmic Resolution Shells and Shell Energies

At the core of GRSD is the time-dependent positive semidefinite operator

M(t) = J(t)\,J(t)^*,

where J(t) is the Jacobian mapping model parameters to outputs. The eigenvalue spectrum {λ} of M(t) is partitioned into logarithmic shells using a uniform log-spectral grid:

s_\alpha < s_{\alpha+1}, \qquad s_\alpha = \alpha h, \;\; h > 0.

Each shell Sα collects modes with

S_\alpha = \{\lambda :\; s_\alpha \le \log\lambda < s_{\alpha+1}\}.

The shell energy Eα(t) sums the error energy across all eigenmodes in Sα. In the continuum limit h → 0, the energy density

\varepsilon(\lambda, t) = \sum_\alpha E_\alpha(t)\,\mathbf{1}_{S_\alpha}(\lambda)

is piecewise constant as a function of s = log λ. This construction allows the global and local spectral properties of J(t) to be analyzed via energy transport across resolution shells.
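The shell construction above can be sketched numerically: diagonalize M = J Jᵀ, bin eigenmodes by ⌊log λ / h⌋, and sum the error energy per bin. This is a minimal illustrative sketch, assuming a quadratic notion of per-mode error energy; the function name and the toy data are hypothetical, not from the paper.

```python
import numpy as np

def shell_energies(J, error, h=0.5):
    """Partition the spectrum of M = J J^T into logarithmic shells of
    width h in s = log(lambda) and sum the error energy per shell."""
    # Eigendecomposition of M(t) = J(t) J(t)^*
    M = J @ J.T
    lam, U = np.linalg.eigh(M)
    keep = lam > 1e-12                  # discard numerically-zero modes
    lam, U = lam[keep], U[:, keep]
    # Error energy carried by each eigenmode
    mode_energy = (U.T @ error) ** 2
    # Shell index alpha = floor(log(lambda) / h), so shell S_alpha
    # collects modes with alpha*h <= log(lambda) < (alpha + 1)*h
    alpha = np.floor(np.log(lam) / h).astype(int)
    shells = {}
    for a, e in zip(alpha, mode_energy):
        shells[a] = shells.get(a, 0.0) + e
    return shells

rng = np.random.default_rng(0)
J = rng.standard_normal((32, 64))
err = rng.standard_normal(32)
E = shell_energies(J, err)
# With a full-rank M, the shell energies sum to the total error energy
```

Because the eigenbasis is orthonormal, summing the shell energies recovers the total error energy, which is the discrete counterpart of integrating ε(λ, t) over all shells.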

2. Conservation Law and Renormalized Velocity Field

Subject to locality and incoherence assumptions, GRSD proves that the shell energies Eα(t) obey a conservation law analogous to those in turbulence or transport theory. In continuum notation,

\partial_t \varepsilon(\lambda, t) + \partial_\lambda J(\lambda, t) = -D(\lambda, t),

where J(λ, t) is the spectral flux density (quantifying energy transfer across scales) and D(λ, t) accounts for loss-reducing dissipation. The fundamental dynamical observable is the renormalized velocity field,

v(\lambda, t) = \frac{J(\lambda, t)}{\varepsilon(\lambda, t)},

which describes the local spectral transport rate per unit energy. In discrete shells this specializes to

\frac{d}{dt} E_k(t) = F_{k-\frac12}(t) - F_{k+\frac12}(t) - D_k(t), \qquad v_k(t) = \frac{F_{k+\frac12}(t)}{E_k(t)}.
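The discrete balance can be made concrete with a small numerical sketch: each shell gains the flux through its lower interface, loses the flux through its upper interface, and dissipates D_k. The time stepper, the flux values, and the function names below are illustrative assumptions, not the paper's numerics.

```python
import numpy as np

def shell_step(E, F, D, dt):
    """One explicit-Euler step of the discrete shell balance
       dE_k/dt = F_{k-1/2} - F_{k+1/2} - D_k.
    E: shell energies, shape (n,); F: interface fluxes, shape (n+1,),
    where F[k] is the flux F_{k-1/2} into shell k; D: dissipation, shape (n,)."""
    dE = F[:-1] - F[1:] - D
    return E + dt * dE

def shell_velocity(E, F):
    """Renormalized velocity v_k = F_{k+1/2} / E_k."""
    return F[1:] / E

E = np.array([4.0, 2.0, 1.0])
F = np.array([0.0, 0.5, 0.25, 0.0])   # zero flux through the spectral boundaries
D = np.array([0.1, 0.05, 0.02])
E1 = shell_step(E, F, D, dt=0.1)
# With zero boundary flux, total energy decreases by exactly sum(D)*dt
```

The telescoping of interface fluxes means that, with closed boundaries, only the dissipation terms change the total energy, which is the discrete analogue of the conservation law above.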

3. Coarse-Graining and Renormalization Scheme

Coarse-graining in GRSD proceeds by aggregating b consecutive shells to form a "block shell," implementing a renormalization group (RG) step. This process involves:

  • Spectral coordinate rescaling: λ → bλ, equivalently s → s + τ with τ = log b;
  • Shell index rescaling: k → k' = k/b;
  • Time rescaling: t → t' = b^z t for some dynamical exponent z;
  • Velocity amplitude rescaling: v → v' = b^y v for scaling dimension y.

A true RG fixed point is achieved when, up to these rescalings, the form of the evolution equations for ε and J is invariant. The fixed-point condition for the velocity field is

v'(k', t') = b^y v(k, t) \quad \Longleftrightarrow \quad v(b\lambda, b^z t) = b^y v(\lambda, t).

Enforcing invariance for all b > 0 yields the power-law scaling form v(λ, t) ∝ λ^a.
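That a power law satisfies the fixed-point relation can be verified directly: for v(λ) = c λ^a, rescaling λ → bλ multiplies v by b^a, so the relation holds with scaling dimension y = a. The constants below are arbitrary illustrative choices.

```python
import numpy as np

# Check that v(lam) = c * lam**a obeys v(b*lam) = b**a * v(lam)
# for every rescaling factor b > 0 (scaling dimension y = a).
c, a = 1.7, -0.4
v = lambda lam: c * lam**a

lam = np.logspace(-3, 3, 50)
for b in (0.5, 2.0, 10.0):
    assert np.allclose(v(b * lam), b**a * v(lam))
```

Conversely, demanding this identity for all b > 0 forces the power-law form, which is the content of the rigidity argument in Section 5.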

4. Sufficient Conditions for Renormalizable Shell Dynamics

Rigorous renormalizability in the GRSD framework requires four precise conditions on the learning system:

  1. Graph-banded Jacobian evolution: \dot J^{(l)}(t) \in \mathrm{span}\{J^{(m)}(t) : |m-l| \le K\}. Interpretation: local propagation of gradients in the computation graph.
  2. Initial functional incoherence: \|J^{(l)}(0)^* J^{(m)}(0)\|_{\mathrm{op}} \le \varepsilon_{|l-m|} with \sum_k \varepsilon_k < \infty. Interpretation: weak statistical correlation across distant shells at initialization.
  3. Controlled Jacobian path and regularity: \sup_{t \in [0,T]} (\|J(t)\|_{\mathrm{op}} + \|\dot J(t)\|_{\mathrm{op}}) \le C_J, with higher-order moments Lipschitz in s. Interpretation: uniformly bounded and smooth Jacobian evolution.
  4. Log-shift invariance of renormalized shell couplings: \widehat{\mathsf K}_{ij} = K_h((j-i)h) + \mathrm{err}(n, h, L), with \mathrm{err} \to 0 in overparameterized limits. Interpretation: translation invariance of bin-averaged shell coupling statistics in s = \log\lambda.

Only when all four conditions are realized does the system admit an RG-closed shell dynamics with universal scaling solutions.
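Condition 2 can be probed empirically: given shell-wise Jacobian blocks, compute the operator norms of the cross-Gram blocks and check that the worst case decays with shell separation. The helper below is a hypothetical diagnostic sketch; the construction of nearly orthogonal blocks is a toy stand-in for incoherent initialization.

```python
import numpy as np

def incoherence_profile(blocks):
    """Condition 2 diagnostic: for shell-wise Jacobian blocks J^(l),
    compute eps_d = max over |l-m| = d of ||J^(l)* J^(m)||_op, d >= 1.
    Summability of eps_d indicates initial functional incoherence."""
    L = len(blocks)
    eps = np.zeros(L)
    for l in range(L):
        for m in range(l + 1, L):
            op = np.linalg.norm(blocks[l].T @ blocks[m], ord=2)  # spectral norm
            eps[m - l] = max(eps[m - l], op)
    return eps[1:]

rng = np.random.default_rng(1)
# Toy incoherent initialization: disjoint column blocks of an orthogonal matrix
Q = np.linalg.qr(rng.standard_normal((200, 200)))[0]
blocks = [Q[:, 10 * l:10 * l + 10] for l in range(5)]
eps = incoherence_profile(blocks)
# Disjoint orthonormal column blocks give eps_d = 0 for every separation d
```

In a real network one would replace the toy blocks with the per-shell Jacobian factors at initialization and inspect whether the profile is summable.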

5. Rigidity of Power-Law Velocity and Scaling Laws

The derivation of power-law velocity proceeds from the conservation law and the structural conditions above. Conditions 1–3 guarantee that

\partial_t \varepsilon(s, t) + \partial_s [\varepsilon(s, t)\, v(s, t)] = -D(s, t), \qquad s = \log\lambda,

whereas Condition 4 (log-shift invariance) constrains v(s, t) to depend on s only through relative shifts:

v(s+\tau, t) = \alpha(\tau)\, v(s, t_\tau).

Coupled with the intrinsic covariance of gradient flow under time rescaling, the only admissible solution for the velocity field compatible with RG invariance and log-shift symmetry is

v(\lambda, t) = c(t)\,\lambda^a.

This implies that a power-law spectral velocity, and hence the observed empirical scaling laws, is a rigidity phenomenon rather than a generic outcome of coarse-graining alone. Such behavior emerges only when all four structural criteria and the flow covariances are respected.
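Given measured shell velocities, the exponent a in v(λ, t) = c(t) λ^a can be estimated by a line fit in log-log coordinates. This is a standard diagnostic sketch, not a procedure from the paper; a good fit supports, but does not by itself prove, the rigidity scenario, since the theory says the power law holds only under the four structural conditions.

```python
import numpy as np

def fit_velocity_exponent(lam, v):
    """Estimate (a, c) in v(lambda) = c * lambda**a by least squares
    on log v = log c + a * log lambda."""
    slope, intercept = np.polyfit(np.log(lam), np.log(v), 1)
    return slope, np.exp(intercept)

# Synthetic shell velocities following an exact power law
lam = np.logspace(-2, 2, 40)
v = 0.8 * lam**-0.35
a_hat, c_hat = fit_velocity_exponent(lam, v)
# Exact power-law data recovers a = -0.35 and c = 0.8
```

On real training data one would fit within the inertial range of shells, away from boundary effects at the largest and smallest eigenvalues.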

6. Model-Specific Examples and Significance

The GRSD framework applies broadly but sharply distinguishes between model classes according to their compliance with the requisite conditions:

  • MLPs and non-residual CNNs: Frequently meet Conditions 1–3 (local computation graph, incoherent random initialization, stable Jacobian path), yet lack an intrinsic mechanism for log-shift invariance (Condition 4). Approximate scaling may occur empirically but is not guaranteed by the theory.
  • Residual Networks (ResNets): Layerwise Jacobians of the form I + εG_k exhibit statistical stationarity in s = log λ at large depth, ensuring log-shift invariance (Condition 4) through mixing and averaging effects. The GRSD fixed-point scaling therefore holds robustly in deep residual structures.
  • Transformers: The presence of residual connections and normalization supports Conditions 1–3. Condition 4's validity depends on the near-identity and statistical homogeneity of per-block transformations in log-spectral coordinates.
  • Structured State-Space Models (e.g., RWKV, SSM): Under uniform exponential stability and appropriately bounded Jacobians, an effective graph-bandedness (Condition 1) can be rigorously established. The presence or absence of a residual mechanism is critical for log-shift invariance.

Once all four GRSD conditions are satisfied, the dynamics enforce an RG-fixed-point regime characterized by a universal power-law shell velocity:

v_k(t) \propto k^{-\alpha} \quad \text{or equivalently} \quad v(\lambda, t) \propto \lambda^a.

A plausible implication is that architectural and initialization choices directly determine whether the power-law scaling predictions apply during network training. This suggests constrained regimes of universality in deep learning spectral dynamics, with precise empirical and theoretical boundaries dictated by the structural properties summarized by GRSD (Zhang, 20 Dec 2025).
