GRSD: Coarse-Grained Dynamics in Deep Learning
- GRSD is a coarse-grained dynamical framework that models the flow of error energy across spectral scales during gradient descent training in deep neural networks.
- It employs logarithmic resolution shells and renormalization group techniques to analyze energy conservation and spectral transfer processes.
- The framework elucidates conditions that yield universal power-law scaling, with applications in ResNets, Transformers, and structured state-space models.
The Generalized Resolution–Shell Dynamics (GRSD) framework provides a coarse-grained dynamical theory for the learning dynamics of deep neural networks, modeling the flow of error energy across spectral scales during gradient-based training. In GRSD, the spectral evolution of the network is analyzed in terms of "logarithmic resolution shells," capturing how learning distributes and transports error energy among modes of different resolution. Renormalizable shell dynamics within this theory underpin the emergence of power-law scaling, but such power laws arise only when specific structural and statistical conditions are met during training. The GRSD methodology offers a unified language for characterizing and predicting the universal scaling behaviors observed empirically in deep learning systems (Zhang, 20 Dec 2025).
1. Logarithmic Resolution Shells and Shell Energies
At the core of GRSD is the time-dependent positive semidefinite operator
$$H(t) = J(t)\, J(t)^{\top},$$
where $J(t)$ is the Jacobian mapping model parameters to outputs. The eigenvalue spectrum of $H(t)$ is partitioned into logarithmic shells using a uniform log-spectral grid:
$$\mu_k = \mu_0\, e^{k \Delta}, \qquad k = 0, 1, 2, \ldots$$
Each shell $S_k$ collects the modes whose eigenvalues satisfy $\mu_k \le \lambda_i < \mu_{k+1}$.
The shell energy $E_k(t)$ sums the error energy across all eigenmodes in $S_k$:
$$E_k(t) = \sum_{i \in S_k} \langle v_i(t), e(t) \rangle^2,$$
where $v_i(t)$ are the eigenvectors of $H(t)$ and $e(t)$ is the output error. In the continuum limit $\Delta \to 0$, the energy density
$$\mathcal{E}(\ell, t) = E_k(t)/\Delta, \qquad \ell = \log \lambda,$$
is piecewise constant as a function of $\ell$. This construction allows the global and local spectral properties of $H(t)$ to be analyzed via energy transport across resolution shells.
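The shell construction can be sketched numerically: diagonalize a stand-in operator $H = JJ^\top$, bin its eigenmodes on a uniform log grid, and accumulate the error energy per bin. The function name and the grid parameters (`lam_min`, `delta`) are illustrative choices, not taken from the paper.

```python
import numpy as np

def shell_energies(H, err, lam_min=1e-6, delta=0.5):
    """Sum the error energy carried by the eigenmodes of H within each
    logarithmic shell [lam_min * e^(k*delta), lam_min * e^((k+1)*delta))."""
    eigvals, eigvecs = np.linalg.eigh(H)      # H is positive semidefinite
    mode_energy = (eigvecs.T @ err) ** 2      # |<v_i, e>|^2 per eigenmode
    keep = eigvals > lam_min                  # drop the unresolved tail
    shell = np.floor(np.log(eigvals[keep] / lam_min) / delta).astype(int)
    E = np.zeros(shell.max() + 1)
    np.add.at(E, shell, mode_energy[keep])    # accumulate energy per shell
    return E

# Toy example: random stand-in Jacobian and error vector.
rng = np.random.default_rng(0)
J = rng.normal(size=(64, 64)) / 8.0
H = J @ J.T
e = rng.normal(size=64)
E = shell_energies(H, e)
# Shell energies are nonnegative and bounded by the total error energy ||e||^2.
```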
2. Conservation Law and Renormalized Velocity Field
Subject to locality and incoherence assumptions, GRSD proves that shell energies obey a conservation law analogous to those in turbulence or transport theory. In continuum notation,
$$\partial_t \mathcal{E}(\ell, t) + \partial_\ell \Pi(\ell, t) = -D(\ell, t),$$
where $\Pi(\ell, t)$ is the spectral flux density (quantifying energy transfer across scales) and $D(\ell, t) \ge 0$ accounts for loss-reducing dissipation. The fundamental dynamical observable is the renormalized velocity field,
$$v(\ell, t) = \frac{\Pi(\ell, t)}{\mathcal{E}(\ell, t)},$$
which describes the local spectral transport rate per unit energy. In discrete shells this specializes to
$$\frac{dE_k}{dt} = \Pi_{k-1} - \Pi_k - D_k, \qquad v_k(t) = \frac{\Pi_k(t)}{E_k(t)},$$
with $\Pi_k$ the flux from shell $k$ into shell $k+1$.
3. Coarse-Graining and Renormalization Scheme
Coarse-graining in GRSD proceeds by aggregating consecutive shells to form a "block shell," implementing a renormalization group (RG) step. This process involves:
- Spectral coordinate rescaling: $\lambda \to b\,\lambda$ or, equivalently, $\ell \to \ell + \log b$, with $b = e^{m\Delta}$;
- Shell index rescaling: $k \to \lfloor k/m \rfloor$;
- Time rescaling: $t \to b^{-z}\, t$ for some dynamical exponent $z$;
- Velocity amplitude rescaling: $v \to b^{\beta}\, v$ for scaling dimension $\beta$.
A true RG fixed point is achieved when, up to these rescalings, the form of the evolution equations for $\mathcal{E}$ and $v$ is invariant. The fixed-point condition for the velocity field is
$$v(\ell + \log b,\ b^{-z} t) = b^{\beta}\, v(\ell, t).$$
Enforcing invariance for all $b$ yields the power-law scaling form $v \propto e^{\beta \ell} = \lambda^{\beta}$.
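The self-similarity behind the RG step can be checked on synthetic profiles: if the velocity is a pure power law in $\lambda$, merging $m$ consecutive shells into block shells reproduces a power law with the same exponent. The energy-weighted block velocity used below is an assumed, illustrative averaging rule.

```python
import numpy as np

# Shells on a uniform log-spectral grid lambda_k = lambda_0 * exp(k * delta).
delta, beta = 0.1, 0.5
lam = 1e-3 * np.exp(np.arange(40) * delta)
v = lam ** beta      # power-law velocity profile (the fixed point)
E = lam ** -1.0      # an arbitrary power-law shell-energy profile

def coarse_grain(E, v, m):
    """One RG step: merge m consecutive shells into a block shell.
    Block energy is the sum; block velocity the energy-weighted mean."""
    n = (len(E) // m) * m
    Eb = E[:n].reshape(-1, m)
    vb = (Eb * v[:n].reshape(-1, m)).sum(axis=1) / Eb.sum(axis=1)
    return Eb.sum(axis=1), vb

Eb, vb = coarse_grain(E, v, m=4)
# Form invariance at the fixed point: log vb is linear in the block index
# with slope m * delta * beta, i.e. the power law survives coarse-graining.
```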
4. Sufficient Conditions for Renormalizable Shell Dynamics
Rigorous renormalizability in the GRSD framework requires four precise conditions on the learning system:
| Condition | Formal Statement | Interpretation |
|---|---|---|
| 1. Graph-banded Jacobian evolution | Jacobian updates couple each shell only to a bounded band of neighboring shells | Local propagation of gradients in the computation graph |
| 2. Initial functional incoherence | Correlations between modes in well-separated shells vanish at initialization | Weak statistical correlation across distant shells at initialization |
| 3. Controlled Jacobian path and regularity | Uniformly bounded and smooth Jacobian evolution; higher-order moments Lipschitz in time | Regularity of the spectral decomposition along the training trajectory |
| 4. Log-shift invariance of renormalized shell couplings | Bin-averaged shell couplings depend only on the index offset $k - k'$, exactly so in overparameterized limits | Translation invariance of bin-averaged shell coupling statistics in $\ell$ |
Only when all four conditions are realized does the system admit an RG-closed shell dynamics with universal scaling solutions.
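Conditions 1 and 4 can be pictured together on a hypothetical shell-coupling matrix: bandedness limits the coupling range, while log-shift invariance makes the entries depend only on the index offset, i.e. a Toeplitz structure. The exponential kernel below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Hypothetical bin-averaged shell-coupling matrix C[k, k'] that is banded
# (Condition 1) and depends only on the offset k - k' (Condition 4).
n, band = 30, 3
offset = np.subtract.outer(np.arange(n), np.arange(n))
C = np.where(np.abs(offset) <= band, np.exp(-np.abs(offset)), 0.0)

# Log-shift invariance: shifting both shell indices by s leaves the
# couplings unchanged wherever both entries are defined.
s = 5
shift_invariant = np.allclose(C[:-s, :-s], C[s:, s:])
```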
5. Rigidity of Power-Law Velocity and Scaling Laws
The derivation of the power-law velocity proceeds from the conservation law and the structural conditions above. Conditions 1–3 guarantee that the shell dynamics close, with the flux through each shell determined by a bounded band of neighboring shell energies, whereas Condition 4 (log-shift invariance) constrains the velocity field to depend on $\ell$ only through relative shifts, so that $v(\ell + s, t)$ and $v(\ell, t)$ differ by a factor independent of $\ell$. Coupled with the intrinsic covariance of gradient flow under time rescaling, the only admissible solution for the velocity field, compatible with RG invariance and log-shift symmetry, is the power law
$$v(\ell, t) = A(t)\, e^{\beta \ell} = A(t)\, \lambda^{\beta}.$$
This implies that power-law spectral velocity—and hence the observed empirical scaling laws—are a rigidity phenomenon, not a generic outcome of coarse-graining alone. Such behavior emerges only when all four structural criteria and flow covariances are respected.
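The rigidity statement has a simple numerical face: for a velocity of the assumed fixed-point form $v(\lambda) = A\,\lambda^{\beta}$, the exponent is recoverable as a log-log slope, and the field transforms covariantly under the rescaling $\lambda \to b\lambda$. The constants below are arbitrary illustrative values.

```python
import numpy as np

# At the fixed point v(lambda) = A * lambda**beta, so beta is the slope of
# log v against log lambda, and v rescales covariantly: v(b*lam) = b**beta * v(lam).
lam = np.logspace(-3, 1, 50)
A, beta = 2.0, 0.75
v = A * lam ** beta

slope, _ = np.polyfit(np.log(lam), np.log(v), 1)  # slope recovers beta
b = 10.0
covariant = np.allclose(A * (b * lam) ** beta, b ** beta * v)
```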
6. Model-Specific Examples and Significance
The GRSD framework applies broadly but sharply distinguishes between model classes according to their compliance with the requisite conditions:
- MLPs and non-residual CNNs: Frequently meet Conditions 1–3 (local computation graph, incoherent random initialization, stable Jacobian path), yet lack an intrinsic mechanism for log-shift invariance (Condition 4). Approximate scaling may occur empirically but is not guaranteed by the theory.
- Residual Networks (ResNets): Layerwise Jacobians of the near-identity form $I + \epsilon B_\ell$ exhibit statistical stationarity across depth for large networks, ensuring log-shift invariance (Condition 4) through mixing and averaging effects. The GRSD fixed-point scaling therefore holds robustly in deep residual architectures.
- Transformers: The presence of residual connections and normalization supports Conditions 1–3. Condition 4's validity depends on the near-identity and statistical homogeneity of per-block transformations in log-spectral coordinates.
- Structured State-Space Models (e.g., RWKV, SSM): Under uniform exponential stability and appropriately bounded Jacobians, an effective graph-bandedness (Condition 1) can be rigorously established. The presence or absence of a residual mechanism is critical for log-shift invariance.
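The residual mechanism invoked for ResNets above can be illustrated with a toy product of near-identity layerwise Jacobians; the dimension, depth, and step size $\epsilon$ are arbitrary illustrative choices. Under these assumptions the per-layer increments of the log singular values fluctuate around a depth-independent level, the kind of stationarity Condition 4 requires.

```python
import numpy as np

# Toy residual stack: products of near-identity layerwise Jacobians
# J_l = I + eps * B_l with i.i.d. random blocks B_l.
rng = np.random.default_rng(1)
d, depth, eps = 16, 40, 0.05
J = np.eye(d)
log_sv = []
for _ in range(depth):
    B = rng.normal(size=(d, d)) / np.sqrt(d)
    J = (np.eye(d) + eps * B) @ J
    log_sv.append(np.log(np.linalg.svd(J, compute_uv=False)))
log_sv = np.asarray(log_sv)

# Per-layer increments of the top log singular value: roughly stationary
# across depth in this near-identity regime.
inc = np.diff(log_sv[:, 0])
```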
Once all four GRSD conditions are satisfied, the dynamics enter an RG-fixed-point regime characterized by a universal power-law shell velocity $v(\lambda, t) \propto \lambda^{\beta}$.
A plausible implication is that the architectural and initialization choices directly determine the applicability of power-law scaling predictions during network training. This suggests constrained regimes of universality in deep learning spectral dynamics, with precise empirical and theoretical boundaries dictated by the structural properties summarized by GRSD (Zhang, 20 Dec 2025).