- The paper demonstrates that coherent coupling among task-specific critical neurons drives rapid convergence and generalization in deep neural networks.
- A graph-theoretic and thermodynamic framework reveals distinct connectivity and convergence phase transitions, enabling effective model pruning.
- Critical computational graphs extracted during training offer a sparse, interpretable structure that mirrors phenomena in biological neural circuits.
Self-Organized Learning via Coherent Coupling of Critical Neurons
Introduction
This work presents a comprehensive theoretical and empirical analysis of training dynamics in deep artificial neural networks (ANNs), focusing on the emergence of self-organized learning through the coherent coupling of critical neurons. The study leverages frameworks from graph theory, statistical physics, and nonequilibrium thermodynamics to elucidate the mechanisms underlying rapid convergence, generalization, and criticality in ANN training. The central thesis is that training induces the formation of Hebbian-like neural correlation graphs, which undergo distinct phase transitions, and that coherent coupling among strongly activated, task-specific neurons drives both efficient learning and generalization.
Emergence of Correlation Structure and Connectivity Phase Transition
The initial phase of training is characterized by the rapid establishment of persistent activity patterns and strong neuronal correlations, as revealed by Rastermap visualizations, correlation matrices, and neuronal correlation graphs (NCGs). These structures are robust to initialization, hyperparameter choices, and model size. The evolution of NCGs follows a flourish-diminish process: a rapid increase in correlation strength and network connectivity, followed by pruning into a more compact, task-specific structure.
Figure 1: Emergence of activity patterns and connectivity phase transition, showing the evolution of neuronal activity, correlation, and network topology across training iterations.
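As a concrete illustration of how such an NCG can be constructed, the following minimal sketch thresholds the Pearson correlation matrix of recorded activations; the threshold value and the use of absolute correlations are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch: build a neuronal correlation graph (NCG) from activations.
import numpy as np
import networkx as nx

def build_ncg(activations: np.ndarray, threshold: float = 0.7) -> nx.Graph:
    """activations: (n_samples, n_neurons) array of recorded activity."""
    corr = np.corrcoef(activations.T)   # (n_neurons, n_neurons) Pearson matrix
    np.fill_diagonal(corr, 0.0)         # ignore self-correlations
    adj = (np.abs(corr) >= threshold)   # keep only strongly correlated pairs
    return nx.from_numpy_array(adj.astype(float))

# Example with random activity; in practice these come from forward passes.
rng = np.random.default_rng(0)
G = build_ncg(rng.standard_normal((512, 128)))
print(G.number_of_nodes(), G.number_of_edges())
```

Tracking this graph across training iterations exposes the flourish-diminish trajectory described above.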
Graph-theoretic metrics (Frobenius norm, survival probability, mean degree, Forman-Ricci entropy) display sharp, synchronous peaks, indicating a second-order connectivity phase transition. This transition is scale-invariant and persists across a range of correlation thresholds, signifying a global reorganization of network structure. The size of the largest connected component scales as O(M), consistent with critical phenomena in complex networks.
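These metrics can be tracked per iteration with standard tools. In the hedged sketch below, the Forman-Ricci curvature uses the common unweighted form F(u, v) = 4 - deg(u) - deg(v), and the entropy is the Shannon entropy of the binned curvature distribution; both are our assumptions about the paper's exact definitions.

```python
# Hedged sketch: graph-theoretic metrics tracked across training.
import numpy as np
import networkx as nx

def graph_metrics(corr: np.ndarray, G: nx.Graph) -> dict:
    """corr: correlation matrix; G: NCG, assumed to have at least one edge."""
    deg = dict(G.degree())
    # Unweighted Forman-Ricci curvature per edge: F(u, v) = 4 - deg(u) - deg(v)
    curv = np.array([4 - deg[u] - deg[v] for u, v in G.edges()])
    hist, _ = np.histogram(curv, bins=20)
    p = hist[hist > 0] / hist.sum()     # empirical curvature distribution
    return {
        "frobenius": float(np.linalg.norm(corr, "fro")),
        "mean_degree": float(np.mean(list(deg.values()))),
        "largest_cc": len(max(nx.connected_components(G), key=len)),
        "forman_entropy": float(-(p * np.log(p)).sum()),  # Shannon entropy
    }
```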
Loss Landscape Concentration and Convergence Phase Transition
Training dynamics are further analyzed through the lens of loss landscape geometry. The empirical loss landscape L̂(w) is highly rugged due to batch-wise stochasticity, but batch averaging and training induce a local concentration of measure: per-sample losses become heavy-tailed and concentrate near a degenerate minimum, providing a theoretical basis for generalization.
Figure 2: Local concentration of loss landscapes and convergence phase transition, illustrating the evolution of loss distributions and convergence metrics.
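To make the concentration claim concrete, the following PyTorch sketch collects per-sample losses and reports the fraction of samples within eps of the empirical minimum; both the statistic and the value of eps are illustrative choices, not the paper's estimator.

```python
# Minimal sketch: per-sample loss distribution and a concentration statistic.
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_sample_losses(model, loader, device="cpu") -> torch.Tensor:
    losses = []
    for x, y in loader:
        logits = model(x.to(device))
        # reduction="none" keeps one loss value per sample
        losses.append(F.cross_entropy(logits, y.to(device), reduction="none"))
    return torch.cat(losses)

def concentration(losses: torch.Tensor, eps: float = 0.1) -> float:
    # Fraction of per-sample losses within eps of the empirical minimum;
    # values near 1 indicate strong local concentration of measure.
    return float((losses <= losses.min() + eps).float().mean())
```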
A quasi-thermal free energy principle is introduced, balancing the average loss L̄ against the entropy ΔS to define a robust convergence criterion ℒ = ln(L̄/ΔS). The training process exhibits a first-order convergence phase transition, with a U-shaped phase boundary in the model size–mobility factor space, indicating an optimal regime for convergence and generalization. Marginal probability flux analysis, based on Fokker-Planck dynamics, reveals that phase transitions in both connectivity and convergence are driven by abrupt changes in probability flux across loss and weight distributions.
Figure 3: Marginal probability flux signatures of phase transitions, showing stochastic trajectories and flux metrics as indicators of phase transitions.
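A hedged numerical reading of the criterion ℒ = ln(L̄/ΔS) is sketched below: L̄ is taken as the mean per-sample loss and ΔS as the Shannon entropy of the binned loss distribution. The histogram-based entropy estimator is our assumption; the paper's construction may differ.

```python
# Hedged sketch: quasi-thermal convergence criterion L = ln(L_bar / dS).
import numpy as np

def free_energy_criterion(losses: np.ndarray, bins: int = 50) -> float:
    l_bar = losses.mean()                      # average loss L_bar
    hist, _ = np.histogram(losses, bins=bins)
    p = hist[hist > 0] / hist.sum()            # empirical loss distribution
    delta_s = -(p * np.log(p)).sum()           # Shannon entropy of the losses
    return float(np.log(l_bar / delta_s))      # assumes delta_s > 0
```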
Criticality and Heavy-Tailed Connectivity
The study identifies ubiquitous criticality in trained ANNs: distributions of correlations, connection weights, gradient noise, and Hessian spectra all exhibit power-law heavy tails. This is most pronounced at the connectivity transition and persists into the terminal regime, where the network exhibits quasi-criticality. The critical redistribution model, inspired by Hebbian self-organization in biological neural circuits, reproduces these heavy-tailed statistics and demonstrates a linear relationship between the power-law exponents and the critical probability.
Figure 4: Heavy-tailed connectivity and criticality in training, highlighting power-law distributions in correlations, weights, and dynamical metrics.
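The heavy-tail claim can be spot-checked on any weight tensor with a simple tail-exponent estimate. The Hill estimator below and the 5% tail fraction are standard but illustrative choices; dedicated packages such as powerlaw fit such distributions more carefully.

```python
# Illustrative check for power-law heavy tails via the Hill estimator.
import numpy as np

def hill_exponent(values: np.ndarray, tail_frac: float = 0.05) -> float:
    """Estimate the density exponent of the upper tail of |values|."""
    x = np.sort(np.abs(values).ravel())[::-1]    # magnitudes, descending
    k = max(int(tail_frac * len(x)), 2)          # tail sample count, k < len(x)
    tail, x_thresh = x[:k], x[k]                 # top-k values and threshold
    alpha = k / np.sum(np.log(tail / x_thresh))  # Hill tail (survival) index
    return alpha + 1.0                           # density exponent: p(x) ~ x^-(alpha + 1)
```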
Critical connections, identified via correlation thresholds, are shown to dominate the concentration of the loss landscape and to exhibit slow, irreversible dynamics. Redundant connections are pruned after the transition, further enhancing concentration and generalization.
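A minimal sketch of threshold-based pruning in this spirit zeroes out connections whose endpoint correlation falls below a cutoff tau; mapping neuron-pair correlations onto a weight mask this way is our simplification of the paper's procedure.

```python
# Sketch: prune sub-threshold (redundant) connections via a correlation mask.
import torch

def prune_by_correlation(weight: torch.Tensor, corr: torch.Tensor,
                         tau: float = 0.5) -> torch.Tensor:
    """weight, corr: (out_dim, in_dim); corr[i, j] is the activity
    correlation between output neuron i and input neuron j."""
    mask = (corr.abs() >= tau).to(weight.dtype)  # keep only critical connections
    return weight * mask                         # redundant connections zeroed
```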
Critical Computational Graphs and Task Interpretation
A clustering-growing-pruning strategy is used to extract critical computational graphs (CCGs) for each learning task. These sparse subgraphs capture the essential features for task performance, reducing computational cost by orders of magnitude while maintaining or improving generalization. CCGs are modular and task-specific, and their size distributions follow the same power law as neuronal avalanches in biological circuits, with exponents matching empirical observations in neuroscience.
Figure 5: Critical computational graph interprets task learning, showing sparse, interpretable subgraphs and their statistical properties.
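The clustering-growing-pruning idea can be sketched as a single graph pass: start from seed neurons, grow along strong edges, then prune weakly attached nodes. Seed selection and both thresholds below are illustrative stand-ins for the paper's exact strategy.

```python
# High-level sketch of clustering-growing-pruning CCG extraction.
import networkx as nx

def extract_ccg(G: nx.Graph, seeds: set, grow_tau: float = 0.6,
                prune_tau: float = 0.3) -> nx.Graph:
    nodes, frontier = set(seeds), set(seeds)
    # Grow: absorb neighbors reached through strong (high-weight) edges.
    while frontier:
        nxt = set()
        for u in frontier:
            for v, data in G[u].items():
                if v not in nodes and data.get("weight", 1.0) >= grow_tau:
                    nodes.add(v)
                    nxt.add(v)
        frontier = nxt
    sub = G.subgraph(nodes).copy()
    # Prune: drop weakly attached non-seed nodes.
    weak = [n for n in sub if n not in seeds and max(
        (d.get("weight", 1.0) for _, _, d in sub.edges(n, data=True)),
        default=0.0) < prune_tau]
    sub.remove_nodes_from(weak)
    return sub
```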
Coherent coupling among critical neurons is quantified via a coupling cost analogous to the energy of spin-glass models. Training induces a monotonic, logarithmic reduction in this coupling cost, characteristic of aging in disordered systems. This coupling segregates task-specific activity onto low-dimensional manifolds, aligning with the manifold hypothesis in neural computation.
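A hedged sketch of such a coupling cost, E = -Σᵢⱼ Jᵢⱼ sᵢ sⱼ, is given below, with the couplings J taken from the correlation matrix and binary "spins" from the sign of mean-centered activations; both mappings are our assumptions about the paper's construction.

```python
# Hedged sketch: spin-glass-style coupling cost over recorded activity.
import numpy as np

def coupling_cost(corr: np.ndarray, activations: np.ndarray) -> float:
    """corr: (n_neurons, n_neurons) couplings; activations: (n_samples, n_neurons)."""
    s = np.sign(activations - activations.mean(axis=0))  # binary "spins"
    # E = -sum_ij J_ij s_i s_j per sample; lower cost = more coherent coupling
    per_sample = -np.einsum("ij,si,sj->s", corr, s, s)
    return float(per_sample.mean())
```

Tracked over training, this cost would be expected to fall logarithmically if the aging analogy holds.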
Unified Framework: Self-Organized Assembly and Thermodynamic Analogy
The proposed framework unifies the observed phenomena: ANNs self-organize through the assembly and coherent coupling of critical neurons, forming sparse, modular computational graphs that encode key features of the data. This process is governed by phase transitions in connectivity and convergence, driven by probability flux and entropy dynamics. The local concentration of loss landscapes ensures robust generalization, and the criticality of network structure aligns with principles observed in biological neural circuits.
Figure 6: Neural network learns via coherent coupling of critical neurons, summarizing the hierarchical self-organization and manifold deformation during training.
The analogy to thermodynamic systems is formalized via the minimum quasi-thermal free energy principle, with training viewed as a process of approaching equilibrium in a purpose-driven system. The framework distinguishes ANNs from Hopfield networks, emphasizing function learning and unbounded generalization over discrete state memorization.
Implications and Future Directions
The findings have significant implications for the design and analysis of neural networks. The identification of critical computational graphs enables efficient model pruning and interpretability, while the thermodynamic perspective provides new avenues for understanding convergence and generalization. The parallels with biological neural circuits suggest that coherent coupling of critical neurons may be a universal mechanism in both artificial and natural intelligence.
Future research should extend these principles to deeper architectures, more complex datasets, and alternative training regimes. Investigating the emergence and dynamics of CCGs in biological systems may yield further insights into energy-efficient computation and the evolution of intelligence.
Conclusion
This study establishes a rigorous theoretical and empirical foundation for self-organized learning in ANNs, driven by the coherent coupling of critical neurons. The framework integrates graph-theoretic, statistical, and thermodynamic perspectives to explain rapid convergence, generalization, and criticality in neural network training. The identification of critical computational graphs and the analogy to biological avalanches provide a pathway toward interpretable, efficient, and robust artificial intelligence systems.