Papers
Topics
Authors
Recent
Search
2000 character limit reached

Optimal Condition for Initialization Variance in Deep Neural Networks: An SGD Dynamics Perspective

Published 18 Aug 2025 in stat.ML and cs.LG | (2508.12834v1)

Abstract: Stochastic gradient descent (SGD), one of the most fundamental optimization algorithms in ML, can be recast through a continuous-time approximation as a Fokker-Planck equation for Langevin dynamics, a viewpoint that has motivated many theoretical studies. Within this framework, we study the relationship between the quasi-stationary distribution derived from this equation and the initial distribution through the Kullback-Leibler (KL) divergence. As the quasi-steady-state distribution depends on the expected cost function, the KL divergence eventually reveals the connection between the expected cost function and the initialization distribution. By applying this to deep neural network models (DNNs), we can express the bounds of the expected loss function explicitly in terms of the initialization parameters. Then, by minimizing this bound, we obtain an optimal condition of the initialization variance in the Gaussian case. This result provides a concrete mathematical criterion, rather than a heuristic approach, to select the scale of weight initialization in DNNs. In addition, we experimentally confirm our theoretical results by using the classical SGD to train fully connected neural networks on the MNIST and Fashion-MNIST datasets. The result shows that if the variance of the initialization distribution satisfies our theoretical optimal condition, then the corresponding DNN model always achieves lower final training loss and higher test accuracy than the conventional He-normal initialization. Our work thus supplies a mathematically grounded indicator that guides the choice of initialization variance and clarifies its physical meaning of the dynamics of parameters in DNNs.

Authors (2)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 4 likes about this paper.