Stochastic Thermodynamics of Learning Parametric Probabilistic Models
Abstract: We formulate a family of machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), which inherently renders learning a thermodynamic process. Our primary motivation is to leverage the rich toolbox of the thermodynamics of information to assess the information-theoretic content of learning a probabilistic model. We first introduce two information-theoretic metrics, Memorized-information (M-info) and Learned-information (L-info), which trace the flow of information during the learning of a PPM. We then demonstrate that the accumulation of L-info during learning is associated with entropy production, and that the parameters serve as a heat reservoir in this process, capturing learned information in the form of M-info.
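To make the stated link between information accumulation and entropy production concrete, the display below sketches the standard generalized second law with information exchange from the thermodynamics of information (the Sagawa–Ueda form). Identifying the heat reservoir with the parameters is our illustrative reading of the abstract, and the symbols Θ (parameters) and 𝒟 (data) are hypothetical placeholders, not definitions taken from the paper.

\[
  \langle \sigma \rangle \;=\; \Delta S_{\mathrm{sys}} \;+\; \beta\,\langle Q \rangle \;\ge\; \Delta I(\Theta;\mathcal{D}),
\]

where σ is the stochastic entropy production, Q the heat dissipated into the reservoir at inverse temperature β, and ΔI the change in mutual information between parameters and data. Read this way, accumulating information in the parameters (the right-hand side, the role the abstract assigns to M-info) is possible only at the cost of entropy production (the left-hand side), which is the abstract's central claim.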