Stochastic Thermodynamics of Learning Parametric Probabilistic Models

Published 4 Oct 2023 in cs.LG (arXiv:2310.19802v5)

Abstract: We formulate a family of machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), which inherently renders learning a thermodynamic process. Our primary motivation is to leverage the rich toolbox of the thermodynamics of information to assess the information-theoretic content of learning a probabilistic model. We first introduce two information-theoretic metrics, Memorized-information (M-info) and Learned-information (L-info), which trace the flow of information during the learning process of PPMs. We then demonstrate that the accumulation of L-info during learning is associated with entropy production, and that the parameters serve as a heat reservoir in this process, capturing the learned information in the form of M-info.
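
For orientation, here is a minimal sketch of how such a formulation typically looks in stochastic thermodynamics. The specific choices below (a Langevin model of the parameter dynamics with a loss \ell and temperature T, and the reading of M-info as the mutual information between parameters and data) are illustrative assumptions, not the paper's exact definitions; units are chosen so that k_B = 1.

% Assumed model: parameter learning as overdamped Langevin dynamics,
% with the training loss \ell playing the role of a potential over
% parameters \theta given a dataset D.
\mathrm{d}\theta_t = -\nabla_\theta\, \ell(\theta_t; D)\, \mathrm{d}t + \sqrt{2T}\, \mathrm{d}W_t

% Assumed identification: M-info as the mutual information accumulated
% between the parameters and the training data during learning.
I_M(t) = I(\Theta_t ; D)

% Second law with information exchange (in the Sagawa-Ueda form): the
% total entropy production stays non-negative once the change in the
% information term is accounted for alongside the system and reservoir
% entropy changes.
\Sigma(t) = \Delta S_{\mathrm{sys}}(t) + \Delta S_{\mathrm{res}}(t) - \Delta I_M(t) \;\geq\; 0

Under this reading, the abstract's statement that the parameters serve as a heat reservoir would correspond to the \Delta S_{\mathrm{res}} term being carried by the parameter degrees of freedom, which absorb the entropy produced as L-info accumulates and store it as M-info.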

