Papers
Topics
Authors
Recent
Search
2000 character limit reached

CaAdam: Improving Adam optimizer using connection aware methods

Published 31 Oct 2024 in cs.LG | (2410.24216v1)

Abstract: We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima. Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics. This architecture-agnostic approach is deeply embedded in most deep learning frameworks, where optimizers are implemented as standalone modules without direct access to the network's structural information. For instance, in popular frameworks like Keras or PyTorch, optimizers operate solely on gradients and parameters, without knowledge of layer connectivity or network topology. Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through carefully designed proxies of architectural information. We propose multiple scaling methodologies that dynamically adjust learning rates based on easily accessible structural properties such as layer depth, connection counts, and gradient distributions. This approach enables more granular optimization while working within the constraints of current deep learning frameworks. Empirical evaluations on standard datasets (e.g., CIFAR-10, Fashion MNIST) show that our method consistently achieves faster convergence and higher accuracy compared to standard Adam optimizer, demonstrating the potential benefits of incorporating architectural awareness in optimization strategies.

Authors (2)
Definition Search Book Streamline Icon: https://streamlinehq.com
References (16)
  1. F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.
  2. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in neural information processing systems, vol. 32, 2019.
  3. D. P. Kingma, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  4. A.-L. Cauchy, “Méthode générale pour la résolution des systèmes d’équations simultanées,” Comptes Rendus de l’Académie des Sciences, vol. 25, no. 1847, pp. 536–538, 1847.
  5. R. Fletcher and M. J. Powell, “A rapidly convergent descent method for minimization,” The Computer Journal, vol. 6, no. 2, pp. 163–168, 1963.
  6. S. Du, J. Lee, H. Li, L. Wang, and X. Zhai, “Gradient descent finds global minima of deep neural networks,” in International conference on machine learning.   PMLR, 2019, pp. 1675–1685.
  7. M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,” Advances in neural information processing systems, vol. 29, 2016.
  8. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
  9. S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016.
  10. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015.
  11. J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization.” Journal of machine learning research, vol. 12, no. 7, 2011.
  12. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang et al., “Large scale distributed deep networks,” Advances in neural information processing systems, vol. 25, 2012.
  13. M. D. Zeiler, “Adadelta: an adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.
  14. T. Dozat, “Incorporating nesterov momentum into adam,” 2016.
  15. A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  16. H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.