Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization

Published 17 Apr 2023 in cs.LG and stat.ML (arXiv:2304.08309v2)

Abstract: The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while its efficacy has been studied in large-scale tasks like image classification, it has not been examined in sequential decision-making problems like Bayesian optimization, where Gaussian processes -- with simple mean functions and kernels such as the radial basis function -- are the de facto surrogate models. In this work, we study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility. However, we also present some pitfalls that might arise, along with a potential problem with the LLA when the search space is unbounded.
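
To make the GP-posterior view of the LLA concrete, here is a minimal, hypothetical sketch in PyTorch of a last-layer variant (a common tractable special case of the LLA, where linearizing in the last-layer weights is exact). The network, toy data, and hyperparameters (sigma2, prior_prec) are illustrative assumptions, not the paper's experimental setup or any particular library's API.

```python
import torch
import torch.nn as nn

# Sketch: last-layer Laplace predictive for a toy regression surrogate.
# Mean function = the network's MAP prediction; covariance comes from the
# (last-layer) linearization, mirroring the GP view described in the abstract.

torch.manual_seed(0)

# Toy 1-D regression data standing in for Bayesian-optimization observations.
X = torch.linspace(-2.0, 2.0, 20).unsqueeze(-1)
y = torch.sin(3.0 * X) + 0.1 * torch.randn_like(X)

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2, weight_decay=1e-4)
for _ in range(500):  # MAP fit; weight decay plays the role of a Gaussian prior
    opt.zero_grad()
    ((net(X) - y) ** 2).mean().backward()
    opt.step()

# Freeze the feature map phi(x) (all layers but the last). The model is then
# linear in the last-layer weights, so the Laplace posterior over them is
# exact Bayesian linear regression. Bias omitted from the covariance for brevity.
feats = net[:-1]
Phi = feats(X).detach()                        # (N, 32) features at the data
sigma2, prior_prec = 0.1 ** 2, 1.0             # assumed noise var, prior precision
A = Phi.T @ Phi / sigma2 + prior_prec * torch.eye(Phi.shape[1])
Sigma = torch.linalg.inv(A)                    # posterior covariance of last layer

def lla_predict(x):
    """GP-style predictive: mean = MAP output, variance from linearization."""
    phi = feats(x).detach()
    mean = net(x).detach().squeeze(-1)         # mean function = MAP prediction
    var = (phi @ Sigma * phi).sum(-1)          # diag(phi Sigma phi^T)
    return mean, var

x_test = torch.linspace(-4.0, 4.0, 9).unsqueeze(-1)
mu, var = lla_predict(x_test)
print(torch.stack([x_test.squeeze(-1), mu, var.sqrt()], dim=-1))
```

From here a standard acquisition function can be evaluated directly, e.g. UCB: a(x) = mu(x) + beta * sqrt(var(x)). Probing var(x) far from the data (the test grid above extends beyond the training range) also illustrates the kind of unbounded-search-space behavior the abstract alludes to: with bounded features like tanh, the linearized predictive variance saturates rather than growing without bound away from the data.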
