- The paper presents advancements in making Bayesian inference scalable for large deep learning models and datasets, addressing traditional limitations in model selection and uncertainty estimation.
- Key methodological contributions include applying stochastic gradient descent (specifically introducing Stochastic Dual Descent) to Gaussian Processes and developing a scalable Linearised Laplace Approximation for deep neural networks.
- The research demonstrates the practical application of these scalable methods for uncertainty quantification in large-scale tasks like image classification and sequential decision making such as Bayesian optimisation.
Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks
This thesis, entitled Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks, authored by Javier Antorán Cabiscol, addresses two significant challenges in modern machine learning: model selection and uncertainty estimation, especially in the context of deep learning models. The work systematically advances the application of Bayesian methodologies, traditionally hindered by scalability and complexity issues, particularly for large neural networks and datasets.
Overview of Contributions
- Stochastic Gradient Descent for Gaussian Processes:
- The thesis demonstrates how stochastic gradient descent (SGD), widely successful in deep learning, can also be applied effectively to Gaussian Processes (GPs). This approach circumvents the cubic computational cost of exact GP inference in the number of observations, making these powerful models applicable to large-scale datasets.
- A novel optimisation method named Stochastic Dual Descent (SDD) is introduced. SDD substantially improves on standard SGD for Gaussian process inference by descending a better-conditioned dual objective.
- The accompanying numerical results show that SDD outperforms baselines such as conjugate gradients (CG) and sparse variational GP approximations in both wall-clock time and predictive performance.
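The flavour of the dual-objective idea can be sketched in a few lines of NumPy. The toy below is an illustrative sketch, not the thesis's SDD algorithm: it runs randomised block-coordinate gradient steps on the dual objective L(a) = ½ aᵀ(K + σ²I)a − yᵀa, whose unique minimiser solves the exact GP regression system (K + σ²I)a = y. The data, kernel, and step-size choices are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative; not the thesis's benchmarks).
n = 200
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
noise2 = 0.25  # assumed observation noise variance sigma^2

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

K = rbf(X, X)

# Dual objective: L(a) = 0.5 a^T (K + s2 I) a - y^T a.
# Its minimiser solves (K + s2 I) a = y, the GP posterior mean weights,
# avoiding any explicit O(n^3) factorisation of the kernel matrix.
alpha = np.zeros(n)
lr, batch_size, steps = 0.02, 32, 5000
for _ in range(steps):
    b = rng.choice(n, size=batch_size, replace=False)
    # Gradient of the dual objective restricted to a random coordinate block.
    grad_b = K[b] @ alpha + noise2 * alpha[b] - y[b]
    alpha[b] -= lr * grad_b

A = K + noise2 * np.eye(n)
residual = np.linalg.norm(A @ alpha - y)        # how well we solve the system
dual_value = 0.5 * alpha @ (A @ alpha) - y @ alpha
```

Each step touches only a small block of the weight vector and a few kernel rows, which is what makes this style of solver attractive at scale; the actual SDD method adds further ingredients (momentum, iterate averaging, and careful step-size choices) beyond this sketch.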
- Scalable Linearised Laplace Approximation:
- The Laplace approximation, first applied to neural networks by MacKay, is revisited and extended to scale to modern deep learning practice. Antorán identifies key incompatibilities between the classical Laplace approximation and state-of-the-art techniques, such as normalisation layers, and resolves them with dedicated computational strategies.
- The thesis provides a systematic approach to hyperparameter selection via marginal likelihood maximisation, alleviating issues previously caused by non-convergence of network training and the scale indeterminacy introduced by normalisation layers.
- The work proposes a sample-based Expectation-Maximisation algorithm to compute model evidence and posterior estimates efficiently, using methods developed in previous chapters.
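A minimal sketch of the linearised-Laplace predictive may help fix ideas. The example below uses a tiny one-hidden-layer tanh network and a generalised-Gauss-Newton (GGN) posterior precision; the architecture, training loop, prior precisions, and noise level are illustrative assumptions, not the thesis's setup (which targets large networks via the scalable machinery described above).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression data (illustrative).
X = rng.uniform(-2.0, 2.0, size=40)
y = np.sin(2.0 * X) + 0.1 * rng.standard_normal(40)
noise2 = 0.01  # assumed observation noise variance

# One-hidden-layer network f(x) = w2 . tanh(w1 * x + b1) + b2.
H = 16
w1 = rng.standard_normal(H); b1 = rng.standard_normal(H)
w2 = 0.1 * rng.standard_normal(H); b2 = 0.0

def jacobian(x):
    """Jacobian of f w.r.t. theta = (w1, b1, w2, b2), shape (n, 3H+1)."""
    h = np.tanh(np.outer(x, w1) + b1)
    dh = 1.0 - h**2
    return np.concatenate([
        (dh * w2) * x[:, None],   # df/dw1
        dh * w2,                  # df/db1
        h,                        # df/dw2
        np.ones((len(x), 1)),     # df/db2
    ], axis=1)

# Fit the point estimate theta* by full-batch gradient descent on 0.5*MSE.
lr = 0.05
for _ in range(3000):
    h = np.tanh(np.outer(X, w1) + b1)
    r = h @ w2 + b2 - y
    dh = 1.0 - h**2
    w2 -= lr * (h.T @ r) / len(X); b2 -= lr * r.mean()
    w1 -= lr * (((dh * w2) * X[:, None]).T @ r) / len(X)
    b1 -= lr * ((dh * w2).T @ r) / len(X)

def laplace_variance(x_test, prior_prec):
    # GGN posterior precision around theta*: P = prior_prec*I + J^T J / noise2.
    J = jacobian(X)
    P = prior_prec * np.eye(J.shape[1]) + J.T @ J / noise2
    Jt = jacobian(x_test)
    # Predictive variance of the linearised model, plus the noise floor.
    return noise2 + np.einsum('ij,jk,ik->i', Jt, np.linalg.inv(P), Jt)

x_test = np.array([0.0, 3.0])
var_strong = laplace_variance(x_test, prior_prec=100.0)
var_weak = laplace_variance(x_test, prior_prec=1.0)
```

Relaxing the prior (smaller `prior_prec`) can only increase the predictive variance, since a smaller posterior precision matrix yields a larger covariance in the positive-semidefinite order; tuning such hyperparameters is exactly what evidence maximisation automates. The thesis's contribution is making the curvature and evidence computations feasible when the explicit Jacobian and matrix inverse used here are far too large to form.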
- Application to Uncertainty Quantification and Sequential Decision Making:
- The thesis does not merely present theoretical advancements: the methods are demonstrated on tasks requiring uncertainty estimation, such as image classification on datasets of a scale previously considered out of reach for Bayesian techniques.
- For tasks such as Bayesian optimisation over large datasets, the approach makes GPs competitive with deep-learning-based methods in a regime where exact GP inference has not traditionally been feasible.
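For readers unfamiliar with the sequential-decision setting, a generic GP-based Bayesian optimisation loop looks as follows. This is a schematic expected-improvement sketch on a toy objective, not the thesis's large-scale experiments; the objective, kernel, grid, and iteration budget are all illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(2)

def f(x):
    """Toy objective to minimise; its minimiser is x = 1."""
    return (x - 1.0) ** 2

def rbf(A, B, ls=1.0):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

def gp_posterior(Xtr, ytr, Xte, noise2=1e-4):
    """Exact GP posterior mean and std at test points Xte."""
    K = rbf(Xtr, Xtr) + noise2 * np.eye(len(Xtr))
    Ks = rbf(Xte, Xtr)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mu = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = 1.0 - (V**2).sum(0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, s, best):
    # EI for minimisation: E[max(best - f, 0)] under the GP posterior.
    z = (best - mu) / s
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2.0)) for v in z]))
    phi = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return (best - mu) * Phi + s * phi

# BO loop: a few random evaluations, then repeatedly maximise EI on a grid.
grid = np.linspace(-3.0, 3.0, 241)
Xobs = rng.uniform(-3.0, 3.0, size=5)
yobs = f(Xobs)
for _ in range(10):
    mu, s = gp_posterior(Xobs, yobs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, s, yobs.min()))]
    Xobs = np.append(Xobs, x_next)
    yobs = np.append(yobs, f(x_next))

best_x = Xobs[np.argmin(yobs)]
```

Every iteration requires a fresh posterior fit, which is where exact GP inference becomes the bottleneck at scale; the scalable solvers developed in the thesis are what make loops of this shape viable on large observation sets.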
Theoretical and Practical Implications
The theoretical contributions push the boundaries of how models handle uncertainty and are selected in a data-driven manner. They revitalise classic Bayesian approaches by making them feasible in modern contexts, addressing both overconfidence in model predictions and computational bottlenecks.
Practically, this opens new doors for deploying uncertainty-aware models in real-world scenarios where large-scale, intricate datasets are common, ranging from adaptive experimental design in scientific research to the robust deployment of AI systems in safety-critical applications.
Speculation on Future Developments
Antorán’s work hints at a convergence of deep learning and Bayesian inference methodologies. Future developments may include further fusion of these domains, such as integrating the developed scalable methods into more complex, hierarchical models or exploring their synergies with emerging methodologies like probabilistic programming or reinforcement learning frameworks.
In summary, this thesis provides substantial contributions to advancing the applicability of Bayesian methods within the domain of deep learning. Its innovations in stochastic optimisation for GPs and linearisation techniques for deep networks stand to influence ongoing research and practical implementations profoundly.