Rapid Bayesian Computation and Estimation for Neural Networks via Log-Concave Coupling

Published 26 Nov 2024 in math.ST and stat.TH | (2411.17667v3)

Abstract: This paper studies a Bayesian estimation procedure for single-hidden-layer neural networks using $\ell_{1}$-controlled weights. We study the structure of the posterior density and provide a representation that makes it amenable to rapid sampling via Markov chain Monte Carlo (MCMC) and to statistical risk guarantees. The neural network has $K$ neurons and internal weight dimension $d$, with the outer weights fixed, giving $Kd$ parameters overall. With $N$ data observations, a gain parameter $\beta$ is used in the posterior density. The posterior is multimodal and not naturally suited to rapid mixing of direct MCMC algorithms. For a continuous uniform prior on the $\ell_{1}$ ball, we show that the posterior density can be written as a mixture density with suitably defined auxiliary random variables, where the mixture components are log-concave. Furthermore, when the number of model parameters $Kd$ is large enough that $Kd \geq C(\beta N)^{2}$, the mixing distribution of the auxiliary random variables is also log-concave. Thus, neuron parameters can be sampled from the posterior by sampling only log-concave densities. We refer to this mixture representation as a log-concave coupling. For a discrete uniform prior restricted to a grid, we study the statistical risk (generalization error) of procedures based on the posterior. Using a gain of $\beta = C [(\log d)/N]^{1/4}$, we show that the squared error is of order $O([(\log d)/N]^{1/4})$. Using independent Gaussian data with a variance $\sigma^{2}$ that matches the inverse gain, $\beta = 1/\sigma^{2}$, we show that the expected Kullback divergence has a cube-root rate $O([(\log d)/N]^{1/3})$. Future work aims to bridge the sampling ability of the continuous uniform prior with the risk control of the discrete uniform prior, resulting in a polynomial-time Bayesian training algorithm for neural networks with statistical risk control.