Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression
Abstract: Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KPs), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). KMG enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration, and it is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required number of iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, with a sparse GPR using merely 10 inducing points, KMG produces accurate approximations of high-dimensional targets within 5 iterations.
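The abstract's core idea — a back-fitting sweep over the additive components, followed by a sparse GPR fitted to the residual with a handful of inducing points — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the squared-exponential kernels, the lengthscale, the noise level, the inducing-point placement, and the choice to fold the coarse correction into the first component are all assumptions made here for brevity.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between 1-D input arrays a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def backfit_kmg(X, y, n_iters=5, m_inducing=10, noise=1e-2):
    """Sketch of back-fitting for an additive GP, with a sparse-GPR
    correction on the residual after each sweep (the KMG idea).
    Kernel choices and hyperparameters here are illustrative assumptions."""
    n, d = X.shape
    f = np.zeros((n, d))  # current estimate of each additive component at the data
    for _ in range(n_iters):
        # --- back-fitting sweep: refit each component on its partial residual ---
        for j in range(d):
            r_j = y - f.sum(axis=1) + f[:, j]
            K = rbf(X[:, j], X[:, j])
            f[:, j] = K @ np.linalg.solve(K + noise * np.eye(n), r_j)
        # --- sparse GPR on the remaining residual (the multigrid-style step) ---
        r = y - f.sum(axis=1)
        Z = np.linspace(X[:, 0].min(), X[:, 0].max(), m_inducing)  # inducing inputs
        Kzz = rbf(Z, Z)
        Kxz = rbf(X[:, 0], Z)
        # Nystrom-style predictive mean of the residual at the training inputs
        A = Kzz + Kxz.T @ Kxz / noise
        mu = Kxz @ np.linalg.solve(A, Kxz.T @ r / noise)
        f[:, 0] += mu  # fold the coarse correction into one component
    return f.sum(axis=1)
```

The sketch illustrates the two-level structure only: an exact per-component smoother at the fine level, and a cheap low-rank (inducing-point) solve that removes the smooth part of the residual the sweep converges on slowly.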