Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression

Published 20 Mar 2024 in stat.ML and cs.LG (arXiv:2403.13300v2)

Abstract: Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). This algorithm enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration. It is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, by employing a sparse GPR with merely 10 inducing points, KMG can produce accurate approximations of high-dimensional targets within 5 iterations.
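The iteration bound follows directly from the stated rate: if the error contracts like $(1-c/n)^t$ for some constant $c$, then driving it below $\varepsilon$ requires $t \gtrsim \frac{n}{c}\log\frac{1}{\varepsilon}$, and taking $\varepsilon$ polynomially small in $n$ yields the stated $\mathcal{O}(n\log n)$ sweeps.

To make the KMG recipe concrete, below is a minimal NumPy sketch written from the abstract alone: a Gauss-Seidel back-fitting sweep over the additive components, followed by a sparse (inducing-point, subset-of-regressors) GPR fitted to the sweep residual, whose prediction is folded back into the iterate. The kernel choice, the way inducing points are picked, and the even split of the correction across components are illustrative assumptions, not the authors' implementation; the sketch also uses dense $\mathcal{O}(n^3)$ solves for clarity, whereas the paper attains $\mathcal{O}(n\log n)$ time per iteration via Kernel Packets for Matérn kernels.

```python
# Minimal sketch of Kernel Multigrid (KMG) for additive GP regression,
# reconstructed from the abstract alone: Bayesian back-fitting over the
# additive components plus a sparse-GPR correction fitted to the residual
# after every sweep. All names and the exact form of the correction step
# are illustrative assumptions, not the authors' reference implementation.
import numpy as np

def matern32(a, b, ls=0.3):
    """Matern-3/2 kernel between 1-D inputs a (n,) and b (m,)."""
    d = np.abs(a[:, None] - b[None, :]) / ls
    return (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)

def backfit_sweep(X, y, F, noise):
    """One Gauss-Seidel back-fitting sweep; F[:, d] is the current
    estimate of additive component d at the data points."""
    n, D = X.shape
    for d in range(D):
        r = y - F.sum(axis=1) + F[:, d]       # residual seen by component d
        K = matern32(X[:, d], X[:, d])
        F[:, d] = K @ np.linalg.solve(K + noise * np.eye(n), r)
    return F

def sparse_gpr_correction(X, r, m=10, noise=1e-2):
    """Fit the sweep residual r with an m-inducing-point (subset-of-
    regressors) GPR on an additive kernel; return its prediction."""
    n, D = X.shape
    idx = np.linspace(0, n - 1, m).astype(int)  # inducing points: assumed grid of data indices
    Knm = sum(matern32(X[:, d], X[idx, d]) for d in range(D))
    Kmm = sum(matern32(X[idx, d], X[idx, d]) for d in range(D))
    A = Kmm + Knm.T @ Knm / noise + 1e-8 * np.eye(m)  # standard SoR algebra + jitter
    return Knm @ np.linalg.solve(A, Knm.T @ r / noise)

rng = np.random.default_rng(0)
n, D, noise = 500, 5, 1e-2
X = rng.uniform(size=(n, D))
y = sum(np.sin(2 * np.pi * (d + 1) * X[:, d]) for d in range(D))
y += np.sqrt(noise) * rng.normal(size=n)

F = np.zeros((n, D))
for it in range(5):                            # abstract: ~5 KMG iterations suffice
    F = backfit_sweep(X, y, F, noise)
    r = y - F.sum(axis=1)                      # residual after the sweep
    # Spread the coarse correction evenly over the D components (assumption).
    F += sparse_gpr_correction(X, r, m=10, noise=noise)[:, None] / D
    print(f"iter {it + 1}: residual norm = {np.linalg.norm(y - F.sum(axis=1)):.4f}")
```

Run as-is, the printed residual norm should drop markedly over the five sweeps, in the spirit of the abstract's observation that a handful of KMG iterations with about 10 inducing points already give accurate approximations.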
