
Local Function Complexity for Active Learning via Mixture of Gaussian Processes

Published 27 Feb 2019 in cs.LG, cs.NA, math.NA, and stat.ML (arXiv:1902.10664v6)

Abstract: Inhomogeneities in real-world data, e.g., changes in the observation noise level or variations in the structural complexity of the source function, pose a unique set of challenges for statistical inference. Accounting for them can greatly improve predictive power when physical resources or computation time is limited. In this paper, we draw on recent theoretical results on the estimation of local function complexity (LFC), derived from the domain of local polynomial smoothing (LPS), to establish a notion of local structural complexity, which we use to develop a model-agnostic active learning (AL) framework. Because it relies on pointwise estimates, the LPS model class is neither robust nor scalable with respect to the large input-space dimensions that typically accompany real-world problems. Here, we derive and estimate the Gaussian process regression (GPR)-based analog of the LPS-based LFC and use it as a substitute in the above framework, making the framework robust and scalable. We assess the effectiveness of our LFC estimate in an AL application on a prototypical low-dimensional synthetic dataset, before taking on the challenging real-world task of reconstructing a quantum chemical force field for a small organic molecule, where we demonstrate state-of-the-art performance with a significantly reduced training demand.
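
The abstract does not spell out the acquisition rule, so the following is a minimal, hypothetical sketch of the general pattern it describes: a GPR-driven active-learning loop over a candidate pool, where the pointwise predictive standard deviation serves as a crude stand-in for the paper's LFC-based sampling score. The toy target f, the pool size, and the kernel settings are illustrative assumptions, not the paper's actual setup (Python with scikit-learn):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D target with inhomogeneous structure: smooth on the left half,
# rapidly oscillating (locally more complex) on the right half.
def f(x):
    return np.where(x < 0.5, np.sin(2 * np.pi * x), np.sin(16 * np.pi * x))

rng = np.random.default_rng(0)
pool = rng.uniform(0.0, 1.0, size=(512, 1))              # unlabeled candidate pool
X = pool[rng.choice(len(pool), size=8, replace=False)]   # small initial design
y = f(X).ravel() + 0.05 * rng.standard_normal(len(X))    # noisy labels

for step in range(40):
    gpr = GaussianProcessRegressor(
        kernel=RBF(length_scale=0.1) + WhiteKernel(noise_level=0.05**2),
        normalize_y=True,
    ).fit(X, y)
    # Acquisition: predictive std over the pool, used here as a crude
    # stand-in for an LFC-based sampling score (NOT the paper's estimator).
    _, std = gpr.predict(pool, return_std=True)
    x_next = pool[np.argmax(std)].reshape(1, -1)
    y_next = f(x_next).ravel() + 0.05 * rng.standard_normal(1)
    X, y = np.vstack([X, x_next]), np.concatenate([y, y_next])

print(f"training-set size after active learning: {len(X)}")
```

On this toy target, such a loop concentrates queries in the oscillatory right half of the domain, which is the qualitative behavior one would expect from a complexity-aware sampling scheme like the one the paper proposes.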

