Agnostic Active Learning of Single Index Models with Linear Sample Complexity

Published 15 May 2024 in cs.LG (arXiv:2405.09312v3)

Abstract: We study active learning methods for single index models of the form $F(\mathbf{x}) = f(\langle \mathbf{w}, \mathbf{x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and $\mathbf{x}, \mathbf{w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, including surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $O(d^2)$ bound of Gajjar et al. [28]. Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is unknown. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
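
The sampling primitive behind the first result is classical: leverage score sampling for active linear regression. As a concrete illustration, the sketch below (a minimal example under simplifying assumptions, not the paper's exact algorithm) samples rows of a design matrix proportionally to their statistical leverage scores, queries labels only at the sampled rows, and solves the reweighted least-squares problem. The names `leverage_scores`, `active_fit`, and the label oracle `y_oracle` are illustrative, not from the paper.

```python
import numpy as np

def leverage_scores(X):
    # The leverage score of row i is ||U_i||^2, where X = U S V^T is a
    # thin SVD; the scores sum to rank(X) <= d.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U**2, axis=1)

def active_fit(X, y_oracle, m, seed=0):
    # Draw m rows i.i.d. with probability proportional to leverage scores,
    # query labels only at those rows, and solve the reweighted
    # least-squares problem. The weights 1/sqrt(m * p_i) make the
    # subsampled objective an unbiased estimate of the full one.
    rng = np.random.default_rng(seed)
    tau = leverage_scores(X)
    p = tau / tau.sum()
    idx = rng.choice(X.shape[0], size=m, p=p)
    w = 1.0 / np.sqrt(m * p[idx])
    ys = y_oracle(idx) * w  # only m labels are ever requested
    coef, *_ = np.linalg.lstsq(X[idx] * w[:, None], ys, rcond=None)
    return coef

# Toy usage: recover a planted linear model from m << n queried labels.
rng = np.random.default_rng(1)
n, d = 10_000, 20
X = rng.standard_normal((n, d))
y = X @ np.ones(d)
w_hat = active_fit(X, lambda idx: y[idx], m=200)
```

For the single index setting, the point of the paper's first result is that this same distribution over rows suffices: the natural estimator minimizes the nonlinear loss over $\mathbf{w}$ on the reweighted sample, and $\tilde{O}(d)$ queries already give a near-optimal model in the agnostic sense.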

References (66)
  1. The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks. In Proceedings of the 35th Annual Conference on Computational Learning Theory (COLT), pages 4782–4887, 2022.
  2. CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  3. A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems. SIAM Journal on Scientific Computing, 38(1):A243–A272, 2016.
  4. A universal sampling method for reconstructing signals with simple Fourier transforms. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 1051–1063, 2019.
  5. Agnostic active learning. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 65–72, 2006.
  6. Model reduction and neural networks for parametric PDEs. The SMAI Journal of Computational Mathematics, 7:121–157, 2021.
  7. Learning single-index models with shallow neural networks. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 9768–9783, 2022.
  8. Nonlinear dimension reduction for surrogate modeling using gradient information. Information and Inference: A Journal of the IMA, 11(4):1597–1639, 2022.
  9. CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  10. Towards understanding hierarchical learning: Benefits of neural representations. In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
  11. Query complexity of least absolute deviation regression via robust uniform convergence. In Proceedings of the 34th Annual Conference on Computational Learning Theory (COLT), pages 1144–1179, 2021.
  12. Active regression via linear-sample sparsification. In Proceedings of the 32nd Annual Conference on Computational Learning Theory (COLT), pages 663–695, 2019.
  13. Approximation of high-dimensional parametric PDEs. Acta Numerica, 24:1–159, 2015.
  14. Optimal weighted least-squares methods. SMAI Journal of Computational Mathematics, 3:181–203, 2017.
  15. Capturing ridge functions in high dimensions from point queries. Constructive Approximation, 35(2):225–243, 2011.
  16. On the stability and accuracy of least squares approximations. Foundations of Computational Mathematics, 13(5):819–834, 2013.
  17. A new algorithm for estimating the effective dimension-reduction subspace. The Journal of Machine Learning Research, 9:1647–1678, 2008.
  18. Smoothing the landscape boosts the signal for SGD: Optimal sample complexity for learning single index models. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  19. Neural networks can learn representations with gradient descent. In Proceedings of the 35th Annual Conference on Computational Learning Theory (COLT), pages 5413–5452, 2022.
  20. Leveraged volume sampling for linear regression. In Advances in Neural Information Processing Systems 31 (NeurIPS), 2018.
  21. Approximation schemes for ReLU regression. In Proceedings of the 33rd Annual Conference on Computational Learning Theory (COLT), 2020.
  22. Learning a single neuron with adversarial label noise via gradient descent. In Proceedings of the 35th Annual Conference on Computational Learning Theory (COLT), 2022.
  23. Sampling algorithms for $\ell_2$ regression and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1127–1136, 2006.
  24. Learning single-index models in Gaussian space. In Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings of the 31st Annual Conference on Computational Learning Theory (COLT), pages 1887–1930, 2018.
  25. Fourier sparse leverage scores and approximate kernel learning. In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
  26. Learning functions of few arbitrary linear parameters in high dimensions. Foundations of Computational Mathematics, 12:229–262, 2012.
  27. Agnostic learning of a single neuron with gradient descent. In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
  28. Active learning for single neuron models with Lipschitz non-linearities. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 4101–4113, 2023.
  29. Numerical solution of the parametric diffusion equation by deep neural networks. Journal of Scientific Computing, 88(1):22, 2021.
  30. Reliably learning the ReLU in polynomial time. In Proceedings of the 30th Annual Conference on Computational Learning Theory (COLT), 2017.
  31. Agnostically learning single-index models using omnipredictors. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  32. Coherence motivated sampling and convergence analysis of least squares polynomial chaos regression. Computer Methods in Applied Mechanics and Engineering, 290:73–97, 2015.
  33. Nonparametric and semiparametric models, volume 1. Springer, 2004.
  34. Active learning of multi-index function models. In Advances in Neural Information Processing Systems 25 (NeurIPS), 2012.
  35. Data-driven polynomial ridge approximation using variable projection. SIAM Journal on Scientific Computing, 40(3):A1566–A1589, 2018.
  36. Direct estimation of the index coefficient in a single-index model. Annals of Statistics, pages 595–623, 2001.
  37. One-shot active learning based on Lewis weight sampling for multiple deep models. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.
  38. Sparsifying generalized linear models, 2023.
  39. Matti Kääriäinen. Active learning in the non-realizable case. In Proceedings of the 17th International Conference on Algorithmic Learning Theory (ALT), pages 63–77, 2006.
  40. Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems 24 (NeurIPS), 2011.
  41. Solving parametric PDE problems with artificial neural networks. European Journal of Applied Mathematics, 32(3):421–435, 2021.
  42. A theoretical analysis of deep neural networks and parametric PDEs. Constructive Approximation, 55(1):73–125, 2022.
  43. Michel Ledoux. The concentration of measure phenomenon. Number 89. American Mathematical Soc., 2001.
  44. Probability in Banach Spaces: isoperimetry and processes, volume 23. Springer Science & Business Media, 1991.
  45. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3475–3506, 2012.
  46. Michael W Mahoney et al. Randomized algorithms for matrices and data. Foundations and Trends® in Machine Learning, 3(2):123–224, 2011.
  47. Coresets for classification – simplified and strengthened. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021.
  48. Neural networks efficiently learn low-dimensional representations with SGD. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.
  49. On coresets for logistic regression. In Advances in Neural Information Processing Systems 31 (NeurIPS), pages 6562–6571, 2018.
  50. Active linear regression for $\ell_p$ norms and beyond. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 744–753, 2022.
  51. Derivative-informed projected neural networks for high-dimensional parametric maps governed by PDEs. Computer Methods in Applied Mechanics and Engineering, 388:114199, 2022.
  52. Friedrich Pukelsheim. Optimal Design of Experiments. Society for Industrial and Applied Mathematics, 2006.
  53. Sparse Legendre expansions via $\ell_1$-minimization. Journal of Approximation Theory, 164(5):517–533, 2012.
  54. Mark Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60–72, 1999.
  55. Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM (JACM), 54(4):21–es, 2007.
  56. Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 143–152, 2006.
  57. Improved active learning via dependent leverage score sampling. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.
  58. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011. Preliminary version in the 40th Annual ACM Symposium on Theory of Computing (STOC).
  59. Michel Talagrand. Upper and lower bounds for stochastic processes: decomposition theorems. A Series of Modern Surveys in Mathematics. Springer Cham, 2021.
  60. Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification. Journal of Computational Physics, 375:565–588, 2018.
  61. Joel A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.
  62. Learning ridge functions with randomized sampling in high dimensions. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2025–2028, 2012.
  63. Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
  64. David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
  65. A fully adaptive algorithm for pure exploration in linear bandits. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), pages 843–851, 2018.
  66. Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. Journal of Computational Physics, 397, 2019.