Understanding Aggregations of Proper Learners in Multiclass Classification
Abstract: Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left(\frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega \left( \frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega \left(\frac{d_G \ln(1 / \epsilon) + \ln(1 / \delta)}{\epsilon}\right)$ samples on such classes, exceeding the lower bound of Daniely et al. (2015) by a factor of $\ln(1 / \epsilon)$. For multiclass learning in full generality -- i.e., for classes of finite DS dimension but possibly infinite Graph dimension -- we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.
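The simplest of the aggregation strategies discussed above is the Majority-of-Three learner of Aden-Ali et al.: partition the training sample into three disjoint parts, run ERM on each, and predict by majority vote, which in the multiclass setting becomes a plurality vote. The sketch below is a minimal illustration of this scheme under stated assumptions, not the paper's construction; the finite hypothesis class, the brute-force `erm`, and the toy data are all hypothetical choices made for the example.

```python
# Minimal sketch of a plurality vote over three ERM learners trained on
# disjoint thirds of the sample, in the spirit of Majority-of-Three
# (Aden-Ali et al.). Hypothesis class, ERM, and data are illustrative.
from collections import Counter
import random

def erm(hypotheses, sample):
    """Return a hypothesis of minimal empirical error on the sample."""
    return min(hypotheses, key=lambda h: sum(h(x) != y for x, y in sample))

def majority_of_three(hypotheses, sample):
    """Train ERM on three disjoint thirds; predict by plurality vote."""
    n = len(sample)
    parts = [sample[: n // 3], sample[n // 3 : 2 * n // 3], sample[2 * n // 3 :]]
    learners = [erm(hypotheses, part) for part in parts]

    def predict(x):
        # Counter.most_common orders ties by first occurrence, so a genuine
        # 2-of-3 majority always wins; three-way ties go to the first learner.
        return Counter(h(x) for h in learners).most_common(1)[0][0]

    return predict

if __name__ == "__main__":
    random.seed(0)
    # Toy 3-label class: each hypothesis counts how many of its two
    # thresholds lie below the input x.
    hypotheses = [
        lambda x, a=a, b=b: int(x > a) + int(x > b)
        for a in range(10)
        for b in range(a, 10)
    ]
    target = hypotheses[17]
    train = [(x, target(x)) for x in (random.uniform(0, 10) for _ in range(90))]
    test = [(x, target(x)) for x in (random.uniform(0, 10) for _ in range(2000))]

    predictor = majority_of_three(hypotheses, train)
    error = sum(predictor(x) != y for x, y in test) / len(test)
    print(f"test error: {error:.3f}")
```

Each constituent learner is proper (it outputs a hypothesis from the class), while the plurality vote itself is generally improper; per the abstract, this kind of aggregation attains the optimal $O\left(\frac{d_G + \ln(1/\delta)}{\epsilon}\right)$ sample complexity on classes of finite Graph dimension, which a single ERM provably cannot match.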
References
- Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. Optimal PAC bounds without uniform convergence. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1203–1223. IEEE, 2023.
- Ishaq Aden-Ali, Mikael Møller Høgsgaard, Kasper Green Larsen, and Nikita Zhivotovskiy. Majority-of-three: The simplest optimal learner? In The Thirty Seventh Annual Conference on Learning Theory, pages 22–45. PMLR, 2024.
- Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng. Open problem: Can local regularization learn all multiclass problems? In Shipra Agrawal and Aaron Roth, editors, Proceedings of the Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 5301–5305. PMLR, 2024. URL https://proceedings.mlr.press/v247/asilis24b.html.
- Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng. Regularization and optimal multiclass learning. In The Thirty Seventh Annual Conference on Learning Theory, pages 260–310. PMLR, 2024.
- Peter Auer and Ronald Ortner. A new PAC bound for intersection-closed concept classes. Machine Learning, 66(2-3):151–163, 2007. URL https://doi.org/10.1007/s10994-006-8638-3.
- Shai Ben-David, Nicolò Cesa-Bianchi, and Philip M. Long. Characterizations of learnability for classes of {0, …, n}-valued functions. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 333–340, 1992.
- Shai Ben-David, Nicolò Cesa-Bianchi, David Haussler, and Philip M. Long. Characterizations of learnability for classes of {0, …, n}-valued functions. Journal of Computer and System Sciences, 50(1):74–86, 1995. URL https://doi.org/10.1006/jcss.1995.1008.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
- Olivier Bousquet, Steve Hanneke, Shay Moran, and Nikita Zhivotovskiy. Proper learning, Helly number, and an optimal SVM bound. In Conference on Learning Theory, pages 582–609. PMLR, 2020.
- Leo Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
- Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, and Robert E. Schapire. Multiclass boosting and the cost of weak learning. Advances in Neural Information Processing Systems, 34:3057–3067, 2021.
- Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, and Amir Yehudayoff. A characterization of multiclass learnability. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 943–955. IEEE, 2022.
- Nataly Brukhim, Steve Hanneke, and Shay Moran. Improper multiclass boosting. In The Thirty Sixth Annual Conference on Learning Theory, pages 5433–5452. PMLR, 2023.
- Nataly Brukhim, Amit Daniely, Yishay Mansour, and Shay Moran. Multiclass boosting: Simple and intuitive weak learning criteria. Advances in Neural Information Processing Systems, 36, 2024.
- Amit Daniely and Shai Shalev-Shwartz. Optimal learners for multiclass problems. In Conference on Learning Theory, pages 287–316. PMLR, 2014.
- Amit Daniely, Sivan Sabato, and Shai Shalev-Shwartz. Multiclass learning approaches: A theoretical comparison with implications. Advances in Neural Information Processing Systems, 25, 2012.
- Amit Daniely, Sivan Sabato, Shai Ben-David, and Shai Shalev-Shwartz. Multiclass learnability and the ERM principle. Journal of Machine Learning Research, 16:2377–2404, 2015. URL https://dl.acm.org/doi/10.5555/2789272.2912074.
- Steve Hanneke. The optimal sample complexity of PAC learning. Journal of Machine Learning Research, 17(1):1319–1333, 2016.
- David Haussler and Philip M. Long. A generalization of Sauer's lemma. Journal of Combinatorial Theory, Series A, 71(2):219–240, 1995.
- David Haussler, Nick Littlestone, and Manfred K. Warmuth. Predicting {0, 1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.
- Kasper Green Larsen. Bagging is an optimal PAC learner. In Gergely Neu and Lorenzo Rosasco, editors, The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 of Proceedings of Machine Learning Research, pages 450–468. PMLR, 2023. URL https://proceedings.mlr.press/v195/larsen23a.html.
- Balas K. Natarajan. On learning sets and functions. Machine Learning, 4:67–97, 1989. URL https://doi.org/10.1007/BF00114804.
- Balas K. Natarajan. Two new frameworks for learning. In Machine Learning Proceedings 1988, pages 402–415. Elsevier, 1988.
- David Pollard. Convergence of stochastic processes. Springer Science & Business Media, 2012.
- Benjamin I. P. Rubinstein, Peter L. Bartlett, and J. Hyam Rubinstein. Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds. Advances in Neural Information Processing Systems, 19, 2006.
- Mohammad J. Saberian and Nuno Vasconcelos. Multiclass boosting: Theory and algorithms. Advances in Neural Information Processing Systems, 24, 2011.
- Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. Adaptive Computation and Machine Learning. MIT Press, 2012.
- Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- Hans U. Simon. An almost optimal PAC algorithm. In Conference on Learning Theory, pages 1552–1563. PMLR, 2015.
- Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
- Vladimir Vapnik. Estimation of Dependences Based on Empirical Data. Springer Series in Statistics. Springer, 1982.
- Vladimir N. Vapnik and Alexey Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. In Russian.