How to Boost Any Loss Function
Abstract: Boosting is a highly successful, ML-born optimization setting in which one is required to learn arbitrarily good models in a computationally efficient way, given access to a weak learner oracle that provides classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first-order information about a loss; yet, over its decades-long history, boosting has quickly evolved into a first-order optimization setting, sometimes even being wrongly defined as such. Owing to recent progress extending gradient-based optimization to use only a loss' zeroth ($0^{th}$) order information to learn, this raises the question: which loss functions can be efficiently optimized with boosting, and what information is really needed for boosting to meet the original blueprint's requirements? We provide a constructive formal answer essentially showing that any loss function can be optimized with boosting, so that boosting achieves a feat not yet known to be possible in the classical $0^{th}$-order setting: the loss functions are not required to be convex, differentiable, or Lipschitz, and in fact are not required to be continuous either. Some of the tools we use are rooted in quantum calculus, the mathematical field (not to be confused with quantum computation) that studies calculus without passing to the limit, and thus without using first-order information.
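To make the quantum-calculus remark concrete, here is a minimal Python sketch (an illustration, not the paper's algorithm) of the core idea: the derivative $f'(x)$, which requires passing to a limit, is replaced by a secant slope $(f(b)-f(a))/(b-a)$ that queries only zeroth-order values of the loss, and hence remains well defined for losses that are non-convex, non-differentiable, or even discontinuous, such as the 0/1 loss. The names `secant_slope` and `zero_one` are illustrative, not from the paper.

```python
# A limit-free substitute for first-order information: the slope of the
# chord of `loss` between two distinct query points. Only zeroth-order
# (function value) access to the loss is needed.
def secant_slope(loss, a, b):
    """Chord slope of `loss` between a and b; no limit is taken."""
    assert a != b, "secant slope needs two distinct query points"
    return (loss(b) - loss(a)) / (b - a)

# The 0/1 loss on a margin z = y * h(x): discontinuous at z = 0, so it has
# no derivative there, yet its secant slopes exist whenever a != b.
def zero_one(z):
    return 0.0 if z > 0 else 1.0

print(secant_slope(zero_one, -0.5, 0.5))   # -1.0: a usable "descent" signal
print(secant_slope(zero_one, 0.25, 0.75))  #  0.0: flat region, no signal
```

Note how the first call extracts a nonzero slope across the discontinuity at $z=0$, which is exactly the kind of information a limit-based derivative cannot supply there.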