
Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

Published 3 Jul 2023 in math.OC, cs.LG, and stat.ML | arXiv:2307.01169v1

Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Łojasiewicz assumption that is faster than the rate for random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either only guarantee trivial progress or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the 1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
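To make the greedy 2-coordinate update concrete: for minimizing a smooth $f$ subject to $\sum_i x_i = c$, the greedy rule picks the coordinates with the largest and smallest partial derivatives and moves mass between them, which leaves the sum unchanged. The sketch below is an illustrative assumption, not the paper's exact method: the function name, the unit step size, and the toy quadratic objective are all choices made here for demonstration (a fixed step is only appropriate for well-scaled problems such as this one).

```python
import numpy as np

def greedy_two_coordinate_step(x, grad_f, step_size):
    """One greedy 2-coordinate update for min f(x) s.t. sum(x) = const.

    Selects the coordinates with the largest and smallest partial
    derivatives and shifts mass between them; the opposite-signed
    updates preserve the summation constraint exactly.
    """
    g = grad_f(x)
    i = int(np.argmax(g))  # largest partial derivative: decrease x[i]
    j = int(np.argmin(g))  # smallest partial derivative: increase x[j]
    delta = step_size * (g[i] - g[j]) / 2.0
    x_new = x.copy()
    x_new[i] -= delta
    x_new[j] += delta
    return x_new

# Toy example: minimize ||x - b||^2 / 2 subject to sum(x) = 0.
# The constrained minimizer is x* = b - mean(b).
b = np.array([1.0, 3.0, -2.0, 0.5])
grad = lambda x: x - b
x = np.zeros_like(b)  # feasible start: sum(x) = 0
for _ in range(200):
    x = greedy_two_coordinate_step(x, grad, step_size=1.0)
```

For this separable quadratic, each step replaces the largest and smallest gradient entries with their average, so the gradient components equalize and `x` converges to `b - b.mean()` while every iterate stays feasible.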
