A globalization of L-BFGS and the Barzilai-Borwein method for nonconvex unconstrained optimization
Abstract: We present a modified limited memory BFGS (L-BFGS) method that converges globally and linearly for nonconvex objective functions. Its distinguishing feature is that it turns into L-BFGS if the iterates cluster at a point near which the objective is strongly convex with Lipschitz gradients, thereby inheriting the outstanding effectiveness of the classical method. These strong convergence guarantees are enabled by a novel form of cautious updating in which, among other features, it is decided anew in each iteration which of the stored pairs are used for updating and which ones are skipped. In particular, this yields the first modification of cautious updating for which all cluster points are stationary while the spectrum of the L-BFGS operator is not permanently restricted, and this holds without Lipschitz continuity of the gradient. In fact, for Wolfe-Powell line searches we show that continuity of the gradient is sufficient for global convergence, a result that extends to other descent methods. Since we allow the memory size to be zero in the globalized L-BFGS method, we also obtain a new globalization of the Barzilai-Borwein spectral gradient (BB) method. The convergence analysis is developed in Hilbert space under comparably weak assumptions and covers Armijo and Wolfe-Powell line searches. We illustrate the theoretical findings with numerical experiments, which indicate that if one of the parameters of the cautious updating is chosen sufficiently small, then the modified method agrees entirely with L-BFGS/BB; this observation is also discussed in the theoretical part. An implementation of the new method is available on arXiv.
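To make the ingredients concrete, the following is a minimal, hypothetical sketch of a BB spectral gradient iteration globalized by an Armijo backtracking line search, where a cautious-style curvature test decides whether the current pair (s, y) is used for the step size or skipped. It is an illustration under stated assumptions, not the paper's algorithm; all names, thresholds, and the fallback rule are choices made here for the example.

```python
import numpy as np

def bb_armijo(f, grad, x0, max_iter=200, tol=1e-8, eps=1e-6,
              sigma=1e-4, beta=0.5, alpha0=1.0):
    """Barzilai-Borwein (BB1) gradient method with Armijo backtracking.

    Illustrative sketch only: the BB step alpha = s^T s / s^T y is adopted
    only if the cautious-style curvature test s^T y >= eps * ||s||^2 holds;
    otherwise the pair is skipped and a default step is used instead.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = -g
        # Armijo backtracking on the step length t, starting from the BB guess
        t = alpha
        fx = f(x)
        while f(x + t * d) > fx + sigma * t * g.dot(d):
            t *= beta
        x_new = x + t * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sty = s.dot(y)
        # cautious-style test: use the pair only if it has enough positive curvature
        if sty >= eps * s.dot(s):
            alpha = s.dot(s) / sty      # BB1 step size
        else:
            alpha = alpha0              # skip the pair, fall back to the default
        x, g = x_new, g_new
    return x
```

On a strongly convex quadratic the curvature test always passes and the iteration reduces to the plain BB method with a monotone safeguard, mirroring the abstract's point that the globalization is inactive near points where the objective is strongly convex.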