A novel interpretation of Nesterov's acceleration via variable step-size linear multistep methods
Abstract: Nesterov's acceleration in continuous optimization can be understood in a novel way when Nesterov's accelerated gradient (NAG) method is viewed as a linear multistep (LM) method applied to the gradient flow. Whereas the NAG method for strongly convex functions (NAG-sc) has been fully discussed from this viewpoint, the NAG method for $L$-smooth convex functions (NAG-c) has not. To fill this gap, we show that the existing NAG-c method can be interpreted as a variable step-size LM (VLM) method for the gradient flow. Surprisingly, the VLM admits linearly increasing step sizes, which explains the acceleration in the convex case. We then introduce a novel technique for analyzing the absolute stability of VLMs and prove that NAG-c is optimal within a certain natural class of VLMs. Finally, we construct a new, broader class of VLMs by optimizing the parameters of the VLM for ill-conditioned problems. Numerical experiments show that the proposed method outperforms NAG-c in ill-conditioned cases. These results suggest that the numerical-analysis perspective on NAG is a promising framework, and that exploring broader classes of VLMs may reveal further novel methods.
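As a concrete point of reference for the abstract, the sketch below shows the classical NAG-c iteration (the $O(1/k^2)$ method of Nesterov, 1983) applied to an ill-conditioned quadratic. This is a minimal illustration, not the paper's implementation or its proposed new VLM: the quadratic objective, condition number, and iteration count are illustrative assumptions, and the momentum coefficient $(k-1)/(k+2)$ is the standard textbook choice.

```python
import numpy as np

# Minimal sketch of the classical NAG-c iteration on an ill-conditioned
# quadratic f(x) = 0.5 * x^T A x with diagonal A. The paper's interpretation
# reads these iterates as a variable step-size linear multistep (VLM)
# discretization of the gradient flow x'(t) = -grad f(x(t)).

eigs = np.logspace(-4, 0, 50)      # eigenvalues of A; condition number 1e4 (illustrative)
grad_f = lambda x: eigs * x        # gradient of f(x) = 0.5 * sum(eigs * x**2)
L = eigs.max()                     # smoothness constant = largest eigenvalue

x_prev = x = np.ones_like(eigs)    # x_0
for k in range(1, 501):
    # Extrapolation (momentum) step; (k - 1) / (k + 2) is the classical choice.
    y = x + (k - 1) / (k + 2) * (x - x_prev)
    # Gradient step with fixed step size 1/L taken at the extrapolated point.
    x_prev, x = x, y - grad_f(y) / L

f_val = 0.5 * np.sum(eigs * x**2)
print(f"f(x_500) = {f_val:.3e}")   # decays at the accelerated O(1/k^2) rate
```

In the abstract's reading, the coefficient $(k-1)/(k+2)$ encodes step sizes of the underlying VLM that grow linearly in $k$, which is identified as the source of acceleration in the convex case.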