
Are Convex Optimization Curves Convex?

Published 13 Mar 2025 in math.OC and cs.LG (arXiv:2503.10138v3)

Abstract: In this paper, we study when we might expect the optimization curve induced by gradient descent to be \emph{convex} -- precluding, for example, an initial plateau followed by a sharp decrease, making it difficult to decide when optimization should stop. Although such undesirable behavior can certainly occur when optimizing general functions, might it also occur in the benign and well-studied case of smooth convex functions? As far as we know, this question has not been tackled in previous work. We show, perhaps surprisingly, that the answer crucially depends on the choice of the step size. In particular, for the range of step sizes which are known to result in monotonic convergence to an optimal value, we characterize a regime where the optimization curve will be provably convex, and a regime where the curve can be non-convex. We also extend our results to gradient flow, and to the closely-related but different question of whether the gradient norm decreases monotonically.

Summary

  • The paper demonstrates that for L-smooth convex functions, gradient descent yields convex optimization curves when the step size is in the range (0, 1/L].
  • It reveals that larger step sizes, specifically in (1.75/L, 2/L), can result in non-convex curves despite a monotonically decreasing objective function.
  • The study extends these insights to continuous gradient flow, confirming convex curves and a steadily decreasing gradient norm, thus improving convergence interpretability.

Are Convex Optimization Curves Convex?

The paper "Are Convex Optimization Curves Convex?" (2503.10138) investigates the conditions under which optimization curves produced by gradient descent on convex functions are themselves convex. Although this problem seems straightforward, the answer is contingent upon the choice of step size.

Gradient Descent and Optimization Curves

Gradient descent is a widely used optimization algorithm for a differentiable function f: ℝ^n → ℝ. Starting from an initial point x_0, its iterates are x_n = x_{n-1} − η∇f(x_{n-1}), where η > 0 is the step size. The optimization curve is the linear interpolation of the points {(n, f(x_n))}, representing the path traced by the objective function values across iterations.
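The iteration above can be sketched in a few lines of Python; the function and constants below reproduce the setup of the paper's Figure 1 (f(x) = x², η = 0.1, x_0 = 3), and the helper name `gradient_descent` is illustrative, not from the paper.

```python
def gradient_descent(grad, x0, eta, n_steps):
    """Return the gradient descent iterates x_0, ..., x_n."""
    xs = [float(x0)]
    for _ in range(n_steps):
        xs.append(xs[-1] - eta * grad(xs[-1]))
    return xs

# f(x) = x^2 is L-smooth with L = 2; its gradient is 2x.
f = lambda x: x ** 2
grad = lambda x: 2.0 * x

xs = gradient_descent(grad, x0=3.0, eta=0.1, n_steps=20)
curve = [f(x) for x in xs]  # the points (n, f(x_n)) of the optimization curve
```

Here the curve decreases strictly at every step, as expected for this step size.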

Convexity of the Optimization Curve

Main Result

The main finding of this paper is that for any L-smooth convex function, if the step size η is chosen in the range (0, 1/L], the optimization curve is indeed convex. This is notable because it identifies conditions, beyond mere monotonic convergence to an optimal value, that guarantee convexity of the curve.

Figure 1: Gradient descent on f(x) = x² with step size η = 0.1 and initial point x_0 = 3.
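The result can be checked numerically: the piecewise-linear optimization curve is convex exactly when all second differences of the value sequence are non-negative. The sketch below runs gradient descent with η = 1/L on an illustrative Huber-type convex function (my choice for a 1-smooth example, not necessarily the paper's); `curve_is_convex` is a hypothetical helper name.

```python
import numpy as np

def curve_is_convex(values, tol=1e-12):
    """The curve through (n, values[n]) is convex iff every second
    difference v[n+1] - 2*v[n] + v[n-1] is non-negative."""
    v = np.asarray(values, dtype=float)
    return bool(np.all(v[2:] - 2.0 * v[1:-1] + v[:-2] >= -tol))

# Huber-type convex function: quadratic near 0, linear tails; L = 1.
def f(x):
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5

def grad(x):
    return x if abs(x) <= 1.0 else (1.0 if x > 0 else -1.0)

eta = 1.0  # eta = 1/L, inside the provably convex regime
x, vals = 5.0, []
for _ in range(30):
    vals.append(f(x))
    x -= eta * grad(x)
```

For this run the second differences are all non-negative, so the curve is convex, consistent with the theorem.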

Non-Convexity in Certain Regimes

For step sizes η within (1.75/L, 2/L), the optimization curve can lose its convexity, even though the function values continue to decrease monotonically. In other words, monotonic decrease of the objective does not by itself guarantee a convex optimization curve: within the monotone range of step sizes there is both a regime where convexity is guaranteed and a regime where it can fail.

Gradient Norms and Monotonic Decrease

Separately, the paper addresses whether the gradient norms {‖∇f(x_n)‖} decrease monotonically. It demonstrates that for L-smooth convex functions and any η ∈ (0, 2/L], the sequence of gradient norms is monotonically decreasing. This highlights a distinction between gradient-norm monotonicity, which holds throughout the monotone step-size range, and optimization-curve convexity, which does not.
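A small numerical check of the gradient-norm claim, on an illustrative diagonal quadratic (the matrix, step size, and starting point are arbitrary choices, not from the paper):

```python
import numpy as np

# f(x) = 0.5 * x^T A x with A = diag(2, 0.5); f is L-smooth with L = 2,
# the largest eigenvalue of A.
A = np.diag([2.0, 0.5])
grad = lambda x: A @ x

L = 2.0
eta = 0.95 * (2.0 / L)  # any step size in (0, 2/L]
x = np.array([3.0, -4.0])
norms = []
for _ in range(25):
    norms.append(float(np.linalg.norm(grad(x))))
    x = x - eta * grad(x)
```

With this step size the first coordinate oscillates in sign, yet the gradient norm still shrinks at every step.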

Extension to Gradient Flow

The findings extend to continuous gradient flow, the limit of gradient descent as the step size approaches zero. In this setting the paper shows that the optimization curve induced by gradient flow is always convex for smooth convex functions, and that the gradient norm decreases monotonically along the trajectory, yielding a smooth convergence curve free of the oscillations that can arise in the discrete setting.
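When f is additionally twice continuously differentiable, the convexity of the gradient-flow curve admits a two-line sketch (twice differentiability is an assumption made here for illustration; the paper's argument may be more general):

```latex
\dot{x}(t) = -\nabla f(x(t)) \quad\Longrightarrow\quad
\frac{d}{dt} f(x(t))
  = \big\langle \nabla f(x(t)),\, \dot{x}(t) \big\rangle
  = -\big\| \nabla f(x(t)) \big\|^2 \;\le\; 0,
```

and differentiating once more,

```latex
\frac{d^2}{dt^2} f(x(t))
  = -2\,\big\langle \nabla^2 f(x(t))\,\dot{x}(t),\, \nabla f(x(t)) \big\rangle
  = 2\, \nabla f(x(t))^{\top} \nabla^2 f(x(t))\, \nabla f(x(t)) \;\ge\; 0,
```

since the Hessian of a convex function is positive semidefinite. Hence t ↦ f(x(t)) is non-increasing and convex, and ‖∇f(x(t))‖² = −(d/dt) f(x(t)) is non-increasing as well.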

Conclusion

This research provides insight into the behavior of gradient descent, emphasizing the influence of the step size on the convexity of the optimization curve. It offers practical guidance for selecting step sizes that preserve desirable curve properties, which matters in settings where the interpretability of convergence behavior is important, such as real-time optimization or dynamically adjusted early-stopping criteria.

Future investigations could explore whether the identified convexity regimes manifest in practical large-scale optimization problems, and whether these results can guide the development of adaptive methods that ensure both convex curves and efficient convergence. The paper also raises the question of whether other algorithms, or more general optimization settings, exhibit analogous behavior or require different step-size considerations.
