
Are Convex Optimization Curves Convex?

Published 13 Mar 2025 in math.OC and cs.LG (arXiv:2503.10138v3)

Abstract: In this paper, we study when we might expect the optimization curve induced by gradient descent to be \emph{convex} -- precluding, for example, an initial plateau followed by a sharp decrease, making it difficult to decide when optimization should stop. Although such undesirable behavior can certainly occur when optimizing general functions, might it also occur in the benign and well-studied case of smooth convex functions? As far as we know, this question has not been tackled in previous work. We show, perhaps surprisingly, that the answer crucially depends on the choice of the step size. In particular, for the range of step sizes which are known to result in monotonic convergence to an optimal value, we characterize a regime where the optimization curve will be provably convex, and a regime where the curve can be non-convex. We also extend our results to gradient flow, and to the closely-related but different question of whether the gradient norm decreases monotonically.

Summary

  • The paper demonstrates that for L-smooth convex functions, gradient descent yields convex optimization curves when the step size is in the range (0, 1/L].
  • It reveals that larger step sizes, specifically in (1.75/L, 2/L), can result in non-convex curves despite a monotonically decreasing objective function.
  • The study extends these insights to continuous gradient flow, confirming convex curves and a steadily decreasing gradient norm, thus improving convergence interpretability.

Are Convex Optimization Curves Convex?

The paper "Are Convex Optimization Curves Convex?" (2503.10138) investigates the conditions under which optimization curves produced by gradient descent on convex functions are themselves convex. Although this problem seems straightforward, the answer is contingent upon the choice of step size.

Gradient Descent and Optimization Curves

Gradient descent is a widely used optimization algorithm for a differentiable function f: ℝ^n → ℝ. Starting from an initial point x_0, its iterates are x_n = x_{n-1} − η∇f(x_{n-1}), where η > 0 is the step size. The optimization curve is the linear interpolation of the points {(n, f(x_n))}, representing the path traced by the objective function values across iterations.
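The iteration above can be sketched in a few lines of Python; the function and constants below reproduce the setup of the paper's Figure 1 (f(x) = x², η = 0.1, x_0 = 3), and the helper name `gradient_descent` is illustrative, not from the paper.

```python
def gradient_descent(grad, x0, eta, n_steps):
    """Return the gradient descent iterates x_0, ..., x_n."""
    xs = [float(x0)]
    for _ in range(n_steps):
        xs.append(xs[-1] - eta * grad(xs[-1]))
    return xs

# f(x) = x^2 is L-smooth with L = 2; its gradient is 2x.
f = lambda x: x ** 2
grad = lambda x: 2.0 * x

xs = gradient_descent(grad, x0=3.0, eta=0.1, n_steps=20)
curve = [f(x) for x in xs]  # the points (n, f(x_n)) of the optimization curve
```

Here the curve decreases strictly at every step, as expected for this step size.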

Convexity of the Optimization Curve

Main Result

The main finding of this paper is that for any L-smooth convex function, if the step size η is chosen in the range (0, 1/L], the optimization curve is indeed convex. This is notable because it identifies conditions, beyond mere monotonic convergence to an optimal value, that guarantee convexity of the curve.

Figure 1: Gradient descent on f(x) = x² with step size η = 0.1 and initial point x_0 = 3.
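The result can be checked numerically: the piecewise-linear optimization curve is convex exactly when all second differences of the value sequence are non-negative. The sketch below runs gradient descent with η = 1/L on an illustrative Huber-type convex function (my choice for a 1-smooth example, not necessarily the paper's); `curve_is_convex` is a hypothetical helper name.

```python
import numpy as np

def curve_is_convex(values, tol=1e-12):
    """The curve through (n, values[n]) is convex iff every second
    difference v[n+1] - 2*v[n] + v[n-1] is non-negative."""
    v = np.asarray(values, dtype=float)
    return bool(np.all(v[2:] - 2.0 * v[1:-1] + v[:-2] >= -tol))

# Huber-type convex function: quadratic near 0, linear tails; L = 1.
def f(x):
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5

def grad(x):
    return x if abs(x) <= 1.0 else (1.0 if x > 0 else -1.0)

eta = 1.0  # eta = 1/L, inside the provably convex regime
x, vals = 5.0, []
for _ in range(30):
    vals.append(f(x))
    x -= eta * grad(x)
```

For this run the second differences are all non-negative, so the curve is convex, consistent with the theorem.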

Non-Convexity in Certain Regimes

For step sizes η within (1.75/L, 2/L), the optimization curve can lose its convexity, even though the function values continue to decrease monotonically. In other words, monotonic decrease of the objective does not by itself guarantee a convex optimization curve: within the monotone range of step sizes there is both a regime where convexity is guaranteed and a regime where it can fail.

Gradient Norms and Monotonic Decrease

Separately, the paper addresses whether the gradient norms {‖∇f(x_n)‖} decrease monotonically. It demonstrates that for L-smooth convex functions and any η ∈ (0, 2/L], the sequence of gradient norms is monotonically decreasing. This highlights a distinction between gradient-norm monotonicity, which holds throughout the monotone step-size range, and optimization-curve convexity, which does not.
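A small numerical check of the gradient-norm claim, on an illustrative diagonal quadratic (the matrix, step size, and starting point are arbitrary choices, not from the paper):

```python
import numpy as np

# f(x) = 0.5 * x^T A x with A = diag(2, 0.5); f is L-smooth with L = 2,
# the largest eigenvalue of A.
A = np.diag([2.0, 0.5])
grad = lambda x: A @ x

L = 2.0
eta = 0.95 * (2.0 / L)  # any step size in (0, 2/L]
x = np.array([3.0, -4.0])
norms = []
for _ in range(25):
    norms.append(float(np.linalg.norm(grad(x))))
    x = x - eta * grad(x)
```

With this step size the first coordinate oscillates in sign, yet the gradient norm still shrinks at every step.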

Extension to Gradient Flow

The findings extend to continuous gradient flow, the limit of gradient descent as the step size approaches zero. In this setting the paper shows that the optimization curve induced by gradient flow is always convex for smooth convex functions, and that the gradient norm decreases monotonically along the trajectory, yielding a smooth convergence curve free of the oscillations that can arise in the discrete setting.
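When f is additionally twice continuously differentiable, the convexity of the gradient-flow curve admits a two-line sketch (twice differentiability is an assumption made here for illustration; the paper's argument may be more general):

```latex
\dot{x}(t) = -\nabla f(x(t)) \quad\Longrightarrow\quad
\frac{d}{dt} f(x(t))
  = \big\langle \nabla f(x(t)),\, \dot{x}(t) \big\rangle
  = -\big\| \nabla f(x(t)) \big\|^2 \;\le\; 0,
```

and differentiating once more,

```latex
\frac{d^2}{dt^2} f(x(t))
  = -2\,\big\langle \nabla^2 f(x(t))\,\dot{x}(t),\, \nabla f(x(t)) \big\rangle
  = 2\, \nabla f(x(t))^{\top} \nabla^2 f(x(t))\, \nabla f(x(t)) \;\ge\; 0,
```

since the Hessian of a convex function is positive semidefinite. Hence t ↦ f(x(t)) is non-increasing and convex, and ‖∇f(x(t))‖² = −(d/dt) f(x(t)) is non-increasing as well.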

Conclusion

This research provides insight into the behavior of gradient descent, emphasizing the influence of the step size on the convexity of the optimization curve. It offers practical guidance for selecting step sizes that preserve desirable curve properties, which matters in settings where the interpretability of convergence behavior is important, such as real-time optimization or dynamically adjusted early-stopping criteria.

Future investigations could explore whether the identified convexity regimes manifest in practical large-scale optimization problems, and whether these results can guide the development of adaptive methods that ensure both convex curves and efficient convergence. The paper also raises the question of whether other algorithms, or more general optimization settings, exhibit analogous behavior or require different step-size considerations.
