Nonmonotone Line-Search Framework
- Nonmonotone Line-Search Framework is a globalization strategy for numerical optimization that relaxes strict descent rules by using reference values from past iterates to enable temporary merit-function increases.
- It employs adaptive acceptance conditions with memory and averaging schemes to improve performance in nonconvex, nonsmooth, and multiobjective optimization tasks.
- The framework guarantees convergence and optimal complexity while facilitating faster escape from local minima and enhancing overall numerical efficiency.
A nonmonotone line-search framework is an algorithmic globalization strategy in numerical optimization that relaxes the requirement of monotonic decrease in the merit function value at each iteration. Instead, some controllable increase in the function is allowed, subject to a prescribed nonmonotonicity tolerance, often enforced through a reference or aggregate value computed from past iterates. Such frameworks are central to modern large-scale optimization, particularly for nonconvex, nonsmooth, or multiobjective problems, and have been shown to enable more aggressive progress, escape from shallow local minima, and enhanced numerical performance while retaining provable convergence and complexity guarantees.
1. Foundational Principles and Acceptance Conditions
The defining feature of nonmonotone line-search algorithms is the replacement of the strict monotonicity requirement (e.g., $f(x_{k+1}) < f(x_k)$) with a relaxed sufficient decrease condition against a reference value derived from a window or an average of past function values. The general template is to accept a trial step if
$$f(x_k + \alpha d_k) \le R_k + \rho\, \alpha\, \nabla f(x_k)^\top d_k + \nu_k,$$
where $R_k$ is a nonmonotone reference term (e.g., max, weighted mean, or convex combination of recent values $f(x_{k-j})$), $\rho \in (0,1)$ is the Armijo parameter, $\alpha$ is the trial step size, $d_k$ the search direction, and $\nu_k \ge 0$ a relaxation parameter. The classical Grippo–Lampariello–Lucidi (GLL) rule sets $R_k = \max_{0 \le j \le \min(k,M)} f(x_{k-j})$; the Zhang–Hager rule forms a convex average with exponentially decaying weights, $R_{k+1} = (\eta_k Q_k R_k + f(x_{k+1}))/Q_{k+1}$ with $Q_{k+1} = \eta_k Q_k + 1$ (Grapiglia et al., 2019). A variety of memory and averaging strategies have been developed, including Metropolis-inspired and summable relaxation terms (Aminifard et al., 26 Feb 2025, Pinheiro et al., 2024).
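The two classical reference rules above can be sketched directly; a minimal Python illustration (class names and the memory/decay defaults are illustrative choices, not tied to any cited implementation):

```python
# Sketch of two standard nonmonotone reference rules. Names and default
# parameters are illustrative, not from any specific library API.

from collections import deque

class GLLReference:
    """Grippo-Lampariello-Lucidi: max over a sliding window of past f-values."""
    def __init__(self, memory=10):
        self.window = deque(maxlen=memory)   # keeps only the last `memory` values

    def update(self, f_new):
        self.window.append(f_new)

    def value(self):
        return max(self.window)              # R_k = max of recent f-values

class ZhangHagerReference:
    """Zhang-Hager: exponentially weighted convex average of past f-values."""
    def __init__(self, eta=0.85):
        self.eta = eta    # decay parameter in [0, 1]
        self.Q = 0.0      # accumulated weight Q_k
        self.R = 0.0      # current averaged reference R_k

    def update(self, f_new):
        Q_next = self.eta * self.Q + 1.0
        self.R = (self.eta * self.Q * self.R + f_new) / Q_next
        self.Q = Q_next

    def value(self):
        return self.R
```

With `eta = 0` the Zhang–Hager average degenerates to the monotone Armijo reference $R_k = f(x_k)$, while `eta` close to 1 gives long memory; the GLL window behaves analogously through its `memory` length.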
This relaxation enables the method to tolerate temporary increases in the objective, provided that overall progress (possibly measured through an aggregate merit function or Lyapunov value) trends downward. The same principle extends to nonsmooth, stochastic, subgradient, and manifold settings, with appropriate modifications for stationarity measures and projections (Aragón-Artacho et al., 22 Oct 2025, Oviedo et al., 2017, Galli et al., 2023).
2. Algorithmic Structures and Prototypical Schemes
Nonmonotone line-search frameworks share a common structure:
- Direction Selection: A descent or sufficient-descent direction is computed (e.g., negative gradient, subgradient, Newton/quasi-Newton update, proximal-gradient, projected/minimum-norm direction in constrained or manifold contexts).
- Trial Step and Reference Evaluation: The algorithm builds trial iterates according to $x_k + \alpha d_k$ or, more generally, a problem-tailored update, and evaluates the merit function against the nonmonotone reference value.
- Nonmonotone Acceptance: Using a line search based on the reference value $R_k$, the method accepts the trial step as soon as the nonmonotone sufficient-decrease condition is met.
- Reference Update: $R_{k+1}$ is recomputed based on a prescribed rule (e.g., moving window, weighted average, or adaptive recursion).
- Auxiliary Mechanisms: In many variants, spectral step-sizes (Barzilai–Borwein), projection operators, extrapolation, or memory/weight adjustments are integrated.
A minimal pseudocode template:
```
for k = 0, 1, ...
    compute direction d_k (satisfying descent or related conditions)
    α ← initial trial step
    while not accepted:
        x_trial ← x_k + α d_k
        compute R_k (nonmonotone reference)
        if f(x_trial) ≤ R_k + ρ α ∇f(x_k)^T d_k + ν_k:
            x_{k+1} ← x_trial; break
        else:
            α ← α · backtracking_factor
    update reference R_{k+1}
```
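This template can be made concrete; a minimal runnable Python sketch using steepest descent with a GLL-style max-window reference (parameter values and the test function are illustrative choices, not taken from the cited works):

```python
# Nonmonotone Armijo backtracking with a GLL-style max-window reference,
# applied to steepest descent on a simple smooth nonconvex test function.
# All parameter defaults are illustrative.

import numpy as np
from collections import deque

def nonmonotone_descent(f, grad, x0, memory=10, rho=1e-4, nu=0.0,
                        alpha0=1.0, shrink=0.5, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    history = deque([f(x)], maxlen=memory)   # window of past f-values
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = -g                               # steepest-descent direction
        R = max(history)                     # GLL nonmonotone reference R_k
        alpha = alpha0
        # Backtrack until the relaxed sufficient-decrease test holds.
        while f(x + alpha * d) > R + rho * alpha * g.dot(d) + nu:
            alpha *= shrink
        x = x + alpha * d
        history.append(f(x))
    return x

# Example: f(x) = x1^4 - 2 x1^2 + x2^2, with minimizers at (±1, 0).
f = lambda x: x[0]**4 - 2*x[0]**2 + x[1]**2
grad = lambda x: np.array([4*x[0]**3 - 4*x[0], 2*x[1]])
x_star = nonmonotone_descent(f, grad, np.array([2.0, 1.0]))
```

Because the reference is the window maximum rather than $f(x_k)$, the inner loop can accept steps that temporarily increase the objective, which in practice allows larger accepted step sizes than the monotone Armijo rule.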
3. Theoretical Convergence, Complexity, and KL Rates
Under classical smoothness and boundedness assumptions, such as Lipschitz continuity of the gradient, coercivity or bounded level sets, and summability or decay of the nonmonotonicity parameters, it is established that nonmonotone frameworks achieve global convergence to first-order stationary points with worst-case complexity $O(\epsilon^{-2})$ iterations to attain $\|\nabla f(x_k)\| \le \epsilon$ (Grapiglia et al., 2019, Marchi, 2022, Qian et al., 2024).
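A sketch of the standard counting argument behind this bound, under the assumptions above with gradient directions $d_k = -\nabla f(x_k)$ and a backtracking lower bound $\alpha_k \ge c/L$ (the constants $c$ and $\rho$ are the usual backtracking and Armijo parameters; this is an illustrative outline, not a full proof):

```latex
% Summing the accepted-step inequality
%   f(x_{k+1}) \le R_k + \rho\,\alpha_k \nabla f(x_k)^\top d_k + \nu_k
% over k = 0, \dots, K-1, using that R_k is dominated by past function values
% and that \sum_k \nu_k < \infty, gives
\sum_{k=0}^{K-1} \frac{c\,\rho}{L}\,\|\nabla f(x_k)\|^2
  \;\le\; f(x_0) - f_{\inf} + \sum_{k=0}^{\infty} \nu_k \;=:\; C < \infty,
\qquad\text{so}\qquad
\min_{0 \le k < K} \|\nabla f(x_k)\| \;\le\; \sqrt{\frac{L\,C}{c\,\rho\,K}}.
% Hence K = O(\epsilon^{-2}) iterations suffice to reach
% \min_k \|\nabla f(x_k)\| \le \epsilon.
```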
For Kurdyka–Łojasiewicz (KL) functions, more refined results can be established: if the KL exponent at the critical point is $\theta \in (0,1)$, then
- For $\theta \in (0, 1/2]$, the convergence of $\{f(x_k)\}$ is R-linear (geometric).
- For $\theta \in (1/2, 1)$, the rate is sublinear, with $f(x_k) - f^\ast = O(k^{-1/(2\theta - 1)})$ (Qian et al., 2022, Qian et al., 2024, Yang et al., 27 Nov 2025). Auxiliary relative-error conditions remain necessary for nonsmooth and nonconvex models to obtain full-sequence convergence.
For multiobjective and Hölder-gradient problems, iteration complexity can scale on the order $O(\epsilon^{-(1+1/\theta_\min)})$, where $\theta_\min$ is the minimum smoothness exponent among objectives (Pinheiro et al., 2024).
The convergence of the merit/reference function is generally ensured, and in most frameworks, every cluster point is either first-order stationary, Pareto-stationary, or Clarke (nonsmooth) stationary with respect to the problem structure (Aragón-Artacho et al., 22 Oct 2025, Marchi, 2022, Ma et al., 15 Jan 2026).
4. Practical Mechanisms: Spectral Steps, Averaging, and Adaptive Memory
Nonmonotone line searches are often paired with acceleration or adaptivity techniques:
- Spectral Stepsizes (Barzilai–Borwein): At each iteration, compute BB1 or BB2 stepsize choices based on previous differences and safeguard the step size. This exploits local curvature information and enables substantial practical speedups, particularly when paired with nonmonotonicity (Hazaimah, 5 Jan 2025, Marchi, 2022, Ma et al., 15 Jan 2026).
- Averaged/Window Reference Values: Both short- and long-memory strategies are used, from moving max windows (GLL) to exponentially weighted or convex averages (Zhang–Hager, Metropolis-type, combined forms), with weighting or temperature-type parameters controlling the decay (Grapiglia et al., 2019, Aminifard et al., 26 Feb 2025, Ahookhosh et al., 2014).
- Adaptive Memory and Self-Tuning: Self-adaptive methods adjust the memory window or extrapolation parameter based on acceptance/rejection in the line-search, enhancing robustness and aggression in nonconvex/nonsmooth regimes (AragĂłn-Artacho et al., 22 Oct 2025, Yang et al., 27 Nov 2025).
- Composite and Subsampled Variants: Proximal, subgradient, projected, and manifold-based variants are seamlessly integrated, with line-search conditions customized for structure (e.g., composite DC models, spectral-projected subgradient, bound constraints, Riemannian settings) (Marchi, 2022, Jerinkić et al., 2022, Ma et al., 15 Jan 2026, Oviedo et al., 2017).
- Extrapolation: Acceleration can be achieved by FISTA-type extrapolation in the trial step, combined with nonmonotone tests on suitable Lyapunov/potential functions (Yang et al., 27 Nov 2025, Qian et al., 2022).
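As an illustration of the spectral-stepsize mechanism above, a minimal Python sketch of safeguarded BB1/BB2 stepsizes (the safeguard interval and the fallback value are illustrative choices):

```python
# Safeguarded Barzilai-Borwein (spectral) stepsizes, as commonly paired with
# nonmonotone line searches. Safeguard bounds are illustrative.

import numpy as np

def bb_stepsize(s, y, variant="bb1", alpha_min=1e-10, alpha_max=1e10):
    """Spectral stepsize from s = x_k - x_{k-1} and y = grad_k - grad_{k-1}."""
    sy = s.dot(y)
    if sy <= 0:
        # Curvature test failed (nonconvex region): fall back to the cap.
        return alpha_max
    if variant == "bb1":
        alpha = s.dot(s) / sy        # BB1: <s, s> / <s, y>
    else:
        alpha = sy / y.dot(y)        # BB2: <s, y> / <y, y>
    # Safeguard the stepsize into [alpha_min, alpha_max].
    return min(max(alpha, alpha_min), alpha_max)
```

The safeguarded BB value is then used as the initial trial step in the nonmonotone backtracking loop, rather than as a fixed stepsize, so the acceptance test still governs global behavior.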
5. Applications and Empirical Performance
Nonmonotone line-search frameworks are broadly applied:
- Unconstrained and bound-constrained nonlinear optimization (smooth and nonsmooth): Nonmonotone Armijo-type strategies are integrated with Newton, quasi-Newton, and Barzilai–Borwein gradient methods, with global and superlinear convergence (Ahookhosh et al., 2014, Burdakov et al., 2017, Bellavia et al., 2018).
- Composite optimization, including dictionary learning, sparse regression, and piecewise penalty problems: Nonmonotonicity enables larger step sizes in proximal-gradient and alternating-minimization algorithms, especially when extrapolation is employed (Marchi, 2022, Yang et al., 27 Nov 2025, Themelis et al., 2016, Qian et al., 2022).
- Stochastic and online learning: Relaxed nonmonotonicity allows aggressive learning rates in stochastic gradient frameworks, improving epoch-wise convergence and generalization in deep learning (Galli et al., 2023).
- Multiobjective optimization: Nonmonotone extensions admit temporary increases in selected objectives, enabling faster expansion of the Pareto front while retaining global convergence (Pinheiro et al., 2024, Mansueto, 2 Sep 2025).
- PDE-constrained control, variational inclusions, DC programs, and manifold optimization: Variants are constructed for infinite-dimensional or structured spaces, exploiting projections or retractions and customized stationarity measures (Azmi et al., 2023, Hazaimah, 5 Jan 2025, Oviedo et al., 2017).
- Empirical evaluation: Across diverse benchmarks (global optimization testbeds, large-scale imaging, PDE-constrained control, high-dimensional classification), nonmonotone line-search schemes outperform or match their monotone analogues in iteration count, function/gradient calls, and wall-clock time (Ahookhosh et al., 2014, Aminifard et al., 26 Feb 2025, Marchi, 2022).
6. Representative Complexity and Convergence Tables
Below is a summary table of representative nonmonotone reference strategies and their theoretical guarantees, as established in cited works:
| Rule Type | Reference Update | Complexity / Convergence |
|---|---|---|
| GLL window | $R_k = \max_{0 \le j \le \min(k,M)} f(x_{k-j})$ | $O(\epsilon^{-2})$ (Grapiglia et al., 2019) |
| Zhang–Hager average | $R_{k+1} = (\eta_k Q_k R_k + f(x_{k+1}))/Q_{k+1}$, $Q_{k+1} = \eta_k Q_k + 1$ | $O(\epsilon^{-2})$, full-sequence convergence under KL (Qian et al., 2024) |
| Metropolis-inspired | acceptance relaxation with temperature-type decay | global convergence if strong nonmonotonicity is suitably controlled (Aminifard et al., 26 Feb 2025, Pinheiro et al., 2024) |
| Summable relaxation | relaxation terms $\nu_k \ge 0$ with $\sum_k \nu_k < \infty$ | global convergence to stationarity (Bellavia et al., 2018, Burdakov et al., 2017) |
| Averaged Lyapunov | reference built from an averaged merit/Lyapunov value | KL-based linear/sublinear rates (Marchi, 2022, Yang et al., 27 Nov 2025) |
Selection and tuning of memory/relaxation is problem dependent: larger windows or more aggressive relaxation favor exploration, while vanishing (or summable) parameters restore monotonicity in the limit and ensure theoretical guarantees.
7. Significance, Trends, and Extensions
Nonmonotone line-search frameworks have unified and extended global convergence analyses across smooth, nonsmooth, nonconvex, stochastic, multiobjective, and manifold optimization. They provide robust globalization tools for acceleration methods, proximal and subgradient variants, and Newton/quasi-Newton updates in settings where strict monotonicity is practically and theoretically restrictive.
Trends include integration with KL-analysis, adaptive parameter/memory schemes, stochastic or subsampled variants for data-driven models, and structure-preserving extensions for PDEs, variational inequalities, and matrix manifolds (Yang et al., 27 Nov 2025, Ma et al., 15 Jan 2026, Oviedo et al., 2017). The flexibility of nonmonotone globalization is key in large-scale, high-dimensional, and nonconvex settings, as evidenced by comprehensive recent complexity analyses and extensive empirical validation (Aminifard et al., 26 Feb 2025, Marchi, 2022, Bellavia et al., 2018, Aragón-Artacho et al., 22 Oct 2025, Galli et al., 2023).
References
- (Oviedo et al., 2017, Frau et al., 2018, Aminifard et al., 26 Feb 2025, Marchi, 2022, Hazaimah, 5 Jan 2025, Mansueto, 2 Sep 2025, Themelis et al., 2016, Galli et al., 2023, Pinheiro et al., 2024, Qian et al., 2022, Aragón-Artacho et al., 22 Oct 2025, Bellavia et al., 2018, Ma et al., 15 Jan 2026, Grapiglia et al., 2019, Azmi et al., 2023, Ahookhosh et al., 2014, Yang et al., 27 Nov 2025, Qian et al., 2024, Burdakov et al., 2017)