
MINRES-Based Newton-Type Algorithm

Updated 11 January 2026
  • The algorithm replaces the classical CG solver with MINRES to harness second-order information and detect negative curvature for efficient descent.
  • It employs Lanczos-based Krylov subspace techniques and adaptive Armijo line-search to ensure convergence under both first-order and second-order stationarity.
  • Empirical results show enhanced robustness and reduced Hessian-vector products in large-scale, ill-conditioned nonconvex optimization tasks.

A MINRES-based Newton-type algorithm is an iterative optimization method for unconstrained, smooth (C²) nonconvex problems—minimizing a real-valued function f on ℝⁿ—that replaces the classical conjugate gradient (CG) solver in Newton’s method with the minimal residual (MINRES) algorithm for finding search directions. The MINRES approach possesses a distinctive ability to exploit second-order information, rigorously detect negative curvature (NPC) directions in indefinite Hessians, and guarantee competitive complexity under both first-order and second-order stationarity criteria. Recent advances have unified global and local convergence theory for this class of methods, leveraging regularization and advanced line-search procedures to avoid saddle points and achieve rapid convergence in practical problems (Liu et al., 2022, Zeng et al., 4 Jan 2026, Liu et al., 2022, Roosta et al., 2018, Frye, 2019).

1. Core Principle and Algorithmic Structure

The essential objective is to solve

\min_{x\in\mathbb{R}^d} f(x) \quad \text{with} \quad f\in C^2(\mathbb{R}^d),

where the gradient g(x) = \nabla f(x) and Hessian H(x) = \nabla^2 f(x) are available, possibly only via Hessian-vector products. At each iteration k, the method reformulates the Newton step as a least-squares subproblem: \min_{d \in \mathrm{K}_t(H_k, g_k)} \| H_k d + g_k \|^2, where \mathrm{K}_t(H_k, g_k) denotes the t-th Krylov subspace generated by H_k and g_k. The MINRES algorithm iteratively builds an orthonormal basis for this Krylov subspace via the Lanczos process, solves the associated tridiagonal system using QR updates and Givens rotations, and, depending on stopping conditions, outputs either (1) an approximate Newton direction (small residual norm relative to tolerance), or (2) an NPC direction discovered when the tridiagonal matrix loses positive definiteness (Liu et al., 2022, Liu et al., 2022).
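The Lanczos-based detection step can be sketched in NumPy as follows. This is a minimal illustration, not the full MINRES recurrence: it forms the tridiagonal T_t explicitly and checks its smallest eigenvalue, whereas a production implementation monitors the pivot/QR quantities incrementally; `lanczos_npc_probe` and its argument names are hypothetical.

```python
import numpy as np

def lanczos_npc_probe(hess_vec, g, t_max=50, tol=1e-12):
    """Lanczos process on an implicit symmetric matrix H (accessed only
    through Hessian-vector products), started from the gradient g.
    After each step the tridiagonal T_t is checked: if its smallest
    eigenvalue is nonpositive, the corresponding eigenvector is mapped
    back through the Lanczos basis Q to a direction v with v^T H v <= 0."""
    n = g.size
    Q = np.zeros((n, t_max))
    alphas, betas = [], []
    q_prev = np.zeros(n)
    beta = 0.0
    q = g / np.linalg.norm(g)
    for t in range(t_max):
        Q[:, t] = q
        w = hess_vec(q) - beta * q_prev
        alpha = q @ w
        alphas.append(alpha)
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        lam, Y = np.linalg.eigh(T)          # eigenvalues in ascending order
        if lam[0] <= tol:                   # T_t indefinite: NPC direction found
            v = Q[:, :t + 1] @ Y[:, 0]      # v^T H v = lam[0] <= 0 in exact arithmetic
            return "NPC", v
        w = w - alpha * q
        beta = np.linalg.norm(w)
        betas.append(beta)
        if beta < 1e-10:                    # Krylov subspace exhausted, no NPC seen
            break
        q_prev, q = q, w / beta
    return "SOL", None
```

On an indefinite test matrix the probe returns a certified negative-curvature direction; on a positive definite one it exhausts the Krylov subspace without triggering.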

Upon obtaining a direction, a line-search is performed—typically Armijo-type—tailored to the direction type (SOL: approximate Newton, NPC: negative curvature), with possible forward or backward search to accommodate unbounded step sizes along strictly descending NPC directions. The iterate is then updated as x_{k+1} = x_k + \alpha_k d_k.
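A minimal backward (backtracking) Armijo search of the kind described above might look like this; it is illustrative only, and omits the forward-search branch used along NPC directions:

```python
import numpy as np

def armijo_backtrack(f, x, d, g, rho=1e-4, tau=0.5, alpha0=1.0, max_backtracks=60):
    """Shrink alpha until the sufficient-decrease condition
    f(x + alpha*d) <= f(x) + rho * alpha * g^T d holds; d is assumed
    to be a descent direction (g^T d < 0)."""
    fx, slope = f(x), g @ d
    alpha = alpha0
    for _ in range(max_backtracks):
        if f(x + alpha * d) <= fx + rho * alpha * slope:
            break
        alpha *= tau
    return alpha
```

For example, on f(x) = ||x||² from x = (2,) with the steepest-descent direction, the unit step overshoots and one halving suffices.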

2. Negative Curvature Detection and Regularization

A key feature of MINRES is rigorous, cheap, and automatic NPC detection. As the Lanczos tridiagonalization progresses, the occurrence of a nonpositive principal minor (i.e., loss of definiteness in the tridiagonal matrix T_t) signals negative curvature in H, providing a direction r_{t-1} with r_{t-1}^\top H r_{t-1} \leq 0, which is exploited for descent or escape from saddle points (Liu et al., 2022, Zeng et al., 4 Jan 2026, Liu et al., 2022).

To ensure robustness near saddle points and nonisolated minima, the Hessian may be regularized by adding \zeta_k I, with a sequence \zeta_k \to 0 ensuring that as k \to \infty, the true spectrum is still accurately reflected, but numerical degeneracies and stagnation along nearly flat directions are avoided. The algorithm can dynamically increase \zeta_k when rapid local convergence is required, such as under the Polyak–Łojasiewicz (PL) condition near minima (Zeng et al., 4 Jan 2026).
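In code, the regularized operator is just a shifted Hessian-vector product; the decaying schedule below (zeta_0 / (k+1)) is one illustrative choice, not the schedule prescribed in the cited work:

```python
import numpy as np

def shifted_hvp(hess_vec, zeta):
    """Hessian-vector product for the regularized matrix H + zeta*I."""
    return lambda v: hess_vec(v) + zeta * v

def zeta_schedule(zeta0, k):
    """A decaying regularization sequence zeta_k -> 0 (illustrative)."""
    return zeta0 / (k + 1)
```

The shifted operator is passed to the Krylov solver in place of the raw Hessian-vector product, so no matrix is ever formed.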

3. Theoretical Guarantees and Complexity

First-Order and Second-Order Stationarity

The Newton-MR framework guarantees convergence to approximate first-order stationary points (\|g(x)\| \leq \varepsilon_g) under standard C² and Lipschitz Hessian conditions. For second-order guarantees, the method incorporates an additional random right-hand side in MINRES on a shifted Hessian (H + \frac{1}{2}\varepsilon_H I), yielding points with both a small gradient and a smallest Hessian eigenvalue greater than -\varepsilon_H. Established complexity results are as follows (Liu et al., 2022):

  • First-order (Newton-MR₁): O(\varepsilon_g^{-3/2}) iteration and oracle complexity.
  • Second-order (Newton-MR₂): O(\max\{\varepsilon_g^{-3/2},\, \varepsilon_H^{-3}\}) iterations; operation complexity O(\max\{\varepsilon_g^{-3/2},\, \varepsilon_H^{-7/2}\}), with improvements for benign-saddle problems.
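For small dense problems, the second-order target can be checked directly; the helper below is an illustrative verification routine, not part of the cited algorithms (which avoid forming H explicitly):

```python
import numpy as np

def is_approx_second_order(grad, hess, eps_g=1e-4, eps_H=1e-2):
    """(eps_g, eps_H)-approximate second-order stationarity:
    ||grad|| <= eps_g and lambda_min(hess) > -eps_H (equivalently,
    hess + eps_H * I is positive definite)."""
    if np.linalg.norm(grad) > eps_g:
        return False
    return bool(np.linalg.eigvalsh(hess)[0] > -eps_H)
```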

Convergence Under KL and PL Inequalities

Advanced theoretical results cover convergence under the Kurdyka–Łojasiewicz (KL) property and local PL inequality. Under the KL framework, global convergence to critical points is ensured, and the algorithm provably avoids strict saddles given a mild NPC-detectability condition and properly decaying regularization. Locally around a minimum satisfying PL, fast (superlinear or almost quadratic) convergence of the gradient norm is obtained when regularization parameters are suitably chosen, matching the best-known rates for Newton-type methods (Zeng et al., 4 Jan 2026).
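For reference, the PL inequality in its standard form (with f^* the local minimum value and \mu > 0 the PL constant) reads:

```latex
\|\nabla f(x)\|^{2} \;\ge\; 2\mu\,\bigl(f(x) - f^{*}\bigr)
\qquad \text{for all } x \text{ in a neighborhood of the minimizer set.}
```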

Sketch of Proof Techniques

Analysis combines:

  • Descent Certification: Each direction (SOL or NPC) is proven to be a strict descent direction, guaranteeing reduction of the objective or of a surrogate measure such as the gradient norm.
  • Lower Bounds: Step sizes and direction norms are bounded away from zero by properties of MINRES and the line-search, ensuring nonvanishing progress.
  • Accumulation Point Argument: A classical argument proves that, unless at a stationary point, the aggregate progress exceeds the total objective variation, forcing convergence.
  • KL/PL Arguments: Lyapunov functions and the finite-length property, enforced by the KL or PL inequalities, ensure strong convergence properties often unattainable for general nonconvex problems.

4. MINRES vs. Classical Conjugate Gradient

A direct comparison reveals sharply contrasting properties:

  • Descent and Model Monotonicity: MINRES guarantees that before any negative curvature is found, iterates are strict descent directions for both f and its quadratic model, and the residual norms decrease monotonically (Liu et al., 2022).
  • Curvature Detection: NPC is detected at the precise moment the tridiagonal Lanczos matrix becomes indefinite, often before CG's breakdown occurs.
  • Residual-Based Stopping: Residual decay in MINRES provides a clean, universally applicable inexactness criterion, while CG’s residual can stall or be misleading for indefinite systems.
  • Empirical Performance: On a variety of large-scale test problems and machine learning tasks, MINRES-based Newton methods achieve superior or comparable performance to CG variants, often using 20–50% fewer Hessian-vector products (Liu et al., 2022, Liu et al., 2022).
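The contrast can be seen on a toy symmetric indefinite system: CG's convergence theory requires positive definiteness and does not cover this matrix, while MINRES needs only symmetry. A minimal SciPy sketch:

```python
import numpy as np
from scipy.sparse.linalg import minres

# Symmetric but indefinite: outside CG's positive-definite theory,
# yet perfectly admissible for MINRES.
A = np.diag([3.0, 1.0, -2.0, 5.0])
b = np.ones(4)

x, info = minres(A, b)              # info == 0 signals convergence
residual = np.linalg.norm(A @ x - b)
```

Despite the negative eigenvalue, MINRES drives the residual to the requested tolerance.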

A plausible implication is that MINRES-based Newton algorithms represent a strict generalization of Newton–CG methods, both in inexactness control and in ability to handle indefinite and nearly singular Hessians without ad hoc modifications.

5. Extensions: Bound-Constrained and Active-Set Frameworks

Extension to bound-constrained nonconvex optimization is achieved by embedding MINRES-based Newton directions within active-set schemes. On each face (set of variables free from active constraints), descending directions are obtained by solving the in-face Newton system using MINRES. Escape from faces via Spectral Projected Gradient or cubic regularization ensures robust progress even when free variables are exhausted. The result is optimal or near-optimal worst-case complexity for both first- and second-order stationarity, with numerical experiments confirming superior robustness and speed to CG-based analogues (Birgin et al., 28 Aug 2025).
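One way to realize the face-wise solve is to restrict the Newton system to the free variables; this is a simplified sketch (the cited active-set method adds safeguards, NPC handling, and face-escape steps omitted here, and `in_face_direction` is a hypothetical name):

```python
import numpy as np
from scipy.sparse.linalg import minres

def in_face_direction(H, g, free):
    """Newton-type direction on the current face: solve
    H[free, free] d_free = -g[free] with MINRES and embed the result
    back into R^n (zeros on the active/fixed variables)."""
    d = np.zeros_like(g)
    d_free, _ = minres(H[np.ix_(free, free)], -g[free])
    d[free] = d_free
    return d
```

Variables on active bounds receive a zero component, so the step stays within the current face.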

Algorithmic Domain           | Key Role of MINRES                        | Empirical Benefit
-----------------------------|-------------------------------------------|------------------------------------
Unconstrained nonconvex      | NPC detection, fast Hessian-vector solves | Fewer oracle calls, saddle escape
Bound-constrained/active-set | Face-wise Newton solves                   | Increased robustness, global rates
Large-scale machine learning | Hessian-free computation                  | Accelerated convergence

6. Practical Properties and Numerical Results

Practical studies on CUTEst, deep auto-encoders (CIFAR-10, MNIST), and regularized nonlinear least-squares (Gisette, STL-10) universally report that MINRES-based Newton-type algorithms exhibit:

  • Rapid convergence to high-accuracy solutions (\|g\| \leq 10^{-10}), unattainable by pure first-order methods or CG variants that neglect NPC.
  • Fast escape from saddle points, facilitated by automatic exploitation of NPC directions.
  • Superior robustness on ill-conditioned and highly nonconvex instances: for example, Newton-MR attains the lowest objective values and uses significantly fewer Hessian-vector products and wall-clock time compared to L-BFGS, Newton-CR, and Newton-CG (Liu et al., 2022, Zeng et al., 4 Jan 2026, Roosta et al., 2018).

Empirical speedups of 20–30% over CG-based active set methods (e.g., Gencan) have been observed, primarily due to graceful handling of indefinite or nearly singular Hessians (Birgin et al., 28 Aug 2025). Only NPC-aware methods (Newton-MR, L-BFGS-MR) reliably reach the tightest gradient-norm tolerances; CG- or pure quasi-Newton variants often stall or fail once indefinite Hessians are encountered.

7. Limitations and Research Directions

Current limitations of MINRES-based Newton-type algorithms include:

  • The absence of explicit global iteration bounds for some variants (only asymptotic rates under KL analysis).
  • The necessity of tuning sequences for inexactness and regularization.
  • The need for further extensions to stochastic, large-scale, or non-Euclidean settings.

Emerging research directions include adaptation to alternative Krylov subspace solvers (such as CG and CR with improved curvature detection), refined nonasymptotic complexity results paralleling those for inexact Newton-MR (Roosta et al., 2018), and stochastic-quadratic frameworks for high-dimensional learning tasks (Zeng et al., 4 Jan 2026, Liu et al., 2022). These developments are actively pursued by optimization and machine learning communities.


For comprehensive technical depth, see (Liu et al., 2022, Liu et al., 2022, Zeng et al., 4 Jan 2026, Birgin et al., 28 Aug 2025, Roosta et al., 2018, Frye, 2019).
