Derivative-Free Saddle-Search Algorithm
- The algorithm finds index‑k saddle points using only function evaluations, bypassing the need for gradients or Hessians in high-dimensional settings.
- It employs stochastic approximations and Monte Carlo estimators to recover unstable eigenspaces and drive a two-loop iterative process for saddle-point discovery.
- The method achieves provable linear convergence under proper parameter tuning, making it highly effective for black-box simulations and complex optimization landscapes.
A derivative-free saddle-search algorithm refers to any computational scheme that finds index-$k$ saddle points of a smooth objective (i.e., stationary points with exactly $k$ unstable directions) using only function evaluations, not explicit gradients or Hessians. Such approaches are particularly valuable for high-dimensional physical landscapes, black-box simulation models, or machine learning loss surfaces where derivatives are unavailable or unreliable. Recent algorithms combine stochastic approximation, Monte Carlo estimators, and nested iterative architectures to achieve provable convergence and practical efficiency in saddle-point discovery while maintaining strictly zeroth-order (function-evaluation-only) operations (Du et al., 6 Jan 2026).
1. Formulation and Notational Framework
Let $f : \mathbb{R}^d \to \mathbb{R}$ be $L$-smooth. An index-$k$ saddle $x^*$ satisfies $\nabla f(x^*) = 0$, and $\nabla^2 f(x^*)$ has precisely $k$ negative eigenvalues. In chemical physics, the index-1 case is the prototypical transition state; higher-index saddles characterize complex transition bottlenecks.
Classical (gradient-based) saddle-search methods employ the so-called surface-walking ODE

$$\dot{x} = -\left(I - 2 V V^\top\right) \nabla f(x),$$

where the columns of $V = [v_1, \dots, v_k]$ span the unstable subspace of $\nabla^2 f(x)$. The derivative-free challenge is to implement the analog of this dynamics using only function queries $f(x)$, without access to $\nabla f$ or $\nabla^2 f$.
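To make the target dynamics concrete, the sketch below discretizes an index-1 surface-walking step on a toy quadratic saddle, using the exact gradient and a known unstable direction (an illustration of the dynamics only, not the paper's implementation; all names and values are assumptions):

```python
import numpy as np

def gad_step(x, grad, v, step=0.1):
    """One discretized surface-walking step for index 1:
    flip the gradient component along the unstable direction v."""
    g = grad(x)
    return x - step * (g - 2.0 * (v @ g) * v)

# Toy quadratic saddle f(x) = (-x0^2 + x1^2) / 2: saddle at 0, unstable
# along the first coordinate axis.
grad = lambda x: np.array([-x[0], x[1]])
v = np.array([1.0, 0.0])            # exact unstable eigenvector
x = np.array([0.5, 0.5])
for _ in range(200):
    x = gad_step(x, grad, v)
print(np.linalg.norm(x))            # approaches the saddle at the origin
```

The reflected gradient makes the saddle an attractor of the iteration; the derivative-free method must reproduce both `grad` and `v` from function values alone.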
2. Zeroth-Order Gradient and Hessian Estimators
Fundamental to the algorithm are high-precision zeroth-order estimators for the gradient and Hessian-vector products, constructed via Monte Carlo techniques:
- Gradient Estimator: For $u \sim \mathcal{N}(0, I_d)$ and smoothing parameter $\mu > 0$,

$$g_\mu(x; u) = \frac{f(x + \mu u) - f(x)}{\mu}\, u.$$

This estimator is unbiased for the gradient of the Gaussian-convolved surrogate $f_\mu(x) = \mathbb{E}_u\left[f(x + \mu u)\right]$:

$$\mathbb{E}_u\left[g_\mu(x; u)\right] = \nabla f_\mu(x).$$

The variance satisfies $\mathbb{E}_u\|g_\mu(x; u)\|^2 = O\!\left(d\,\|\nabla f(x)\|^2 + \mu^2 L^2 d^3\right)$.
- Hessian-Vector Estimator: Using the gradient estimator with a second difference length $\nu > 0$,

$$h_{\mu,\nu}(x; v, u) = \frac{g_\mu(x + \nu v; u) - g_\mu(x; u)}{\nu}.$$

This approximates $\nabla^2 f(x)\, v$. For any fixed direction $v$, $h_{\mu,\nu}(x; v, u)$ is unbiased for the Hessian of the smoothed surrogate applied to $v$, with variance that grows only moderately with $d$ for unit-norm $v$, in contrast to estimating the full Hessian.
This strict zeroth-order character is essential for problems where only black-box evaluations are possible.
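As a sketch of these estimators (the names `grad_est`/`hvp_est` and the test quadratic are illustrative assumptions, not the paper's code), averaging many Gaussian draws recovers the true gradient and Hessian-vector product of a quadratic up to smoothing bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_est(f, x, mu, u):
    """Single-sample zeroth-order gradient estimate along Gaussian direction u."""
    return (f(x + mu * u) - f(x)) / mu * u

def hvp_est(f, x, v, mu, nu, u):
    """Zeroth-order Hessian-vector estimate: finite difference of two
    gradient estimates sharing the same random direction u."""
    return (grad_est(f, x + nu * v, mu, u) - grad_est(f, x, mu, u)) / nu

# Test problem: quadratic f(x) = x^T A x / 2, so grad f = A x and Hess f = A.
A = np.diag([-2.0, 1.0, 3.0])
f = lambda x: 0.5 * x @ A @ x
x = np.array([0.3, -0.7, 0.5])
v = np.array([1.0, 0.0, 0.0])

us = rng.standard_normal((50000, 3))
g_bar = np.mean([grad_est(f, x, 1e-4, u) for u in us], axis=0)
h_bar = np.mean([hvp_est(f, x, v, 1e-4, 1e-2, u) for u in us], axis=0)
print(np.abs(g_bar - A @ x).max(), np.abs(h_bar - A @ v).max())  # both small
```

Only function values of `f` are queried; each gradient estimate costs two evaluations and each Hessian-vector estimate four.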
3. Nested Two-Loop Algorithm: Architecture, Steps, and Parameters
The method operates in a double-loop structure:
Inner Loop: Stochastic Unstable Eigenspace Recovery
- For each $j = 1, \dots, k$ (the index), solve the Rayleigh-quotient minimization

$$v_j = \operatorname*{arg\,min}_{\|v\| = 1,\ v \perp v_1, \dots, v_{j-1}} v^\top \nabla^2 f(x)\, v$$

via a power/Lanczos-type stochastic iteration

$$v_j^{(t+1)} = \mathrm{normalize}\!\left(v_j^{(t)} - \beta_t\, h_{\mu,\nu}\!\left(x;\, v_j^{(t)}, u_t\right)\right),$$

where $u_t \sim \mathcal{N}(0, I_d)$ and $\beta_t$ is a decaying step size. Convergence to the $j$-th smallest eigenvector is certified by norm reduction in the residual.
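A minimal sketch of this inner loop (the schedule and averaging count below are illustrative choices, not the paper's exact parameters): recover the most unstable direction of a quadratic by a stochastic power-type iteration driven purely by zeroth-order Hessian-vector estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

def hvp_est(f, x, v, mu, nu, u):
    """Zeroth-order Hessian-vector estimate from four function values."""
    ge = lambda y: (f(y + mu * u) - f(y)) / mu * u
    return (ge(x + nu * v) - ge(x)) / nu

A = np.diag([-2.0, 1.0, 3.0])       # one unstable direction: first axis
f = lambda x: 0.5 * x @ A @ x
x = np.zeros(3)                     # current outer iterate

v = rng.standard_normal(3)
v /= np.linalg.norm(v)
for t in range(1000):
    beta = 0.5 / (1.0 + 0.05 * t)   # decaying step size
    # Average a few stochastic Hessian-vector draws to tame the variance.
    hv = np.mean([hvp_est(f, x, v, 1e-4, 1e-3, rng.standard_normal(3))
                  for _ in range(50)], axis=0)
    v = v - beta * hv               # step toward the smallest eigenvector
    v /= np.linalg.norm(v)
print(abs(v[0]))                    # alignment with the true unstable direction
```

Higher-index recovery would repeat this with deflation against previously found eigenvectors.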
Outer Loop: Surface-Walking Saddle Search
- Given the iterate $x_n$ and the matrix $V_n = [v_1, \dots, v_k]$ of recovered unstable directions, draw a random direction $u_n \sim \mathcal{N}(0, I_d)$ and estimate the gradient as above.
- Update:

$$x_{n+1} = x_n - \alpha_n \left(I - 2 V_n V_n^\top\right) g_\mu(x_n; u_n),$$

with either a decaying or constant step size $\alpha_n$. The sign flip in the unstable subspace $\mathrm{span}(V_n)$ reflects the “gentlest ascent” prescription for driving $x_n$ toward an index-$k$ transition state.
- Stopping Criteria: Inner: the residual norm of the eigenvector step falls below a tolerance; Outer: the estimated gradient norm falls below a tolerance, or the error plateaus.
Parameter Schedules and Implementation:
- Decay schedules for the step sizes $\alpha_n, \beta_t$ and the difference lengths $\mu, \nu$ balance estimator bias against variance.
- The practical cost per iteration is a modest number of function evaluations: one outer gradient estimate plus the $k$-eigenvector recovery, each via a handful of stochastic Hessian-vector draws.
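A compressed sketch of the outer loop alone (assuming, for brevity, that the inner loop has already recovered the unstable direction exactly; all names and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def grad_est(f, x, mu, n_draws):
    """Average n_draws single-sample zeroth-order gradient estimates."""
    us = rng.standard_normal((n_draws, x.size))
    return np.mean([(f(x + mu * u) - f(x)) / mu * u for u in us], axis=0)

# Index-1 saddle of f at the origin; unstable direction supplied exactly.
f = lambda x: -0.5 * x[0] ** 2 + 0.5 * x[1] ** 2
V = np.array([[1.0], [0.0]])         # columns span the unstable subspace
reflect = np.eye(2) - 2.0 * V @ V.T  # sign flip inside span(V)

x = np.array([0.4, 0.4])
for _ in range(300):
    g = grad_est(f, x, 1e-4, 50)
    x = x - 0.1 * reflect @ g        # constant step: linear decay, then a plateau
print(np.linalg.norm(x))             # near the saddle at the origin
```

In the full algorithm the reflection matrix is rebuilt each outer step from the stochastically recovered eigenvectors rather than supplied by hand.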
4. Theoretical Analysis: Convergence Guarantees and Complexity
Key Assumptions (A.1–A.7):
- $f$ is $L$-smooth, with a uniform Hessian spectral gap near the saddle $x^*$ and a Lipschitz-continuous Hessian.
- Sampling directions in the estimators are i.i.d. standard normal.
- The subspace estimation error is controlled so that it stays below the Hessian eigen-gap.
Inner Loop: By Robbins–Siegmund stochastic approximation, the eigenvector iteration converges almost surely to the desired unstable subspace of $\nabla^2 f(x)$. The Davis–Kahan theorem then bounds the error of the estimated subspace relative to that of the true Hessian by the perturbation norm divided by the spectral gap.
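The Davis–Kahan step can be sanity-checked numerically; the sketch below uses the Yu–Wang–Samworth variant of the bound, $\sin\theta \le 2\|E\|_2/\mathrm{gap}$ (the precise constant is an assumption here, not a claim about the paper's version):

```python
import numpy as np

rng = np.random.default_rng(3)

# Symmetric matrix with a clear gap below its smallest eigenvalue.
A = np.diag([-2.0, 1.0, 1.5, 3.0])
M = rng.standard_normal((4, 4))
E = 0.05 * (M + M.T) / 2.0          # small symmetric perturbation

w, U = np.linalg.eigh(A)            # eigenvalues in ascending order
wp, Up = np.linalg.eigh(A + E)
cos = abs(U[:, 0] @ Up[:, 0])       # smallest-eigenvector alignment
sin_theta = np.sqrt(max(0.0, 1.0 - cos ** 2))
gap = w[1] - w[0]                   # distance from lambda_min to the rest
bound = 2.0 * np.linalg.norm(E, 2) / gap
print(sin_theta, bound)             # sin_theta stays below the bound
```

The same mechanism controls how Hessian-estimation noise propagates into the recovered unstable subspace.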
Outer Loop (Decaying Step): With decaying step sizes $\alpha_n$ and difference lengths $\mu_n$, the error converges to zero almost surely, provided the initial condition lies within a domain of attraction.
Outer Loop (Constant Step/Linear Rate):
- For constant step sizes satisfying strict bounds, the conditional mean-square error decays linearly:

$$\mathbb{E}\left[\|x_{n+1} - x^*\|^2 \,\middle|\, \mathcal{F}_n\right] \le \rho\, \|x_n - x^*\|^2 + C,$$

where the contraction factor $\rho \in (0, 1)$ is governed by the step size and the Hessian spectral gap, and the constant $C$ collects the estimator bias and variance. The error exhibits a two-stage regime: initial linear contraction followed by a plateau of size $O\!\left(C / (1 - \rho)\right)$ dominated by estimator variance.
- Complexity: To reach a given tolerance in subspace estimation, the number of inner iterations scales polynomially in the inverse tolerance; outer convergence to the plateau neighborhood requires a number of steps logarithmic in the initial error, and the total query count is the product of the per-iteration evaluation cost and these iteration counts. For practical purposes, choosing the smoothing parameter $\mu$ optimally matches the subspace noise to the estimator bias.
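The two-stage behavior can be reproduced from the scalar recursion behind the linear-plus-plateau bound: iterating $e_{n+1} = \rho\, e_n + C$ contracts linearly and then stalls at the fixed point $C/(1-\rho)$ (the values below are illustrative):

```python
rho, C = 0.9, 1e-4                  # illustrative contraction factor and noise floor
e, history = 1.0, []
for _ in range(500):
    e = rho * e + C                 # linear contraction plus variance floor
    history.append(e)
plateau = C / (1.0 - rho)           # fixed point: the error plateau
print(history[-1], plateau)
```

Shrinking the difference lengths lowers `C` and hence the plateau, at the cost of more function evaluations per estimate.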
5. Application Examples and Representative Benchmarks
Empirical tests validate both the theory and versatility:
- Müller–Brown Potential (2D, Index-1): The algorithm accurately tracks the transition state with linear-rate contraction to the saddle before the error plateaus at a level set by the estimator variance. Iteratively reducing the difference length $\mu$ sharpens the final accuracy.
- Nested Minimized Function (Implicit Surface): The saddle is robustly located even though the objective is defined only implicitly through inner minimizations.
- High-Dimensional Modified Rosenbrock: Eigenvector recovery and saddle location remain efficient and accurate, provided the Hessian-vector estimator is deployed; full-Hessian estimators become unstable because their variance grows much faster with dimension, whereas the Hessian-vector form keeps the variance controlled.
- Linear Neural Network Loss (degenerate saddles): The same framework successfully locates saddles of arbitrarily high Morse index, even in the presence of degeneracy (zero Hessian eigenvalues), as long as the unstable subspace is well separated.
A summary table appears below.
| Test Case | Dimensionality / Index | Observed Convergence |
|---|---|---|
| Müller–Brown potential | 2D, index 1 | Linear decay to a variance-limited plateau |
| Modified Rosenbrock | high-dimensional | Two-phase: fast-steep/slow-flat, matches theory |
| Neural net loss | degenerate, high Morse index | Linear decay; plateau at estimator-variance level |
| Implicit surface | implicitly defined objective | Linear, unbiased, robust |
All data are from (Du et al., 6 Jan 2026).
6. Connections to Related Zeroth-Order and Surrogate-Based Saddle-Search Methods
The derivative-free linear-convergence architecture described in (Du et al., 6 Jan 2026) builds upon, and is complementary to:
- Stochastic Surface-Walking (with unbiased gradient/Hessian estimates): The algorithm in (Shi et al., 15 Oct 2025) follows a similar nested reflection-eigenvector paradigm but uses stochastic or mini-batch first-order/proxy gradients, achieving almost sure local convergence with asymptotically vanishing error. Both approaches exploit stochastic approximation theory and spectral gap assumptions.
- Model-Free Shrinking-Dimer: Surrogate-based dimer dynamics (Gaussian process force surrogates plus trust-region active sampling) attain robust high-index saddle search with orders-of-magnitude reduction in true force evaluations when forces are accessible but expensive (Zhang et al., 2022). The principal difference is reliance on a learned and actively refined surrogate model for the gradient, with similar overall dynamics but distinct statistical error and robustness properties.
- Derivative-Free Level-Set and Minimax Frameworks: Level-set-based (Pang, 2010) and minimization-oracle-based (Akimoto, 2021) methods exploit geometric/topological features to bracket and refine saddle points without any gradient or Hessian computation, providing rigorous superlinear convergence in certain regimes.
This places the derivative-free, linear-convergence saddle-search algorithm within a broader landscape of stochastic approximation and sampling-based critical-point discovery schemes.
7. Limitations, Practical Considerations, and Future Prospects
The effectiveness of derivative-free saddle-search algorithms depends critically on:
- Spectral gap at the saddle: The gap between the $k$-th and $(k{+}1)$-th Hessian eigenvalues (the last negative and the first nonnegative) governs both the contraction rate and the subspace identification accuracy.
- Variance of stochastic estimators: High-dimensional problems require Hessian-vector estimators (not full Hessians) to maintain stable variance.
- Parameter tuning: Both the step size and the smoothing parameter $\mu$ must be carefully balanced to control bias, avoid premature plateauing, and realize the theoretical linear contraction.
- Initialization: Convergence to a saddle point with the prescribed Morse index generally requires initialization within its basin of attraction.
A plausible implication is that further acceleration, robustness to ill-conditioning, and more scalable zeroth-order saddle-search may be achievable through adaptive variance reduction in estimator draws, multi-fidelity hybridization with surrogate models, or leveraging parallel architectures for expensive black-box evaluations.
References:
(Du et al., 6 Jan 2026): "A Derivative-Free Saddle-search Algorithm With Linear Convergence Rate"
(Shi et al., 15 Oct 2025): "A Stochastic Algorithm for Searching Saddle Points with Convergence Guarantee"
(Zhang et al., 2022): "A model-free shrinking-dimer saddle dynamics for finding saddle point and solution landscape"
(Pang, 2010): "Level set methods for finding saddle points of general Morse index"
(Akimoto, 2021): "Saddle Point Optimization with Approximate Minimization Oracle"