
Derivative-Free Saddle-Search Algorithm

Updated 14 January 2026
  • The algorithm finds index‑k saddle points using only function evaluations, bypassing the need for gradients or Hessians in high-dimensional settings.
  • It employs stochastic approximations and Monte Carlo estimators to recover unstable eigenspaces and drive a two-loop iterative process for saddle-point discovery.
  • The method achieves provable linear convergence under proper parameter tuning, making it highly effective for black-box simulations and complex optimization landscapes.

A derivative-free saddle-search algorithm refers to any computational scheme that finds index-k saddle points of a smooth objective f : ℝ^d → ℝ (i.e., stationary points with exactly k unstable directions) using only function evaluations, not explicit gradients or Hessians. Such approaches are particularly valuable for high-dimensional physical landscapes, black-box simulation models, or machine learning loss surfaces where derivatives are unavailable or unreliable. Recent algorithms combine stochastic approximation, Monte Carlo estimators, and nested iterative architectures to achieve provable convergence and practical efficiency in saddle-point discovery while remaining strictly zeroth-order (function evaluations only) (Du et al., 6 Jan 2026).

1. Formulation and Notational Framework

Let f : ℝ^d → ℝ be C^6-smooth. An index-k saddle x* satisfies ∇f(x*) = 0, and ∇²f(x*) has precisely k negative eigenvalues. In chemical physics, the index-1 case is the prototypical transition state; higher-index saddles characterize complex transition bottlenecks.

Classical (gradient-based) saddle-search methods employ the so-called surface-walking ODE (gentlest-ascent dynamics):

ẋ = −(I − 2 V Vᵀ) ∇f(x),

where V = [v₁, …, v_k] ∈ ℝ^{d×k} has orthonormal columns spanning the unstable subspace of ∇²f(x). The derivative-free challenge is to implement the analog of this dynamics using only function queries f(x), without access to ∇f or ∇²f.

2. Zeroth-Order Gradient and Hessian Estimators

Fundamental to the algorithm are high-precision zeroth-order estimators for the gradient and Hessian-vector products, constructed via Monte Carlo techniques:

  • Gradient Estimator: For a smoothing radius μ > 0 and u ~ N(0, I_d),

ĝ_μ(x) = [ (f(x + μu) − f(x − μu)) / (2μ) ] u.

This estimator is unbiased for the gradient of the Gaussian-convolved surrogate f_μ:

E_u[ ĝ_μ(x) ] = ∇f_μ(x),  where f_μ(x) := E_{u ~ N(0, I_d)}[ f(x + μu) ].

The variance satisfies a dimension-dependent bound of order O(d ‖∇f(x)‖²), plus higher-order terms in μ.

  • Hessian-Vector Estimator: Using the gradient estimator with a difference length δ > 0,

Ĥ(x) v = [ ĝ_μ(x + δv) − ĝ_μ(x − δv) ] / (2δ).

This approximates ∇²f(x) v. For any fixed direction v, Ĥ(x) v is unbiased for the Hessian of the smoothed surrogate f_μ applied to v, with variance scaling as O(d) for unit ‖v‖ = 1, in contrast with the O(d²) variance incurred by estimating the full Hessian.

This strict zeroth-order character is essential for problems where only black-box evaluations are possible.
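As a concrete illustration, the two estimators can be sketched in a few lines (a minimal Monte Carlo implementation following the Gaussian-smoothing conventions above; the batch sizes and normalizations in the actual algorithm may differ):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, batch=256, rng=None):
    """Two-point Gaussian-smoothing gradient estimator.

    Averages `batch` samples of ((f(x + mu*u) - f(x - mu*u)) / (2*mu)) * u
    with u ~ N(0, I_d); each sample is unbiased for the gradient of the
    Gaussian-convolved surrogate f_mu."""
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(batch):
        u = rng.standard_normal(d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / batch

def zo_hvp(f, x, v, delta=0.5, **kw):
    """Zeroth-order Hessian-vector product: a central difference of two
    zeroth-order gradient estimates along the direction v."""
    return (zo_gradient(f, x + delta * v, **kw)
            - zo_gradient(f, x - delta * v, **kw)) / (2 * delta)
```

On a quadratic f(x) = ½ xᵀAx both estimators are exact in expectation (the gradient estimate averages to Ax, the Hessian-vector estimate to Av), which makes quadratics a convenient sanity check before deploying them on a black box.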

3. Nested Two-Loop Algorithm: Architecture, Steps, and Parameters

The method operates in a double-loop structure:

Inner Loop: Stochastic Unstable Eigenspace Recovery

  • For each j = 1, …, k (up to the index), solve the Rayleigh-quotient minimization

min { vᵀ ∇²f(x) v : ‖v‖ = 1, v ⊥ v₁, …, v_{j−1} }

via a power/Lanczos-type stochastic iteration:

v^{(n+1)} = (v^{(n)} − η_n Ĥ(x) v^{(n)}) / ‖v^{(n)} − η_n Ĥ(x) v^{(n)}‖,

where Ĥ(x) v is the zeroth-order Hessian-vector estimator and η_n is a decaying step size. Convergence to the j-th smallest eigenvector is certified by norm reduction in the residual.
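The inner iteration can be illustrated with a simulated noisy Hessian-vector oracle standing in for the zeroth-order estimator (the oracle, step schedule, and iteration count below are illustrative choices, not those of the paper):

```python
import numpy as np

def smallest_eigvec(hvp, d, eta0=0.1, iters=2000, seed=0):
    """Stochastic Rayleigh-quotient minimization on the unit sphere.

    Iterates v <- normalize(v - eta_n * H v) with a decaying step eta_n;
    the radial part of the Rayleigh-quotient gradient is removed by the
    renormalization, so the iterate drifts toward the eigenvector of the
    smallest (most negative) eigenvalue."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for n in range(1, iters + 1):
        w = v - (eta0 / np.sqrt(n)) * hvp(v)   # power-type step, decaying step size
        v = w / np.linalg.norm(w)              # retract back to the unit sphere
    return v
```

With H = diag(−1, 2, 3) and an oracle returning Hv plus small Gaussian noise, the iterate aligns with the unstable direction e₁; deflating against previously found vectors extends the same loop to the j-th smallest eigenvector.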

Outer Loop: Surface-Walking Saddle Search

  • Given the current iterate x_n and the matrix V_n = [v₁, …, v_k] of recovered unstable directions, draw a random direction u ~ N(0, I_d) and estimate the gradient as above.
  • Update:

x_{n+1} = x_n − α_n (I − 2 V_n V_nᵀ) ĝ_μ(x_n),

with either a decaying or constant step size α_n. The sign flip in the unstable subspace span(V_n) reflects the “gentlest ascent” prescription for driving x_n toward an index-k transition state.

  • Stopping Criteria: Inner: residual norm of the eigenvector step below tolerance; Outer: ‖ĝ_μ(x_n)‖ below tolerance, or plateauing error.

Parameter Schedules and Implementation:

  • Decay schedules for the step sizes α_n, η_n and the difference lengths μ, δ keep estimator bias and variance in balance.
  • Practical cost per outer iteration is one batched gradient estimate plus k eigenvector recoveries, each requiring a modest number of stochastic inner iterations, so the function-evaluation count per iteration grows linearly in k.
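Putting the pieces together, the two-loop architecture can be sketched end to end (a simplified, self-contained variant with illustrative parameter choices; the paper's exact schedules, stopping rules, and deflation details differ):

```python
import numpy as np

def zo_grad(f, x, mu, batch, rng):
    """Batch-averaged two-point Gaussian-smoothing gradient estimator."""
    g = np.zeros(len(x))
    for _ in range(batch):
        u = rng.standard_normal(len(x))
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / batch

def zo_hvp(f, x, v, delta, mu, batch, rng):
    """Zeroth-order Hessian-vector product via differenced gradient estimates."""
    return (zo_grad(f, x + delta * v, mu, batch, rng)
            - zo_grad(f, x - delta * v, mu, batch, rng)) / (2 * delta)

def saddle_search(f, x0, k=1, alpha=0.05, mu=1e-3, delta=0.3,
                  inner=30, outer=150, batch=64, seed=0):
    """Nested two-loop derivative-free search for an index-k saddle.

    Inner loop: recover the k most-unstable Hessian directions by a
    stochastic power-type iteration on zeroth-order Hessian-vector
    products, deflating against directions already found.
    Outer loop: surface-walking step x <- x - alpha * (I - 2 V V^T) g_hat,
    which reverses the estimated gradient inside the unstable subspace."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = len(x)
    for _ in range(outer):
        V = np.zeros((d, k))
        for j in range(k):                      # inner loop: j-th unstable direction
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
            for n in range(1, inner + 1):
                w = v - (0.5 / np.sqrt(n)) * zo_hvp(f, x, v, delta, mu, batch, rng)
                w -= V[:, :j] @ (V[:, :j].T @ w)    # deflate earlier directions
                v = w / np.linalg.norm(w)
            V[:, j] = v
        g = zo_grad(f, x, mu, batch, rng)       # outer step: reflected gradient
        x = x - alpha * (g - 2 * V @ (V.T @ g))
    return x
```

On the toy double-well-plus-bowl objective f(x) = x₁⁴/4 − x₁²/2 + x₂², which has an index-1 saddle at the origin between minima at x₁ = ±1, the sketch started at (0.6, 0.6) contracts toward the origin despite using only noisy function evaluations.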

4. Theoretical Analysis: Convergence Guarantees and Complexity

Key Assumptions (A.1–A.7):

  • C^6-smoothness of f, a uniform Hessian spectral gap near x*, and Lipschitz continuity of ∇²f.
  • Sampling directions u are i.i.d. standard normal.
  • The subspace estimation error is controlled so that it remains below a fixed fraction of the Hessian eigen-gap.

Inner Loop: By Robbins–Siegmund stochastic-approximation arguments, the eigenvector iteration converges almost surely to the desired unstable subspace of the smoothed objective. The Davis–Kahan theorem then bounds the error of this subspace relative to the true Hessian's unstable subspace in terms of the estimation error divided by the eigen-gap.

Outer Loop (Decaying Step): With decaying step sizes α_n satisfying the usual Robbins–Monro conditions (Σ_n α_n = ∞, Σ_n α_n² < ∞), the error converges to zero almost surely if the initial condition is within a domain of attraction.

Outer Loop (Constant Step/Linear Rate):

  • For constant step sizes α satisfying strict bounds, the conditional mean-square error decays linearly:

E[ ‖x_{n+1} − x*‖² | F_n ] ≤ (1 − ρ) ‖x_n − x*‖² + C,

where ρ ∈ (0, 1) is a contraction factor determined by α and the local Hessian spectrum, and the constant C collects estimator variance and smoothing bias.

The error exhibits a two-stage regime: initial linear contraction followed by a plateau whose size is dominated by estimator variance.

  • Complexity: The inner-iteration count needed to reach a given subspace tolerance, multiplied by the number of linear-rate outer steps to reach the variance-dominated neighborhood, governs the total query count. For practical purposes, the smoothing and difference parameters are chosen to match subspace noise to estimator bias.

5. Application Examples and Representative Benchmarks

Empirical tests validate both the theory and versatility:

  • Müller–Brown Potential (2D, Index-1): The algorithm accurately tracks the transition state with linear contraction to the saddle, before the error plateaus at a floor set by estimator bias and variance. Iterative reduction of the difference length sharpens final accuracy.
  • Nested Minimized Function (Implicit Surface): The saddle is robustly located even though the objective is defined only implicitly through inner minimizations, i.e., is accessible solely as a black box.
  • High-Dimensional Modified Rosenbrock: Eigenvector recovery and saddle location remain efficient and accurate, provided the Hessian-vector estimator is deployed (full-Hessian estimators become unstable due to O(d²) variance in high dimensions; the Hessian-vector form maintains O(d) variance).
  • Linear Neural Network Loss (degenerate saddles): The same framework successfully locates saddles of arbitrarily high Morse index, even in the presence of degeneracy (zero Hessian eigenvalues), as long as the unstable subspace is well separated.

A summary table appears below.

Test Case | Dimensionality / Index | Observed Convergence
Müller–Brown potential | d = 2, index 1 | Linear decay to a bias-dominated plateau
Modified Rosenbrock | high-dimensional | Two-phase: fast-steep/slow-flat, matches theory
Neural net loss | degenerate, higher index | Linear decay; plateau at the estimator-noise floor
Implicit surface | black-box objective | Linear, unbiased, robust

All data are from (Du et al., 6 Jan 2026).
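For readers who want to reproduce the first benchmark, the Müller–Brown surface is a standard four-term exponential model; the constants below are the ones conventionally used in the literature, not values taken from (Du et al., 6 Jan 2026) itself:

```python
import numpy as np

# Standard Müller–Brown parameters (four exponential terms).
A  = np.array([-200.0, -100.0, -170.0, 15.0])
a  = np.array([-1.0, -1.0, -6.5, 0.7])
b  = np.array([0.0, 0.0, 11.0, 0.6])
c  = np.array([-10.0, -10.0, -6.5, 0.7])
X0 = np.array([1.0, 0.0, -0.5, -1.0])
Y0 = np.array([0.0, 0.5, 1.5, 1.0])

def muller_brown(x, y):
    """Müller–Brown test potential: a 2-D surface with three minima and
    two index-1 saddles (transition states) between them."""
    dx, dy = x - X0, y - Y0
    return float(np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)))
```

With these conventional constants, the deepest minimum lies near (−0.558, 1.442) with V ≈ −146.7, and the transition state between the two deep minima lies near (−0.822, 0.624) with V ≈ −40.7, which is where an index-1 search should terminate.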

6. Related Derivative-Free and Stochastic Saddle-Search Methods

The derivative-free linear-convergence architecture described in (Du et al., 6 Jan 2026) builds upon, and is complementary to:

  • Stochastic Surface-Walking (with unbiased gradient/Hessian estimates): The algorithm in (Shi et al., 15 Oct 2025) follows a similar nested reflection-eigenvector paradigm but uses stochastic or mini-batch first-order/proxy gradients, achieving almost sure local convergence with asymptotically vanishing error. Both approaches exploit stochastic approximation theory and spectral-gap assumptions.
  • Model-Free Shrinking-Dimer: Surrogate-based dimer dynamics (Gaussian process force surrogates plus trust-region active sampling) attain robust high-index saddle search with orders-of-magnitude reduction in true force evaluations when forces are accessible but expensive (Zhang et al., 2022). The principal difference is reliance on a learned and actively refined surrogate model for the gradient, with similar overall dynamics but distinct statistical error and robustness properties.
  • Derivative-Free Level-Set and Minimax Frameworks: Level-set-based (Pang, 2010) and minimization-oracle-based (Akimoto, 2021) methods exploit geometric/topological features to bracket and refine saddle points without any gradient or Hessian computation, providing rigorous superlinear convergence in certain regimes.

This places the derivative-free, linear-convergence saddle-search algorithm within a broader landscape of stochastic approximation and sampling-based critical-point discovery schemes.

7. Limitations, Practical Considerations, and Future Prospects

The effectiveness of derivative-free saddle-search algorithms depends critically on:

  • Spectral gap at the saddle: The gap between the k-th (most negative) and (k+1)-th Hessian eigenvalues governs both the contraction rate and the subspace identification accuracy.
  • Variance of stochastic estimators: High-dimensional problems require Hessian-vector estimators (not full Hessians) to maintain stable variance.
  • Parameter tuning: Both the step size and the smoothing/difference parameters (μ, δ) must be carefully balanced to control bias, avert early plateauing, and realize the theoretical linear contraction.
  • Initialization: Convergence to a saddle point with the prescribed Morse index generally requires initialization within its basin of attraction.

A plausible implication is that further acceleration, robustness to ill-conditioning, and more scalable zeroth-order saddle-search may be achievable through adaptive variance reduction in estimator draws, multi-fidelity hybridization with surrogate models, or leveraging parallel architectures for expensive black-box evaluations.

References:

(Du et al., 6 Jan 2026): "A Derivative-Free Saddle-search Algorithm With Linear Convergence Rate"
(Shi et al., 15 Oct 2025): "A Stochastic Algorithm for Searching Saddle Points with Convergence Guarantee"
(Zhang et al., 2022): "A model-free shrinking-dimer saddle dynamics for finding saddle point and solution landscape"
(Pang, 2010): "Level set methods for finding saddle points of general Morse index"
(Akimoto, 2021): "Saddle Point Optimization with Approximate Minimization Oracle"
