
Prob-SARAH: Loopless Stochastic Variance Reduction

Updated 28 January 2026
  • Prob-SARAH is a loopless, variance-reduced algorithm for finite-sum optimization that addresses both convex and nonconvex objectives through stochastic recursive gradient estimators.
  • Its randomized restart mechanism eliminates the need for nested loops, simplifies implementation, and achieves optimal complexity in expectation and with high probability.
  • Empirical evaluations show that Prob-SARAH effectively controls gradient variance, leading to improved performance in logistic regression and neural network training tasks.

Prob-SARAH is a family of stochastic recursive variance-reduced algorithms for finite-sum optimization, designed to address both convex and nonconvex objectives. It generalizes the SARAH method by adopting a randomized "loopless" architecture, often referred to as Loopless SARAH (L2S), and, more recently, by providing probabilistic guarantees on its stochastic recursive gradient estimators. Prob-SARAH achieves optimal complexity in both expectation and high probability, matching or improving upon previous results for finite-sum problems in both theory and empirical performance (Li et al., 2019, Zhong et al., 2024).

1. Formulation and Algorithmic Principles

Given an objective of finite-sum form,

$$\min_{x\in\mathcal{X}\subseteq\mathbb{R}^d}\; f(x) \;=\; \frac{1}{n}\sum_{i=1}^n f_i(x)$$

with each $f_i$ assumed to be $L$-smooth (and potentially nonconvex), Prob-SARAH targets approximate stationary points, i.e., points with $\|\nabla f(x)\| \leq \varepsilon$. Unlike classic SARAH, which uses a double-loop structure, Prob-SARAH/Loopless SARAH replaces the double loop with a single loop in which, at each iteration $t$, the algorithm either "restarts" probabilistically (computes a full gradient with probability $p = 1/m$ for a positive integer $m$) or applies a SARAH-style recursive update.
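As a concrete instance of the finite-sum setting, the following sketch builds the component functions for logistic loss with a smooth nonconvex regularizer $\alpha\sum_j x_j^2/(1+x_j^2)$; the interface and the regularizer choice are illustrative, not from the source:

```python
import numpy as np

def make_finite_sum(A, y, alpha=0.1):
    """Build component functions f_i and their gradients for logistic loss
    plus a smooth nonconvex regularizer alpha * sum_j x_j^2 / (1 + x_j^2).
    A is an (n, d) feature matrix, y a vector of +/-1 labels (illustrative)."""
    n = A.shape[0]

    def f_i(i, x):
        margin = y[i] * (A[i] @ x)
        return np.log1p(np.exp(-margin)) + alpha * np.sum(x**2 / (1 + x**2))

    def grad_i(i, x):
        margin = y[i] * (A[i] @ x)
        sigma = 1.0 / (1.0 + np.exp(margin))      # = -d/dm log(1 + e^{-m})
        reg = 2 * alpha * x / (1 + x**2) ** 2     # gradient of the regularizer
        return -sigma * y[i] * A[i] + reg

    def full_grad(x):
        return np.mean([grad_i(i, x) for i in range(n)], axis=0)

    return f_i, grad_i, full_grad
```

Each $f_i$ here is smooth, while the regularizer makes the sum nonconvex, matching the regime the analysis covers.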

The canonical (loopless) algorithm proceeds:

  • At each iteration $t$:

    • With probability $1/m$: set $\mathbf{v}_t = \nabla f(\mathbf{x}_t)$ (full-gradient restart).
    • Otherwise: sample $i_t$ uniformly and set $\mathbf{v}_t = \nabla f_{i_t}(\mathbf{x}_t) - \nabla f_{i_t}(\mathbf{x}_{t-1}) + \mathbf{v}_{t-1}$.
    • Then update the iterate: $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta\,\mathbf{v}_t$.

  • Output a randomly selected iterate from the trajectory.

Prob-SARAH maintains a biased but tightly controlled estimator; the mean-squared error of the gradient estimator decays exponentially between restarts, which is pivotal for convergence analysis (Li et al., 2019).
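The loopless update above can be sketched in a few lines; this is a minimal runnable sketch assuming a `grad_i(i, x)` interface for component gradients, with illustrative parameter defaults (not from the source):

```python
import numpy as np

def prob_sarah(grad_i, n, x0, eta=0.02, m=None, iters=3000, seed=0):
    """Loopless SARAH / Prob-SARAH sketch: with probability 1/m the
    estimator restarts at the full gradient; otherwise it applies the
    recursion v_t = grad_{i_t}(x_t) - grad_{i_t}(x_{t-1}) + v_{t-1}."""
    rng = np.random.default_rng(seed)
    m = m if m is not None else max(1, int(n**0.5))  # common restart period
    x = np.asarray(x0, dtype=float).copy()
    v = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # initial full gradient
    for _ in range(iters):
        x_prev, x = x, x - eta * v                # x_{t+1} = x_t - eta * v_t
        if rng.random() < 1.0 / m:                # Bernoulli restart
            v = np.mean([grad_i(i, x) for i in range(n)], axis=0)
        else:                                     # SARAH recursion, single sample
            i = int(rng.integers(n))
            v = grad_i(i, x) - grad_i(i, x_prev) + v
    return x  # the theory outputs a random iterate; last iterate kept for brevity
```

On a strongly convex least-squares instance this drives the full gradient norm toward zero while computing only $O(1)$ component gradients per step between restarts.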

2. Probabilistic Guarantees and High-Probability Complexity

Prob-SARAH extends classic in-expectation analysis to high-probability statements. The high-probability regime is motivated by the need for single-run performance guarantees, particularly for robust optimization.

A new dimension-free Azuma–Hoeffding inequality for vector-valued martingales with random individual norm bounds enables these high-probability results. For a martingale-difference sequence $\{D_k\}$ with norm bounds $\|D_k\| \le r_k$, the following holds with probability at least $1-\delta$, uniformly over $1 \le t \le K$ (Zhong et al., 2024):

$$\left\|\sum_{k=1}^t D_k\right\|^2 \leq 9\max\Bigl\{\sum_{k=1}^t r_k^2,\, b\Bigr\}\left(\log\frac{2}{\delta} + \log\log\frac{B}{b}\right)$$
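A Monte Carlo sanity check of this bound under one illustrative setup (unit-norm mean-zero increments, $\delta=0.01$, $b=1$, $B=K$; these choices are not from the source):

```python
import numpy as np

# For an i.i.d. mean-zero sequence with ||D_k|| <= r_k = 1 (a special case of
# a martingale-difference sequence), check the dimension-free bound
# ||sum_{k<=t} D_k||^2 <= 9*max(sum r_k^2, b)*(log(2/delta) + log log(B/b))
# simultaneously over t, across independent trials.
rng = np.random.default_rng(0)
K, d, delta = 1000, 10, 0.01
b, B = 1.0, float(K)                          # bracket for sum of r_k^2
threshold = 9 * np.maximum(np.arange(1, K + 1), b) * (
    np.log(2 / delta) + np.log(np.log(B / b)))
violations = 0
for _ in range(200):
    D = rng.standard_normal((K, d))
    D /= np.linalg.norm(D, axis=1, keepdims=True)   # ||D_k|| = r_k = 1
    sq_norms = np.sum(np.cumsum(D, axis=0) ** 2, axis=1)
    violations += int(np.any(sq_norms > threshold))
print(violations)   # far fewer than delta * trials = 2 expected
```

For such light-tailed increments the bound holds with large slack; its value in the analysis is that it tolerates random, sample-dependent norm bounds $r_k$.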

By adapting the recursive estimator and employing parameter schedules tied to statistical confidence, Prob-SARAH achieves

$$\Pr\{\|\nabla f(\hat x)\| \le \varepsilon\} \;\ge\; 1 - \delta$$

with total stochastic-gradient complexity

$$\widetilde O\left(\frac{1}{\varepsilon^3} \wedge \frac{\sqrt{n}}{\varepsilon^2}\right),$$

where logarithmic factors in $\delta$ and problem parameters are suppressed.
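The two branches of this bound trade off at a simple threshold; the following derivation (not in the source, but immediate from the bound) identifies which term is active:

```latex
\frac{\sqrt{n}}{\varepsilon^{2}} \le \frac{1}{\varepsilon^{3}}
\iff \sqrt{n} \le \frac{1}{\varepsilon}
\iff n \le \frac{1}{\varepsilon^{2}}
```

So the finite-sum term $\sqrt{n}/\varepsilon^2$ governs when $n \le \varepsilon^{-2}$, while for larger $n$ the $n$-free $1/\varepsilon^3$ rate takes over.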

3. Convergence, Step Size Regimes, and Complexity Bounds

Strongly Convex Objective

If $f$ is $\mu$-strongly convex, Prob-SARAH achieves linear convergence up to constants. Selecting $m = \Theta(\kappa^2)$, where $\kappa = L/\mu$, yields information-theoretic optimality: $\mathcal{O}\left((n+\kappa^2)\ln(1/\epsilon)\right)$ gradient evaluations. If each $f_i$ is individually strongly convex, $m = \Theta(\kappa)$ suffices (Li et al., 2019).
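For concreteness, a worked instance of this bound with illustrative values $\kappa = 10^2$, $n = 10^6$, $\epsilon = 10^{-6}$ (numbers chosen purely for illustration):

```latex
m = \Theta(\kappa^2) = 10^4, \qquad
(n + \kappa^2)\ln(1/\epsilon)
= (10^6 + 10^4)\cdot \ln 10^6
\approx 1.01\times 10^{6} \cdot 13.8
\approx 1.4\times 10^{7}
```

that is, roughly 14 full passes over the data, since the $n$ term dominates $\kappa^2$ here.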

Convex Objective

  • With an $n$-independent step size $\eta = \Theta(1/L)$: $\mathcal{O}(n + n/\epsilon)$ complexity, with $m = \Theta(\sqrt{n})$ or $m = \Theta(n)$.

  • With an $n$-dependent step size $\eta = \mathcal{O}(1/(L\sqrt{n}))$: $\mathcal{O}(n + \sqrt{n}/\epsilon)$ complexity. This regime is preferable when $n$ is large.

Original SARAH required additional non-divergence assumptions for nn-independent step sizes, which are not necessary for Prob-SARAH (Li et al., 2019).
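The two convex-regime bounds above can be compared numerically; the helper below is hypothetical (constants dropped, names illustrative):

```python
def convex_complexity(n, eps):
    """Compare the two convex-regime complexity bounds (constants dropped):
    n-independent step, eta = Theta(1/L):        O(n + n/eps)
    n-dependent step,   eta = O(1/(L*sqrt(n))):  O(n + sqrt(n)/eps)"""
    independent = n + n / eps
    dependent = n + n**0.5 / eps
    return ("n-dependent" if dependent < independent else "n-independent",
            independent, dependent)

# For large n, sqrt(n)/eps is far smaller than n/eps:
regime, c_ind, c_dep = convex_complexity(n=10**6, eps=1e-3)
print(regime, f"{c_ind:.2e}", f"{c_dep:.2e}")
```

In terms of the raw bounds the $n$-dependent regime always wins for $n > 1$; the appeal of the $n$-independent $\Theta(1/L)$ step is practical, since larger steps tend to perform better than the worst-case bound suggests.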

Nonconvex Objective

Prob-SARAH matches the best known in-expectation bounds: $\mathcal{O}(n + \sqrt{n}/\epsilon)$ gradient computations to find an $\epsilon$-stationary point. In high probability, up to logarithmic factors,

$$\widetilde O\left(\frac{1}{\varepsilon^3} \wedge \frac{\sqrt{n}}{\varepsilon^2}\right)$$

stochastic gradients are required (Zhong et al., 2024).

4. Comparison with SARAH

| Feature | SARAH | Prob-SARAH (Loopless/L2S) |
| --- | --- | --- |
| Loops | Double | Single, stochastic restarts |
| Step size (convex) | $\mathcal{O}(1/(L\sqrt{n}))$ | $n$-independent $\Theta(1/L)$ allowed |
| Non-divergence assumption | Needed | Not needed |
| Complexity (nonconvex) | $\mathcal{O}(n+\sqrt{n}/\varepsilon)$ | Same, with high-probability bounds |
| Gradient estimator variance | Lower, fixed restarts | Higher, randomized restarts |
| Generalization (empirical) | Standard | Often superior due to noise injection |

Prob-SARAH achieves algorithmic simplicity—eschewing nested loops in favor of Bernoulli-scheduled restarts. It permits larger step sizes in convex settings without additional assumptions. Empirical evidence shows that the variance-increasing effect of stochastic restarts enables better escape from sharp local minimizers in deep learning tasks, often improving test accuracy relative to SARAH (Li et al., 2019).

5. Empirical Evaluation and Applications

Prob-SARAH demonstrates strong empirical performance on both classic and modern machine learning problems. On logistic regression with nonconvex regularization (LIBSVM datasets: mushrooms, ijcnn1, w7a), Prob-SARAH attains lower upper quantiles of the squared gradient norm than SGD, SVRG, and SCSG, indicating superior probabilistic control of stationarity (Zhong et al., 2024). For training a two-layer neural network with GELU activations on MNIST, the method achieves the best probabilistic control of the gradient norm in early epochs and yields competitive validation accuracy, while baselines such as SVRG are prone to poor local minima.

These experiments validate the practical value of the high-probability theoretical guarantees: for users demanding risk control on gradient norms in individual runs, Prob-SARAH not only matches in-expectation results but typically offers improved reliability.

6. Technical Advances and Theoretical Significance

The core technical innovation enabling Prob-SARAH's high-probability guarantees is a new dimension-free Azuma–Hoeffding inequality for martingales with random norm-bounds. This analytic tool facilitates tight, sample-dependent error control in recursive variance-reduced gradient estimators (Zhong et al., 2024). The loopless structure of Prob-SARAH leads to an exponentially decaying memory effect, as reflected in convergence proofs, and eliminates the need for problem-specific outer-loop scheduling.

Prob-SARAH unifies and improves upon techniques from SARAH [Nguyen et al. 2017], SCSG [Lei & Jordan 2017], and recent loopless variants [Kovalev et al. 2019], demonstrating that stochastic recursion with randomized restarts attains state-of-the-art complexity across convex, strongly convex, and nonconvex optimization with robust probabilistic guarantees (Li et al., 2019, Zhong et al., 2024).
