Prob-SARAH: Loopless Stochastic Variance Reduction
- Prob-SARAH is a loopless, variance-reduced algorithm for finite-sum optimization that addresses both convex and nonconvex objectives through stochastic recursive gradient estimators.
- Its randomized restart mechanism eliminates the need for nested loops, simplifies implementation, and achieves optimal complexity in expectation and with high probability.
- Empirical evaluations show that Prob-SARAH effectively controls gradient variance, leading to improved performance in logistic regression and neural network training tasks.
Prob-SARAH is a family of stochastic recursive variance-reduced algorithms for finite-sum optimization, designed to address both convex and nonconvex objectives. It generalizes the SARAH method by deploying a randomized "loopless" architecture, often referred to as Loopless SARAH (L2S), and, more recently, by providing probabilistic guarantees on its stochastic recursive gradient estimators. Prob-SARAH achieves optimal complexity in both expectation and high probability, matching or improving upon previous results for finite-sum problems in both theory and empirical performance (Li et al., 2019; Zhong et al., 2024).
1. Formulation and Algorithmic Principles
Given an objective of finite-sum form,

$$\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$

with each $f_i$ assumed to be $L$-smooth (and potentially nonconvex), Prob-SARAH targets finding approximate stationary points $x$ with $\|\nabla f(x)\| \le \epsilon$. Unlike classic SARAH, which uses a double-loop structure, Prob-SARAH/Loopless SARAH replaces the double loop by a single loop where, at each iteration $t$, the algorithm probabilistically "restarts" (computes a full gradient with probability $1/m$ for a positive integer $m$) or applies a SARAH-style recursive update otherwise.
The canonical (loopless) algorithm proceeds:
- At each iteration $t$:
- With probability $1/m$: set $v_t = \nabla f(x_t)$ (full-gradient restart).
- Otherwise: sample $i_t$ uniformly from $\{1, \dots, n\}$ and set $v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(x_{t-1}) + v_{t-1}$.
- Update the iterate: $x_{t+1} = x_t - \eta v_t$.
- Output a randomly selected iterate from the trajectory.
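The steps above can be sketched in NumPy. This is a minimal illustration under our own naming and toy problem (the function `prob_sarah` and the least-squares example are not from the papers):

```python
import numpy as np

def prob_sarah(grad_i, full_grad, x0, n, eta, m, T, seed=0):
    """Loopless SARAH sketch: with probability 1/m the estimator is
    'restarted' with a full gradient; otherwise the SARAH recursion is
    applied to a single uniformly sampled component."""
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float).copy()
    v = full_grad(x_prev)                 # initialize with a full gradient
    x = x_prev - eta * v
    iterates = [x_prev.copy()]
    for _ in range(T):
        if rng.random() < 1.0 / m:        # Bernoulli restart
            v = full_grad(x)
        else:                             # recursive SARAH update
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_prev, i) + v
        x_prev, x = x, x - eta * v        # gradient step with the estimator
        iterates.append(x_prev.copy())
    return iterates                       # in theory, output a random iterate

# Toy finite sum: f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(42)
A = rng.normal(size=(100, 5))
b = rng.normal(size=100)
gi = lambda x, i: A[i] * (A[i] @ x - b[i])
fg = lambda x: A.T @ (A @ x - b) / len(b)
xs = prob_sarah(gi, fg, np.zeros(5), n=100, eta=0.02, m=25, T=2000)
```

Note that the final output in the analyses is a uniformly sampled iterate from the trajectory; returning the whole trajectory here makes that choice explicit.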
Prob-SARAH maintains a biased but tightly controlled estimator; the mean-squared error of the gradient estimator decays exponentially between restarts, which is pivotal for convergence analysis (Li et al., 2019).
2. Probabilistic Guarantees and High-Probability Complexity
Prob-SARAH extends classic in-expectation analysis to high-probability statements. The high-probability regime is motivated by the need for single-run performance guarantees, particularly for robust optimization.
A new dimension-free Azuma–Hoeffding inequality for vector-valued martingales with random individual norm bounds enables these high-probability results. For a martingale-difference sequence $\{\xi_t\}_{t=1}^{T}$ with norm bounds $\|\xi_t\| \le B_t$, the following holds with probability at least $1-\delta$ for any $\delta \in (0,1)$:

$$\Big\| \sum_{t=1}^{T} \xi_t \Big\| \lesssim \sqrt{\log(1/\delta) \sum_{t=1}^{T} B_t^2},$$

except possibly on rare large-deviation trajectories (Zhong et al., 2024).
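A concentration bound of this shape can be sanity-checked numerically. The sketch below is an illustration, not from the paper: it uses independent sphere-uniform increments as a simple martingale-difference sequence and the explicit Pinelis-type threshold $\sqrt{2\log(2/\delta)\sum_t B_t^2}$, then checks that the empirical failure rate stays below $\delta$:

```python
import numpy as np

# Monte Carlo sanity check (illustration only): for independent mean-zero
# vector increments with ||xi_t|| = B_t, a Pinelis-type dimension-free bound
# gives P(||sum_t xi_t|| >= sqrt(2 log(2/delta) sum_t B_t^2)) <= delta.
rng = np.random.default_rng(0)
d, T, trials, delta = 50, 200, 2000, 0.05
B = rng.uniform(0.5, 2.0, size=T)            # per-step norm bounds B_t
threshold = np.sqrt(2.0 * np.log(2.0 / delta) * np.sum(B ** 2))

failures = 0
for _ in range(trials):
    dirs = rng.normal(size=(T, d))           # uniform directions on the sphere
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    S = (dirs * B[:, None]).sum(axis=0)      # martingale sum with ||xi_t|| = B_t
    failures += bool(np.linalg.norm(S) >= threshold)

print(failures / trials)                     # empirical rate, should be <= delta
```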
By adapting the recursive estimator and employing parameter schedules tied to statistical confidence, Prob-SARAH achieves $\|\nabla f(\hat{x})\| \le \epsilon$ with probability at least $1-\delta$, with total stochastic-gradient complexity

$$\tilde{O}\big(n + \sqrt{n}\,\epsilon^{-2}\big),$$

where logarithmic factors in $1/\delta$, $1/\epsilon$, and problem parameters are suppressed.
3. Convergence, Step Size Regimes, and Complexity Bounds
Strongly Convex Objective
If $f$ is $\mu$-strongly convex, Prob-SARAH achieves linear convergence up to constants. Selecting the step size $\eta = O(1/L)$ and expected restart length $m = \Theta(\kappa)$, where $\kappa = L/\mu$ is the condition number, yields information-theoretic optimality: $O\big((n + \kappa)\log(1/\epsilon)\big)$ gradient evaluations. If each $f_i$ is individually strongly convex, a larger admissible step size suffices (Li et al., 2019).
Convex Objective
- With an $\epsilon$-independent step size $\eta = O(1/L)$: $O\big((n + 1/\epsilon)\log(1/\epsilon)\big)$ complexity with $m = \Theta(n)$ or $m = \Theta(1/\epsilon)$.
- With an $\epsilon$-dependent step size: $O(n + 1/\epsilon)$ complexity, shaving the logarithmic factor. This regime is preferable when $n$ is large.
Original SARAH required additional non-divergence assumptions for -independent step sizes, which are not necessary for Prob-SARAH (Li et al., 2019).
Nonconvex Objective
Prob-SARAH matches the best known in-expectation bounds: $O\big(n + \sqrt{n}\,\epsilon^{-2}\big)$ gradient computations to find an $\epsilon$-stationary point. In high probability, up to logarithmic factors, $\tilde{O}\big(n + \sqrt{n}\,\epsilon^{-2}\big)$ stochastic gradients are required (Zhong et al., 2024).
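To see why the $\sqrt{n}$ factor matters, a back-of-envelope comparison against the standard $\epsilon^{-4}$ benchmark for plain SGD (constants and logarithmic factors suppressed; the SGD rate is our assumed baseline, not a claim from these papers):

```python
import math

# Back-of-envelope comparison; constants and logarithmic factors suppressed.
def vr_complexity(n, eps):
    # variance-reduced (SARAH-type) nonconvex complexity: n + sqrt(n)/eps^2
    return n + math.sqrt(n) / eps ** 2

def sgd_complexity(eps):
    # plain-SGD benchmark for eps-stationarity: 1/eps^4
    return 1.0 / eps ** 4

n, eps = 10_000, 1e-2
print(vr_complexity(n, eps))  # ~1.0e6 stochastic gradients
print(sgd_complexity(eps))    # ~1.0e8 stochastic gradients
```

At $n = 10^4$ and $\epsilon = 10^{-2}$, variance reduction wins by roughly two orders of magnitude.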
4. Comparison with SARAH and Related Methods
| Feature | SARAH | Prob-SARAH (Loopless/L2S) |
|---|---|---|
| Loops | Double | Single, stochastic restarts |
| Step size (convex) | $\epsilon$-independent allowed under extra assumptions | $\epsilon$-independent allowed |
| Non-divergence assumption | Needed | Not needed |
| Complexity (nonconvex) | $O(n + \sqrt{n}\,\epsilon^{-2})$ in expectation | Same, with high-probability bounds |
| Gradient estimator variance | Lower, fixed restarts | Higher, randomized restarts |
| Generalization (empirical) | Standard | Often superior due to noise injection |
Prob-SARAH achieves algorithmic simplicity—eschewing nested loops in favor of Bernoulli-scheduled restarts. It permits larger step sizes in convex settings without additional assumptions. Empirical evidence shows that the variance-increasing effect of stochastic restarts enables better escape from sharp local minimizers in deep learning tasks, often improving test accuracy relative to SARAH (Li et al., 2019).
5. Empirical Evaluation and Applications
Prob-SARAH demonstrates strong empirical performance on both classic and modern machine learning problems. On logistic regression with nonconvex regularization (LIBSVM datasets: mushrooms/ijcnn1/w7a), Prob-SARAH attains lower upper quantiles of the squared gradient norm compared to SGD, SVRG, and SCSG, indicating superior probabilistic control over stationarity (Zhong et al., 2024). For training a two-layer neural network with GELU activations on MNIST, the method achieves the best probabilistic control of the gradient norm in early epochs and yields competitive validation accuracy, while baselines such as SVRG are prone to poor local minima.
These experiments validate the practical value of the high-probability theoretical guarantees: for users demanding risk control on gradient norms in individual runs, Prob-SARAH not only matches in-expectation results but typically offers improved reliability.
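A small-scale version of this quantile-based evaluation can be sketched as follows. This is our own synthetic illustration, not the paper's experiment: it assumes the commonly used nonconvex regularizer $\lambda \sum_j w_j^2/(1+w_j^2)$, runs the loopless update over many seeds, and reports a high quantile of the final squared gradient norm across runs:

```python
import numpy as np

def logistic_grad(w, X, y, idx, lam=0.01):
    """Gradient of the logistic loss on rows `idx`, plus the (assumed)
    nonconvex regularizer lam * sum_j w_j^2/(1+w_j^2), whose gradient
    is 2*lam*w/(1+w^2)^2."""
    z = X[idx] @ w
    g = X[idx].T @ (1.0 / (1.0 + np.exp(-z)) - y[idx]) / len(idx)
    return g + lam * 2.0 * w / (1.0 + w ** 2) ** 2

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(float)
full = np.arange(n)
eta, m, T = 0.05, 20, 600

finals = []
for seed in range(20):                       # independent runs -> quantiles
    r = np.random.default_rng(seed)
    w_prev = np.zeros(d)
    v = logistic_grad(w_prev, X, y, full)    # full gradient at start
    w = w_prev - eta * v
    for _ in range(T):
        if r.random() < 1.0 / m:             # Bernoulli restart
            v = logistic_grad(w, X, y, full)
        else:                                # SARAH recursion on one sample
            i = [r.integers(n)]
            v = logistic_grad(w, X, y, i) - logistic_grad(w_prev, X, y, i) + v
        w_prev, w = w, w - eta * v
    finals.append(np.linalg.norm(logistic_grad(w, X, y, full)) ** 2)

q90 = float(np.quantile(finals, 0.9))        # high quantile across runs
print(q90)
```

Reporting the 0.9-quantile rather than the mean mirrors the single-run risk-control perspective of the high-probability analysis.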
6. Technical Advances and Theoretical Significance
The core technical innovation enabling Prob-SARAH's high-probability guarantees is a new dimension-free Azuma–Hoeffding inequality for martingales with random norm-bounds. This analytic tool facilitates tight, sample-dependent error control in recursive variance-reduced gradient estimators (Zhong et al., 2024). The loopless structure of Prob-SARAH leads to an exponentially decaying memory effect, as reflected in convergence proofs, and eliminates the need for problem-specific outer-loop scheduling.
Prob-SARAH unifies and improves upon techniques from SARAH (Nguyen et al., 2017), SCSG (Lei & Jordan, 2017), and recent loopless variants (Kovalev et al., 2019), demonstrating that stochastic recursion with randomized restarts attains state-of-the-art complexity across convex, strongly convex, and nonconvex optimization with robust probabilistic guarantees (Li et al., 2019; Zhong et al., 2024).