Stochastic Variance-Reduced FBHF Splitting
- The paper introduces a stochastic variance-reduced algorithm that generalizes FBHF splitting, offering strong convergence guarantees under flexible structural assumptions.
- It leverages the finite-sum structure by using stochastic estimators and reference updates to reduce per-iteration complexity from O(N) to constant time.
- Empirical results demonstrate linear convergence and significant reductions in CPU time compared to deterministic methods in large-scale optimization.
The stochastic variance-reduced forward-backward-half forward splitting (VRFBHF) algorithm is a state-of-the-art operator splitting scheme for solving structured monotone inclusion problems in Hilbert spaces. It addresses inclusions where the sum consists of a maximally monotone operator, a maximally monotone and Lipschitz continuous operator (typically with finite-sum structure), and a cocoercive operator. By integrating variance-reduced stochastic updates within the forward-backward-half forward (FBHF) framework, VRFBHF generalizes and improves upon classic deterministic and stochastic splitting methods, providing strong convergence guarantees under flexible structural assumptions while achieving substantial computational benefits in large-scale settings.
1. Structured Monotone Inclusions and Algorithmic Framework
The prototypical problem solved by VRFBHF is to find $x \in \mathcal{H}$ satisfying
$$0 \in Ax + Bx + Cx,$$
with
- $A$ maximally monotone (possibly set-valued),
- $B$ single-valued, $L$-Lipschitz, maximally monotone, and of finite-sum form $B = \frac{1}{N}\sum_{i=1}^{N} B_i$, each $B_i$ being $L_i$-Lipschitz,
- $C$ $\beta$-cocoercive.
This structure encompasses a diverse range of applications, including constrained convex optimization, finite-sum composite minimization, and structured variational inequalities where constraints and penalties naturally admit a splitting.
The stochastic VRFBHF algorithm maintains two coupled sequences $(x^k)_{k\in\mathbb{N}}$ and $(w^k)_{k\in\mathbb{N}}$ and employs stochastic variance-reduced estimators to access $B$ without computing the full sum at each iteration:
$$z^k = J_{\gamma A}\big(x^k - \gamma(\tilde{B}(x^k) + Cx^k)\big), \qquad x^{k+1} = z^k + \gamma\big(\tilde{B}(x^k) - \tilde{B}(z^k)\big),$$
where $J_{\gamma A} = (\mathrm{Id} + \gamma A)^{-1}$ is the resolvent, $\tilde{B}$ denotes a stochastic estimator of $B$ constructed via either uniform or importance sampling, and the step size $\gamma > 0$ tunes the extrapolation.
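The deterministic FBHF template that this scheme randomizes can be sketched on a toy instance. Everything below (the choice of operators, the box constraint, the step size) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

# Hypothetical instance of 0 ∈ Ax + Bx + Cx:
#   A = normal cone of the box [-1, 1]^n  (resolvent = projection onto the box),
#   B = x ↦ Mx with M antisymmetric       (monotone and ‖M‖-Lipschitz),
#   C = x ↦ x                             (gradient of ½‖x‖², 1-cocoercive).
rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))
M = S - S.T                          # antisymmetric: <Mx, x> = 0, so B is monotone

def J_gamma_A(x):                    # resolvent of the box normal cone
    return np.clip(x, -1.0, 1.0)

def B(x): return M @ x
def C(x): return x

L = np.linalg.norm(M, 2)             # Lipschitz constant of B
gamma = 0.2 / max(L, 1.0)            # conservative step size

x = rng.standard_normal(n)
for _ in range(2000):
    z = J_gamma_A(x - gamma * (B(x) + C(x)))   # forward-backward step
    x = z + gamma * (B(x) - B(z))              # half-forward correction
print(np.linalg.norm(x))             # the unique zero of this instance is x* = 0
```

Here the unique solution is the origin (any nonzero point fails the normal-cone condition since $\langle Mx + x, x\rangle = \|x\|^2 > 0$), so the iterate norm serves as an error measure.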
2. Variance Reduction Mechanism and Stochastic Iterates
Variance reduction is achieved by replacing $B$ with an estimator $\tilde{B}$ that leverages the finite-sum structure of $B$. At each iteration, a coordinate or mini-batch index $i_k$ is sampled (from a fixed, uniform or importance-weighted, distribution), and only the selected components $B_{i_k}$ are computed. This reduces per-iteration complexity from $O(N)$ to potentially constant time, a key advantage for large $N$.
The sequence $(w^k)_{k\in\mathbb{N}}$ acts as a stochastically updated reference point (anchor), updated to $w^{k+1} = x^{k+1}$ with probability $p \in (0,1]$ and otherwise kept fixed. This mechanism, akin to "loopless" SVRG, allows VRFBHF to maintain unbiased estimates and adapt variance dynamically, while the half-forward extrapolation step improves stability and convergence.
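A minimal sketch of this loopless anchor mechanism, with hypothetical skew-linear components $B_i$ and uniform sampling (the names and the SVRG-style estimator form below are illustrative, not necessarily the paper's exact scheme):

```python
import numpy as np

# Loopless SVRG-style estimator for B = (1/N) Σ_i B_i (illustrative sketch).
rng = np.random.default_rng(1)
N, n = 100, 10
Ms = rng.standard_normal((N, n, n))
Ms = Ms - np.transpose(Ms, (0, 2, 1))          # each B_i(x) = Ms[i] @ x is monotone

def B_i(i, x): return Ms[i] @ x
def B_full(x): return np.mean(Ms @ x, axis=0)  # full O(N) pass over components

p = 0.05                                       # anchor refresh probability
x = rng.standard_normal(n)
w = x + rng.standard_normal(n)                 # reference point (anchor)
Bw = B_full(w)                                 # cached full operator value at w

# One stochastic evaluation: O(1) components touched instead of O(N).
i = rng.integers(N)
est = Bw + B_i(i, x) - B_i(i, w)

# Unbiasedness: averaging the estimator over all indices recovers B(x).
avg = Bw + np.mean([B_i(j, x) - B_i(j, w) for j in range(N)], axis=0)
assert np.allclose(avg, B_full(x))

# Loopless anchor update: refresh w (and the cached Bw) with probability p.
if rng.random() < p:
    w, Bw = x.copy(), B_full(x)
```

Note that as $x$ approaches the anchor $w$, the correction $B_i(x) - B_i(w)$ shrinks, which is the source of the variance reduction.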
3. Convergence Analysis and Lyapunov Framework
Convergence guarantees for VRFBHF are established under mild operator assumptions:
- $A$ maximally monotone,
- $B$ monotone, single-valued, $L$-Lipschitz, with an unbiased stochastic estimator of controlled variance,
- $C$ $\beta$-cocoercive,
- existence of a solution: $\operatorname{zer}(A+B+C)$ nonempty.
A Lyapunov function is introduced to measure the combined progress of current and reference iterates: for any fixed solution $x^{\star} \in \operatorname{zer}(A+B+C)$,
$$V^k = \|x^k - x^{\star}\|^2 + c\,\|w^k - x^{\star}\|^2$$
for a suitable constant $c > 0$. The expected decrease property
$$\mathbb{E}\big[V^{k+1} \mid \mathcal{F}_k\big] \le V^k$$
ensures that the sequence $(x^k)_{k\in\mathbb{N}}$ converges almost surely (weakly) to a solution. This supermartingale property forms the backbone of the convergence proof.
When either $A$ or $B$ is strongly monotone (with modulus $\mu > 0$), a contraction in expectation is established:
$$\mathbb{E}\big[V^{k+1} \mid \mathcal{F}_k\big] \le (1-\rho)\,V^k$$
for an explicit $\rho \in (0,1)$ depending on the monotonicity modulus and algorithmic parameters, implying exponential decay (a linear rate) in mean squared error.
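Unrolling a contraction of this kind over the iterations makes the linear rate explicit (a standard consequence, stated with generic symbols: $V^k$ the Lyapunov value at iteration $k$ and $\rho$ the contraction factor):

```latex
\mathbb{E}\big[V^{k+1}\big] \le (1-\rho)\,\mathbb{E}\big[V^{k}\big]
\quad\Longrightarrow\quad
\mathbb{E}\big[V^{k}\big] \le (1-\rho)^{k}\,V^{0},
\qquad \rho \in (0,1),
```

so $\mathbb{E}\|x^k - x^{\star}\|^2$ decays geometrically at rate $1-\rho$.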
4. Numerical Performance and Empirical Observations
Comprehensive experiments on constrained convex optimization (monotone inclusions with linear/nonlinear constraints) empirically validate VRFBHF. Compared to deterministic FBHF-type algorithms, VRFBHF achieves:
- Significantly reduced computation time and number of iterations, especially as the number of components $N$ or the problem dimension grows,
- Equal or superior error decay (measured via duality gap or distance to solution),
- Robustness to the selection of the reference update probability $p$; optimal convergence is empirically achieved for moderate values of $p$.
Empirical results confirm that stochastic updates maintain convergence while dramatically lowering per-iteration cost, a critical feature for large-scale optimization.
| Problem Size ($N$) | Deterministic FBHF: CPU/Iterations | VRFBHF: CPU/Iterations |
|---|---|---|
| Small | Higher | Lower |
| Large | Substantially higher | Significantly lower |
(Figures/tables in the original work substantiate these comparisons.)
5. Generalization and Connections to Other Splitting Methods
VRFBHF subsumes classical forward-backward and forward-backward-half forward methods and extends the applicability of stochastic splitting beyond settings requiring all operators to have finite-sum structures. The algorithmic template allows variance reduction without averaging, preserving properties such as sparsity of iterates, and retains strong theoretical guarantees for stochastic operator splitting. VRFBHF also improves upon variance-reduced extragradient, stochastic FBF, and other splitting approaches by marrying lower complexity with general convergence theory (almost sure and linear rates).
The Lyapunov-based analysis, reference point randomization, and handling of finite-sum/coordinate structure are shared motifs with recently developed accelerated and momentum-based splitting schemes, but VRFBHF provides a concretely implementable, parameter-robust, and highly scalable alternative.
6. Implications and Applicability
VRFBHF enables practical solution of large-scale monotone inclusions and variational inequalities in settings where the cost of summing all operator components is prohibitive. It offers a unifying principle for stochastic and variance-reduced splitting algorithms, with theoretical assurances of convergence and real-world efficacy supported by empirical performance, and is particularly effective in machine learning and signal processing applications with structured regularization or complex constraints.
The approach is broadly compatible with recent trends in operator splitting—including nonlinear kernel corrections and momentum—but demonstrates that much of the practical benefit can be achieved with a simple stochastic, variance-reduced randomization mechanism, without sacrificing provable convergence.