Stochastic ADMM: Algorithms & Convergence

Updated 1 May 2026
  • Stochastic ADMM is a family of first-order optimization algorithms that incorporates stochastic approximations to solve large-scale, linearly constrained problems.
  • It extends classical ADMM by integrating variance reduction, proximal updates, and adaptive schemes to enhance convergence and scalability in applications like machine learning and signal processing.
  • The approach offers rigorous convergence guarantees for both convex and nonconvex objectives, with advanced variants supporting continuous-time analyses and robust distributed implementations.

Stochastic Alternating Direction Method of Multipliers

Stochastic Alternating Direction Method of Multipliers (Stochastic ADMM) refers to a family of first-order optimization algorithms for solving linearly constrained problems in which some or all components of the objective function are given in stochastic (data-driven or sample-based) form. These methods extend classical ADMM to accommodate large-scale, nonsmooth, and nonconvex objectives by leveraging stochastic approximations, variance reduction, and, in recent years, continuous-time analysis. Stochastic ADMM is foundational in distributed, online, and robust machine learning as well as in high-dimensional signal processing and control.

1. Problem Formulation and ADMM Fundamentals

Stochastic ADMM targets convex or nonconvex optimization problems with separable objective and linear constraints:

$$\min_{x\in\mathcal{X},\,y\in\mathcal{Y}} \ \mathbb{E}_{\xi}[\theta_1(x, \xi)] + \theta_2(y) \quad \text{s.t.} \quad A x + B y = b,$$

where:

  • $x\in\mathbb{R}^{d_1}$, $y\in\mathbb{R}^{d_2}$; $A\in\mathbb{R}^{m\times d_1}$, $B\in\mathbb{R}^{m\times d_2}$, $b\in\mathbb{R}^m$.
  • $\theta_1(x, \xi)$ is a (possibly nonsmooth or nonconvex) instance-specific loss, and $\theta_2(y)$ is a separable regularizer. We write $\theta_1(x) := \mathbb{E}_\xi[\theta_1(x,\xi)]$ for the expected loss.
  • $\mathcal{X},\mathcal{Y}$ are closed convex constraint sets; $\{\xi_k\}$ is a sequence of i.i.d. data samples.
  • The augmented Lagrangian, with penalty parameter $\beta>0$ and multiplier $\lambda\in\mathbb{R}^m$, is $\mathcal{L}_\beta(x,y,\lambda) = \theta_1(x) + \theta_2(y) - \lambda^\top(Ax+By-b) + \frac{\beta}{2}\|Ax+By-b\|^2$.
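For concreteness, here is one illustrative instance of this template. It is our own choice of example, not taken from the cited papers: $\ell_1$-regularized least squares, with the variable split into two copies tied by a consensus constraint. We assume the sign convention $\mathcal{L}_\beta = \theta_1 + \theta_2 - \lambda^\top(Ax+By-b) + \frac{\beta}{2}\|Ax+By-b\|^2$.

$$\min_{x,\,y\in\mathbb{R}^{d}} \ \mathbb{E}_{(a,c)}\Big[\tfrac12\,(a^\top x - c)^2\Big] + \nu\,\|y\|_1 \quad \text{s.t.} \quad x - y = 0$$

Here $\xi = (a, c)$ is a random feature-target pair. The template's pieces are:

- $\theta_1(x,\xi) = \tfrac12(a^\top x - c)^2$;
- $\theta_2(y) = \nu\|y\|_1$ with $\nu>0$;
- $A = I$, $B = -I$, $b = 0$, and $d_1 = d_2 = m = d$.

The augmented Lagrangian then reads

$$\mathcal{L}_\beta(x,y,\lambda) = \mathbb{E}\Big[\tfrac12(a^\top x - c)^2\Big] + \nu\|y\|_1 - \lambda^\top(x - y) + \frac{\beta}{2}\|x - y\|^2.$$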

Classical (deterministic) ADMM alternates between minimizing $\mathcal{L}_\beta$ over $x$ and over $y$, followed by a dual ascent step on $\lambda$. In stochastic, data-intensive regimes, however, the expectation in $\theta_1$ cannot be evaluated exactly at each step. Replacing the deterministic subproblems with stochastic approximations or variance-reduced updates is therefore critical for computational scalability (Ouyang et al., 2012).

2. Core Algorithmic Schemes

Stochastic ADMM schemes retain the three-step outer iteration of ADMM but use random samples and often introduce proximal or linearized subproblems. Canonical stochastic ADMM (Ouyang et al., 2012) for convex objectives is:

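  1. $x$-update:

$$x_{k+1} = \arg\min_{x\in\mathcal{X}} \Big\{ \langle \theta_1'(x_k,\xi_{k+1}), x\rangle + \frac{\beta}{2}\Big\|Ax + By_k - b - \frac{\lambda_k}{\beta}\Big\|^2 + \frac{\|x-x_k\|^2}{2\eta_{k+1}} \Big\}$$

  2. $y$-update:

$$y_{k+1} = \arg\min_{y\in\mathcal{Y}} \Big\{ \theta_2(y) + \frac{\beta}{2}\Big\|Ax_{k+1} + By - b - \frac{\lambda_k}{\beta}\Big\|^2 \Big\}$$

  3. Dual update:

$$\lambda_{k+1} = \lambda_k - \beta\,(Ax_{k+1} + By_{k+1} - b)$$

Here $\theta_1'(x_k,\xi_{k+1})$ denotes a stochastic (sub)gradient of $\theta_1(\cdot,\xi_{k+1})$ at $x_k$.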

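The step size $\eta_{k+1}$ is chosen according to problem regularity and the convergence regime. For example, $\eta_k \propto 1/\sqrt{k}$ suits general convex objectives and $\eta_k \propto 1/(\mu k)$ suits $\mu$-strongly convex ones (Ouyang et al., 2012).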
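Assume $\mathcal{X} = \mathbb{R}^{d_1}$, so there is no constraint set to project onto. Then the linearized $x$-update minimizes $\langle g_k, x\rangle + \frac{\beta}{2}\|Ax + By_k - b - \lambda_k/\beta\|^2 + \|x - x_k\|^2/(2\eta_{k+1})$, with $g_k$ a stochastic gradient. This is an unconstrained quadratic whose minimizer solves a $d_1\times d_1$ linear system. A minimal NumPy sketch under that assumption follows. The function name and argument order are ours; a general $\mathcal{X}$ would need a projection or a constrained solver instead.

```python
import numpy as np


def sadmm_x_update(x_k, y_k, lam_k, g_k, A, B, b, beta, eta):
    """Linearized stochastic-ADMM x-update, assuming X = R^{d1} (no projection).

    Minimizes  <g_k, x> + (beta/2)*||A x + B y_k - b - lam_k/beta||^2
               + ||x - x_k||^2 / (2*eta)
    by solving its first-order optimality condition
        (beta*A^T A + I/eta) x = x_k/eta - g_k + A^T lam_k - beta*A^T (B y_k - b).
    """
    d1 = x_k.shape[0]
    lhs = beta * (A.T @ A) + np.eye(d1) / eta
    rhs = x_k / eta - g_k + A.T @ lam_k - beta * (A.T @ (B @ y_k - b))
    return np.linalg.solve(lhs, rhs)
```

For $A = I$ the system matrix is diagonal, and the update reduces to a cheap elementwise formula.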

Variants include:

  • Linearized/Proximal ADMM: adding Bregman-divergence or explicit quadratic regularizers (Zhao et al., 2013).
  • Mini-batch and variance-reduced ADMM: integrating control variates or snapshot-based estimators for improved convergence, including SVRG-ADMM, SAGA-ADMM, and SCAS-ADMM (Zhao et al., 2015, Zheng et al., 2016, Huang et al., 2016).
  • Adaptive stochastic ADMM: using time-varying or coordinatewise proximal matrices for per-coordinate adaptivity (Zhao et al., 2013).
  • Accelerated stochastic ADMM: Nesterov-type extrapolation and momentum for non-ergodic $O(1/T)$ rates (Fang et al., 2017).

Continuous-time formulations recast the stochastic ADMM iterates as weak approximations to stochastic differential equations (SDEs), shedding light on the role of over-relaxation, noise, and bias-variance trade-offs (Zhou et al., 2020, Li, 2024).

3. Theoretical Guarantees and Convergence Rates

Rigorous analysis requires several assumptions:

  • bounded second moments or variance of the stochastic gradients;
  • convexity or strong convexity of the objective terms;
  • for nonconvex problems, smoothness or the Kurdyka-Łojasiewicz (KL) property.

(Ouyang et al., 2012, Bian et al., 2020)

Key results:

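  • For general convex objectives:

$$\mathbb{E}\big[\theta_1(\bar x_T) + \theta_2(\bar y_T) - \theta_1(x^\star) - \theta_2(y^\star) + \rho\,\|A\bar x_T + B\bar y_T - b\|\big] = O(1/\sqrt{T}),$$

where $(x^\star, y^\star)$ is an optimal solution, $\rho>0$ is a fixed constant, and $\bar x_T$, $\bar y_T$ are ergodic averages over the first $T$ iterates (Ouyang et al., 2012).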

  • For $\mu$-strongly convex objectives, the rate improves to

$$O\!\left(\frac{\log T}{T}\right)$$

(Ouyang et al., 2012).

  • Variance-reduced and accelerated schemes (e.g., SA-ADMM, SCAS-ADMM, SVRG-ADMM):

$$O(1/T)$$

in both objective gap and feasibility violation, matching batch ADMM under similar regularity (see Table below) (Zhong et al., 2013, Zhao et al., 2015, Zheng et al., 2016, Fang et al., 2017).
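A back-of-the-envelope comparison shows what the gap between these rates means in iterations. Here we treat the hidden constants $C$ and $C'$ as given. This is an idealization, since they depend on the problem and on the method.

$$\frac{C}{\sqrt{T}} \le \varepsilon \iff T \ge \frac{C^2}{\varepsilon^2}, \qquad \frac{C'}{T} \le \varepsilon \iff T \ge \frac{C'}{\varepsilon}.$$

With $C = C' = 1$ and $\varepsilon = 10^{-3}$, this is $10^{6}$ versus $10^{3}$ iterations. Per-iteration costs still differ between methods, so the comparison is only indicative.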

Method | Rate | Memory cost
Batch ADMM | O(1/T) | O(lp + lq)
SA-ADMM | O(1/T) | O(np + lp + lq)
SCAS-ADMM | O(1/T) | O(lp + lq)
SVRG-ADMM | O(1/T) | O(d d̃)
ACC-SADMM | O(1/T), non-ergodic | O(d)
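The sketch below makes the memory column concrete. It counts only the extra state each kind of variance-reduced estimator keeps, not the data or the matrices $A$ and $B$. It assumes dense 8-byte floats and a hypothetical problem size, and the function names are ours.

```python
def gradient_table_bytes(n, d, bytes_per_float=8):
    """SAG/SAGA-style state: one stored d-dimensional gradient per sample."""
    return n * d * bytes_per_float


def snapshot_bytes(d, bytes_per_float=8):
    """SVRG-style state: a snapshot point and its full gradient (two length-d vectors)."""
    return 2 * d * bytes_per_float


n, d = 10**6, 10**4  # hypothetical: one million samples, ten thousand features
print(gradient_table_bytes(n, d) / 1e9, "GB")  # 80.0 GB
print(snapshot_bytes(d) / 1e3, "kB")           # 160.0 kB
```

For generalized linear models, SAG-type methods can instead store one scalar per sample, which shrinks the gradient table to $O(n)$.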

In nonconvex problems with variance reduction, $O(1/T)$ rates in expectation toward stationary solutions are established under L-smoothness and bounded-gradient assumptions (Huang et al., 2016, Huang et al., 2020).

Recent Hilbert-space extensions handle infinite-dimensional constraints (e.g., PDE-constrained optimal control). By integrating Nesterov extrapolation with adaptive penalty schedules, they achieve nonergodic convergence rates in both the strongly convex and the general convex case (Deng et al., 10 Mar 2026).

4. Advanced Variants and Extensions

Variance-Reduced and Accelerated Stochastic ADMM

Variance reduction comes in two forms: maintaining a gradient history (SAG-ADMM, SAGA-ADMM) or using snapshot-based control variates (SVRG-ADMM, SCAS-ADMM). Either permits $O(1/T)$ convergence in expectation. With suitable acceleration (e.g., momentum or Nesterov extrapolation), it can reach non-ergodic $O(1/T)$ rates, which are optimal for separable linearly constrained problems (Zheng et al., 2016, Fang et al., 2017). Accelerated stochastic ADMM achieves further improvements, with optimal dependence on the smoothness constant for empirical risk minimization (Zhang et al., 2016).
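The snapshot-based control variate behind SVRG-type ADMM can be sketched in a few lines. The estimate below would replace the single-sample gradient $g_k$ in the $x$-update. Here `grad_i(x, i)` is a hypothetical callback returning the gradient of the $i$-th component loss. Uniform sampling of the mini-batch indices is assumed, which is what makes the estimate unbiased.

```python
import numpy as np


def svrg_gradient(grad_i, x, x_snap, full_grad_snap, idx):
    """SVRG control-variate estimate of the average gradient at x.

    grad_i(x, i):    gradient of the i-th component loss (hypothetical callback)
    x_snap:          snapshot point, refreshed once per epoch
    full_grad_snap:  exact average gradient at x_snap
    idx:             indices of the sampled mini-batch
    """
    # Correction term: mini-batch gradient difference between current point and snapshot.
    correction = np.mean([grad_i(x, i) - grad_i(x_snap, i) for i in idx], axis=0)
    # Adding the exact snapshot gradient makes the estimate unbiased under uniform sampling.
    return correction + full_grad_snap
```

Only the snapshot and its full gradient are stored, which is why this family avoids the per-sample gradient table of SAG-style methods.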

Nonconvex and Nonsmooth Stochastic ADMM

For nonconvex objectives, recent research deploys variance-reduced estimators (SVRG, SAGA, SARAH, SPIDER) in the ADMM updates. These ensure global convergence for finite-sum or expectation-based objectives, sometimes requiring the KL property for the global analysis (Bian et al., 2020, Huang et al., 2020). Under mild regularity, these algorithms achieve $O(1/\epsilon)$ iteration complexity for $\epsilon$-stationarity (Huang et al., 2016).
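SARAH/SPIDER-type estimators differ from SVRG in that they update the previous estimate recursively rather than anchoring to a fixed snapshot. Below is a minimal sketch, again with a hypothetical `grad_i(x, i)` callback. The estimate is biased, so implementations typically reset it to a full (or large-batch) gradient at the start of each epoch.

```python
import numpy as np


def sarah_gradient(grad_i, x, x_prev, v_prev, idx):
    """SARAH/SPIDER-style recursive gradient estimate.

    v_k = mean_{i in idx}[grad_i(x_k, i) - grad_i(x_{k-1}, i)] + v_{k-1}
    grad_i(x, i) is a hypothetical per-sample gradient callback; v_prev should be
    reset to a full (or large-batch) gradient at the start of each epoch.
    """
    diff = np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in idx], axis=0)
    return diff + v_prev
```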

Adaptive and Robust Versions

Adaptive stochastic ADMM generalizes the proximal term to per-coordinate Bregman divergences, closely related to AdaGrad, and can provably minimize the dual-norm regret term over time, especially beneficial in high-dimensional or ill-conditioned regimes (Zhao et al., 2013).
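To illustrate the AdaGrad connection, here is a sketch of an $x$-update with proximal term $\frac{1}{2\eta}(x - x_k)^\top H_k (x - x_k)$. It uses a diagonal $H_k = \delta I + \mathrm{diag}\big(\sqrt{\textstyle\sum_{t\le k} g_t^2}\big)$ built from past stochastic gradients. It assumes the consensus case $A = I$, $B = -I$, $b = 0$ (so the update is elementwise) and $\mathcal{X} = \mathbb{R}^{d}$. The class name, $\delta$, and this exact choice of $H_k$ are our illustrative assumptions, not the precise scheme of Zhao et al. (2013).

```python
import numpy as np


class AdaptiveXUpdate:
    """AdaGrad-style diagonal proximal x-update for A = I, B = -I, b = 0 (illustrative).

    Proximal term: (1/(2*eta)) * (x - x_k)^T H_k (x - x_k), with
    H_k = delta*I + diag(sqrt(running sum of squared stochastic gradients)).
    """

    def __init__(self, dim, eta=0.1, delta=1e-8):
        self.eta, self.delta = eta, delta
        self.sq_sum = np.zeros(dim)  # per-coordinate running sum of g_t^2

    def step(self, x_k, y_k, lam_k, g_k, beta):
        self.sq_sum += g_k ** 2
        h = self.delta + np.sqrt(self.sq_sum)  # diagonal of H_k
        # Optimality: g_k + beta*(x - y_k) - lam_k + h*(x - x_k)/eta = 0, solved per coordinate.
        return (h * x_k / self.eta + beta * y_k + lam_k - g_k) / (beta + h / self.eta)
```

Coordinates with large accumulated gradients get a stronger proximal pull, i.e. smaller effective steps. This is the per-coordinate adaptivity the paragraph describes.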

Distributed and Byzantine-robust stochastic ADMM extends the formulation to multi-agent scenarios. It adds consensus-form constraints and robustness penalties to manage untrusted or faulty nodes (Lin et al., 2021).
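The consensus form fits the general template directly. Below is a sketch with $N$ agents, each holding a local stochastic loss $f_i$ (our notation):

$$\min_{x_1,\dots,x_N,\,z}\ \sum_{i=1}^{N} \mathbb{E}_{\xi_i}\big[f_i(x_i,\xi_i)\big] \quad\text{s.t.}\quad x_i - z = 0,\quad i = 1,\dots,N.$$

This corresponds to the following choices:

- $x = (x_1,\dots,x_N)$ and $y = z$;
- $A = I$, $B = -(I,\dots,I)^\top$ (stacked identities), $b = 0$;
- $\theta_2 \equiv 0$.

Only the consensus constraints are shown. The robustness penalties that handle untrusted or faulty nodes in (Lin et al., 2021) are not modeled here.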

Continuous-Time and SME Theory

Stochastic ADMM iterates can be interpreted as discrete samples of an SDE, a "stochastic modified equation" (Zhou et al., 2020, Li, 2024). This view gives new insight into the bias–variance trade-off, the role of over-relaxation, and optimal stopping. For instance, under proper scaling, the $x$-trajectory of G-sADMM converges weakly to a small-noise SDE. Its drift is preconditioned by a matrix that incorporates the algorithmic parameters and underpins the bias–variance dynamics (Li, 2024).

5. Implementation Practices and Empirical Behavior

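Basic stochastic ADMM (Ouyang et al., 2012) proceeds as follows:

  1. Initialize $x_0$, $y_0$, $\lambda_0$.
  2. For $k = 0, 1, \dots, T-1$: draw a sample $\xi_{k+1}$, form a stochastic (sub)gradient of $\theta_1$ at $x_k$, and apply the $x$-, $y$-, and dual updates of Section 2 with step size $\eta_{k+1}$.
  3. Return the averaged iterates $\bar x_T$ and $\bar y_T$.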
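Below is a runnable sketch of this loop for an illustrative $\ell_1$-regularized least-squares instance: $\min \mathbb{E}[\frac12(a^\top x - c)^2] + \nu\|y\|_1$ s.t. $x - y = 0$, i.e. $A = I$, $B = -I$, $b = 0$. It samples one row of a hypothetical data matrix `Phi` per iteration. The step size $\eta_{k+1} = \eta_0/\sqrt{k+1}$ follows the general convex regime. The values of $\eta_0$, $\beta$, and $\nu$ are untuned placeholders; $\eta_0$ must be small relative to $1/\|a_i\|^2$ for the linearized step to be stable.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def stochastic_admm_lasso(Phi, c, nu=0.1, beta=1.0, T=2000, eta0=0.5, seed=0):
    """Stochastic ADMM for  min E[0.5*(a^T x - c)^2] + nu*||y||_1  s.t.  x - y = 0.

    The expectation is over rows (a, c) of (Phi, c), sampled uniformly with
    replacement. Returns the ergodic averages (x_bar, y_bar) of the T iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    x, y, lam = np.zeros(d), np.zeros(d), np.zeros(d)
    x_sum, y_sum = np.zeros(d), np.zeros(d)
    for k in range(T):
        i = rng.integers(n)                      # draw sample xi_{k+1}
        g = (Phi[i] @ x - c[i]) * Phi[i]         # stochastic gradient of theta_1 at x_k
        eta = eta0 / np.sqrt(k + 1)              # eta_{k+1} ~ 1/sqrt(k+1), general convex regime
        # x-update with A = I, B = -I, b = 0: elementwise closed form
        x = (x / eta + beta * y + lam - g) / (beta + 1.0 / eta)
        # y-update: prox of (nu/beta)*||.||_1 at x_{k+1} - lam_k/beta
        y = soft_threshold(x - lam / beta, nu / beta)
        # dual update: lam_{k+1} = lam_k - beta*(x_{k+1} - y_{k+1})
        lam = lam - beta * (x - y)
        x_sum += x
        y_sum += y
    return x_sum / T, y_sum / T


# Usage on synthetic data (illustrative sizes and parameters only)
rng = np.random.default_rng(1)
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.0, 0.5]
Phi = rng.standard_normal((500, 20)) / np.sqrt(20)  # rows scaled so ||a_i||^2 is about 1
c = Phi @ x_true + 0.01 * rng.standard_normal(500)
x_bar, y_bar = stochastic_admm_lasso(Phi, c, nu=0.01, T=5000)
```

Each iteration touches one data row and costs $O(d)$, independent of the number of rows $n$.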

Variance-reduced, mini-batch, block-wise, and accelerated versions have higher per-iteration cost. In exchange, they show better empirical scaling and faster convergence, especially on large-scale objectives (see, e.g., the comparisons in Zhao et al., 2015, Zheng et al., 2016, Zhao et al., 2013, Fang et al., 2017). Empirical benchmarks consistently report:

  • Stochastic and variance-reduced ADMM methods outperform batch/deterministic ADMM in early and mid-stage optimization.
  • Storage cost is a critical consideration. SVRG/SCAS-type approaches, whose memory does not grow with the number of samples $n$, scale to large $n$. SAG-style methods store one gradient per sample and require $O(n d_1)$ memory.
  • Implementation details such as penalty schedule, step-size tuning, and constraint over-relaxation may affect both speed and feasibility violation (Ouyang et al., 2012, Zhao et al., 2015, Li, 2024).
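In deterministic ADMM, over-relaxation is commonly implemented by replacing $Ax_{k+1}$ in the $y$- and dual updates with a relaxed combination. Assuming stochastic variants follow the same convention (the cited works may parameterize it differently), the steps read:

$$h_{k+1} = \alpha\,A x_{k+1} - (1-\alpha)(B y_k - b), \qquad \alpha \in (0, 2),$$

$$y_{k+1} = \arg\min_{y\in\mathcal{Y}} \Big\{\theta_2(y) + \frac{\beta}{2}\Big\|h_{k+1} + By - b - \frac{\lambda_k}{\beta}\Big\|^2\Big\}, \qquad \lambda_{k+1} = \lambda_k - \beta\,(h_{k+1} + By_{k+1} - b).$$

Setting $\alpha = 1$ recovers the unrelaxed updates, and $\alpha > 1$ gives over-relaxation. At a feasible point, where $Ax = b - By$, the relaxed term equals $Ax$, so the fixed points are unchanged.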

6. Applications and Extensions

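Stochastic ADMM and its variants are foundational in distributed, online, and robust machine learning, in high-dimensional signal processing, and in control, including PDE-constrained optimal control and multi-agent consensus settings.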

Ongoing advances involve multi-block extensions, adaptive or dynamic constraint penalty schemes, continuous-time formulations, and robust or decentralized communication design.

7. Summary Table: Stochastic ADMM Landscape

Algorithm | Objective Type | Rate (convex) | Memory | Key Features | Reference
Stochastic ADMM | NSE, convex | O(1/√T) | O(lp + lq) | Proximal stochastic x-update | (Ouyang et al., 2012)
SA-ADMM | NSE, convex | O(1/T) | O(np + lp + lq) | Surrogate gradient, full memory | (Zhong et al., 2013)
SCAS-ADMM | Smooth, convex | O(1/T) | O(lp + lq) | Variance reduction, sparse memory | (Zhao et al., 2015)
SVRG-ADMM | Smooth, convex/NCX | O(1/T) / O(1/T) | O(d d̃) | Epoch-based variance reduction | (Zheng et al., 2016)
SADMM (KL) | Nonsmooth, nonconvex | O(1/T) (stationarity) | - | VRADMM, global convergence under KL | (Bian et al., 2020)
ADA-SADMM | Convex | O(1/√T) | - | Adaptive Bregman proximal, AdaGrad link | (Zhao et al., 2013)
ACC-SADMM | Convex | O(1/T) non-erg. | O(d) | Accelerated, Nesterov, dual compensation | (Fang et al., 2017)
SM-ADMM | Any (via SDE) | SDE analysis | - | Weak convergence, bias-variance trade-off | (Zhou et al., 2020)
Hilbert-SADMM | Hilbert/Infinite | Nonergodic (strongly convex / general convex) | - | Nesterov acceleration, nonergodic rates | (Deng et al., 10 Mar 2026)

Abbreviations: NSE = Nonsmooth (Separable); NCX = Nonconvex; VR = Variance Reduction; KL = Kurdyka-Łojasiewicz property.


Stochastic ADMM crystallizes the union of stochastic optimization, convex analysis, and modern algorithmic design. Variants exploiting adaptive preconditioning, variance reduction, Nesterov acceleration, and continuous-time theory continue to extend both its theoretical boundaries and its practical reach (Ouyang et al., 2012, Zhao et al., 2015, Zheng et al., 2016, Fang et al., 2017, Zhao et al., 2013, Deng et al., 10 Mar 2026).
