Stochastic ADMM Variants

Updated 22 February 2026

Stochastic ADMM Variants are optimization methods that extend classical ADMM by using stochastic updates and adaptive Bregman divergences for data-dependent regularization.
Because each iteration processes a single sample or a mini-batch, they scale to large-scale machine learning and empirical risk minimization, and their adaptive proximal terms yield faster empirical convergence and tighter, data-dependent regret bounds.
Practical implementations trade per-iteration cost against adaptivity by choosing diagonal or full-matrix adaptive proximal terms; the diagonal choice keeps them efficient on high-dimensional and streaming data.

Stochastic ADMM Variants are a class of optimization algorithms that generalize the classical Alternating Direction Method of Multipliers (ADMM) to stochastic regimes by using single-sample or minibatch-based updates and adaptive Bregman divergences as proximal regularizers. These variants are particularly relevant for large-scale machine learning and empirical risk minimization, where evaluating the entire empirical objective at every iteration is computationally prohibitive. By randomizing the update direction and incorporating data-adaptive curvature (via Bregman divergences tuned to the observed gradients), stochastic ADMM variants achieve improved regret bounds and faster empirical convergence, and they scale to high-dimensional and streaming data (Zhao et al., 2013).

1. Problem Setting and Motivation

Stochastic ADMM variants address problems of the form $\min_{x,\,z} f(x) + g(z) \quad\text{subject to}\quad A x + B z = c$, where $f$ is typically an expectation or an empirical mean over a large data set (e.g., $f(x) = \mathbb{E}_{\xi}\, \ell(x;\xi)$) and $g$ is convex, possibly encoding structure-inducing regularizers or constraints. Classical ADMM is well established for convex problems with full-batch access, but it becomes impractical when evaluating the expected loss or its full gradient at every iteration is computationally infeasible.
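As an illustration (our own example, not one prescribed by the cited work), $\ell_1$-regularized logistic regression on $n$ examples $(a_i, b_i)$ with labels $b_i \in \{-1, +1\}$ fits this template once the loss and the regularizer are placed on separate variables and tied together by the consensus constraint $x = z$; the weight $\nu > 0$ is a hypothetical regularization parameter:

```latex
f(x) = \frac{1}{n}\sum_{i=1}^{n} \log\bigl(1 + \exp(-b_i\, a_i^\top x)\bigr),
\qquad g(z) = \nu\,\|z\|_1,
\qquad A = I,\quad B = -I,\quad c = 0 .
```

Here $f$ is an empirical mean over the data, while $g$ is cheap to handle through its proximal map (soft-thresholding).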

Stochastic ADMM approaches replace the full expected loss with a randomly sampled loss term, or its stochastic gradient, at each iteration, much as stochastic gradient descent does, and they replace the static quadratic penalty of conventional ADMM with an adaptive Bregman divergence as the proximal term. This yields data-dependent regularization that adapts to the geometry of the observed sequence of stochastic gradients (Zhao et al., 2013).
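When $f$ is an empirical mean, the gradient of a uniformly sampled loss term is an unbiased estimate of $\nabla f$, which is what keeps the per-iteration work independent of $n$. A minimal sketch, assuming for illustration the logistic loss $\ell(x; a, b) = \log(1 + \exp(-b\, a^\top x))$ with $b \in \{-1, +1\}$; the function names are hypothetical:

```python
import numpy as np

def logistic_grad(x, a, b):
    """Gradient in x of log(1 + exp(-b * a^T x)) for one example (a, b), b in {-1, +1}."""
    margin = b * (a @ x)
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)) = -exp(-logaddexp(0, m)), and dm/dx = b * a.
    # logaddexp avoids overflow for large margins.
    return -b * a * np.exp(-np.logaddexp(0.0, margin))

def full_grad(x, features, labels):
    """Gradient of the empirical mean f(x) = (1/n) * sum_i loss_i(x); touches all n rows."""
    return np.mean([logistic_grad(x, a, b) for a, b in zip(features, labels)], axis=0)

def stochastic_grad(x, features, labels, rng, batch_size=1):
    """Unbiased estimate of grad f(x) from a uniformly sampled mini-batch (with replacement)."""
    idx = rng.integers(0, len(labels), size=batch_size)
    return np.mean([logistic_grad(x, features[i], labels[i]) for i in idx], axis=0)

rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 5))
labels = np.where(rng.random(1000) < 0.5, -1.0, 1.0)
x = np.zeros(5)
g_t = stochastic_grad(x, features, labels, rng, batch_size=32)  # reads 32 rows, not 1000
```

Sampling with replacement keeps the estimator unbiased for any `batch_size`; larger batches only reduce its variance.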

2. Algorithmic Structure and Bregman Divergences

The stochastic Bregman ADMM variant keeps the alternating two-block structure of ADMM but makes two crucial modifications:

Stochastic sampling: At iteration $t$, only a single random example (or a mini-batch) is processed, yielding a stochastic (sub)gradient $g_t$.
Adaptive Bregman proximal term: The update for each variable includes a regularization term based on a Bregman divergence, $D_{\phi_t}$ for $x$ and $D_{\psi_t}$ for $z$, where the prox-functions $\phi_t$ and $\psi_t$ are chosen adaptively to capture the curvature observed over previous iterations.

The general update reads:

$x^{t+1} = \arg\min_x \left\{ f_t(x) + \langle \lambda^t, A x + B z^t - c\rangle + \rho \, D_{\phi_t}(x, x^t) \right\}$

$z^{t+1} = \arg\min_z \left\{ g(z) + \langle \lambda^t, A x^{t+1} + B z - c\rangle + \rho \, D_{\psi_t}(z, z^t) \right\}$

$\lambda^{t+1} = \lambda^t + \rho (A x^{t+1} + B z^{t+1} - c)$

where $f_t(x)$ denotes the instantaneous loss on the sampled data at step $t$ (Zhao et al., 2013). In the linearized form typical of stochastic ADMM, $f_t(x)$ is replaced by its first-order model $\langle g_t, x\rangle$, so each step needs only the stochastic gradient $g_t$.
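A minimal sketch of one iteration of the three updates, under illustrative assumptions that are ours rather than the paper's: the consensus split $A = I$, $B = -I$, $c = 0$; $g(z) = \nu\|z\|_1$ with a hypothetical weight `nu`; a diagonal curvature $\phi_t(w) = \tfrac12 w^\top \operatorname{diag}(h)\, w$; a Euclidean $\psi_t(w) = \tfrac12\|w\|_2^2$; and $f_t(x)$ replaced by its linearization $\langle g_t, x\rangle$. With these choices both primal subproblems have closed forms:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1: shrink every coordinate toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_bregman_admm_step(x, z, lam, g_t, h, rho, nu):
    """One pass of the x-, z- and dual updates for A = I, B = -I, c = 0, g(z) = nu * ||z||_1.

    Illustrative assumptions: f_t(x) is linearized as <g_t, x>,
    phi_t(w) = 0.5 * w^T diag(h) w, and psi_t(w) = 0.5 * ||w||_2^2.
    """
    # x-update: minimize <g_t, x> + <lam, x - z> + (rho/2) * (x - x_t)^T diag(h) (x - x_t).
    # Its gradient g_t + lam + rho * h * (x - x_t) vanishes at the closed form below.
    x_new = x - (g_t + lam) / (rho * h)
    # z-update: minimize nu*||z||_1 + <lam, x_new - z> + (rho/2) * ||z - z_t||^2.
    # Completing the square gives soft-thresholding of z_t + lam / rho at level nu / rho.
    z_new = soft_threshold(z + lam / rho, nu / rho)
    # Dual update with the residual A x + B z - c = x_new - z_new.
    lam_new = lam + rho * (x_new - z_new)
    return x_new, z_new, lam_new
```

Only the diagonal of $H_t$ enters the x-step, so each iteration costs $O(d)$. The sketch implements the update equations as written and does not encode any step-size condition needed for convergence.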

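The choice of Bregman divergence $D_\phi(u,v) = \phi(u) - \phi(v) - \langle \nabla\phi(v), u - v\rangle$, defined for a differentiable, strictly convex $\phi$, is crucial: the quadratic choice $\phi(u) = \tfrac{1}{2}\|u\|_2^2$ gives $D_\phi(u,v) = \tfrac{1}{2}\|u-v\|_2^2$ and recovers the Euclidean proximal term of standard (proximal) ADMM, whereas stochastic variants adapt $\phi_t$ in a data-driven manner, using diagonally or fully weighted quadratic norms that track the curvature accumulated along the trajectory.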
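Using the standard definition $D_\phi(u,v) = \phi(u) - \phi(v) - \langle \nabla\phi(v), u - v\rangle$, the quadratic special cases can be checked numerically. The helper names below are illustrative; `weighted_quadratic` builds the prox-function $\tfrac12 w^\top H w$ for a fixed symmetric positive-definite $H$:

```python
import numpy as np

def bregman(phi, grad_phi, u, v):
    """Bregman divergence D_phi(u, v) = phi(u) - phi(v) - <grad phi(v), u - v>."""
    return phi(u) - phi(v) - grad_phi(v) @ (u - v)

def half_sq_norm(w):
    """Euclidean prox-function phi(w) = 0.5 * ||w||_2^2."""
    return 0.5 * (w @ w)

def half_sq_norm_grad(w):
    """Gradient of 0.5 * ||w||_2^2."""
    return w

def weighted_quadratic(H):
    """Return phi(w) = 0.5 * w^T H w and its gradient H w for a symmetric positive-definite H."""
    def phi(w):
        return 0.5 * (w @ H @ w)
    def grad(w):
        return H @ w
    return phi, grad

u = np.array([1.0, -2.0, 0.5])
v = np.array([0.0, 1.0, 1.5])
print(bregman(half_sq_norm, half_sq_norm_grad, u, v))  # 5.5 = 0.5 * ||u - v||^2

H = np.diag([1.0, 4.0, 9.0])
phi, gphi = weighted_quadratic(H)
print(bregman(phi, gphi, u, v))  # 23.0 = 0.5 * (u - v)^T H (u - v)
```

For a symmetric $H$ the cross terms cancel, so any weighted quadratic prox-function yields exactly the weighted squared distance used in the curvature-adaptive updates.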

3. Adaptive Proximal Terms and Online Mirror Descent Connection

A defining property of stochastic ADMM variants is their use of history-dependent adaptive curvature. At every iteration $t$, the algorithm sets the prox-function to a quadratic norm with curvature matrix $H_t$: $\phi_t(w) = \tfrac{1}{2}\|w\|_{H_t}^2 \implies D_{\phi_t}(w, w') = \tfrac{1}{2}(w-w')^\top H_t (w-w')$, where $H_t$ is chosen online to (approximately) minimize the accumulated regret with respect to the observed stochastic gradients $g_1,\dots,g_t$. Typical constructions include:

$H_t = a I + \operatorname{diag}(s_t)$, with $s_{t,i} = \|g_{1:t,i}\|_2$ and $g_{1:t,i} = (g_{1,i}, \dots, g_{t,i})$ (coordinate-wise adaptive)
$H_t = a I + G_t^{1/2}$, with $G_t = \sum_{\tau=1}^t g_\tau g_\tau^\top$ (full-matrix, AdaGrad-like)
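Both constructions can be formed directly from the stacked gradient history. A minimal NumPy sketch (helper names hypothetical; the argument `a` plays the role of the constant $a$ above), computing the matrix square root by a symmetric eigendecomposition:

```python
import numpy as np

def diag_adaptive_H(grads, a):
    """Diagonal of H_t = a*I + diag(s_t), where s_{t,i} = ||g_{1:t,i}||_2.

    grads has shape (t, d); row tau holds the stochastic gradient g_tau.
    """
    s = np.linalg.norm(grads, axis=0)  # per-coordinate norm of the gradient history
    return a + s

def full_adaptive_H(grads, a):
    """H_t = a*I + G_t^{1/2}, where G_t = sum_tau g_tau g_tau^T = grads^T grads."""
    G = grads.T @ grads
    evals, evecs = np.linalg.eigh(G)             # G is symmetric positive semidefinite
    evals = np.clip(evals, 0.0, None)            # drop tiny negative round-off
    G_sqrt = (evecs * np.sqrt(evals)) @ evecs.T  # V diag(sqrt(evals)) V^T
    return a * np.eye(G.shape[0]) + G_sqrt

rng = np.random.default_rng(2)
grads = rng.standard_normal((10, 3))  # ten hypothetical gradients in R^3
h = diag_adaptive_H(grads, a=0.1)
H = full_adaptive_H(grads, a=0.1)
```

Only a length-$d$ vector is needed in the first case, whereas the second materializes a full $d \times d$ matrix.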

Such adaptive strategies ensure that the regret of the stochastic ADMM instance is never worse than the regret of the best fixed proximal function (of the same diagonal or full-matrix family) chosen in hindsight, up to problem-dependent constants. The resulting bounds match those of adaptive subgradient methods such as AdaGrad in the online learning literature (Zhao et al., 2013).

4. Convergence Properties and Regret Bounds

Stochastic Bregman ADMM comes with convergence guarantees in expectation under convexity. The main result (Theorem 2.1 in (Zhao et al., 2013)) bounds, for $T$ iterations and the averaged iterates $\bar{x}_T = \frac{1}{T}\sum_{t=1}^T x_t$, $\bar{z}_T = \frac{1}{T}\sum_{t=1}^T z_t$, the expected optimality gap plus constraint residual: $\mathbb{E}\left[ f(\bar{x}_T) + g(\bar{z}_T) - f(x^*) - g(z^*) + \rho \|A\bar{x}_T + B\bar{z}_T - c\| \right] \leq \frac{C_T}{T}$, where $(x^*, z^*)$ is an optimal solution and the leading term of $C_T$ is a cumulative adaptive norm of the observed gradients.

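Specifically, for coordinate-wise adaptive Bregman divergences the dominant term of $C_T$ is $\sum_{i=1}^d \|g_{1:T,i}\|_2$, while for the full-matrix choice it is $\operatorname{tr}(G_T^{1/2})$, which never exceeds the diagonal quantity. Both grow at most like $\sqrt{T}$: by the Cauchy-Schwarz inequality, $\sum_{i=1}^d \|g_{1:T,i}\|_2 \le \sqrt{d \sum_{t=1}^T \|g_t\|_2^2} \le \sqrt{dT}\,\max_t \|g_t\|_2$. The method therefore attains an $O(1/\sqrt{T})$ ergodic rate in the objective and constraint residual in the worst case, with a constant that can be much smaller than the nonadaptive one when the gradients are sparse or concentrated on a few coordinates (Zhao et al., 2013).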
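The data-dependent constants can be compared numerically. The sketch below (helper names and the synthetic gradient history are hypothetical) computes $\sum_i \|g_{1:T,i}\|_2$ and $\operatorname{tr}(G_T^{1/2})$ together with the worst-case value $\sqrt{dT}\,\max_t \|g_t\|_2$. The ordering $\operatorname{tr}(G_T^{1/2}) \le \sum_i \|g_{1:T,i}\|_2$ always holds, because each diagonal entry of $S = G_T^{1/2}$ satisfies $S_{ii} \le \sqrt{(S^2)_{ii}} = \sqrt{(G_T)_{ii}} = \|g_{1:T,i}\|_2$:

```python
import numpy as np

def adaptive_constants(grads):
    """Return (sum_i ||g_{1:T,i}||_2, tr(G_T^{1/2})) for a (T, d) array of gradients."""
    diag_term = np.sum(np.linalg.norm(grads, axis=0))
    evals = np.clip(np.linalg.eigvalsh(grads.T @ grads), 0.0, None)
    full_term = np.sum(np.sqrt(evals))  # trace of the PSD square root of G_T
    return diag_term, full_term

def worst_case_constant(grads):
    """Cauchy-Schwarz bound sqrt(d * T) * max_t ||g_t||_2 on the diagonal term."""
    T, d = grads.shape
    return np.sqrt(d * T) * np.max(np.linalg.norm(grads, axis=1))

# Hypothetical sparse history: each g_t has one nonzero entry, usually in the first 5 coordinates.
rng = np.random.default_rng(3)
T, d = 500, 50
coords = np.where(rng.random(T) < 0.9, rng.integers(0, 5, T), rng.integers(0, d, T))
grads = np.zeros((T, d))
grads[np.arange(T), coords] = rng.choice([-1.0, 1.0], size=T)
diag_term, full_term = adaptive_constants(grads)
print(full_term, diag_term, worst_case_constant(grads))
```

In this skewed example the adaptive constant lands well below the worst-case value; for dense, evenly spread gradients the two are of the same order, so adaptivity helps most when gradient mass is concentrated.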

5. Practical Implementation and Empirical Performance

Each iteration of a stochastic ADMM variant requires only a single stochastic gradient (or a mini-batch gradient), so the per-iteration cost is independent of the dataset size. The two-block ADMM structure lets the updates decouple naturally when $f$ or $g$ admits a simple proximal mapping. Diagonal adaptive Bregman divergences are preferred in high-dimensional applications because they need only $O(d)$ memory and time per step, whereas full-matrix adaptation must store a $d \times d$ matrix and compute its square root (typically $O(d^3)$ time), which limits it to moderate dimensions (Zhao et al., 2013).
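In a streaming setting the adaptive matrices can be maintained incrementally rather than recomputed from the whole history. The sketch below (class names hypothetical) keeps a running sum of squared gradients for the diagonal choice and a running $G_t$ for the full-matrix choice, which makes the $O(d)$ versus $O(d^2)$ memory gap explicit:

```python
import numpy as np

class StreamingDiagonalProx:
    """Running state for H_t = a*I + diag(s_t), with s_{t,i} = ||g_{1:t,i}||_2.

    Stores one length-d vector of squared-gradient sums, so each update costs O(d)
    time and memory regardless of how many examples the dataset contains.
    """

    def __init__(self, d, a):
        self.a = a
        self.sq_sum = np.zeros(d)

    def update(self, g_t):
        self.sq_sum += g_t ** 2
        return self.a + np.sqrt(self.sq_sum)  # diagonal of H_t


class StreamingFullProx:
    """Running state for H_t = a*I + G_t^{1/2}.

    Stores the d x d matrix G_t (O(d^2) memory); in this sketch its square root is
    re-formed by an O(d^3) eigendecomposition at every step.
    """

    def __init__(self, d, a):
        self.a = a
        self.G = np.zeros((d, d))

    def update(self, g_t):
        self.G += np.outer(g_t, g_t)
        evals, evecs = np.linalg.eigh(self.G)
        root = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
        return self.a * np.eye(len(g_t)) + root


rng = np.random.default_rng(4)
diag_state, full_state = StreamingDiagonalProx(4, 0.1), StreamingFullProx(4, 0.1)
for g_t in rng.standard_normal((20, 4)):  # hypothetical stream of stochastic gradients
    h = diag_state.update(g_t)
    H = full_state.update(g_t)
```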

Empirical tests across datasets demonstrate that:

Adaptive-diagonal and full-matrix stochastic Bregman ADMM achieve substantially faster reduction in objective or feasibility gap per epoch than static-prox stochastic ADMM.
For instance, on the a9a dataset, two epochs of Ada-diag yield a test error of 15.01%, compared with 16.46% for the SADMM baseline under the same time budget (Zhao et al., 2013).
Adaptive regularization makes convergence markedly less sensitive to ill-conditioning and poor feature scaling.

6. Theoretical and Practical Comparison to Deterministic and Nonadaptive Variants

Compared to classical (deterministic) batch ADMM, stochastic variants reduce wall-clock cost by avoiding a full pass over the data at every iteration. Compared to nonadaptive stochastic ADMM, history-adaptive Bregman divergences yield regret bounds with data-dependent leading constants, which can be substantially tighter when gradients are sparse or unevenly scaled, together with empirically faster convergence on high-dimensional or non-uniform feature spaces (Zhao et al., 2013).

The resulting rates are of the same order as those of stochastic mirror descent and adaptive subgradient methods: $O(1/\sqrt{T})$ for general convex objectives and $O(\log T/T)$ under strong convexity.

7. Extensions and Related Methodologies

The stochastic Bregman ADMM framework unifies ideas from stochastic mirror descent, online adaptive subgradient methods, and operator splitting. It accommodates additional regularizers and constraint structures as well as mini-batch sampling, and it can be generalized to multi-block splitting and distributed computational architectures. Related developments include Bregman-proximal ADMM for nonnegative matrix factorization with outliers (Chrétien et al., 2015) and for optimal transport (Wang et al., 2013), and a distributed ADMM solver for billion-scale generalized assignment problems (Zhou et al., 2022); these works share the Bregman-proximal or large-scale splitting ingredients, though not necessarily the stochastic sampling.

References

"Adaptive Stochastic Alternating Direction Method of Multipliers" (Zhao et al., 2013)
"Bregman Alternating Direction Method of Multipliers" (Wang et al., 2013)
"A Bregman Proximal ADMM for NMF with Outliers" (Chrétien et al., 2015)
"A Practical Distributed ADMM Solver for Billion-Scale Generalized Assignment Problems" (Zhou et al., 2022)
