
Bregman-ADMM for Structured Optimization

Updated 22 February 2026
  • Bregman-ADMM is an optimization framework that replaces the quadratic penalty with problem-specific Bregman divergences to better capture underlying geometry.
  • It enables scalable and parallelizable solutions for large-scale, nonconvex, and distributed problems with applications in machine learning, imaging, and statistical inference.
  • The method offers strong theoretical guarantees, including global convergence in convex settings and ergodic O(1/T) rates under suitable conditions.

Bregman Alternating Direction Method of Multipliers (BADMM) generalizes classical ADMM by replacing the standard quadratic penalty with a problem-adapted Bregman divergence. This modification provides greater flexibility in exploiting problem geometry (such as structure, sparsity, or manifold constraints) and yields scalable, parallelizable, and theoretically principled algorithms for large-scale, nonconvex, or distributed optimization in machine learning, signal processing, and statistics.

1. Fundamentals of Bregman-ADMM

Classical ADMM splits optimization problems with separable objectives and linear constraints into iterative updates on primal and dual variables, using a quadratic (Euclidean) penalty to enforce agreement between splits. BADMM introduces Bregman divergences, distance-like measures defined by a strictly convex function $\phi$, in lieu of the Euclidean term:

$$D_{\phi}(u,v) = \phi(u) - \phi(v) - \langle \nabla\phi(v),\, u-v \rangle$$

This generalization preserves convexity while adapting the geometry of the update steps to constraints such as nonnegativity, simplex membership, or information-theoretic structure, yielding proximal-like updates that are often closed-form and more numerically stable for structured problems (Wang et al., 2013).
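As a concrete check of this definition, the following sketch (an illustration, not code from the cited papers) evaluates $D_\phi$ for two standard generators: the quadratic generator recovers half the squared Euclidean distance, and the negative-entropy generator recovers the generalized KL divergence.

```python
import numpy as np

def bregman(phi, grad_phi, u, v):
    """Bregman divergence D_phi(u, v) = phi(u) - phi(v) - <grad phi(v), u - v>."""
    return phi(u) - phi(v) - np.dot(grad_phi(v), u - v)

# Quadratic generator: D reduces to half the squared Euclidean distance.
quad = lambda x: 0.5 * np.dot(x, x)
quad_grad = lambda x: x

# Negative-entropy generator: D reduces to the generalized KL divergence.
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1.0

u = np.array([0.5, 0.5])
v = np.array([0.25, 0.75])
d_quad = bregman(quad, quad_grad, u, v)    # = 0.5 * ||u - v||^2
d_kl = bregman(negent, negent_grad, u, v)  # = sum_i u_i * log(u_i / v_i) here
```

For vectors on the probability simplex the correction terms $-\sum_i u_i + \sum_i v_i$ cancel, so `d_kl` coincides with the ordinary KL divergence.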

The canonical two-block BADMM for

$$\min_{x,z} \; f(x) + g(z) \quad \text{s.t.} \quad Ax + Bz = c$$

proceeds via the following steps at iteration $k$:

$$\begin{aligned} x^{k+1} &= \arg\min_x \; f(x) + \langle y^k, Ax + Bz^k - c \rangle + \rho\, D_\phi(c - Ax,\; Bz^k) \\ z^{k+1} &= \arg\min_z \; g(z) + \langle y^k, Ax^{k+1} + Bz - c \rangle + \rho\, D_\phi(Bz,\; c - Ax^{k+1}) \\ y^{k+1} &= y^k + \tau\,(Ax^{k+1} + Bz^{k+1} - c) \end{aligned}$$

where $\rho > 0$ is a penalty parameter, $\tau$ is often set to $\rho$, and $D_\phi$ can be designed to exploit the structure of $x$ and $z$ (Wang et al., 2013, Wang et al., 2014).
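To make these updates concrete, here is a minimal scalar sketch on a hypothetical toy problem (not taken from the cited papers): $f(x) = \tfrac{1}{2}(x-a)^2$, $g(z) = \tfrac{1}{2}(z-b)^2$, constraint $x - z = 0$, and the quadratic generator $\phi = \tfrac{1}{2}\|\cdot\|^2$, under which BADMM coincides with classical ADMM and both minimizations are closed-form.

```python
# Two-block BADMM on: min 0.5*(x-a)^2 + 0.5*(z-b)^2  s.t.  x - z = 0,
# with the quadratic Bregman generator (so it reduces to classical ADMM).
a, b = 1.0, 3.0
rho = 1.0          # penalty parameter; dual step tau is set equal to rho
x = z = y = 0.0
for _ in range(100):
    # x-update: argmin_x 0.5*(x-a)^2 + y*(x - z) + (rho/2)*(x - z)^2
    x = (a - y + rho * z) / (1.0 + rho)
    # z-update: argmin_z 0.5*(z-b)^2 + y*(x - z) + (rho/2)*(z - x)^2
    z = (b + y + rho * x) / (1.0 + rho)
    # dual ascent on the residual x - z
    y = y + rho * (x - z)
# Both blocks converge to the consensus value (a + b) / 2.
```

At the fixed point the residual $x - z$ vanishes and the multiplier $y$ balances the two pulls, i.e. $x^* = a - y^* = b + y^*$, giving $x^* = z^* = (a+b)/2$.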

2. Algorithmic Variants and Multiblock Extensions

BADMM admits straightforward extension to multi-block variables and to nonconvex composite objectives. For a general $N$-block problem

$$\min_{x_i} \; \sum_{i=1}^N f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^N A_i x_i = 0$$

with convex (possibly nonsmooth) or smooth $f_i$, BADMM performs cyclic updates

$$x_i^{k+1} = \arg\min_{x_i} \; L_\alpha(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i, x_{i+1}^k, \ldots, x_N^k, p^k) + \Delta_{\phi_i}(x_i, x_i^k)$$

followed by dual ascent on $p$ (Wang et al., 2015, Pham et al., 2024). Each block can have its own Bregman divergence, allowing maximal exploitation of distinct block geometry.
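A minimal numerical sketch of this cyclic scheme, on a hypothetical three-block toy problem with quadratic $\Delta_{\phi_i}$ terms (not from the cited papers): minimize $\sum_i \tfrac{1}{2}(x_i - a_i)^2$ subject to $\sum_i x_i = 0$, for which each block update is closed-form.

```python
# Three-block BADMM with quadratic Bregman prox terms on:
#   min sum_i 0.5*(x_i - a_i)^2   s.t.   x_1 + x_2 + x_3 = 0.
# The optimum is x_i = a_i - mean(a), with multiplier p = mean(a).
a = [1.0, 2.0, 6.0]
alpha, mu = 1.0, 1.0   # augmented-Lagrangian weight and Bregman-prox weight
x = [0.0, 0.0, 0.0]
p = 0.0
for _ in range(500):
    for i in range(3):   # cyclic sweep using the most recent block values
        s = sum(x) - x[i]
        # argmin_xi 0.5*(xi-a_i)^2 + p*xi + (alpha/2)*(xi+s)^2 + (mu/2)*(xi-x_i^k)^2
        x[i] = (a[i] - p - alpha * s + mu * x[i]) / (1.0 + alpha + mu)
    p = p + alpha * sum(x)   # dual ascent on the coupling constraint
```

The per-block proximal term $\tfrac{\mu}{2}(x_i - x_i^k)^2$ is what stabilizes the cyclic sweep; without such terms, plain three-block ADMM is not guaranteed to converge in general.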

The BADMM paradigm also supports blockwise linearization and additional Bregman-regularization, which greatly facilitate tractable and parallel updates for large- or high-dimensional problems (Pham et al., 2024, Zhou et al., 2022).

3. Convergence and Theoretical Guarantees

In convex settings with strongly convex Bregman generators and mild assumptions (such as full rank of the constraint matrices), BADMM admits global convergence to a KKT point of the original constrained problem, with an $O(1/T)$ ergodic rate for primal/dual residuals (Wang et al., 2013). Multiblock and nonconvex BADMM can also converge to stationary points under subanalyticity or the Kurdyka–Łojasiewicz property and blockwise strong convexity (Wang et al., 2015, Wang et al., 2014, Pham et al., 2024).

Core convergence theorems state that, with suitable step sizes and strong convexity/Lipschitz constants for the Bregman generators, the residuals of the iterates vanish asymptotically, with the full trajectory converging to a stationary (and, in the convex case, globally optimal) solution (Wang et al., 2015, Wang et al., 2014, Pham et al., 2024). For problems on simplexes with the KL divergence, BADMM can enjoy a theoretical speedup over standard ADMM proportional to $O(n/\log n)$ (Wang et al., 2013).

4. Choice of Bregman Divergence and Computational Implications

Selection of the Bregman generator $\phi$ is critical:

  • Quadratic: $\phi(x) = \frac{1}{2}\|x\|^2$, recovering standard ADMM.
  • KL divergence: $\phi(x) = \sum_i x_i \ln x_i$ for the probability simplex or positive measures, yielding multiplicative, mirror-descent-like updates.
  • Mahalanobis or log-determinant: for structured covariance or graphical model settings (Khoo et al., 7 Feb 2025, Babagholami-Mohamadabadi et al., 2015).
  • Entropic or Itakura–Saito: for NMF and information-theoretic matrix factorizations (Chrétien et al., 2015).

The choice often determines whether the subproblem updates are closed-form, elementwise, or require iterative inner solves. In high-dimensional, distributed, or massive-scale settings, careful design of $\phi$ and the related Bregman proximal operators can reduce communication and enable maximum parallelism (Zhou et al., 2022, Wang et al., 2013).
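For instance, with the KL generator on the probability simplex, the linearized subproblem $\arg\min_x \langle g, x\rangle + \tfrac{1}{\eta} D_{\mathrm{KL}}(x, x^k)$ has the closed-form multiplicative (mirror-descent) update sketched below; the variable names are illustrative.

```python
import numpy as np

def kl_mirror_step(x_k, g, eta):
    """Closed-form minimizer of <g, x> + (1/eta) * KL(x || x_k) over the simplex:
    an elementwise multiplicative update followed by one normalization
    (exponentiated gradient)."""
    w = x_k * np.exp(-eta * g)
    return w / w.sum()

x_k = np.array([1/3, 1/3, 1/3])
g = np.array([1.0, 0.0, 2.0])   # linearized objective (e.g., a gradient)
x_next = kl_mirror_step(x_k, g, eta=1.0)
# x_next stays on the simplex; mass shifts toward low-cost coordinates.
```

Because the update is elementwise multiplication plus one normalization, it needs no explicit projection onto the simplex and keeps every coordinate strictly positive, which is exactly the closed-form behavior the KL geometry buys.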

5. Applications

BADMM finds application in a broad array of domains:

  • Large-scale assignment and resource allocation: Billion-variable assignment problems are decomposed into tractable item-wise subproblems via BADMM, enabling efficient MapReduce or GPU implementations routinely deployed in production (Zhou et al., 2022).
  • Robust Nonnegative Matrix Factorization: Incorporating Bregman-proximal steps for factor blocks stabilizes convergence and handles nonnegativity in NMF with outliers and missing data (Chrétien et al., 2015).
  • Variational Inference for Distributed Bayesian Learning: BADMM employing reverse-KL (log-partition based) divergence yields closed-form parameter updates, facilitating parallel mean field inference in sensor networks and distributed matrix factorization (Babagholami-Mohamadabadi et al., 2015).
  • Compressed Sensing and Imaging: Split Bregman (which is algebraically equivalent to BADMM in quadratic settings) is a state-of-the-art algorithm for TV- and sparsity-regularized reconstructions (Nien et al., 2014).
  • Graphical Models and Statistical Physics: BADMM permits nonlinear dual updates and entropy-type regularization for Bethe variational problems and quantum extensions (Khoo et al., 7 Feb 2025).
  • Adaptive Stochastic Optimization: BADMM with adaptive (AdaGrad-type) second-order Bregman prox allows data-dependent preconditioning, with regret matching the best offline scaling (Zhao et al., 2013).

6. Distributed, Parallel, and Scalable Implementations

BADMM enables fully distributed and massively parallel solutions, particularly by exploiting problem separability in the primal variables through suitable Bregman-prox terms. As in generalized assignment and resource allocation, embedding a Bregman divergence tailored to the block partition eliminates primal cross-terms, yielding blockwise separable subproblems that can be solved independently and aggregated via MapReduce or message-passing architectures (Zhou et al., 2022, Babagholami-Mohamadabadi et al., 2015).
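One common mechanism for eliminating the cross-terms is the standard linearized-ADMM generator $\phi(x) = \tfrac{\mu}{2}\|x\|^2 - \tfrac{\rho}{2}\|Ax\|^2$ (convex when $\mu \ge \rho\|A\|^2$): the Bregman term cancels the coupled quadratic $\tfrac{\rho}{2}\|Ax - v\|^2$, so for a separable $f$ such as $\lambda\|x\|_1$ the x-update splits into independent elementwise proxes that workers can evaluate in parallel. A sketch under these assumptions (function names are illustrative):

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding, the prox of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def x_update_separable(A, v, y, x_k, rho, mu, lam):
    """Linearized BADMM x-update for f(x) = lam*||x||_1 under the generator
    phi(x) = (mu/2)||x||^2 - (rho/2)||Ax||^2 (requires mu >= rho*||A||^2).
    The coupled quadratic is linearized at x_k, leaving an elementwise prox."""
    grad = A.T @ (y + rho * (A @ x_k - v))  # gradient of the coupled part at x_k
    return soft(x_k - grad / mu, lam / mu)  # fully separable across coordinates

A = np.eye(2)
x_k = np.array([1.0, -1.0])
v = np.zeros(2); y = np.zeros(2)
x_small = x_update_separable(A, v, y, x_k, rho=1.0, mu=2.0, lam=0.0)
x_zero = x_update_separable(A, v, y, x_k, rho=1.0, mu=2.0, lam=10.0)
```

Each coordinate (or block, in the block-partitioned case) of the returned vector depends only on its own entries of `x_k` and `grad`, which is what makes MapReduce-style aggregation possible.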

This block separability can further be exploited for high parallel efficiency in GPU or multi-core implementations (e.g., optimal transport, matrix factorization, structured mean-field inference) (Wang et al., 2013, Babagholami-Mohamadabadi et al., 2015).

7. Extensions, Limitations, and Future Directions

BADMM has been extended to handle

  • Nonconvex and nonsmooth objectives, with global or subsequential convergence results via the Kurdyka–Łojasiewicz property (Wang et al., 2015, Pham et al., 2024).
  • Nonlinear dual updates for entropy-minimization or Bethe variational inference (Khoo et al., 7 Feb 2025).
  • Stochastic and online settings via adaptive Bregman increments (e.g., AdaGrad, mirror-ADMM) for optimal per-coordinate or per-block scaling (Zhao et al., 2013).
  • Extensions to the quantum variational domain via matrix-exponential updates (Khoo et al., 7 Feb 2025).

The main limitations remain in the efficient handling of highly coupled or nonconvex blocks when closed-form updates become intractable; further research addresses advanced linearization, warm-starting, or hybridization with alternative splitting schemes (Pham et al., 2024). Open questions include global convergence in complex nonconvex settings and optimal divergence choice under additional structure or stochasticity constraints.


In summary, BADMM enables structure-exploiting, convergent, and scalable decomposition of complex optimization problems by leveraging the flexibility of Bregman divergences. Its theoretical properties and concrete algorithmic forms have driven adoption across combinatorial, statistical learning, imaging, and distributed inference applications (Wang et al., 2013, Wang et al., 2015, Zhou et al., 2022, Chrétien et al., 2015, Nien et al., 2014, Zhao et al., 2013, Khoo et al., 7 Feb 2025, Pham et al., 2024, Babagholami-Mohamadabadi et al., 2015).
