
ADMM-Based Optimization

Updated 30 May 2026
  • ADMM-based optimization is a method that decomposes global constrained problems into smaller subproblems using augmented Lagrangian techniques.
  • It employs iterative schemes like Gauss–Seidel and Jacobian updates, with modifications ensuring convergence even in multi-block scenarios.
  • Its applications span distributed smart grids, communication networks, and nonconvex optimization, providing scalability and efficiency in big-data settings.

The alternating direction method of multipliers (ADMM) is a versatile optimization framework for solving structured, constrained problems, particularly those arising in large-scale and distributed settings. ADMM works by decomposing a global problem into smaller subproblems, typically corresponding to separable or loosely coupled variables, with coordination enforced via augmented Lagrangian and dual variable updates. Its strong scalability, decomposition properties, and ability to handle nonsmooth and constrained objectives have established ADMM as a foundational tool across convex, nonconvex, distributed, and big-data optimization landscapes.

1. Canonical Formulations of ADMM-Based Optimization

The standard large-scale convex problem with linear coupling constraints targeted by ADMM is formulated as:

$$\min_{x_1, \dots, x_N} \sum_{i=1}^N f_i(x_i) \quad \text{subject to} \quad \sum_{i=1}^N A_i x_i = b, \quad x_i \in \mathcal X_i$$

with each $f_i$ closed, proper, and convex, each $\mathcal X_i$ a closed convex set, $A_i \in \mathbb{R}^{m \times n_i}$, and $b \in \mathbb{R}^m$ (Liu et al., 2015).

This can be equivalently reformulated in a two-block form by grouping variables and introducing auxiliary variables:

$$\min_{x \in \mathbb{R}^n, z \in \mathbb{R}^p} f(x) + g(z) \quad \text{subject to} \quad A x + B z = c$$

The corresponding augmented Lagrangian is

$$L_\rho(x, z, \lambda) = f(x) + g(z) + \lambda^T (A x + B z - c) + \frac{\rho}{2} \|A x + B z - c\|_2^2$$

where $\lambda$ is the dual variable and $\rho > 0$ is the penalty parameter.
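Many implementations work with an equivalent scaled form of this augmented Lagrangian, obtained by completing the square in the dual term. The block below writes it out; the scaled dual variable $u = \lambda / \rho$ is a symbol introduced here for illustration and does not appear in the cited formulation.

```latex
% Scaled form, with u = \lambda / \rho (u is notation assumed here)
L_\rho(x, z, u) = f(x) + g(z)
    + \frac{\rho}{2} \left\| A x + B z - c + u \right\|_2^2
    - \frac{\rho}{2} \left\| u \right\|_2^2
% Check: expanding the first norm gives
%   \frac{\rho}{2}\|Ax + Bz - c\|_2^2 + \rho\, u^T (Ax + Bz - c) + \frac{\rho}{2}\|u\|_2^2,
% and \rho\, u^T (Ax + Bz - c) = \lambda^T (Ax + Bz - c).
```

In this form each primal subproblem reduces to minimizing $f$ or $g$ plus a single quadratic term, which is why proximal operators appear so often in ADMM code.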

2. ADMM Iterative Schemes and Multi-Block Extensions

2.1 Two-Block ADMM

The classical two-block ADMM applies Gauss–Seidel updates as follows:

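  • Update $x$: $x^{k+1} = \arg\min_x L_\rho(x, z^k, \lambda^k)$
  • Update $z$: $z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, \lambda^k)$
  • Update dual: $\lambda^{k+1} = \lambda^k + \rho (A x^{k+1} + B z^{k+1} - c)$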

For convex $f$ and $g$ and feasible constraints, convergence to a primal–dual solution is guaranteed, with both ergodic and non-ergodic rates of $O(1/k)$ in the objective residual (Liu et al., 2015).
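As a concrete illustration of the x-update, z-update, and dual update, here is a minimal sketch of two-block ADMM on a small $\ell_1$ denoising problem, $\min \tfrac12\|x - y\|_2^2 + \tau\|z\|_1$ subject to $x - z = 0$ (so $A = I$, $B = -I$, $c = 0$). The problem choice, the function names `soft_threshold` and `admm_l1_denoise`, and the parameter values are assumptions made for this example, not taken from the cited papers.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(y, tau, rho=1.0, iters=200):
    """Two-block ADMM for min 0.5*||x - y||^2 + tau*||z||_1  s.t.  x - z = 0."""
    n = len(y)
    x, z, lam = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        # x-update: minimize L_rho over x with z and lam fixed (closed form).
        x = [(y[i] - lam[i] + rho * z[i]) / (1.0 + rho) for i in range(n)]
        # z-update: minimize L_rho over z using the fresh x (soft-thresholding).
        z = [soft_threshold(x[i] + lam[i] / rho, tau / rho) for i in range(n)]
        # Dual update: ascent step along the constraint residual x - z.
        lam = [lam[i] + rho * (x[i] - z[i]) for i in range(n)]
    return x, z, lam


y = [3.0, 0.5, -2.0]
x, z, lam = admm_l1_denoise(y, tau=1.0)
print(z)  # should be close to the soft-thresholded data
```

For this particular problem the exact minimizer is the soft-thresholded data, which gives an easy check that the iteration has settled.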

2.2 Multi-Block ADMM

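Direct generalizations to $N \ge 3$ blocks, based on the $N$-block augmented Lagrangian $L_\rho(x_1, \dots, x_N, \lambda) = \sum_{i=1}^N f_i(x_i) + \lambda^T \left( \sum_{i=1}^N A_i x_i - b \right) + \frac{\rho}{2} \left\| \sum_{i=1}^N A_i x_i - b \right\|_2^2$, can be made using two main strategies: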

  • Direct Gauss–Seidel (sequential): Update each block sequentially, always using the freshest values of previously updated blocks:

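$$x_i^{k+1} = \arg\min_{x_i \in \mathcal X_i} L_\rho(x_1^{k+1}, \dots, x_{i-1}^{k+1}, x_i, x_{i+1}^k, \dots, x_N^k, \lambda^k), \quad i = 1, \dots, N$$

$$\lambda^{k+1} = \lambda^k + \rho \left( \sum_{i=1}^N A_i x_i^{k+1} - b \right)$$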

This scheme generally lacks global convergence guarantees and can diverge without additional assumptions (e.g., small dual steps, strong convexity, or randomized update order).

  • Direct Jacobian (parallel): Update all $x_i$ in parallel using only previous values from the last iteration:

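$$x_i^{k+1} = \arg\min_{x_i \in \mathcal X_i} L_\rho(x_1^k, \dots, x_{i-1}^k, x_i, x_{i+1}^k, \dots, x_N^k, \lambda^k), \quad i = 1, \dots, N$$

$$\lambda^{k+1} = \lambda^k + \rho \left( \sum_{i=1}^N A_i x_i^{k+1} - b \right)$$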

This variant converges only under stringent structural conditions (e.g., near-orthogonality of the $A_i$, or full-column rank blocks).
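The difference between the sequential and parallel sweeps is easiest to see in code. The sketch below is a toy, assuming scalar blocks with quadratic objectives $f_i(x_i) = \tfrac12 q_i (x_i - t_i)^2$ and a single coupling constraint $\sum_i a_i x_i = b$; the function name `multiblock_admm` and all data values are hypothetical choices for illustration.

```python
def multiblock_admm(q, t, a, b, rho, iters, parallel=False):
    """Direct N-block ADMM for min sum_i 0.5*q[i]*(x_i - t[i])**2  s.t.  sum_i a[i]*x_i = b.

    parallel=False: Gauss-Seidel sweep (each block sees the freshest values).
    parallel=True:  Jacobian sweep (each block sees only the previous iterate).
    """
    n = len(q)
    x = [0.0] * n
    lam = 0.0
    residual = -b
    for _ in range(iters):
        prev = list(x)
        for i in range(n):
            src = prev if parallel else x  # iterate the other blocks are read from
            others = sum(a[j] * src[j] for j in range(n) if j != i)
            # Closed-form minimizer of the augmented Lagrangian in x_i alone.
            x[i] = (q[i] * t[i] - lam * a[i] - rho * a[i] * (others - b)) / (q[i] + rho * a[i] ** 2)
        residual = sum(a[i] * x[i] for i in range(n)) - b
        lam += rho * residual  # dual ascent on the coupling residual
    return x, lam, residual


q, t, a, b = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 3.0
x_seq, _, _ = multiblock_admm(q, t, a, b, rho=0.3, iters=500)
x_par, _, _ = multiblock_admm(q, t, a, b, rho=0.3, iters=500, parallel=True)
_, _, r_bad = multiblock_admm(q, t, a, b, rho=1.0, iters=60, parallel=True)
print(x_seq, x_par, abs(r_bad))
```

On this particular instance both sweeps settle at the constrained minimizer when the penalty is small ($\rho = 0.3$), while the parallel sweep with $\rho = 1$ produces a growing constraint residual. This is a property of the toy data and should not be read as a general rule.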

3. Convergent Multi-Block ADMM Modifications

To restore global convergence for $N$-block problems, several modifications are employed (Liu et al., 2015, Liu et al., 2015):

3.1 Variable Splitting ADMM

  • Introduce auxiliary variables $z_i$ and constraints such as $A_i x_i = z_i$, $\sum_{i=1}^N z_i = b$, reducing the problem to a 2-block structure in $x = (x_1, \dots, x_N)$ and $z = (z_1, \dots, z_N)$ which is amenable to standard ADMM analysis with $O(1/k)$ convergence rate.
  • This increases the number of variables and constraints linearly in $N$.
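One way to write such a splitting explicitly is sketched below. The exact form is an assumption made for illustration (the cited papers may use a different but equivalent arrangement), and the indicator function $I_{\mathcal Z}$ and the set $\mathcal Z$ are notation introduced here.

```latex
% Variable-splitting reformulation (illustrative form, notation assumed)
\min_{x_1, \dots, x_N,\; z_1, \dots, z_N} \;
    \sum_{i=1}^N f_i(x_i) + I_{\mathcal Z}(z_1, \dots, z_N)
\quad \text{subject to} \quad A_i x_i - z_i = 0, \quad i = 1, \dots, N,
\qquad
\mathcal Z = \Big\{ (z_1, \dots, z_N) : \sum_{i=1}^N z_i = b \Big\}
```

With the $x_i$ grouped as the first block and the $z_i$ as the second, the first-block update separates across $i$ and the second-block update is a projection onto the affine set $\mathcal Z$.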

3.2 ADMM with Gaussian Back Substitution

  • Run a forward Gauss–Seidel sweep (predict), then correct via a backward sweep using block-triangular systems involving explicit matrices $A_i^T A_j$.
  • Proven to converge globally if each $A_i^T A_i$ is nonsingular, achieving $O(1/k)$ objective rate.

3.3 Proximal Jacobian ADMM

  • Add per-block proximal regularizers $\frac{1}{2} \|x_i - x_i^k\|_{P_i}^2$ to each $x_i$ minimization and a damped dual update, with suitable parameter choices ensuring global convergence and $o(1/k)$ rate.
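A minimal sketch of the proximal Jacobian idea follows, again on a toy with scalar blocks $f_i(x_i) = \tfrac12 q_i (x_i - t_i)^2$ and one coupling constraint. The scalar proximal weight `tau` (standing in for a matrix weight), the damping factor `gamma`, the function name, and all numeric values are assumptions chosen by hand for this example.

```python
def prox_jacobian_admm(q, t, a, b, rho=1.0, tau=2.5, gamma=1.0, iters=2000):
    """Parallel N-block ADMM with a proximal term (tau/2)*(x_i - x_i^k)**2 per block.

    Solves min sum_i 0.5*q[i]*(x_i - t[i])**2  s.t.  sum_i a[i]*x_i = b.
    Setting tau=0 recovers the plain (unregularized) Jacobian sweep.
    """
    n = len(q)
    x = [0.0] * n
    lam = 0.0
    residual = -b
    for _ in range(iters):
        prev = list(x)
        total = sum(a[j] * prev[j] for j in range(n))
        for i in range(n):  # every block reads only the previous iterate
            others = total - a[i] * prev[i]
            x[i] = (q[i] * t[i] - lam * a[i] - rho * a[i] * (others - b) + tau * prev[i]) / (
                q[i] + rho * a[i] ** 2 + tau
            )
        residual = sum(a[i] * x[i] for i in range(n)) - b
        lam += gamma * rho * residual  # damped dual step (gamma=1 means no damping)
    return x, lam, residual


q, t, a, b = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 3.0
x_prox, _, r_prox = prox_jacobian_admm(q, t, a, b)
_, _, r_plain = prox_jacobian_admm(q, t, a, b, tau=0.0, iters=60)
print(x_prox, abs(r_prox), abs(r_plain))
```

On this toy instance the proximal term is what keeps the parallel updates stable at $\rho = 1$: with `tau=0.0` the same loop produces a rapidly growing residual. How large the proximal weight must be in general depends on $\rho$, the damping, the number of blocks, and the $A_i$.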

4. Distributed and Parallel Implementation Paradigms

ADMM's decomposition structure is well-suited for distributed and parallel computing environments (Liu et al., 2015, Liu et al., 2015, Summers et al., 2012):

  • Distributed models: Each block $x_i$ and its associated $(f_i, A_i)$ are handled by a separate compute node. Dual variable aggregation and constraint enforcement are achieved through collective operations (e.g., MPI all-reduce, parameter-server pull/push, Spark RDD reductions).
  • Synchronization patterns:
    • Gauss–Seidel: Sequential subproblem solves yield high synchronization cost.
    • Jacobian/proximal: All $x_i$ solves are parallel, requiring only collective communication for constraint aggregation per iteration.
  • Big data strategies:
    • Data locality: Assign $(f_i, A_i, x_i)$ to the same node.
    • Communication-efficient ADMM: Employ quantization or low-rank sketching to reduce network load.
    • Adaptive penalty control: Dynamically update $\rho$ to accelerate constraint residual decay.
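Adaptive penalty control is often implemented with a residual-balancing rule: grow $\rho$ when the primal residual dominates and shrink it when the dual residual dominates. The sketch below shows one such rule; the function name and the default thresholds `mu` and `factor` are illustrative assumptions rather than values prescribed by the cited papers.

```python
def residual_balancing(rho, primal_res, dual_res, mu=10.0, factor=2.0):
    """Return an updated penalty given the norms of the primal and dual residuals."""
    if primal_res > mu * dual_res:
        return rho * factor  # constraint violation dominates: penalize it more
    if dual_res > mu * primal_res:
        return rho / factor  # dual residual dominates: relax the penalty
    return rho               # residuals are balanced: keep rho unchanged


rho = 1.0
rho = residual_balancing(rho, primal_res=50.0, dual_res=1.0)
print(rho)
```

If the scaled dual variable $\lambda / \rho$ is stored instead of $\lambda$, it has to be rescaled whenever $\rho$ changes so that $\lambda$ itself stays the same.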

5. Applications Across Domains

5.1 Large-Scale Communication and Power Networks

Security-Constrained Optimal Power Flow (SCOPF):

  • Formulated with block variables for each contingency, decomposed such that each block solves a local OPF with extra quadratic terms, while maintaining global generator and line limits.
  • ADMM yields full decomposition across contingencies, with linear scalability in their count (Liu et al., 2015).

Mobile Data Offloading in SDN:

  • Traffic allocation from base stations to WiFi/femtocells is cast in a consensus form, with separable convex objectives and capacity constraints.
  • Proximal Jacobian ADMM gives parallel updates for all traffic variables under confidentiality and scalability requirements (Liu et al., 2015).

Distributed Robust State Estimation:

  • Each power grid area enforces local data integrity with $\ell_1$ penalties and consensus constraints between overlapping states.
  • Multi-block ADMM applied with proximal regularization achieves global convergence (Liu et al., 2015).

5.2 Model Predictive Consensus

  • Distributed model predictive control over dynamical networks leverages ADMM to enforce trajectory and input consensus while decomposing the global cost. Closed-loop performance with a few tens of ADMM iterations matches centralized solvers in practice, with rapid per-iteration times achievable via code generation techniques (Summers et al., 2012).
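The consensus mechanism underlying such schemes can be sketched on a scalar stand-in problem: each agent $i$ holds a private quadratic cost and all agents must agree on a common value $z$. This is not the model predictive control formulation of the cited work; the cost functions, the scaled dual variables `u`, and the function name are assumptions for illustration.

```python
def consensus_admm(q, t, rho=1.0, iters=300):
    """Scaled-form consensus ADMM for min sum_i 0.5*q[i]*(x_i - t[i])**2  s.t.  x_i = z."""
    n = len(q)
    x, u, z = [0.0] * n, [0.0] * n, 0.0
    for _ in range(iters):
        # Local step: every agent solves its own small problem (parallel across agents).
        x = [(q[i] * t[i] + rho * (z - u[i])) / (q[i] + rho) for i in range(n)]
        # Coordination step: averaging is the only operation that needs communication.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Scaled dual update: accumulate each agent's disagreement with the consensus.
        u = [u[i] + x[i] - z for i in range(n)]
    return x, z


x, z = consensus_admm(q=[1.0, 2.0, 3.0], t=[1.0, 2.0, 4.0])
print(z)  # should approach the q-weighted average of t
```

The averaging step is the single point of coordination per iteration, which is what makes a small, fixed iteration budget attractive in a control loop.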

6. ADMM for Nonconvex and Heuristic Optimization

While ADMM is grounded in convex optimization theory, empirical studies confirm its effectiveness in diverse nonconvex scenarios, provided careful penalty parameterization (Xu et al., 2016):

  • $\ell_0$-regularized regression/denoising, phase retrieval, eigenvector computation: ADMM demonstrates robust convergence with appropriately tuned or adaptively updated penalty parameters.
  • Interpretation: Adaptive-penalty variants (e.g., residual balancing, spectral heuristics) reliably find high-quality approximate solutions, often with far fewer iterations than grid-searched fixed penalties. Global optimality is not ensured for highly nonconvex landscapes, but practical outcomes are frequently acceptable.
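To make the nonconvex use concrete, here is a minimal sketch of ADMM applied as a heuristic to $\ell_0$-regularized denoising, $\min \tfrac12\|x - y\|_2^2 + \tau\|z\|_0$ subject to $x = z$. The only change from the convex case is that the z-update becomes a hard threshold. The function name, the scaled dual `u`, and the data are assumptions for this example, and no convergence guarantee is implied.

```python
import math


def admm_l0_denoise(y, tau, rho=1.0, iters=200):
    """Heuristic ADMM for min 0.5*||x - y||^2 + tau*||z||_0  s.t.  x = z (nonconvex)."""
    n = len(y)
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    # Keeping an entry v costs tau; zeroing it costs rho*v**2/2, so keep when |v| > thresh.
    thresh = math.sqrt(2.0 * tau / rho)
    for _ in range(iters):
        # x-update: quadratic subproblem, closed form.
        x = [(y[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: hard thresholding of x + u (the nonconvex step).
        v = [x[i] + u[i] for i in range(n)]
        z = [vi if abs(vi) > thresh else 0.0 for vi in v]
        # Scaled dual update.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


z = admm_l0_denoise([3.0, -0.2, 0.1, -2.5], tau=0.5)
print(z)  # large entries are kept, small ones are zeroed
```

On well-separated data like this the support is identified in the first iteration and the kept entries converge to the data values. On harder instances the result can depend on $\rho$ and on the starting point, which is where penalty tuning matters.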

Recent work extends ADMM-based approaches to combinatorial nonconvex problems (e.g., spanning tree–constrained mixed-integer programs), by relaxing binary variables, solving convex subproblems, and projecting onto the feasible set via combinatorial algorithms (e.g., MST or MWRA). These methods yield high-quality feasible solutions with substantial computational savings over exact MILP solvers in empirical studies (Mokhtari, 14 Aug 2025).

7. Theoretical Equivalences and Algorithm Selection

A detailed equivalence theory establishes relationships among the many possible ADMM formulations for problems of the form $\min_{x, z} f(x) + g(z)$ subject to $A x + B z = c$ (Yan et al., 2014):

  • ADMM algorithms applied to primal and dual forms are mutually equivalent via affine changes of variables. Only a handful of truly distinct ADMM schemes result, typically characterized by the computational form of their block subproblems (e.g., whether $x$ or $z$ is updated first).
  • When one term is quadratic, update-order equivalence holds, so computational “friendliness” (ease of solving the subproblems) becomes the primary criterion for selecting a variant.

This framework guides practitioners to select the ADMM instance whose subproblems admit the most efficient solution given their problem’s specific structure.


References:

  • (Liu et al., 2015) Multi-Block ADMM for Big Data Optimization in Modern Communication Networks
  • (Liu et al., 2015) Multi-Block ADMM for Big Data Optimization in Smart Grid
  • (Xu et al., 2016) An Empirical Study of ADMM for Nonconvex Problems
  • (Summers et al., 2012) Distributed Model Predictive Consensus via the Alternating Direction Method of Multipliers
  • (Mokhtari, 14 Aug 2025) A Heuristic ADMM-based Approach for Tree-Constrained Optimization
  • (Yan et al., 2014) Self Equivalence of the Alternating Direction Method of Multipliers
