
Proximal-Proximal-Gradient (PPG) Method

Updated 28 January 2026
  • PPG is a framework for structured convex optimization that generalizes proximal-gradient, ADMM, and forward-backward splitting methods.
  • It decouples smooth and coupled nonsmooth terms via auxiliary variables and dual updates, enabling efficient parallel and distributed implementations.
  • The method offers theoretical guarantees such as O(1/k) convergence and linear rates under strong convexity, with applications in signal processing and machine learning.

The Proximal-Proximal-Gradient (PPG) method is an algorithmic framework for structured convex optimization that generalizes and unifies several operator-splitting and first-order proximal methods. PPG and its variants enable efficient minimization of composite convex objectives, particularly when the objective is a sum of smooth and (possibly) coupled nonsmooth terms and classical proximal-gradient methods are either inapplicable or computationally inefficient. The methodology is particularly attractive for problems with block coupling, nonseparable penalties, or large numbers of nonsmooth component functions. Several notable algorithmic developments exist under the PPG umbrella, including the original form for convex composite problems, parallel/distributed multiterm variants, and asynchronous penalized distributed schemes.

1. Mathematical Formulation and Scope

PPG methods address composite convex programs of the form

$$\min_{x \in \mathbb{R}^d}\; r(x) + \frac{1}{n} \sum_{i=1}^n \bigl( f_i(x) + g_i(x) \bigr)$$

where $r$ is a closed, convex, and proximable function; each $f_i$ is convex and differentiable with $L$-Lipschitz gradient; and each $g_i$ is closed, convex, and proximable but not necessarily separable. The setup supports arbitrary structure in the $g_i$, allowing general composite and coupled nonsmooth terms (Ryu et al., 2017).
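Proximability means the proximal map $\operatorname{prox}_{\alpha r}(v) = \arg\min_x \bigl\{ r(x) + \tfrac{1}{2\alpha}\|x - v\|_2^2 \bigr\}$ is cheap to evaluate. As a minimal illustration (not taken from the cited papers), two standard closed-form proximal operators in NumPy:

```python
import numpy as np

def prox_l1(v, t):
    """Prox of t * ||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_l2_ball(v, radius=1.0):
    """Prox of the indicator of {x : ||x||_2 <= radius}, i.e. Euclidean projection."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

w = prox_l1(np.array([3.0, -0.5, 1.2]), 1.0)  # -> [2.0, -0.0, 0.2]
```

Functions like the $\ell_1$ norm, indicator functions of simple sets, and the nuclear norm all admit such closed-form or cheaply computable proximal maps.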

A closely related problem formulation, targeted by earlier works, is

$$\min_{z \in \mathcal{Z}} F(z) := h(z) + P(Mz - b)$$

where $h$ is $L$-smooth and convex, $P$ is closed, convex, and proximable, and $M$ is a nonzero linear operator (Pong, 2013). This structure covers linearly coupled nonsmooth regularizations occurring in, e.g., fused lasso, network lasso, matrix completion, and group-lasso problems.

2. Algorithmic Principles and Iterative Framework

Classical proximal-gradient methods are not directly applicable when the nonsmooth term is composed with a nontrivial affine map. The PPG algorithm decouples these difficulties by introducing dual variables and auxiliary proximal steps in a way that leverages the tractability of each nonsmooth summand (Pong, 2013, Ryu et al., 2017).

For the multiterm setup (Ryu et al., 2017), PPG operates as follows:

  • Introduce auxiliary variables $z_i$ for each summand.
  • At iteration $k$, compute:

    • $x^{k+1/2} = \operatorname{prox}_{\alpha r} \left( \frac{1}{n} \sum_{i=1}^n z_i^k \right)$,
    • For each $i$, update

    $$x_i^{k+1} = \operatorname{prox}_{\alpha g_i} \left( 2 x^{k+1/2} - z_i^k - \alpha \nabla f_i(x^{k+1/2}) \right)$$

    $$z_i^{k+1} = z_i^k + x_i^{k+1} - x^{k+1/2}$$

These steps parallelize naturally and are efficiently implementable in distributed or GPU architectures.
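The updates above can be sketched in a few lines of NumPy. The instance below is illustrative, not from the papers: a lasso-type problem with $r = \lambda\|\cdot\|_1$, quadratic $f_i(x) = \frac{1}{2}(a_i^\top x - b_i)^2$, and $g_i \equiv 0$ (so $\operatorname{prox}_{\alpha g_i}$ is the identity); the data and step size are arbitrary choices.

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
lam, alpha = 0.1, 0.02  # illustrative regularization and step size

# One auxiliary variable z_i per summand f_i(x) = 0.5 * (a_i @ x - b_i)**2.
z = np.zeros((n, d))
for k in range(1000):
    x_half = prox_l1(z.mean(axis=0), alpha * lam)   # x^{k+1/2}
    grad = (A @ x_half - b)[:, None] * A            # row i: grad f_i(x^{k+1/2})
    x = 2 * x_half - z - alpha * grad               # prox of g_i = 0 is the identity
    z = z + x - x_half                              # z_i^{k+1} = z_i^k + x_i^{k+1} - x^{k+1/2}

obj = lam * np.abs(x_half).sum() + 0.5 * np.mean((A @ x_half - b) ** 2)
```

The per-summand updates of $x_i$ and $z_i$ are independent across $i$, which is exactly the structure the parallel and distributed implementations exploit.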

In single-block affine composition settings (Pong, 2013), the PPG update is constructed by alternating primal and dual proximal steps, which, for suitable parameters $B, \tau, \gamma$, read:

  • $y^{k+1} = \operatorname{prox}_{\tau P^*} \left( T y^k - b + M z^k - B M \nabla h(z^k) \right)$,
  • $z^{k+1} = z^k - \gamma \left( \nabla h(z^k) + M^* y^{k+1} \right)$, where $T = \tau I - B M M^*$ and $P^*$ is the Fenchel conjugate of $P$.

PPG generalizes the classical proximal-gradient method, several forms of Alternating Direction Method of Multipliers (ADMM), and multiterm forward-backward splitting, becoming equivalent to standard methods when particular terms vanish or coincide (Ryu et al., 2017).

3. Theoretical Convergence and Complexity

Under standard convexity and Lipschitz assumptions on the $f_i$ and proximal tractability of $r$ and the $g_i$, deterministic PPG converges globally to an optimal solution. Theoretical results established for multiterm PPG (Ryu et al., 2017) include:

  • Ergodic convergence rate of $O(1/k)$ for the averaged "pre-dual" suboptimality measure,
  • Pointwise fixed-point residual decaying at an $O(1/\sqrt{k})$ rate,
  • Global linear convergence (Q/R-linear rate) when one composite term is strongly convex.

For the single-block affine-composite model, Pong (2013) proves that, for $k \geq 1$,

$$F(z^k) - F(z^*) = O(1/k)$$

with explicit bounds depending on initial condition and algorithmic parameter choices.

Stochastic variants (S-PPG) maintain the ergodic $O(1/k)$ convergence for appropriately diminishing or constant stepsizes under unbiased sampling (Ryu et al., 2017).
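A hedged sketch of the sampled variant on a small lasso-type instance (illustrative data and parameters, $g_i \equiv 0$): each iteration evaluates only one sampled summand and touches only one $z_i$, with the average of the $z_i$ maintained incrementally in $O(d)$ time.

```python
import numpy as np

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
n, d = 50, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)
lam, alpha = 0.1, 0.02  # illustrative choices

z = np.zeros((n, d))
z_mean = z.mean(axis=0)
for k in range(5000):
    x_half = prox_l1(z_mean, alpha * lam)        # uses the running average of z_i
    i = rng.integers(n)                          # unbiased uniform sampling of one summand
    grad_i = (A[i] @ x_half - b[i]) * A[i]       # grad f_i at x^{k+1/2}
    x_i = 2 * x_half - z[i] - alpha * grad_i     # prox of g_i = 0 is the identity
    z_new = z[i] + x_i - x_half
    z_mean += (z_new - z[i]) / n                 # O(d) update of the average
    z[i] = z_new

obj = lam * np.abs(x_half).sum() + 0.5 * np.mean((A @ x_half - b) ** 2)
```

With a constant stepsize, iterates of this kind converge to a neighborhood of the solution, in line with the ergodic guarantees cited above.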

4. Variants: Distributed and Asynchronous PPG

The PPG framework has been extended to asynchronous and distributed settings to handle real-world networked multi-agent systems subject to communication delays and heterogeneous agent action clocks (Wang et al., 2021). In this context:

  • Each agent maintains local step-size and penalty parameters, performing independent updates based on delayed global state snapshots.
  • The Asynchronous Penalized PPG (Asyn-PPG) synchronizes penalty coefficients at slot boundaries and incrementally strengthens the consensus through a quadratic penalty term.
  • Provided the penalty parameters are synchronized once per slot, both suboptimality and constraint violation decrease at $O(1/K)$ over $K$ slots; this is demonstrated on distributed LASSO and social-welfare market optimization problems, with empirical trajectories matching the predicted $1/t$ decay (Wang et al., 2021).

5. Connections to Existing Optimization Methods

PPG subsumes and extends several established first-order and operator-splitting schemes:

  • Reduces to the standard proximal-gradient (forward-backward splitting) method when $g_i \equiv 0$ and $M = I$,
  • Recovers consensus ADMM when $f_i \equiv 0$ and $r \equiv 0$,
  • Yields multiterm forward-backward splitting for $r = 0$, tightly coupling PPG with generalized forward-backward constructions,
  • The stochastic PPG specialization includes methods such as Finito and MISO (Ryu et al., 2017).
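The first reduction can be checked directly: with $n = 1$ and $g \equiv 0$, the PPG recursion gives $z^{k+1} = x^{k+1/2} - \alpha \nabla f(x^{k+1/2})$, so the next half-step $x^{k+3/2} = \operatorname{prox}_{\alpha r}(z^{k+1})$ is exactly a classical proximal-gradient step. A small numerical sanity check under these assumptions (illustrative data):

```python
import numpy as np

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam, alpha = 0.1, 0.01
grad_f = lambda x: A.T @ (A @ x - b)   # f(x) = 0.5 * ||Ax - b||^2

# PPG with n = 1 and g = 0.
z = np.zeros(5)
ppg_iterates = []
for k in range(100):
    x_half = prox_l1(z, alpha * lam)
    x = 2 * x_half - z - alpha * grad_f(x_half)   # prox of g = 0 is the identity
    z = z + x - x_half                            # = x_half - alpha * grad_f(x_half)
    ppg_iterates.append(x_half)

# Classical proximal gradient started from the same point.
x_pg = prox_l1(np.zeros(5), alpha * lam)
pg_iterates = []
for k in range(100):
    pg_iterates.append(x_pg)
    x_pg = prox_l1(x_pg - alpha * grad_f(x_pg), alpha * lam)

match = all(np.allclose(p, q) for p, q in zip(ppg_iterates, pg_iterates))
```

The two iterate sequences coincide exactly, confirming the reduction.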

The dual interpretation of PPG shows the method as a variant of the alternating minimization algorithm with a strongly convex dual subproblem, contributing to its robust convergence and tractable subproblem structure (Pong, 2013).

6. Applications and Practical Implementation

PPG has demonstrated efficacy across a diverse range of applications involving nonsmooth composite structure, including:

  • Overlapping group lasso (coupled group-wise $\ell_2$ penalties),
  • Robust PCA and matrix completion (combining nuclear-norm and $\ell_1$ penalties),
  • Fused-lasso and total variation minimization,
  • Network lasso with colored graph-regularization terms,
  • Support vector machines and generalized linear models with structured nonsmooth regularizers.
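For instance, the group-wise $\ell_2$ penalty over disjoint groups has a closed-form proximal map (block soft-thresholding); in the overlapping case no closed form exists, which is precisely where PPG's multiterm splitting, with one group assigned per $g_i$, becomes useful. A sketch of the disjoint-group case (illustrative helper, not from the papers):

```python
import numpy as np

def prox_group_l2(v, t, groups):
    """Prox of t * sum_g ||v_g||_2 over disjoint index groups (block soft-thresholding)."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= t else v[g] * (1.0 - t / norm)
    return out

v = np.array([3.0, 4.0, 0.1, 0.2])
w = prox_group_l2(v, 1.0, groups=[[0, 1], [2, 3]])  # -> [2.4, 3.2, 0.0, 0.0]
```

Each group is either shrunk toward zero or zeroed out entirely, which is what produces group-level sparsity.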

Parallel and distributed implementation is readily facilitated: in a parameter-server model, each worker maintains local variables and computes proximal steps, with global synchronization required only for aggregate variables (Ryu et al., 2017). The asynchrony-robust form is designed for deployment in partially synchronous or delayed-communication networks with minimal coordination overhead (Wang et al., 2021).

7. Numerical Evidence and Observations

Empirical results for PPG and its asynchronous variant indicate:

  • Competitive or superior convergence rates compared to ADMM and forward-backward splitting on large-scale composite problems,
  • Ability to efficiently solve high-dimensional problems such as Hankel nuclear norm minimization and fused lasso logistic regression within a few dozen iterations and reasonable runtime,
  • Robustness to network delays, heterogeneous update frequencies, and limited communication synchronizations in distributed environments.

All methods with PPG structure attain objective and constraint gap closures at the theoretically predicted rates, and stochastic variants maintain efficiency for massive-scale nonsmooth convex optimization (Pong, 2013, Ryu et al., 2017, Wang et al., 2021).
