
AltGDmin: Alternating Projected Gradient & Minimization

Updated 30 January 2026
  • AltGDmin is a high-dimensional optimization framework that partitions variables into global (Za) and decoupled (Zb) blocks, enabling tailored updates.
  • It integrates block coordinate descent with closed-form minimization and QR-based projections to achieve geometric convergence and improved sample complexity.
  • AltGDmin excels in federated and distributed settings by reducing per-iteration communication costs while speeding up convergence.

Alternating Projected Gradient and Minimization (AltGDmin) is an optimization strategy designed for high-dimensional inference problems that admit a natural partitioning of variables into two blocks—one “slow” (Za) and one “decoupled” (Zb). The decoupling of the Zb variables enables per-block minimization steps that are computationally cheap and highly parallelizable, especially in federated data architectures. The Za variables are updated via projected gradient descent. AltGDmin thus integrates block coordinate descent, closed-form block minimization, and geometric updates via projected gradient steps. This framework yields convergence rates and communication efficiencies that surpass classical alternating minimization (AltMin) and standard gradient descent approaches for a broad class of partly-decoupled nonconvex problems, including low rank matrix recovery, matrix completion, robust PCA, compressive sensing, phase retrieval, and dictionary learning (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023, Chatterji et al., 2017, Hyder et al., 2019).

1. Formal Framework and Problem Structure

Let $Z = (Z_a, Z_b)$ denote the split variables, where $Z_a \in \mathcal{X}$ typically represents global structure (e.g. an orthonormal basis or dictionary) and $Z_b \in \mathcal{Z}$ comprises local, decoupled components (e.g. coefficients, column vectors, labels). In federated settings the data is denoted $D = \bigcup_\ell D_\ell$. The objective function takes the form

$$f(Z_a, Z_b; D) = \sum_{\ell=1}^{\gamma} f_\ell\big(Z_a, (Z_b)_\ell; D_\ell\big)$$

The function is differentiable in $Z_a$ and "block-separable" in $Z_b$, meaning the minimization with respect to $Z_b$ splits into $\gamma$ independent low-dimensional subproblems:

$$\min_{Z_b} f(Z_a, Z_b) = \sum_{\ell} \min_{(Z_b)_\ell} f_\ell\big(Z_a, (Z_b)_\ell\big)$$

This structure is present in low rank matrix completion (LRMC), low rank column-wise compressive sensing (LRCS), phase retrieval, tensor extensions, clustering, and mixed linear regression (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023).
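As a representative instantiation (the measurement notation here is assumed for illustration, not fixed by the text above): in LRCS with $Z_a = U$ and $(Z_b)_k = b_k$, each column $k$ contributes one decoupled term,

```latex
% LRCS instantiation (illustrative): column k is measured as y_k = A_k x_k, x_k = U b_k
f(U, B; D) \;=\; \sum_{k=1}^{q} \big\| y_k - A_k\, U\, b_k \big\|_2^2,
\qquad U \in \mathbb{R}^{n \times r},\quad b_k \in \mathbb{R}^r
```

Minimizing over each $b_k$ with $U$ fixed is an $r$-dimensional least-squares problem, which is what makes the Zb-step cheap and parallel.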

2. Iterative Procedure: Alternating Minimization and Projected Gradient

AltGDmin consists of cycling between:

  • Zb–step (Minimization): For each block $\ell = 1, \dots, \gamma$, solve

$$(Z_b)_\ell^{(t)} = \arg\min_{(Z_b)_\ell} f_\ell\big(Z_a^{(t-1)}, (Z_b)_\ell; D_\ell\big)$$

These solves are typically exact and exploit the decoupled geometry (e.g. column-wise least squares in LRCS).

  • Za–step (Projected Gradient): Compute gradient

$$G^{(t)} := \nabla_{Z_a} f\big(Z_a^{(t-1)}, Z_b^{(t)}; D\big)$$

Perform a gradient descent step and project back onto the feasible set $\mathcal{X}$ (e.g. the Stiefel manifold for orthonormal bases):

$$\widetilde{Z_a} = Z_a^{(t-1)} - \eta\, G^{(t)}, \qquad Z_a^{(t)} = \mathrm{Proj}_{\mathcal{X}}\big(\widetilde{Z_a}\big)$$

In many high-dimensional applications, $\mathrm{Proj}_{\mathcal{X}}$ is realized via a thin QR decomposition (Vaswani, 20 Apr 2025).
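As a small illustration (not from the source), the Q factor of a thin QR decomposition of the gradient-step output is a feasible point with orthonormal columns:

```python
import numpy as np

# Illustrative sketch: thin QR of a hypothetical n x r iterate Za_tilde
# yields a matrix with orthonormal columns (a point on the Stiefel manifold).
rng = np.random.default_rng(0)
Za_tilde = rng.standard_normal((100, 5))
Q, _ = np.linalg.qr(Za_tilde)            # default "reduced" mode: Q is 100 x 5
print(np.allclose(Q.T @ Q, np.eye(5)))   # orthonormal columns
```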

  • Pseudocode Summary:

# Algorithm AltGDmin
for t in range(T):
    # Zb-step (local, parallel)
    for l in range(γ):
        (Zb_l)[t] = argmin_{Zb_l} f_l(Za[t-1], Zb_l; D_l)
        send g_l = ∇_{Za} f_l(Za[t-1], (Zb_l)[t]) to server
    # Za-step (global)
    Gather g = Σ_l g_l
    Za_tilde = Za[t-1] - η * g
    Za[t] = QR(Za_tilde)  # projection via thin QR
    Broadcast Za[t]
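For concreteness, the loop above can be instantiated on a toy fully observed rank-$r$ factorization problem $X \approx UB$, a simplified stand-in for the LRCS/LRMC settings; all names are illustrative, and the random initialization below is a simplification (the cited analyses use spectral initialization):

```python
import numpy as np

# Minimal AltGDmin sketch on a fully observed rank-r matrix X = U* B*.
# Zb-step: closed-form least squares for B; Za-step: gradient step on U + QR.
rng = np.random.default_rng(0)
n, q, r = 50, 40, 3
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))
X = U_star @ rng.standard_normal((r, q))          # data matrix, exactly rank r

U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # random init (papers: spectral)
eta = 0.5 / np.linalg.norm(X, 2) ** 2             # conservative step size

for _ in range(300):
    B = U.T @ X               # Zb-step: least squares (U has orthonormal columns)
    G = -(X - U @ B) @ B.T    # Za-step: gradient of 0.5*||X - U B||_F^2 in U
    U, _ = np.linalg.qr(U - eta * G)              # projection via thin QR

err = np.linalg.norm(X - U @ (U.T @ X)) / np.linalg.norm(X)
print(f"relative error: {err:.1e}")
```

With the data exactly rank $r$, the iterates contract geometrically toward the column span of $X$, mirroring the convergence behavior described below.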

3. Convergence Theory and Complexity Bounds

Given $f$ is $L$-smooth in $Z_a$ and block-decoupled in $Z_b$, and assuming exact solutions for the Zb-steps, AltGDmin admits rigorous geometric convergence. For the low rank column-wise sensing (LRCS) and matrix completion (LRMC) tasks:

  • Sample Complexity (LRCS):

$$mq \gtrsim \kappa^4 \mu^2 (n+q)\, r \left( \kappa^4 r + \log(1/\epsilon) \right), \qquad m \gtrsim \log(1/\epsilon)$$

  • Iteration Complexity:

$$T = O\big(\kappa^2 \log(1/\epsilon)\big)$$

  • Error Bounds:

$$\operatorname{SubsDist}_2\big(U^{(T)}, U^*\big) \leq \epsilon, \qquad \|b_k^{(T)} - b_k^*\| \leq \epsilon\, \|b_k^*\|$$

Analogous results hold for phase retrieval and dictionary learning. In LRMC, AltGDmin achieves sample and communication complexity nearly optimal among iterative solvers. Convergence analysis leverages spectral initialization, incoherence propagation, blockwise error contraction, and matrix concentration inequalities (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023, Chatterji et al., 2017, Hyder et al., 2019).

4. Federated and Parallel Implementation

AltGDmin is particularly suited to federated learning with vertically partitioned data. Each node $\ell$ manages a local block $D_\ell$:

  • Local computation: Zb-step proceeds without communication.
  • Global update: Only gradients of the Za-block are communicated.
  • Communication complexity per iteration: $O(nr)$ total, optimal in $r$; total communication $O(nrT)$.

The minimization step is both privacy-preserving (raw data never leaves local nodes) and computationally efficient. Empirical studies demonstrate AltGDmin is 5–10× faster and 5–10× more communication-efficient than AltMin and FactGD in federated LRMC deployments (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023).
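A hedged sketch of this communication pattern, with hypothetical names and an arbitrary step size: each node computes its Zb-step locally and transmits only an $n \times r$ gradient, which the server aggregates before the Za-step.

```python
import numpy as np

# Illustrative one-iteration federated AltGDmin round (names hypothetical):
# node l holds its own columns X_l and sends only an n x r gradient message.
rng = np.random.default_rng(1)
n, r, num_nodes = 30, 2, 4
Za, _ = np.linalg.qr(rng.standard_normal((n, r)))
X_blocks = [rng.standard_normal((n, 10)) for _ in range(num_nodes)]

def local_gradient(Za, X_l):
    """Zb-step (local least squares) followed by the node's Za-gradient."""
    B_l = Za.T @ X_l                    # closed form since Za is orthonormal
    return -(X_l - Za @ B_l) @ B_l.T    # n x r message; raw data X_l stays local

g = sum(local_gradient(Za, X_l) for X_l in X_blocks)  # server aggregation
Za, _ = np.linalg.qr(Za - 0.01 * g)     # Za-step + QR projection, then broadcast
print(Za.shape)
```

Each message is $n \times r$ regardless of how many columns a node holds, which is the source of the $O(nr)$ per-iteration communication cost.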

5. Specializations and Applications

Table: Representative problem classes expressible in AltGDmin's block structure.

| Application | Za (global block) | Zb (decoupled block) |
|---|---|---|
| Low-rank matrix recovery | U (basis) | B (coefficients) |
| Phase retrieval w/ priors | x (signal on manifold) | p (phases) |
| Dictionary learning | Dictionary A | Sparse codes $x_i$ |
  • Low Rank CS/LRMC: Minimization step solves for $b_k$ via local least squares; gradient step on U followed by QR projection.
  • Phase Retrieval: Alternates signal update (projected gradient on the generator manifold) and closed-form phase estimation.
  • Dictionary Learning: Alternates thresholded sparse encoding and a dictionary gradient update, using MU-selectors and $\ell_\infty$ error contraction.
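For the LRCS specialization, the per-column minimization is a tiny $r$-dimensional least-squares solve; a sketch with illustrative names and dimensions:

```python
import numpy as np

# Zb-step sketch for LRCS (illustrative): column k is observed as
# y_k = A_k @ x_k with x_k = U @ b_k, so b_k solves an r-dim LS problem.
rng = np.random.default_rng(2)
n, r, m = 40, 2, 15
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
b_true = rng.standard_normal(r)
A_k = rng.standard_normal((m, n))        # hypothetical per-column sensing matrix
y_k = A_k @ (U @ b_true)                 # noiseless measurements for this sketch

b_hat, *_ = np.linalg.lstsq(A_k @ U, y_k, rcond=None)
print(np.allclose(b_hat, b_true))        # exact in this noiseless case (m > r)
```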

6. Theoretical Advances and Empirical Evidence

AltGDmin advances the field by:

  • Providing local linear convergence in sharper matrix norms ($\ell_2$, Frobenius, $\ell_\infty$).
  • Relaxing operator-norm requirements (e.g. in dictionary learning) to $\ell_\infty$-norm bounds, accommodating arbitrary overcompleteness $r = \mathrm{poly}(d)$ and optimal sparsity $s = O(\sqrt{d})$ (Chatterji et al., 2017).
  • Achieving communication efficiency and scalability in federated and distributed settings.
  • Matching or exceeding the empirical convergence rates of FactGD, projected GD, and AltMin, but at lower per-iteration and total communication cost.

Across LRCS, LRMC, phase retrieval, and robust extensions, AltGDmin consistently demonstrates iteration complexity $O(\kappa^2 \log(1/\epsilon))$ and per-iteration cost comparable to the dominating cost of the local least-squares solves. In federated LRMC on AWS, AltGDmin achieves substantial speed-ups over alternative methods and demonstrates optimal privacy-preserving communication patterns (Vaswani, 20 Apr 2025, Abbasi et al., 2024).

AltGDmin generalizes both AltMin and projected gradient descent (PGD), combining their strengths for problems where blockwise decoupling exists. It applies in settings where closed-form minimization is feasible for one block and the other admits a smooth projection. Its communication and speed advantages are contingent on the exact decoupling of Zb and well-conditioned gradient steps for Za.

Phase-retrieval variants utilize generative models for projection, extending AltGDmin to nonlinear inverse problems with structured priors. Compared to direct minimization in latent space, the two-block alternation helps avoid poor local minima and enhances exploration (Hyder et al., 2019).

For dictionary learning, AltGDmin achieves improved sample complexity and sharp local convergence under weaker norm constraints than previous analyses, especially in coherent and overcomplete settings (Chatterji et al., 2017). In matrix completion and compressive sensing, sample-splitting and blockwise concentration form the basis of proof techniques, yielding sharper rates and reduced communication complexity in federated deployments (Abbasi et al., 2024, Vaswani, 2023).

Limitations arise when the decoupling of subproblems is not exact, minimization steps cannot be reliably solved, or projection is nontrivial/unavailable for Za.


AltGDmin is established as a general-purpose framework for high-dimensional, partly-decoupled optimization, exhibiting favorable convergence, iteration, and communication complexity properties for a wide class of inference problems in centralized and federated settings (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023, Chatterji et al., 2017, Hyder et al., 2019).
