
AltGDmin: Alternating Projected Gradient & Minimization

Updated 30 January 2026
  • AltGDmin is a high-dimensional optimization framework that partitions variables into global (Za) and decoupled (Zb) blocks, enabling tailored updates.
  • It integrates block coordinate descent with closed-form minimization and QR-based projections to achieve geometric convergence and improved sample complexity.
  • AltGDmin excels in federated and distributed settings by reducing per-iteration communication costs while speeding up convergence.

Alternating Projected Gradient and Minimization (AltGDmin) is an optimization strategy designed for high-dimensional inference problems that admit a natural partitioning of variables into two blocks—one “slow” (Za) and one “decoupled” (Zb). The decoupling of the Zb variables enables per-block minimization steps that are computationally cheap and highly parallelizable, especially in federated data architectures. The Za variables are updated via projected gradient descent. AltGDmin thus integrates block coordinate descent, closed-form block minimization, and geometric updates via projected gradient steps. This framework yields convergence rates and communication efficiencies that surpass classical alternating minimization (AltMin) and standard gradient descent approaches for a broad class of partly-decoupled nonconvex problems, including low rank matrix recovery, matrix completion, robust PCA, compressive sensing, phase retrieval, and dictionary learning (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023, Chatterji et al., 2017, Hyder et al., 2019).

1. Formal Framework and Problem Structure

Let $Z = (Z_a, Z_b)$ denote the split variables, where $Z_a \in \mathcal{X}$ typically represents global structure (e.g. an orthonormal basis or dictionary) and $Z_b \in \mathcal{Z}$ comprises local, decoupled components (e.g. coefficients, column vectors, labels). In federated settings the data is denoted $D = \bigcup_\ell D_\ell$. The objective function takes the form

$$f(Z_a, Z_b; D) = \sum_{\ell=1}^{\gamma} f_\ell\big(Z_a, (Z_b)_\ell; D_\ell\big)$$

The function is differentiable in $Z_a$ and "block-separable" in $Z_b$, meaning the minimization with respect to $Z_b$ splits into $\gamma$ independent low-dimensional subproblems:

$$\min_{Z_b} f(Z_a, Z_b) = \sum_{\ell} \min_{(Z_b)_\ell} f_\ell\big(Z_a, (Z_b)_\ell\big)$$

This structure is present in low rank matrix completion (LRMC), low rank column-wise compressive sensing (LRCS), phase retrieval, tensor extensions, clustering, and mixed linear regression (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023).
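As a representative instantiation (the measurement notation here is assumed for illustration, not fixed by the text above): in LRCS with $Z_a = U$ and $(Z_b)_k = b_k$, each column $k$ contributes one decoupled term,

```latex
% LRCS instantiation (illustrative): column k is measured as y_k = A_k x_k, x_k = U b_k
f(U, B; D) \;=\; \sum_{k=1}^{q} \big\| y_k - A_k\, U\, b_k \big\|_2^2,
\qquad U \in \mathbb{R}^{n \times r},\quad b_k \in \mathbb{R}^r
```

Minimizing over each $b_k$ with $U$ fixed is an $r$-dimensional least-squares problem, which is what makes the Zb-step cheap and parallel.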

2. Iterative Procedure: Alternating Minimization and Projected Gradient

AltGDmin consists of cycling between:

  • Zb–step (Minimization): For each block $\ell = 1, \dots, \gamma$, solve

$$(Z_b)_\ell^{(t)} = \arg\min_{(Z_b)_\ell} f_\ell\big(Z_a^{(t-1)}, (Z_b)_\ell; D_\ell\big)$$

These solves are typically exact and exploit the decoupled geometry (e.g. column-wise least squares in LRCS).

  • Za–step (Projected Gradient): Compute gradient

$$G^{(t)} := \nabla_{Z_a} f\big(Z_a^{(t-1)}, Z_b^{(t)}; D\big)$$

Perform a gradient descent step and project back onto the feasible set $\mathcal{X}$ (e.g. the Stiefel manifold for orthonormal bases):

$$\widetilde{Z_a} = Z_a^{(t-1)} - \eta\, G^{(t)}, \qquad Z_a^{(t)} = \mathrm{Proj}_{\mathcal{X}}\big(\widetilde{Z_a}\big)$$

In many high-dimensional applications, $\mathrm{Proj}_{\mathcal{X}}$ is realized via a thin QR decomposition (Vaswani, 20 Apr 2025).
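As a small illustration (not from the source), the Q factor of a thin QR decomposition of the gradient-step output is a feasible point with orthonormal columns:

```python
import numpy as np

# Illustrative sketch: thin QR of a hypothetical n x r iterate Za_tilde
# yields a matrix with orthonormal columns (a point on the Stiefel manifold).
rng = np.random.default_rng(0)
Za_tilde = rng.standard_normal((100, 5))
Q, _ = np.linalg.qr(Za_tilde)            # default "reduced" mode: Q is 100 x 5
print(np.allclose(Q.T @ Q, np.eye(5)))   # orthonormal columns
```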

  • Pseudocode Summary:

# Algorithm AltGDmin
for t in range(T):
    # Zb-step (local, parallel)
    for l in range(γ):
        (Zb_l)[t] = argmin_{Zb_l} f_l(Za[t-1], Zb_l; D_l)
        send g_l = ∇_{Za} f_l(Za[t-1], (Zb_l)[t]) to server
    # Za-step (global)
    Gather g = Σ_l g_l
    Za_tilde = Za[t-1] - η * g
    Za[t] = QR(Za_tilde)  # projection via thin QR
    Broadcast Za[t]
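For concreteness, the loop above can be instantiated on a toy fully observed rank-$r$ factorization problem $X \approx UB$, a simplified stand-in for the LRCS/LRMC settings; all names are illustrative, and the random initialization below is a simplification (the cited analyses use spectral initialization):

```python
import numpy as np

# Minimal AltGDmin sketch on a fully observed rank-r matrix X = U* B*.
# Zb-step: closed-form least squares for B; Za-step: gradient step on U + QR.
rng = np.random.default_rng(0)
n, q, r = 50, 40, 3
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))
X = U_star @ rng.standard_normal((r, q))          # data matrix, exactly rank r

U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # random init (papers: spectral)
eta = 0.5 / np.linalg.norm(X, 2) ** 2             # conservative step size

for _ in range(300):
    B = U.T @ X               # Zb-step: least squares (U has orthonormal columns)
    G = -(X - U @ B) @ B.T    # Za-step: gradient of 0.5*||X - U B||_F^2 in U
    U, _ = np.linalg.qr(U - eta * G)              # projection via thin QR

err = np.linalg.norm(X - U @ (U.T @ X)) / np.linalg.norm(X)
print(f"relative error: {err:.1e}")
```

With the data exactly rank $r$, the iterates contract geometrically toward the column span of $X$, mirroring the convergence behavior described below.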

3. Convergence Theory and Complexity Bounds

Given $f$ is $L$-smooth in $Z_a$ and block-decoupled in $Z_b$, and assuming exact solutions for the Zb-steps, AltGDmin admits rigorous geometric convergence. For the low rank column-wise sensing (LRCS) and matrix completion (LRMC) tasks:

  • Sample Complexity (LRCS):

$$mq \gtrsim \kappa^4 \mu^2 (n+q)\, r \left( \kappa^4 r + \log(1/\epsilon) \right), \qquad m \gtrsim \log(1/\epsilon)$$

  • Iteration Complexity:

$$T = O\big(\kappa^2 \log(1/\epsilon)\big)$$

  • Error Bounds:

$$\operatorname{SubsDist}_2\big(U^{(T)}, U^*\big) \leq \epsilon, \qquad \|b_k^{(T)} - b_k^*\| \leq \epsilon\, \|b_k^*\|$$

Analogous results hold for phase retrieval and dictionary learning. In LRMC, AltGDmin achieves sample and communication complexity nearly optimal among iterative solvers. Convergence analysis leverages spectral initialization, incoherence propagation, blockwise error contraction, and matrix concentration inequalities (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023, Chatterji et al., 2017, Hyder et al., 2019).

4. Federated and Parallel Implementation

AltGDmin is particularly suited to federated learning with vertically partitioned data. Each node $\ell$ manages a local block $D_\ell$:

  • Local computation: Zb-step proceeds without communication.
  • Global update: Only gradients of the Za-block are communicated.
  • Communication complexity per iteration: $O(nr)$ total, optimal in $r$; total communication $O(nrT)$.

The minimization step is both privacy-preserving (raw data never leaves local nodes) and computationally efficient. Empirical studies demonstrate AltGDmin is 5–10× faster and 5–10× more communication-efficient than AltMin and FactGD in federated LRMC deployments (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023).
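A hedged sketch of this communication pattern, with hypothetical names and an arbitrary step size: each node computes its Zb-step locally and transmits only an $n \times r$ gradient, which the server aggregates before the Za-step.

```python
import numpy as np

# Illustrative one-iteration federated AltGDmin round (names hypothetical):
# node l holds its own columns X_l and sends only an n x r gradient message.
rng = np.random.default_rng(1)
n, r, num_nodes = 30, 2, 4
Za, _ = np.linalg.qr(rng.standard_normal((n, r)))
X_blocks = [rng.standard_normal((n, 10)) for _ in range(num_nodes)]

def local_gradient(Za, X_l):
    """Zb-step (local least squares) followed by the node's Za-gradient."""
    B_l = Za.T @ X_l                    # closed form since Za is orthonormal
    return -(X_l - Za @ B_l) @ B_l.T    # n x r message; raw data X_l stays local

g = sum(local_gradient(Za, X_l) for X_l in X_blocks)  # server aggregation
Za, _ = np.linalg.qr(Za - 0.01 * g)     # Za-step + QR projection, then broadcast
print(Za.shape)
```

Each message is $n \times r$ regardless of how many columns a node holds, which is the source of the $O(nr)$ per-iteration communication cost.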

5. Specializations and Applications

Table: Representative problem classes expressible in AltGDmin's block structure.

| Application | Za (global block) | Zb (decoupled block) |
|---|---|---|
| Low-rank matrix recovery | U (basis) | B (coefficients) |
| Phase retrieval w/ priors | x (signal on manifold) | p (phases) |
| Dictionary learning | Dictionary A | Sparse codes $x_i$ |
  • Low Rank CS/LRMC: Minimization step solves for $b_k$ via local least squares; gradient step on U followed by QR projection.
  • Phase Retrieval: Alternates signal update (projected gradient on the generator manifold) and closed-form phase estimation.
  • Dictionary Learning: Alternates thresholded sparse encoding and a dictionary gradient update, using MU-selectors and $\ell_\infty$ error contraction.
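For the LRCS specialization, the per-column minimization is a tiny $r$-dimensional least-squares solve; a sketch with illustrative names and dimensions:

```python
import numpy as np

# Zb-step sketch for LRCS (illustrative): column k is observed as
# y_k = A_k @ x_k with x_k = U @ b_k, so b_k solves an r-dim LS problem.
rng = np.random.default_rng(2)
n, r, m = 40, 2, 15
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
b_true = rng.standard_normal(r)
A_k = rng.standard_normal((m, n))        # hypothetical per-column sensing matrix
y_k = A_k @ (U @ b_true)                 # noiseless measurements for this sketch

b_hat, *_ = np.linalg.lstsq(A_k @ U, y_k, rcond=None)
print(np.allclose(b_hat, b_true))        # exact in this noiseless case (m > r)
```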

6. Theoretical Advances and Empirical Evidence

AltGDmin advances the field by:

  • Providing local linear convergence in sharper matrix norms ($\ell_2$, Frobenius, $\ell_\infty$).
  • Relaxing operator-norm requirements (e.g. in dictionary learning) to $\ell_\infty$-norm bounds, accommodating arbitrary overcompleteness $r = \mathrm{poly}(d)$ and optimal sparsity $s = O(\sqrt{d})$ (Chatterji et al., 2017).
  • Achieving communication efficiency and scalability in federated and distributed settings.
  • Matching or exceeding the empirical convergence rates of FactGD, projected GD, and AltMin, but at lower per-iteration and total communication cost.

Across LRCS, LRMC, phase retrieval, and robust extensions, AltGDmin consistently demonstrates iteration complexity $O(\kappa^2 \log(1/\epsilon))$ and per-iteration cost comparable to the dominating cost of the local least-squares solves. In federated LRMC on AWS, AltGDmin achieves substantial speed-ups over alternative methods and demonstrates optimal privacy-preserving communication patterns (Vaswani, 20 Apr 2025, Abbasi et al., 2024).

AltGDmin generalizes both AltMin and projected gradient descent (PGD), combining their strengths for problems where blockwise decoupling exists. It applies in settings where closed-form minimization is feasible for one block and the other admits a smooth projection. Its communication and speed advantages are contingent on the exact decoupling of Zb and well-conditioned gradient steps for Za.

Phase-retrieval variants utilize generative models for projection, extending AltGDmin to nonlinear inverse problems with structured priors. Compared to direct minimization in latent space, the two-block alternation helps avoid poor local minima and enhances exploration (Hyder et al., 2019).

For dictionary learning, AltGDmin achieves improved sample complexity and sharp local convergence under weaker norm constraints than previous analyses, especially in coherent and overcomplete settings (Chatterji et al., 2017). In matrix completion and compressive sensing, sample-splitting and blockwise concentration form the basis of proof techniques, yielding sharper rates and reduced communication complexity in federated deployments (Abbasi et al., 2024, Vaswani, 2023).

Limitations arise when the decoupling of subproblems is not exact, minimization steps cannot be reliably solved, or projection is nontrivial/unavailable for Za.


AltGDmin is established as a general-purpose framework for high-dimensional, partly-decoupled optimization, exhibiting favorable convergence, iteration, and communication complexity properties for a wide class of inference problems in centralized and federated settings (Vaswani, 20 Apr 2025, Abbasi et al., 2024, Vaswani, 2023, Chatterji et al., 2017, Hyder et al., 2019).
