
FDGM-AA: Fenchel Dual Gradient with Anderson Acceleration

Updated 25 January 2026
  • FDGM-AA is an optimization approach that combines Anderson Acceleration with Fenchel dual gradient methods to efficiently solve distributed consensus problems.
  • It reformulates the global problem into local two-node subproblems, applying AA-enabled extrapolation with a safeguard to ensure convergence.
  • Empirical results show enhanced performance and robustness, achieving O(1/k) dual and O(1/√k) primal convergence in time-varying network settings.

The Fenchel Dual Gradient Method with Anderson Acceleration (FDGM-AA) is an optimization approach developed to solve distributed constrained optimization problems over time-varying networks. FDGM-AA integrates Anderson Acceleration (AA), originally designed for fixed-point iteration acceleration, into the Fenchel dual gradient paradigm by embedding local, edge-wise AA steps within the standard distributed gradient method, supplemented with a safeguard mechanism to ensure convergence. This formulation is particularly targeted at consensus problems with local constraints, where the agents' communication topology varies over time (Liu et al., 18 Jan 2026).

1. Problem Formulation and Fenchel Duality

The optimization setting is a distributed consensus problem over a network of $n$ agents:
$$\min_{x_1,\dots,x_n\in\mathbb{R}^d} \sum_{i=1}^n f_i(x_i) \qquad \text{s.t.} \quad x_1 = x_2 = \cdots = x_n$$
where each $f_i: \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is $\mu$-strongly convex and possibly non-smooth (e.g., an indicator function of a constraint set). Under standard conditions (nonempty intersection of the $\operatorname{dom} f_i$), strong duality holds.

The Fenchel conjugate $d_i$ is defined as
$$d_i(w_i) := \max_{x_i \in \mathbb{R}^d} \left\{ w_i^\top x_i - f_i(x_i) \right\}$$
and the Fenchel dual problem becomes

$$\min_{w_1,\ldots,w_n \in \mathbb{R}^d} \sum_{i=1}^n d_i(w_i) \qquad \text{s.t.} \quad \sum_{i=1}^n w_i = 0$$

Each $d_i$ is differentiable and $L$-smooth with $L = 1/\mu$; its gradient is $\nabla d_i(w_i) = \arg\max_{x_i}\{ w_i^\top x_i - f_i(x_i) \}$, and the primal optimum is recovered via $x^*_i = \nabla d_i(w^*_i)$ (Liu et al., 18 Jan 2026).
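As a minimal illustration of these dual objects (an assumed scalar quadratic example, not taken from the paper), take $f(x) = \tfrac{\mu}{2}(x-c)^2$, whose conjugate and conjugate gradient admit closed forms:

```python
# Hypothetical example: f(x) = (mu/2)*(x - c)^2 is mu-strongly convex.
# Its Fenchel conjugate d(w) = max_x { w*x - f(x) } is attained at
# x = c + w/mu, so grad d(w) = c + w/mu and d is L-smooth with L = 1/mu.
mu, c = 2.0, 3.0

def grad_d(w):
    # maximizer of w*x - (mu/2)*(x - c)^2
    return c + w / mu

def d(w):
    x = grad_d(w)
    return w * x - 0.5 * mu * (x - c) ** 2
```

Note that $|d'(w_1) - d'(w_2)| = |w_1 - w_2|/\mu$, matching the stated smoothness constant $L = 1/\mu$.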

2. Standard Fenchel Dual Gradient Method (FDGM)

FDGM, adapted to a time-varying undirected graph $G^k=(\mathcal{V},\mathcal{E}^k)$ at each iteration $k$, computes the dual update
$$\mathbf{w}^{k+1} = \mathbf{w}^k - \beta (H_{G^k} \otimes I_d) \nabla D(\mathbf{w}^k)$$
where $H_{G^k}$ is the weighted network Laplacian and $\mathbf{w}$ is the concatenation of all dual variables. In componentwise form:
$$w_i^{k+1} = w_i^k - \beta \sum_{j \in \mathcal{N}_i^k} h_{ij}^k \left( \nabla d_i(w_i^k) - \nabla d_j(w_j^k) \right)$$
Provided $\beta \in (0, 1/L)$ and under $B$-connectivity (the union of the graphs over any $B$ consecutive steps is connected), FDGM achieves exact consensus with dual error $O(1/k)$ and primal error $O(1/\sqrt{k})$ (Liu et al., 18 Jan 2026).
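The componentwise update can be sketched in a toy scalar simulation (all values here are illustrative assumptions: quadratic $f_i$ with minimizers `c[i]`, a 3-agent path whose edges alternate over time, uniform edge weight `h`):

```python
# Toy FDGM run: scalar consensus over a time-varying 3-agent network.
# With f_i(x) = (mu/2)*(x - c_i)^2 we have grad d_i(w) = c_i + w/mu,
# and the consensus optimum is the mean of the c_i.
import itertools

mu, beta = 1.0, 0.5          # beta in (0, 1/L), L = 1/mu = 1
c = [0.0, 2.0, 4.0]          # local minimizers; optimum x* = mean(c) = 2.0
n = len(c)

def grad_d(i, w):
    return c[i] + w / mu

# Alternating edge sets satisfy B-connectivity with B = 2.
edge_seq = itertools.cycle([[(0, 1)], [(1, 2)]])
h = 0.5                      # edge weights with sum_j h_ij^k <= 1

w = [0.0] * n                # dual variables; sum_i w_i = 0 is preserved
for k, edges in zip(range(300), edge_seq):
    g = [grad_d(i, w[i]) for i in range(n)]
    w_new = list(w)
    for (i, j) in edges:
        w_new[i] -= beta * h * (g[i] - g[j])
        w_new[j] -= beta * h * (g[j] - g[i])
    w = w_new

x = [grad_d(i, w[i]) for i in range(n)]   # primal recovery x_i = grad d_i(w_i)
```

Each edge update applies equal and opposite corrections, so the dual feasibility constraint $\sum_i w_i = 0$ holds at every iterate, and the recovered primal variables approach the consensus value.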

3. Reformulation as Local Edge Subproblems

FDGM's global update can be understood as a collection of local two-node Fenchel dual subproblems. For each edge $\{i,j\}$, intermediate "gossip" variables are introduced:
$$\begin{aligned} w_{ij}^{k+1/2} &= w_i^k - \beta \left( \nabla d_i(w_i^k) - \nabla d_j(w_j^k) \right) \\ w_{ji}^{k+1/2} &= w_j^k - \beta \left( \nabla d_j(w_j^k) - \nabla d_i(w_i^k) \right) \end{aligned}$$
The aggregation

$$w_i^{k+1} = \Big(1-\sum_{j \in \mathcal{N}_i^k} h_{ij}^k\Big)w_i^k + \sum_{j \in \mathcal{N}_i^k} h_{ij}^k w_{ij}^{k+1/2}$$

corresponds to performing a single projected gradient step for each two-node subproblem
$$\min_{w_{ij}, w_{ji}} \; d_i(w_{ij}) + d_j(w_{ji}) \qquad \text{s.t. } w_{ij} + w_{ji} = w_i^k + w_j^k$$
In standard FDGM these subproblems are solved inexactly with one such step; FDGM-AA introduces acceleration at this local level (Liu et al., 18 Jan 2026).
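A quick numerical check (toy scalar values and an assumed quadratic conjugate, not from the paper) confirms two properties of the half-step: it preserves the coupling constraint $w_{ij} + w_{ji} = w_i^k + w_j^k$, and it decreases the subproblem objective $d_i + d_j$:

```python
# Toy check on one edge {i, j}: the gossip half-step applies equal and
# opposite corrections, so it stays feasible for the two-node subproblem,
# and (being a gradient step with beta < 1/L) it decreases d_i + d_j.
mu, beta = 1.0, 0.5
c = [0.0, 4.0]

def grad_d(i, w):
    return c[i] + w / mu          # gradient of the conjugate of (mu/2)(x-c_i)^2

def d(i, w):
    return w * c[i] + w * w / (2 * mu)   # the conjugate itself

wi, wj = 1.0, -1.0                # current dual iterates on edge {i, j}
gi, gj = grad_d(0, wi), grad_d(1, wj)
wij_half = wi - beta * (gi - gj)  # w_ij^{k+1/2}
wji_half = wj - beta * (gj - gi)  # w_ji^{k+1/2}

feasible = (wij_half + wji_half == wi + wj)
descent = d(0, wij_half) + d(1, wji_half) < d(0, wi) + d(1, wj)
```

Both properties hold exactly here; the descent property is what the safeguard in Section 4 verifies for the accelerated half-steps.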

4. Anderson Acceleration in Distributed Local Updates

Anderson Acceleration (AA) is integrated into FDGM by embedding edge-wise AA within each two-node subproblem. The procedure consists of:

a) Dual-gradient linearization: Each edge $(i,j)$ maintains a history of past iterates and gradients. Affine combinations with coefficients $\alpha_{ij}^{t,k}$ over the recent history are formed, both to build a candidate next iterate and to approximate its gradient.

b) Coefficient determination via approximate KKT: The coefficients $\alpha_{ij}^k, \alpha_{ji}^k$ are determined by minimizing the gradient mismatch $\Vert D_{ij}^k \alpha_{ij}^k - D_{ji}^k \alpha_{ji}^k \Vert^2$ subject to constraints that enforce consistency and normalization.

c) Anderson-type half-step update: The extrapolated local iterates $(\bar w_{ij}^{k+1/2}, \bar w_{ji}^{k+1/2})$ are constructed from the AA linearization and scaled differences of past gradients:
$$\bar w_{ij}^{k+1/2} = \tilde w_{ij}^{k+1/2} - \beta \big(D_{ij}^k\,\alpha_{ij}^k - D_{ji}^k\,\alpha_{ji}^k\big)$$
and analogously for $\bar w_{ji}^{k+1/2}$.

d) Safeguard/fallback mechanism: Since the $d_i$ are typically only smooth (not affine), AA steps need not produce descent. A sufficient-descent safeguard is imposed: $(\bar w_{ij}^{k+1/2}, \bar w_{ji}^{k+1/2})$ is accepted only if a specified decrease in the sum $d_i + d_j$ is verified; otherwise the method reverts to the standard gradient step. This guarantees monotonicity of the global dual objective (Liu et al., 18 Jan 2026).

The local AA-enabled solutions are then re-aggregated as in standard FDGM, thus preserving distributedness at every step.
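The accept-or-fall-back logic can be sketched in its simplest form: depth-one Anderson acceleration on the fixed-point map $g(w) = w - \beta \nabla D(w)$ of a single dual objective. This is a hypothetical, stripped-down illustration of the safeguard idea, not the paper's edge-wise algorithm:

```python
# Minimal sketch: AA with history depth m = 1, safeguarded so that an
# extrapolated step is accepted only if it does not increase the objective;
# otherwise the plain gradient step is used (the fallback of step d above).
def aa1_safeguarded(D, grad_D, w0, beta, iters=20):
    w, w_prev, r_prev = w0, None, None
    for _ in range(iters):
        r = -beta * grad_D(w)        # residual of the fixed-point map g
        w_gd = w + r                 # plain gradient (fallback) step
        w_next = w_gd
        if r_prev is not None and (r - r_prev) != 0.0:
            # AA(1) mixing coefficient from a 1-D least-squares fit
            alpha = r * (r - r_prev) / (r - r_prev) ** 2
            w_aa = w_gd - alpha * (w_gd - (w_prev + r_prev))
            if D(w_aa) <= D(w_gd):   # safeguard: require no increase in D
                w_next = w_aa
        w_prev, r_prev = w, r
        w = w_next
    return w

# Usage on a toy quadratic dual D(w) = (w - 1)^2 / 2:
w_star = aa1_safeguarded(lambda w: 0.5 * (w - 1.0) ** 2,
                         lambda w: w - 1.0, w0=5.0, beta=0.5)
```

On this quadratic the AA(1) extrapolation lands on the exact minimizer after one accepted step, while the safeguard guarantees the iterates never increase $D$ even when extrapolation misfires.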

5. Convergence Properties

Global convergence of FDGM-AA is established under the following assumptions: each $f_i$ is $\mu$-strongly convex; the graph sequence $\{G^k\}$ satisfies $B$-connectivity; the step size satisfies $\beta \in (0,1/L)$; and the edge weights satisfy $\sum_j h_{ij}^k \le 1$.

The safeguard ensures that at each edge the local dual objective decreases by at least a constant times the squared norm of the gradient differences. Summing over all edges and aggregating across $B$ network steps yields a contraction in the global dual gap,
$$D(\mathbf{w}^k) - D^* = O(1/k),$$
and for the primal variables
$$\|\mathbf{x}^k - \mathbf{x}^*\| = O(1/\sqrt{k}).$$
These match the best-known rates for distributed first-order methods in this regime (Liu et al., 18 Jan 2026).

6. Empirical Performance and Comparative Evaluation

FDGM-AA was tested on distributed $\ell_2$-regularized logistic regression problems with local ball constraints over $n=30$ agents, each holding $M=20$ samples in $\mathbb{R}^{20}$. The compared methods are FDGM-AA, vanilla FDGM, the distributed projected subgradient method, and proximal minimization. The metric is the average squared primal error $\frac{1}{n}\sum_{i} \|x_i^k - x^*\|^2$ as a function of the iteration count.

FDGM-AA consistently outperforms all benchmarks. The speedup over vanilla FDGM grows with the number of iterations and with the AA history length $m$. The algorithm is robust to longer network periods $B$ and to weaker regularization $\lambda$ (Liu et al., 18 Jan 2026).

7. Summary and Theoretical Significance

FDGM-AA constitutes an advance in the distributed optimization of convex, possibly non-smooth functions over time-varying networks, merging Anderson-type accelerated extrapolation into every two-node dual subproblem under a simple yet effective safeguard. This hybrid preserves distributed implementability and the global $O(1/k)$ dual and $O(1/\sqrt{k})$ primal convergence rates, while yielding significant empirical speedup and robustness in time-varying, constrained settings (Liu et al., 18 Jan 2026).

References (1)
