FDGM-AA: Fenchel Dual Gradient with Anderson Acceleration
- FDGM-AA is an optimization approach that combines Anderson Acceleration with Fenchel dual gradient methods to efficiently solve distributed consensus problems.
- It reformulates the global problem into local two-node subproblems, applying AA-enabled extrapolation with a safeguard to ensure convergence.
- Theory establishes O(1/k) dual and O(1/√k) primal convergence in time-varying network settings, and empirical results show enhanced performance and robustness.
The Fenchel Dual Gradient Method with Anderson Acceleration (FDGM-AA) is an optimization approach developed to solve distributed constrained optimization problems over time-varying networks. FDGM-AA integrates Anderson Acceleration (AA), originally designed for fixed-point iteration acceleration, into the Fenchel dual gradient paradigm by embedding local, edge-wise AA steps within the standard distributed gradient method, supplemented with a safeguard mechanism to ensure convergence. This formulation is particularly targeted at consensus problems with local constraints, where the agents' communication topology varies over time (Liu et al., 18 Jan 2026).
1. Problem Formulation and Fenchel Duality
The optimization setting is a distributed consensus problem over a network of $n$ agents:
$$\min_{x_1,\dots,x_n}\ \sum_{i=1}^{n} f_i(x_i) \quad \text{s.t. } x_i = x_j \ \ \forall (i,j),$$
where each $f_i$ is $\mu_i$-strongly convex and possibly non-smooth (e.g., containing indicator functions for constraint sets $X_i$). Under standard conditions (nonempty intersection of the $X_i$), strong duality holds.
The Fenchel conjugate of each local cost is defined as
$$f_i^*(w) = \sup_{x}\,\{\langle w, x\rangle - f_i(x)\},$$
and the Fenchel dual becomes
$$\min_{w_1,\dots,w_n}\ \sum_{i=1}^{n} f_i^*(w_i) \quad \text{s.t. } \sum_{i=1}^{n} w_i = 0.$$
Each $f_i^*$ is differentiable and $L_i$-smooth with $L_i = 1/\mu_i$; the gradient mapping is $\nabla f_i^*(w_i) = \arg\max_x \{\langle w_i, x\rangle - f_i(x)\}$, and one recovers the primal optimum by $x_i^\star = \nabla f_i^*(w_i^\star)$ (Liu et al., 18 Jan 2026).
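As a concrete illustration (not from the source), the conjugate gradient mapping has a closed form for a simple strongly convex local cost with a ball constraint; the sketch below also checks the $1/\mu$-smoothness of $\nabla f_i^*$ numerically:

```python
import numpy as np

# Hedged sketch: gradient of the Fenchel conjugate for an illustrative cost
# f_i(x) = (mu/2)||x - c||^2 + indicator(||x|| <= r)   (mu-strongly convex).
# Then grad f_i^*(w) = argmax_x <w,x> - f_i(x) = proj_ball(c + w/mu, r).

def conjugate_grad(w, c, mu, r):
    """Maximizer of <w,x> - f_i(x): projection of c + w/mu onto the r-ball."""
    x = c + w / mu
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

# Smoothness check: the projection is nonexpansive, so
# ||grad f*(w1) - grad f*(w2)|| <= (1/mu) ||w1 - w2||.
rng = np.random.default_rng(0)
c, mu, r = rng.standard_normal(3), 2.0, 1.0
w1, w2 = rng.standard_normal(3), rng.standard_normal(3)
g1, g2 = conjugate_grad(w1, c, mu, r), conjugate_grad(w2, c, mu, r)
assert np.linalg.norm(g1 - g2) <= (1.0 / mu) * np.linalg.norm(w1 - w2) + 1e-12
```

The cost, its constants, and the ball radius are all illustrative assumptions; the closed-form projection holds because the quadratic's unconstrained maximizer is simply projected onto the constraint set.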
2. Standard Fenchel Dual Gradient Method (FDGM)
FDGM, adapted to a time-varying undirected network graph $\mathcal{G}^k = (\mathcal{V}, \mathcal{E}^k)$ at each iteration $k$, computes the dual update as
$$w^{k+1} = w^k - \alpha\, L^k\, \nabla F^*(w^k),$$
where $L^k$ is the weighted network Laplacian, $F^*(w) = \sum_i f_i^*(w_i)$, and $w$ is the concatenation of all dual variables. In componentwise form:
$$w_i^{k+1} = w_i^k - \alpha \sum_{j \in \mathcal{N}_i^k} P_{ij}^k\big(\nabla f_i^*(w_i^k) - \nabla f_j^*(w_j^k)\big).$$
Provided the step size $\alpha$ is sufficiently small relative to the strong-convexity constants and edge weights, and with $B$-connectivity (the union of network graphs over any $B$ consecutive steps is connected), FDGM achieves exact consensus with dual error $O(1/k)$ and primal error $O(1/\sqrt{k})$ rates (Liu et al., 18 Jan 2026).
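A minimal sketch of the vanilla FDGM loop, assuming scalar quadratic local costs $f_i(x) = (\mu_i/2)(x-c_i)^2$ (so $\nabla f_i^*(w) = c_i + w/\mu_i$) and a toy time-varying network that alternates between two matchings whose union is connected; the step size and unit edge weights are illustrative choices, not the paper's:

```python
import numpy as np

# Hedged sketch of the FDGM dual update for scalar quadratics
# f_i(x) = (mu_i/2)(x - c_i)^2, with grad f_i^*(w) = c_i + w/mu_i.

rng = np.random.default_rng(1)
n = 6
mu = rng.uniform(1.0, 3.0, n)
c = rng.uniform(-2.0, 2.0, n)
x_star = np.dot(mu, c) / mu.sum()             # consensus minimizer of sum f_i

w = np.zeros(n)                                # dual variables; sum stays 0
alpha = 0.3 * mu.min()                         # small step (assumed safe here)
edge_sets = [[(i, i + 1) for i in range(0, n - 1, 2)],
             [(i, i + 1) for i in range(1, n - 1, 2)]]   # union is connected

for k in range(3000):
    x = c + w / mu                             # local primal: grad f_i^*(w_i)
    for (i, j) in edge_sets[k % 2]:            # edge-wise Laplacian step
        d = alpha * (x[i] - x[j])              # unit edge weight P_ij = 1
        w[i] -= d
        w[j] += d

x = c + w / mu
assert np.allclose(x, x_star, atol=1e-4)       # exact consensus at the optimum
```

Note that each edge update is antisymmetric, so $\sum_i w_i = 0$ is preserved automatically, which keeps the iterates dual-feasible.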
3. Reformulation as Local Edge Subproblems
FDGM's global update can be understood as a collection of local two-node Fenchel dual subproblems. For each edge $(i,j) \in \mathcal{E}^k$, intermediate "gossip variables" are introduced:
$$w_{ij} = w_i^k - \alpha P_{ij}^k\big(\nabla f_i^*(w_i^k) - \nabla f_j^*(w_j^k)\big), \qquad w_{ji} = w_j^k + \alpha P_{ij}^k\big(\nabla f_i^*(w_i^k) - \nabla f_j^*(w_j^k)\big).$$
The aggregation
$$w_i^{k+1} = w_i^k + \sum_{j \in \mathcal{N}_i^k}\big(w_{ij} - w_i^k\big)$$
corresponds to performing a single projected gradient step for each two-node subproblem
$$\min_{u,v}\ f_i^*(u) + f_j^*(v) \quad \text{s.t. } u + v = w_i^k + w_j^k.$$
In standard FDGM, these subproblems are solved inexactly with one gradient step; the FDGM-AA approach introduces acceleration at this local level (Liu et al., 18 Jan 2026).
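The reduction above can be checked directly: projecting the gradient of a two-node subproblem onto the sum-preserving subspace yields an antisymmetric gossip step. A small sketch under assumed quadratic local costs:

```python
import numpy as np

# Hedged sketch: one projected gradient step on the two-node subproblem
#   min f_i^*(u) + f_j^*(v)  s.t.  u + v = const
# reduces to an antisymmetric gossip step on the dual pair.

def two_node_step(wi, wj, grad_i, grad_j, alpha):
    gu, gv = grad_i(wi), grad_j(wj)
    # Projection of (gu, gv) onto {du + dv = 0} is ((gu - gv)/2, (gv - gu)/2).
    d = 0.5 * (gu - gv)
    return wi - alpha * d, wj + alpha * d

# Toy conjugate gradients for quadratics f(x) = (mu/2)(x - c)^2 (assumptions).
gi = lambda w: 1.0 + w / 2.0      # c_i =  1, mu_i = 2.0
gj = lambda w: -1.0 + w / 0.5     # c_j = -1, mu_j = 0.5

wi, wj = 0.3, -0.3
for _ in range(200):
    wi, wj = two_node_step(wi, wj, gi, gj, alpha=0.2)

assert abs(wi + wj) < 1e-9                # the dual sum is conserved
assert abs(gi(wi) - gj(wj)) < 1e-6        # conjugate gradients agree at optimum
```

At the subproblem's optimum the two conjugate gradients coincide, which is exactly the local consensus condition the edge update drives toward.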
4. Anderson Acceleration in Distributed Local Updates
Anderson Acceleration (AA) is integrated into FDGM by embedding edge-wise AA within each two-node subproblem. The procedure consists of:
a) Dual-gradient linearization: each edge maintains a history of past iterates $\{w^{k-t}\}_{t=0}^{m}$ and gradients $\{g^{k-t}\}_{t=0}^{m}$. Affine combinations with coefficients $\beta_0,\dots,\beta_m$ over this recent history are formed, both to build a candidate next iterate and to approximate the gradient.
b) Coefficient determination via approximate KKT: the coefficients $\beta$ are determined by minimizing the mismatch between the affine-combined approximate gradients of $f_i^*$ and $f_j^*$, subject to constraints that enforce consistency across the edge pair and the normalization $\sum_{t=0}^{m}\beta_t = 1$.
c) Anderson-type half-step update: the extrapolated local iterates are constructed using the AA linearization and scaled differences of past gradients, in the standard Anderson form $w_{ij}^{+} = \sum_{t=0}^{m}\beta_t\big(w_i^{k-t} - \alpha\, g_i^{k-t}\big)$, and analogously for $w_{ji}^{+}$.
d) Safeguard/fallback mechanism: since $\nabla f^*$ is typically only smooth (not affine), AA steps need not ensure descent. A sufficient-descent safeguard is imposed: the extrapolated iterate is accepted only if a specified decrease in the local dual objective sum is verified; otherwise the update reverts to the standard gradient step. This guarantees monotonicity in the global dual objective (Liu et al., 18 Jan 2026).
The local AA-enabled solutions are then re-aggregated as in standard FDGM, thus preserving distributedness at every step.
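The accept-or-fall-back pattern of steps a)–d) can be sketched generically. Below is a hedged, centralized illustration of safeguarded type-II Anderson acceleration applied to the fixed-point map $T(w) = w - \alpha \nabla\varphi(w)$ of a toy smooth objective; the objective, memory length, and descent constant are assumptions, and the per-edge consistency constraints of FDGM-AA are not modeled:

```python
import numpy as np

# Hedged sketch: safeguarded Anderson acceleration (type II, memory m) on
# T(w) = w - alpha * grad(w), falling back to the plain gradient step
# whenever the AA candidate fails a sufficient-descent test.

def aa_safeguarded(phi, grad, w0, alpha=0.1, m=5, iters=200, c=1e-4):
    w = w0.copy()
    W, R = [], []                          # histories of iterates / residuals
    for _ in range(iters):
        r = -alpha * grad(w)               # residual of the map: T(w) = w + r
        W.append(w.copy()); R.append(r.copy())
        W, R = W[-(m + 1):], R[-(m + 1):]
        if len(R) > 1:
            # min_beta ||sum_t beta_t r_t||, sum_t beta_t = 1, via differences.
            dW = np.diff(np.array(W), axis=0)
            dR = np.diff(np.array(R), axis=0)
            gamma, *_ = np.linalg.lstsq(dR.T, r, rcond=None)
            w_aa = w + r - (dW.T + dR.T) @ gamma
            # Safeguard: accept only on sufficient decrease of the objective.
            if phi(w_aa) <= phi(w) - c * np.dot(grad(w), grad(w)):
                w = w_aa
                continue
        w = w + r                          # fallback: plain gradient step
    return w

# Toy smooth strongly convex objective (an assumption for illustration).
A = np.diag([1.0, 5.0, 9.0])
phi = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w
w = aa_safeguarded(phi, grad, np.array([3.0, -2.0, 1.0]))
assert np.linalg.norm(w) < 1e-6
```

The safeguard makes the objective monotonically non-increasing regardless of how wild the extrapolation is, which is the property the convergence analysis leans on.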
5. Convergence Properties
Global convergence of FDGM-AA is established under the following assumptions: each $f_i$ is $\mu_i$-strongly convex; the graph sequence ensures $B$-connectivity; the step size $\alpha$ lies below a threshold determined by the strong-convexity constants; and the edge weights $P_{ij}^k$ are uniformly bounded above and below.
The safeguard ensures that at each edge, the local dual objective decreases by at least a constant times the squared norm of the gradient differences. Summing over all edges and aggregating across network steps, one obtains a contraction in the global dual gap:
$$D(w^k) - D^\star = O(1/k)$$
for the dual objective, and
$$\|x^k - x^\star\| = O(1/\sqrt{k})$$
for the primal variable convergence. The method thus yields $O(1/\sqrt{k})$ convergence for the primal sequence and $O(1/k)$ for the dual, matching the best-known rates for distributed first-order methods in this regime (Liu et al., 18 Jan 2026).
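The descent-to-rate argument can be sketched as follows (a standard derivation pattern; the constant $c$ and the exact aggregation are assumptions, not quoted from the source):

```latex
% Per-step safeguarded descent (c > 0 an assumed edge-wise constant):
D(w^{k+1}) \;\le\; D(w^k) \;-\; c \sum_{(i,j)\in\mathcal{E}^k}
  \bigl\| \nabla f_i^*(w_i^k) - \nabla f_j^*(w_j^k) \bigr\|^2 .
% Telescoping over k iterations bounds the running minimum of the
% gradient-difference terms by (D(w^0) - D^\star)/(c\,k) = O(1/k),
% and strong convexity (mu = min_i mu_i) transfers the dual gap to the primal:
\| x^k - x^\star \|^2 \;\le\; \tfrac{2}{\mu}\,\bigl( D(w^k) - D^\star \bigr)
  = O(1/k)
\;\Longrightarrow\; \| x^k - x^\star \| = O(1/\sqrt{k}).
```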
6. Empirical Performance and Comparative Evaluation
FDGM-AA was tested on distributed regularized logistic regression problems with ball constraints on the local variables, over a network of agents each holding local samples. The methods compared include FDGM-AA, vanilla FDGM, distributed projected subgradient, and proximal minimization. The reported metric is the average squared primal error as a function of the iteration count.
FDGM-AA achieves consistently superior performance to all benchmarks. The acceleration compared to vanilla FDGM increases as the number of iterations and the AA history length are raised. The algorithm demonstrates robustness to longer network periods and to weaker regularization (Liu et al., 18 Jan 2026).
7. Summary and Theoretical Significance
FDGM-AA constitutes an advance in the distributed optimization of convex, possibly non-smooth, functions over time-varying networks by merging Anderson-type accelerated extrapolation into every two-node dual subproblem, regulated by a simple yet effective safeguard mechanism. This hybrid preserves distributed implementability and maintains global dual and primal convergence rates, while yielding significant speedup and robustness in empirical scenarios involving time-varying topologies and constrained optimization (Liu et al., 18 Jan 2026).