
FDGM-AA: Fenchel Dual Gradient with Anderson Acceleration

Updated 25 January 2026
  • FDGM-AA is an optimization approach that combines Anderson Acceleration with Fenchel dual gradient methods to efficiently solve distributed consensus problems.
  • It reformulates the global problem into local two-node subproblems, applying AA-enabled extrapolation with a safeguard to ensure convergence.
  • Empirical results show enhanced performance and robustness, achieving O(1/k) dual and O(1/√k) primal convergence in time-varying network settings.

The Fenchel Dual Gradient Method with Anderson Acceleration (FDGM-AA) is an optimization approach developed to solve distributed constrained optimization problems over time-varying networks. FDGM-AA integrates Anderson Acceleration (AA), originally designed for fixed-point iteration acceleration, into the Fenchel dual gradient paradigm by embedding local, edge-wise AA steps within the standard distributed gradient method, supplemented with a safeguard mechanism to ensure convergence. This formulation is particularly targeted at consensus problems with local constraints, where the agents' communication topology varies over time (Liu et al., 18 Jan 2026).

1. Problem Formulation and Fenchel Duality

The optimization setting is a distributed consensus problem over a network of $n$ agents:
$$\min_{x_1,\dots,x_n\in\mathbb{R}^d} \sum_{i=1}^n f_i(x_i) \qquad \text{s.t.} \quad x_1 = x_2 = \cdots = x_n$$
where each $f_i: \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is $\mu$-strongly convex and possibly non-smooth (e.g., an indicator function of a constraint set). Under standard conditions (nonempty intersection of the $\operatorname{dom} f_i$), strong duality holds.

The Fenchel conjugate $d_i$ is defined as
$$d_i(w_i) := \max_{x_i \in \mathbb{R}^d} \left\{ w_i^\top x_i - f_i(x_i) \right\}$$
and the Fenchel dual problem becomes

$$\min_{w_1,\ldots,w_n \in \mathbb{R}^d} \sum_{i=1}^n d_i(w_i) \qquad \text{s.t.} \quad \sum_{i=1}^n w_i = 0$$

Each $d_i$ is differentiable and $L$-smooth with $L = 1/\mu$; its gradient is $\nabla d_i(w_i) = \arg\max_{x_i}\{ w_i^\top x_i - f_i(x_i) \}$, and the primal optimum is recovered via $x^*_i = \nabla d_i(w^*_i)$ (Liu et al., 18 Jan 2026).
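As a minimal illustration of these dual objects (an assumed scalar quadratic example, not taken from the paper), take $f(x) = \tfrac{\mu}{2}(x-c)^2$, whose conjugate and conjugate gradient admit closed forms:

```python
# Hypothetical example: f(x) = (mu/2)*(x - c)^2 is mu-strongly convex.
# Its Fenchel conjugate d(w) = max_x { w*x - f(x) } is attained at
# x = c + w/mu, so grad d(w) = c + w/mu and d is L-smooth with L = 1/mu.
mu, c = 2.0, 3.0

def grad_d(w):
    # maximizer of w*x - (mu/2)*(x - c)^2
    return c + w / mu

def d(w):
    x = grad_d(w)
    return w * x - 0.5 * mu * (x - c) ** 2
```

Note that $|d'(w_1) - d'(w_2)| = |w_1 - w_2|/\mu$, matching the stated smoothness constant $L = 1/\mu$.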

2. Standard Fenchel Dual Gradient Method (FDGM)

FDGM, adapted to a time-varying undirected graph $G^k=(\mathcal{V},\mathcal{E}^k)$ at each iteration $k$, computes the dual update
$$\mathbf{w}^{k+1} = \mathbf{w}^k - \beta (H_{G^k} \otimes I_d) \nabla D(\mathbf{w}^k)$$
where $H_{G^k}$ is the weighted network Laplacian and $\mathbf{w}$ is the concatenation of all dual variables. In componentwise form:
$$w_i^{k+1} = w_i^k - \beta \sum_{j \in \mathcal{N}_i^k} h_{ij}^k \left( \nabla d_i(w_i^k) - \nabla d_j(w_j^k) \right)$$
Provided $\beta \in (0, 1/L)$ and under $B$-connectivity (the union of the graphs over any $B$ consecutive steps is connected), FDGM achieves exact consensus with dual error $O(1/k)$ and primal error $O(1/\sqrt{k})$ (Liu et al., 18 Jan 2026).
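The componentwise update can be sketched in a toy scalar simulation (all values here are illustrative assumptions: quadratic $f_i$ with minimizers `c[i]`, a 3-agent path whose edges alternate over time, uniform edge weight `h`):

```python
# Toy FDGM run: scalar consensus over a time-varying 3-agent network.
# With f_i(x) = (mu/2)*(x - c_i)^2 we have grad d_i(w) = c_i + w/mu,
# and the consensus optimum is the mean of the c_i.
import itertools

mu, beta = 1.0, 0.5          # beta in (0, 1/L), L = 1/mu = 1
c = [0.0, 2.0, 4.0]          # local minimizers; optimum x* = mean(c) = 2.0
n = len(c)

def grad_d(i, w):
    return c[i] + w / mu

# Alternating edge sets satisfy B-connectivity with B = 2.
edge_seq = itertools.cycle([[(0, 1)], [(1, 2)]])
h = 0.5                      # edge weights with sum_j h_ij^k <= 1

w = [0.0] * n                # dual variables; sum_i w_i = 0 is preserved
for k, edges in zip(range(300), edge_seq):
    g = [grad_d(i, w[i]) for i in range(n)]
    w_new = list(w)
    for (i, j) in edges:
        w_new[i] -= beta * h * (g[i] - g[j])
        w_new[j] -= beta * h * (g[j] - g[i])
    w = w_new

x = [grad_d(i, w[i]) for i in range(n)]   # primal recovery x_i = grad d_i(w_i)
```

Each edge update applies equal and opposite corrections, so the dual feasibility constraint $\sum_i w_i = 0$ holds at every iterate, and the recovered primal variables approach the consensus value.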

3. Reformulation as Local Edge Subproblems

FDGM's global update can be understood as a collection of local two-node Fenchel dual subproblems. For each edge $\{i,j\}$, intermediate "gossip" variables are introduced:
$$\begin{aligned} w_{ij}^{k+1/2} &= w_i^k - \beta \left( \nabla d_i(w_i^k) - \nabla d_j(w_j^k) \right) \\ w_{ji}^{k+1/2} &= w_j^k - \beta \left( \nabla d_j(w_j^k) - \nabla d_i(w_i^k) \right) \end{aligned}$$
The aggregation

$$w_i^{k+1} = \Big(1-\sum_{j \in \mathcal{N}_i^k} h_{ij}^k\Big)w_i^k + \sum_{j \in \mathcal{N}_i^k} h_{ij}^k w_{ij}^{k+1/2}$$

corresponds to performing a single projected gradient step for each two-node subproblem
$$\min_{w_{ij}, w_{ji}} \; d_i(w_{ij}) + d_j(w_{ji}) \qquad \text{s.t. } w_{ij} + w_{ji} = w_i^k + w_j^k$$
In standard FDGM these subproblems are solved inexactly with one such step; FDGM-AA introduces acceleration at this local level (Liu et al., 18 Jan 2026).
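A quick numerical check (toy scalar values and an assumed quadratic conjugate, not from the paper) confirms two properties of the half-step: it preserves the coupling constraint $w_{ij} + w_{ji} = w_i^k + w_j^k$, and it decreases the subproblem objective $d_i + d_j$:

```python
# Toy check on one edge {i, j}: the gossip half-step applies equal and
# opposite corrections, so it stays feasible for the two-node subproblem,
# and (being a gradient step with beta < 1/L) it decreases d_i + d_j.
mu, beta = 1.0, 0.5
c = [0.0, 4.0]

def grad_d(i, w):
    return c[i] + w / mu          # gradient of the conjugate of (mu/2)(x-c_i)^2

def d(i, w):
    return w * c[i] + w * w / (2 * mu)   # the conjugate itself

wi, wj = 1.0, -1.0                # current dual iterates on edge {i, j}
gi, gj = grad_d(0, wi), grad_d(1, wj)
wij_half = wi - beta * (gi - gj)  # w_ij^{k+1/2}
wji_half = wj - beta * (gj - gi)  # w_ji^{k+1/2}

feasible = (wij_half + wji_half == wi + wj)
descent = d(0, wij_half) + d(1, wji_half) < d(0, wi) + d(1, wj)
```

Both properties hold exactly here; the descent property is what the safeguard in Section 4 verifies for the accelerated half-steps.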

4. Anderson Acceleration in Distributed Local Updates

Anderson Acceleration (AA) is integrated into FDGM by embedding edge-wise AA within each two-node subproblem. The procedure consists of:

a) Dual-gradient linearization: Each edge $(i,j)$ maintains a history of past iterates and gradients. Affine combinations with coefficients $\alpha_{ij}^{t,k}$ over the recent history are formed, both to build a candidate next iterate and to approximate its gradient.

b) Coefficient determination via approximate KKT: The coefficients $\alpha_{ij}^k, \alpha_{ji}^k$ are determined by minimizing the gradient mismatch $\Vert D_{ij}^k \alpha_{ij}^k - D_{ji}^k \alpha_{ji}^k \Vert^2$ subject to constraints that enforce consistency and normalization.

c) Anderson-type half-step update: The extrapolated local iterates $(\bar w_{ij}^{k+1/2}, \bar w_{ji}^{k+1/2})$ are constructed from the AA linearization and scaled differences of past gradients:
$$\bar w_{ij}^{k+1/2} = \tilde w_{ij}^{k+1/2} - \beta \big(D_{ij}^k\,\alpha_{ij}^k - D_{ji}^k\,\alpha_{ji}^k\big)$$
and analogously for $\bar w_{ji}^{k+1/2}$.

d) Safeguard/fallback mechanism: Since the $d_i$ are typically only smooth (not affine), AA steps need not produce descent. A sufficient-descent safeguard is imposed: $(\bar w_{ij}^{k+1/2}, \bar w_{ji}^{k+1/2})$ is accepted only if a specified decrease in the sum $d_i + d_j$ is verified; otherwise the method reverts to the standard gradient step. This guarantees monotonicity of the global dual objective (Liu et al., 18 Jan 2026).

The local AA-enabled solutions are then re-aggregated as in standard FDGM, thus preserving distributedness at every step.
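The accept-or-fall-back logic can be sketched in its simplest form: depth-one Anderson acceleration on the fixed-point map $g(w) = w - \beta \nabla D(w)$ of a single dual objective. This is a hypothetical, stripped-down illustration of the safeguard idea, not the paper's edge-wise algorithm:

```python
# Minimal sketch: AA with history depth m = 1, safeguarded so that an
# extrapolated step is accepted only if it does not increase the objective;
# otherwise the plain gradient step is used (the fallback of step d above).
def aa1_safeguarded(D, grad_D, w0, beta, iters=20):
    w, w_prev, r_prev = w0, None, None
    for _ in range(iters):
        r = -beta * grad_D(w)        # residual of the fixed-point map g
        w_gd = w + r                 # plain gradient (fallback) step
        w_next = w_gd
        if r_prev is not None and (r - r_prev) != 0.0:
            # AA(1) mixing coefficient from a 1-D least-squares fit
            alpha = r * (r - r_prev) / (r - r_prev) ** 2
            w_aa = w_gd - alpha * (w_gd - (w_prev + r_prev))
            if D(w_aa) <= D(w_gd):   # safeguard: require no increase in D
                w_next = w_aa
        w_prev, r_prev = w, r
        w = w_next
    return w

# Usage on a toy quadratic dual D(w) = (w - 1)^2 / 2:
w_star = aa1_safeguarded(lambda w: 0.5 * (w - 1.0) ** 2,
                         lambda w: w - 1.0, w0=5.0, beta=0.5)
```

On this quadratic the AA(1) extrapolation lands on the exact minimizer after one accepted step, while the safeguard guarantees the iterates never increase $D$ even when extrapolation misfires.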

5. Convergence Properties

Global convergence of FDGM-AA is established under the following assumptions: each $f_i$ is $\mu$-strongly convex; the graph sequence $\{G^k\}$ satisfies $B$-connectivity; the step size satisfies $\beta \in (0,1/L)$; and the edge weights satisfy $\sum_j h_{ij}^k \le 1$.

The safeguard ensures that at each edge the local dual objective decreases by at least a constant times the squared norm of the gradient differences. Summing over all edges and aggregating across $B$ network steps yields a contraction in the global dual gap,
$$D(\mathbf{w}^k) - D^* = O(1/k),$$
and for the primal variables
$$\|\mathbf{x}^k - \mathbf{x}^*\| = O(1/\sqrt{k}).$$
These match the best-known rates for distributed first-order methods in this regime (Liu et al., 18 Jan 2026).

6. Empirical Performance and Comparative Evaluation

FDGM-AA was tested on distributed $\ell_2$-regularized logistic regression problems with local ball constraints over $n=30$ agents, each holding $M=20$ samples in $\mathbb{R}^{20}$. The compared methods are FDGM-AA, vanilla FDGM, the distributed projected subgradient method, and proximal minimization. The metric is the average squared primal error $\frac{1}{n}\sum_{i} \|x_i^k - x^*\|^2$ as a function of the iteration count.

FDGM-AA consistently outperforms all benchmarks. The speedup over vanilla FDGM grows with the number of iterations and with the AA history length $m$. The algorithm is robust to longer network periods $B$ and to weaker regularization $\lambda$ (Liu et al., 18 Jan 2026).

7. Summary and Theoretical Significance

FDGM-AA constitutes an advance in the distributed optimization of convex, possibly non-smooth functions over time-varying networks, merging Anderson-type accelerated extrapolation into every two-node dual subproblem under a simple yet effective safeguard. This hybrid preserves distributed implementability and the global $O(1/k)$ dual and $O(1/\sqrt{k})$ primal convergence rates, while yielding significant empirical speedup and robustness in time-varying, constrained settings (Liu et al., 18 Jan 2026).

References (1)
