FDGM-AA: Fenchel Dual Gradient with Anderson Acceleration

Updated 25 January 2026
  • FDGM-AA is an optimization approach that combines Anderson Acceleration with Fenchel dual gradient methods to efficiently solve distributed consensus problems.
  • It reformulates the global problem into local two-node subproblems, applying AA-enabled extrapolation with a safeguard to ensure convergence.
  • The method retains O(1/k) dual and O(1/√k) primal convergence rates in time-varying network settings, and empirical results show improved performance and robustness.

The Fenchel Dual Gradient Method with Anderson Acceleration (FDGM-AA) is an optimization approach developed to solve distributed constrained optimization problems over time-varying networks. FDGM-AA integrates Anderson Acceleration (AA), originally designed for fixed-point iteration acceleration, into the Fenchel dual gradient paradigm by embedding local, edge-wise AA steps within the standard distributed gradient method, supplemented with a safeguard mechanism to ensure convergence. This formulation is particularly targeted at consensus problems with local constraints, where the agents' communication topology varies over time (Liu et al., 18 Jan 2026).

1. Problem Formulation and Fenchel Duality

The optimization setting is a distributed consensus problem over a network of $n$ agents: $$\min_{x_1,\dots,x_n\in\mathbb{R}^d} \sum_{i=1}^n f_i(x_i) \qquad \text{s.t.} \quad x_1 = x_2 = \ldots = x_n$$ where each $f_i: \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is $\mu$-strongly convex, possibly non-smooth (e.g., indicator functions for constraint sets). Under standard conditions (nonempty intersection of $\operatorname{dom} f_i$), strong duality holds.

The Fenchel conjugate $d_i$ is defined as: $$d_i(w_i) := \max_{x_i \in \mathbb{R}^d} \left\{ w_i^\top x_i - f_i(x_i) \right\}$$ and the Fenchel dual becomes

$$\min_{w_1,\ldots,w_n \in \mathbb{R}^d} \sum_{i=1}^n d_i(w_i) \qquad \text{s.t.} \quad \sum_{i=1}^n w_i = 0$$

Each $d_i$ is differentiable and $L$-smooth with $L = 1/\mu$; the gradient mapping is $\nabla d_i(w_i) = \arg\max_{x_i \in \mathbb{R}^d} \left\{ w_i^\top x_i - f_i(x_i) \right\}$, and one recovers the primal optimum by $x^\star = \nabla d_i(w_i^\star)$ for any dual optimum $(w_1^\star,\ldots,w_n^\star)$ (Liu et al., 18 Jan 2026).
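As a concrete illustration, consider a toy local objective that is not taken from the source: $f_i(x) = \tfrac{\mu}{2}\|x - a_i\|^2$ plus the indicator of a Euclidean ball of radius $r$. Under this assumption the maximizer defining $d_i$ is the projection of $a_i + w_i/\mu$ onto the ball, so $\nabla d_i$ has a closed form. The function names below are hypothetical, and the final check only spot-tests the $1/\mu$ Lipschitz property on two random points.

```python
import numpy as np

def project_ball(z, r):
    """Euclidean projection onto {x : ||x|| <= r}."""
    norm = np.linalg.norm(z)
    return z if norm <= r else z * (r / norm)

def grad_conjugate(w, a, mu, r):
    """Gradient of d(w) = max_x { w @ x - f(x) } for the assumed toy objective
    f(x) = (mu/2)||x - a||^2 + indicator(||x|| <= r).
    The maximizer is the projection of a + w/mu onto the ball."""
    return project_ball(a + w / mu, r)

def conjugate(w, a, mu, r):
    """Value of d(w), evaluated at the maximizer."""
    x = grad_conjugate(w, a, mu, r)
    return w @ x - 0.5 * mu * np.sum((x - a) ** 2)

rng = np.random.default_rng(0)
a, mu, r = rng.normal(size=3), 2.0, 1.0
w1, w2 = rng.normal(size=3), rng.normal(size=3)
g1, g2 = grad_conjugate(w1, a, mu, r), grad_conjugate(w2, a, mu, r)

# The gradient of d should be (1/mu)-Lipschitz, since projection is nonexpansive.
lipschitz_ok = np.linalg.norm(g1 - g2) <= np.linalg.norm(w1 - w2) / mu + 1e-12
```

The gradient always lies in the constraint set, which is why the dual iterates of a method built on $\nabla d_i$ yield primal estimates that respect the local constraints.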

2. Standard Fenchel Dual Gradient Method (FDGM)

FDGM, adapted to time-varying undirected network graphs $\mathcal{G}^k = (\mathcal{V}, \mathcal{E}^k)$ at each iteration $k$, computes the dual update as: $$\mathbf{w}^{k+1} = \mathbf{w}^k - \alpha \left( H_{\mathcal{G}^k} \otimes I_d \right) \nabla D(\mathbf{w}^k), \qquad D(\mathbf{w}) := \sum_{i=1}^n d_i(w_i)$$ where $H_{\mathcal{G}^k}$ is the weighted network Laplacian, and $\mathbf{w} = (w_1^\top,\ldots,w_n^\top)^\top$ is the concatenation of all dual variables. In componentwise form: $$w_i^{k+1} = w_i^k - \alpha \sum_{j:\,\{i,j\} \in \mathcal{E}^k} h_{ij}^k \left( \nabla d_i(w_i^k) - \nabla d_j(w_j^k) \right)$$ with $h_{ij}^k > 0$ the weight of edge $\{i,j\}$. Provided the step size $\alpha$ satisfies the required upper bound and with $B$-connectivity (the union of network graphs over any $B$ consecutive steps is connected), FDGM achieves exact consensus with dual error $O(1/k)$ and primal error $O(1/\sqrt{k})$ rates (Liu et al., 18 Jan 2026).
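The update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the source's implementation: the local objectives are the toy quadratic-plus-ball functions $f_i(x) = \tfrac{\mu}{2}\|x - a_i\|^2$ with a ball constraint, all edge weights are 1, the graph alternates between two matchings whose union is a ring, and the step size `alpha = 0.3` is simply chosen small enough for this instance. All names are hypothetical.

```python
import numpy as np

def project_ball(z, r):
    """Euclidean projection onto {x : ||x|| <= r}."""
    norm = np.linalg.norm(z)
    return z if norm <= r else z * (r / norm)

def primal_from_dual(w, a, mu, r):
    """x_i = grad d_i(w_i) for the assumed toy objective, for every agent i."""
    return np.array([project_ball(a[i] + w[i] / mu, r) for i in range(len(a))])

def fdgm(a, mu, r, edge_schedule, alpha, iters):
    """Dual gradient iteration over a periodic sequence of edge sets."""
    w = np.zeros_like(a)                      # zero start, so sum_i w_i = 0
    for k in range(iters):
        x = primal_from_dual(w, a, mu, r)
        step = np.zeros_like(w)
        for i, j in edge_schedule[k % len(edge_schedule)]:
            # Laplacian action with unit weights (assumption): each edge
            # contributes the gradient difference with opposite signs.
            step[i] += x[i] - x[j]
            step[j] += x[j] - x[i]
        w -= alpha * step
    return w, primal_from_dual(w, a, mu, r)

# Four agents in R^2; some a_i lie outside the unit ball, their mean lies inside.
a = np.array([[2.0, 0.0], [-1.5, 0.5], [0.0, 1.5], [0.3, -1.2]])
mu, r = 1.0, 1.0
schedule = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]   # union over 2 steps is a ring
w, x = fdgm(a, mu, r, schedule, alpha=0.3, iters=2000)

# For this toy instance the consensus optimum is the projected mean of the a_i.
x_star = project_ball(a.mean(axis=0), r)
max_err = np.max(np.linalg.norm(x - x_star, axis=1))
```

Because every edge contributes equal and opposite terms to its two endpoints, the dual feasibility condition $\sum_i w_i = 0$ is preserved at every iteration without any global coordination.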

3. Reformulation as Local Edge Subproblems

FDGM's global update can be understood as a collection of local two-node Fenchel dual subproblems. For each edge $\{i,j\}$ of the current graph, intermediate "gossip-variables" are introduced at the two endpoints. Aggregating these gossip variables over the edges incident to each node reproduces the FDGM update, and computing them corresponds to performing a single projected gradient step for each two-node subproblem. In standard FDGM, these subproblems are solved inexactly with one gradient step; the FDGM-AA approach introduces acceleration at this local level (Liu et al., 18 Jan 2026).
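One natural way to write such a two-node subproblem is sketched below. The symbols $u_i$, $u_j$, and $\gamma$ are assumptions introduced for illustration and need not match the source's notation. The subproblem keeps the sum of the two dual variables fixed, so projecting the gradient onto the feasible directions leaves only the gradient difference between the two endpoints.

```latex
% Assumed two-node dual subproblem on edge {i,j} at iteration k:
\min_{u_i,\,u_j \in \mathbb{R}^d} \; d_i(u_i) + d_j(u_j)
\qquad \text{s.t.} \quad u_i + u_j = w_i^k + w_j^k .

% Feasible directions satisfy \delta_i + \delta_j = 0, so the projection of
% (\nabla d_i, \nabla d_j) onto them is (+g/2, -g/2) with
% g := \nabla d_i(w_i^k) - \nabla d_j(w_j^k).
% One projected gradient step with step size \gamma > 0 therefore reads
u_i = w_i^k - \tfrac{\gamma}{2}\, g ,
\qquad
u_j = w_j^k + \tfrac{\gamma}{2}\, g .
```

The step vanishes exactly when $\nabla d_i(w_i^k) = \nabla d_j(w_j^k)$, that is, when the two agents' primal estimates already agree.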

4. Anderson Acceleration in Distributed Local Updates

Anderson Acceleration (AA) is integrated into FDGM by embedding edge-wise AA within each two-node subproblem. The procedure consists of:

a) Dual-gradient linearization: Each edge $\{i,j\}$ maintains a history of past iterates and gradients. Affine combinations over the recent history, with coefficients summing to one, are formed both to build a candidate next iterate and to approximate the gradient.

b) Coefficient determination via approximate KKT: The coefficients are determined by minimizing the mismatch between the approximate gradients at the two endpoints of the edge, under the constraints that enforce consistency and normalization.

c) Anderson-type half-step update: The extrapolated local iterate at node $i$ is constructed using the AA linearization and scaled differences of past gradients, and analogously for node $j$.

d) Safe-guard/fallback mechanism: Since each $d_i$ is typically only smooth (its gradient is not affine), AA steps need not ensure descent. A sufficient-descent safeguard is imposed: accept the extrapolated iterates only if a specific decrease in the sum $d_i + d_j$ of the two local dual objectives is verified, otherwise revert to the standard gradient step. This guarantees monotonicity in the global dual objective (Liu et al., 18 Jan 2026).

The local AA-enabled solutions are then re-aggregated as in standard FDGM, thus preserving distributedness at every step.
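The steps above can be sketched on a single edge. The code below is a generic safeguarded Anderson Acceleration loop, not the exact scheme of Liu et al.: it parameterizes the two-node subproblem by a shift `s` (so the sum of the two duals is preserved), uses the standard least-squares form of AA on the gradient-step map, and accepts the extrapolated point only if the edge objective drops by at least `c * gamma * ||grad||^2`. The toy objectives, the constants `gamma`, `m`, `c`, and all function names are assumptions.

```python
import numpy as np

def project_ball(z, r):
    """Euclidean projection onto {x : ||x|| <= r}."""
    norm = np.linalg.norm(z)
    return z if norm <= r else z * (r / norm)

def d_and_grad(w, a, mu, r):
    """Value and gradient of the conjugate of (mu/2)||x - a||^2 + ball indicator."""
    x = project_ball(a + w / mu, r)
    return w @ x - 0.5 * mu * np.sum((x - a) ** 2), x

def edge_objective(s, wi, wj, ai, aj, mu, r):
    """phi(s) = d_i(wi + s) + d_j(wj - s) and its gradient with respect to s."""
    di, xi = d_and_grad(wi + s, ai, mu, r)
    dj, xj = d_and_grad(wj - s, aj, mu, r)
    return di + dj, xi - xj

def safeguarded_aa(wi, wj, ai, aj, mu, r, gamma=0.4, m=3, c=0.25, iters=50):
    s = np.zeros_like(wi)
    S, G = [], []                 # histories: iterates and their gradient-step images
    values, accepted = [], 0
    for _ in range(iters):
        phi, grad = edge_objective(s, wi, wj, ai, aj, mu, r)
        values.append(phi)
        g = s - gamma * grad      # plain gradient step, also the fallback
        S, G = (S + [s])[-(m + 1):], (G + [g])[-(m + 1):]
        nxt = g
        if len(S) > 1:
            Sa, Ga = np.array(S), np.array(G)
            R = Ga - Sa                               # residuals g(s_t) - s_t
            dR, dG = (R[1:] - R[:-1]).T, (Ga[1:] - Ga[:-1]).T
            coef, *_ = np.linalg.lstsq(dR, R[-1], rcond=None)
            trial = g - dG @ coef                     # Anderson extrapolation
            phi_trial, _ = edge_objective(trial, wi, wj, ai, aj, mu, r)
            # Safeguard: keep the extrapolation only under sufficient descent.
            if phi_trial <= phi - c * gamma * (grad @ grad):
                nxt, accepted = trial, accepted + 1
        s = nxt
    return wi + s, wj - s, values, accepted

ai, aj = np.array([2.0, 0.0]), np.array([-1.5, 0.5])
wi, wj = np.zeros(2), np.zeros(2)
ui, uj, values, accepted = safeguarded_aa(wi, wj, ai, aj, mu=1.0, r=1.0)
```

With $\mu = 1$ the edge objective is 2-smooth, so the fallback gradient step with `gamma = 0.4` itself decreases the objective by at least `0.6 * gamma * ||grad||^2`. Both branches therefore satisfy the same sufficient-descent inequality, which is what makes the recorded objective values monotone regardless of how often the extrapolation is rejected.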

5. Convergence Properties

Global convergence of FDGM-AA is established under the following assumptions: each $f_i$ is $\mu$-strongly convex; the time-varying graphs ensure $B$-connectivity; and the step size and the edge weights satisfy the bounds required by the analysis.

The safeguard ensures that at each edge, the local dual objective decreases by at least a constant times the squared norm of gradient differences. Summing over all edges and aggregating across $B$ network steps, one obtains a bound on the global dual optimality gap for the dual objective, and a corresponding bound on the distance to the optimum for the primal variable. The method yields $O(1/\sqrt{k})$ convergence for the primal sequence and $O(1/k)$ for the dual, matching the best-known rates for distributed first-order methods in this regime (Liu et al., 18 Jan 2026).
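One standard way to see why an $O(1/k)$ dual rate translates into an $O(1/\sqrt{k})$ primal rate is sketched below. It is a textbook-style argument under strong duality, not necessarily the one used in the source. The notation $D(\mathbf{w}) = \sum_i d_i(w_i)$, $D^\star$ for the optimal dual value, $F^\star = \sum_i f_i(x^\star)$, and $x_i^k = \nabla d_i(w_i^k)$ is assumed here.

```latex
\begin{align*}
% w_i^k is a subgradient of f_i at x_i^k, so mu-strong convexity gives
f_i(x^\star) &\ge f_i(x_i^k) + (w_i^k)^\top (x^\star - x_i^k)
               + \tfrac{\mu}{2}\,\|x^\star - x_i^k\|^2 , \\
% and the conjugate attains its maximum at x_i^k:
f_i(x_i^k) &= (w_i^k)^\top x_i^k - d_i(w_i^k) . \\
% Summing over i and using dual feasibility sum_i w_i^k = 0:
F^\star &\ge -D(\mathbf{w}^k) + \tfrac{\mu}{2} \sum_{i=1}^n \|x_i^k - x^\star\|^2 . \\
% Strong duality gives D^\star = -F^\star, hence
\tfrac{\mu}{2} \sum_{i=1}^n \|x_i^k - x^\star\|^2
  &\le D(\mathbf{w}^k) - D^\star = O(1/k)
\;\;\Longrightarrow\;\;
\|x_i^k - x^\star\| = O(1/\sqrt{k}) .
\end{align*}
```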

6. Empirical Performance and Comparative Evaluation

FDGM-AA was tested on distributed regularized logistic regression problems with ball constraints, set over a network of agents, each holding a local set of samples, and local variable constraints. The methods compared include FDGM-AA, vanilla FDGM, distributed projected subgradient, and proximal minimization. The metric is the average squared primal error across agents as a function of iteration.

FDGM-AA consistently outperforms all benchmarks. The acceleration compared to vanilla FDGM increases as the number of iterations and the AA history length are raised. The algorithm demonstrates robustness to longer network connectivity periods and to weaker regularization (Liu et al., 18 Jan 2026).

7. Summary and Theoretical Significance

FDGM-AA constitutes an advance in the distributed optimization of convex, possibly non-smooth, functions over time-varying networks by merging Anderson-type accelerated extrapolation into every two-node dual subproblem, regulated by a simple yet effective safeguard mechanism. This hybrid preserves distributed implementability and maintains global $O(1/k)$ dual and $O(1/\sqrt{k})$ primal convergence rates, while yielding significant speedup and robustness in empirical scenarios involving time-varying topologies and constrained optimization (Liu et al., 18 Jan 2026).
