
FedSUM: Efficient Federated Learning Algorithms

Updated 27 December 2025
  • The FedSUM family is a suite of federated learning methods that integrate delay metrics with a stochastic uplink-merge to manage irregular client participation and data heterogeneity.
  • They employ rigorous delay metrics to capture per-round, maximum, and average staleness, ensuring robust convergence guarantees under standard smoothness and variance assumptions.
  • The three variants—FedSUM-B, FedSUM, and FedSUM-CR—offer practical trade-offs between local computation, communication overhead, and memory usage for real-world deployments.

The FedSUM family comprises a suite of federated learning (FL) algorithms that address the challenge of arbitrary client participation patterns in practical distributed optimization. Unlike prior FL methods restricted to idealized participation assumptions or requiring additional constraints on client data heterogeneity, the FedSUM family unifies handling of both temporally irregular client activity and non-i.i.d. data with minimal assumptions. Central to these methods are delay metrics that quantify staleness due to intermittent client participation, and a “Stochastic Uplink-Merge” technique that robustly merges stale and fresh gradient contributions to correct local update bias and reduce communication overhead. The FedSUM family includes three variants—FedSUM-B (basic, without local steps), FedSUM (with local updates), and FedSUM-CR (communication-reduced)—that collectively generalize, and in specific regimes, recover, the behavior of popular FL algorithms such as FedAvg and SCAFFOLD. Convergence guarantees for all FedSUM variants hold under standard smoothness and bounded-variance assumptions, regardless of participation heterogeneity (You et al., 20 Dec 2025).

1. Delay Metrics and Participation Modeling

The FedSUM family introduces a rigorous framework for capturing variability in client participation using three delay metrics:

  • Per-round delay $\tau_t$: for round $t$, $\tau_t = \max_{i} (t - a_{i,t})$, where $a_{i,t}$ is the last round in which client $i$ was active.
  • Maximum delay $\tau_{\max}$: over $T$ rounds, $\tau_{\max} = \max_{0 \leq t < T} \tau_t$ bounds the greatest inactivity gap among all clients.
  • Average delay $\tau_{\text{avg}}$: given by $\tau_{\text{avg}} = (1/T)\sum_{t=0}^{T-1} \tau_t$, it quantifies the typical staleness in the system.

Arbitrary participation, deterministic or random, is modeled entirely through these delay measures, allowing the framework to subsume nonuniform, cyclic, or adversarial dropouts as long as delays remain sub-linear (You et al., 20 Dec 2025).
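As a concrete illustration, the delay metrics above can be computed from a participation log in a few lines. This is a sketch, not code from the paper: `active[t]` is assumed to be the set of client ids active in round `t`, and every client is assumed active at round 0 so that $a_{i,t}$ is well defined.

```python
# Sketch: computing the Section 1 delay metrics from a participation log.
# Assumes all clients participate in round 0, so a_{i,t} is well defined.

def delay_metrics(active, N):
    """Return the per-round delays tau_t, plus tau_max and tau_avg."""
    last_active = {i: 0 for i in range(N)}  # a_{i,t}: last round client i was seen
    taus = []
    for t, clients in enumerate(active):
        for i in clients:
            last_active[i] = t
        # tau_t = max_i (t - a_{i,t}); clients active this round contribute 0
        taus.append(max(t - last_active[i] for i in range(N)))
    T = len(active)
    return taus, max(taus), sum(taus) / T

# Example: 4 clients, cyclic participation of 2 clients per round.
active = [{0, 1, 2, 3}, {0, 1}, {2, 3}, {0, 1}, {2, 3}]
taus, tau_max, tau_avg = delay_metrics(active, 4)
```

Under this cyclic schedule no client is ever stale by more than one round, so $\tau_{\max} = 1$.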

2. Algorithmic Structure and Variants

Each FedSUM variant maintains a global model $x^{(t)}$, a server control vector $y^{(t)}$, and per-client control vectors $h_i^{(t)}$. The variants differ principally in their local computation strategies and communication patterns.

| Variant | Local steps | Uplink | Downlink |
|---|---|---|---|
| FedSUM-B | None (single step) | $\delta_i$ | $x^{(t)}$ |
| FedSUM | $K$ SGD steps | $\delta_i$ | $x^{(t)}, y^{(t-1)}$ |
| FedSUM-CR | $K$ SGD steps | $\delta_i$ | $x^{(t)}$ |

  • FedSUM-B: Active clients compute a mini-batch gradient at $x^{(t)}$, transmit the increment $\delta_i^{(t)} = g_i(x^{(t)}) - h_i^{(t)}$, and update their controls; the server updates the global model solely from the aggregated deltas. No downlink of the server control is needed.
  • FedSUM: Clients perform $K$ local SGD steps each round, using the server-provided $y^{(t-1)}$ for variance correction, and return a single delta encoding both model and control adjustments.
  • FedSUM-CR: Eliminates the extra downlink by letting clients reconstruct the correction direction locally from stored $(a_i, z_i, h_i)$. Local storage overhead increases, but downlink cost matches FedSUM-B.

The shared “Stochastic Uplink-Merge” protocol underlies all methods, robustly combining fresh and stale client contributions regardless of participation irregularity (You et al., 20 Dec 2025).
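To make the round structure concrete, here is a minimal NumPy sketch of one FedSUM-B round under one plausible reading of the description above: each active client uploads $\delta_i = g_i(x^{(t)}) - h_i$ and refreshes its control, while the server folds the deltas into $y$ and steps the model along it. The exact server rule and merge weighting shown here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Schematic FedSUM-B round (illustrative reading, not the paper's code):
# y tracks the running average of the clients' stored gradients h_i.

def fedsum_b_round(x, y, h, grads, active, eta_g):
    """x: global model; y: server control; h: (N, d) client controls;
    grads: {i: g_i(x)} for the active clients i."""
    N = h.shape[0]
    for i in active:
        delta = grads[i] - h[i]   # uplink increment delta_i = g_i(x) - h_i
        h[i] = grads[i]           # client refreshes its stored control
        y = y + delta / N         # merge the delta into the server control
    x = x - eta_g * y             # global model step along y
    return x, y, h

# Toy usage: two clients, full participation, zero-initialized controls.
x, y, h = fedsum_b_round(
    np.zeros(2), np.zeros(2), np.zeros((2, 2)),
    {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])},
    active=[0, 1], eta_g=1.0,
)
```

After this round $y$ equals the average of the two fresh gradients, matching the intent that the server step uses only aggregated deltas.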

3. Convergence Guarantees and Theoretical Properties

Under the sole requirements that each $f_i$ is $L$-smooth and each stochastic gradient has variance at most $\sigma^2$, the FedSUM algorithms converge to a stationary point of $f(x) = N^{-1}\sum_i f_i(x)$ at an explicitly delay-dependent rate. Specifically, for the stepsize choices

$$\eta_g = \frac{1}{\sqrt{\tau_{\max}}},\qquad \eta_l = \min\left\{\frac{1}{10 \sqrt{\tau_{\max} K L}},\ \sqrt{\frac{N \tau_{\max} \Delta_f}{\max\{1,\tau_{\text{avg}}\}\, K T L \sigma^2}}\right\}$$

the following holds for any (possibly adversarial) client activity sequence:
$$\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\left[\|\nabla f(x^{(t)})\|^2\right] \leq 30 \sqrt{\frac{\max\{1,\tau_{\text{avg}}\}\, L \sigma^2 \Delta_f}{N K T}} + 20\, \tau_{\max}\, \frac{L \Delta_f + F_0}{T},$$
where $\Delta_f = f(x^{(0)}) - f^*$ and $F_0 = N^{-1}\sum_i \|\nabla f_i(x^{(0)})\|^2$. For random participation, these bounds hold in expectation over the delay metrics.

This result unifies prior FL convergence analyses, recovers established rates for uniform or cyclic participation, and quantifies the precise effect of staleness via $\tau_{\max}$ and $\tau_{\text{avg}}$ (You et al., 20 Dec 2025).
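The stepsize prescription above can be transcribed literally; in practice each input ($\tau_{\max}$, $\tau_{\text{avg}}$, $L$, $\sigma^2$, $\Delta_f$) would have to be estimated or upper-bounded. A small sketch:

```python
import math

# Literal transcription of the stepsize schedule stated above. In
# practice tau_max, tau_avg, L, sigma2 (variance bound), and Delta_f
# (initial suboptimality) are estimates or upper bounds.

def fedsum_stepsizes(tau_max, tau_avg, N, K, T, L, sigma2, Delta_f):
    eta_g = 1.0 / math.sqrt(tau_max)
    eta_l = min(
        1.0 / (10.0 * math.sqrt(tau_max * K * L)),
        math.sqrt(N * tau_max * Delta_f
                  / (max(1.0, tau_avg) * K * T * L * sigma2)),
    )
    return eta_g, eta_l

# Example with illustrative values.
eta_g, eta_l = fedsum_stepsizes(tau_max=4, tau_avg=2, N=10, K=5,
                                T=100, L=1.0, sigma2=1.0, Delta_f=1.0)
```

Note how larger $\tau_{\max}$ shrinks $\eta_g$ as $1/\sqrt{\tau_{\max}}$, while the $\min$ caps $\eta_l$ by the smoothness-driven term for short horizons.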

4. Assumption Minimality and Heterogeneity

In contrast to FedAvg and SCAFFOLD, which require constraints such as bounded gradient dissimilarity or two-vector communication, FedSUM imposes no restriction on the heterogeneity measure $\|\nabla f_i(x) - \nabla f(x)\|$ across client distributions. The only requirements are $L$-smoothness and bounded variance, alongside explicit tracking of the delay metrics. Covered participation regimes include:

  • Uniform and independent sampling (with $p_i \geq \delta$),
  • Cyclic or reshuffled scheduling,
  • Adversarially determined dropouts, as long as delay growth is sub-linear.

Consequently, FedSUM addresses a broader range of practical scenarios with minimal modeling overhead (You et al., 20 Dec 2025).

5. Communication and Computational Complexity

Communication overhead per round across variants is summarized as follows:

  • FedSUM-B: uplink, 1 vector ($\delta_i$); downlink, 1 vector ($x^{(t)}$).
  • FedSUM: uplink, 1 vector ($\delta_i$); downlink, 2 vectors ($x^{(t)}, y^{(t-1)}$).
  • FedSUM-CR: uplink, 1 vector ($\delta_i$); downlink, 1 vector ($x^{(t)}$).

FedSUM-B and FedSUM-CR match FedAvg's minimal downlink cost; FedSUM matches SCAFFOLD for uplink but doubles the downlink bandwidth. FedSUM-CR trades greater local client memory (storing $(a_i, z_i, h_i)$, each of model dimension) for reduced communication, which is typically favorable when downlink bandwidth is constrained. Computation per round is dominated by the $K$ local gradient evaluations, uniform across the family. Server-side storage is limited to maintenance of the control vector $y^{(t)}$ (You et al., 20 Dec 2025).
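These per-round costs can be tallied mechanically from the vector counts above; the byte accounting below (4-byte floats, $S$ active clients, model dimension $p$) is an illustrative assumption, not from the paper.

```python
# Per-round communication volume from the variant table above:
# (uplink, downlink) model-sized vectors per active client.
COMM_VECTORS = {
    "FedSUM-B":  (1, 1),
    "FedSUM":    (1, 2),
    "FedSUM-CR": (1, 1),
}

def round_traffic(variant, S, p, bytes_per_float=4):
    """Total (uplink, downlink) bytes in one round, for S active
    clients and model dimension p. Float width is an assumption."""
    up, down = COMM_VECTORS[variant]
    vec_bytes = p * bytes_per_float
    return S * up * vec_bytes, S * down * vec_bytes

# Example: 10 active clients, 1000-parameter model.
up, down = round_traffic("FedSUM", S=10, p=1000)
```

The FedSUM variant's doubled downlink shows up directly, while FedSUM-B and FedSUM-CR stay at FedAvg-level traffic.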

6. Parameterization and Practical Deployment

Recommended parameter choices for the FedSUM family are as follows:

  • Batch size: Increasing the mini-batch size reduces the variance term $\sigma^2/(NKT)$ but exacerbates staleness error if clients are inactive across many rounds. Typical values: 10–50.
  • $\eta_g$: Set as $1/\sqrt{\tau_{\max}}$; conservatively, use an upper bound on $\tau_{\max}$ (e.g., $N/S$ for cyclic regimes).
  • $\eta_l$: Not to exceed $1/(10\sqrt{\tau_{\max} K L})$; initial values are often $10^{-2}$ or $10^{-3}$, scaled relative to FedAvg by $\sqrt{N}$.
  • Local steps $K$: As in FedAvg, generally 1–10; increasing $K$ trades higher local computation for fewer communication rounds.
  • Initialization: $h_i^{(0)} = 0$, $y^{(-1)} = 0$.
  • Client memory: FedSUM-CR requires $O(p)$ additional storage for model dimension $p$, i.e., a small constant multiple of the model size.

Selection among FedSUM-B, FedSUM, and FedSUM-CR is dictated by communication and memory trade-offs, with FedSUM-CR commonly preferred under tight downlink constraints (You et al., 20 Dec 2025).

7. Significance and Scope

The FedSUM family provides an extensible and communication-efficient foundation for federated nonconvex optimization under fully arbitrary client availability, without imposing restrictive data or participation assumptions. Its delay-metric analysis unifies prior special-case results, and its flexible “stochastic uplink-merge” protocol integrates contributions from both fresh and stale client updates. Its convergence analysis directly addresses the general federated setting encountered in real-world deployments, permitting practitioners to select algorithmic variants suited to system bandwidth and memory constraints, as well as desired computation-to-communication ratios (You et al., 20 Dec 2025).
