FedSUM: Efficient Federated Learning Algorithms
- The FedSUM family is a suite of federated learning methods that integrate delay metrics and a stochastic uplink-merge to manage irregular client participation and data heterogeneity.
- They employ rigorous delay metrics to capture per-round, maximum, and average staleness, ensuring robust convergence guarantees under standard smoothness and variance assumptions.
- The three variants—FedSUM-B, FedSUM, and FedSUM-CR—offer practical trade-offs between local computation, communication overhead, and memory usage for real-world deployments.
The FedSUM family comprises a suite of federated learning (FL) algorithms that address the challenge of arbitrary client participation patterns in practical distributed optimization. Unlike prior FL methods restricted to idealized participation assumptions or requiring additional constraints on client data heterogeneity, the FedSUM family unifies handling of both temporally irregular client activity and non-i.i.d. data with minimal assumptions. Central to these methods are delay metrics that quantify staleness due to intermittent client participation, and a “Stochastic Uplink-Merge” technique that robustly merges stale and fresh gradient contributions to correct local update bias and reduce communication overhead. The FedSUM family includes three variants—FedSUM-B (basic, without local steps), FedSUM (with local updates), and FedSUM-CR (communication-reduced)—that collectively generalize, and in specific regimes, recover, the behavior of popular FL algorithms such as FedAvg and SCAFFOLD. Convergence guarantees for all FedSUM variants hold under standard smoothness and bounded-variance assumptions, regardless of participation heterogeneity (You et al., 20 Dec 2025).
1. Delay Metrics and Participation Modeling
The FedSUM family introduces a rigorous framework for capturing variability in client participation using two delay metrics:
- Per-round delay $\tau_i^t$: For round $t$, $\tau_i^t = t - s_i^t$, where $s_i^t$ is the last round in which client $i$ was active.
- Maximum delay $\tau_{\max}$: Over $T$ rounds, $\tau_{\max} = \max_{t \le T,\, i} \tau_i^t$ bounds the greatest inactivity gap among all clients.
- Average delay $\bar{\tau}$: Given by $\bar{\tau} = \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} \tau_i^t$ for $n$ clients, it quantifies the typical staleness in the system.
Arbitrary participation, deterministic or random, is modeled entirely through these delay measures, allowing the framework to subsume nonuniform, cyclic, or adversarial dropouts as long as delays remain sub-linear (You et al., 20 Dec 2025).
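As a concrete illustration, both aggregate metrics can be computed from any recorded participation schedule. The sketch below is our own (function and variable names are not from the paper); it treats all clients as active at round $0$ and measures each client's staleness entering every round:

```python
def delay_metrics(schedule, n_clients):
    """Maximum and average delay for a participation schedule, where
    schedule[t] is the set of client ids active in round t+1."""
    last_active = {i: 0 for i in range(n_clients)}  # all clients active at round 0
    all_delays = []
    for t, active in enumerate(schedule, start=1):
        # Staleness of each client entering round t: rounds since last activity.
        all_delays.extend(t - last_active[i] for i in range(n_clients))
        for i in active:
            last_active[i] = t
    tau_max = max(all_delays)                    # greatest inactivity gap
    tau_avg = sum(all_delays) / len(all_delays)  # typical staleness
    return tau_max, tau_avg

# Cyclic participation: one of 4 clients active per round, for 8 rounds.
schedule = [{t % 4} for t in range(8)]
tau_max, tau_avg = delay_metrics(schedule, 4)
```

Here cyclic participation yields $\tau_{\max}$ equal to the cycle length, consistent with the delay modeling above.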
2. Algorithmic Structure and Variants
Each FedSUM variant maintains a global model $x^t$, a server control vector $c^t$, and per-client control vectors $c_i^t$. Their principal distinction lies in their local computation strategies and communication patterns.
| Variant | Local Steps | Uplink | Downlink |
|---|---|---|---|
| FedSUM-B | None (single-step) | 1 vector | 1 vector |
| FedSUM | $K$ SGD steps | 1 vector | 2 vectors |
| FedSUM-CR | $K$ SGD steps | 1 vector | 1 vector |
- FedSUM-B: Active clients compute a mini-batch gradient at the current global model, transmit a single increment, and update their controls; the server updates the global model solely from the aggregated deltas. No downlink of the server control is needed.
- FedSUM: Clients perform $K$ local SGD steps at each round, using the server-provided control vector for variance correction. Clients return a single delta encoding both model and control adjustments.
- FedSUM-CR: Eliminates the extra downlink by letting clients reconstruct the correction direction locally from stored control state. Local control overhead increases, but the downlink cost matches FedSUM-B.
The shared “Stochastic Uplink-Merge” protocol underlies all methods, robustly combining fresh and stale client contributions regardless of participation irregularity (You et al., 20 Dec 2025).
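The per-round mechanics can be sketched on a toy problem. The code below is a schematic in the spirit of FedSUM-B, with our own simplifications (quadratic client objectives, exact gradients in place of mini-batches, illustrative names); it is not the paper's pseudocode, but it shows how stale control state and fresh deltas merge into a single server direction:

```python
import numpy as np

class Client:
    """Toy client holding a quadratic objective f_i(x) = 0.5 * ||x - t_i||^2."""
    def __init__(self, target, dim):
        self.target = np.full(dim, float(target))
        self.c = np.zeros(dim)           # per-client control vector

    def grad(self, x):
        return x - self.target           # exact gradient of f_i at x

def fedsum_b_style_round(x, c, clients, active, lr):
    """One schematic FedSUM-B-style round (illustrative, not the paper's
    exact update rule)."""
    deltas = []
    for i in active:
        g = clients[i].grad(x)           # fresh gradient at the current model
        deltas.append(g - clients[i].c)  # uplink: a single merged vector
        clients[i].c = g                 # client refreshes its local control
    # Uplink-merge: stale directions persist in c for inactive clients,
    # while fresh deltas correct the server's aggregate direction.
    c = c + sum(deltas) / len(clients)
    return x - lr * c, c                 # server step along the merged direction

clients = [Client(0.0, 1), Client(2.0, 1)]
x, c = np.zeros(1), np.zeros(1)
for _ in range(50):
    x, c = fedsum_b_style_round(x, c, clients, active=[0, 1], lr=0.5)
# With full participation, x approaches the global minimizer (1.0 here).
```

Under partial participation, the same merge keeps each inactive client's last control in the aggregate, which is the bias-correction role the uplink-merge plays across all three variants.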
3. Convergence Guarantees and Theoretical Properties
Under the sole requirements that each local objective $f_i$ is $L$-smooth and each stochastic gradient has variance at most $\sigma^2$, the FedSUM algorithms converge to a stationary point of the global objective $f$ at a quantifiably delay-dependent rate. For properly chosen stepsizes, the guarantee holds for any (possibly adversarial) client activity sequence, with the bound degrading gracefully as the delay metrics $\tau_{\max}$ and $\bar{\tau}$ grow. For random participation, these bounds hold in expectation over the delay metrics.
This result unifies prior FL convergence analyses, recovers established rates for uniform or cyclic participation, and quantifies the precise effect of staleness via $\tau_{\max}$ and $\bar{\tau}$ (You et al., 20 Dec 2025).
4. Assumption Minimality and Heterogeneity
In contrast to FedAvg and SCAFFOLD, which require constraints such as bounded gradient dissimilarity or two-vector communication, FedSUM imposes no restriction on gradient dissimilarity across client distributions. The only requirements are $L$-smoothness and bounded variance, alongside explicit tracking of delay metrics. This applicability encompasses:
- Uniform and independent sampling (where delays remain bounded in expectation),
- Cyclic or reshuffled scheduling,
- Adversarially determined dropouts, as long as delay growth is sub-linear.
Consequently, FedSUM addresses a broader range of practical scenarios with minimal modeling overhead (You et al., 20 Dec 2025).
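The participation regimes listed above can be simulated directly; the helper below is our own illustration of why the sub-linearity condition matters, measuring the largest inactivity gap under each schedule:

```python
import random

def max_delay(schedule, n_clients):
    """Largest inactivity gap over the schedule (clients active at round 0)."""
    last = {i: 0 for i in range(n_clients)}
    tau = 0
    for t, active in enumerate(schedule, start=1):
        tau = max(tau, max(t - last[i] for i in range(n_clients)))
        for i in active:
            last[i] = t
    return tau

random.seed(0)
n, T = 8, 200
uniform = [{random.randrange(n)} for _ in range(T)]  # i.i.d. client sampling
cyclic = [{t % n} for t in range(T)]                 # round-robin scheduling
# Pathological dropout: client n-1 appears only in the final round, so its
# delay grows linearly in T and the sub-linearity condition fails.
dropout = [{t % (n - 1)} for t in range(T - 1)] + [{n - 1}]
```

Cyclic scheduling keeps $\tau_{\max}$ at the cycle length ($8$ here), while the pathological pattern drives it to $T = 200$; only schedules of the first kind keep the delay-dependent terms in the convergence bound under control.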
5. Communication and Computational Complexity
Communication overhead per round across variants is summarized as follows:
- FedSUM-B: Uplink—1 vector (the client delta); Downlink—1 vector (the global model).
- FedSUM: Uplink—1 vector; Downlink—2 vectors (the global model and the server control).
- FedSUM-CR: Uplink—1 vector; Downlink—1 vector (the global model).
FedSUM-B and FedSUM-CR match FedAvg’s minimal downlink cost; FedSUM matches SCAFFOLD for uplink but doubles the downlink bandwidth. FedSUM-CR trades greater local client memory—storing one additional vector of the model dimension—for reduced communication, typically favorable when downlink bandwidth is constrained. Computation per round is dominated by the local gradient evaluations, which is uniform across the family. Global server-side storage is limited to maintenance of the server control vector (You et al., 20 Dec 2025).
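These per-round counts translate directly into bandwidth estimates. The small helper below is illustrative (it ignores compression and counts a single participating client), but it makes the trade-off explicit:

```python
def comm_cost(variant, dim, rounds):
    """Total floats exchanged per client over `rounds`, using the per-round
    uplink/downlink vector counts summarized above."""
    vectors = {"FedSUM-B": (1, 1), "FedSUM": (1, 2), "FedSUM-CR": (1, 1)}
    uplink, downlink = vectors[variant]
    return (uplink + downlink) * dim * rounds

# FedSUM doubles downlink traffic relative to FedSUM-B / FedSUM-CR.
per_round = {v: comm_cost(v, dim=1, rounds=1)
             for v in ("FedSUM-B", "FedSUM", "FedSUM-CR")}
```

For a million-parameter model over 100 rounds, FedSUM moves $3 \times 10^8$ floats per client versus $2 \times 10^8$ for the other two variants.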
6. Parameterization and Practical Deployment
Recommended parameter choices for the FedSUM family are as follows:
- Batch size $b$: Increasing $b$ reduces the variance component but exacerbates the staleness error if clients are inactive across many rounds. Typical values range up to $50$.
- Maximum delay $\tau_{\max}$: Set from the observed participation pattern; conservatively, use a known upper bound on the longest inactivity gap (e.g., the cycle length for cyclic regimes).
- Stepsize $\eta$: Bounded above by a delay-dependent limit; initial values follow FedAvg conventions, scaled down by a factor depending on the delay metrics.
- Local steps $K$: As in FedAvg, generally $1$–$10$; increasing $K$ trades higher local computation for fewer communication rounds.
- Initialization: Control vectors are commonly initialized to zero, as in SCAFFOLD-style methods.
- Client memory: FedSUM-CR requires storing one additional vector of model dimension $d$ per client, a small constant-factor overhead relative to the model itself.
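Putting this guidance together, a deployment might start from a configuration like the following (a sketch with hypothetical key names, not an interface defined in the paper):

```python
# Illustrative starting configuration; keys and defaults are ours, not an
# API from the paper, and should be tuned per deployment.
config = {
    "variant": "FedSUM-CR",   # preferred under tight downlink budgets
    "batch_size": 50,         # larger batches cut variance, worsen staleness error
    "local_steps": 5,         # K in the FedAvg-typical 1-10 range
    "tau_max_bound": 8,       # e.g., the cycle length for cyclic participation
    "control_init": "zeros",  # control vectors start at zero
}
```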
Selection among FedSUM-B, FedSUM, and FedSUM-CR is dictated by communication and memory trade-offs, with FedSUM-CR commonly preferred under tight downlink constraints (You et al., 20 Dec 2025).
7. Significance and Scope
The FedSUM family provides an extensible and communication-efficient foundation for federated nonconvex optimization under fully arbitrary client availability, without imposing restrictive data or participation assumptions. Its delay-metric analysis unifies prior special-case results, and its flexible “stochastic uplink-merge” protocol integrates contributions from both fresh and stale client updates. Its convergence analysis directly addresses the general federated setting encountered in real-world deployments, permitting practitioners to select algorithmic variants suited to system bandwidth and memory constraints, as well as desired computation-to-communication ratios (You et al., 20 Dec 2025).