Divergence-Based Adaptive Aggregation (DRAG)

Updated 18 January 2026
  • DRAG is a divergence-based adaptive aggregation framework that calibrates local updates through geometric alignment to mitigate client drift in federated learning.
  • BR-DRAG extends DRAG by using a server-vetted root dataset to produce robust reference directions, effectively countering Byzantine attacks.
  • Empirical results on CIFAR-10 and EMNIST demonstrate that DRAG reduces communication rounds and improves test accuracy compared to traditional methods like FedAvg.

Byzantine-Resilient DRAG (BR-DRAG) is an advanced aggregation framework for federated learning (FL) that extends divergence-based adaptive aggregation (DRAG) with robust mechanisms against Byzantine attacks. BR-DRAG directly addresses limitations in FL posed by client drift due to non-iid data and adversarial behaviors (Byzantine clients) by leveraging geometric calibration of updates and a vetted root dataset to maintain trustworthy aggregation dynamics (Xiao et al., 11 Jan 2026, Zhu et al., 2023).

1. Foundational Principles of Divergence-Based Adaptive Aggregation

DRAG introduces a geometric approach to adaptive aggregation in FL. Each client computes a local model displacement $\mathbf{g}_m^t = \theta_m^{t,U} - \theta^t$. The server maintains a global reference direction $\mathbf{r}^t$, updated via an exponential moving average:

$$\mathbf{r}^t = \begin{cases} \dfrac{1}{S} \sum_{m\in\mathcal{S}^0} \mathbf{g}_m^0, & t = 0, \\ (1-\alpha)\mathbf{r}^{t-1} + \alpha \Delta^{t-1}, & t \ge 1, \end{cases}$$

where $\alpha \in (0,1)$, $\mathcal{S}^t$ is the subset of $S$ clients sampled at round $t$, and $\Delta^{t-1}$ is the mean of the modified updates $\mathbf{v}_m^{t-1}$.
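The reference-direction update above can be sketched as follows (a minimal sketch; the function names and the default value of α are illustrative, not taken from the papers):

```python
import numpy as np

def init_reference(first_round_updates):
    """r^0: mean of the displacements g_m^0 from the first sampled cohort."""
    return np.mean(np.stack(first_round_updates), axis=0)

def update_reference(r_prev, delta_prev, alpha=0.25):
    """r^t for t >= 1: exponential moving average of the aggregated update.

    r_prev     : previous reference direction r^{t-1}
    delta_prev : mean of the calibrated updates v_m^{t-1} from last round
    alpha      : EMA momentum in (0, 1)
    """
    return (1.0 - alpha) * r_prev + alpha * delta_prev
```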

DRAG uses the divergence-of-degree (DoD) metric to quantify misalignment:

$$\lambda_m^t = c \left(1 - \frac{\langle \mathbf{g}_m^t, \mathbf{r}^t \rangle}{\|\mathbf{g}_m^t\|\,\|\mathbf{r}^t\|}\right), \quad c \in [0,1].$$

A higher $\lambda_m^t$ indicates greater misalignment, leading to proportionally stronger correction. The local update is then linearly combined with the reference:

$$\mathbf{v}_m^t = (1-\lambda_m^t)\,\mathbf{g}_m^t + \lambda_m^t \frac{\|\mathbf{g}_m^t\|}{\|\mathbf{r}^t\|}\mathbf{r}^t.$$

This operation approximately preserves the norm of the update (exactly when $\mathbf{g}_m^t$ and $\mathbf{r}^t$ are collinear) while calibrating its direction (Zhu et al., 2023).
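The DoD metric and the linear calibration can be written compactly (a hedged sketch; `dod` and `calibrate` are illustrative names, and the default `c = 0.1` follows the hyperparameter guidance later in this article):

```python
import numpy as np

def dod(g, r, c=0.1):
    """Divergence-of-degree: scaled cosine misalignment between g and r."""
    cos = np.dot(g, r) / (np.linalg.norm(g) * np.linalg.norm(r))
    return c * (1.0 - cos)

def calibrate(g, r, c=0.1):
    """Mix g with a norm-matched copy of r; aligned updates pass unchanged."""
    lam = dod(g, r, c)
    return (1.0 - lam) * g + lam * (np.linalg.norm(g) / np.linalg.norm(r)) * r
```

When a client's displacement already points along the reference, `lam` is zero and the update is returned unmodified; orthogonal updates receive the maximum correction `c`.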

2. Byzantine-Resilient DRAG Mechanism

BR-DRAG enhances classical DRAG by providing resilience to Byzantine attacks, where malicious clients transmit adversarial or arbitrary updates. The core modification is the use of a trusted server-side root dataset to generate robust reference directions:

  • The server maintains a vetted root dataset $\mathcal{D}_{\text{root}}$.
  • At each round, the server simulates $U$ steps of local SGD on $\mathcal{D}_{\text{root}}$, generating a reference update $\mathbf{r}^t = \theta^{t,U} - \theta^t$.
  • All client DoD computations and linear calibrations are performed relative to this trusted $\mathbf{r}^t$.

Further normalization can be adopted to mitigate magnitude-scaling attacks:

$$\mathbf{v}_m^t = (1-\lambda_m^t)\frac{\|\mathbf{r}^t\|}{\|\mathbf{g}_m^t\|}\mathbf{g}_m^t + \lambda_m^t \mathbf{r}^t,$$

bounding $\|\mathbf{v}_m^t\|$ by $\|\mathbf{r}^t\|$ regardless of the magnitude the client reports (Zhu et al., 2023).
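Assuming `lam` has already been computed from the trusted reference, the normalized variant might look like the following (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def calibrate_normalized(g, r, lam):
    """BR-DRAG normalized calibration: rescale g to the reference norm
    before mixing, so the result's norm is bounded by ||r|| no matter
    how large a (possibly malicious) client makes ||g||."""
    return (1.0 - lam) * (np.linalg.norm(r) / np.linalg.norm(g)) * g + lam * r
```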

BR-DRAG thus aligns updates with a server-trusted direction, reducing the sensitivity to Byzantine anomalies and preventing attackers from manipulating the update aggregation through either direction or scale.

3. Algorithmic Workflow

BR-DRAG's workflow comprises:

  • Server-side:
    • Maintains and updates $\mathbf{r}^t$ via local SGD on $\mathcal{D}_{\text{root}}$.
    • Broadcasts $(\theta^t, \mathbf{r}^t)$ to participating clients.
  • Client-side:
    • Computes $\mathbf{g}_m^t = \theta_m^{t,U} - \theta^t$ with $U$ local SGD steps.
    • Calculates $\lambda_m^t$ using the provided $\mathbf{r}^t$.
    • Applies the linear calibration to produce $\mathbf{v}_m^t$.
    • Uploads $\mathbf{v}_m^t$ to the server.
  • Server-side aggregation:
    • Aggregates the received updates as $\Delta^t = \frac{1}{S}\sum_{m\in\mathcal{S}^t} \mathbf{v}_m^t$.
    • Updates $\theta^{t+1} = \theta^t + \Delta^t$.
    • Updates the reference direction as described above (in standard DRAG, a momentum average; in BR-DRAG, via the root dataset).

The communication overhead is one additional vector per round (the reference direction), with negligible increase in computational complexity over FedAvg (Xiao et al., 11 Jan 2026, Zhu et al., 2023).
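Putting the steps above together, one BR-DRAG round might be sketched as follows (all names are illustrative; `root_sgd_update` stands in for the server's $U$-step SGD pass over the root dataset, and models are flattened to vectors for simplicity):

```python
import numpy as np

def br_drag_round(theta, clients, root_sgd_update, c=0.1):
    """One BR-DRAG round (sketch under the paper's setup, not its code).

    theta           : current global model, flattened to a vector
    clients         : callables mapping theta -> local displacement g_m^t
    root_sgd_update : callable running U SGD steps on the root data and
                      returning the trusted reference r^t = theta^{t,U} - theta^t
    """
    r = root_sgd_update(theta)                       # trusted reference direction
    vs = []
    for local_update in clients:
        g = local_update(theta)                      # g_m^t = theta_m^{t,U} - theta^t
        cos = np.dot(g, r) / (np.linalg.norm(g) * np.linalg.norm(r))
        lam = c * (1.0 - cos)                        # divergence-of-degree
        v = (1.0 - lam) * g + lam * (np.linalg.norm(g) / np.linalg.norm(r)) * r
        vs.append(v)                                 # calibrated update
    delta = np.mean(vs, axis=0)                      # server aggregation
    return theta + delta                             # theta^{t+1}
```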

4. Theoretical Properties and Convergence

For DRAG (and, by construction, BR-DRAG under non-Byzantine scenarios), the following holds under standard assumptions:

  • Assumptions:
    • Each local objective $F_m$ is $L$-smooth.
    • Stochastic gradients are unbiased with bounded local ($\sigma_L^2$) and global ($\sigma_G^2$) variance.
    • Partial participation ($S < M$), supporting realistic FL settings.
  • Main convergence result:

$$\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\,\|\nabla f(\theta^t)\|^2 \leq \frac{f(\theta^0) - f^*}{\gamma \eta U T} + V,$$

where $V$ depends on hyperparameters and data heterogeneity, and $\gamma > 0$ is determined by the choice of drag coefficient $c$ and step size $\eta$.

  • Implication:
    • With $\eta = O(1/U)$ and constant $U$, DRAG achieves $O(1/\sqrt{T})$ convergence in non-convex settings, matching FedAvg but with reduced drift-induced bias and variance (Xiao et al., 11 Jan 2026).

No formal Byzantine-robust convergence rate is established for BR-DRAG; rather, empirical robustness is provided by the use of a trusted reference and normalization.

5. Empirical Performance and Robustness

Experimental results illustrate the impact of DRAG and BR-DRAG under a range of data heterogeneity and adversarial conditions:

  • Accuracy and Speed:
    • On CIFAR-10 (Dirichlet β=0.1 partition, high heterogeneity), DRAG attains 70.8% test accuracy in 600 rounds vs. 52.3% for FedAvg, and outperforms SCAFFOLD, FedProx, FedACG, and FedExP.
    • At β=0.5, DRAG achieves 78% accuracy in 400 rounds—significantly fewer than the 1,000+ rounds required by FedAvg for equivalent accuracy.
    • In non-iid EMNIST and CIFAR-10 settings, DRAG/BR-DRAG require fewer communication rounds to reach target accuracies compared to baselines (Xiao et al., 11 Jan 2026, Zhu et al., 2023).
  • Byzantine Resilience:
    • Under Gaussian scaling attacks with 1–4 Byzantine clients, BR-DRAG maintains stable convergence and high test accuracy (>75%), while FedAvg diverges and FLTrust degrades under stronger heterogeneity.
    • The trusted reference mechanism enables robust calibration of honest updates while blunting adversarial influence in aggregation (Zhu et al., 2023).

Table: Comparative Test Accuracy on CIFAR-10, β=0.1, 600 Rounds

Method      Test Accuracy (%)
FedAvg      52.3
FedProx     54.5
SCAFFOLD    56.1
FedExP      55.0
FedACG      57.2
DRAG        70.8

6. Practical Considerations and Extensions

  • Hyperparameters: A step size $\eta = O(1/U)$ with $U = 5$ local steps is effective. A reference momentum $\alpha \in [0.2, 0.3]$ yields a stable reference. The drag coefficient $c$ balances drift correction against variance ($c \approx 0.1$ for moderate, $c \approx 0.25$ for strong heterogeneity).
  • Overheads: Each round incurs one additional vector broadcast for $\mathbf{r}^t$, while all other communication and storage match FedAvg or SCAFFOLD.
  • Extensions: DRAG/BR-DRAG can be integrated with communication compression or secure aggregation. Adaptive tuning of $c$ across rounds may enhance robustness in highly non-stationary conditions.
  • Limitations and Open Problems: Adaptive scheduling of $c$, broader characterization against further Byzantine strategies (e.g., label flipping, Krum/Trim), and formal guarantees under adversarial conditions remain important research directions (Zhu et al., 2023).
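For concreteness, the hyperparameter guidance above can be collected into a single configuration (illustrative defaults distilled from the quoted ranges, not an official configuration from the papers):

```python
# Illustrative DRAG/BR-DRAG defaults; tune per dataset and heterogeneity level.
DRAG_DEFAULTS = {
    "local_steps_U": 5,          # U local SGD steps per round
    "step_size_eta": 1.0 / 5,    # eta = O(1/U)
    "ref_momentum_alpha": 0.25,  # within the suggested [0.2, 0.3]
    "drag_coeff_c": 0.1,         # ~0.25 under strong heterogeneity
}
```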

7. Significance and Outlook

Byzantine-Resilient DRAG addresses critical challenges in FL—client drift from non-iid data and resilience to Byzantine failures—by geometrically aligning local updates with a trusted, server-generated reference. This approach achieves both rapid convergence and robustness with negligible overhead, surpassing prior methods in empirical evaluation. The requirement for a vetted root dataset aligns BR-DRAG with FLTrust-like assumptions while avoiding overly aggressive client removal or complex trust management. Prospective research includes extension to wider adversarial models, dynamic hyperparameter adaptation, and integration into heterogeneous or wireless FL environments (Xiao et al., 11 Jan 2026, Zhu et al., 2023).
