Divergence-Based Adaptive Aggregation (DRAG)

Updated 18 January 2026
  • DRAG is a divergence-based adaptive aggregation framework that calibrates local updates through geometric alignment to mitigate client drift in federated learning.
  • BR-DRAG extends DRAG by using a server-vetted root dataset to produce robust reference directions, effectively countering Byzantine attacks.
  • Empirical results on CIFAR-10 and EMNIST demonstrate that DRAG reduces communication rounds and improves test accuracy compared to traditional methods like FedAvg.

Byzantine-Resilient DRAG (BR-DRAG) is an advanced aggregation framework for federated learning (FL) that extends divergence-based adaptive aggregation (DRAG) with robust mechanisms against Byzantine attacks. BR-DRAG directly addresses limitations in FL posed by client drift due to non-iid data and adversarial behaviors (Byzantine clients) by leveraging geometric calibration of updates and a vetted root dataset to maintain trustworthy aggregation dynamics (Xiao et al., 11 Jan 2026, Zhu et al., 2023).

1. Foundational Principles of Divergence-Based Adaptive Aggregation

DRAG introduces a geometric approach to adaptive aggregation in FL. Each client computes a local model displacement $\mathbf{g}_m^t = \theta_m^{t,U} - \theta^t$. The server maintains a global reference direction $\mathbf{r}^t$, updated via an exponential moving average:

$$\mathbf{r}^t = \begin{cases} \dfrac{1}{S} \sum_{m\in\mathcal{S}^0} \mathbf{g}_m^0, & t = 0, \\ (1-\alpha)\mathbf{r}^{t-1} + \alpha \Delta^{t-1}, & t \ge 1, \end{cases}$$

where $\alpha \in (0,1)$, $\mathcal{S}^t$ is the subset of $S$ clients sampled at round $t$, and $\Delta^{t-1}$ is the mean of the modified updates $\mathbf{v}_m^{t-1}$.
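The reference-direction update above can be sketched as follows (a minimal sketch; the function names and the default value of α are illustrative, not taken from the papers):

```python
import numpy as np

def init_reference(first_round_updates):
    """r^0: mean of the displacements g_m^0 from the first sampled cohort."""
    return np.mean(np.stack(first_round_updates), axis=0)

def update_reference(r_prev, delta_prev, alpha=0.25):
    """r^t for t >= 1: exponential moving average of the aggregated update.

    r_prev     : previous reference direction r^{t-1}
    delta_prev : mean of the calibrated updates v_m^{t-1} from last round
    alpha      : EMA momentum in (0, 1)
    """
    return (1.0 - alpha) * r_prev + alpha * delta_prev
```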

DRAG uses the divergence-of-degree (DoD) metric to quantify misalignment:

$$\lambda_m^t = c \left(1 - \frac{\langle \mathbf{g}_m^t, \mathbf{r}^t \rangle}{\|\mathbf{g}_m^t\|\,\|\mathbf{r}^t\|}\right), \quad c \in [0,1].$$

A higher $\lambda_m^t$ indicates greater misalignment, leading to proportionally stronger correction. The local update is then linearly combined with the reference:

$$\mathbf{v}_m^t = (1-\lambda_m^t)\,\mathbf{g}_m^t + \lambda_m^t \frac{\|\mathbf{g}_m^t\|}{\|\mathbf{r}^t\|}\mathbf{r}^t.$$

This operation approximately preserves the norm of the update (exactly when $\mathbf{g}_m^t$ and $\mathbf{r}^t$ are collinear) while calibrating its direction (Zhu et al., 2023).
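The DoD metric and the linear calibration can be written compactly (a hedged sketch; `dod` and `calibrate` are illustrative names, and the default `c = 0.1` follows the hyperparameter guidance later in this article):

```python
import numpy as np

def dod(g, r, c=0.1):
    """Divergence-of-degree: scaled cosine misalignment between g and r."""
    cos = np.dot(g, r) / (np.linalg.norm(g) * np.linalg.norm(r))
    return c * (1.0 - cos)

def calibrate(g, r, c=0.1):
    """Mix g with a norm-matched copy of r; aligned updates pass unchanged."""
    lam = dod(g, r, c)
    return (1.0 - lam) * g + lam * (np.linalg.norm(g) / np.linalg.norm(r)) * r
```

When a client's displacement already points along the reference, `lam` is zero and the update is returned unmodified; orthogonal updates receive the maximum correction `c`.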

2. Byzantine-Resilient DRAG Mechanism

BR-DRAG enhances classical DRAG by providing resilience to Byzantine attacks, where malicious clients transmit adversarial or arbitrary updates. The core modification is the use of a trusted server-side root dataset to generate robust reference directions:

  • The server maintains a vetted root dataset $\mathcal{D}_{\text{root}}$.
  • At each round, the server simulates $U$ steps of local SGD on $\mathcal{D}_{\text{root}}$, generating a reference update $\mathbf{r}^t = \theta^{t,U} - \theta^t$.
  • All client DoD computations and linear calibrations are performed relative to this trusted $\mathbf{r}^t$.

Further normalization can be adopted to mitigate magnitude-scaling attacks:

$$\mathbf{v}_m^t = (1-\lambda_m^t)\frac{\|\mathbf{r}^t\|}{\|\mathbf{g}_m^t\|}\mathbf{g}_m^t + \lambda_m^t \mathbf{r}^t,$$

bounding $\|\mathbf{v}_m^t\|$ by $\|\mathbf{r}^t\|$ regardless of the magnitude the client reports (Zhu et al., 2023).
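Assuming `lam` has already been computed from the trusted reference, the normalized variant might look like the following (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def calibrate_normalized(g, r, lam):
    """BR-DRAG normalized calibration: rescale g to the reference norm
    before mixing, so the result's norm is bounded by ||r|| no matter
    how large a (possibly malicious) client makes ||g||."""
    return (1.0 - lam) * (np.linalg.norm(r) / np.linalg.norm(g)) * g + lam * r
```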

BR-DRAG thus aligns updates with a server-trusted direction, reducing the sensitivity to Byzantine anomalies and preventing attackers from manipulating the update aggregation through either direction or scale.

3. Algorithmic Workflow

BR-DRAG's workflow comprises:

  • Server-side:
    • Maintains and updates $\mathbf{r}^t$ via local SGD on $\mathcal{D}_{\text{root}}$.
    • Broadcasts $(\theta^t, \mathbf{r}^t)$ to participating clients.
  • Client-side:
    • Computes $\mathbf{g}_m^t = \theta_m^{t,U} - \theta^t$ with $U$ local SGD steps.
    • Calculates $\lambda_m^t$ using the provided $\mathbf{r}^t$.
    • Applies the linear calibration to produce $\mathbf{v}_m^t$.
    • Uploads $\mathbf{v}_m^t$ to the server.
  • Server-side aggregation:
    • Aggregates the received updates as $\Delta^t = \frac{1}{S}\sum_{m\in\mathcal{S}^t} \mathbf{v}_m^t$.
    • Updates $\theta^{t+1} = \theta^t + \Delta^t$.
    • Updates the reference direction as described above (in standard DRAG, a momentum average; in BR-DRAG, via the root dataset).

The communication overhead is one additional vector per round (the reference direction), with negligible increase in computational complexity over FedAvg (Xiao et al., 11 Jan 2026, Zhu et al., 2023).
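Putting the steps above together, one BR-DRAG round might be sketched as follows (all names are illustrative; `root_sgd_update` stands in for the server's $U$-step SGD pass over the root dataset, and models are flattened to vectors for simplicity):

```python
import numpy as np

def br_drag_round(theta, clients, root_sgd_update, c=0.1):
    """One BR-DRAG round (sketch under the paper's setup, not its code).

    theta           : current global model, flattened to a vector
    clients         : callables mapping theta -> local displacement g_m^t
    root_sgd_update : callable running U SGD steps on the root data and
                      returning the trusted reference r^t = theta^{t,U} - theta^t
    """
    r = root_sgd_update(theta)                       # trusted reference direction
    vs = []
    for local_update in clients:
        g = local_update(theta)                      # g_m^t = theta_m^{t,U} - theta^t
        cos = np.dot(g, r) / (np.linalg.norm(g) * np.linalg.norm(r))
        lam = c * (1.0 - cos)                        # divergence-of-degree
        v = (1.0 - lam) * g + lam * (np.linalg.norm(g) / np.linalg.norm(r)) * r
        vs.append(v)                                 # calibrated update
    delta = np.mean(vs, axis=0)                      # server aggregation
    return theta + delta                             # theta^{t+1}
```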

4. Theoretical Properties and Convergence

For DRAG (and, by construction, BR-DRAG under non-Byzantine scenarios), the following holds under standard assumptions:

  • Assumptions:
    • Each local objective $F_m$ is $L$-smooth.
    • Stochastic gradients are unbiased with bounded local ($\sigma_L^2$) and global ($\sigma_G^2$) variance.
    • Partial participation ($S < M$), supporting realistic FL settings.
  • Main convergence result:

$$\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\,\|\nabla f(\theta^t)\|^2 \leq \frac{f(\theta^0) - f^*}{\gamma \eta U T} + V,$$

where $V$ depends on hyperparameters and data heterogeneity, and $\gamma > 0$ is determined by the choice of drag coefficient $c$ and step size $\eta$.

  • Implication:
    • With $\eta = O(1/U)$ and constant $U$, DRAG achieves $O(1/\sqrt{T})$ convergence in non-convex settings, matching FedAvg but with reduced drift-induced bias and variance (Xiao et al., 11 Jan 2026).

No formal Byzantine-robust convergence rate is established for BR-DRAG; rather, empirical robustness is provided by the use of a trusted reference and normalization.

5. Empirical Performance and Robustness

Experimental results illustrate the impact of DRAG and BR-DRAG under a range of data heterogeneity and adversarial conditions:

  • Accuracy and Speed:
    • On CIFAR-10 (Dirichlet β=0.1 partition, high heterogeneity), DRAG attains 70.8% test accuracy in 600 rounds vs. 52.3% for FedAvg, and outperforms SCAFFOLD, FedProx, FedACG, and FedExP.
    • At β=0.5, DRAG achieves 78% accuracy in 400 rounds—significantly fewer than the 1,000+ rounds required by FedAvg for equivalent accuracy.
    • In non-iid EMNIST and CIFAR-10 settings, DRAG/BR-DRAG require fewer communication rounds to reach target accuracies compared to baselines (Xiao et al., 11 Jan 2026, Zhu et al., 2023).
  • Byzantine Resilience:
    • Under Gaussian scaling attacks with 1–4 Byzantine clients, BR-DRAG maintains stable convergence and high test accuracy (>75%), while FedAvg diverges and FLTrust degrades under stronger heterogeneity.
    • The trusted reference mechanism enables robust calibration of honest updates while blunting adversarial influence in aggregation (Zhu et al., 2023).

Table: Comparative Test Accuracy on CIFAR-10, β=0.1, 600 Rounds

Method      Test Accuracy (%)
FedAvg      52.3
FedProx     54.5
SCAFFOLD    56.1
FedExP      55.0
FedACG      57.2
DRAG        70.8

6. Practical Considerations and Extensions

  • Hyperparameters: A step size $\eta = O(1/U)$ with $U = 5$ local steps is effective. A reference momentum $\alpha \in [0.2, 0.3]$ yields a stable reference. The drag coefficient $c$ balances drift correction against variance ($c \approx 0.1$ for moderate, $c \approx 0.25$ for strong heterogeneity).
  • Overheads: Each round incurs one additional vector broadcast for $\mathbf{r}^t$, while all other communication and storage match FedAvg or SCAFFOLD.
  • Extensions: DRAG/BR-DRAG can be integrated with communication compression or secure aggregation. Adaptive tuning of $c$ across rounds may enhance robustness in highly non-stationary conditions.
  • Limitations and Open Problems: Adaptive scheduling of $c$, broader characterization against further Byzantine strategies (e.g., label flipping, Krum/Trim), and formal guarantees under adversarial conditions remain important research directions (Zhu et al., 2023).
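For concreteness, the hyperparameter guidance above can be collected into a single configuration (illustrative defaults distilled from the quoted ranges, not an official configuration from the papers):

```python
# Illustrative DRAG/BR-DRAG defaults; tune per dataset and heterogeneity level.
DRAG_DEFAULTS = {
    "local_steps_U": 5,          # U local SGD steps per round
    "step_size_eta": 1.0 / 5,    # eta = O(1/U)
    "ref_momentum_alpha": 0.25,  # within the suggested [0.2, 0.3]
    "drag_coeff_c": 0.1,         # ~0.25 under strong heterogeneity
}
```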

7. Significance and Outlook

Byzantine-Resilient DRAG addresses critical challenges in FL—client drift from non-iid data and resilience to Byzantine failures—by geometrically aligning local updates with a trusted, server-generated reference. This approach achieves both rapid convergence and robustness with negligible overhead, surpassing prior methods in empirical evaluation. The requirement for a vetted root dataset aligns BR-DRAG with FLTrust-like assumptions while avoiding overly aggressive client removal or complex trust management. Prospective research includes extension to wider adversarial models, dynamic hyperparameter adaptation, and integration into heterogeneous or wireless FL environments (Xiao et al., 11 Jan 2026, Zhu et al., 2023).
