Local Model Poisoning Attacks
- Local model poisoning attacks are adversarial strategies that modify local updates in distributed learning systems to degrade global model performance.
- They use optimization and multi-round consistency techniques to bypass robust aggregation defenses and stealthily induce cumulative model drift.
- Empirical evaluations show these attacks can raise test error from 6–11% to 50–75% and even reduce accuracy to random-guess levels on various datasets.
Local model poisoning attacks refer to a class of adversarial strategies targeting distributed learning frameworks—most notably federated learning (FL)—wherein an adversary manipulates the local model parameters or gradients on compromised clients to degrade or subvert the resulting global model. Unlike data poisoning (tampering with the local training data) or backdoor attacks (implanting triggers for targeted misclassification), local model poisoning achieves its objective by direct alteration of model updates submitted to the aggregation server. This attack paradigm exposes a critical vulnerability even in Byzantine-robust FL systems and distributed protocols equipped with state-of-the-art defenses.
1. Adversarial Capabilities and Threat Models
Local model poisoning attacks assume an adversary who controls a subset of c out of n clients—either by hijacking existing devices or by injecting Sybil (fake) identities. These compromised clients can submit arbitrary parameter vectors or gradient updates, unconstrained by SGD or any legitimate training process. The attacker's knowledge assumptions vary:
- White-box: Full access to local update distributions of honest clients and the aggregation rule.
- Partial/Black-box: Only the global model is observable; knowledge about honest local data, updates, or the aggregation rule is unavailable.
The attacker's objective is typically untargeted—maximizing the global model's test error or minimizing benign accuracy. In more advanced settings, the attack may aim to achieve targeted backdoor objectives, maintain persistent deviation during the federated unlearning phase, or poison collaborative control protocols (Fang et al., 2019, Xie et al., 2024, Cao et al., 2022, Joshi et al., 2023, Wang et al., 29 Jan 2025, Russo et al., 2021).
In related domains such as local differential privacy (LDP), adversaries may inject a limited number of fake users with malicious reports to manipulate aggregates such as item frequency or ranking, exploiting the local randomization as an attack surface (Hsu et al., 6 Mar 2025, Zhan et al., 30 Jun 2025).
2. Attack Methodologies and Algorithmic Formulations
Federated Learning Attacks
A typical FL round comprises the following steps:
- The server broadcasts the current global model w_t to all or a subset of clients.
- Each client i initializes its local model to w_t, performs local training (e.g., SGD), and computes an update.
- The clients send the updated models or gradients to the server, which applies an aggregation rule A to obtain the next global model w_{t+1}.
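Under stated simplifications (synthetic least-squares clients, plain unweighted FedAvg standing in for the aggregation rule A), a benign round can be sketched as:

```python
import numpy as np

def local_update(w_global, X, y, lr=0.1, steps=5):
    # Step 2: a client's local full-batch gradient descent on least squares.
    w = w_global.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(updates):
    # Step 3: unweighted FedAvg aggregation (a simple choice of A).
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])            # ground-truth model
clients = []
for _ in range(4):                              # four honest clients
    X = rng.normal(size=(20, 3))
    y = X @ w_true + 0.01 * rng.normal(size=20)
    clients.append((X, y))

w_global = np.zeros(3)
for _ in range(30):                             # Step 1: broadcast w_global
    updates = [local_update(w_global, X, y) for X, y in clients]
    w_global = fedavg(updates)
```

A local model poisoning attack would replace a compromised client's call to `local_update` with an arbitrary crafted vector, untethered from any training objective.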
Local model poisoning alters step 2 on compromised clients: the adversary replaces the honest update with an adversarially chosen one, crafted to maximize global test error or to align with a poisoned direction. Representative attack frameworks:
- Optimization-based: For a known aggregation rule A (e.g., Krum, Bulyan, trimmed mean, median), the attacker solves

  max over crafted updates w_1', …, w_c' of  s^T (w − w'),

  where s encodes the natural update direction (the sign vector of the change the global model would take without attack), w and w' are the global models before and after the attack, respectively, and constraints ensure the crafted updates are selected by A (Fang et al., 2019).
- Direction-based (multi-round consistency): PoisonedFL fixes a random sign vector s ∈ {−1, +1}^d and ensures sign consistency: the malicious update in every round t satisfies sign(Δ_t) = s, enforcing cumulative drift even under filtering/attenuation. Scaling and alignment are tuned adaptively to bypass defenses (Xie et al., 2024).
- Base-model drag attacks (MPAF): Each fake client submits the update λ (w' − w_t), where w' is an attacker-chosen base model with low accuracy and w_t is the current global model. The scaling factor λ is large enough to amplify the polluted direction, regardless of the aggregation rule (Cao et al., 2022).
- Distance-constrained adversarial perturbation (DISBELIEVE): The attacker constrains the malicious model's parameter- or gradient-space distance to lie within the empirical benign spread—e.g., ‖w_mal − μ‖ ≤ max_i ‖w_i − μ‖ for benign models w_i with mean μ—and maximizes the adversarial loss subject to this constraint. This explicitly bypasses distance-based robust aggregation (e.g., Krum, trimmed mean) (Joshi et al., 2023).
- Control-theoretic (FedSA/Sliding Mode Control): The poisoning is cast as a nonlinear control system. The attacker defines a sliding surface to steer the global model toward a predefined poisoned target at a controlled (stealthy) rate, ensuring exact accuracy degradation (e.g., reduce accuracy by 10% on the validation set) (Pan et al., 22 May 2025).
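As a minimal illustration of the base-model drag idea (all constants hypothetical), fake clients reporting λ (w' − w_t) can dominate a non-robust mean aggregator even when heavily outnumbered:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
w_global = np.ones(d)                  # current global model w_t
w_base = -np.ones(d)                   # attacker's low-accuracy base model w'
lam = 100.0                            # large scaling factor

benign = [rng.normal(0.1, 0.01, d) for _ in range(9)]   # small honest updates
fake = [lam * (w_base - w_global)]                      # one fake client (10%)

agg = np.mean(benign + fake, axis=0)   # non-robust mean aggregation
w_next = w_global + agg
```

The aggregate points along w_base − w_global, so the global model is dragged toward (here, past) the attacker's base model; robust rules attenuate this pull, but per Cao et al. (2022) do not eliminate it.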
LDP and Data-Driven Control Poisoning
- LDP Protocols: Adversaries inject fake users so that reporting patterns (optimized via combinatorial or submodular heuristics) maximally perturb frequency or ranking statistics, measured by overall gain in swaps or amplified occurrence of target patterns (Zhan et al., 30 Jun 2025, Hsu et al., 6 Mar 2025).
- Data-Driven Control Methods: The attacker solves bi-level optimization problems, perturbing input/output sequences to maximally degrade closed-loop system stability or performance, while remaining within stealth constraints (Russo et al., 2021).
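A hedged sketch of the fake-user idea against a k-ary generalized randomized response (GRR) frequency protocol (population sizes and ε are illustrative):

```python
import math
import random

def grr_report(item, k, eps, rng):
    # Report truthfully w.p. p = e^eps / (e^eps + k - 1), else a uniform other item.
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return item
    return rng.choice([v for v in range(k) if v != item])

def estimate_freqs(reports, k, eps):
    # Standard unbiased GRR frequency estimator.
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    return [((sum(r == v for r in reports) / n) - q) / (p - q) for v in range(k)]

rng = random.Random(0)
k, eps, target = 4, 1.0, 0
genuine = [grr_report(rng.randrange(k), k, eps, rng) for _ in range(20000)]
fakes = [target] * 5000                # 20% fake users, all claiming the target item
est = estimate_freqs(genuine + fakes, k, eps)
```

With uniform genuine items (true frequency 0.25), the target's estimated frequency is inflated to roughly 0.75—about a 3× amplification, consistent with the ranges reported in Section 3—because the debiasing step scales up any excess mass in the raw counts.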
3. Impact on Security and Resilience of Distributed Protocols
Extensive empirical evaluation demonstrates the potency of local model poisoning:
- Federated Learning (FL)
- Attacks increase test error from 6–11% to 50–75% (MNIST with Krum/median/trimmed-mean); effect persists across Fashion-MNIST, medical imaging, and CH-MNIST datasets (Fang et al., 2019, Joshi et al., 2023).
- Multi-round consistent attacks (PoisonedFL) break all evaluated defenses, raising error to random-guessing rates (90%) on MNIST, CIFAR-10, FEMNIST, and Purchase datasets (Xie et al., 2024).
- Fake-client attacks (MPAF) reduce accuracy under robust aggregation (Trimmed Mean, Median) by >30% at only 10% malicious clients (Cao et al., 2022).
- Distance-constrained attacks (DISBELIEVE) reduce AUC by up to 28% under outlier-based aggregation, with similar or higher impact than previous attacks (Joshi et al., 2023).
- LDP Protocols
- With only 20% fake users, attackers can amplify target pattern frequencies by a factor of 3–5, and ranking-optimized strategies flip a large number of pairwise item orders even under tight privacy parameters (Hsu et al., 6 Mar 2025, Zhan et al., 30 Jun 2025).
- Data-Driven Control
- Well-crafted perturbations cause instability or a significant closed-loop performance drop, even for small attack amplitudes (Russo et al., 2021).
Attacks retain effectiveness under varying non-IID data, partial participation, and different model/aggregation architectures.
4. Countermeasures and Limitations of Existing Defenses
Defensive strategies can be broadly categorized into:
- Aggregation-based: Robust rules (Krum, Median, Trimmed Mean, Bulyan), norm clipping.
- Krum tolerates up to f Byzantine clients (requiring n ≥ 2f + 3 participants) but fails against carefully colluding or distance-constrained attacks (Fang et al., 2019, Joshi et al., 2023).
- Trimmed mean/median offer partial resistance but can be systematically broken by attacks engineered to align with the natural benign spread (Joshi et al., 2023, Cao et al., 2022).
- Norm clipping imposes trade-offs: small clip bound limits attack but also benign utility; large bound leaves attack unmitigated (Cao et al., 2022).
- Statistical/Evaluation-based:
- Error-rate or loss-function-based rejection (ERR, LFR)—server removes updates yielding largest deterioration on validation data (Fang et al., 2019).
- Spectral anomaly detection, entropy filtering, cosine similarity scoring, and truth-inference (Wang et al., 2022).
- Client-side (post-pollution) mitigation:
- FL-WBC perturbs “null-space” Hessian directions of the local model, washing out persistent effects of attacks within 1–5 rounds at minimal (<5%) benign-accuracy drop (Sun et al., 2021).
- Federated Unlearning Defenses:
- UnlearnGuard predicts and filters model updates during unlearning, ensuring proximity to scratch retraining even under adaptive attack (Wang et al., 29 Jan 2025).
- Partial Sharing and Dynamic Protocols:
- PSO-Fed with partial parameter sharing reduces attack impact by lowering the fraction of poisoned coordinates visible to the server; nontrivial optimal stepsize further minimizes attack-induced MSE (Lari et al., 2024).
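Two of the aggregation-side defenses above can be sketched in a few lines (toy magnitudes, with an assumed trim level of one per tail):

```python
import numpy as np

def trimmed_mean(updates, trim=1):
    # Coordinate-wise: drop the `trim` largest and smallest values, average the rest.
    U = np.sort(np.stack(updates), axis=0)
    return U[trim:len(updates) - trim].mean(axis=0)

def clip_to_norm(update, bound):
    # Norm clipping: rescale any update whose L2 norm exceeds `bound`.
    n = np.linalg.norm(update)
    return update if n <= bound else update * (bound / n)

rng = np.random.default_rng(2)
benign = [1.0 + 0.1 * rng.normal(size=3) for _ in range(9)]
malicious = -100.0 * np.ones(3)        # one colluding outlier

tm = trimmed_mean(benign + [malicious], trim=1)
clipped = [clip_to_norm(u, bound=2.0) for u in benign + [malicious]]
mean_clipped = np.mean(clipped, axis=0)
```

Trimming discards the extreme coordinate values, so `tm` stays near the benign mean; clipping caps the malicious norm at the bound, which limits but does not remove its pull. As noted above, in-spread attacks such as DISBELIEVE evade both checks entirely.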
Summary of empirical findings:
- No single defense is invulnerable. Loss/rejection and robust aggregation can mitigate but rarely fully neutralize advanced local model poisoning (e.g., only trimmed-mean+LFR occasionally restores benign performance) (Fang et al., 2019).
- Distance-based and norm-based approaches are particularly vulnerable to in-cluster adversarial perturbations (DISBELIEVE), necessitating more sophisticated aggregation strategies.
- Defenses relying on static statistics are bypassed by multi-round consistent or targeted attacks; dynamic/trajectory-level detection or validation on held-out data is required.
5. Theoretical Insights and Limits of Robustness
Key theoretical conclusions:
- Cumulative Drift: Attacks enforcing sign consistency (sign(Δ_t) = s across rounds) accumulate deviation linearly in the number of rounds T, circumventing per-round attenuation by defenses (Xie et al., 2024).
- Distance Exploitability: If the malicious update remains inside the benign spread, robust aggregation’s breakdown point is effectively nullified—the basic assumption of “outliers are distant” fails (Joshi et al., 2023).
- Precision of Control: Sliding-mode attacks provably drive toward a specified poisoned state, allowing precise setting of the final global accuracy with controllable attack speed and stealth (Pan et al., 22 May 2025).
- Trade-offs in LDP: Privacy/utility trade-offs extend to privacy/security trade-offs; a smaller privacy budget ε reduces the distinguishability of reports and thus lowers the number of fake users needed for substantial impact (Zhan et al., 30 Jun 2025, Hsu et al., 6 Mar 2025).
- Possible Stealth: The attack can remain undetected if the crafted deviation “hides” in subspaces unobservable by standard anomaly detectors or if it manifests only cumulatively across rounds.
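The cumulative-drift argument can be checked numerically: a sign-consistent perturbation that survives per-round norm clipping accumulates linearly in T, while an inconsistent (fresh-direction) perturbation of equal per-round magnitude grows only like sqrt(T). The constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, bound = 10, 200, 0.5

def clip(v, bound):
    n = np.linalg.norm(v)
    return v if n <= bound else v * (bound / n)

s = rng.choice([-1.0, 1.0], size=d)    # fixed random sign vector (PoisonedFL-style)
consistent = np.zeros(d)
inconsistent = np.zeros(d)
for _ in range(T):
    consistent += clip(0.1 * s, bound)                             # same direction each round
    inconsistent += clip(0.1 * rng.choice([-1.0, 1.0], d), bound)  # fresh direction each round
```

Here the consistent drift has norm 0.1 · T · sqrt(d) ≈ 63, roughly sqrt(T) ≈ 14× the random-walk drift of the inconsistent perturbation—which is why per-round anomaly checks, calibrated to single-round magnitudes, miss it.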
6. Open Challenges and Prospective Research Directions
- Adaptive, low-overhead defenses: Current validation-based techniques incur excessive computation (e.g., re-aggregations) and potential removal of benign updates. Future work should focus on one-pass, trajectory-level, or learning-theoretic defenses capable of detecting structured or persistent adversarial drifts (Fang et al., 2019, Xie et al., 2024).
- Non-IID robustness: Many evaluation/truth-inference methods lose effectiveness with heterogeneous data—aggregation and trust mechanisms must be robust to client-level data sampling and distributional skew (Wang et al., 2022).
- Fake-user Sybil resistance in LDP and FL: Enhanced authentication, physically-grounded reporting, or hybrid global-local privacy models may be required to bound adversarial influence (Zhan et al., 30 Jun 2025, Hsu et al., 6 Mar 2025).
- Defensive metrics beyond distance: Aggregators should incorporate loss consistency, higher-moment statistics, or cryptographic attestation, and track anomalous behavior across rounds rather than only per-round Euclidean statistics (Joshi et al., 2023, Fang et al., 2019).
- Federated Unlearning under Attack: Frameworks must guarantee that unlearning eliminates not only direct dependencies on malicious client contributions but also subtle residual biases (Wang et al., 29 Jan 2025).
- Scalable and verifiable defenses: Blockchain, decentralized, or multiparty verification mechanisms may help reduce computation and provide transparent, audit-ready aggregation.
A plausible implication is that the attack–defense arms race in distributed learning is far from settled: attacks that exploit statistical, temporal, or protocol-level gaps in assumptions will continue to pressure the design of future federated, private, and collaborative learning systems.
7. Comparative Summary of Attack Efficacy and Defenses
| Attack/Defense | Knowledge Required | Main Strategy | Effectiveness Against Robust Aggregation | Defense Limitations |
|---|---|---|---|---|
| Directed-optimization (Fang et al., 2019) | Full or partial | Oppose natural update direction | Raises error from 11% → 75% (Krum, MNIST) | Fails if aggregation is fooled |
| PoisonedFL (Xie et al., 2024) | Minimal | Multi-round sign-aligned bias | Breaks 8 defenses, to ~90% error | Evades round-level anomaly checks |
| MPAF (Cao et al., 2022) | No genuine data | Base-model drag via fake clients | Random-guess accuracy with 1–10% fakes | Not blocked by norm clipping |
| DISBELIEVE (Joshi et al., 2023) | Benign cluster stats | In-cluster adversarial shift | Large AUC drop, bypasses outlier defenses | Distance-only methods break down |
| FedSA (Pan et al., 22 May 2025) | Parameter access | Sliding-mode to tune attack precision | Guarantees arbitrary accuracy reduction | Stealth same as honest updates |
| FL-WBC (Sun et al., 2021) | Client-side stats | Masking persistently “hidden” effects | Recovers clean accuracy in ≤5 rounds | Relies on per-coordinate Hessians |
| PSO-Fed (Lari et al., 2024) | Controlled sharing | Partial parameter transmission | Lower MSE across attack strengths | Slows convergence, protocol tuning |
| UnlearnGuard (Wang et al., 29 Jan 2025) | Server-side history | Predictive update filtering in FU phase | Error rates close to retraining-from-scratch | High memory, extra comm needed |
In summary, local model poisoning exposes fundamental weaknesses in current distributed learning algorithms and their robustification strategies. Defenses must adapt to sophisticated, colluding, and temporally persistent adversarial behaviors, and the ongoing development of scalable, adaptive, and theoretically-grounded mitigations is a core challenge for federated, private, and collaborative machine learning (Fang et al., 2019, Xie et al., 2024, Cao et al., 2022, Joshi et al., 2023, Zhan et al., 30 Jun 2025, Pan et al., 22 May 2025, Sun et al., 2021, Lari et al., 2024, Wang et al., 29 Jan 2025, Wang et al., 2022, Russo et al., 2021).