
Local Model Poisoning Attacks

Updated 26 January 2026
  • Local model poisoning attacks are adversarial strategies that modify local updates in distributed learning systems to degrade global model performance.
  • They use optimization and multi-round consistency techniques to bypass robust aggregation defenses and stealthily induce cumulative model drift.
  • Empirical evaluations show these attacks can raise test error from 6–11% to 50–75% and even reduce accuracy to random-guess levels on various datasets.

Local model poisoning attacks refer to a class of adversarial strategies targeting distributed learning frameworks—most notably federated learning (FL)—wherein an adversary manipulates the local model parameters or gradients on compromised clients to degrade or subvert the resulting global model. Unlike data poisoning (tampering with the local training data) or backdoor attacks (implanting triggers for targeted misclassification), local model poisoning achieves its objective by direct alteration of model updates submitted to the aggregation server. This attack paradigm exposes a critical vulnerability even in Byzantine-robust FL systems and distributed protocols equipped with state-of-the-art defenses.

1. Adversarial Capabilities and Threat Models

Local model poisoning attacks assume an adversary who controls a subset of c out of the m clients—either by hijacking them or injecting Sybil (fake) identities. These compromised clients can submit arbitrary parameter vectors or update gradients, without restriction to those derived from SGD or any legitimate training process. The attacker's knowledge assumptions vary:

  • White-box: Full access to local update distributions of honest clients and the aggregation rule.
  • Partial/Black-box: Only the global model is observable; knowledge about honest local data, updates, or the aggregation rule is unavailable.

The attacker's objective is typically untargeted—maximizing the global model's test error or minimizing benign accuracy. In more advanced settings, the attack may aim to achieve targeted backdoor objectives, maintain persistent deviation during the federated unlearning phase, or poison collaborative control protocols (Fang et al., 2019, Xie et al., 2024, Cao et al., 2022, Joshi et al., 2023, Wang et al., 29 Jan 2025, Russo et al., 2021).

In related domains such as local differential privacy (LDP), adversaries may inject a limited number of fake users with malicious reports to manipulate aggregates such as item frequency or ranking, exploiting the local randomization as an attack surface (Hsu et al., 6 Mar 2025, Zhan et al., 30 Jun 2025).

2. Attack Methodologies and Algorithmic Formulations

Federated Learning Attacks

A typical FL round comprises the following steps:

  1. The server broadcasts the global model w_g^t to all or a subset of clients.
  2. Each client i initializes its local model w_i^t ← w_g^t, performs local training (e.g., SGD), and computes an update.
  3. The clients send the updated models or gradients to the server, which applies an aggregation rule A to obtain w_g^{t+1}.
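The round above can be sketched in a few lines. This is a minimal toy illustration (a linear regressor with plain averaging as the aggregation rule A); the function names `local_step` and `fl_round` are illustrative, not from any cited framework.

```python
# Minimal sketch of one synchronous FL round, following steps 1-3 above.
# Toy linear model; plain averaging stands in for the aggregation rule A.
import numpy as np

def local_step(w_global, X, y, lr=0.1):
    """Step 2: a client's local full-batch gradient step on its private data."""
    grad = 2 * X.T @ (X @ w_global - y) / len(y)  # gradient of mean-squared error
    return w_global - lr * grad

def fl_round(w_global, client_data, aggregate=lambda ws: ws.mean(axis=0)):
    # Step 1: the server broadcasts w_g^t; step 2: each client trains locally.
    local_models = [local_step(w_global, X, y) for X, y in client_data]
    # Step 3: the server applies aggregation rule A (here: plain averaging).
    return aggregate(np.stack(local_models))

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(20, 2))
    clients.append((X, X @ w_true))  # each client holds private (X, y) pairs

w = np.zeros(2)
for t in range(200):
    w = fl_round(w, clients)
# with all clients honest, w converges toward w_true
```

With honest clients this converges; the attacks below corrupt step 2 so that the same aggregation drives w away from any useful model.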

Local model poisoning alters step 2 on compromised clients: the adversary replaces w_i^t with an adversarially chosen w_i'^t to maximize global test error or to align w_g^{t+1} with a poisoned direction. The main mechanistic frameworks are:

  • Optimization-based: For aggregation rules A (e.g., Krum, Bulyan, trimmed mean, median), the attacker solves

\max_{w_1', \dots, w_c'} s^T (w - w'),

where s encodes the natural update direction, w and w' are the global models before and after the attack, respectively, and constraints ensure selection by A (Fang et al., 2019).

  • Direction-based (multi-round consistency): PoisonedFL fixes a random vector s and ensures sign consistency

\operatorname{sign}(g^t) = s \quad \forall t,

enforcing cumulative drift even under filtering/attenuation. Scaling and alignment are tuned adaptively to bypass defenses (Xie et al., 2024).

  • Base-model drag attacks (MPAF): Each fake client submits

g_i^t = \lambda (w' - w^t),

where w' is a base model with low accuracy. λ is chosen large enough to amplify the polluted direction, regardless of the aggregation rule (Cao et al., 2022).

  • Distance-constrained adversarial perturbation (DISBELIEVE): The attacker constrains parameter or gradient distance within the empirical benign spread

\|W - \mu^{\mathrm{param}}\|_2^2 \leq P_{\mathrm{dist}}

and maximizes the adversarial loss. This explicitly bypasses distance-based robust aggregation (e.g., Krum, trimmed mean) (Joshi et al., 2023).

  • Control-theoretic (FedSA/Sliding Mode Control): The poisoning is cast as a nonlinear control system. The attacker defines a sliding surface to steer the global model w_t toward a predefined poisoned target w̃ at a controlled (stealthy) rate, ensuring exact accuracy degradation (e.g., reduce accuracy by 10% on the validation set) (Pan et al., 22 May 2025).
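Two of the update rules above are simple enough to sketch directly: MPAF's base-model drag and PoisonedFL's sign-consistent update. The scaling constants and dimensions here are illustrative assumptions, not values from the cited papers.

```python
# Hedged sketches of two poisoned-update rules from the text.
import numpy as np

def mpaf_update(w_global, w_base, lam=1e6):
    """MPAF: a fake client submits g_i^t = lambda * (w' - w^t), dragging the
    aggregate toward a low-accuracy base model w' (lam is illustrative)."""
    return lam * (w_base - w_global)

def poisonedfl_update(s, scale):
    """PoisonedFL: the update's sign pattern equals the fixed vector s in
    every round; only the magnitude is adapted to evade defenses."""
    return scale * s

rng = np.random.default_rng(1)
d = 10
w_g, w_base = rng.normal(size=d), rng.normal(size=d)
s = np.sign(rng.normal(size=d))  # fixed random sign vector, chosen once

g1 = poisonedfl_update(s, scale=0.5)
g2 = poisonedfl_update(s, scale=0.2)  # a later round: new magnitude, same signs
assert np.array_equal(np.sign(g1), np.sign(g2))  # sign(g^t) = s for all t
```

The sign-consistency invariant is what makes per-round filtering ineffective: each round's update looks small, but the deviations never cancel.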

LDP and Data-Driven Control Poisoning

  • LDP Protocols: Adversaries inject fake users so that reporting patterns (optimized via combinatorial or submodular heuristics) maximally perturb frequency or ranking statistics, measured by overall gain in swaps or amplified occurrence of target patterns (Zhan et al., 30 Jun 2025, Hsu et al., 6 Mar 2025).
  • Data-Driven Control Methods: The attacker solves bi-level optimization problems, perturbing input/output sequences to maximally degrade closed-loop system stability or performance, while remaining within stealth constraints (Russo et al., 2021).

3. Impact on Security and Resilience of Distributed Protocols

Extensive empirical evaluation demonstrates the potency of local model poisoning:

  • Federated Learning (FL)
    • Attacks increase test error from 6–11% to 50–75% (MNIST with Krum/median/trimmed-mean); effect persists across Fashion-MNIST, medical imaging, and CH-MNIST datasets (Fang et al., 2019, Joshi et al., 2023).
    • Multi-round consistent attacks (PoisonedFL) break all evaluated defenses, raising error to random-guessing rates (~90%) on MNIST, CIFAR-10, FEMNIST, and Purchase datasets (Xie et al., 2024).
    • Fake-client attacks (MPAF) reduce accuracy under robust aggregation (Trimmed Mean, Median) by >30% at only 10% malicious clients (Cao et al., 2022).
    • Distance-constrained attacks (DISBELIEVE) reduce AUC by up to 28% under outlier-based aggregation, with similar or higher impact than previous attacks (Joshi et al., 2023).
  • LDP Protocols
    • With only 20% fake users, attackers can amplify target pattern frequencies by 3–5×, and ranking-optimized strategies flip a large number of pairwise item orders even under tight privacy parameters (Hsu et al., 6 Mar 2025, Zhan et al., 30 Jun 2025).
  • Data-Driven Control
    • Well-crafted perturbations cause instability or significant closed-loop performance drop, even for small (<5%) attack amplitude (Russo et al., 2021).

Attacks retain effectiveness under varying non-IID data, partial participation, and different model/aggregation architectures.

4. Countermeasures and Limitations of Existing Defenses

Defensive strategies can be broadly categorized into:

  • Aggregation-based: Robust rules (Krum, Median, Trimmed Mean, Bulyan), norm clipping.
    • Krum tolerates up to c < (m−2)/2 compromised clients but fails against carefully-colluded or distance-constrained attacks (Fang et al., 2019, Joshi et al., 2023).
    • Trimmed mean/median offer partial resistance but can be systematically broken by attacks engineered to align with the natural benign spread (Joshi et al., 2023, Cao et al., 2022).
    • Norm clipping imposes trade-offs: small clip bound limits attack but also benign utility; large bound leaves attack unmitigated (Cao et al., 2022).
  • Statistical/Evaluation-based:
    • Error-rate or loss-function-based rejection (ERR, LFR)—server removes updates yielding largest deterioration on validation data (Fang et al., 2019).
    • Spectral anomaly detection, entropy filtering, cosine similarity scoring, and truth-inference (Wang et al., 2022).
  • Client-side (post-pollution) mitigation:
    • FL-WBC perturbs “null-space” Hessian directions of the local model, washing out persistent effects of attacks within 1–5 rounds at minimal (<5%) benign-accuracy drop (Sun et al., 2021).
  • Federated Unlearning Defenses:
    • UnlearnGuard predicts and filters model updates during unlearning, ensuring proximity to scratch retraining even under adaptive attack (Wang et al., 29 Jan 2025).
  • Partial Sharing and Dynamic Protocols:
    • PSO-Fed with partial parameter sharing reduces attack impact by lowering the fraction of poisoned coordinates visible to the server; nontrivial optimal stepsize further minimizes attack-induced MSE (Lari et al., 2024).
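To make the aggregation-based defenses above concrete, here is a minimal sketch of coordinate-wise median, trimmed mean, and norm clipping on toy update vectors. This is illustrative only; it is not drawn from any cited implementation, and real deployments operate on full model parameter vectors.

```python
# Sketches of three aggregation-based defenses discussed above.
import numpy as np

def coordinate_median(updates):
    """Coordinate-wise median over the stacked client updates."""
    return np.median(updates, axis=0)

def trimmed_mean(updates, k):
    """Per coordinate, drop the k largest and k smallest values, then average."""
    s = np.sort(updates, axis=0)
    return s[k:len(updates) - k].mean(axis=0)

def clip_norm(update, bound):
    """Norm clipping: rescale any update whose L2 norm exceeds the bound."""
    n = np.linalg.norm(update)
    return update if n <= bound else update * (bound / n)

benign = np.array([[1.0 + 0.1 * i, 1.0 + 0.1 * i] for i in range(8)])
malicious = np.array([[100.0, -100.0], [100.0, -100.0]])  # colluding outliers
updates = np.vstack([benign, malicious])

med = coordinate_median(updates)  # stays near the benign cluster
tm = trimmed_mean(updates, k=2)   # likewise, after trimming 2 values per side
```

Against crude outliers like these, all three rules hold up; the attacks of Section 2 succeed precisely by keeping malicious updates inside the benign spread, where trimming and median selection can no longer separate them.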

Summary of empirical findings:

  • No single defense is invulnerable. Loss/rejection and robust aggregation can mitigate but rarely fully neutralize advanced local model poisoning (e.g., only trimmed-mean+LFR occasionally restores benign performance) (Fang et al., 2019).
  • Distance-based and norm-based approaches are particularly vulnerable to in-cluster adversarial perturbations (DISBELIEVE), necessitating more sophisticated aggregation strategies.
  • Defenses relying on static statistics are bypassed by multi-round consistent or targeted attacks; dynamic/trajectory-level detection or validation on held-out data is required.

5. Theoretical Insights and Limits of Robustness

Key theoretical conclusions:

  • Cumulative Drift: Attacks enforcing sign consistency (sign(g^t) = s across rounds) accumulate deviation linearly in T (the number of rounds), circumventing per-round attenuation by defenses (Xie et al., 2024).
  • Distance Exploitability: If the malicious update remains inside the benign spread, robust aggregation’s breakdown point is effectively nullified—the basic assumption of “outliers are distant” fails (Joshi et al., 2023).
  • Precision of Control: Sliding-mode attacks provably drive w_t toward a specified poisoned state, allowing precise setting of the final global accuracy with controllable attack speed and stealth (Pan et al., 22 May 2025).
  • Trade-offs in LDP: Privacy/utility trade-offs extend to privacy/security trade-offs; smaller ε reduces distinguishability and thus lowers the number of fake users needed for substantial impact (Zhan et al., 30 Jun 2025, Hsu et al., 6 Mar 2025).
  • Possible Stealth: The attack can remain undetected if the crafted deviation “hides” in subspaces unobservable by standard anomaly detectors or if it manifests only cumulatively across rounds.
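The cumulative-drift point can be checked numerically: under a fixed sign pattern the accumulated deviation grows linearly in T, while independently random signed updates grow only like √T. The step size, dimension, and round count below are illustrative assumptions.

```python
# Numeric check of the cumulative-drift insight: sign-consistent updates
# accumulate deviation linearly in T; fresh random signs mostly cancel.
import numpy as np

rng = np.random.default_rng(0)
d, T, step = 100, 500, 0.01
s = np.sign(rng.normal(size=d))  # fixed sign pattern, chosen once

drift_consistent = np.zeros(d)
drift_random = np.zeros(d)
for _ in range(T):
    drift_consistent += step * s                         # sign(g^t) = s every round
    drift_random += step * np.sign(rng.normal(size=d))   # fresh signs each round

# ||consistent|| = step * T * sqrt(d);  ||random|| ≈ step * sqrt(T * d)
print(np.linalg.norm(drift_consistent), np.linalg.norm(drift_random))
```

Even if a defense attenuates each round's contribution, only the linear term survives uniform attenuation with a meaningful magnitude, which is why trajectory-level rather than per-round detection is needed.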

6. Open Challenges and Prospective Research Directions

  • Adaptive, low-overhead defenses: Current validation-based techniques incur excessive computation (e.g., O(m) re-aggregations) and potential removal of benign updates. Future work should focus on one-pass, trajectory-level, or learning-theoretic defenses capable of detecting structured or persistent adversarial drifts (Fang et al., 2019, Xie et al., 2024).
  • Non-IID robustness: Many evaluation/truth-inference methods lose effectiveness with heterogeneous data—aggregation and trust mechanisms must be robust to client-level data sampling and distributional skew (Wang et al., 2022).
  • Fake-user Sybil resistance in LDP and FL: Enhanced authentication, physically-grounded reporting, or hybrid global-local privacy models may be required to bound adversarial influence (Zhan et al., 30 Jun 2025, Hsu et al., 6 Mar 2025).
  • Defensive metrics beyond distance: Aggregators should incorporate loss consistency, higher-moment statistics, or cryptographic attestation, and track anomalous behavior across rounds rather than only per-round Euclidean statistics (Joshi et al., 2023, Fang et al., 2019).
  • Federated Unlearning under Attack: Frameworks must guarantee that unlearning eliminates not only direct dependencies on malicious client contributions but also subtle residual biases (Wang et al., 29 Jan 2025).
  • Scalable and verifiable defenses: Blockchain, decentralized, or multiparty verification mechanisms may help reduce computation and provide transparent, audit-ready aggregation.

A plausible implication is that the attack–defense arms race in distributed learning is far from settled: attacks that exploit statistical, temporal, or protocol-level gaps in assumptions will continue to pressure the design of future federated, private, and collaborative learning systems.

7. Comparative Summary of Attack Efficacy and Defenses

| Attack/Defense | Knowledge Required | Main Strategy | Effectiveness Against Robust Aggregation | Defense Limitations |
|---|---|---|---|---|
| Directed-optimization (Fang et al., 2019) | Full or partial | Oppose natural update direction | Raises error from 11% → 75% (Krum, MNIST) | Fails if aggregation is fooled |
| PoisonedFL (Xie et al., 2024) | Minimal | Multi-round sign-aligned bias | Breaks 8 defenses, to ~90% error | Evades round-level anomaly checks |
| MPAF (Cao et al., 2022) | No genuine data | Base-model drag via fake clients | Random-guess accuracy with 1–10% fakes | Not blocked by norm clipping |
| DISBELIEVE (Joshi et al., 2023) | Benign cluster stats | In-cluster adversarial shift | Large AUC drop, bypasses outlier defenses | Distance-only methods break down |
| FedSA (Pan et al., 22 May 2025) | Parameter access | Sliding-mode to tune attack precision | Guarantees arbitrary accuracy reduction | Stealth same as honest updates |
| FL-WBC (Sun et al., 2021) | Client-side stats | Masking persistently “hidden” effects | Recovers clean accuracy in ≤5 rounds | Relies on per-coordinate Hessians |
| PSO-Fed (Lari et al., 2024) | Controlled sharing | Partial parameter transmission | Lower MSE across attack strengths | Slows convergence, protocol tuning |
| UnlearnGuard (Wang et al., 29 Jan 2025) | Server-side history | Predictive update filtering in FU phase | Error rates close to retraining from scratch | High memory, extra comm needed |

In summary, local model poisoning exposes fundamental weaknesses in current distributed learning algorithms and their robustification strategies. Defenses must adapt to sophisticated, colluding, and temporally persistent adversarial behaviors, and the ongoing development of scalable, adaptive, and theoretically-grounded mitigations is a core challenge for federated, private, and collaborative machine learning (Fang et al., 2019, Xie et al., 2024, Cao et al., 2022, Joshi et al., 2023, Zhan et al., 30 Jun 2025, Pan et al., 22 May 2025, Sun et al., 2021, Lari et al., 2024, Wang et al., 29 Jan 2025, Wang et al., 2022, Russo et al., 2021).
