Data Poisoning in Deep Learning: A Survey

Published 27 Mar 2025 in cs.CR and cs.AI | (2503.22759v1)

Abstract: Deep learning has become a cornerstone of modern artificial intelligence, enabling transformative applications across a wide range of domains. As the core element of deep learning, the quality and security of training data critically influence model performance and reliability. However, during the training process, deep learning models face the significant threat of data poisoning, where attackers introduce maliciously manipulated training data to degrade model accuracy or lead to anomalous behavior. While existing surveys provide valuable insights into data poisoning, they generally adopt a broad perspective, encompassing both attacks and defenses, but lack a dedicated, in-depth analysis of poisoning attacks specifically in deep learning. In this survey, we bridge this gap by presenting a comprehensive and targeted review of data poisoning in deep learning. First, this survey categorizes data poisoning attacks across multiple perspectives, providing an in-depth analysis of their characteristics and underlying design princinples. Second, the discussion is extended to the emerging area of data poisoning in LLMs(LLMs). Finally, we explore critical open challenges in the field and propose potential research directions to advance the field further. To support further exploration, an up-to-date repository of resources on data poisoning in deep learning is available at https://github.com/Pinlong-Zhao/Data-Poisoning.

Abstract PDF Upgrade to Chat

Summary

The paper establishes a comprehensive taxonomy of data poisoning attacks by categorizing them based on objectives, stealth, scope, and impact.
The paper details various algorithmic approaches—including label flipping, bilevel optimization, and generative attacks—to evaluate vulnerabilities in deep learning models.
The paper underscores the significance of securing large language models and outlines future research directions to enhance model resilience and detection methods.

Data Poisoning in Deep Learning: A Survey

Introduction

The concept of data poisoning has emerged as a significant and pressing threat to the integrity of deep learning systems, particularly as these systems become ever more integrated into critical applications across various domains. This survey provides an in-depth examination of data poisoning within the context of deep learning, laying out a comprehensive analysis of attack methodologies, potential vulnerabilities, and the implications of these attacks on model reliability and security. The paper moves beyond general discussions to focus specifically on poisoning attacks directed at deep learning models, which distinctly differ from other forms of machine learning due to their reliance on large datasets and complex architectures.

Data Poisoning Taxonomy

This survey establishes a detailed taxonomy of data poisoning attacks by categorizing them based on their distinctive mechanisms and objectives. The taxonomy is based on several dimensions, including attack objective, goal, knowledge, stealthiness, scope, impact, and variability.

Attack Objective: The objectives can vary from label flipping, which alters data labels to mislead model training, to data modification, which manipulates both features and labels to embed malicious behaviors covertly.
Attack Goal: Attacks are designed to achieve either a broad degradation of model accuracy (untargeted) or more precise modifications, such as backdoor insertion that triggers specific behaviors when certain inputs are present.
Attack Knowledge: Depending on the attacker’s access, attacks may be white-box, leveraging complete model knowledge, or black-box, requiring only query access to the model, which dictates the sophistication and feasibility of the attack.
Attack Stealthiness: Attacks are classified by their detectability, with non-stealthy attacks making evident changes to data and stealthy attacks embedding changes that elude detection yet significantly impact model performance.
Attack Scope: This dimension highlights whether an attack targets specific instances or aims to degrade the model's performance across multiple dimensions, classes, or the entire dataset.
Attack Impact: The intended impact ranges from model performance degradation to specifically targeting robustness and fairness, introducing biases that affect model predictions.
Attack Variability: Attacks may be static, with fixed poisoning strategies, or dynamic, evolving over time to adapt to changes in model training or architecture and remaining stealthy.

Data Poisoning Algorithms

The paper outlines various algorithmic approaches, providing detailed insights into their mechanisms and applicability:

Heuristic-based Attacks: Focus on methods like "BadNets" that employ simple strategies, such as adding visible patterns to data, to create backdoors. These strategies, though direct, can be countered through anomaly detection.
Label Flipping Attacks: Exploit incorrect label assignments to disrupt model training. The attack's simplicity yet effectiveness is underscored by its ability to degrade models without complex manipulations.
Feature Space Attacks: Utilize more subtle perturbations in the feature space to mislead the model without altering labels, offering a balance of stealth and effectiveness.
Bilevel Optimization Attacks: These sophisticated methods optimize poisoning strategies at two levels, greatly enhancing attack effectiveness at the cost of higher computational resources.
Influence-based Attacks: These rely on identifying and altering the most influential training samples, making them particularly effective in targeted poisoning scenarios.
Generative Attacks: Leverage generative models like GANs to produce highly realistic poisoned data, complicating detection efforts while remaining adaptable across various scenarios.

Data Poisoning in LLMs

The survey extends its examination to LLMs, which are increasingly vulnerable to data poisoning across multiple stages of their operation, such as pre-training and fine-tuning. The paper discusses specific strategies for poisoning these stages, highlighting the complexity and nuanced nature of such attacks, especially as LLMs become more integrated into AI-driven applications.

Future Directions

The paper suggests several avenues for further research, emphasizing the need to develop more effective and less detectable poisoning strategies, improve the robustness of existing models, and explore poisoning in emerging AI architectures such as multimodal systems. This includes calls for comprehensive benchmarking frameworks to standardize attack evaluations and foster reproducibility in this evolving field.

Conclusion

Data poisoning is a multifaceted threat that continues to challenge the reliability and security of deep learning models. This survey consolidates current knowledge, delineates potential vulnerabilities, and charts a path forward for both advancing our understanding and enhancing defenses. By synthesizing diverse attack methodologies and highlighting their implications, the paper serves as a pivotal reference point for future inquiries into safeguarding AI systems.

Markdown Report Issue