Deep Learning-Based Attacks
- Deep learning-based attacks are techniques that exploit neural network vulnerabilities using adversarial examples, poisoning, and side-channel approaches.
- Methodologies span gradient-based, black-box, and physical attacks, achieving high success rates and demonstrating transferability across various domains.
- Defense strategies such as adversarial training, certified defenses, and input transformations are evolving to counter these complex threats.
Deep learning-based attacks are a diverse class of offensive techniques that exploit or subvert systems deploying deep neural networks by leveraging their characteristic vulnerabilities, specifically their sensitivity to carefully crafted or maliciously modified inputs, models, or training pipelines. These attacks have evolved rapidly, targeting a wide range of applications including computer vision, natural language processing, wireless communication, quantum cryptography, recommender systems, mobile apps, and physical security domains. The methodologies span both inference-time (evasion) and training-time (poisoning, backdoor injection) vectors, with attack power scaling from white-box access (full model parameters and gradients) to highly restricted black-box or even decision-only query scenarios. Below, we delineate the principal categories, mechanisms, representative results, and current research directions for deep learning-based attacks.
1. Taxonomy of Deep Learning-Based Attacks
Deep learning-based attacks are classified by their target, attacker capability, and attack phase:
- Adversarial example attacks: Crafting a perturbation δ for an input x so that the model misclassifies x + δ while keeping ‖δ‖ small. These can be white-box (gradient-based: FGSM, PGD, CW) or black-box (substitute-model, query-based) (Wang et al., 2024, He et al., 2019, Deng et al., 2022, Cao et al., 2021, Nguyen et al., 2017, Abdukhamidov et al., 2022).
- Data poisoning and backdoor/Trojan attacks: Training-time injection of crafted data or triggers, causing persistent targeted or broad-spectrum model misbehavior at inference (Wang et al., 2024, He et al., 2019, Huang et al., 2021, Costales et al., 2020).
- Model extraction and model inversion: Stealing proprietary model logic or training-set secrets by exploiting access to prediction queries (He et al., 2019, Wang et al., 2024).
- Physical or side-channel attacks: Exploiting non-idealities in sensing (e.g., video compression, RF, cryptosystems) to bypass or compromise deep models (Chang et al., 2023, Ma et al., 12 Dec 2025, Sadeghi et al., 2018, Luo et al., 2021, Huang et al., 2020).
- Attacks on interpretability/explainability mechanisms: Deceiving both model predictions and their interpretation layers to evade detection or analysis (Abdukhamidov et al., 2022).
- Hybrid, application-specific attacks: Targeting domains such as Android apps (Huang et al., 2022, Deng et al., 2022), DNS/network intrusion detectors (Mathews et al., 2022), quantum cryptography (Lejeune et al., 2024), and side-channel traces (Luo et al., 2021).
This typology underscores the cross-domain applicability and adaptive technical scope of modern deep-learning-based attacks.
2. Methodologies and Attack Mechanisms
Gradient-Based Adversarial Attacks use knowledge of the model’s loss landscape to maximize misclassification with minimal perturbation. Key algorithms:
- Fast Gradient Sign Method (FGSM): A single-step attack that sets x_adv = x + ε · sign(∇_x L(θ, x, y)), perturbing each input dimension by ε along the sign of the loss gradient.
- Projected Gradient Descent (PGD): Iteratively applies FGSM-style updates with projection onto an ε-ball around the original input.
- Carlini-Wagner (CW) Attack: Solves min over δ of ‖δ‖_p + c · f(x + δ), where f is a misclassification objective, tightly controlling distortion (Nguyen et al., 2017, Wang et al., 2024).
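As a concrete illustration, the two gradient-based attacks above can be sketched in NumPy against a toy logistic-regression "model" (the model, weights, and ε values here are illustrative assumptions, not taken from the cited works):

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One FGSM step: x' = x + eps * sign(grad_x L), clipped to the valid input range."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # sigmoid prediction
    grad_x = (p - y) * w                      # input gradient of the cross-entropy loss
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def pgd(x, y, w, b, eps, alpha, steps):
    """PGD: iterated FGSM steps of size alpha, projected back onto the eps-ball (L_inf)."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = fgsm(x_adv, y, w, b, alpha)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection onto the eps-ball
    return np.clip(x_adv, 0.0, 1.0)
```

The same structure carries over to deep networks: only the gradient computation changes, typically delegated to an autodiff framework.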
Black-Box and Decision-Based Attacks:
- Surrogate/transfer-model approach: Attackers query the target model to build a substitute, then transfer white-box perturbations (Cao et al., 2021, Deng et al., 2022, Sadeghi et al., 2018).
- Decision-based, sparse perturbations: Only top-1 label output is needed. Evolutionary or combinatorial optimization finds minimal ℓ0-norm attacks (e.g., SparseEvo) (Vo et al., 2022).
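A toy sketch of the decision-only setting follows; the `classify` oracle and the greedy search below are illustrative stand-ins, far simpler than SparseEvo's evolutionary search:

```python
import numpy as np

def classify(x):
    """Stand-in black-box oracle: the attacker observes only a top-1 label."""
    return int(x.sum() > 5.0)

def greedy_sparse_attack(x):
    """Decision-based sparse attack sketch: zero out high-magnitude coordinates
    one at a time until the top-1 label flips, tracking the l0 cost."""
    original = classify(x)
    x_adv = x.copy()
    changed = []
    for i in np.argsort(-np.abs(x)):          # most influential coordinates first
        if classify(x_adv) != original:       # stop as soon as the label flips
            break
        x_adv[i] = 0.0
        changed.append(int(i))
    return x_adv, changed
```

The attacker's cost model is visible here: each loop iteration spends one query, and `len(changed)` is the ℓ0 size of the final perturbation.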
Physical and Protocol-Level Attacks:
- Physically realizable attacks: Attacks are implemented in the physical world, e.g., via projected light patterns (NetFlick adversarial flicker) (Chang et al., 2023).
- Side-channel and RF attacks: Injecting perturbations into analog waveforms to manipulate deep classifiers directly at the signal level (Ma et al., 12 Dec 2025, Sadeghi et al., 2018, Luo et al., 2021).
- Quantum cryptography: Deep RNNs process measurement records from continuous quantum measurements, inferring secret keys with high accuracy at minimal protocol disturbance (Lejeune et al., 2024).
Interpretability/Explanation Attacks:
- Jointly optimizing for prediction error and preservation of explanation consistency, deceiving both classifier and interpretation maps (AdvEdge and AdvEdge+) (Abdukhamidov et al., 2022).
Poisoning and Model-Data Attacks:
- Optimization-based poisoning manipulates the loss landscape to maximize downstream item promotion or backdoor success (e.g., NeuMF recommender poisoning, backdoor weight patching) (Huang et al., 2021, Costales et al., 2020).
- Live Trojan attacks directly patch DNN weights at runtime, achieving targeted behavior with minimal detectable modification (Costales et al., 2020).
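The training-time injection idea behind these poisoning attacks can be sketched as follows (the flat feature layout, trigger pattern, and poisoning rate are illustrative assumptions, not the specific constructions of the cited attacks):

```python
import numpy as np

def poison_dataset(X, y, trigger_idx, trigger_value, target_label, rate, seed=0):
    """Backdoor-poisoning sketch: stamp a fixed trigger pattern onto a fraction
    `rate` of training samples and relabel them to the attacker's target class."""
    rng = np.random.default_rng(seed)
    Xp, yp = X.copy(), y.copy()
    poisoned = rng.choice(len(X), size=int(rate * len(X)), replace=False)
    for i in poisoned:
        Xp[i, trigger_idx] = trigger_value    # overwrite the trigger features
        yp[i] = target_label                  # targeted mislabel
    return Xp, yp, poisoned
```

A model trained on (Xp, yp) tends to behave normally on clean inputs while mapping any trigger-stamped input to `target_label`, which is what makes backdoors hard to detect from held-out clean accuracy alone.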
3. Application Domains and Practical Demonstrations
Deep learning-based attacks have been empirically validated in the following domains:
- Computer Vision (CV): Image classification, object recognition, and video compression. Digital and physical attacks can cause significant accuracy and quality degradation, even under operational constraints (e.g., PSNR drops, high attack success rates) (Cao et al., 2021, Chang et al., 2023, Vo et al., 2022).
- Wireless and IoT Security: Radio-frequency fingerprint identification (RFFI), modulation classification, and DNS/NIDS evasion. Universal and per-sample perturbations achieve >95% misclassification with extremely low perturbation energy, far exceeding classical jamming in effectiveness (Ma et al., 12 Dec 2025, Sadeghi et al., 2018, Mathews et al., 2022).
- Autonomous Driving: Physical attacks on traffic signs, LiDAR, radar; cyberattacks on OTA updates. End-to-end attacks result in large steering deviations and nearly perfect targeted misclassification in real scenarios (Deng et al., 2021).
- Mobile Apps and Embedded Models: Extraction and attack of TFLite/PyTorch models on Android, using grey-box and semantic black-box strategies. Employing transfer-learning surrogates, over 71% of real-world apps were found vulnerable to practical attacks (Huang et al., 2022, Deng et al., 2022).
- Quantum Cryptography: Deep RNN-based side-channel attacks on BB84 QKD protocols, achieving 86.1% key inference accuracy with marginal QBER increase (Lejeune et al., 2024).
- Privacy and Explainability: Recovery of sensitive content (e.g., through envelopes) and deception of interpretation systems, illuminating new classes of threat (Huang et al., 2020, Abdukhamidov et al., 2022).
4. Impact Metrics and Quantitative Results
Attack efficiency and severity are measured by:
- Attack Success Rate (ASR): Proportion of inputs successfully misclassified, often exceeding 90% in white-box settings or with strong transferability (66.6% avg. for black-box app attacks (Cao et al., 2021); >95% for UAPs on RF and up to 86% for physical video attacks (Ma et al., 12 Dec 2025, Chang et al., 2023)).
- Perturbation Norms: ℓ0, ℓ2, and ℓ∞ metrics, typically bounded by imperceptibility thresholds (e.g., ε = 8/255 or 20/255 for images, ε = 0.2 in other modalities).
- Resource and Side-Effect Costs: query budgets (≲10^4 for decision-based attacks), PSNR degradation in dB for video attacks, and protocol disturbance (e.g., a ~2.6% QBER penalty for QKD key inference).
- Deployment Reach: fraction of fielded systems affected (e.g., 47.35% of real-world CV apps broken under adaptive black-box attacks).
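These headline metrics reduce to simple computations; a minimal sketch (the function names are ours, not from the cited papers):

```python
import numpy as np

def attack_success_rate(clean_preds, adv_preds):
    """ASR: fraction of inputs whose predicted label changed under attack."""
    clean_preds, adv_preds = np.asarray(clean_preds), np.asarray(adv_preds)
    return float(np.mean(clean_preds != adv_preds))

def perturbation_norms(x, x_adv):
    """Size of delta = x_adv - x under the usual l0, l2, and l_inf metrics."""
    d = (np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)).ravel()
    return {
        "l0": int(np.count_nonzero(d)),    # how many coordinates changed
        "l2": float(np.linalg.norm(d)),    # overall energy of the change
        "linf": float(np.max(np.abs(d))),  # worst single-coordinate change
    }
```

Note that this ASR variant counts any label change as success (untargeted); targeted ASR would instead compare `adv_preds` against the attacker's chosen label.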
These metrics demonstrate that deep learning-based attacks remain potent even under severe information restrictions, minimal access, and strong resource constraints.
5. Defense Strategies and Mitigation
Countermeasures span both proactive and reactive strategies:
- Adversarial training: Incorporating worst-case perturbed samples into the training loop, raising robust accuracy at the expense of clean accuracy and compute (Wang et al., 2024, Ma et al., 12 Dec 2025, Deng et al., 2021).
- Certified and randomized defenses: PixelDP, convex outer approximation, randomized smoothing, and network regularizations target theoretical robustness (Wang et al., 2024, Deng et al., 2021).
- Input transformations: JPEG compression, feature squeezing, and custom denoisers can partially suppress straightforward attacks (Deng et al., 2021, Deng et al., 2022).
- Detection and anomaly monitoring: Ensemble interpretation detectors for explanation attacks, SVM/statistical screening for fake user detection in recommender poisoning, anomaly detectors for spectrum inputs (Abdukhamidov et al., 2022, Huang et al., 2021, Ma et al., 12 Dec 2025).
- Model protection and deployment-hardening: Obfuscation, encryption, secure enclaves, API-hardening, and rate-limiting defend against extraction/model-theft and unlimited adversarial querying (Huang et al., 2022, Deng et al., 2022).
- Mitigating physical attacks: Hardware redundancy, sensor fusion, and physical shielding disrupt physically realizable attacks in autonomous and IoT contexts (Deng et al., 2021, Chang et al., 2023).
- Domain-specific countermeasures: Envelope design for privacy, mixup/data augmentation in side-channel analysis to bolster resistance (Huang et al., 2020, Luo et al., 2021).
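The first of these countermeasures, adversarial training, can be sketched end-to-end on a toy logistic-regression model (the data, step sizes, and single-step FGSM inner loop below are illustrative assumptions; production schemes typically use multi-step PGD):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_batch(X, y, w, b, eps):
    """Inner maximization: one signed-gradient step per sample."""
    p = sigmoid(X @ w + b)
    grad_X = (p - y)[:, None] * w                 # per-sample input gradient of BCE
    return np.clip(X + eps * np.sign(grad_X), 0.0, 1.0)

def adversarial_train(X, y, eps=0.1, lr=0.5, epochs=300, seed=0):
    """Min-max training loop: fit the model on freshly crafted adversarial examples."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(scale=0.1, size=X.shape[1]), 0.0
    for _ in range(epochs):
        X_adv = fgsm_batch(X, y, w, b, eps)       # attack the current model
        p = sigmoid(X_adv @ w + b)
        w -= lr * X_adv.T @ (p - y) / len(y)      # outer minimization (gradient step)
        b -= lr * float(np.mean(p - y))
    return w, b
```

The clean-accuracy cost mentioned above arises because the model effectively fits worst-case shifted data rather than the clean distribution.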
No single defense eliminates all deep learning-based attacks, and most effective approaches combine multiple mechanisms matched to the threat model and operational environment.
6. Trends, Limitations, and Open Research Problems
- Transferability and black-box potency: Modern attacks exploit transferability, universal perturbations, and proxy models to bypass limited access restrictions in practical deployments (Cao et al., 2021, Huang et al., 2022, Ma et al., 12 Dec 2025).
- Real-World Deployment Gaps: Standard academic models are less indicative of real-system vulnerability; practical attacks require adaptation to quantization, hidden I/O, and proprietary frameworks (Deng et al., 2022).
- Physical-World/Protocol Attacks: Empirical evidence highlights that attacks can be realized under real environmental constraints (lighting, air transmission, device synchronization) (Chang et al., 2023, Ma et al., 12 Dec 2025, Luo et al., 2021).
- Hybrid, multi-layer, and explanation attacks: Emerging research targets the interpretability stack, federated learning, and automation of defense strategies, with questions about the robustness–accuracy trade-off and interpretability under adversarial manipulation (Abdukhamidov et al., 2022, Wang et al., 2024).
- Automated and scalable defenses: The field is moving toward integrated, automated security architectures, zero-trust methodologies, and formal certification of deep models, especially as models and applications scale (Wang et al., 2024, Deng et al., 2021).
A persistent research focus remains on quantifying human-perceptual imperceptibility, lowering the cost of robust training and detection, and securing models across both digital and physical threat surfaces.
7. Representative Case Studies
- Black-box Transfer Attacks on Mobile Apps: Substitute models trained via API logging and public datasets yield an average ASR of 66.6% on diverse, real-world apps, markedly exceeding prior methods (Cao et al., 2021).
- Universal, Decision-based Sparse Attacks: The SparseEvo algorithm achieves 99% untargeted ASR on ImageNet within 5,000 queries, with highly sparse perturbations (only a small fraction of pixels changed), demonstrating that even decision-only black-box access is not a sufficient defense (Vo et al., 2022).
- Practical Physical Attacks on Video Compression: NetFlick adversarial flicker attacks achieve 92-98% ASR digitally and up to 86% with universal, online physical perturbations, drastically degrading PSNR and bitrates (Chang et al., 2023).
- Trojan Injection by Live Weight Patching: Trojaned DNNs can be realized in-memory at runtime, with minimal patch size and high trigger stealthiness, showcasing feasibility for on-device and cloud deployments (Costales et al., 2020).
- Attack on Quantum Key Distribution: A deep RNN-based continuous measurement scheme allows an eavesdropper to infer 86.1% of the sifted key in BB84 QKD, at only a ~2.6-percentage-point QBER penalty, rivaling the optimal quantum cloner (Lejeune et al., 2024).
These examples underscore both the efficacy and the adaptability of deep learning-based attacks across technical domains.
References:
(Costales et al., 2020, Cao et al., 2021, Vo et al., 2022, Chang et al., 2023, Sadeghi et al., 2018, Lejeune et al., 2024, Mathews et al., 2022, Huang et al., 2020, Abdukhamidov et al., 2022, Ma et al., 12 Dec 2025, Luo et al., 2021, Deng et al., 2021, Nguyen et al., 2017, Huang et al., 2021, Wang et al., 2024, Deng et al., 2022, Huang et al., 2022, He et al., 2019).