Fluently Lying: Adversarial Robustness Can Be Substrate-Dependent

Published 1 Apr 2026 in cs.CV | (2604.00605v1)

Abstract: The primary tools used to monitor and defend object detectors under adversarial attack assume that when accuracy degrades, detection count drops in tandem. This coupling was assumed, not measured. We report a counterexample observed on a single model: under standard PGD, EMS-YOLO, a spiking neural network (SNN) object detector, retains more than 70% of its detections while mAP collapses from 0.528 to 0.042. We term this count-preserving accuracy collapse Quality Corruption (QC), to distinguish it from the suppression that dominates untargeted evaluation. Across four SNN architectures and two threat models (l-infinity and l-2), QC appears only in one of the four detectors tested (EMS-YOLO). On this model, all five standard defense components fail to detect or mitigate QC, suggesting the defense ecosystem may rely on a shared assumption calibrated on a single substrate. These results provide, to our knowledge, the first evidence that adversarial failure modes can be substrate-dependent.

Authors (2)

Summary

The paper introduces Quality Corruption (QC), an attack failure mode that decouples detection count from accuracy, observed in one hardware-deployable SNN object detector (EMS-YOLO).
It reports that five standard ANN-derived defense components fail to detect or mitigate QC on this model, and points to substrate-specific constraints and membrane dynamics as contributing factors.
Under moderate PGD attacks, mAP collapses from 0.528 to 0.042 while more than 70% of detections survive, underscoring the need for substrate-aware robustness evaluation.

Substrate Dependence of Adversarial Robustness: The Quality Corruption Failure Mode

Introduction

The paper "Fluently Lying: Adversarial Robustness Can Be Substrate-Dependent" (2604.00605) examines adversarial robustness in object detectors from the perspective of computational substrate. It challenges an assumption that underlies adversarial evaluation and defense but had not been measured: that adversarial degradation in accuracy is coupled with a reduction in detection count. The authors identify and analyze an untargeted-attack failure mode, Quality Corruption (QC), that arises in one spiking neural network (SNN) object detector, EMS-YOLO, whose architectural constraints match those required for neuromorphic chip deployment. Under standard PGD attacks, QC preserves the detection count while accuracy collapses, indicating that adversarial failure modes can depend on the computational substrate. The work also reports that ANN-derived defenses fail when transferred to this deployable SNN detector.

Context and Motivation

Adversarial robustness research in object detection has focused predominantly on artificial neural network (ANN) substrates, where untargeted attacks (e.g., PGD, FGSM) have generally been observed to produce a "suppression" failure mode: detection counts and accuracy degrade together. Evaluation metrics, monitoring protocols, and defenses (detection-rate monitoring, count-based alarms, adversarial training, input purification, and attack certification) were built on this coupling, which was assumed rather than measured. With the emergence of SNNs and neuromorphic hardware constraints (binary spikes, accumulate-only arithmetic, and no dense matrix computation), the substrate may diverge significantly from the ANN paradigm, raising the question of whether the count-accuracy coupling, and the defenses built on it, carry over.

Experimental Protocol and Models

The authors evaluate four recent SNN object detectors, chosen to span encoding schemes, neuron types, temporal depths, and hardware deployability. Only EMS-YOLO satisfies all three neuromorphic hardware constraints (binary spikes, accumulate-only arithmetic, no dense matrix computation) and is trained directly end-to-end; the others relax at least one constraint to improve accuracy. Each SNN is paired with an ANN reference (e.g., YOLOv3-tiny, YOLOv8s, YOLOX-S). All experiments are conducted on the COCO 2017 dataset. The test protocol applies white-box PGD attacks (both $\ell_\infty$ and $\ell_2$, with varying budgets), held constant across models, and reports mAP and Detection Rate Reduction (DRR).
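As a rough illustration of the $\ell_\infty$ PGD step in such a protocol, the sketch below runs projected gradient ascent on a toy linear loss in pure Python. The function name `pgd_linf`, the toy loss, the step size `alpha`, and the step count are all assumptions made for illustration; the paper attacks detector losses, not this objective.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(x, grad_fn, eps, alpha, steps):
    """Untargeted l-inf PGD: step along the gradient sign, then project."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        for i in range(len(x_adv)):
            stepped = x_adv[i] + alpha * sign(g[i])
            # Project back into the eps-ball around the clean input.
            stepped = min(max(stepped, x[i] - eps), x[i] + eps)
            # Keep the value inside the valid pixel range [0, 1].
            x_adv[i] = min(max(stepped, 0.0), 1.0)
    return x_adv


# Toy stand-in for a detector loss (assumption): L(x) = w . x, gradient = w.
w = [0.5, -1.0, 2.0, 0.0]


def loss(x):
    return sum(wi * xi for wi, xi in zip(w, x))


def grad(x):
    return list(w)


x_clean = [0.2, 0.5, 0.9, 0.4]
eps = 8 / 255
x_adv = pgd_linf(x_clean, grad, eps=eps, alpha=2 / 255, steps=10)

print("clean loss:", loss(x_clean))
print("adversarial loss:", loss(x_adv))
```

The projection step is what keeps every coordinate within the budget $\varepsilon$ of the clean input, regardless of how many iterations are run.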

Critically, the paper introduces the Quality Corruption Index (QCI):

$\mathrm{QCI} = \text{(relative mAP drop)} - \text{DRR}$

allowing explicit measurement of decoupling between detection count and accuracy.
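A minimal sketch of the QCI arithmetic follows, with both terms expressed in percent. The definition of DRR as the relative drop in detection count is an assumption (the summary does not spell it out), and the detection counts are hypothetical; only the mAP values 0.528 and 0.042 come from the abstract.

```python
def relative_map_drop(map_clean: float, map_adv: float) -> float:
    """Relative mAP drop, in percent of the clean mAP."""
    return 100.0 * (map_clean - map_adv) / map_clean


def detection_rate_reduction(n_clean: int, n_adv: int) -> float:
    """DRR in percent. ASSUMPTION: relative drop in detection count."""
    return 100.0 * (n_clean - n_adv) / n_clean


def qci(map_clean: float, map_adv: float, n_clean: int, n_adv: int) -> float:
    """Quality Corruption Index: relative mAP drop minus DRR."""
    return relative_map_drop(map_clean, map_adv) - detection_rate_reduction(n_clean, n_adv)


# mAP values from the abstract; counts are hypothetical (71% of detections retained).
qc_case = qci(0.528, 0.042, n_clean=1000, n_adv=710)

# Hypothetical suppression case: accuracy and detection count fall together.
suppression_case = qci(0.500, 0.050, n_clean=1000, n_adv=100)

print(f"QC-like case:     QCI = {qc_case:.1f}")
print(f"Suppression case: QCI = {abs(suppression_case):.1f}")
```

With these assumed counts the first case lands near +63, which corresponds to a DRR of roughly 29% and is consistent with more than 70% of detections being retained. When count and accuracy fall together, the two terms cancel and QCI stays near zero.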

Discovery and Characterization of Quality Corruption

In a striking divergence from canonical suppression, EMS-YOLO under moderate PGD perturbation ($\ell_\infty$, $\varepsilon=8/255$) retains over 70% of its detections, yet its mAP collapses from 0.528 to 0.042 (QCI = +63.0). The surviving detections are nearly all wrong in box, class, and confidence: they are hallucinations distributed across irrelevant regions.

Figure 1: Under standard PGD, EMS-YOLO exhibits Quality Corruption: detection count is preserved but prediction accuracy collapses, whereas the ANN baseline exhibits complete suppression.

QC emerges only in EMS-YOLO, the only hardware-deployable SNN tested. It appears under both norm constraints, across attack optimizers (PGD, APGD, CW), and over a range of perturbation budgets. Stronger optimization shifts the failure mode from QC (count-preserving, invisible to count monitoring) back to suppression (count-collapsing, detectable), contrary to the trend observed in ANN-based detectors. The other SNN pipelines and all ANN references consistently yield suppression (QCI $\approx 0$), indicating that QC is neither a generic SNN property nor an artifact of attack design.

Per-image analysis reveals high heterogeneity in QC severity across the dataset, with per-image QCI spanning from -254 to over +1300 under identical perturbation settings.

Figure 2: Distribution of per-image QCI in EMS-YOLO reveals substantial heterogeneity: a majority of images are corruption-dominant (QCI $>0$), but the range is extreme.

Substrate as the Critical Variable

Control experiments rule out backbone, training, encoding, or neuron-type as the determinant; only the computational substrate—i.e., the conjunction of neuromorphic constraints—predicts the emergence of QC. The ANN reference for EMS-YOLO, as well as all non-deployable SNNs and their corresponding ANNs, continue to exhibit classic suppression under identical attacks.

Transfer attacks (black-box, crafted on other models and applied to EMS-YOLO) do not induce QC, demonstrating that gradient information of the deployable pipeline is necessary for this failure mode. Furthermore, by introducing a Focused Membrane Probe (FMP) attack term targeting SNN membrane potentials explicitly, the authors show that membrane dynamics constitute an exploitable attack surface exclusively in the deployable SNN architecture, not in SNNs relaxing hardware constraints.

Systematic Failure of the ANN Defense Toolkit

Applying five standard ANN-derived defense and monitoring techniques to EMS-YOLO under QC yields comprehensive failure:

Detection-Rate Monitoring: Silent, because detection counts are preserved.
Count-Based Alarms: Ineffective, for the same reason.
Input Purification: Shifts QC back to suppression but does not restore accuracy; the corrupted detections are removed rather than corrected.
Adversarial Training: Fails to restore robustness, instead eliminating corrupted detections (transitioning QC to suppression without improving accuracy), unlike behavior observed in non-deployable pipelines.
Empirical $\varepsilon$-Certification: Mis-bounds the worst case, since standard (less optimized) PGD yields more damaging and less detectable outcomes than stronger attacks.

Thus, QC contradicts the assumption that these components are layered and independent: all five rest on a single premise, the count-accuracy coupling, and when that premise fails the whole defense stack fails with it.
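The silence of count-based monitoring under QC can be sketched with a toy alarm. The function `count_alarm`, its 50% threshold, and the detection counts below are hypothetical and chosen only to illustrate the logic; they are not taken from the paper.

```python
def count_alarm(n_clean: int, n_adv: int, max_drop_pct: float = 50.0) -> bool:
    """Fire when the detection count falls by more than max_drop_pct percent."""
    drop_pct = 100.0 * (n_clean - n_adv) / n_clean
    return drop_pct > max_drop_pct


# Hypothetical detection counts (clean, attacked) for the two failure modes.
scenarios = {
    "quality corruption": (1000, 710),  # about 71% of detections retained
    "suppression": (1000, 50),          # detections almost entirely removed
}

alarms = {name: count_alarm(*counts) for name, counts in scenarios.items()}

for name, fired in alarms.items():
    print(f"{name}: alarm fired = {fired}")
```

In this sketch the alarm fires only in the suppression scenario. A monitor of this kind never inspects whether the surviving detections are correct, so a count-preserving collapse passes through it unnoticed.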

Broader Implications

Theoretical and Safety Implications

The work provides evidence that adversarial failure modes can be substrate-dependent and that failure-mode preservation cannot be assumed across substrate transitions (e.g., ANN $\to$ SNN, quantization, pruning, compilation for hardware deployment). By decoupling monitoring signals from inference quality, QC creates a blind spot for existing safety regimes. This has direct relevance to neuromorphic deployment in safety-critical applications (e.g., autonomous driving), where a downstream planner would receive fluently corrupted, yet unmonitored, outputs.

The results suggest that safety and certification practices must audit not just accuracy but also failure-mode preservation explicitly after every substrate transition. QCI, or a similarly sensitive metric, should be introduced as a protocol standard.

Prospects and Open Problems

The findings open several avenues for future research:

QC-Aware Defense: Existing mechanisms are fundamentally insufficient; research must address how to detect and correct count-preserving accuracy collapse, potentially requiring distributional or semantic monitoring.
Generalization to Other Substrates: Whether other architectural substrates (e.g., aggressive quantization, sparsity, custom compilation) can induce similar decoupling remains unresolved.
Failure-Mode Characterization: Beyond accuracy, the pattern (and detectability) of failure should become a core evaluation criterion.
Mechanistic Understanding: The structural cause of QC in hardware-deployable SNN pipelines (e.g., emergent membrane dynamics) remains open for analysis.

Conclusion

This study provides evidence that adversarial failure modes can be substrate-dependent, breaking a key assumption underpinning the adversarial machine learning ecosystem. The Quality Corruption failure mode emerges in one hardware-deployable SNN object detector, EMS-YOLO, under standard untargeted attacks, and evades all five tested monitoring and defense components calibrated on ANN paradigms. These findings call for a re-evaluation of adversarial defense and certification for models deployed on novel substrates, with QCI-type metrics included in future safety assurance.