Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector

Published 19 Nov 2025 in cs.CV and cs.CR | (2511.15571v1)

Abstract: Recent AI-generated image (AIGI) detectors achieve impressive accuracy under clean condition. In view of antiforensics, it is significant to develop advanced adversarial attacks for evaluating the security of such detectors, which remains unexplored sufficiently. This letter proposes a Dual-domain Feature Importance Attack (DuFIA) scheme to invalidate AIGI detectors to some extent. Forensically important features are captured by the spatially interpolated gradient and frequency-aware perturbation. The adversarial transferability is enhanced by jointly modeling spatial and frequency-domain feature importances, which are fused to guide the optimization-based adversarial example generation. Extensive experiments across various AIGI detectors verify the cross-model transferability, transparency and robustness of DuFIA.


Summary

The paper introduces DuFIA, an adversarial framework that manipulates mid-layer feature importance by fusing spatial and frequency signals.
It employs integrated gradients and DCT-based perturbations under L∞ constraints to achieve superior cross-model attack transferability.
Experimental results show DuFIA outperforms baseline methods with enhanced robustness and nearly imperceptible perturbations (e.g., PSNR 33.42 dB).


Introduction

The proliferation of high-fidelity generative models, including GANs and diffusion models (DMs) such as ProGAN, StyleGAN2, CycleGAN, and Stable Diffusion, has driven the development of robust AI-generated image (AIGI) detectors. However, the security of these detectors under adversarial conditions remains comparatively underexplored. The paper "Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector" (2511.15571) introduces DuFIA, an adversarial framework designed to invalidate AIGI detectors by targeting transferable, forensically important features across both the spatial and frequency domains. The approach extends the paradigm of intermediate-level adversarial attacks (ILA), adding domain fusion to improve cross-model attack transferability.

Methodology

DuFIA operates on the key insight that AIGI detectors, while achieving impressive detection accuracy in clean conditions, are vulnerable to adversarial perturbations that exploit transferable feature representations. The method comprises several distinct components:

Baseline Formulation: The adversarial objective is cast as maximizing the cross-entropy loss with respect to the input image, subject to an $L_\infty$-norm constraint on the perturbation magnitude, following the MI-FGSM approach.
Intermediate Feature Importance: Moving beyond conventional output-based attacks, DuFIA leverages intermediate activations. Feature importance weights $\lambda^i$ are derived via backpropagation not only from adversarial examples but, critically, from the original (unperturbed) image to reduce model-specific adaptation.
Spatial Domain Importance: Integrated gradients (IG) across interpolated inputs are used to compute the spatial feature importance map, emphasizing features robust to model-specific variations.
Frequency Domain Importance: Images are mapped to the frequency domain with the DCT, perturbed by random frequency masking and additive noise, and mapped back with the IDCT in order to probe frequency-domain importance. The gradients of the detector's decision with respect to the DCT coefficients form the basis of the frequency-aware importance.
Fusion Mechanism: The final feature importance is constructed by averaging the spatial and frequency importance maps, $\lambda = (\lambda^{(s)} + \lambda^{(f)}) / 2$, which is then used to weight the mid-layer activations of adversarial examples.
Objective Optimization: Rather than optimizing traditional classification loss, DuFIA maximizes $\sum (\lambda \odot h_t^{adv})$ at every iteration, effectively guiding perturbations along semantically meaningful, domain-agnostic features.
Figure 1: Schematic of the DuFIA pipeline illustrating dual-branch spatial and frequency perturbation, feature importance fusion, and adversarial sample generation guided by mid-layer weighted loss.
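The fusion and optimization steps listed above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the paper's implementation: the toy detector (`W`, `V`), the helper names, the zero baseline for interpolation, and the hyperparameters `eps`, `alpha`, `iters`, and `mu`. In particular, `noisy_probes` is only a placeholder for the frequency branch (it adds plain noise rather than perturbing DCT coefficients), and clipping to the valid pixel range is omitted. The sketch follows the summary's convention of maximizing $\sum (\lambda \odot h_t^{adv})$; whether the weighted sum is maximized or minimized depends on how the importance weights are signed.

```python
import math
import random

# Toy stand-in detector (hypothetical, not from the paper):
# mid-layer features h = W x, logit = sum_j V[j] * tanh(h[j]).
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
V = [1.0, -0.7]

def features(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def feature_grad(x):
    """Gradient of the logit with respect to the mid-layer features at x."""
    return [v * (1.0 - math.tanh(h) ** 2) for v, h in zip(V, features(x))]

def mean_feature_grad(probes):
    """Average the feature gradient over a list of probe inputs."""
    grads = [feature_grad(p) for p in probes]
    return [sum(col) / len(grads) for col in zip(*grads)]

def spatial_probes(x, steps=8):
    """Inputs interpolated from a zero baseline up to x."""
    return [[xi * k / steps for xi in x] for k in range(1, steps + 1)]

def noisy_probes(x, rng, count=8, sigma=0.05):
    """Placeholder for the frequency branch: plain additive noise only."""
    return [[xi + rng.gauss(0.0, sigma) for xi in x] for _ in range(count)]

def objective(x, lam):
    """Importance-weighted sum of mid-layer features."""
    return sum(l * h for l, h in zip(lam, features(x)))

def attack(x, lam, eps=0.1, alpha=0.02, iters=10, mu=1.0):
    """Momentum sign ascent on objective(x_adv, lam) inside an L-inf ball."""
    # For linear features h = W x, the input gradient W^T lam is constant.
    grad = [sum(lam[j] * W[j][i] for j in range(len(W))) for i in range(len(x))]
    l1 = sum(abs(g) for g in grad) or 1.0
    x_adv, momentum = list(x), [0.0] * len(x)
    for _ in range(iters):
        momentum = [mu * m + g / l1 for m, g in zip(momentum, grad)]
        step = [alpha * ((m > 0) - (m < 0)) for m in momentum]
        # Take the sign step, then project back into the eps-ball around x.
        x_adv = [min(max(xa + s, xi - eps), xi + eps)
                 for xa, s, xi in zip(x_adv, step, x)]
    return x_adv

rng = random.Random(0)
x = [0.6, 0.2, 0.9]
lam_s = mean_feature_grad(spatial_probes(x))
lam_f = mean_feature_grad(noisy_probes(x, rng))
lam = [(s + f) / 2.0 for s, f in zip(lam_s, lam_f)]  # fused importance
x_adv = attack(x, lam)
print(objective(x, lam), objective(x_adv, lam))
```

Because the importance weights are computed once from probes of the original input and then held fixed, the update direction does not chase the source model's response to the current adversarial example, which is the intuition the summary gives for reduced model-specific adaptation.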

Dual-Domain Feature Analysis

The frequency domain offers a complementary perspective that is fundamental to AIGI detection. Synthetic and real images often differ in their spectral characteristics, a difference that modern detectors exploit. DuFIA's use of spectrum saliency maps (SSMs) reveals two phenomena: (1) frequency perturbations reduce the spectral discrimination between real and synthetic images, and (2) different AIGI detectors rely on different frequency bands. Randomized perturbation in the frequency domain therefore reduces dependence on any single model's preferred bands and increases attack transferability.

Figure 2: Mean spectrum saliency maps across AIGI detectors, demonstrating how frequency perturbations homogenize decision-relevant spectral features.
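The randomized frequency perturbation discussed above can be sketched as a DCT round trip with a random multiplicative mask and additive noise. This is a hedged illustration, not the paper's exact transform: the function names, the choice to add noise in the pixel domain before the DCT, the uniform mask range controlled by `rho`, and the values of `sigma` and `rho` are all assumptions. The sketch handles square single-channel images only and uses the orthonormal DCT-II.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C times its transpose is I."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(img):
    """2D DCT of a square image: C * img * C^T."""
    c = dct_matrix(len(img))
    return matmul(matmul(c, img), transpose(c))

def idct2(coef):
    """Inverse 2D DCT: C^T * coef * C."""
    c = dct_matrix(len(coef))
    return matmul(matmul(transpose(c), coef), c)

def freq_perturb(img, rng, sigma=0.05, rho=0.5):
    """Add noise, rescale each DCT coefficient by a random factor, invert."""
    noisy = [[p + rng.gauss(0.0, sigma) for p in row] for row in img]
    coef = dct2(noisy)
    masked = [[c * rng.uniform(1.0 - rho, 1.0 + rho) for c in row] for row in coef]
    return idct2(masked)

img = [[(r * 4 + c) / 16.0 for c in range(4)] for r in range(4)]
perturbed = freq_perturb(img, random.Random(0))
```

Averaging detector gradients over several such perturbed copies would give an importance estimate that is less tied to the exact frequency bands one detector happens to use.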

Experimental Evaluation

Adversarial Attack Performance

DuFIA was benchmarked against leading adversarial baselines, including FGSM, PGD, Carlini & Wagner, MI-FGSM, and FIA, on a dataset spanning GANs, DMs, and deepfakes, across varied AIGI detector architectures. Notably:

DuFIA consistently yields the lowest post-attack accuracy across all detectors (on GAN-generated, deepfake, and diffusion-generated images), attesting to superior cross-model transferability.
In black-box scenarios with UnivFD as the source model, detectors attacked by DuFIA reach a mean accuracy of 0.22, compared to 0.386 for MI-FGSM and 0.399 for FIA.
Perceptual quality metrics indicate that the perturbations are hard to see: DuFIA achieves the highest PSNR (33.42 dB), the highest SSIM (0.881), and the lowest LPIPS (0.062) among the compared attacks.
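For readers unfamiliar with the PSNR figure quoted above, the standard definition is $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below computes it for 8-bit images stored as nested lists. The function name and the uniform shift of 8 gray levels are illustrative assumptions and do not reflect the perturbation budget used in the paper.

```python
import math

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(reference, distorted)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

clean = [[100, 120], [140, 160]]
shifted = [[p + 8 for p in row] for row in clean]  # every pixel moved by 8
value = psnr(clean, shifted)
print(round(value, 2))
```

Since a perturbation bounded by $\epsilon$ in the $L_\infty$ norm has a mean squared error of at most $\epsilon^2$, its PSNR is at least $20 \log_{10}(255 / \epsilon)$ on 8-bit images. Perturbations that do not saturate the bound at every pixel score higher.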

Robustness to Real-World Degradations

DuFIA adversarial samples maintain their attack efficacy under common image degradations (JPEG compression, Gaussian blur, additive noise):

Across all post-processing variants, DuFIA's attacked images evade detection more reliably (lowest detector accuracy), underscoring its practical robustness.
For instance, under heavy JPEG compression (Q=30), the accuracy of the RINE detector on DuFIA samples drops to 0.551, lower than under any of the compared baseline attacks.
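A robustness check of this kind can be sketched as a small evaluation loop: degrade each adversarial image, run the detector, and record the remaining accuracy, where lower accuracy means the attack survived the degradation. The code below is only a toy illustration. The `detector` callable, the brightness-threshold rule, and the sample images are hypothetical, a 3x3 mean filter stands in for Gaussian blur, and JPEG compression is left out because it needs an image codec.

```python
import random

def add_noise(img, sigma, rng):
    """Additive Gaussian noise, clipped to the [0, 1] pixel range."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def box_blur(img):
    """3x3 mean filter with edge replication (stand-in for Gaussian blur)."""
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(px(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
             for c in range(w)] for r in range(h)]

def accuracy_after(detector, images, labels, degrade):
    """Fraction of degraded images that the detector still labels correctly."""
    hits = sum(detector(degrade(img)) == y for img, y in zip(images, labels))
    return hits / len(images)

def detector(img):
    """Hypothetical toy rule: label 1 (fake) if mean brightness exceeds 0.5."""
    flat = [p for row in img for p in row]
    return int(sum(flat) / len(flat) > 0.5)

images = [[[0.9] * 4 for _ in range(4)], [[0.1] * 4 for _ in range(4)]]
labels = [1, 0]
rng = random.Random(0)
acc_blur = accuracy_after(detector, images, labels, box_blur)
acc_noise = accuracy_after(detector, images, labels,
                           lambda im: add_noise(im, 0.05, rng))
```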

Ablation Study

Systematic ablation confirms that joint spatial-frequency perturbation offers notable improvement over isolated domain attacks, both in terms of cross-model accuracy and robustness. Frequency-only and spatial-only strategies underperform compared to their fused counterpart.

Implications and Future Directions

DuFIA exposes fundamental limitations of current AIGI detectors against transferable, dual-domain adversarial attacks. The fusion of spatial and spectral feature importance keeps perturbations effective under the tested real-world degradations (JPEG compression, Gaussian blur, and additive noise), a property that matters for evading forensic scrutiny of synthetic visual media. The methodology shifts the focus from output-based attacks toward mid-layer feature importance manipulation, with domain-aware fusion improving transferability.

Theoretical implications include the need for detector architectures less reliant on either spatial or spectral domain features exclusively. Practically, deploying robust AIGI detection now necessitates resilience against both spatial and frequency-aware adversaries. Prospective avenues include adaptive and data-driven fusion schemes and expansion of DuFIA-type attacks into multimodal (e.g., text-image) synthetic content detection.

Conclusion

The Dual-domain Feature Importance Attack (DuFIA) is a notable advance in adversarial strategies targeting AI-generated image detectors. By fusing spatial and frequency-domain feature importances, the framework achieves superior cross-model transferability while keeping perturbations nearly imperceptible. The experiments substantiate its effectiveness against diverse detectors and under varying degradation scenarios. The long-term implication is a call for new AIGI detector architectures and countermeasures capable of withstanding sophisticated adversarial methods that exploit complementary domains.
