- The paper introduces DuFIA, an adversarial framework that manipulates mid-layer feature importance by fusing spatial and frequency signals.
- It employs integrated gradients and DCT-based perturbations under L∞ constraints to achieve superior cross-model attack transferability.
- Experimental results show DuFIA outperforms baseline methods with enhanced robustness and nearly imperceptible perturbations (e.g., PSNR 33.42 dB).
Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detectors
Introduction
The proliferation of high-fidelity generative models, including generative adversarial networks (GANs) and diffusion models (DMs) such as ProGAN, StyleGAN2, CycleGAN, and Stable Diffusion, has necessitated the development of robust AI-generated image (AIGI) detectors. However, the security of these detectors under adversarial conditions remains comparatively underexplored. The paper "Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector" (2511.15571) introduces DuFIA, an adversarial framework designed to defeat AIGI detectors by targeting transferable, semantically critical features across both the spatial and frequency domains. The approach extends the paradigm of intermediate-level adversarial attacks (ILA) by adding domain fusion to enhance cross-model attack transferability.
Methodology
DuFIA operates on the key insight that AIGI detectors, while achieving impressive detection accuracy in clean conditions, are vulnerable to adversarial perturbations that exploit transferable feature representations. The method comprises several distinct components:
Dual-Domain Feature Analysis
The frequency domain offers a complementary perspective that is fundamental to AIGI detection. Synthetic and real images frequently exhibit divergent spectral characteristics, which modern detectors exploit. DuFIA’s use of spectrum saliency maps (SSMs) reveals two critical phenomena: (1) frequency perturbations reduce the spectral discrimination between real and synthetic images, and (2) different AIGI detectors rely on disparate frequency bands. Randomized perturbation in the frequency domain therefore makes the attack less specific to any single model and increases its transferability.
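A randomized frequency-domain perturbation of this kind can be sketched as follows. This is a minimal illustration, not the paper's implementation: the multiplicative Gaussian mask, the parameter names `eps` and `sigma`, and the helper names `dct_matrix` and `random_frequency_perturb` are all assumptions made for the example. The only elements taken from the text are the use of the DCT and the L∞ constraint.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale to keep C orthonormal
    return c


def random_frequency_perturb(x, x_clean, eps, sigma, rng):
    """Jitter the DCT spectrum of `x`, then project into the L-inf ball around `x_clean`.

    Hypothetical helper: the Gaussian rescaling mask is an assumed choice.
    """
    h, w = x.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    spectrum = ch @ x @ cw.T                                  # forward 2-D DCT
    mask = 1.0 + sigma * rng.standard_normal(spectrum.shape)  # random per-frequency rescaling
    x_freq = ch.T @ (spectrum * mask) @ cw                    # inverse 2-D DCT
    x_proj = np.clip(x_freq, x_clean - eps, x_clean + eps)    # enforce the L-inf budget
    return np.clip(x_proj, 0.0, 1.0)                          # keep pixels in [0, 1]
```

For a single-channel image `x` with values in [0, 1], a call such as `random_frequency_perturb(x, x, eps=8/255, sigma=0.1, rng=np.random.default_rng(0))` returns an image that differs from `x` by at most `eps` per pixel. Drawing a fresh mask at every attack iteration is what would keep the perturbation from overfitting to the frequency bands of one detector.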
Figure 2: Mean spectrum saliency maps across AIGI detectors, demonstrating how frequency perturbations homogenize decision-relevant spectral features.
Experimental Evaluation
DuFIA was benchmarked against six leading adversarial methods (including FGSM, PGD, Carlini & Wagner, MI-FGSM, and FIA) on a comprehensive dataset spanning GANs, DMs, deepfakes, and varied AIGI detector architectures. Notably:
- DuFIA consistently yields the lowest post-attack accuracy across all detectors (on GAN-based, deepfake, and diffusion-generated images), attesting to superior cross-model transferability.
- In black-box scenarios, DuFIA reduces mean detector accuracy to 0.22 (UnivFD as source), compared to 0.386 for MI-FGSM and 0.399 for FIA.
- Perceptual quality metrics indicate that the perturbations are nearly imperceptible: DuFIA achieves the highest PSNR (33.42 dB), the highest SSIM (0.881), and the lowest LPIPS (0.062).
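To put the PSNR figure in context, the metric can be computed as below. This is the standard definition of PSNR, written as a small sketch; the function name `psnr` and the assumption that images are scaled to [0, 1] are choices made for this example and are not taken from the paper.

```python
import numpy as np


def psnr(clean: np.ndarray, adversarial: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a less visible perturbation."""
    diff = clean.astype(np.float64) - adversarial.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Under this definition, a perturbation that shifts every pixel by exactly 8/255 gives about 30.07 dB. If the reported value uses the same definition on 8-bit images, 33.42 dB corresponds to a root-mean-square error of roughly 5.4 intensity levels out of 255.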
Robustness to Real-World Degradations
DuFIA adversarial samples maintain their attack efficacy under common image degradations (JPEG compression, Gaussian blur, additive noise):
- Across all post-processing variants, DuFIA’s attacked images evade detection more reliably (lowest detector accuracy), underscoring its practical robustness.
- For instance, under heavy JPEG compression (Q=30), the accuracy of the RINE detector drops to 0.551, lower than under any of the compared baselines.
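A robustness check of this kind can be sketched as a small evaluation loop: degrade each adversarial image, run the detector, and measure the remaining accuracy. Everything below is an illustrative assumption rather than the paper's protocol. The `detector` argument is a hypothetical callable that returns a class label, images are single-channel arrays in [0, 1], and JPEG compression is omitted because it needs an image codec.

```python
import numpy as np


def add_gaussian_noise(img, std, rng):
    """Additive Gaussian noise, clipped back to the valid [0, 1] pixel range."""
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D image, using edge padding."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()                       # normalize so brightness is preserved
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def accuracy_under_degradation(detector, images, labels, degrade):
    """Fraction of degraded images that `detector` still labels correctly."""
    predictions = [detector(degrade(img)) for img in images]
    return float(np.mean(np.array(predictions) == np.array(labels)))
```

In this setup, a lower value returned by `accuracy_under_degradation` on adversarial images means the attack survived the degradation better, which matches how the results above are reported.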
Ablation Study
Systematic ablation confirms that joint spatial-frequency perturbation offers a notable improvement over single-domain attacks, both in cross-model transferability (lower detector accuracy) and in robustness. Frequency-only and spatial-only strategies both underperform their fused counterpart.
Implications and Future Directions
DuFIA exposes fundamental limitations of current AIGI detectors against transferable, dual-domain adversarial attacks. The fusion of spatial and spectral feature importance enables perturbations to remain effective even under severe real-world degradations, a property essential for circumventing forensic scrutiny of synthetic visual media. The methodology shifts the focus from output-based attacks toward mid-layer feature importance manipulation, with domain-aware fusion used to improve transferability.
Theoretical implications include the need for detector architectures less reliant on either spatial or spectral domain features exclusively. Practically, deploying robust AIGI detection now necessitates resilience against both spatial and frequency-aware adversaries. Prospective avenues include adaptive and data-driven fusion schemes and expansion of DuFIA-type attacks into multimodal (e.g., text-image) synthetic content detection.
Conclusion
The Dual-domain Feature Importance Attack (DuFIA) constitutes a significant advancement in adversarial strategies targeting AI-generated image detectors. By leveraging fused spatial and frequency domain feature importance, the framework achieves superior cross-model transferability while keeping perturbations nearly imperceptible. The experiments substantiate its effectiveness against diverse detectors and under varying degradation scenarios. The long-term implication is a call for fundamentally new AIGI detector architectures and countermeasures capable of withstanding sophisticated adversarial methods that exploit complementary domains.