
Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training

Published 24 Jul 2019 in cs.CV, cs.CR, and cs.LG | arXiv:1907.10764v4

Abstract: We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffer from issues such as label leaking as noted in recent works. Differently, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking the inter-sample relationships into consideration. We conduct analysis on model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets compared with state-of-the-art approaches.

Citations (224)

Summary

  • The paper introduces a feature scattering-based adversarial training method that generates unsupervised, collaborative adversarial examples to overcome label leaking.
  • The paper leverages optimal transport to maximize the feature matching distance between clean and perturbed samples within a bilevel optimization framework.
  • Experimental results on CIFAR10, CIFAR100, and SVHN show significant improvements in adversarial robustness, with CIFAR10 accuracy under PGD attack increasing by 25.6 percentage points over previous methods.

This paper presents an innovative feature scattering-based adversarial training method to enhance model robustness against adversarial attacks. Traditional adversarial training methods utilize a supervised scheme for generating adversarial samples, often encountering issues like label leaking. The proposed approach distinguishes itself by adopting an unsupervised methodology to generate adversarial images via feature scattering in the latent space, effectively circumventing the challenge of label leaking. Moreover, this approach emphasizes collaborative perturbation generation by considering inter-sample relationships, as opposed to treating each sample in isolation.

Main Contributions

The paper makes several contributions to improve adversarial training:

  1. Novel Approach: It introduces a feature-scattering technique for creating adversarial images in an unsupervised, collaborative fashion. This method diverges from the traditional minimax formulation common in adversarial training.
  2. Bilevel Optimization: The research explores an adversarial training formulation that fits within a broader category of bilevel optimization problems.
  3. Robustness Analysis: Through extensive experimentation on various datasets, the paper analyzes the effectiveness of feature scattering in comparison to state-of-the-art adversarial training techniques.
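
The contrast between the two formulations can be written schematically (notation simplified, not taken verbatim from the paper). Standard adversarial training solves a minimax problem whose inner maximization depends on the labels, whereas feature scattering replaces that inner problem with an unsupervised, batch-level objective:

```latex
% Standard adversarial training: label-dependent inner maximization.
\min_\theta \; \mathbb{E}_{(x,y)} \;
  \max_{\|x' - x\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x'),\, y\big)

% Feature scattering: bilevel formulation with an unsupervised inner problem.
\min_\theta \; \sum_i \mathcal{L}\big(f_\theta(x_i^*),\, y_i\big)
\quad \text{s.t.} \quad
\{x_i^*\} = \arg\max_{\{x_i'\}} \;
  \mathcal{D}_{\mathrm{OT}}\big(\mu(\{x_i\}),\, \nu(\{x_i'\})\big)
```

Because the inner maximization over $\mathcal{D}_{\mathrm{OT}}$ involves no labels, label leaking cannot occur, and because it is computed over the whole batch, perturbations are generated collaboratively rather than per sample.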

Methodology

The feature scattering method maximizes the feature matching distance between the empirical distributions of clean and perturbed samples. The optimal transport (OT) distance serves as this metric, with the ground cost defined on features extracted from the data. By maximizing this batch-level distance, the technique preserves inter-sample structure while generating adversarial perturbations, averting the pitfalls associated with label-guided adversarial examples that may deviate from the data manifold.
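
As an illustration of the inner objective, an entropy-regularized (Sinkhorn) OT distance between a batch of clean features and a batch of perturbed features can be sketched as below. The function name, the cosine ground cost, and the hyperparameters are assumptions made for this sketch, not the paper's exact implementation:

```python
import numpy as np

def sinkhorn_ot_distance(f_clean, f_adv, eps=0.1, n_iters=50):
    """Entropy-regularized OT (Sinkhorn) distance between two empirical
    feature distributions with uniform weights (illustrative sketch).

    f_clean, f_adv: (n, d) arrays of per-sample latent features.
    """
    n = f_clean.shape[0]
    # Ground cost: 1 - cosine similarity between all feature pairs.
    fc = f_clean / np.linalg.norm(f_clean, axis=1, keepdims=True)
    fa = f_adv / np.linalg.norm(f_adv, axis=1, keepdims=True)
    C = 1.0 - fc @ fa.T
    K = np.exp(-C / eps)                # Gibbs kernel
    a = np.full(n, 1.0 / n)             # uniform source marginal
    b = np.full(n, 1.0 / n)             # uniform target marginal
    u = np.ones(n)
    for _ in range(n_iters):            # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]     # approximate transport plan
    return float((T * C).sum())         # feature matching distance
```

The inner loop of feature scattering would perturb the inputs so as to *increase* this quantity, e.g. by ascending its gradient with respect to the perturbed images.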

Experimental Results

The efficacy of the proposed approach is validated on benchmark datasets such as CIFAR10, CIFAR100, and SVHN:

  • On CIFAR10, the proposed method achieves 70.5% accuracy under a standard 20-step PGD white-box attack, outperforming prior methods by notable margins (e.g., a 25.6-percentage-point improvement over the Madry et al. adversarial training baseline).
  • Experiments on CIFAR100 and SVHN further consolidate the robustness of the proposed approach, demonstrating substantial improvements in adversarial accuracy against white-box attacks compared to existing models.
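
For context, the 20-step PGD attack used in these evaluations follows the standard projected-gradient recipe: repeated gradient-sign ascent on the loss, followed by projection back onto the epsilon-ball. The toy sketch below applies it to a linear classifier with an analytic gradient; all names and parameter values are illustrative, and the paper's experiments of course attack deep networks, not linear models:

```python
import numpy as np

def pgd_attack(x, y, W, eps=8/255, alpha=2/255, steps=20):
    """Minimal L_inf PGD on a toy linear classifier (illustrative).

    x: (d,) input in [0, 1], y: true label index, W: (k, d) weights.
    Maximizes cross-entropy loss within an eps-ball around x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        logits = W @ x_adv
        p = np.exp(logits - logits.max())
        p /= p.sum()
        onehot = np.zeros_like(p)
        onehot[y] = 1.0
        # Gradient of cross-entropy w.r.t. the input: W^T (p - onehot(y)).
        grad = W.T @ (p - onehot)
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-sign ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv
```

Robust accuracy under this attack is then simply the fraction of test inputs whose adversarial version is still classified correctly.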

Implications and Future Directions

The implications of this research are two-fold. Practically, the feature scattering method facilitates the training of models that are inherently more robust to adversarial attacks without incurring the time and computational resource penalties associated with traditional adversarial training iterations. Theoretically, it opens avenues for leveraging inter-sample features more effectively, encouraging exploration in collaborative perturbation techniques across machine learning domains.

Future research can further refine this unsupervised adversarial sample generation approach, potentially integrating other structural learning paradigms and exploring its applications in various domains beyond image classification, such as object detection and natural language processing. Additionally, investigating the theoretical bounds of adversarial robustness achievable through such collaborative methods can yield deeper insights into the limitations and capabilities of current adversarial defense strategies.
