Self-supervised Adversarial Training

Published 15 Nov 2019 in cs.LG and cs.CV | (1911.06470v2)

Abstract: Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To escape from the predicament, many works try to harden the model in various ways, in which adversarial training is an effective way which learns robust feature representation so as to resist adversarial attacks. Meanwhile, the self-supervised learning aims to learn robust and semantic embedding from data itself. With these views, we introduce self-supervised learning to against adversarial examples in this paper. Specifically, the self-supervised representation coupled with k-Nearest Neighbour is proposed for classification. To further strengthen the defense ability, self-supervised adversarial training is proposed, which maximizes the mutual information between the representations of original examples and the corresponding adversarial examples. Experimental results show that the self-supervised representation outperforms its supervised version in respect of robustness and self-supervised adversarial training can further improve the defense ability efficiently.

Abstract PDF Upgrade to Chat

Citations (23)

View on Semantic Scholar

Summary

The paper's main contribution is integrating self-supervised learning with adversarial training by maximizing mutual information between clean and adversarial examples.
It demonstrates superior robustness on benchmarks like CIFAR-10 and STL-10, outperforming traditional supervised adversarial methods with higher defense success rates.
The proposed SAT framework efficiently alternates between adversarial sample generation and MI maximization, offering a scalable, label-free approach to boost model robustness.

Self-supervised Adversarial Training

Introduction

The susceptibility of deep neural networks (DNNs) to adversarial examples has prompted extensive research into improving model robustness. Conventional approaches such as adversarial training (AT) have been beneficial, but a novel perspective involves the use of self-supervised learning (SSL) to enhance model robustness. "Self-supervised Adversarial Training" explores this concept by integrating SSL with adversarial training to maximize mutual information (MI) between clean and adversarial examples, thereby fortifying the model against adversarial attacks.

Methodology

Self-supervised Representation

The primary objective is to leverage self-supervised features for classification using k-Nearest Neighbour (kNN). SSL intrinsically focuses on learning robust, semantic embeddings by predicting parts of the data from itself. This process inherently aligns with the goals of adversarial robustness, as SSL is designed to discern invariant features that are less susceptible to perturbations inherent in adversarial examples.

Figure 1: The diagram of self-supervised adversarial training.

Self-supervised Adversarial Training (SAT)

SAT advances the robustness of self-supervised models by maximizing MI between feature representations of clean images and their adversarial counterparts. This involves two key components: the generation of adversarial samples and the optimization of MI.

Adversarial Sample Generation: Building on the Projected Gradient Descent (PGD) method, adversarial examples are crafted by iteratively adjusting the input image to maximize perturbation within an $\epsilon$ -ball constraint.
Maximizing Mutual Information: By employing a noise contrastive estimator (NCE), MI between the feature representations of clean and adversarial examples is computed. The objective is to minimize the contrast loss, defined as the negative estimated MI, thereby ensuring robust feature alignment.

Algorithm Implementation

The SAT framework iteratively fine-tunes a pre-trained self-supervised model by alternating between adversarial sample creation and MI maximization. This process does not require labeled data, thus maintaining the self-supervised premise.

Experiments

Superior Performance of Self-supervised Features

Using AMDIM as a benchmark, self-supervised representations (SSL) demonstrated superior robustness compared to supervised counterparts (SUP), especially on CIFAR-10 and STL-10 datasets. For instance, on CIFAR-10, SSL's Defense Successful Rate (DSR) significantly eclipsed that of SUP, validating the enhanced robustness imparted by self-supervised learning.

Efficacy of SAT

Figure 2: The defense results of AMDIM and NPID using SAT on CIFAR-10.

Figure 3: The defense results of among self adversarial training and supervised adversarial training on CIFAR-10. AMDIM is selected as the seed model, and SUP and SSL mean the supervised and self-supervised version.

Figure 4: The defense results of among self adversarial training and supervised adversarial training on STL-10.

In SAT's efficacy trials, integrating NPID and AMDIM models, substantial improvements in DSR were observed across varying levels of perturbation. The time efficiency of SAT compared to traditional adversarial trainings, such as MAT and ALP, highlights its practical applicability in environments where labeled data is scarce or the computational budget is constrained.

Conclusion

The research bestows a promising augmentation to adversarial defenses by coupling self-supervised representations with adversarial training. Notably, self-supervised adversarial training offers a scalable, label-free method to boost model robustness against adversarial missteps. Future research avenues could explore self-supervised learning strategies that inherently account for adversarial dimensions during the training phase, potentially obviating the need for post-hoc adversarial reinforcement.