Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG

Published 5 Sep 2024 in cs.LG, cs.AI, and cs.HC | (2409.03646v2)

Abstract: In contrast to human vision, artificial neural networks (ANNs) remain relatively susceptible to adversarial attacks. To address this vulnerability, efforts have been made to transfer inductive bias from human brains to ANNs, often by training the ANN representations to match their biological counterparts. Previous works relied on brain data acquired in rodents or primates using invasive techniques, from specific regions of the brain, under non-natural conditions (anesthetized animals), and with stimulus datasets lacking diversity and naturalness. In this work, we explored whether aligning model representations to human EEG responses to a rich set of real-world images increases robustness to ANNs. Specifically, we trained ResNet50-backbone models on a dual task of classification and EEG prediction; and evaluated their EEG prediction accuracy and robustness to adversarial attacks. We observed significant correlation between the networks' EEG prediction accuracy, often highest around 100 ms post stimulus onset, and their gains in adversarial robustness. Although effect size was limited, effects were consistent across different random initializations and robust for architectural variants. We further teased apart the data from individual EEG channels and observed strongest contribution from electrodes in the parieto-occipital regions. The demonstrated utility of human EEG for such tasks opens up avenues for future efforts that scale to larger datasets under diverse stimuli conditions with the promise of stronger effects.


Summary

The paper explores co-training neural networks with human EEG data using a dual-task framework to enhance adversarial robustness in object recognition models.
Key findings demonstrate limited but consistent robustness gains across models, which correlate with EEG prediction accuracy, particularly from parieto-occipital channels approximately 100 ms post-stimulus.
The study positions human EEG as a viable source of biological inductive biases for enhancing ANN robustness, suggesting potential for future work with larger datasets and advanced integration methods.

Analyzing the Integration of Human EEG in Enhancing Adversarial Robustness of Neural Networks

Artificial neural networks (ANNs) remain susceptible to adversarial attacks despite their strong performance in object recognition, and addressing this weakness is a prominent research challenge in computer vision. Traditional approaches to increasing ANN resilience have focused predominantly on architecture-based or optimization-based inductive biases. This paper investigates a different direction: co-training ANNs with human electroencephalogram (EEG) data, with the aim of acquiring inductive biases that improve the networks' robustness to adversarial perturbations.

The study uses ResNet50 as the main backbone architecture and extends it into a dual-task learning (DTL) framework that performs image classification and EEG prediction concurrently. The approach is motivated by the robustness of human perception, on the hypothesis that aligning a network's representations with human brain responses can transfer some of that robustness to the network. The EEG data used in training were recorded from human participants viewing a rich set of real-world images, in contrast to earlier work that relied on invasive recordings from rodents or primates, often under anesthesia and with less diverse, less natural stimuli.
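The dual-task objective can be illustrated with a minimal sketch. The summary does not state which loss functions the authors use or how the two tasks are balanced, so everything here is an assumption for illustration: cross-entropy for classification, mean squared error for EEG prediction, and a hypothetical weighting parameter `eeg_weight`.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of a single example computed from raw class logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mse(pred, target):
    """Mean squared error between predicted and recorded EEG values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dual_task_loss(logits, label, eeg_pred, eeg_true, eeg_weight=1.0):
    """Weighted sum of the classification loss and the EEG-prediction loss.

    eeg_weight is a hypothetical hyperparameter; the source does not say
    how the two objectives are combined.
    """
    return cross_entropy(logits, label) + eeg_weight * mse(eeg_pred, eeg_true)
```

In a real training loop both terms would be computed from a shared backbone's outputs, so gradients from the EEG term also shape the features used for classification. That shared gradient path is the mechanism by which co-training could influence robustness.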

Experimental Methodology

The empirical setup involves co-training ResNet50-backbone models in several architectural variants, which differ in the module appended to predict EEG signals: dense layers, recurrent neural networks (RNNs), transformers, or attention mechanisms. Two quantities are evaluated for each model: its EEG prediction accuracy, measured with metrics such as the Pearson correlation coefficient (PCC) between predicted and recorded EEG, and the gain in adversarial robustness that accompanies closer alignment of the network's representations with the EEG.
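As a rough illustration of the PCC metric, the sketch below computes the correlation between predicted and recorded EEG separately at each time point. The data layout (a list of trials, each a list of values over time) and the choice to correlate across trials are assumptions; the summary does not describe how the authors aggregate the metric.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # PCC is undefined for a constant sequence; report 0 here
    return cov / math.sqrt(var_x * var_y)

def pcc_per_timepoint(pred, true):
    """PCC across trials at each time point.

    pred, true: lists of trials, each trial a list of EEG values over time
    (an assumed layout for a single channel).
    """
    n_time = len(true[0])
    return [
        pearson([trial[t] for trial in pred], [trial[t] for trial in true])
        for t in range(n_time)
    ]
```

Evaluating the metric per time point makes it possible to ask when, relative to stimulus onset, a model predicts the EEG best.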

To quantify adversarial robustness, the study applies well-known adversarial attacks, including PGD (under both $L_2$ and $L_\infty$ norm bounds) and Carlini & Wagner's methods, generating perturbed inputs and measuring the model's classification accuracy on them. The analysis also includes control conditions with shuffled EEG data and random datasets, which test whether the authentic EEG signals themselves are what drive the effect.
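To make the PGD attack concrete, here is a minimal sketch of the $L_\infty$ variant applied to a toy binary logistic classifier rather than a ResNet50. The model, the parameter names (`eps`, `step`, `n_steps`), and the omission of pixel-range clipping are all simplifications assumed for illustration; they are not taken from the paper.

```python
import math

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * (w.x + b))) w.r.t. x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    coeff = -y / (1.0 + math.exp(y * score))  # equals -y * sigmoid(-y * score)
    return [coeff * wi for wi in w]

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def predict(w, b, x):
    """Predicted label in {-1, +1} for the toy linear classifier."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def pgd_linf(w, b, x, y, eps, step, n_steps):
    """Projected gradient descent attack under an L-infinity bound eps."""
    x_adv = list(x)
    for _ in range(n_steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Ascend the loss by a fixed step along the sign of the gradient.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

Robust accuracy is then the fraction of examples for which `predict` still returns the true label after the attack, which can be compared between co-trained and baseline models.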

Key Findings

The results indicate that the improvements in robustness, while small, are consistent across models and random initializations. Notably, EEG prediction accuracy correlates significantly with the robustness gained, and prediction accuracy is often highest for EEG signals around 100 ms post-stimulus. This points to a temporal aspect of the effect, in which neural activity at certain latencies is more valuable for robustness than activity at others.

An analysis of individual EEG channels further reveals that mid-level channels, specifically those in the parieto-occipital regions, contribute most to the robustness gains, even though early visual channels are the most accurately predicted. This dissociation suggests that the channels a network predicts best are not necessarily the ones that help robustness most, which challenges assessments that rely solely on lower-order visual signals.

Implications and Future Directions

This study positions human EEG as a viable and rich source of biological inductive biases for enhancing ANN robustness against adversarial attacks. The experimental evidence supports the feasibility of using non-invasively recorded human neural data in place of invasive recordings from non-human animals, which could make research on neurally inspired AI models more accessible and scalable.

In closing, although the improvements reported are modest, they are consistent, and the work lays a foundation for further exploration with larger EEG datasets and more sophisticated frameworks that exploit characteristic neural phenomena. It invites future studies to consider not only the scale and diversity of EEG data but also the refinement of techniques for integrating such data into network architectures that aim for biologically inspired resilience. The study also raises the question of whether the systematic robustness gains observed with seemingly random EEG configurations call for deeper investigation into ANN initialization techniques and their broader implications.
