- The paper presents an entropy-based framework that detects adversarial perturbations in CNNs through real-time analysis of activation patterns.
- It uses entropy and mutual information measures at key layers to separate clean from adversarial samples, achieving 90% detection accuracy.
- The non-invasive approach runs parallel to the CNN with minimal overhead, making it practical for real-world AI reliability monitoring.
Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks
Introduction
The paper "Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks" introduces a method for monitoring the reliability of CNNs in real time. The approach detects adversarial perturbations through entropy signatures in layer activations, without modifying the network architecture or requiring retraining. By leveraging information-theoretic measures such as entropy and mutual information, it offers a practical, scalable, and efficient way to strengthen the adversarial detection capabilities of CNNs.
Vulnerability and Detection Limitations
CNNs exhibit high accuracy on benchmark datasets but suffer from severe performance degradation under distribution shifts, including adversarial attacks. Existing detection approaches, such as uncertainty estimation and adversarial training, demand substantial computational resources and often compromise performance on clean data. These solutions are impractical for real-world deployments due to their reliance on architectural modifications or complex computation frameworks.
The paper proposes a novel framework that bypasses the limitations of existing methods by analyzing entropy and mutual information patterns across CNN layers. This non-invasive approach does not require any modifications to the CNN architecture or additional training phases. Instead, the detection system runs in parallel to the original network, using hooks to capture activations and compute entropy statistics asynchronously. This setup imposes minimal computational overhead while allowing real-time monitoring of adversarial inputs.
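The hook-based setup can be sketched in PyTorch (a minimal illustration, not the authors' code; the toy model, layer choices, and the `activations` buffer are assumptions):

```python
import torch
import torch.nn as nn

# Buffer filled by the hooks; the name and the toy model below are
# illustrative, not the authors' implementation.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Detach so monitoring never touches the model's autograd graph.
        activations[name] = output.detach()
    return hook

# Tiny stand-in for a CNN: an early conv layer, then a flattened
# pre-classification representation feeding the final classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
model[0].register_forward_hook(make_hook("early_conv"))
model[3].register_forward_hook(make_hook("pre_classifier"))

x = torch.randn(2, 3, 32, 32)
_ = model(x)  # normal forward pass; hooks capture activations as a side effect
# Entropy statistics can now be computed from `activations` off the hot path.
```

Because the hooks only read activations, the monitored network's outputs and gradients are untouched, which is what makes the approach non-invasive.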
Methodology
Key Monitoring Points
The framework targets two critical layers for monitoring: the early convolutional layer and a pre-classification fully connected layer. The early layer captures changes in low-level features due to adversarial noise, whereas the pre-classification layer evaluates disruptions in high-level semantic representations.
Entropy Computation
Entropy for each layer is computed by discretizing activation values into histograms, which are then normalized to obtain probability distributions. Binning strategies are optimized to capture differences between clean and adversarial entropy distributions effectively. The monitoring system forms detection scores by comparing current entropy measurements against established baselines from in-distribution data.
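The histogram-based estimate above can be sketched as follows (a NumPy illustration with an assumed bin count; the authors' optimized binning strategy may differ):

```python
import numpy as np

def activation_entropy(acts, bins=64, value_range=None):
    """Shannon entropy (bits) of a layer's activations, estimated from a
    normalized histogram. The bin count is illustrative, not the paper's
    optimized choice."""
    counts, _ = np.histogram(acts, bins=bins, range=value_range)
    p = counts / counts.sum()        # normalize to a probability distribution
    p = p[p > 0]                     # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 10_000)                # stand-in for clean activations
perturbed = clean + rng.uniform(-0.5, 0.5, 10_000)  # stand-in for perturbed ones
h_clean = activation_entropy(clean)
h_perturbed = activation_entropy(perturbed)
# A detector would compare these against a baseline built from clean data.
```

With 64 bins the estimate is bounded by log2(64) = 6 bits, so scores from different layers remain on a comparable scale.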
Detection Mechanism
A threshold-based detector exploits the entropy separation between clean and adversarial inputs to flag perturbations. The threshold is chosen through an analysis of false-positive and false-negative rates, balancing the two error types.
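The threshold selection can be illustrated as a sweep over candidate values on synthetic scores (the balancing criterion below is an assumption; the paper's exact trade-off analysis is not reproduced here):

```python
import numpy as np

def pick_threshold(clean_scores, adv_scores):
    """Sweep candidate thresholds and return the one whose false-positive
    and false-negative rates are closest to balanced. The balancing
    criterion is illustrative; the paper's analysis may differ."""
    candidates = np.sort(np.concatenate([clean_scores, adv_scores]))
    best_t, best_gap = float(candidates[0]), np.inf
    for t in candidates:
        fpr = np.mean(clean_scores >= t)  # clean inputs flagged as adversarial
        fnr = np.mean(adv_scores < t)     # adversarial inputs missed
        if abs(fpr - fnr) < best_gap:
            best_t, best_gap = float(t), abs(fpr - fnr)
    return best_t

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, 500)  # synthetic entropy-deviation scores
adv = rng.normal(3.0, 1.0, 500)
threshold = pick_threshold(clean, adv)
```

Restricting candidates to observed scores is enough here, since FPR and FNR only change at those points.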
Experimental Results
The proposed methodology was evaluated on the VGG-16 architecture using the ImageNet dataset. FGSM adversarial attacks were applied to assess the framework's performance. Results revealed a distinct separation in entropy distributions between clean and adversarial samples, particularly in the early convolutional layer. Detection accuracy reached 90% with zero false positives at the convolutional layer, demonstrating the method's efficacy in identifying adversarial inputs without degrading performance on clean samples.
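FGSM itself is simple to state: x_adv = x + eps * sign(dL/dx). The sketch below applies it to a logistic-regression stand-in rather than VGG-16, an assumption made so the cross-entropy gradient has a closed form:

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One FGSM step, x_adv = x + eps * sign(dL/dx), on a logistic-regression
    toy model (not VGG-16), where the cross-entropy gradient is
    dL/dx = (sigmoid(w.x + b) - y) * w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # model's predicted probability
    grad_x = (p - y) * w                    # closed-form input gradient
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(2)
w, b = rng.normal(size=4), 0.0
x, y = rng.normal(size=4), 1.0
x_adv = fgsm(x, y, w, b, eps=0.1)  # perturbation bounded by eps per coordinate
```

The sign operation is what makes the perturbation small in the max norm yet broadly distributed across inputs, which is why it shifts low-level activation statistics that the early convolutional layer's entropy picks up.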
Discussion
The paper highlights the practical advantages of the proposed entropy-based monitoring framework, which aligns with the principles of information bottleneck theory. This approach can detect adversarial perturbations without retraining models or modifying their architectures, making it suitable for deployment in constrained environments. However, the framework assumes access to intermediate activations and requires careful calibration based on operational conditions.
The methodology's success in detecting adversarial inputs supports further exploration into its applicability across broader types of attacks and different architectures. Additionally, integrating this approach with broader AI monitoring systems could lead to more comprehensive reliability assessments and adaptive resilience mechanisms.
Conclusion
The entropy-based non-invasive monitoring framework for CNNs effectively identifies adversarial perturbations through real-time analysis of layer-wise information patterns. This scalable solution maintains original model performance while adding self-diagnostic capabilities, crucial for deploying reliable vision systems in dynamic environments. Future work should focus on extending the validation to a wider range of attacks and architectures, ultimately contributing towards developing autonomous AI health monitoring systems.