Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection

Published 19 Sep 2025 in cs.SD, cs.AI, and eess.AS | (2509.15570v1)

Abstract: The outlier exposure method is an effective approach to address the unsupervised anomaly sound detection problem. The key focus of this method is how to make the model learn the distribution space of normal data. Based on biological perception and data analysis, it is found that anomalous audio and noise often have higher frequencies. Therefore, we propose a data augmentation method for high-frequency information in contrastive learning. This enables the model to pay more attention to the low-frequency information of the audio, which represents the normal operational mode of the machine. We evaluated the proposed method on the DCASE 2020 Task 2. The results showed that our method outperformed other contrastive learning methods used on this dataset. We also evaluated the generalizability of our method on the DCASE 2022 Task 2 dataset.


Authors (4)

Summary

The paper introduces a novel contrastive learning framework with high-frequency spectrum augmentation to enhance anomaly recognition.
It combines Log-mixup-exp, Random Resize Crop, and multiple segmentation techniques for robust unsupervised feature extraction.
Experimental results report 93.83% AUC on the DCASE 2020 Task 2 dataset, and an additional evaluation on the DCASE 2022 Task 2 dataset is used to assess generalizability.


This essay provides an in-depth exploration of the paper "Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection" (2509.15570), focusing on the key methodologies, experimental evaluations, and implications for future developments in the field of abnormal sound detection using machine learning.

Introduction

The paper addresses the challenges of detecting abnormal sounds in machine operation, including data imbalance, the complexity of sound signals, and the limited generalization of existing models. In industrial settings, accurate detection of anomalous sound matters because it can indicate machine failure or a need for maintenance. The researchers propose an approach that combines contrastive learning with spectrum information augmentation so that the model concentrates on low-frequency information, which represents the machine's normal operating mode; anomalies and noise, by contrast, tend to appear at higher frequencies.

Proposed Method

The core of the proposed method is a contrastive learning framework built around augmentation of high-frequency information. The augmentation produces training views of an audio sample that differ mainly in their high-frequency content, which pushes the model to rely on the low-frequency information that characterizes normal machine operation.
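The idea can be illustrated with a minimal sketch. This summary does not give the paper's exact augmentation, so everything here is an assumption for illustration: the function name `augment_high_freq`, the `cutoff_bin` and `max_shift` parameters, and the choice of a random per-row offset on a log-scale spectrogram (equivalent to a random gain in the linear domain). The point is only that two views share their low-frequency rows and differ in their high-frequency rows.

```python
import random

def augment_high_freq(spec, cutoff_bin, max_shift=3.0, rng=random):
    """Return a copy of `spec` (a list of frequency rows, low to high) in
    which every row at or above `cutoff_bin` gets a random constant offset.
    Rows below `cutoff_bin` are copied unchanged. Hypothetical helper, not
    the paper's implementation."""
    out = []
    for i, row in enumerate(spec):
        if i < cutoff_bin:
            out.append(list(row))  # low-frequency content is preserved
        else:
            shift = rng.uniform(-max_shift, max_shift)
            out.append([v + shift for v in row])  # perturb high frequencies
    return out

# Toy log-scale spectrogram: 8 frequency bins x 4 time frames.
rng = random.Random(0)
spec = [[float(f + t) for t in range(4)] for f in range(8)]

# Two views of the same clip: identical below the cutoff, different above it.
view_a = augment_high_freq(spec, cutoff_bin=4, rng=rng)
view_b = augment_high_freq(spec, cutoff_bin=4, rng=rng)
```

If such views are treated as a positive pair, the only information they reliably share is the low-frequency band, which is the behaviour the abstract describes.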

Figure 1: The overall framework of the proposed method, illustrating the data augmentation and contrastive learning process for training the model to distinguish between normal and abnormal audio patterns.

The approach employs a combination of data augmentation techniques, including Log-mixup-exp for feature mixing, Random Resize Crop (RRC) for simulating pitch changes, and the use of multiple sections of data to construct positive and negative samples for the contrastive learning model. This design intends to enhance the model's sensitivity to deviations from normal sound patterns, thereby increasing its anomaly detection capabilities.
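The two named augmentations can be sketched as follows. This is an illustrative, dependency-free version under stated assumptions: Log-mixup-exp is taken in the form log((1 - λ)·exp(x) + λ·exp(z)), which mixes two log-scale spectrograms in the linear domain, and Random Resize Crop is approximated by cropping a random frequency/time window and stretching it back with nearest-neighbour sampling. The parameter names `lam` and `min_scale`, and their values, are hypothetical and not taken from the paper.

```python
import math
import random

def log_mixup_exp(x, z, lam):
    """Mix two log-scale spectrograms in the linear domain, element-wise:
    log((1 - lam) * exp(x) + lam * exp(z))."""
    return [
        [math.log((1.0 - lam) * math.exp(a) + lam * math.exp(b))
         for a, b in zip(row_x, row_z)]
        for row_x, row_z in zip(x, z)
    ]

def random_resize_crop(spec, min_scale=0.6, rng=random):
    """Crop a random frequency/time window and stretch it back to the
    original shape using nearest-neighbour sampling."""
    n_freq, n_time = len(spec), len(spec[0])
    crop_f = max(1, int(n_freq * rng.uniform(min_scale, 1.0)))
    crop_t = max(1, int(n_time * rng.uniform(min_scale, 1.0)))
    f0 = rng.randint(0, n_freq - crop_f)  # randint is inclusive on both ends
    t0 = rng.randint(0, n_time - crop_t)
    return [
        [spec[f0 + (i * crop_f) // n_freq][t0 + (j * crop_t) // n_time]
         for j in range(n_time)]
        for i in range(n_freq)
    ]

rng = random.Random(0)
x = [[float(f + t) for t in range(4)] for f in range(4)]  # toy clip
z = [[1.0] * 4 for _ in range(4)]                         # toy background clip

mixed = log_mixup_exp(x, z, lam=0.3)
cropped = random_resize_crop(x, rng=rng)
```

Stretching a cropped frequency range back to full height shifts where energy sits along the frequency axis, which is why the crop can act as a rough stand-in for pitch changes.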

Figure 2: The spectrograms highlight anomalies primarily occurring in high-frequency ranges, supporting the focus on low-frequency information during model training.

Experimental Evaluation

Datasets

The method was evaluated using the DCASE 2020 and DCASE 2022 Task 2 datasets. These datasets include audio recordings from a variety of machine types, providing a robust platform for testing the model's capability to generalize across different domains and detect anomalies effectively.

Implementation Details

The model uses a ResNet50 encoder and is trained with the InfoNCE loss and the Adam optimizer. The architecture is designed for unsupervised pre-training followed by fine-tuning across the various machine types.
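For readers unfamiliar with InfoNCE, the loss for a single anchor can be sketched as the cross-entropy of picking the positive sample out of a set containing the positive and all negatives. The sketch below assumes cosine similarity and a temperature of 0.1; both are illustrative choices, not values reported in this summary, and a real training loop would compute this in batch form in a deep learning framework.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor:
    -log( exp(s_pos / t) / (exp(s_pos / t) + sum_i exp(s_neg_i / t)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

anchor = [1.0, 0.0]
positive = [0.9, 0.1]                   # embedding of another view of the same clip
negatives = [[0.0, 1.0], [-1.0, 0.0]]   # embeddings of other clips

loss_good = info_nce(anchor, positive, negatives)
# Swapping in a dissimilar "positive" gives a much larger loss.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

The loss is small when the anchor is closer to its positive than to any negative, and grows as a negative becomes more similar to the anchor than the positive is.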

Results

The proposed method achieved 93.83% AUC and 87.6% pAUC on the DCASE 2020 Task 2 dataset, outperforming the other contrastive learning methods evaluated on that dataset. The authors present these results, together with the DCASE 2022 Task 2 evaluation, as evidence that the framework generalizes across domains and detects anomalies across multiple machine types.
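As a reminder of what the two metrics measure, the sketch below computes AUC as the fraction of (normal, anomalous) pairs ranked correctly, and pAUC as the same quantity restricted to the highest-scoring normal clips, that is, the low false-positive-rate region. The value p = 0.1 and the exact pAUC definition are assumptions here, since this summary does not state them; counting ties as half is also a choice made for the sketch. The scores are made-up toy values, not results from the paper.

```python
import math

def auc(normal_scores, anomaly_scores):
    """Fraction of (normal, anomalous) pairs in which the anomalous clip
    has the higher anomaly score; ties count as half."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomaly_scores))

def partial_auc(normal_scores, anomaly_scores, p=0.1):
    """AUC restricted to the floor(p * N_normal) highest-scoring normal
    clips (at least one), i.e. the low false-positive-rate region."""
    k = max(1, math.floor(p * len(normal_scores)))
    hardest = sorted(normal_scores, reverse=True)[:k]
    return auc(hardest, anomaly_scores)

# Toy anomaly scores (higher means more anomalous).
normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
anomalous = [0.95, 1.5, 0.55]

full = auc(normal, anomalous)             # 24 of 30 pairs ranked correctly
partial = partial_auc(normal, anomalous)  # 1 of 3 pairs against the top normal
```

In this toy case pAUC is much lower than AUC because a single high-scoring normal clip dominates the low false-positive-rate region, which is why the two numbers are usually reported together.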

Discussion

The results suggest that augmenting high-frequency information, so that the model attends to the low-frequency content associated with normal operation, can improve anomaly detection in industrial settings. The proposed augmentation techniques and contrastive learning strategy aim at more robust feature extraction, which in turn supports better generalization and detection accuracy.

The model's advantage over other contrastive learning methods on DCASE 2020 Task 2, together with its evaluation on DCASE 2022 Task 2, suggests potential for broader use in industrial environments where early anomaly detection is critical for predictive maintenance and operational efficiency. The method's domain generalization capabilities also hint that it may remain effective under varying operating parameters and noise conditions.

Conclusion

The paper demonstrates an effective integration of contrastive learning with spectrum information augmentation for abnormal sound detection. Applied in real-world industrial scenarios, it could lead to more efficient machine health monitoring systems. Future research might extend the framework to broader audio domains and combine it with other types of sensor data for multi-modal anomaly detection.
