We Need to Rethink Benchmarking in Anomaly Detection
Abstract: Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. Current benchmarking does not, for example, sufficiently reflect the diversity of anomalies in applications ranging from predictive maintenance to scientific discovery. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that capture the relevant characteristics of different applications. We identify three key areas for improvement: First, we need to identify anomaly detection scenarios based on a common taxonomy. Second, anomaly detection pipelines should be analyzed end-to-end and by component. Third, evaluating anomaly detection algorithms should be meaningful regarding the scenario's objectives.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.