Papers
Topics
Authors
Recent
Search
2000 character limit reached

Statistical Inference for Clustering-based Anomaly Detection

Published 25 Apr 2025 in stat.ML and cs.LG | (2504.18633v1)

Abstract: Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the detected anomalies. In this paper, we propose SI-CLAD (Statistical Inference for CLustering-based Anomaly Detection), a novel statistical framework for testing the clustering-based AD results. The key strength of SI-CLAD lies in its ability to rigorously control the probability of falsely identifying anomalies, maintaining it below a pre-specified significance level $\alpha$ (e.g., $\alpha = 0.05$). By analyzing the selection mechanism inherent in clustering-based AD and leveraging the Selective Inference (SI) framework, we prove that false detection control is attainable. Moreover, we introduce a strategy to boost the true detection rate, enhancing the overall performance of SI-CLAD. Extensive experiments on synthetic and real-world datasets provide strong empirical support for our theoretical findings, showcasing the superior performance of the proposed method.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 3 likes about this paper.