Log-based Anomaly Detection based on EVT Theory with feedback

Published 8 Jun 2023 in cs.SE and cs.LG | (2306.05032v2)

Abstract: System logs play a critical role in maintaining the reliability of software systems. Numerous studies have explored automatic log-based anomaly detection and achieved notable accuracy on benchmark datasets. However, when applied to large-scale cloud systems, these solutions face limitations due to high resource consumption and a lack of adaptability to evolving logs. In this paper, we present an accurate, lightweight, and adaptive log-based anomaly detection framework, referred to as SeaLog. Our method introduces a Trie-based Detection Agent (TDA) that employs a lightweight, dynamically growing trie structure for real-time anomaly detection. To enhance TDA's accuracy in response to evolving log data, we enable it to receive feedback from experts. Interestingly, our findings suggest that contemporary LLMs, such as ChatGPT, can provide feedback with a level of consistency comparable to human experts, which can potentially reduce manual verification efforts. We extensively evaluate SeaLog on two public datasets and an industrial dataset. The results show that SeaLog outperforms all baseline methods in terms of effectiveness, runs 2x to 10x faster, and consumes only 5% to 41% of the memory.

Summary

The paper introduces ScaleAD, a framework that integrates Extreme Value Theory (EVT) with expert feedback for accurate log-based anomaly detection.
It details a trie-based detection agent (TDA) that dynamically parses and clusters log data, enabling real-time anomaly scoring with minimal resources.
The framework achieves F1 scores of up to 0.990 in the offline setting, remains effective in the online setting, and significantly reduces computational overhead compared to deep learning methods.


The paper "Log-based Anomaly Detection based on EVT Theory with feedback" introduces ScaleAD (referred to as SeaLog in the abstract above), an anomaly detection framework designed to efficiently handle the vast volume of log data generated by large-scale cloud systems. The system combines a novel Trie-based Detection Agent (TDA) with expert feedback to adapt to evolving logs while maintaining high detection accuracy.

Introduction to ScaleAD

ScaleAD is designed to meet cloud vendors' requirements of being accurate, lightweight, and adaptive. Extreme Value Theory (EVT) lets the system efficiently flag anomalous log templates based on the distribution of their occurrence frequencies. Because the framework is lightweight, it can be deployed on individual cloud instances without excessive resource consumption.
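To make the frequency-based EVT idea concrete, here is a minimal sketch using the standard Peaks-Over-Threshold (POT) method. It assumes, hypothetically, that a template's anomaly score is the negative log of its relative frequency and that the upper tail of the scores is fit with a Generalized Pareto Distribution (GPD) estimated by the method of moments. The names `rarity_scores` and `pot_threshold` and the `init_quantile` and `risk` parameters are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def rarity_scores(template_ids):
    """Score each template by -log(relative frequency): rarer templates score higher."""
    counts = Counter(template_ids)
    total = sum(counts.values())
    return {tid: -math.log(c / total) for tid, c in counts.items()}


def pot_threshold(scores, init_quantile=0.9, risk=1e-3):
    """Peaks-Over-Threshold: fit a GPD to the excesses over a high empirical
    quantile t, then extrapolate the score z_q exceeded with probability `risk`."""
    data = sorted(scores)
    if not data:
        raise ValueError("need at least one score")
    n = len(data)
    t = data[int(init_quantile * (n - 1))]      # initial (empirical) threshold
    excesses = [s - t for s in data if s > t]   # peaks over t
    n_t = len(excesses)
    if n_t < 2:
        return t                                # too few peaks to fit the tail
    mean = sum(excesses) / n_t
    var = sum((y - mean) ** 2 for y in excesses) / (n_t - 1)
    if var == 0:
        return t
    # Method-of-moments estimates of the GPD shape (gamma) and scale (sigma).
    ratio = mean ** 2 / var
    gamma = 0.5 * (1 - ratio)
    sigma = 0.5 * mean * (ratio + 1)
    r = risk * n / n_t
    if abs(gamma) < 1e-9:                       # exponential-tail limit
        return t - sigma * math.log(r)
    return t + (sigma / gamma) * (r ** (-gamma) - 1)
```

In a stream, one would compute `scores = rarity_scores(stream)`, threshold the per-event list `[scores[tid] for tid in stream]`, and flag templates whose score exceeds the returned value. With `risk=1e-3`, the fitted tail predicts such an exceedance in roughly one observation in a thousand.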

Figure 1: The overall framework of ScaleAD.

ScaleAD uses a trie-based approach to anomaly detection that parses and clusters log data dynamically and stores log templates in a compact form for efficient processing. The trie expands as new log templates appear, which lets it adapt to evolving logs while performing anomaly detection with minimal computational resources.
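The dynamic expansion can be pictured with a small token trie. The sketch below assumes that a log is routed by a short key path (for instance, its token count followed by a few tokens) and that templates are kept in lists at the leaves; `TrieNode` and `TemplateTrie` are hypothetical names. Shared prefixes are stored only once, which is what keeps such a structure compact.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)   # routing token -> TrieNode
    templates: list = field(default_factory=list)  # templates stored at a leaf


class TemplateTrie:
    """Hypothetical token trie that grows a new path whenever an unseen key arrives."""

    def __init__(self):
        self.root = TrieNode()
        self.num_nodes = 1

    def leaf_for(self, key_path):
        """Walk the key path, creating missing nodes on the way (dynamic expansion)."""
        node = self.root
        for token in key_path:
            if token not in node.children:
                node.children[token] = TrieNode()
                self.num_nodes += 1
            node = node.children[token]
        return node
```

A second log with the same key path reuses the existing leaf, so memory grows with the number of distinct routes rather than with the number of log lines.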

Trie-based Detection Agent (TDA)

TDA consists of five steps: preprocessing, node traversal, leaf update, trie update, and anomaly detection. Preprocessing extracts the key components of each log message; node traversal then routes the log through the internal trie nodes using domain knowledge and token frequency.
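Preprocessing and node traversal might look roughly like the following sketch. The regular expressions stand in for "domain knowledge" (masking IPv4 addresses, hex values, and integers), and the routing key uses running global token counts as a stand-in for "token frequency". Both rules, and the names `MASKS`, `preprocess`, and `routing_path`, are assumptions for illustration rather than the paper's exact procedure.

```python
import re
from collections import Counter

WILDCARD = "<*>"

# Hypothetical domain-knowledge rules for obviously variable fields (applied in order).
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), WILDCARD),  # IPv4[:port]
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), WILDCARD),                    # hex values
    (re.compile(r"\b\d+\b"), WILDCARD),                               # integers
]

token_counts = Counter()  # global frequencies of constant tokens seen so far


def preprocess(message):
    """Mask variable fields, then split the message into tokens."""
    for pattern, repl in MASKS:
        message = pattern.sub(repl, message)
    return message.split()


def routing_path(tokens, depth=2):
    """Route by token count, then by the `depth` most frequent constant tokens."""
    token_counts.update(t for t in tokens if t != WILDCARD)
    constants = list(dict.fromkeys(t for t in tokens if t != WILDCARD))
    # sorted() is stable, so equally frequent tokens keep their original order.
    ranked = sorted(constants, key=lambda t: -token_counts[t])
    return [str(len(tokens))] + ranked[:depth]
```

Because token frequencies keep changing, the same template could be routed differently over time; a real design would need to stabilise the routing (for example, by freezing frequencies after a warm-up period), which this sketch ignores.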

Figure 2: The workflow of trie-based detection agent (TDA).

When a log reaches a leaf node, it is matched against the templates stored there, and the matched template is updated with the new log to keep it accurate. The trie update step merges clusters with similar templates to refine the detection model. Finally, anomaly detection applies EVT to the occurrence frequencies of log templates, yielding an anomaly score that signals potential system problems.
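The leaf update and trie update steps can be sketched with a positional similarity similar in spirit to the Drain log parser; this matching rule is our assumption, not the paper's stated one. A log joins the most similar template at its leaf if enough positions agree, and disagreeing positions are generalised to the wildcard `<*>`. The names `update_leaf` and `merge_similar` and both thresholds are hypothetical, and merging is simplified to templates within one leaf.

```python
WILDCARD = "<*>"


def similarity(template, tokens):
    """Share of positions where the template token equals the log token or is a wildcard."""
    if len(template) != len(tokens) or not tokens:
        return 0.0
    same = sum(1 for a, b in zip(template, tokens) if a == b or a == WILDCARD)
    return same / len(tokens)


def generalise(template, tokens):
    """Positions where template and log disagree become wildcards."""
    return [a if a == b else WILDCARD for a, b in zip(template, tokens)]


def update_leaf(templates, tokens, threshold=0.5):
    """Leaf update: generalise the best-matching template in place if it is
    similar enough, otherwise start a new template. Returns the template index."""
    best, best_sim = None, 0.0
    for i, tpl in enumerate(templates):
        sim = similarity(tpl, tokens)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim >= threshold:
        templates[best] = generalise(templates[best], tokens)
        return best
    templates.append(list(tokens))
    return len(templates) - 1


def merge_similar(templates, threshold=0.8):
    """Trie update: fold near-duplicate templates into a single template."""
    merged = []
    for tpl in templates:
        for i, kept in enumerate(merged):
            if similarity(kept, tpl) >= threshold:
                merged[i] = generalise(kept, tpl)
                break
        else:
            merged.append(list(tpl))
    return merged
```

The index returned by `update_leaf` identifies the template a log belongs to, so counting those indices gives the per-template occurrence frequencies that the EVT-based scoring step consumes.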

Incorporating Expert Feedback

When new log templates appear, ScaleAD can query experts to verify suspicious anomalies, and their feedback further refines the detection process. Feedback can also come from knowledge bases or from LLMs such as ChatGPT, further improving ScaleAD's decisions.
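A minimal sketch of such a feedback loop follows, assuming that an unseen template is what counts as "suspicious" and that an expert is any callable returning a yes/no verdict (a human behind a ticketing tool, a knowledge-base lookup, or an LLM wrapper). `FeedbackDetector`, `check`, and `correct` are hypothetical names; caching each verdict is one simple way to ensure a template is sent for review at most once.

```python
from typing import Callable, Dict, Tuple

Template = Tuple[str, ...]


class FeedbackDetector:
    """Hypothetical detector that asks an expert about each unseen template once."""

    def __init__(self, expert: Callable[[Template], bool]):
        self.expert = expert                      # returns True if anomalous
        self.verdicts: Dict[Template, bool] = {}  # cached expert decisions
        self.queries = 0                          # number of expert calls made

    def check(self, template: Template) -> bool:
        """Return True if the template is judged anomalous, querying only unseen ones."""
        if template not in self.verdicts:
            self.queries += 1
            self.verdicts[template] = bool(self.expert(template))
        return self.verdicts[template]

    def correct(self, template: Template, anomalous: bool) -> None:
        """Apply a later correction, e.g. an engineer overturning a false alarm."""
        self.verdicts[template] = anomalous
```

Keeping the expert behind a plain callable is a deliberate design choice: the loop does not change whether the verdict comes from a person, a rule base, or a model.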

Figure 3: A case study of using ChatGPT as an expert.

When used as an expert, ChatGPT provides feedback with consistency comparable to that of human experts, which makes it a viable option for the feedback loop and can potentially reduce the manual effort required from on-call engineers.
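To use an LLM as the expert, the template can be wrapped in a prompt and the free-text reply mapped back to a verdict. The sketch below deliberately leaves the model call abstract: `ask_llm` is a hypothetical function from prompt string to reply string (for example, a thin wrapper around a chat API), and the prompt wording, `build_prompt`, `parse_verdict`, and `llm_expert` are illustrative assumptions rather than the prompt used in the paper.

```python
def build_prompt(template, context_logs=()):
    """Assemble a hypothetical prompt asking for a one-word verdict."""
    lines = [
        "You are an on-call engineer reviewing system logs.",
        "Is the following log template a sign of a system anomaly?",
        f"Template: {' '.join(template)}",
    ]
    if context_logs:
        lines.append("Recent logs:")
        lines.extend(f"  {log}" for log in context_logs)
    lines.append("Answer with exactly one word: normal or abnormal.")
    return "\n".join(lines)


def parse_verdict(reply):
    """Map a free-text reply to True (anomalous), False (normal), or None (unclear)."""
    text = reply.strip().lower()
    if text.startswith("abnormal") or text.startswith("anomal"):
        return True
    if text.startswith("normal"):
        return False
    return None


def llm_expert(template, ask_llm, context_logs=()):
    """Expert adapter: unclear replies count as anomalous so a human still reviews them."""
    verdict = parse_verdict(ask_llm(build_prompt(template, context_logs)))
    return True if verdict is None else verdict
```

Treating unclear replies as anomalous is a conservative choice: it trades some extra manual checks for fewer missed incidents. A function like `llm_expert` can be used wherever an expert callable is expected.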

Evaluation

ScaleAD is evaluated on two public datasets and an industrial dataset from Huawei Cloud. The framework consistently outperforms state-of-the-art methods in both offline and online settings, demonstrating superior adaptability and efficiency.

Figure 4: Experimental results of online anomaly detection.

ScaleAD achieves F1 scores of up to 0.990 when the log data is fixed, confirming its effectiveness in the offline setting. In the online setting, it adapts to evolving log data and maintains high performance, benefiting from continuous learning through expert feedback.
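For reference, F1 is the harmonic mean of precision and recall. The small helper below computes it from counts of true positives, false positives, and false negatives; the function name is our own.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with precision P = tp/(tp+fp) and recall R = tp/(tp+fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 8 true positives, 2 false positives, and 2 false negatives give precision = recall = F1 = 0.8. Because the harmonic mean is dominated by the smaller value, an F1 of 0.990 implies that neither precision nor recall is below about 0.98.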

Efficiency in Time and Space

ScaleAD is also efficient: it runs 2 to 10 times faster than existing deep learning approaches and uses only 5% to 41% of their memory. Its low resource usage and high processing speed facilitate practical deployment.

Figure 5: Experimental results of time and space efficiency comparison.

This makes ScaleAD a practical solution for real-time anomaly detection in resource-constrained cloud environments.

Conclusion

ScaleAD is an effective, scalable, and adaptive solution for log-based anomaly detection in large-scale cloud systems. By integrating EVT with expert feedback, it performs robustly under varying conditions while minimizing maintenance overhead, meeting the key requirements of cloud environments with minimal resource impact. Future work could explore more sophisticated feedback mechanisms to further improve detection accuracy and efficiency.