Constructing Large-Scale Real-World Benchmark Datasets for AIOps

Published 8 Aug 2022 in cs.SE and cs.PF | (2208.03938v1)

Abstract: Recently, AIOps (Artificial Intelligence for IT Operations) has been well studied in academia and industry to enable automated and effective software service management. Plenty of efforts have been dedicated to AIOps, including anomaly detection, root cause localization, incident management, etc. However, most existing works are evaluated on private datasets, so their generality and real performance cannot be guaranteed. The lack of public large-scale real-world datasets has prevented researchers and engineers from enhancing the development of AIOps. To tackle this dilemma, in this work, we introduce three public real-world, large-scale datasets about AIOps, mainly aiming at KPI anomaly detection, root cause localization on multi-dimensional data, and failure discovery and diagnosis. More importantly, we held three competitions in 2018/2019/2020 based on these datasets, attracting thousands of teams to participate. In the future, we will continue to publish more datasets and hold competitions to promote the development of AIOps further.


Authors (8)

Citations (19)


Summary

The paper introduces three public datasets that address the scale, realism, and diversity limitations in AIOps research.
The datasets target KPI anomaly detection, root cause localization on multi-dimensional data, and failure discovery and diagnosis across varied IT operational scenarios.
The datasets, developed in collaboration with industry, enable robust benchmarking and foster innovation in automated IT operations.


Introduction

The paper "Constructing Large-Scale Real-World Benchmark Datasets for AIOps" addresses a central obstacle in AIOps (Artificial Intelligence for IT Operations): the lack of public, large-scale, real-world datasets. AIOps applies machine learning and big-data analytics to automate and improve IT operations tasks such as anomaly detection, root cause analysis (RCA), and incident management. To close this gap, the paper introduces three publicly available datasets against which AIOps methods can be benchmarked and compared.

Existing Limitations in AIOps Research

Current AIOps research relies heavily on private datasets, which makes it hard to evaluate models fairly or to tell whether they generalize to other scenarios. The paper identifies three primary limitations of existing datasets:

Scenario Specificity: Existing datasets often cover narrow AIOps scenarios, inadequately representing the diverse operational environments encountered in real-world systems.
Dataset Scale: The datasets used in many studies are too small to reflect the scope and scale of real-world IT infrastructures.
Realism: Many datasets are synthetic and lack the fidelity required for real-world applications.

These limitations call for datasets that span diverse AIOps scenarios and faithfully reflect real-world system behavior.

Contributions and Published Datasets

The researchers address these issues by introducing three distinct datasets focused on different aspects of AIOps:

Dataset A: Aimed at KPI anomaly detection, this dataset contains real-world KPIs with various patterns and labeled anomalies. It provides a comprehensive testing ground for anomaly detection models.
Figure 1: Example KPIs from Dataset A.
Dataset B: Aimed at root cause localization on multi-dimensional data, this dataset contains structured log data from an online shopping platform, providing rich multi-dimensional data for localizing the root causes of fault instances.
Dataset C: Aimed at failure discovery and diagnosis, this dataset combines traces, metrics, and KPI data from a distributed system to support end-to-end failure detection and RCA.
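Dataset B's task can be made concrete with a toy example. The sketch below is not the paper's method or a competition reference solution. It assumes that each leaf (one combination of attribute values, such as a province and an ISP) carries a real value and a forecast value, and it assumes a "ripple effect": when an attribute combination is the root cause, its deviation spreads over its leaves in proportion to their forecasts while all other leaves stay at their forecasts. A candidate pattern scores close to 1 when that assumption reproduces the observed values. The function names, attribute names, and data are hypothetical.

```python
import math
from itertools import combinations, product


def matches(attrs: dict, pattern: dict) -> bool:
    """True if a leaf's attributes agree with every attribute in the pattern."""
    return all(attrs[k] == v for k, v in pattern.items())


def potential_score(leaves: list, pattern: dict) -> float:
    """Score how well `pattern` explains the deviation, assuming a ripple effect:
    leaves under the pattern change in proportion to their forecasts, and all
    other leaves stay at their forecasts."""
    in_real = sum(r for a, r, f in leaves if matches(a, pattern))
    in_fc = sum(f for a, r, f in leaves if matches(a, pattern))
    if in_fc == 0:
        return 0.0
    ratio = in_real / in_fc
    expected = [f * ratio if matches(a, pattern) else f for a, r, f in leaves]
    real = [r for _, r, _ in leaves]
    forecast = [f for _, _, f in leaves]
    baseline = math.dist(real, forecast)  # distance if no pattern is blamed
    if baseline == 0:
        return 0.0  # nothing deviates, so there is nothing to explain
    return max(1.0 - math.dist(real, expected) / baseline, 0.0)


def localize(leaves: list, max_attrs: int = 2) -> tuple:
    """Brute-force search over patterns with up to `max_attrs` attributes.
    Ties keep the earlier pattern, so more general patterns win ties."""
    names = sorted(leaves[0][0])
    values = {k: sorted({a[k] for a, _, _ in leaves}) for k in names}
    best, best_score = None, -1.0
    for depth in range(1, max_attrs + 1):
        for keys in combinations(names, depth):
            for vals in product(*(values[k] for k in keys)):
                pattern = dict(zip(keys, vals))
                score = potential_score(leaves, pattern)
                if score > best_score:
                    best, best_score = pattern, score
    return best, best_score


if __name__ == "__main__":
    # Toy data: four leaves with a forecast of 100 each; only (P1, I2) dropped.
    leaves = [
        ({"province": "P1", "isp": "I1"}, 100.0, 100.0),
        ({"province": "P1", "isp": "I2"}, 40.0, 100.0),
        ({"province": "P2", "isp": "I1"}, 100.0, 100.0),
        ({"province": "P2", "isp": "I2"}, 100.0, 100.0),
    ]
    print(localize(leaves))  # ({'isp': 'I2', 'province': 'P1'}, 1.0)
```

On the toy data, the single-attribute patterns province=P1 and isp=I2 each score only about 0.29, because the ripple-effect assumption then also blames leaves that did not change; the exact combination scores 1.0. The exhaustive loop grows exponentially with the number of attributes and values, so it is kept here only for readability; realistic data of this kind would need a pruned search.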

These datasets were built in collaboration with industry partners, so they reflect the realism and variety of actual IT environments. They also underpinned the competitions held in 2018, 2019, and 2020, which attracted thousands of participating teams from academia and industry and spurred advances in AIOps methods.

Practical Implications and Theoretical Insights

From a practical standpoint, these datasets facilitate robust benchmarking and model evaluation, enabling researchers to validate their approaches under realistic constraints. The introduction of public datasets aligns with the vision of bringing AIOps to the maturity level achieved by fields like computer vision with ImageNet.
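Benchmarking on Dataset A-style KPIs depends as much on how detections are scored as on the data itself. The summary does not state the competitions' exact metric, so the sketch below assumes a point-adjusted F1 with a detection delay, a convention widely used in KPI anomaly detection. Under this convention, a labeled anomaly segment counts as fully detected if any alarm fires within `max_delay` points of the segment's start and as fully missed otherwise, while alarms outside labeled segments remain false positives. The function names and the default `max_delay=2` are hypothetical choices for illustration.

```python
def adjust_predictions(labels: list, preds: list, max_delay: int) -> list:
    """Point-adjust binary predictions: a labeled anomaly segment counts as fully
    detected if any alarm fires within `max_delay` points of its start, and as
    fully missed otherwise. Points outside segments are left unchanged."""
    adjusted = list(preds)
    n, i = len(labels), 0
    while i < n:
        if labels[i] != 1:
            i += 1
            continue
        start = i
        while i < n and labels[i] == 1:  # advance to the end of this segment
            i += 1
        hit = any(preds[start:min(i, start + max_delay + 1)])
        for j in range(start, i):
            adjusted[j] = 1 if hit else 0
    return adjusted


def f1_score(labels: list, preds: list) -> float:
    """Plain point-wise F1 for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def delayed_f1(labels: list, preds: list, max_delay: int = 2) -> float:
    """F1 after point adjustment; max_delay=2 is an illustrative default."""
    return f1_score(labels, adjust_predictions(labels, preds, max_delay))


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
    preds = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
    print(round(f1_score(labels, preds), 3))    # 0.222 (raw point-wise F1)
    print(round(delayed_f1(labels, preds), 3))  # 0.667 (first segment credited)
```

In the example, the alarm at index 3 falls within two points of the first segment's start, so all four points of that segment become true positives; the second segment is never flagged and stays missed. Setting `max_delay=0` would demand an alarm on the very first anomalous point and drop the score to 0.0, which illustrates how strongly the choice of delay shapes benchmark rankings.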

More broadly, the work shows how collaboration between academia and industry can produce shared resources that advance the deployment and integration of AIOps in operational settings. It also invites future research to apply the datasets across domains, for example to inform resource management and fault-tolerance strategies in complex IT systems.

Conclusion

The paper is a significant step toward removing a key barrier in AIOps research by supplying shared data resources to the community. The datasets and associated competitions make models easier to evaluate and compare, and they drive innovation in automated IT operations through collaboration. The authors plan to keep publishing datasets and holding competitions to further advance AIOps research and practice.
