Multi-Source Data Fusion for Cyberattack Detection in Power Systems

Published 18 Jan 2021 in cs.LG, cs.SY, and eess.SY | (2101.06897v1)

Abstract: Cyberattacks can severely impact power systems unless they are detected early. However, accurate and timely detection in critical infrastructure systems is challenging, e.g., due to the exploitation of zero-day vulnerabilities and the cyber-physical nature of the system, coupled with the need for high reliability and resilience of the physical system. Conventional rule-based and anomaly-based intrusion detection system (IDS) tools are insufficient for detecting zero-day cyber intrusions in industrial control system (ICS) networks. Hence, in this work, we show that fusing information from multiple data sources can help identify cyber-induced incidents and reduce false positives. Specifically, we present how to recognize and address the barriers that can prevent the accurate use of multiple data sources for fusion-based detection. We perform multi-source data fusion for training IDS in a cyber-physical power system testbed, where we collect cyber and physical side data from multiple sensors emulating real-world data sources that would be found in a utility and synthesize these into features for algorithms to detect intrusions. Results are presented using the proposed data fusion application to infer False Data and Command injection-based Man-in-the-Middle (MiTM) attacks. Post collection, the data fusion application performs a time-synchronized merge and extracts features, followed by pre-processing such as imputation and encoding, before training supervised, semi-supervised, and unsupervised learning models to evaluate the performance of the IDS. A major finding is the improvement of detection accuracy by fusion of features from cyber, security, and physical domains. Additionally, we observed that the co-training technique performs on par with supervised learning methods when fed with our features.




Summary

The paper presents a multi-source data fusion approach that combines cyber and physical inputs to reduce false positives and improve IDS accuracy.
It evaluates supervised, semi-supervised, and unsupervised machine learning models, reporting a 15-20% improvement in F1-score over single-domain models.
Experimental evaluations on the RESLab testbed demonstrate effective detection of complex attacks, including Man-in-the-Middle (MiTM) attacks that carry out false data injection (FDI) and false command injection (FCI).


Introduction

The paper "Multi-Source Data Fusion for Cyberattack Detection in Power Systems" (2101.06897) explores the challenges and methods of detecting cyberattacks in power systems through multi-source data fusion. It presents an approach for integrating data from diverse sensors and modalities to improve the accuracy of intrusion detection systems (IDS) in cyber-physical systems such as power grids. Traditional IDS approaches that focus solely on either the cyber or the physical domain often suffer from high false alarm rates and miss complex attack vectors. By fusing data from both domains, the system aims to combine their strengths and identify cyberattacks that exploit zero-day vulnerabilities.

Data Fusion Architecture

The proposed fusion architecture is implemented on the RESLab testbed, which emulates real-world power system operations and communication networks. The architecture integrates multiple sensors to capture both cyber and physical data, aggregating measurements such as network logs, intrusion detection alerts, and sensor readings from control equipment. This approach allows for a holistic view of the system state, enabling improved detection of anomalies and cyber intrusions.

Figure 1: Centralized fusion architecture. In the autonomous architecture, the Fusion and Learning blocks are interchanged, and an additional Learning block follows fusion.
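The two arrangements in Figure 1 differ mainly in the order of the Fusion and Learning steps. The stdlib Python sketch below makes that order explicit. It is not the authors' code: the source names (`cyber`, `physical`), the score-returning `Learner` type, and the stand-in learners (`max` and a mean) are hypothetical placeholders for trained models.

```python
from typing import Callable, Dict, List

Features = List[float]
# Hypothetical stand-in for a trained model: maps a feature vector to an attack score.
Learner = Callable[[Features], float]


def centralized_fusion(sources: Dict[str, Features], learner: Learner) -> float:
    """Fuse first, learn second: a single model sees all sources at once."""
    fused: Features = []
    for name in sorted(sources):  # fixed source order keeps feature positions stable
        fused.extend(sources[name])
    return learner(fused)


def autonomous_fusion(sources: Dict[str, Features],
                      local_learners: Dict[str, Learner],
                      final_learner: Learner) -> float:
    """Learn first (one model per source), fuse the scores, then learn again."""
    local_scores = [local_learners[name](sources[name]) for name in sorted(sources)]
    # The extra Learning block operates on the fused per-source scores.
    return final_learner(local_scores)


if __name__ == "__main__":
    # Hypothetical per-source features and toy "learners".
    sample = {"cyber": [0.9, 0.1], "physical": [0.2]}
    mean = lambda xs: sum(xs) / len(xs)
    print(centralized_fusion(sample, mean))
    print(autonomous_fusion(sample, {"cyber": max, "physical": max}, mean))
```

One practical consequence of this ordering: the centralized form needs all sources aligned into one feature vector, while the autonomous form lets each source keep its own model and only exchanges scores.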

The data fusion process involves several steps: a time-synchronized merge of the data sources, feature extraction, and pre-processing such as imputation and encoding to prepare the data for machine learning models. By combining cyber and physical data, the fusion mechanism enhances the detection capability of the IDS, particularly for False Data Injection (FDI) and False Command Injection (FCI) attacks carried out from a Man-in-the-Middle (MiTM) position.
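A minimal sketch of the merge and pre-processing steps, assuming each source is a list of timestamped records. The source does not spell out the exact merge rule, tolerance, imputation strategy, or encoding, so the backward "as-of" join (similar in spirit to `pandas.merge_asof` with `direction="backward"`), the mean imputation, the one-hot encoding, and every field name (`t`, `proto`, `pkts`, `bus_voltage`) are illustrative assumptions. Feature extraction is omitted.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

Row = Dict[str, object]  # one timestamped record; field names are hypothetical


def time_sync_merge(cyber: Sequence[Row], physical: Sequence[Row],
                    tolerance: float) -> List[Row]:
    """Backward as-of join: attach to each cyber row the most recent physical
    row at or before it, if it is at most `tolerance` seconds older.
    Both inputs must be sorted by their "t" (timestamp) field."""
    phys_times = [row["t"] for row in physical]
    merged: List[Row] = []
    for c in cyber:
        row = dict(c)  # copy so the input records are not modified
        i = bisect_right(phys_times, c["t"]) - 1  # last physical row with t <= c["t"]
        if i >= 0 and c["t"] - phys_times[i] <= tolerance:
            row.update({k: v for k, v in physical[i].items() if k != "t"})
        merged.append(row)
    return merged


def impute_mean(rows: List[Row], columns: Sequence[str]) -> List[Row]:
    """Fill missing numeric values with the column mean (0.0 if all are missing)."""
    for col in columns:
        present = [r[col] for r in rows if r.get(col) is not None]
        fill = sum(present) / len(present) if present else 0.0
        for r in rows:
            if r.get(col) is None:
                r[col] = fill
    return rows


def one_hot(rows: List[Row], column: str, categories: Sequence[str]) -> List[Row]:
    """Replace a categorical column by one 0/1 indicator column per category."""
    for r in rows:
        value = r.pop(column)
        for cat in categories:
            r[f"{column}={cat}"] = int(value == cat)
    return rows


if __name__ == "__main__":
    # Hypothetical records: timestamps in seconds, invented field names and values.
    cyber = [{"t": 0.0, "proto": "dnp3", "pkts": 12},
             {"t": 1.0, "proto": "modbus", "pkts": 30},
             {"t": 5.0, "proto": "dnp3", "pkts": 9}]
    physical = [{"t": 0.5, "bus_voltage": 1.01},
                {"t": 0.9, "bus_voltage": 0.97},
                {"t": 4.5, "bus_voltage": 0.99}]
    rows = time_sync_merge(cyber, physical, tolerance=1.0)
    rows = impute_mean(rows, ["bus_voltage"])
    rows = one_hot(rows, "proto", ["dnp3", "modbus"])
    for r in rows:
        print(r)
```

In the usage example, the first cyber record precedes every physical sample, so the merge leaves its `bus_voltage` empty and imputation fills it with the mean of the merged values.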

Experimental Setup and Use Cases

The testbed includes network emulators, power system simulators, and data collection tools such as Elasticsearch and Pyshark. Man-in-the-Middle (MiTM) attack scenarios that carry out False Data Injection (FDI) and False Command Injection (FCI) are emulated to validate the fusion-based IDS. These scenarios show how cross-domain fusion can be used to detect complex attacks that manifest differently in the cyber and physical layers.

Figure 2: Testbed architecture with data fusion.
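To make the two attack types concrete, the sketch below models an on-path (MiTM) relay that either scales analog measurements (FDI) or inverts breaker commands (FCI), together with a simple cross-domain residual check. The message format, point names, scaling factor, and 5% tolerance are hypothetical; the source does not describe how the testbed implements the injections. What it illustrates is that a relayed message still arrives, so traffic volume alone may look normal, while comparing the reported value with an independent physical measurement can expose the tampering.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Message:
    kind: str                 # "measurement" (field -> control center) or "command" (reverse)
    point: str                # hypothetical point name, e.g. "BUS1_V" or "BRK7"
    value: Union[float, str]  # analog value, or "OPEN"/"CLOSE" for a breaker command


def mitm_relay(msg: Message, fdi_scale: float = 1.0, fci_flip: bool = False) -> Message:
    """Forward a message through an on-path attacker.
    FDI: scale analog measurements. FCI: invert breaker commands.
    The message is still delivered, so message counts are unchanged."""
    if msg.kind == "measurement" and fdi_scale != 1.0:
        return replace(msg, value=float(msg.value) * fdi_scale)
    if msg.kind == "command" and fci_flip:
        return replace(msg, value="OPEN" if msg.value == "CLOSE" else "CLOSE")
    return msg  # untouched traffic is relayed as-is


def residual_alarm(reported: float, independent: float, rel_tol: float = 0.05) -> bool:
    """Cross-domain check: flag a reported value that disagrees with an
    independent physical-side measurement of the same quantity."""
    return abs(reported - independent) > rel_tol * abs(independent)


if __name__ == "__main__":
    reading = Message("measurement", "BUS1_V", 1.0)
    tampered = mitm_relay(reading, fdi_scale=1.2)
    print(tampered.value, residual_alarm(tampered.value, independent=1.0))
    command = Message("command", "BRK7", "CLOSE")
    print(mitm_relay(command, fci_flip=True).value)
```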

Machine Learning Techniques

The paper explores a spectrum of machine learning techniques for IDS, covering supervised, unsupervised, and semi-supervised learning. Supervised models such as Decision Trees, Random Forests, and Neural Networks were evaluated and detected attacks significantly more accurately with fused data than with single-domain data.
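Why fused features can help is easiest to see on a toy example. The sketch below swaps in a perceptron (a single-neuron network) for the models named above and trains it on hypothetical, hand-made data in which attacks are linearly separable only when the cyber and physical features are used together. It illustrates the geometric argument; it does not reproduce the paper's models or experiments, and the feature interpretations in the comments are assumptions.

```python
from typing import List, Sequence

# Hypothetical, normalized toy data: [cyber feature, physical feature].
# E.g. cyber = relay-induced latency, physical = measurement residual (assumed names).
X = [[0.2, 0.2], [0.8, 0.1], [0.1, 0.7], [0.5, 0.4],  # normal (label 0)
     [0.9, 0.6], [0.6, 0.9], [0.7, 0.7]]              # attack (label 1)
y = [0, 0, 0, 0, 1, 1, 1]


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 1000) -> List[float]:
    """Classic perceptron with learning rate 1; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            target = 1 if label else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if target * score <= 0:  # misclassified or on the boundary: update
                for i, xi in enumerate(x):
                    w[i] += target * xi
                w[-1] += target
                mistakes += 1
        if mistakes == 0:  # converged: every training point is on the correct side
            break
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0)


def training_errors(X: Sequence[Sequence[float]], y: Sequence[int]) -> int:
    w = train_perceptron(X, y)
    return sum(predict(w, x) != label for x, label in zip(X, y))


if __name__ == "__main__":
    cyber_only = [[x[0]] for x in X]
    physical_only = [[x[1]] for x in X]
    print("cyber-only errors:", training_errors(cyber_only, y))
    print("physical-only errors:", training_errors(physical_only, y))
    print("fused errors:", training_errors(X, y))
```

No single threshold separates the classes in either one-dimensional view, so each single-domain model makes at least one training error, while the fused data are separable (for example by the line where the two features sum to 1.15) and the perceptron reaches zero training errors. This argument concerns training data only; it does not guarantee gains on unseen data.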

Co-training, a semi-supervised method, splits the features into a cyber view and a physical view, enabling robust training even when some labels are missing. It performs on par with conventional supervised learning, which makes it attractive in real-world settings where labeled data are scarce.

Figure 3: Co-training based fusion for labeled and unlabeled datasets.
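A minimal sketch of two-view co-training in the style of Blum and Mitchell, assuming a nearest-centroid classifier per view and a distance-gap confidence score. The paper's actual base learners and sample-selection rule are not stated here, so those choices, the function names, and the toy data are hypothetical. Each view in turn pseudo-labels the unlabeled samples it is most confident about, and those labels extend the training set of both views.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, List[float]]:
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_margin(centroids: Dict[int, List[float]], x: Vector) -> Tuple[int, float]:
    """Return the nearest class and a confidence margin (distance gap to the runner-up)."""
    ranked = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def co_train(cyber_L, phys_L, y_L, cyber_U, phys_U, per_round: int = 1) -> List[int]:
    """Two-view co-training. Returns one pseudo-label per unlabeled sample."""
    if per_round < 1:
        raise ValueError("per_round must be at least 1")
    cyber_L, phys_L, y_L = list(cyber_L), list(phys_L), list(y_L)
    pseudo = [-1] * len(cyber_U)
    pool = list(range(len(cyber_U)))  # indices of samples that are still unlabeled
    while pool:
        for view_L, view_U in ((cyber_L, cyber_U), (phys_L, phys_U)):
            if not pool:
                break
            model = fit_centroids(view_L, y_L)  # retrain this view on the grown set
            ranked = sorted(pool, key=lambda i: predict_with_margin(model, view_U[i])[1],
                            reverse=True)
            for i in ranked[:per_round]:  # most confident samples for this view
                pseudo[i] = predict_with_margin(model, view_U[i])[0]
                # The pseudo-labeled sample joins the training set of BOTH views.
                cyber_L.append(cyber_U[i])
                phys_L.append(phys_U[i])
                y_L.append(pseudo[i])
                pool.remove(i)
    return pseudo


if __name__ == "__main__":
    # Hypothetical 1-D views: two labeled samples and four unlabeled ones.
    print(co_train(cyber_L=[[0.1], [0.9]], phys_L=[[0.2], [0.8]], y_L=[0, 1],
                   cyber_U=[[0.15], [0.85], [0.2], [0.8]],
                   phys_U=[[0.25], [0.75], [0.1], [0.95]]))
```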

Performance Analysis and Results

The fusion-based IDS outperforms traditional methods by reducing false positives and improving detection rates. Experimental results show that cyber-physical feature fusion yields an average improvement of 15-20% in F1-score over purely cyber-based detection. Manifold-learning visualizations and clustering evaluations further indicate that multi-source data fusion makes intrusion detection results easier to interpret.
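The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so it rewards reducing false positives and false negatives together. The text does not say whether the 15-20% gain is relative or in absolute percentage points, so the sketch below computes both. The predictions in the usage example are invented solely to exercise the formulas and are not results from the paper.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary IDS metrics with 1 = attack, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false positive rate
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def relative_improvement(new: float, baseline: float) -> float:
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    cyber_only = detection_metrics(y_true, [0, 1, 1, 0, 1, 1, 0, 0])
    fused = detection_metrics(y_true, [0, 0, 1, 0, 1, 1, 1, 0])
    print("absolute F1 gain:", fused["f1"] - cyber_only["f1"])
    print("relative F1 gain:", relative_improvement(fused["f1"], cyber_only["f1"]))
```

With these invented predictions the same change reads as a 0.25 absolute gain or a 50% relative gain, which is why the basis of a reported percentage matters.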

Figure 4: Agglomerative clustering with different numbers of clusters.
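Agglomerative clustering starts with every sample in its own cluster and repeatedly merges the closest pair until the requested number of clusters remains, which is why the same data can be shown cut at different cluster counts. The stdlib sketch below uses Euclidean distance with average (or single) linkage on hypothetical 2-D points, for instance a 2-D manifold embedding of fused features; the linkage choice and the data are assumptions. In practice a library implementation such as scikit-learn's `AgglomerativeClustering` (with its `n_clusters` and `linkage` parameters) would be the usual route.

```python
import math
from itertools import combinations
from typing import List, Sequence


def agglomerative(points: Sequence[Sequence[float]], n_clusters: int,
                  linkage: str = "average") -> List[int]:
    """Bottom-up clustering; returns one cluster label per point."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    if linkage not in ("single", "average"):
        raise ValueError("linkage must be 'single' or 'average'")
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def cluster_distance(a: List[int], b: List[int]) -> float:
        pair = [math.dist(points[i], points[j]) for i in a for j in b]
        # single linkage: closest pair; average linkage: mean over all pairs
        return min(pair) if linkage == "single" else sum(pair) / len(pair)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]  # hypothetical 2-D embedding
    for k in (2, 3, 4):
        print(k, agglomerative(pts, k))
```

This naive version recomputes all pairwise cluster distances at every merge, which is fine for a handful of points but far too slow for full datasets.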

Conclusion

The fusion of cyber and physical features provides a promising approach to enhancing IDS in power systems, addressing the limitations of traditional cyber-centric models. This paper contributes valuable insights and methodologies for integrating diverse data sources in a coherent manner to protect critical infrastructure from sophisticated cyber threats. Future work may focus on scalability challenges and more extensive real-world validations to solidify the practical applicability of this approach in large-scale deployment scenarios.