- The paper presents a multi-source data fusion approach that combines cyber and physical inputs to reduce false positives and improve IDS accuracy.
- It evaluates supervised, unsupervised, and semi-supervised machine learning techniques, reporting an average 15-20% improvement in F1-score for fused cyber-physical features over purely cyber-based detection.
- Experimental evaluations on the RESLab testbed demonstrate effective detection of complex attacks, including FDI and MiTM.
Multi-Source Data Fusion for Cyberattack Detection in Power Systems
Introduction
The paper "Multi-Source Data Fusion for Cyberattack Detection in Power Systems" (2101.06897) examines how multi-source data fusion can be used to detect cyberattacks in power systems. It presents an approach for integrating data from diverse sensors and modalities to improve the accuracy of intrusion detection systems (IDS) in cyber-physical systems such as power grids. Traditional IDS approaches, which focus solely on either the cyber or the physical domain, often suffer from high false alarm rates and miss complex attack vectors. By fusing data from both domains, the system aims to combine their strengths and identify cyberattacks, including those that exploit zero-day vulnerabilities.
Data Fusion Architecture
The proposed fusion architecture is implemented on the RESLab testbed, which emulates real-world power system operations and communication networks. The architecture integrates multiple sensors to capture both cyber and physical data, aggregating measurements such as network logs, intrusion detection alerts, and sensor readings from control equipment. This approach allows for a holistic view of the system state, enabling improved detection of anomalies and cyber intrusions.
Figure 1: Centralized fusion architecture. In the autonomous architecture, the Fusion and Learning blocks are interchanged and an additional Learning block is added after fusion.
The data fusion process involves several steps: feature extraction, time-synchronized merging, and pre-processing such as imputation and encoding to prepare the data for machine learning models. By combining cyber and physical data, the fusion mechanism improves the detection capability of the IDS, particularly for False Data Injection (FDI) and Man-in-the-Middle (MiTM) attacks.
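The merging and pre-processing steps can be illustrated with a minimal sketch. It is not the paper's pipeline: the field names (`voltage_pu`, `rtt_ms`, `protocol`), the sample values, the one-second tolerance, and the helper functions are all hypothetical, and only the Python standard library is used. Each physical sample is joined with the most recent cyber record at or before its timestamp, gaps are forward-filled, and the categorical field is one-hot encoded.

```python
from bisect import bisect_right

def merge_asof(physical, cyber, tolerance):
    """Attach to each physical sample the latest cyber record at or before
    its timestamp, if that record is at most `tolerance` seconds old."""
    cyber = sorted(cyber, key=lambda r: r["t"])
    cyber_times = [r["t"] for r in cyber]
    fused = []
    for p in sorted(physical, key=lambda r: r["t"]):
        i = bisect_right(cyber_times, p["t"]) - 1
        fresh = i >= 0 and p["t"] - cyber_times[i] <= tolerance
        match = cyber[i] if fresh else None
        row = dict(p)  # copy so the input rows are left untouched
        row["rtt_ms"] = match["rtt_ms"] if match else None
        row["protocol"] = match["protocol"] if match else None
        fused.append(row)
    return fused

def impute_forward(rows, key, default):
    """Forward-fill missing values; use `default` before the first observation."""
    last = default
    for row in rows:
        if row[key] is None:
            row[key] = last
        else:
            last = row[key]
    return rows

def one_hot(rows, key, categories):
    """Replace a categorical field with one 0/1 column per category.
    A missing or unknown value becomes all zeros."""
    for row in rows:
        value = row.pop(key)
        for c in categories:
            row[f"{key}_{c}"] = 1 if value == c else 0
    return rows

# Hypothetical samples: physical measurements and cyber traffic features.
physical = [
    {"t": 0.0, "voltage_pu": 1.00},
    {"t": 1.0, "voltage_pu": 0.99},
    {"t": 2.0, "voltage_pu": None},   # dropped measurement
    {"t": 3.0, "voltage_pu": 0.97},
]
cyber = [
    {"t": 0.2, "rtt_ms": 12.0, "protocol": "DNP3"},
    {"t": 1.9, "rtt_ms": 48.0, "protocol": "ARP"},
]

fused = merge_asof(physical, cyber, tolerance=1.0)
fused = impute_forward(fused, "voltage_pu", default=1.0)
fused = impute_forward(fused, "rtt_ms", default=0.0)
fused = one_hot(fused, "protocol", categories=["DNP3", "ARP"])

for row in fused:
    print(row)
```

Joining on the latest earlier record avoids leaking future cyber observations into a physical sample, which matters if the fused rows are later used for online detection.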
Experimental Setup and Use Cases
The testbed setup includes network emulators, power system simulators, and data collection tools such as Elasticsearch and Pyshark. Several cyberattack scenarios, specifically MiTM, FDI, and False Command Injection (FCI) attacks, are simulated to validate the fusion-based IDS. These scenarios show how cross-domain fusion can be used to detect complex attacks that manifest differently across the cyber and physical layers.
Figure 2: Testbed architecture with data fusion.
Machine Learning Techniques
The paper explores a range of machine learning techniques for IDS, including supervised, unsupervised, and semi-supervised approaches. Supervised models such as Decision Trees, Random Forests, and Neural Networks are evaluated and show significant improvements in detection accuracy when trained on fused data rather than single-domain data.
Co-training, a semi-supervised method, splits the features into a cyber view and a physical view, which allows training even when some labels are missing. It performs comparably to conventional supervised learning, which makes it promising for real-world scenarios where labeled data is limited.
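A minimal sketch of the co-training idea follows, under several assumptions that do not come from the paper: a nearest-centroid classifier stands in for the base learner, confidence is the distance margin between the two closest centroids, both views add pseudo-labels to a shared pool, and the toy feature values are invented. In each round, a classifier trained on each view labels the unlabeled sample it is most confident about.

```python
import math

def centroids(X, y):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, x):
    """Return (label, margin): nearest centroid and how much closer it is
    than the runner-up. A larger margin means higher confidence."""
    ranked = sorted((math.dist(x, mu), label) for label, mu in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin

def co_train(cyber, physical, labels, rounds=5, per_round=1):
    """labels[i] is a class or None when unlabeled. Returns a completed copy."""
    labels = list(labels)
    for _ in range(rounds):
        unlabeled = [i for i, l in enumerate(labels) if l is None]
        if not unlabeled:
            break
        known = [i for i, l in enumerate(labels) if l is not None]
        newly = {}
        for view in (cyber, physical):
            # Train one classifier per view on the current labeled pool.
            model = centroids([view[i] for i in known],
                              [labels[i] for i in known])
            # Each view pseudo-labels its most confident unlabeled samples.
            ranked = sorted(unlabeled,
                            key=lambda i: -predict(model, view[i])[1])
            for i in ranked[:per_round]:
                newly.setdefault(i, predict(model, view[i])[0])
        for i, label in newly.items():
            labels[i] = label
    return labels

# Hypothetical toy data: two cyber features and one physical feature per sample.
cyber_view = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1],
              [0.8, 0.9], [0.15, 0.15], [0.85, 0.7]]
physical_view = [[0.0], [1.0], [0.1], [0.9], [0.2], [0.8]]
labels = [0, 1, None, None, None, None]  # 0 = normal, 1 = attack

completed = co_train(cyber_view, physical_view, labels)
print(completed)
```

The sketch assumes the initial labeled pool contains at least one sample of every class; otherwise the per-view classifiers could never predict the missing class.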
Figure 3: Co-training based fusion for labeled and unlabeled datasets.
The fusion-based IDS outperforms traditional methods by reducing false positives and improving detection rates. Experimental results show that cyber-physical feature fusion yields an average 15-20% improvement in F1-score over purely cyber-based detection. Manifold learning visualizations and clustering evaluations are also used to make the intrusion detection results easier to interpret.
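For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for two sets of predictions and reports the gain both in absolute points and in relative terms, since this summary does not say which of the two the 15-20% figure refers to. The labels and predictions are synthetic, chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Synthetic example (1 = attack, 0 = normal); not data from the paper.
y_true       = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_cyber   = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # hypothetical cyber-only model
pred_fused   = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # hypothetical fused model

f1_cyber = f1_score(y_true, pred_cyber)
f1_fused = f1_score(y_true, pred_fused)
absolute_gain = f1_fused - f1_cyber              # difference in F1 points
relative_gain = absolute_gain / f1_cyber         # fraction of the baseline

print(f"cyber-only F1: {f1_cyber:.2f}, fused F1: {f1_fused:.2f}")
print(f"absolute gain: {absolute_gain:.2f}, relative gain: {relative_gain:.0%}")
```

Because F1 ignores true negatives, it is less flattering than accuracy on imbalanced data where attacks are rare, which is a common reason to report it for intrusion detection.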




Figure 4: Agglomerative clustering with different numbers of clusters.
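Agglomerative clustering starts with every sample in its own cluster and repeatedly merges the two closest clusters until the requested number remains. The sketch below is a small standard-library illustration of that procedure run at several cluster counts. The single-linkage distance and the six two-dimensional points are assumptions made for the example; this summary does not state which linkage or features the paper uses.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage (an assumption here):
    the distance between two clusters is that of their closest members."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical 2-D embedding of six samples forming three tight groups.
points = [(0.0, 0.0), (0.1, 0.0),
          (1.0, 1.0), (1.1, 1.0),
          (3.0, 3.0), (3.0, 3.1)]

results = {k: agglomerative(points, k) for k in (3, 2, 1)}
for k, clusters in results.items():
    print(k, clusters)
```

Comparing the partitions at different cluster counts shows which groups merge first, which is one way to judge how well separated the classes are in the fused feature space.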
Conclusion
The fusion of cyber and physical features is a promising approach to improving IDS in power systems and addresses the limitations of traditional cyber-centric models. The paper contributes methods for integrating diverse data sources coherently to protect critical infrastructure from sophisticated cyber threats. Future work may address scalability and more extensive real-world validation to establish the approach's practical applicability in large-scale deployments.