A Sophisticated Framework for the Accurate Detection of Phishing Websites

Published 13 Mar 2024 in cs.CR and cs.AI | (2403.09735v1)

Abstract: Phishing is an increasingly sophisticated form of cyberattack that is inflicting huge financial damage on corporations throughout the globe while also jeopardizing individuals' privacy. Attackers are constantly devising new methods of launching such assaults, and detecting them has become a daunting task. Many different techniques have been suggested, each with its own pros and cons. While machine learning-based techniques have been most successful in identifying such attacks, they continue to fall short in terms of performance and generalizability. This paper proposes a comprehensive methodology for detecting phishing websites. The goal is to design a system that is capable of accurately distinguishing phishing websites from legitimate ones and that provides generalized performance over a broad variety of datasets. A combination of feature selection, a greedy algorithm, cross-validation, and deep learning methods has been utilized to construct a sophisticated stacking ensemble classifier. Extensive experimentation on four different phishing datasets was conducted to evaluate the performance of the proposed technique. The proposed algorithm outperformed the other existing phishing detection models, obtaining accuracies of 97.49%, 98.23%, 97.48%, and 98.20% on dataset-1 (UCI Phishing Websites Dataset), dataset-2 (Phishing Dataset for Machine Learning: Feature Evaluation), dataset-3 (Phishing Websites Dataset), and dataset-4 (Web page phishing detection), respectively. The high accuracy values obtained across all datasets imply the model's generalizability and effectiveness in the accurate identification of phishing websites.

Summary

The paper introduces a stacking ensemble classifier built with feature selection, a greedy algorithm, cross-validation, and deep learning.
It reports accuracy above 97% on each of four phishing datasets, which the authors take as evidence that the model generalizes.
The methodology targets the performance and generalizability limitations of existing machine learning-based phishing detectors.

The paper "A Sophisticated Framework for the Accurate Detection of Phishing Websites" addresses the escalating challenge of detecting phishing websites, which pose significant risks to both corporations and individual privacy through increasingly cunning cyberattacks. The authors recognize the limitations in performance and generalizability of existing machine learning-based detection techniques and propose a novel solution to overcome these challenges.

Methodology

The paper introduces a comprehensive methodology to effectively differentiate between phishing and legitimate websites. Key components of this approach include:

Feature Selection: Identifying the most relevant features for phishing detection to enhance the model's accuracy.
Greedy Algorithm: Employing a greedy search, which commits to the best available choice at each step, to refine feature selection and improve model efficiency.
Cross-Validation: Scoring the model on held-out folds of the data so that the reported performance does not depend on a single train/test split.
Deep Learning Methods: Implementing advanced neural network architectures to capture complex patterns associated with phishing activities.
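The summary does not spell out how the paper combines feature selection, greedy search, and cross-validation, so the following is only a minimal sketch of one common pairing: greedy forward selection scored by k-fold cross-validation. The nearest-centroid classifier, the synthetic data, and every function name here are assumptions chosen for brevity, not the authors' method.

```python
import random

def nearest_centroid_accuracy(train, test, features):
    """Fit a nearest-centroid classifier on `train` using only `features`
    and return its accuracy on `test`. Assumes both classes occur in `train`."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in test:
        dists = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        if min(dists, key=dists.get) == y:
            correct += 1
    return correct / len(test)

def cv_accuracy(data, features, k=5):
    """Mean accuracy over k folds; fold i holds out rows i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        scores.append(nearest_centroid_accuracy(train, test, features))
    return sum(scores) / k

def greedy_forward_selection(data, n_features, k=5):
    """Repeatedly add the single feature that most improves CV accuracy;
    stop as soon as no remaining feature improves it."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(cv_accuracy(data, selected + [f], k), f) for f in candidates]
        score, feature = max(scored)
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best

# Synthetic toy data (an assumption, not one of the paper's datasets):
# features 0 and 2 carry class signal, features 1 and 3 are pure noise.
rng = random.Random(0)
data = []
for _ in range(200):
    y = rng.randint(0, 1)
    x = [y + rng.gauss(0, 0.5), rng.gauss(0, 1), y + rng.gauss(0, 0.5), rng.gauss(0, 1)]
    data.append((x, y))

selected, score = greedy_forward_selection(data, n_features=4)
print("selected features:", selected, "cv accuracy:", round(score, 3))
```

The search is greedy because it never revisits an earlier choice, which keeps the cost at a handful of cross-validated fits per step instead of one per feature subset. In practice a library routine such as scikit-learn's SequentialFeatureSelector would usually replace the hand-written loop.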

The authors propose a stacking ensemble classifier that integrates these techniques to build a more sophisticated and generalized model.
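To make the stacking idea concrete, here is a minimal sketch under stated assumptions: the base learners are nearest-centroid scorers that each see one feature, the meta-learner is a small logistic regression, and the data is synthetic. None of this is taken from the paper, whose base learners and deep learning components are not described in this summary; the sketch only shows the mechanism by which a meta-learner is trained on out-of-fold base predictions.

```python
import math
import random

def fit_centroid(train, features):
    """Base learner: nearest centroid on a feature subset.
    Returns a scorer; a positive score means 'closer to the class-1 centroid'."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        cents[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    def score(x):
        d0 = sum((x[f] - c) ** 2 for f, c in zip(features, cents[0]))
        d1 = sum((x[f] - c) ** 2 for f, c in zip(features, cents[1]))
        return d0 - d1
    return score

def fit_logistic(rows, labels, epochs=300, lr=0.1):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, y in zip(rows, labels):
            logit = sum(wi * zi for wi, zi in zip(w, z)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            grad_b += p - y
            for i, zi in enumerate(z):
                grad_w[i] += (p - y) * zi
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, z)) + b > 0)

def fit_stack(train, views, k=5):
    """Stacking: the meta-learner is trained on out-of-fold base scores, so it
    never sees a score from a base model that was trained on that same row."""
    meta_rows = [None] * len(train)
    for i in range(k):
        fold_train = [row for j, row in enumerate(train) if j % k != i]
        scorers = [fit_centroid(fold_train, v) for v in views]
        for j in range(i, len(train), k):
            meta_rows[j] = [s(train[j][0]) for s in scorers]
    meta = fit_logistic(meta_rows, [y for _, y in train])
    final = [fit_centroid(train, v) for v in views]  # refit base learners on all rows
    return lambda x: meta([s(x) for s in final])

def accuracy(predict, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

# Synthetic toy data (an assumption): three noisy features, each shifted by the label.
rng = random.Random(0)
def make_row():
    y = rng.randint(0, 1)
    return [y + rng.gauss(0, 0.5) for _ in range(3)], y

train = [make_row() for _ in range(400)]
test = [make_row() for _ in range(200)]

stacked = fit_stack(train, views=[[0], [1], [2]])
stacked_acc = accuracy(stacked, test)
single = fit_centroid(train, [0])
single_acc = accuracy(lambda x: int(single(x) > 0), test)
print(f"single view: {single_acc:.3f}  stacked: {stacked_acc:.3f}")
```

Training the meta-learner on out-of-fold scores is the step that ties cross-validation to the ensemble: it keeps the meta-learner from rewarding a base learner for memorizing its own training rows. scikit-learn's StackingClassifier packages the same pattern behind its cv and final_estimator parameters.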

Experimental Evaluation

The proposed model was evaluated on four phishing datasets:

UCI Phishing Websites Dataset
Phishing Dataset for Machine Learning: Feature Evaluation
Phishing Websites Dataset
Web Page Phishing Detection

The model achieved accuracies of 97.49%, 98.23%, 97.48%, and 98.20% on these datasets, respectively. The authors cite these consistently high figures as evidence that the model generalizes across datasets and identifies phishing websites effectively.
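As a quick check on how tightly the four reported figures cluster, the arithmetic below computes their mean and spread. It uses only the accuracies quoted above; the variable names are illustrative.

```python
# Accuracies (in percent) as reported for the four datasets.
reported = {
    "UCI Phishing Websites Dataset": 97.49,
    "Phishing Dataset for Machine Learning: Feature Evaluation": 98.23,
    "Phishing Websites Dataset": 97.48,
    "Web Page Phishing Detection": 98.20,
}

mean_acc = sum(reported.values()) / len(reported)
spread = max(reported.values()) - min(reported.values())
print(f"mean = {mean_acc:.2f}%, spread = {spread:.2f} percentage points")
# prints: mean = 97.85%, spread = 0.75 percentage points
```

A spread of under one percentage point is consistent with the generalizability claim, although accuracy alone says nothing about class balance or about phishing pages that differ from those in the four datasets.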

Conclusion

The study presents the model as a reliable alternative to existing phishing detection solutions. By combining feature selection, a greedy algorithm, cross-validation, and deep learning in a stacking ensemble, the framework targets the limitations in generalizability and performance that the authors identify in earlier machine learning-based detectors. The reported results, above 97% accuracy on all four datasets, support that claim for the benchmarks tested.