EnsemFDet: An Ensemble Approach to Fraud Detection based on Bipartite Graph

Published 23 Dec 2019 in cs.LG, cs.SI, and stat.ML | (1912.11113v4)

Abstract: Fraud detection is extremely critical for e-commerce business. It is the intent of the companies to detect and prevent fraud as early as possible. Existing fraud detection methods try to identify unexpected dense subgraphs and treat related nodes as suspicious. Spectral relaxation-based methods solve the problem efficiently but hurt the performance due to the relaxed constraints. Besides, many methods cannot be accelerated with parallel computation or control the number of returned suspicious nodes because they provide a set of subgraphs with diverse node sizes. These drawbacks affect the real-world applications of existing methods. In this paper, we propose an Ensemble-based Fraud Detection (EnsemFDet) method to scale up fraud detection in bipartite graphs by decomposing the original problem into subproblems on small-sized subgraphs. By oversampling the graph and solving the subproblems, the ensemble approach further votes suspicious nodes without sacrificing the prediction accuracy. Extensive experiments have been done on real transaction data from JD.com, which is one of the world's largest e-commerce platforms. Experimental results demonstrate the effectiveness, practicability, and scalability of EnsemFDet. More specifically, EnsemFDet is up to 100x faster than the state-of-the-art methods due to its parallelism with all aspects of data.

Summary

The paper introduces an ensemble-based fraud detection method that leverages bipartite graph dense subgraph identification to uncover coordinated fraudulent activities.
It runs the FDet algorithm on structurally sampled subgraphs in parallel and is reported to be up to 100x faster than state-of-the-art methods.
Validation on real-world JD.com data demonstrates enhanced scalability, accuracy, and adaptability to evolving fraud patterns in large-scale e-commerce platforms.

Abstract

The paper introduces EnsemFDet, a novel ensemble-based method for fraud detection in large-scale e-commerce platforms like JD.com, leveraging bipartite graphs to identify fraudulent activity through dense subgraph detection. Traditional fraud detection methods face challenges such as inefficiency in real-time settings and limited scalability, which EnsemFDet addresses by decomposing the problem into manageable subproblems and utilizing parallel processing.

Introduction

The prevalence of online fraud, particularly in the context of promotional campaigns, presents significant challenges for e-commerce platforms. Traditional rule-based and supervised learning methods often fall short, given the rapidly evolving nature of fraudulent behaviors and the difficulty in obtaining labeled training data. EnsemFDet uses graph-based methods, focusing on the identification of dense subgraphs representing synchronous and rare behaviors typical of fraudsters.

Methodology

Bipartite Graph Representation

EnsemFDet utilizes a bipartite graph where nodes represent user accounts and merchants, and edges represent transactions. Fraudulent activities manifest as dense subgraphs due to the synchronized actions of coordinated groups of fraudsters aiming to exploit promotional offers.
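
To make the representation concrete, here is a minimal sketch that builds the two adjacency maps from a list of transactions and scores a candidate subgraph. The density measure used here (edges divided by nodes) is an assumption chosen for simplicity and may differ from the metric defined in the paper; the function names and the toy data are illustrative only.

```python
from collections import defaultdict

def build_bipartite(transactions):
    """Return (user -> set of merchants, merchant -> set of users)."""
    user_adj = defaultdict(set)
    merchant_adj = defaultdict(set)
    for user, merchant in transactions:
        user_adj[user].add(merchant)
        merchant_adj[merchant].add(user)
    return dict(user_adj), dict(merchant_adj)

def density(user_adj, users, merchants):
    """Assumed metric: edges inside the induced subgraph / number of nodes."""
    users, merchants = set(users), set(merchants)
    edges = sum(len(user_adj.get(u, set()) & merchants) for u in users)
    nodes = len(users) + len(merchants)
    return edges / nodes if nodes else 0.0

# Toy data: u1-u3 all transact with m1 and m2 (a synchronized group),
# while u4 and u5 make isolated purchases.
transactions = [
    ("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u2", "m2"),
    ("u3", "m1"), ("u3", "m2"),
    ("u4", "m3"), ("u5", "m4"),
]
user_adj, merchant_adj = build_bipartite(transactions)
dense = density(user_adj, ["u1", "u2", "u3"], ["m1", "m2"])
whole = density(user_adj, user_adj.keys(), merchant_adj.keys())
print(dense, whole)
```

In this toy graph the synchronized group scores higher than the graph as a whole, which is the signal that dense-subgraph methods look for.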

Sampling Methods

To address the computational challenges associated with large-scale graphs, EnsemFDet employs three structural sampling methods: Random Edge Sampling (RES), One-Side Node Sampling (ONS), and Two-Side Node Sampling (TNS). These methods aid in reducing the problem's complexity while preserving the essential structural characteristics of potential fraud patterns.
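
The three sampling schemes can be sketched as follows. This is one plausible reading of the method names, not the paper's exact procedure: RES keeps a random fraction of edges, ONS keeps a random fraction of nodes on one side together with all their edges, and TNS keeps a random fraction of nodes on both sides and only the edges between kept nodes. The `ratio` parameter and the function names are assumptions.

```python
import random

def random_edge_sampling(edges, ratio, rng):
    """RES (assumed): keep a uniformly random fraction of the edges."""
    return rng.sample(edges, int(len(edges) * ratio))

def one_side_node_sampling(edges, ratio, rng, side=0):
    """ONS (assumed): sample nodes on one side, keep all of their edges."""
    nodes = sorted({e[side] for e in edges})
    kept = set(rng.sample(nodes, int(len(nodes) * ratio)))
    return [e for e in edges if e[side] in kept]

def two_side_node_sampling(edges, ratio, rng):
    """TNS (assumed): sample nodes on both sides, keep edges between them."""
    users = sorted({u for u, _ in edges})
    merchants = sorted({m for _, m in edges})
    kept_u = set(rng.sample(users, int(len(users) * ratio)))
    kept_m = set(rng.sample(merchants, int(len(merchants) * ratio)))
    return [(u, m) for u, m in edges if u in kept_u and m in kept_m]

# Usage on a small complete bipartite graph (10 users x 4 merchants).
rng = random.Random(0)
edges = [(f"u{i}", f"m{j}") for i in range(10) for j in range(4)]
print(len(random_edge_sampling(edges, 0.5, rng)))
print(len(one_side_node_sampling(edges, 0.5, rng)))
print(len(two_side_node_sampling(edges, 0.5, rng)))
```

Node-based sampling keeps every edge among the retained nodes, so a dense block that survives sampling stays dense, whereas edge sampling thins all blocks uniformly.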

Figure 1: The structure of EnsemFDet for fraud detection in promotional campaigns.

Fraud DETection (FDet) Algorithm

The FDet algorithm operates on sampled subgraphs, using a heuristic to detect dense subgraphs. It iteratively identifies and removes the nodes that form the highest-density subgraph, stopping at a truncating point. This approach automatically determines the number of fraudster groups, $\hat{k}$, which makes the algorithm easier to apply in dynamic, large-scale environments.
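
The sketch below illustrates the general idea with a standard greedy peeling heuristic; it is not the authors' implementation. Two assumptions are made and should be treated as placeholders: density is measured as edges divided by nodes, and the truncating point is taken to be the position of the largest drop in density between consecutive blocks. The names `densest_block`, `fdet`, and `max_blocks` are hypothetical.

```python
from collections import defaultdict

def densest_block(edges):
    """Greedy peeling: repeatedly drop the lowest-degree node and remember
    the intermediate subgraph with the highest edges/nodes ratio."""
    adj = defaultdict(set)
    for u, m in edges:
        adj[("U", u)].add(("M", m))
        adj[("M", m)].add(("U", u))
    alive = set(adj)
    n_edges = len(set(edges))
    best_score, best_nodes = 0.0, set()
    while alive:
        score = n_edges / len(alive)
        if score > best_score:
            best_score, best_nodes = score, set(alive)
        # Linear scan for the minimum degree; a heap would be faster.
        victim = min(alive, key=lambda n: (len(adj[n]), n))
        for nb in adj[victim]:
            adj[nb].discard(victim)
        n_edges -= len(adj[victim])
        alive.remove(victim)
    return best_score, best_nodes

def fdet(edges, max_blocks=10):
    """Extract dense blocks one at a time, then truncate the list at the
    largest density drop (assumed rule for choosing k-hat)."""
    remaining = set(edges)
    blocks = []
    while remaining and len(blocks) < max_blocks:
        score, nodes = densest_block(list(remaining))
        if not nodes:
            break
        blocks.append((score, nodes))
        # Remove the edges inside the detected block before the next round.
        remaining = {(u, m) for u, m in remaining
                     if not (("U", u) in nodes and ("M", m) in nodes)}
    if len(blocks) < 2:
        return blocks
    scores = [s for s, _ in blocks]
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    k_hat = drops.index(max(drops)) + 1
    return blocks[:k_hat]

# Usage: a planted 6 x 4 block hidden among 20 unrelated single purchases.
planted = [(f"f{i}", f"p{j}") for i in range(6) for j in range(4)]
background = [(f"b{i}", f"q{i}") for i in range(20)]
for score, nodes in fdet(planted + background):
    print(round(score, 2), sorted(nodes))
```

Because the truncation rule looks at the gap between consecutive blocks, the number of returned groups does not have to be fixed in advance.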

Experimental Results

Performance Evaluation

Experiments were conducted on real-world datasets from JD.com to validate EnsemFDet's efficacy. The datasets included multiple instances of fraud scenarios in different promotional campaigns.

Figure 2: Performance comparison of different methods, highlighting superior F1 scores for EnsemFDet compared to other spectral and heuristic-based approaches.

Scalability and Efficiency

EnsemFDet runs up to 100x faster than state-of-the-art methods. The speedup is attributed to parallel processing and to decomposing the detection task into smaller subproblems, each solved independently on a sampled subgraph.
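
The decompose-and-vote pattern can be sketched as below. The detector here is a deliberately trivial placeholder (it flags users with many sampled edges) standing in for FDet, and `n_samples`, `ratio`, and `vote_threshold` are hypothetical parameter names. A thread pool is used only to keep the example self-contained; CPU-bound work in Python would normally need a process pool to run in parallel.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def sample_edges(edges, ratio, seed):
    """Draw one random edge sample; each seed gives an independent subproblem."""
    rng = random.Random(seed)
    return rng.sample(edges, int(len(edges) * ratio))

def detect(sampled_edges, min_degree=3):
    """Placeholder detector (NOT FDet): flag users with many sampled edges."""
    degree = Counter(u for u, _ in sampled_edges)
    return {u for u, d in degree.items() if d >= min_degree}

def ensemble_vote(edges, n_samples=20, ratio=0.5, vote_threshold=0.5):
    """Run the detector on many samples and keep users flagged often enough."""
    def run(seed):
        return detect(sample_edges(edges, ratio, seed))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, range(n_samples)))
    votes = Counter(u for flagged in results for u in flagged)
    return {u for u, v in votes.items() if v / n_samples >= vote_threshold}

# Usage: one heavy user among light ones.
edges = [("heavy", f"m{j}") for j in range(8)] + [(f"u{i}", "m0") for i in range(5)]
print(ensemble_vote(edges))
```

Raising or lowering `vote_threshold` is one way to control how many suspicious nodes are returned, since each node carries a vote count rather than a binary label.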

Figure 3: Performance comparison between EnsemFDet and traditional heuristic models.

Discussion

Implications and Future Work

EnsemFDet's ensemble framework and use of bipartite graphs for dense subgraph identification provide a robust solution to the perennial problem of fraud detection in e-commerce. Its capacity to adapt to various fraud patterns without the need for extensive labeled data makes it particularly well-suited for real-world applications, where fraud scenarios evolve rapidly.

Future work could explore the integration of dynamic network analysis to enhance the model's responsiveness to new types of fraud as they emerge, ensuring that fraud detection mechanisms remain one step ahead of adversaries.

Conclusion

EnsemFDet offers a practical, scalable, and effective solution for fraud detection on large-scale e-commerce platforms. By combining graph-based dense subgraph detection with an ensemble strategy, it addresses two key challenges in fraud detection: accuracy and computational efficiency. Experiments on real JD.com transaction data support its effectiveness and scalability.
