Performance of ∂ILP on Larger PaySim Training Sets

Determine the classification performance of Differentiable Inductive Logic Programming (∂ILP) when trained on larger subsets, or the full training set, of the PaySim fraud-detection dataset; experiments in the present study were restricted to small subsets by memory limitations. Quantify how precision, recall, F1, and MCC change as the training-set size grows beyond the subsets used here.
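The four metrics named above can be computed directly with scikit-learn. A minimal sketch, using hypothetical binary labels (1 = fraud, 0 = legitimate) rather than actual PaySim data or ∂ILP predictions:

```python
# Hedged sketch: computing precision, recall, F1, and MCC with
# scikit-learn on hypothetical fraud predictions. The labels below are
# illustrative, not from PaySim or any ∂ILP run.
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    matthews_corrcoef,
)

y_true = [1, 0, 1, 1, 0, 0, 0, 1]  # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # hypothetical model predictions

metrics = {
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "mcc": matthews_corrcoef(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is the metric of interest for heavily imbalanced fraud data, since it accounts for all four confusion-matrix cells rather than only the positive class.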

Background

The study evaluates Differentiable Inductive Logic Programming (∂ILP) for fraud detection on the PaySim dataset but could not train on the full dataset due to memory constraints inherent to ∂ILP's clause generation and handling of constants.

As a result, performance was compared only on reduced training subsets, where ∂ILP was comparable to Deep Symbolic Classification but below Decision Trees on several metrics. The effect of scaling up the training set remains unresolved and is explicitly flagged as future research.
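The proposed scaling study amounts to a learning-curve experiment: train on progressively larger subsets and record the metrics at each size. A minimal sketch, with a DecisionTreeClassifier standing in for ∂ILP (which has no scikit-learn implementation) and synthetic data in place of PaySim; the subset sizes are illustrative assumptions:

```python
# Hedged sketch of the scaling experiment: train on growing subset
# sizes and record test-set metrics at each size. A decision tree is a
# stand-in model; data and subset sizes are synthetic/illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score, matthews_corrcoef

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # synthetic "fraud" labels
X_test, y_test = X[4000:], y[4000:]     # held-out evaluation split

for size in (250, 1000, 4000):  # growing training subsets
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    clf.fit(X[:size], y[:size])
    pred = clf.predict(X_test)
    print(size, f1_score(y_test, pred), matthews_corrcoef(y_test, pred))
```

For ∂ILP itself, the same loop would stop at whatever subset size exhausts memory, which is exactly the limitation the open question concerns.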

References

"Training of the full PaySim dataset was not possible due to the memory limitation, therefore it is not clear what would be the performance when trained on a larger training set (hence part of future research)."

Differentiable Inductive Logic Programming for Fraud Detection (Wolfson et al., 2024, arXiv:2410.21928), Conclusion (Section 7)