A comparative analysis of machine learning algorithms for predicting probabilities of default

Published 24 Jun 2025 in stat.AP, cs.LG, and q-fin.RM (arXiv:2506.19789v1)

Abstract: Predicting the probability of default (PD) of prospective loans is a critical objective for financial institutions. In recent years, machine learning (ML) algorithms have achieved remarkable success across a wide variety of prediction tasks, yet they remain relatively underutilised in credit risk analysis. This paper highlights the opportunities that ML algorithms offer to this field by comparing the performance of five predictive models (Random Forests, Decision Trees, XGBoost, Gradient Boosting and AdaBoost) with that of the predominantly used logistic regression on a benchmark dataset from Scheule et al. (Credit Risk Analytics: The R Companion). Our findings underscore the strengths and weaknesses of each method and offer insight into which ML algorithms are most effective for PD prediction in the context of loan portfolios.
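The comparison described above can be sketched as a small benchmarking loop: fit each classifier on a training split, predict a default probability for each held-out loan, and score those probabilities. This is a minimal sketch, not the authors' code. It assumes scikit-learn is installed, uses a synthetic, imbalanced dataset from `make_classification` in place of the Scheule et al. benchmark data, picks illustrative hyperparameters, and scores models by out-of-sample ROC AUC (a common discrimination metric for PD models; the abstract does not say which metrics the paper uses). XGBoost is included only if the `xgboost` package is available.

```python
# Sketch: compare PD classifiers by out-of-sample ROC AUC.
# Assumptions: synthetic data stands in for the Scheule et al. benchmark;
# hyperparameters and the AUC metric are illustrative choices, not the paper's.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    AdaBoostClassifier,
)
from sklearn.metrics import roc_auc_score


def build_models(seed=0):
    """Return the logistic-regression baseline plus the tree-based ML models."""
    models = {
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "Random forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Gradient boosting": GradientBoostingClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    try:
        from xgboost import XGBClassifier  # optional dependency in this sketch
        models["XGBoost"] = XGBClassifier(
            n_estimators=200, max_depth=3, learning_rate=0.1, random_state=seed
        )
    except ImportError:
        pass  # skip XGBoost if the package is not installed
    return models


def compare_pd_models(seed=0):
    """Fit every model and return {model name: test-set ROC AUC}."""
    # Imbalanced binary target: about 10% "defaults" (an assumed rate).
    X, y = make_classification(
        n_samples=2000, n_features=10, n_informative=5,
        weights=[0.9, 0.1], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        pd_hat = model.predict_proba(X_te)[:, 1]  # estimated probability of default
        scores[name] = roc_auc_score(y_te, pd_hat)
    return scores


if __name__ == "__main__":
    ranked = sorted(compare_pd_models().items(), key=lambda kv: -kv[1])
    for name, auc in ranked:
        print(f"{name:20s} AUC = {auc:.3f}")
```

Stratifying the split keeps the default rate the same in training and test data, which matters when defaults are rare. On real loan data, the ranking printed here would depend on the dataset, feature engineering and tuning, so it says nothing about the paper's actual results.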