Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending
Abstract: Peer-to-peer (P2P) lending connects borrowers and lenders through online platforms but suffers from significant information asymmetry, as lenders often lack sufficient data to assess borrowers' creditworthiness. This paper addresses this challenge by leveraging BERT, a large language model (LLM) known for its ability to capture contextual nuances in text, to generate a risk score based on borrowers' loan descriptions using a dataset from the Lending Club platform. We fine-tune BERT to distinguish between defaulted and non-defaulted loans using the loan descriptions provided by the borrowers. The resulting BERT-generated risk score is then integrated as an additional feature into an XGBoost classifier used at the loan granting stage, where decision-makers have limited information available to guide their decisions. This integration enhances predictive performance, with improvements in balanced accuracy and AUC, highlighting the value of textual features in complementing traditional inputs. Moreover, we find that the incorporation of the BERT score alters how classification models utilize traditional input variables, with these changes varying by loan purpose. These findings suggest that BERT discerns meaningful patterns in loan descriptions, encompassing borrower-specific features, specific purposes, and linguistic characteristics. However, the inherent opacity of LLMs and their potential biases underscore the need for transparent frameworks to ensure regulatory compliance and foster trust. Overall, this study demonstrates how LLM-derived insights interact with traditional features in credit risk modeling, opening new avenues to enhance the explainability and fairness of these models.
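The pipeline the abstract describes — derive a risk score from loan descriptions with a fine-tuned BERT model, then append that score to the tabular features of a gradient-boosted classifier — can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: the `bert_score` variable stands in for the output of a fine-tuned BERT model, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and all feature names and coefficients are invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Traditional application-stage features (synthetic stand-ins).
tabular = rng.normal(size=(n, 5))

# Latent default risk: partly explained by tabular data, partly by
# "soft" information that only a text model could observe.
soft_signal = rng.normal(size=n)
logit = 0.8 * tabular[:, 0] - 0.5 * tabular[:, 1] + 1.5 * soft_signal
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Stand-in for the BERT-derived risk score: a noisy read of the soft signal.
bert_score = soft_signal + rng.normal(scale=0.5, size=n)

X_base = tabular                                    # traditional inputs only
X_aug = np.column_stack([tabular, bert_score])      # + text-derived score

def held_out_auc(X):
    """Train a boosted-tree classifier and report AUC on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

auc_base = held_out_auc(X_base)
auc_aug = held_out_auc(X_aug)
print(f"AUC without text score: {auc_base:.3f}")
print(f"AUC with text score:    {auc_aug:.3f}")
```

Because the text-derived score carries signal that the tabular columns do not, the augmented model's AUC exceeds the baseline — the same qualitative effect the paper reports when adding the BERT score to the XGBoost inputs.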