
Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending

Published 29 Jan 2024 in q-fin.RM, cs.AI, cs.CL, and cs.LG | arXiv:2401.16458v3

Abstract: Peer-to-peer (P2P) lending connects borrowers and lenders through online platforms but suffers from significant information asymmetry, as lenders often lack sufficient data to assess borrowers' creditworthiness. This paper addresses this challenge by leveraging BERT, a large language model (LLM) known for its ability to capture contextual nuances in text, to generate a risk score based on borrowers' loan descriptions using a dataset from the Lending Club platform. We fine-tune BERT to distinguish between defaulted and non-defaulted loans using the loan descriptions provided by the borrowers. The resulting BERT-generated risk score is then integrated as an additional feature into an XGBoost classifier used at the loan granting stage, where decision-makers have limited information available to guide their decisions. This integration enhances predictive performance, with improvements in balanced accuracy and AUC, highlighting the value of textual features in complementing traditional inputs. Moreover, we find that the incorporation of the BERT score alters how classification models utilize traditional input variables, with these changes varying by loan purpose. These findings suggest that BERT discerns meaningful patterns in loan descriptions, encompassing borrower-specific features, specific purposes, and linguistic characteristics. However, the inherent opacity of LLMs and their potential biases underscore the need for transparent frameworks to ensure regulatory compliance and foster trust. Overall, this study demonstrates how LLM-derived insights interact with traditional features in credit risk modeling, opening new avenues to enhance the explainability and fairness of these models.
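The two-stage pipeline the abstract describes — a text model scores loan descriptions for default risk, and that score is appended as an extra feature to a tabular boosted-tree classifier — can be sketched as follows. This is a minimal illustration, not the paper's implementation: to keep it self-contained and runnable, a TF-IDF + logistic-regression scorer stands in for the fine-tuned BERT model, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the loan data is synthetic. Only the structure of the pipeline mirrors the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic loan descriptions: risky wording correlates with default.
risky = ["need cash urgently to cover overdue bills",
         "consolidate maxed out credit cards immediately"]
safe = ["refinancing home improvement at a lower rate",
        "planned purchase with stable salary income"]
n = 400
texts, labels = [], []
for _ in range(n):
    if rng.random() < 0.5:
        texts.append(str(rng.choice(risky)))
        labels.append(int(rng.random() < 0.7))   # high default rate
    else:
        texts.append(str(rng.choice(safe)))
        labels.append(int(rng.random() < 0.2))   # low default rate
y = np.array(labels)
# Traditional tabular features (e.g. amount, DTI), weakly informative.
X_tab = rng.normal(size=(n, 2)) + 0.5 * y[:, None]

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.25,
                                  random_state=0, stratify=y)

# Stage 1: train a text classifier on defaulted vs. non-defaulted
# descriptions; its predicted default probability plays the role of
# the BERT-generated risk score in the paper.
vec = TfidfVectorizer()
text_clf = LogisticRegression(max_iter=1000)
text_clf.fit(vec.fit_transform([texts[i] for i in idx_tr]), y[idx_tr])

def text_score(idx):
    return text_clf.predict_proba(
        vec.transform([texts[i] for i in idx]))[:, 1]

# Stage 2: append the text risk score to the tabular features and
# train a gradient-boosted classifier for the granting decision.
X_tr = np.column_stack([X_tab[idx_tr], text_score(idx_tr)])
X_te = np.column_stack([X_tab[idx_te], text_score(idx_te)])
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y[idx_tr])
proba = gbm.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y[idx_te], proba)
print(f"AUC with text risk score as extra feature: {auc:.3f}")
```

In the paper's version, stage 1 is a fine-tuned BERT whose output probability becomes the risk score, and stage 2 is XGBoost; the comparison against the same model without the text feature is what yields the reported gains in balanced accuracy and AUC.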
