Personal bankruptcy prediction using machine learning techniques
DOI: https://doi.org/10.18559/ebr.2024.2.1149
Keywords: personal bankruptcy, random forest, XGBoost, LightGBM, AdaBoost, CatBoost, support vector machines, household finance, SHAP
Abstract
An early prediction model that gives users a reliable assessment of consumers' financial situation has become crucial. Recent studies have focused on predicting corporate bankruptcies and credit defaults rather than personal bankruptcies. The present study addresses this gap in the literature by comparing machine learning algorithms for predicting personal bankruptcy. Its main objective is to examine the usefulness of support vector machines (SVM), random forest, AdaBoost, XGBoost, LightGBM, and CatBoost in forecasting personal bankruptcy. The study relies on two samples of households (learning and testing) drawn from the Survey of Consumer Finances, which was conducted in the United States. Among the estimated models, LightGBM, CatBoost, and XGBoost were the most effective. The most important variables in the models are income, refusal to grant credit, delays in the repayment of liabilities, the revolving debt ratio, and the housing debt ratio.
License
Copyright (c) 2024 Magdalena Brygała, Tomasz Korol
This work is licensed under a Creative Commons Attribution 4.0 International License.