Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction

María E. Pérez-Pons; Javier Parra-Dominguez; Guillermo Hernández; Enrique Herrera-Viedma; Juan M. Corchado

doi:10.1017/S026988892100014X

Published online by Cambridge University Press: 14 January 2022

María E. Pérez-Pons: BISITE Research Group, University of Salamanca, Edificio I+D+i, Calle Espejo 2, 37007 Salamanca, Spain
Javier Parra-Dominguez: BISITE Research Group, University of Salamanca, Edificio I+D+i, Calle Espejo 2, 37007 Salamanca, Spain; Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188 Salamanca, Spain
Guillermo Hernández: BISITE Research Group, University of Salamanca, Edificio I+D+i, Calle Espejo 2, 37007 Salamanca, Spain
Enrique Herrera-Viedma: University of Granada, Colegio Máximo de Cartuja, Campus Universitario de Cartuja, C.P. 18071 Granada, Spain
Juan M. Corchado: BISITE Research Group, University of Salamanca, Edificio I+D+i, Calle Espejo 2, 37007 Salamanca, Spain; Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188 Salamanca, Spain; Pusat Komputeran dan Informatik, Universiti Malaysia Kelantan, Karung Berkunci 36, Pengkalan Chepa, 16100 Kota Bharu, Kelantan, Malaysia; Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, 535-8585 Osaka, Japan

Abstract

This paper presents a methodology for automating binary classification using the smallest possible number of attributes. In this methodology, the success of a binary prediction is judged not by an algorithm's accuracy but by evaluation metrics that describe its goodness of fit, an important consideration when the dataset is imbalanced. The proposed methodology assesses the biases that can arise when one algorithm is identified as the best performer on the basis of its goodness of fit as measured by these evaluation metrics. The dimensionality of the data is reduced using the cumulative explained variance. The performance of six machine learning classification models is then compared using the Matthews correlation coefficient (MCC), the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (AUC-PR). The results show, graphically and numerically, how the choice of evaluation metric affects which algorithm appears to perform best. Random forest and gradient boosting achieved the best performance in terms of the evaluation metrics. On the imbalanced datasets, MCC provided better prediction results than ROC-AUC or AUC-PR. The proposed methodology is applied to bankruptcy prediction as a case study.
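
The abstract combines two technical steps: choosing how many principal components to keep from their cumulative explained variance, and scoring classifiers with MCC, ROC-AUC and AUC-PR. The following is a minimal, standard-library-only Python sketch of both steps under stated assumptions; it is not the authors' implementation. The function names are hypothetical, labels are assumed to be 0/1 with 1 marking the minority (bankrupt) class, the 0.95 variance threshold is an illustrative default because the abstract does not state the value used, and AUC-PR is estimated here as step-wise average precision, one of several common estimators (trapezoidal interpolation gives slightly different values).

```python
import math
from typing import Sequence


def components_for_variance(ratios: Sequence[float], threshold: float = 0.95) -> int:
    """Smallest k such that the first k explained-variance ratios sum to >= threshold.

    `ratios` is assumed sorted in decreasing order, as PCA reports them.
    The 0.95 default is a hypothetical choice, not the paper's value.
    """
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(ratios)  # threshold never reached: keep every component


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for 0/1 labels (1 = bankrupt, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any row or column of the confusion matrix is empty, MCC is undefined;
    # return 0.0, the same convention scikit-learn's matthews_corrcoef uses.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(random positive outscores random negative); ties count 1/2.

    Requires at least one positive and one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Requires at least one positive example.
    """
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    ap, tp, seen, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        # Consume every example tied at this score before adding a PR point.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            seen += 1
            i += 1
        recall, precision = tp / n_pos, tp / seen
        ap += (recall - prev_recall) * precision  # rectangle under the PR step
        prev_recall = recall
    return ap


# Hypothetical imbalanced sample: 95 solvent firms (0) and 5 bankrupt firms (1).
y = [0] * 95 + [1] * 5
always_solvent = [0] * 100  # degenerate classifier that never predicts bankruptcy
accuracy = sum(t == p for t, p in zip(y, always_solvent)) / len(y)
print(accuracy, mcc(y, always_solvent))  # 0.95 0.0
```

The closing example illustrates why accuracy alone can mislead on imbalanced data: a classifier that never predicts bankruptcy reaches 0.95 accuracy but an MCC of 0. In practice the explained-variance ratios would come from a fitted PCA (for example scikit-learn's `PCA.explained_variance_ratio_`), and scikit-learn's `matthews_corrcoef`, `roc_auc_score` and `average_precision_score` compute the same three metrics.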

Information

Type: Research Article
Journal: The Knowledge Engineering Review, Volume 37, 2022, e1
DOI: https://doi.org/10.1017/S026988892100014X
Copyright: © The Author(s), 2022. Published by Cambridge University Press
