
Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction

Published online by Cambridge University Press:  14 January 2022

María E. Pérez-Pons
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain
Javier Parra-Dominguez
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain; Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188. Salamanca, Spain
Guillermo Hernández
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain
Enrique Herrera-Viedma
Affiliation:
University of Granada, Colegio Máximo de Cartuja, Campus Universitario de Cartuja C.P. 18071 Granada, Spain
Juan M. Corchado
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain; Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188. Salamanca, Spain; Pusat Komputeran dan Informatik, Universiti Malaysia Kelantan, Karung Berkunci 36, Pengkaan Chepa, 16100 Kota Bharu, Kelantan, Malaysia; Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, 535-8585 Osaka, Japan

Abstract

This paper presents a methodology for automating binary classification using the minimum possible number of attributes. In this methodology, the success of the binary prediction lies not in the accuracy of an algorithm but in the evaluation metrics, which give information about the goodness of fit; this is an important factor when the data batch is unbalanced. The proposed methodology assesses the possible biases that arise in identifying one algorithm as the best performer when its goodness of fit is judged through evaluation metrics. The dimension of the data is reduced through the cumulative explained variance. The performance of six machine learning classification models is then compared through the Matthews correlation coefficient (MCC), the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (AUC-PR). The results show, graphically and numerically, how the choice of evaluation metric affects which algorithm appears to give the optimal outcome. The algorithms with the best performance in terms of evaluation metrics were random forest and gradient boosting. On the imbalanced datasets, MCC provided better prediction results than ROC-AUC or AUC-PR. The proposed methodology is adapted to the case of bankruptcy prediction.
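The two ideas named in the abstract, choosing the number of components from the cumulative explained variance and judging a classifier by MCC rather than accuracy on unbalanced data, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.9 variance threshold, the component variances, and the toy labels (95 solvent firms, 5 bankrupt firms) are all assumptions made for illustration.

```python
import math
from itertools import accumulate


def n_components_for_variance(variances, threshold=0.9):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `threshold` (hypothetical helper)."""
    total = sum(variances)
    for k, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= threshold:
            return k
    return len(variances)


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator
    vanishes (a common convention, for example when one class is never predicted)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical component variances, sorted in decreasing order.
component_variances = [5.0, 3.0, 1.5, 0.3, 0.2]
k = n_components_for_variance(component_variances, threshold=0.9)

# Toy unbalanced sample: 95 solvent firms (0) and 5 bankrupt firms (1).
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts "solvent".
pred_a = [0] * 100

# Classifier B raises 10 false alarms but detects 4 of the 5 bankruptcies.
pred_b = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

counts_a = confusion_counts(y_true, pred_a)
counts_b = confusion_counts(y_true, pred_b)

print("components kept:", k)
print("A: accuracy=%.2f  MCC=%.3f" % (accuracy(*counts_a), mcc(*counts_a)))
print("B: accuracy=%.2f  MCC=%.3f" % (accuracy(*counts_b), mcc(*counts_b)))
```

In this toy case classifier A has the higher accuracy even though it never detects a bankruptcy, while its MCC is zero; classifier B has lower accuracy but a clearly positive MCC. That is the kind of disagreement between metrics the abstract refers to when the classes are unbalanced.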

Information

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press
