COVID-19 cluster identification and support vector machine classifier model construction using global healthcare and socio-economic features

Published online by Cambridge University Press:  30 August 2023

Soumya Kanti Guha
Affiliation:
Department of Computer Application, Dinabandhu Andrews Institute of Technology and Management, Kolkata, India
Sandip Sadhukhan
Affiliation:
Department of Computer Application, Dinabandhu Andrews Institute of Technology and Management, Kolkata, India
Sougata Niyogi*
Affiliation:
Department of Medical Lab Technology, Dinabandhu Andrews Institute of Technology and Management, Kolkata, India
*Corresponding author: Sougata Niyogi; Email: sniyogi10@gmail.com

Abstract

Human coronaviruses have caused global epidemics of varying lethality, including COVID-19, which has affected more than 200 countries and resulted in 5.7 million fatalities as of May 2022. Effective clinical management requires sufficient resources and appropriately skilled personnel. Elderly people and individuals with diabetes are at increased risk of severe COVID-19. Countries with a higher gross domestic product (GDP) typically exhibit better health outcomes and lower mortality rates. Here, we propose a predictive model for the density of medical doctors and nursing personnel for 134 countries using a support vector machine (SVM). The model was trained on data from 107 countries and tested on the remaining 27. Kappa statistics and receiver operating characteristic (ROC) analysis showed promising results, with a high level of agreement between actual and predicted cluster values.
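The kappa statistic mentioned in the abstract measures agreement between actual and predicted cluster labels after correcting for chance. The following is a minimal sketch of Cohen's kappa in plain Python; the function name `cohen_kappa` and the label lists are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: fraction of items given the same label by both.
    p_observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Expected agreement if the two labelings were statistically independent.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Both labelings use a single identical class; agreement is trivial.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical cluster labels for eight test countries (not the paper's data).
actual = [1, 1, 2, 2, 3, 3, 4, 4]
predicted = [1, 1, 2, 3, 3, 3, 4, 4]
kappa = cohen_kappa(actual, predicted)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is a stricter summary than raw accuracy when cluster sizes are unequal.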

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. Selection of the number of clusters. Elbow (a), silhouette (b, c), and gap statistic (d) methods were used to find the probable number of clusters (K).
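As an illustration of the silhouette criterion named in the Figure 1 caption, the sketch below computes the mean silhouette width for a given labeling in plain Python. It assumes Euclidean distance; the function name `silhouette_score` and the toy points are hypothetical and are not taken from the article.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the listed members, excluding itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # grouping that splits the pairs
print(good, bad)
```

In practice the score would be evaluated for several candidate values of K, and the K with the highest mean silhouette width is a reasonable candidate for the number of clusters.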

Figure 2. K-means clustering data. K-means clustering on principal component analysis (PCA) components was carried out for 134 countries using different parameters.

Table 1. List of clustered countries

Figure 3. Cluster-wise distribution of different parameters. Death rate (a), population aged over 70 years (b), diabetes prevalence (c), hospital beds per thousand (d), life expectancy (e), and GDP per capita (f) were plotted for all the clusters.

Table 2. List of statistical parameters for different conditions

Figure 4. Supervised machine-learning approach to classify selected countries. Workflow for support vector machine (SVM) (a). Decision surface showing different class boundaries using the model and test data set (b).

Figure 5. Performance of the SVM-based classifier model. The confusion matrix shows that most countries in the test data set were classified correctly (a). Receiver operating characteristic (ROC) curves for the SVM classifier illustrate its predictive power for classes 1, 2, 3, and 4. The area under the curve (AUC) of each of the four ROC curves is indicated accordingly (b).
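Per-class AUC values such as those reported in Figure 5 can be obtained by treating each class in turn as "positive" against all others. The sketch below is one way to do this in plain Python, computing AUC as the probability that a positive sample scores higher than a negative one. The function names, scores, and labels are hypothetical and do not reproduce the authors' pipeline or results.

```python
def roc_auc(scores, is_positive):
    """AUC as the chance a positive sample outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(class_scores, labels):
    """Compute one AUC per class, treating that class as positive vs. the rest.

    class_scores maps each class to a list of classifier scores, one per sample.
    """
    return {
        cls: roc_auc(scores, [y == cls for y in labels])
        for cls, scores in class_scores.items()
    }


# Hypothetical scores for four samples and two classes (not the paper's data).
labels = [1, 1, 2, 2]
class_scores = {
    1: [0.9, 0.4, 0.5, 0.1],
    2: [0.1, 0.6, 0.5, 0.9],
}
aucs = one_vs_rest_auc(class_scores, labels)
print(aucs)
```

The pairwise comparison is quadratic in the number of samples, which is acceptable for a test set of a few dozen countries; a rank-based formulation would be preferable for large data sets.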