
Machine learning models for prediction of double and triple burdens of non-communicable diseases in Bangladesh

Published online by Cambridge University Press: 20 March 2024

Md. Akib Al-Zubayer
Affiliation:
Statistics Discipline, Khulna University, Khulna, Bangladesh
Khorshed Alam
Affiliation:
School of Business, University of Southern Queensland, Toowoomba, QLD, Australia; Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
Hasibul Hasan Shanto
Affiliation:
Statistics Discipline, Khulna University, Khulna, Bangladesh
Md. Maniruzzaman
Affiliation:
Statistics Discipline, Khulna University, Khulna, Bangladesh
Uttam Kumar Majumder
Affiliation:
Statistics Discipline, Khulna University, Khulna, Bangladesh
Benojir Ahammed*
Affiliation:
Statistics Discipline, Khulna University, Khulna, Bangladesh
*Corresponding author: Benojir Ahammed; Emails: benojir@stat.ku.ac.bd; benojirstat@gmail.com

Abstract

Non-communicable diseases (NCDs) are increasingly prevalent and have become the leading cause of death and disability in Bangladesh. This study therefore aimed to measure the prevalence of, and risk factors for, the double and triple burden of NCDs (DBNCDs and TBNCDs), considering diabetes, hypertension, and overweight and obesity, and to establish a machine learning approach for predicting DBNCDs and TBNCDs. A total of 12,151 respondents from the 2017 to 2018 Bangladesh Demographic and Health Survey were included in the analysis, of whom 10%, 27.4%, and 24.3% had diabetes, hypertension, and overweight and obesity, respectively. The chi-square test and multilevel logistic regression (LR) analysis were applied to select factors associated with DBNCDs and TBNCDs. Six classifiers, namely decision tree (DT), LR, naïve Bayes (NB), k-nearest neighbour (KNN), random forest (RF), and extreme gradient boosting (XGBoost), were then evaluated under three cross-validation protocols (K2, K5, and K10) to predict DBNCD and TBNCD status. Classification accuracy (ACC) and area under the curve (AUC) were computed for each protocol; each protocol was repeated 10 times to make the estimates more robust, and the average ACC and AUC were then computed. The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. Both DBNCDs and TBNCDs were significantly associated with age, sex, marital status, wealth index, education, and geographic region. Under the K10 protocol, the RF-based classifier achieved the highest ACC and AUC of all the classifiers for both DBNCDs (ACC = 81.06%, AUC = 0.93) and TBNCDs (ACC = 88.61%, AUC = 0.97). Combining the two-step factor selection with an RF-based classifier can therefore better predict the burden of NCDs. These findings suggest that decision-makers could use RF classifiers to inform decisions on controlling and preventing the burden of NCDs.
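The evaluation protocol described in the abstract (K2, K5, and K10 cross-validation, each repeated 10 times, with ACC and AUC averaged) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the stratified fold assignment is an assumption, and a simple nearest-centroid scorer stands in for the six classifiers so that the example needs only the Python standard library. All function names are hypothetical.

```python
import random
from statistics import mean

def stratified_folds(y, k, rng):
    """Assign indices to k folds, spreading each class evenly (an assumption)."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def auc_score(y_true, scores):
    """AUC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the test fold")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_centroids(X, y):
    """Stand-in classifier (not RF): mean feature vector per class."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents

def score(cents, x):
    """Positive when x lies closer to the class-1 centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    return d0 - d1

def repeated_cv(X, y, k, repeats=10, seed=0):
    """Run k-fold CV `repeats` times; return mean ACC and mean AUC over all folds."""
    rng = random.Random(seed)
    accs, aucs = [], []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            cents = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
            s = [score(cents, X[i]) for i in test_idx]
            yt = [y[i] for i in test_idx]
            accs.append(mean(int((si > 0) == (ti == 1)) for si, ti in zip(s, yt)))
            aucs.append(auc_score(yt, s))
    return mean(accs), mean(aucs)

def make_data(n=300, seed=1):
    """Synthetic two-feature data; NOT the survey data used in the study."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)])
        y.append(label)
    return X, y

X, y = make_data()
results = {f"K{k}": repeated_cv(X, y, k) for k in (2, 5, 10)}
for name, (acc, auc) in results.items():
    print(f"{name}: mean ACC = {acc:.3f}, mean AUC = {auc:.3f}")
```

Swapping `fit_centroids` and `score` for a real classifier, for instance one from a library such as scikit-learn, would leave the splitting and averaging logic unchanged, which is the part of the protocol this sketch is meant to show.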

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press

