
Association of risk factors with mental illness in a rural community: insights from machine learning models

Published online by Cambridge University Press:  12 May 2025

Firoj Al-Mamun
Affiliation:
CHINTA Research Bangladesh, Savar, Dhaka, Bangladesh; Department of Public Health & Informatics, Jahangirnagar University, Savar, Dhaka, Bangladesh; Department of Public Health, University of South Asia, Dhaka, Bangladesh
Mohammed A. Mamun*
Affiliation:
CHINTA Research Bangladesh, Savar, Dhaka, Bangladesh; Department of Public Health & Informatics, Jahangirnagar University, Savar, Dhaka, Bangladesh; Department of Public Health, University of South Asia, Dhaka, Bangladesh
Md Emran Hasan
Affiliation:
CHINTA Research Bangladesh, Savar, Dhaka, Bangladesh; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Moneerah Mohammad ALmerab
Affiliation:
Department of Psychology, College of Education and Human Development, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
Johurul Islam
Affiliation:
Department of Public Health, University of South Asia, Dhaka, Bangladesh; CSF Global, Banani, Dhaka, Bangladesh
Mohammad Muhit
Affiliation:
Department of Public Health, University of South Asia, Dhaka, Bangladesh; CSF Global, Banani, Dhaka, Bangladesh
*Correspondence: Mohammed A. Mamun. Email: mamun@thechinta.org

Abstract

Background

Mental health conditions, particularly depression and anxiety, are highly prevalent and impose substantial health burdens globally. Despite advancements in machine learning, there is limited application of these methods in predicting common mental illnesses within community populations in low-resource settings.

Aims

This study aims to examine the prevalence of common mental illnesses (depression and anxiety, considered collectively) and their associated risk factors in a rural Bangladeshi community using machine learning models.

Method

This cross-sectional study surveyed 490 adults aged 18–59 years in a rural Bangladeshi community. Depression and anxiety were assessed using the two-item Patient Health Questionnaire (PHQ-2) and the two-item Generalised Anxiety Disorder (GAD-2) scale, respectively. Machine learning models, including Categorical Boosting (CatBoost), support vector machine, random forest and XGBoost (eXtreme Gradient Boosting), were trained on 80% of the dataset and tested on the remaining 20%. Performance was evaluated using accuracy, precision, F1 score, log-loss and area under the receiver operating characteristic curve (AUC-ROC).
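The evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision threshold, the fixed random seed and the choice to report precision and F1 for the positive class only are all assumptions made for illustration, and the study may have used library implementations or class-weighted averages instead.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) portions.

    The seed and the rounding rule are illustrative assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true binary labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random positive case is scored above a random
    negative case, with ties counted as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the five metrics named in the Method section for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, q in zip(y_true, y_pred) if y == 1 and q == 1)
    fp = sum(1 for y, q in zip(y_true, y_pred) if y == 0 and q == 1)
    tn = sum(1 for y, q in zip(y_true, y_pred) if y == 0 and q == 0)
    fn = sum(1 for y, q in zip(y_true, y_pred) if y == 1 and q == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "f1": f1,
        "log_loss": log_loss(y_true, y_prob),
        "auc_roc": auc_roc(y_true, y_prob),
    }
```

With 490 participants, `train_test_split(list(range(490)))` yields 392 training and 98 test rows. `evaluate` expects the true labels (1 for any common mental illness, 0 otherwise) and a model's predicted probabilities for the test rows.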

Results

Some 20.4% of participants experienced at least one common mental illness. Feature importance analysis identified house type, age group and educational status as the most significant predictors. SHAP (Shapley Additive exPlanations) values highlighted their influence on model outputs, and the XGBoost gain metric confirmed the importance of marital status and house type, with gains of 0.76 and 0.73, respectively. XGBoost delivered the best performance, achieving an F1 score of 71.01%, precision of 71.58%, accuracy of 71.15% and the lowest log-loss value of 0.56. The random forest had an accuracy of 78.21% and an AUC-ROC of 0.90.

Conclusions

The findings of this study suggest that targeted interventions addressing housing and social determinants could improve mental health outcomes in similar rural settings. Further studies should use longitudinal data to explore causal relationships.

Information

Type
Paper
Creative Commons
Creative Commons Licence: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Royal College of Psychiatrists

Fig. 1 The impact of features on the model by Categorical Boosting SHAP (Shapley Additive exPlanations) value, and eXtreme Gradient Boosting (XGBoost) feature importance based on gain.

Table 1 Association and factors associated with any mental illness and the study variables

Table 2 Evaluation of machine learning model performances

Fig. 2 Area under the receiver operating characteristic curve of any mental illness. ROC, receiver operating characteristic; KNN, k-nearest neighbour; AUC, area under the curve; XGBoost, eXtreme Gradient Boosting; CatBoost, Categorical Boosting; GBM, gradient boosting machine; SVM, support vector machine.
