
Machine learning for prediction of childhood mental health problems in social care

Published online by Cambridge University Press:  11 April 2025

Ryan Crowley*
Affiliation:
New York University Grossman School of Medicine, New York, US
Katherine Parkin
Affiliation:
Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Department of Psychiatry, University of Cambridge, Cambridge, UK; Cambridge Public Health, University of Cambridge, Cambridge, UK
Emma Rocheteau
Affiliation:
Department of Computer Science, University of Cambridge, Cambridge, UK
Efthalia Massou
Affiliation:
Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Yasmin Friedmann
Affiliation:
Neath Port Talbot County Borough Council, Port Talbot, UK
Ann John
Affiliation:
Population Psychiatry, Suicide and Informatics, Swansea University Medical School, Swansea, UK
Rachel Sippy
Affiliation:
Department of Psychiatry, University of Cambridge, Cambridge, UK
Pietro Liò
Affiliation:
Department of Computer Science, University of Cambridge, Cambridge, UK
Anna Moore
Affiliation:
Department of Psychiatry, University of Cambridge, Cambridge, UK; Anna Freud, London, UK; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK
*Correspondence: Ryan Crowley. Email: rjc8281@nyu.edu

Abstract

Background

Rates of childhood mental health problems are increasing in the UK. Early identification of childhood mental health problems is challenging but critical to children’s future psychosocial development. This is particularly important for children with social care contact because earlier identification can facilitate earlier intervention. Clinical prediction tools could improve these early intervention efforts.

Aims

To characterise a novel cohort of children in social care and to develop effective machine learning models for predicting childhood mental health problems.

Method

We used linked, de-identified data from the Secure Anonymised Information Linkage Databank to create a cohort of 26 820 children in Wales, UK, receiving social care services. Integrating health, social care and education data, we developed several machine learning models aimed at predicting childhood mental health problems. We assessed the performance, interpretability and fairness of these models.

Results

Risk factors strongly associated with childhood mental health problems included age, substance misuse and being a looked after child. The best-performing model, a gradient boosting classifier, achieved an area under the receiver operating characteristic curve of 0.75 (95% CI 0.73–0.78). Assessments of algorithmic fairness showed potential biases within these models.
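As a rough illustration of the headline metric, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen child with the outcome receives a higher risk score than a randomly chosen child without it. The sketch below computes it by pairwise comparison and summarises per-fold scores with a normal-approximation interval. The function names, the toy data and the interval method are assumptions made for illustration; this section does not state how the reported confidence interval was derived.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def auroc(y_true, scores):
    """AUROC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # A positive scored above a negative counts as 1, a tie as 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def normal_ci(fold_scores, z=1.96):
    """Mean and approximate 95% interval across folds.

    Assumption: a normal approximation on the fold-level scores; the
    source does not specify its interval method.
    """
    m = mean(fold_scores)
    half_width = z * stdev(fold_scores) / sqrt(len(fold_scores))
    return m, m - half_width, m + half_width


# Toy labels and risk scores, invented for illustration only.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, s))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

With ten-fold cross-validation, `normal_ci` would be called on the ten per-fold AUROC values; with so few folds a t-based or bootstrap interval may be preferable.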

Conclusions

Machine learning performance on this prediction task was promising. Predictive performance in social care settings can be bolstered by linking diverse routinely collected data-sets, making available a range of heterogeneous risk factors relating to clinical, social and environmental exposures.

Information

Type
Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Royal College of Psychiatrists

Table 1 Cohort demographics

Table 2 Model performance on ten-fold cross-validation with 95% confidence intervals

Fig. 1 Mean absolute SHapley Additive exPlanations (SHAP) values for best-performing gradient boosting classifier.

Fig. 2 SHapley Additive exPlanations (SHAP) beeswarm plot for best-performing gradient boosting classifier.

Fig. 3 Assessment of algorithmic fairness. (a) Gradient boosting classifier model. (b) Logistic regression model. NPV, negative predictive value; PPV, positive predictive value; TNR, true negative rate (specificity); TPR, true positive rate (sensitivity).
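The four rates named in the Fig. 3 caption all follow from a confusion matrix, and a group-wise comparison of them is one common way to look for disparities. The sketch below is a minimal illustration under stated assumptions: the function names, the group labels "A" and "B" and the toy data are hypothetical, and the article's actual fairness procedure and subgroup definitions are not given in this section.

```python
def rates(y_true, y_pred):
    """Return TPR, TNR, PPV and NPV for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # An undefined rate (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),  # sensitivity
        "TNR": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),  # positive predictive value
        "NPV": ratio(tn, tn + fn),  # negative predictive value
    }


def rates_by_group(y_true, y_pred, groups):
    """Compute the four rates separately within each group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        result[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


# Toy outcomes, predictions and group labels, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for group, values in rates_by_group(y_true, y_pred, groups).items():
    print(group, values)
```

In this toy data every rate is 0.5 for group A and 1.0 for group B; gaps of that kind between subgroups are what a fairness assessment would flag for closer inspection.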

Supplementary material: File

Crowley et al. supplementary material
