Development and validation of a predictive model for high-intensity mental health service use using electronic health record data

Published online by Cambridge University Press: 20 January 2026

Bharadwaj V. Chada*
Affiliations:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK; Clinical Informatics Service, South London and Maudsley NHS Foundation Trust, London, UK
Robert Stewart
Affiliations:
Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK; Clinical Informatics Service, South London and Maudsley NHS Foundation Trust, London, UK
James Lai
Affiliation:
Department of Bioengineering, Imperial College Healthcare NHS Trust, London, UK
*Correspondence to Bharadwaj V. Chada (bharadwaj.chada1@nhs.net)

Abstract

Aims and method

This study aimed to develop and evaluate a predictive model using electronic health record (EHR) data from a large south London mental health service. The model was intended to identify, 3 months after first referral, patients at risk of high-intensity service use over the following 12 months. Early identification of such patients may support proactive and personalised care planning and reduce the need for high-cost episodes of care. Predictive models were developed using data from 18 869 patients newly referred between 2007 and 2011. High-intensity use was defined as falling within the top 10% of estimated mental healthcare expenditure. The models used demographic, clinical and service-use variables and were validated on data from two later periods, 2012–2017 and 2018–2023.
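The outcome definition and temporal validation design described above can be sketched in a few lines. This is a minimal, dependency-free Python illustration, not the authors' code: the `Patient` record, its field names and the helper functions are hypothetical. The sketch also assumes that the top-10% cost threshold is computed separately within each cohort, which the abstract does not specify. Only the referral-year windows come from the study description.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout: the study's actual EHR fields are not described here.
    patient_id: str
    referral_year: int          # calendar year of first referral
    estimated_cost_12m: float   # estimated mental healthcare expenditure over the 12-month outcome window


# Referral-year windows taken from the study description (inclusive bounds).
PERIODS = {
    "development": (2007, 2011),
    "validation_1": (2012, 2017),
    "validation_2": (2018, 2023),
}


def high_intensity_threshold(costs, top_percent=10):
    """Return the cost at or above which a patient falls in the top `top_percent` %."""
    if not costs:
        raise ValueError("no costs supplied")
    ranked = sorted(costs, reverse=True)
    # Integer ceiling of len * top_percent / 100, avoiding floating-point rounding.
    n_top = max(1, -(-len(ranked) * top_percent // 100))
    return ranked[n_top - 1]


def label_high_intensity(patients, top_percent=10):
    """Map patient_id -> True if the patient is in the top cost band of this cohort."""
    threshold = high_intensity_threshold(
        [p.estimated_cost_12m for p in patients], top_percent
    )
    # Patients tied at the threshold are all labelled positive,
    # so slightly more than top_percent % may be flagged.
    return {p.patient_id: p.estimated_cost_12m >= threshold for p in patients}


def temporal_split(patients):
    """Group patients by the referral-year window they fall into; others are dropped."""
    splits = {name: [] for name in PERIODS}
    for p in patients:
        for name, (start, end) in PERIODS.items():
            if start <= p.referral_year <= end:
                splits[name].append(p)
                break
    return splits
```

For example, `label_high_intensity(temporal_split(patients)["development"])` would label the 2007–2011 development cohort. Fitting on one window and scoring later windows untouched is what makes the validation temporal rather than a random split.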

Results

A logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 0.79 in the development period (sensitivity 0.82, specificity 0.54). Performance remained robust in the two validation periods (AUROC 0.81 and 0.83, respectively). Key predictors included service use in the first 3 months, a diagnosis of schizophrenia or an eating disorder, and living alone. Features derived from natural language processing (NLP) did not improve performance.
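The performance measures quoted above (AUROC, sensitivity and specificity) can be computed from predicted risks and observed outcomes, as in the pure-Python sketch below. The function names are hypothetical. AUROC is computed here as the Mann–Whitney probability that a randomly chosen high-intensity patient receives a higher score than a randomly chosen other patient. The threshold used for sensitivity and specificity is an assumed input, because the abstract does not report the cut-off behind 0.82 and 0.54.

```python
def auroc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve.

    Probability that a randomly chosen positive case scores higher than a
    randomly chosen negative case, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Pairwise comparison: clear for illustration, but quadratic in cohort size.
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are flagged as high risk."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if y and flagged:
            tp += 1
        elif y:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

The pairwise loop keeps the definition transparent. For a cohort the size of this study's, a rank-based computation of the same statistic would be much faster. Lowering the threshold trades specificity for sensitivity, which is why a single AUROC can coexist with the high-sensitivity, modest-specificity operating point reported.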

Clinical implications

Models built on routine EHR data performed well in predicting the risk of high-cost care. This could enable targeted interventions and more efficient resource allocation.

Information

Type
Original Papers
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Royal College of Psychiatrists
Table 1 Comparison of model performance before and after omission of natural language processing (NLP)-derived variables

Fig. 1 Bar chart comparing the performance of logistic regression, XGBoost, decision tree and random forest models on 2007–2011 data, using identical inputs. AUROC, area under the receiver operating characteristic curve.
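The logistic regression model that Fig. 1 compares with tree-based models can be illustrated with a small, dependency-free sketch fitted by batch gradient descent on the mean log-loss. Everything here is hypothetical. The four features loosely mirror the reported key predictors (early service use, schizophrenia or eating disorder diagnosis, living alone), and the toy data are invented. The abstract does not state the authors' software, feature encoding or regularisation. In practice a library implementation such as scikit-learn's `LogisticRegression` would usually be preferred.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=10000, l2=0.0):
    """Fit weights w and bias b by full-batch gradient descent on mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Simultaneous update of weights (optional L2 penalty) and unpenalised bias.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of future high-intensity use for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data, invented for illustration only:
# [scaled contacts in first 3 months, schizophrenia, eating disorder, lives alone]
X_toy = [
    [0.1, 0, 0, 0], [0.2, 0, 0, 1], [0.3, 0, 0, 0], [0.2, 0, 0, 0],
    [0.9, 1, 0, 1], [0.8, 0, 1, 0], [1.0, 1, 0, 0], [0.7, 0, 0, 1],
]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X_toy, y_toy)
```

Gradient descent is used only to keep the example self-contained. Library solvers (for example L-BFGS) converge faster and handle regularisation more robustly. The fitted coefficients remain directly interpretable as log-odds, which is one practical reason to report a logistic model alongside tree ensembles.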

Table 2 Sample characteristics comparing patients with/without future high-intensity care use

Table 3 Model performance in development and subsequent validation periods

Supplementary material: Chada et al. supplementary material (file, 30 KB)