Predicting high-cost care in a mental health setting

Published online by Cambridge University Press: 17 January 2020

Craig Colling*
Affiliation:
Applied Clinical Informatics Lead, SLaM Biomedical Research Center, South London & Maudsley Foundation NHS Trust, UK
Mizanur Khondoker
Affiliation:
Senior Lecturer in Medical Statistics, University of East Anglia, Norwich Medical School, UK
Rashmi Patel
Affiliation:
MRC UKRI Health Data Research UK Fellow, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London; and South London & Maudsley Foundation NHS Trust, UK
Marcella Fok
Affiliation:
Visiting Researcher, Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London; and Central and North West London NHS Foundation Trust, UK
Robert Harland
Affiliation:
Clinical Director of Psychosis, Psychosis CAG, South London & Maudsley Foundation NHS Trust, UK
Matthew Broadbent
Affiliation:
Informatics Lead, SLaM Biomedical Research Center, South London & Maudsley Foundation NHS Trust, UK
Paul McCrone
Affiliation:
Professor of Health Economics, School of Health Science, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK
Robert Stewart
Affiliation:
Professor of Psychiatric Epidemiology and Clinical Informatics, Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London; and South London & Maudsley Foundation NHS Trust, UK
*Correspondence: Craig Colling. Email: craig.colling@kcl.ac.uk

Abstract

Background

The density of information in digital health records offers new opportunities for automated prediction of cost-relevant outcomes.

Aims

We investigated the extent to which routinely recorded data held in the electronic health record (EHR) predict priority service outcomes and whether natural language processing tools enhance the predictions. We evaluated three high-priority outcomes: in-patient duration, readmission following in-patient care and high service cost after first presentation.

Method

We used data obtained from a clinical database derived from the EHR of a large mental healthcare provider within the UK. We combined structured data with text-derived data relating to diagnosis statements, medication and psychiatric symptomatology. Predictors of the three different clinical outcomes were modelled using logistic regression with performance evaluated against a validation set to derive areas under receiver operating characteristic curves.
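
As an illustration of the evaluation described above (a logistic regression fitted on a development sample and scored by area under the ROC curve on a separate validation sample), a minimal sketch follows. It is not the authors' code: the data are synthetic, the two predictors are arbitrary, and the helper names (`fit_logistic`, `predict_proba`, `roc_auc`, `simulate`) are hypothetical. Only the Python standard library is assumed.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights (intercept first) by batch gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    # Predicted probability of the positive outcome for each row of X.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case, with ties counted as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for EHR-derived predictors (assumption: purely illustrative).
rng = random.Random(0)


def simulate(n):
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        prob = sigmoid(-0.5 + 1.5 * x[0])  # only the first predictor carries signal
        X.append(x)
        y.append(1 if rng.random() < prob else 0)
    return X, y


X_dev, y_dev = simulate(300)   # development sample
X_val, y_val = simulate(150)   # validation sample, never used for fitting

w = fit_logistic(X_dev, y_dev)
auc_dev = roc_auc(y_dev, predict_proba(w, X_dev))
auc_val = roc_auc(y_val, predict_proba(w, X_val))
print(f"development AUC = {auc_dev:.2f}, validation AUC = {auc_val:.2f}")
```

Reporting the validation AUC rather than the development AUC guards against optimism from fitting and scoring on the same cases, which is why the sketch keeps the two samples separate.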

Results

In validation samples, the full models (using all available data) achieved areas under receiver operating characteristic curves between 0.59 and 0.85 (in-patient duration 0.63, readmission 0.59, high service use 0.85). Adding natural language processing-derived data to the models increased the variance explained across all clinical scenarios (observed increase in r² = 12–46%).

Conclusions

EHR data offer the potential to improve routine clinical predictions by utilising previously inaccessible data. Of our scenarios, prediction of high service use after initial presentation achieved the highest performance.

Type: Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s) 2020
Table 1 Model performance for all clinical scenarios explored

Fig. 1 (a) Receiver operating characteristic (ROC) curve of model performance for extended duration of hospital admission. (b) ROC curve of model performance for hospital readmission. (c) ROC curve of model performance for high total service cost.

The black line represents the performance for the development sample and the green line represents the performance for the validation sample.
Supplementary material: Colling et al. supplementary material (File, 158.2 KB)