Prediction models in first-episode psychosis: systematic review and critical appraisal

Rebecca Lee; Samuel P. Leighton; Lucretia Thomas; Georgios V. Gkoutos; Stephen J. Wood; Sarah-Jane H. Fenton; Fani Deligianni; Jonathan Cavanagh; Pavan K. Mallikarjun

doi:10.1192/bjp.2021.219

Published online by Cambridge University Press: 24 January 2022

Rebecca Lee: Institute for Mental Health, University of Birmingham, UK
Samuel P. Leighton*: Institute of Health and Wellbeing, University of Glasgow, UK
Lucretia Thomas: Birmingham Medical School, University of Birmingham, UK
Georgios V. Gkoutos: Institute of Cancer and Genomic Sciences, University of Birmingham, UK
Stephen J. Wood: Orygen Youth Health Research Centre, National Centre of Excellence in Youth Mental Health, Australia; School of Psychological Sciences, University of Melbourne, Australia; and School of Psychology, University of Birmingham, UK
Sarah-Jane H. Fenton: Institute for Mental Health, University of Birmingham, UK
Fani Deligianni: School of Computing Science, University of Glasgow, UK
Jonathan Cavanagh: Institute of Infection, Immunity and Inflammation, University of Glasgow, UK
Pavan K. Mallikarjun: Institute for Mental Health, University of Birmingham, UK
*Correspondence: Samuel P. Leighton. Email: samuel.leighton@glasgow.ac.uk

Abstract

Background

People presenting with first-episode psychosis (FEP) have heterogeneous outcomes. More than 40% fail to achieve symptomatic remission. Accurate prediction of individual outcome in FEP could facilitate early intervention to change the clinical trajectory and improve prognosis.

Aims

We aim to systematically review evidence for prediction models developed for predicting poor outcome in FEP.

Method

A protocol for this study was published on the International Prospective Register of Systematic Reviews, registration number CRD42019156897. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidance, we systematically searched six databases from inception to 28 January 2021. We used the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies and the Prediction Model Risk of Bias Assessment Tool to extract and appraise the outcome prediction models. We considered study characteristics, methodology and model performance.

Results

Thirteen studies reporting 31 prediction models across a range of clinical outcomes met criteria for inclusion. Eleven studies used logistic regression with clinical and sociodemographic predictor variables. Just two studies were found to be at low risk of bias. Methodological limitations identified included a lack of appropriate validation, small sample sizes, poor handling of missing data and inadequate reporting of calibration and discrimination measures. To date, no model has been applied to clinical practice.

Conclusions

Future prediction studies in psychosis should prioritise methodological rigour and external validation in larger samples. The potential for prediction modelling in FEP is yet to be realised.

Keywords

Schizophrenia; psychotic disorders; outcome studies; prediction; precision medicine

Information

Type: Review
Information: The British Journal of Psychiatry, Volume 220, Special Issue 4: Themed Issue: Precision Medicine and Personalised Healthcare in Psychiatry, April 2022, pp. 179–191

DOI: https://doi.org/10.1192/bjp.2021.219
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2022. Published by Cambridge University Press on behalf of the Royal College of Psychiatrists

Psychosis

Psychosis is a mental illness characterised by hallucinations, delusions and thought disorder. The median lifetime prevalence of psychosis is around 8 per 1000 of the global population.^{Reference Moreno-Küstner, Martín and Pastor1} Psychotic disorders, including schizophrenia, are in the top 20 leading causes of disability worldwide.² People with psychosis have heterogeneous outcomes. More than 40% fail to achieve symptomatic remission.^{Reference Lally, Ajnakina, Stubbs, Cullinane, Murphy and Gaughran3} At present, clinicians struggle to predict long-term outcome in individuals with first-episode psychosis (FEP).

Prediction modelling

Prediction modelling has the potential to revolutionise medicine by predicting individual patient outcome.^{Reference Darcy, Louie and Roberts4} Early identification of those with good and poor outcomes would allow for a more personalised approach to care, matching interventions and resources to those most at need. This is the basis of precision medicine. Risk prediction models have been successfully employed clinically in many areas of medicine; for example, the QRISK tool predicts cardiovascular risk in individual patients.^{Reference Hippisley-Cox, Coupland and Brindle5} However, within psychiatry, precision medicine is not yet established within clinical practice. In FEP, precision medicine could enable rapid stratification and targeted intervention, thereby decreasing patient suffering and limiting treatment associated risks such as medication side-effects and intrusive monitoring.

Salazar de Pablo et al recently undertook a broad systematic review of individualised prediction models in psychiatry.^{Reference Salazar de Pablo, Studerus, Vaquerizo-Serrano, Irving, Catalan and Oliver6} They found clear evidence that precision psychiatry has developed into an important area of research, with the greatest number of prediction models focusing on outcomes in psychosis. However, the field is hindered by methodological flaws such as lack of validation. Further, there is a translation gap, with only one study considering implementation into clinical practice. Systematic guidance for the development, validation and presentation of prediction models is available.^{Reference Steyerberg and Vergouwe7} In addition, the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement sets standards for reporting.^{Reference Collins, Reitsma, Altman and Moons8} Models that do not adhere to these guidelines result in unreliable predictions, which may cause more harm than good in guiding clinical decisions.^{Reference Wynants, Van Calster, Collins, Riley, Heinze and Schuit9} Salazar de Pablo et al's review was impressive in scope, but necessarily limited in detailed analysis of the specific models included.^{Reference Salazar de Pablo, Studerus, Vaquerizo-Serrano, Irving, Catalan and Oliver6} Systematic reviews focusing on predicting the transition to psychosis^{Reference Studerus, Ramyead and Riecher-Rössler10,Reference Rosen, Betz, Schultze-Lutter, Chisholm, Haidl and Kambeitz-Ilankovic11} and relapse in psychosis have also been published.^{Reference Sullivan, Northstone, Gadd, Walker, Margelyte and Richards12} In the present review, we focus on FEP, with the aim of systematically reviewing and critically appraising models that predict poor outcomes.

Method

We designed this systematic review in accordance with the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS).^{Reference Moons, de Groot, Bouwmeester, Vergouwe, Mallett and Altman13} A protocol for this study was published with the International Prospective Register of Systematic Reviews (PROSPERO), under registration number CRD42019156897.

We developed the eligibility criteria under the Population, Index, Comparator, Outcome, Timing and Setting (PICOTS) guidance (see Supplementary Material available at https://doi.org/10.1192/bjp.2021.219). A study was eligible for inclusion if it utilised a prospective design, including patients diagnosed with FEP, and developed, updated or validated prognostic prediction models for any possible outcome, in any setting. We excluded non-English language studies, those where the full text was not available, those involving diagnostic prediction models and those where the outcome predicted was ≤3 months from baseline as we were interested in longer-term prediction.

We searched PubMed, PsycINFO, EMBASE, CINAHL Plus, Web of Science Core Collection and Google Scholar, from inception up to 28 January 2021. In addition, we manually checked references cited in the systematically searched articles. The search terms were based around three themes: ‘Prediction’, ‘Outcome’ and ‘First Episode Psychosis’ terms. The full search strategy is available in the Supplementary Material. Two reviewers (R.L. and L.T.) independently screened the titles and abstracts. Full-text screening was completed by three independent reviewers (R.L., P.K.M. and S.P.L.). Disagreements were resolved by consensus.

Data extraction was conducted independently by two reviewers (R.L. and S.P.L.), following recommendations in the CHARMS checklist.^{Reference Moons, de Groot, Bouwmeester, Vergouwe, Mallett and Altman13} From all eligible studies, we collected information on study characteristics, methodology and performance. Study characteristics collected included first author name, year, region, whether the study was multicentre, study type, setting, participant description, outcome, outcome timing, predictor categories and number of models presented. Methodology considered sample size, events per variable (EPV), number of events in validation data-set, number of candidate and retained predictors, methods of variable selection, presence and handling of missing data, modelling strategies, shrinkage, validation strategies (see below), whether models were recalibrated, if clinical utility was assessed and whether the full models were presented. Steyerberg and Harrell outline a hierarchy of validation strategies from apparent (which assesses model performance on the data used to develop it and will be severely optimistic) to internal (via cross-validation or bootstrapping), internal–external (e.g. validation across centres in the same study) and external validation (to assess if models generalise to related populations in different settings).^{Reference Steyerberg and Harrell14} Apparent, internal and internal–external validation use the derivation data-set only, whereas external validation requires the addition of a validation data-set. Performance for the best-performing model per outcome in each article was considered by model validation strategy, including model discrimination (reported as the C-statistic, which is equal to the area under the receiver operating characteristic curve for binary outcomes), calibration, other global performance measures and classification metrics. 
If not reported, where possible, the balanced accuracy ((sensitivity + specificity) / 2) and the prognostic summary index (positive predictive value + negative predictive value – 1) were calculated.
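The two derived classification metrics can be illustrated with a minimal Python sketch. It assumes that the counts of a 2 × 2 classification table are available for a model; the function name and the example counts are hypothetical and are not taken from any included study. Balanced accuracy is computed as the mean of sensitivity and specificity, and the prognostic summary index as the sum of the positive and negative predictive values minus 1.

```python
def classification_summary(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive classification metrics from the counts of a 2 x 2 table.

    tp/fn: patients with the outcome classified correctly/incorrectly.
    tn/fp: patients without the outcome classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Mean of sensitivity and specificity.
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Prognostic summary index: PPV + NPV - 1.
        "psi": ppv + npv - 1,
    }


# Hypothetical counts, for illustration only.
example = classification_summary(tp=30, fp=20, tn=80, fn=10)
print(round(example["balanced_accuracy"], 3), round(example["psi"], 3))
```

Both quantities depend on the probability threshold used to classify patients, so they describe a model at a single operating point only.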

Two reviewers (R.L. and S.P.L.) independently assessed the risk of bias in included studies by using the Prediction Model Risk Of Bias Assessment Tool (PROBAST), a risk-of-bias assessment tool designed for systematic reviews of diagnostic or prognostic prediction models.^{Reference Wolff, Moons, Riley, Whiting, Westwood and Collins15,Reference Moons, Wolff, Riley, Whiting, Westwood and Collins16} We considered all models reported in each article and assigned an overall rating to the article. PROBAST uses a structured approach with signalling questions across four domains: ‘participants’, ‘predictors’, ‘outcome’ and ‘statistical analysis’. Signalling questions are answered ‘yes’, ‘probably yes’, ‘no’, ‘probably no’ or ‘no information’. Answering ‘yes’ indicates a low risk of bias, whereas answering ‘no’ indicates high risk of bias. A domain where all signalling questions are answered as ‘yes’ or ‘probably yes’ indicates low risk of bias. Answering ‘no’ or ‘probably no’ flags the potential for the presence of bias, and reviewers should use their personal judgement to determine whether issues identified have introduced bias. Applicability of included studies to the review question is also considered in PROBAST.
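The domain-level logic described above can be sketched in Python as a screening step. This is a simplified illustration of the rule as stated in the text, not an implementation of PROBAST itself: the function name is hypothetical, a 'no' or 'probably no' answer only flags a domain for reviewer judgement rather than assigning a rating, and returning 'unclear' when 'no information' answers remain is an assumption of this sketch.

```python
LOW_ANSWERS = {"yes", "probably yes"}
CONCERN_ANSWERS = {"no", "probably no"}
VALID_ANSWERS = LOW_ANSWERS | CONCERN_ANSWERS | {"no information"}


def screen_domain(answers: list[str]) -> str:
    """Screen one domain's signalling-question answers.

    Returns 'low' when every answer is 'yes' or 'probably yes'. Any 'no' or
    'probably no' flags the domain for reviewer judgement. Otherwise
    ('no information' present) returns 'unclear', an assumption of this sketch.
    """
    if not answers:
        raise ValueError("at least one signalling question is required")
    normalised = [a.strip().lower() for a in answers]
    unknown = [a for a in normalised if a not in VALID_ANSWERS]
    if unknown:
        raise ValueError(f"unrecognised answers: {unknown}")
    if any(a in CONCERN_ANSWERS for a in normalised):
        return "potential bias: reviewer judgement needed"
    if all(a in LOW_ANSWERS for a in normalised):
        return "low"
    return "unclear"


# Hypothetical answers for a single domain.
print(screen_domain(["yes", "probably yes", "yes"]))
print(screen_domain(["yes", "probably no"]))
```

The sketch deliberately stops short of producing a 'high' rating, because the text leaves that decision to the reviewers' judgement.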

We reported our results according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 Statement (see Supplementary Material).^{Reference Page, Moher, Bossuyt, Boutron, Hoffmann and Mulrow17}

Results

Systematic review of the literature yielded 2353 records from database searches and 67 from additional sources. After removal of duplicates, 1543 records were screened. Of these, 82 full texts were reviewed, which resulted in 13 studies meeting criteria for inclusion in our qualitative synthesis (Fig. 1).^{Reference Ajnakina, Agbedjro, Lally, Forti, Trotta and Mondelli18–Reference Puntis, Whiting, Pappa and Lennox30}

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram.

Study characteristics are summarised in Table 1. The 13 included studies, comprising a total of 19 different patient cohorts, reported 31 different prediction models. Dates of publication ranged from 2006 to 2021. Twelve studies (92%) recruited participants from Europe, with two studies (15%) also recruiting participants from Israel and one study (8%) from Singapore. Over two-thirds (n = 9) of studies were multicentre. Ten studies (77%) included participants from cohort studies, three studies (23%) included participants from randomised controlled trials and two studies (15%) included participants from case registries. Two studies (15%) included only out-patients, four (31%) included in-patients and out-patients, and the rest did not specify their setting. Cohort sample size ranged from 47 to 1663 patients. The average age of patients ranged from 21 to 28 years, and 49–77% of the cohorts were male. Where specified, the average duration of untreated psychosis ranged from 34 to 106 weeks. Ethnicity was reported in eight studies (62%), with the percentage of Black and minority ethnic patients in the cohorts ranging from 4 to >75%. The definition of FEP was primarily non-affective psychosis in the majority of patient cohorts, with the minority also including affective psychosis, and two cohorts also including drug-induced psychosis. All but one study (92%) considered solely sociodemographic and clinical predictors. A wide range of outcomes were assessed across the 13 included studies, including symptom remission in five studies (38%), global functioning in five studies (38%), vocational functioning in three studies (23%), treatment resistance in two studies (15%), hospital readmission in two studies (15%) and quality of life in one study (8%). All of the outcomes were binary. The follow-up period of included studies ranged from 1 to 10 years.

Table 1 Study characteristics

DUP, duration of untreated psychosis; FEP, first-episode psychosis; EET, employment, education or training; GAF, Global Assessment of Functioning; DAS, Disability Assessment Schedule.

Study prediction-modelling methodologies are outlined in Table 2. Nine (69%) studies pertained solely to model development, with the highest level of validation reported being apparent validity in four of the studies, internal validity in three of the studies and internal–external validity (via leave-one-site-out cross-validation) in two of the studies. The remaining four (31%) studies also included a validation cohort and reported external validity. High dimensionality was common across the study cohorts, with the majority having a very low EPV ratio and up to 258 candidate predictors considered. Some form of variable selection was used in the majority (62%) of studies. The number of events in the external validation cohort ranged from 23 to 173. All of the studies had missing data. Six studies (46%) used complete-case analysis, five (38%) studies used single imputation and the remaining two (15%) studies applied multiple imputation.
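To make the EPV ratio concrete, the following Python sketch computes it for a hypothetical cohort. It assumes the common convention that the number of events is the size of the smaller outcome group and that the denominator is the number of candidate predictor parameters considered, not only those retained in the final model. The function name and all numbers are illustrative and do not correspond to any included study.

```python
def events_per_variable(n_events: int, n_nonevents: int,
                        n_candidate_parameters: int) -> float:
    """Events per variable (EPV) for a binary outcome.

    Uses the smaller of the two outcome groups as the number of events
    (an assumed convention) divided by the number of candidate predictor
    parameters considered during model development.
    """
    if n_candidate_parameters <= 0:
        raise ValueError("n_candidate_parameters must be positive")
    return min(n_events, n_nonevents) / n_candidate_parameters


# Hypothetical cohort: 200 patients, 40 with the outcome, 20 candidate parameters.
epv = events_per_variable(n_events=40, n_nonevents=160, n_candidate_parameters=20)
print(epv)
```

In this hypothetical example the EPV is 2, which would be regarded as very low; a frequently quoted, though debated, rule of thumb is an EPV of at least 10.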

Table 2 Study methodology

EPV, events per variable; LASSO, least absolute shrinkage and selection operator; MLE, maximum likelihood estimation.

The most common modelling methodology was logistic regression fitted by maximum likelihood estimation, followed by logistic regression with regularisation. Only two studies used machine learning methods, both via support vector machines. Just over half of the studies (54%) did not use any variable shrinkage, and only three (23%) studies recalibrated their models based on validation to improve performance. The full model was presented in seven (54%) studies. Only two (15%) studies assessed clinical utility.

Table 3 reports the performance of the best model per study outcome, grouped by method of validation to allow for appropriate comparisons. For the five studies (38%) reporting only apparent validity, two reported a measure of discrimination and only one considered calibration. For the seven (54%) studies reporting internal validation performance, four reported discrimination with a C-statistic ranging from 0.66 to 0.77, and four reported calibration. For the three (23%) studies reporting internal–external validation, only one study considered discrimination with a C-statistic, which ranged from 0.703 to 0.736 across each of its four models. None of the studies reporting internal–external validation considered any measure of calibration. All four (31%) studies reporting external validation considered model discrimination, with C-statistics ranging from 0.556 to 0.876. However, only two of these studies considered calibration. Table 3 also records any global performance metrics, including the Brier score and McFadden's pseudo-R², both of which incorporate aspects of discrimination and calibration. Various classification metrics were reported across the study models, but it is difficult to make any meaningful comparisons between these alone, without considering the models’ corresponding discrimination and calibration metrics, which were not universally reported.

Table 3 Performance metrics for best model per outcome in each study

PPV, positive predictive value; NPV, negative predictive value; PSI, prognostic summary index; EET, employment, education or training; GAF, Global Assessment of Functioning; DAS, Disability Assessment Schedule.

We applied the PROBAST tool to the 31 different prediction models across the 13 studies in our systematic review, and determined an overall risk-of-bias rating for each study, as summarised in Supplementary Table 1. The majority (85%) of studies had an overall ‘high’ risk of bias. In each of these studies, the risk of bias was rated ‘high’ in the analysis domain, with one study also having a ‘high’ risk of bias in the predictors domain. The main reasons for the ‘high’ risk of bias in the analysis domain were insufficient participant numbers and consequently low EPV, inappropriate methods of variable selection including via univariable analysis, a lack of appropriate validation with only apparent validation, an absence of reported measures of discrimination and calibration, and inappropriate handling of missing data by either complete-case analysis or single imputation. Two studies, Leighton et al^{Reference Leighton, Krishnadas, Upthegrove, Marwaha, Steyerberg and Broome29} and Puntis et al,^{Reference Puntis, Whiting, Pappa and Lennox30} were rated overall ‘low’ risk of bias. These studies considered symptom remission and psychiatric hospital readmission outcomes, respectively. Both studies externally validated their prediction model and considered its clinical utility. However, neither study considered the implementation of the prediction model into actual clinical practice. When we assessed the 13 included studies according to PROBAST applicability concerns, all of the studies were considered overall ‘low’ concern. This is indicative of the broad scope of our systematic review.

Discussion

Our systematic review identified 13 studies reporting 31 prognostic prediction models for the prediction of a wide range of clinical outcomes. The majority of models were developed via logistic regression. There were several methodological limitations identified, including a lack of appropriate validation, issues with handling missing data and a lack of reporting of calibration and discrimination measures. We identified two studies with models at low risk of bias as assessed with PROBAST, both of which externally validated their models.

Principal findings in context

Our systematic review found no consistent definition of FEP across the different cohorts used for developing and validating prediction models. A lack of an operational definition for FEP within clinical and research settings has previously been identified as a major barrier to progress.^{Reference Breitborde, Srihari and Woods31} The majority of cohorts in our systematic review included only individuals with non-affective psychosis, with a minority also including affective psychosis. In contrast, early intervention services typically do not make a distinction between affective and non-affective psychosis in those that they accept onto their service.³² As such, there may be issues with generalisability of prediction models developed in cohorts with solely non-affective psychosis to real-world clinical practice.

A wide range of different outcomes were predicted by the FEP models, including symptom remission, global functioning, vocational functioning, treatment resistance, hospital readmission and quality-of-life outcomes. This is reflective of the fact that recovery from FEP is not readily distilled down to a single factor such as symptom remission. Meaningful recovery is represented by a constellation of multidimensional outcomes unique to each individual.^{Reference Jääskeläinen, Juola, Hirvonen, McGrath, Saha and Isohanni33} We should engage people with lived experience, to ensure that prediction models are welcomed and are predicting outcomes most relevant to the people they are for.

All of the prediction models were developed in populations from high-income countries, and only three studies included participants from countries outside of Europe, an issue not unique to FEP research. Consequently, it is currently unknown how prediction models for FEP would generalise to low-income countries. Prediction models may have considerable benefit in low-income countries, where almost 80% of patients with FEP live, but where mental health support is often scarce.^{Reference Singh and Javed34} Prediction models could help prioritise the appropriate utilisation of limited healthcare resources.

Only one study considered predictor variables other than clinical or sociodemographic factors. In this study, the additional predictors did not add significant value.^{Reference de Nijs22} In recent years, substantial progress has been made in elucidating the pathophysiological mechanisms underpinning the development of psychosis. We now recognise important roles for genetic factors, neurodevelopmental factors, dopamine and glutamate.^{Reference Lieberman and First35} Prediction model performance may be improved by the incorporation of these biologically relevant disease markers as predictor variables. However, the cost–benefit aspect of adding more expensive and less accessible disease markers must be carefully considered, especially if models are to be utilised in settings where resources are more limited.

Machine learning can be operationally defined as ‘models that directly and automatically learn from data’. This is in contrast to regression models, which ‘are based on theory and assumptions, and benefit from human intervention and subject knowledge for model specification’.^{Reference Christodoulou, Ma, Collins, Steyerberg, Verbakel and Van Calster36} Just two studies used machine learning techniques for their modelling.^{Reference de Nijs22,Reference Koutsouleris, Kahn, Chekroud, Leucht, Falkai and Wobrock26} The rest of the studies used logistic regression. We were unable to make any comparison between the discrimination and calibration ability of the two studies that used machine learning and the other studies, because these metrics were not provided. However, a recent systematic review found no evidence of superior performance of clinical prediction models that use machine learning methods over logistic regression.^{Reference Christodoulou, Ma, Collins, Steyerberg, Verbakel and Van Calster36} In any case, the distinction between regression models and machine learning has been viewed to be artificial. Instead, algorithms may exist ‘along a continuum between fully human-guided to fully machine-guided data analysis’.^{Reference Beam and Kohane37} An alternative comparison may be between linear and non-linear classifiers. Only one study used a non-linear classifier,^{Reference Koutsouleris, Kahn, Chekroud, Leucht, Falkai and Wobrock26} but again we were unable to gain meaningful insights into its relative performance because appropriate metrics were not provided.

A principal finding from our systematic review is the presence of methodological limitations across the majority of studies. Steyerberg et al outline four key measures of predictive performance that should be assessed in any prediction-modelling study: two measures of calibration (the model intercept (A) and the calibration slope (B)), discrimination via a concordance statistic (C) and clinical usefulness with decision-curve analysis (D).^{Reference Steyerberg and Vergouwe7} Model calibration is the level of agreement between the observed outcomes and the predictions. For example, if a model predicts a 5% risk of cancer, then, according to such a prediction, the observed proportion should be five cancers per 100 people. Discrimination is the ability of a model to distinguish between a patient with the outcome and one without.^{Reference Steyerberg and Vergouwe7} Our review found that only seven studies (54%) reported discrimination and just five (38%) reported any measure of calibration. The remaining studies reported only classification metrics, such as accuracy or balanced accuracy. The problem with solely reporting classification metrics is that they vary both across models and across different probability thresholds for the same model. This renders the comparison between models less meaningful. It is further argued that setting a classification threshold for a probability-generating model is premature. Rather, a clinician may choose to set different probability thresholds for the same prediction model, depending on the situation at hand, to optimise the balance between false positives and false negatives. For example, in the case of a model predicting cancer, a clinician may choose a lower probability threshold to offer a non-invasive screening test and a higher probability threshold to suggest an invasive and potentially harmful biopsy. 
Further, without any measure of model calibration, we are unable to assess if the model can make unbiased estimates of outcome.^{Reference Harrell38} The final key step in assessing the performance of a prediction model is to determine its clinical usefulness – that is, can better decisions be made with the model than without? Decision-curve analysis considers the net benefit (the treatment threshold weighted sum of true- minus false-positive classifications) for a prediction model compared with the default strategy of treating all or no patients, across an entire range of treatment thresholds.^{Reference Vickers, van Calster and Steyerberg39} Only two studies (15%) included in our review considered whether the model was clinically useful. Without proper validation of the prediction models, the reported performances are likely to be overly optimistic. Four studies (31%) reported only apparent validity. Just four studies (31%) reported external validation, which is considered essential before applying a prediction model to clinical practice.^{Reference Steyerberg and Harrell14}
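The performance measures discussed above can be illustrated with a short Python sketch. It is a simplified illustration on hypothetical data, not code used in any included study. The C-statistic is computed as the proportion of event and non-event pairs in which the patient with the event has the higher predicted probability, with ties counted as one half. Net benefit follows the decision-curve definition: the proportion of true positives minus the proportion of false positives weighted by the odds of the treatment threshold. The mean calibration gap (observed event rate minus mean predicted risk) is a crude stand-in for the calibration intercept and slope, which would require fitting a recalibration model. All function names and values are assumptions made for illustration.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    nonevents = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes are required")
    score = 0.0
    for p_event, p_nonevent in product(events, nonevents):
        if p_event > p_nonevent:
            score += 1.0
        elif p_event == p_nonevent:
            score += 0.5
    return score / (len(events) * len(nonevents))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy of treating every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def mean_calibration_gap(y_true, y_prob):
    """Observed event rate minus mean predicted risk (0 is ideal on average)."""
    return sum(y_true) / len(y_true) - sum(y_prob) / len(y_prob)


# Hypothetical outcomes (1 = poor outcome) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7, 0.6, 0.05]

c_stat = c_statistic(y_true, y_prob)
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)  # treat none is always 0
gap = mean_calibration_gap(y_true, y_prob)
print(round(c_stat, 3), round(nb_model, 3), round(nb_all, 3), round(gap, 3))
```

A full decision-curve analysis would repeat the net benefit calculation across a range of thresholds and compare the model with the strategies of treating all patients and treating none, whose net benefit is zero at every threshold.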

Altogether, just two studies (15%) had an overall ‘low’ risk of bias according to PROBAST, reflecting these methodological limitations. Neither study considered real-world implementation. To progress with implementation, impact studies are required. These would involve a cluster randomised trial comparing patient outcomes between a group with treatment informed by a clinical prediction model and a control group.^{Reference Moons, Kengne, Grobbee, Royston, Vergouwe and Altman40} We are not aware of any such study having been carried out within the field of psychiatry. However, Salazar de Pablo et al suggest that PROBAST thresholds for considering a study to be a ‘low’ risk of bias may be too strict.^{Reference Salazar de Pablo, Studerus, Vaquerizo-Serrano, Irving, Catalan and Oliver6} Indeed, in the field of machine learning, multiple imputation is frequently computationally infeasible, and single imputation may be viewed as sufficient. This is especially true in larger data-sets or in the presence of relatively few missing values.^{Reference Steyerberg41}

Strengths and limitations

Our review had a number of strengths. We provide the first systematic overview of prediction-modelling studies for use in patients with FEP. We offer a detailed critique of the study characteristics, their methodologies and model performance metrics. Further, our review adheres to gold-standard guidance for extracting data from prediction models and for assessing bias, namely the CHARMS checklist and PROBAST.

There were several limitations. Our initial aim was to perform a meta-analysis of any prediction model that was validated across different settings and populations. However, no meta-analysis was possible because no single prediction model was validated more than once. In addition, as a consequence of poor reporting of discrimination and calibration performance across the studies, it was often difficult to make meaningful comparison between the prediction models. Also, the lack of consensus as to the most important outcome measure in FEP, with six different outcomes considered across only 13 included studies, further hindered efforts at drawing meaningful comparisons between the included studies and their respective prediction models. Likewise, if more studies had considered the same outcome measures, this may have afforded the opportunity to validate existing prediction models rather than necessitating the creation of additional new models. All published prediction-modelling studies in FEP reported significant positive findings. It is possible that studies that had negative findings were held back from publication, reflecting the possibility of publication bias. We originally intended to evaluate the overall certainty in the body of evidence by using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework.^{Reference Schünemann, Oxman, Brozek, Glasziou, Jaeschke and Vist42} GRADE was originally designed for reviews of intervention studies, but has not yet been adapted for use in systematic reviews of prediction models. Consequently, in its current form, we did not find GRADE to be a suitable tool for our review and decided not to use it. Future research should consider how to adapt GRADE for use in systematic reviews of prediction models.

Implications for future research

It is clear that there is a growing trend for the development of prediction models in FEP.^{Reference Salazar de Pablo, Studerus, Vaquerizo-Serrano, Irving, Catalan and Oliver6} FEP is an illness that responds best to an early intervention paradigm.^{Reference Birchwood, Todd and Jackson43} Prediction models have the potential to optimise the allocation of time-critical interventions, like clozapine for treatment resistance.^{Reference Farooq, Choudry, Cohen, Naeem and Ayub44} However, several steps are necessary before meaningful implementation into real-world clinical practice. The field must prioritise external validation and replication of existing prediction models in larger sample sizes, to increase the EPV. This is best accomplished by an emphasis on data-sharing and open collaboration. Prediction studies should include FEP cohorts from low-income countries, where there is considerable potential for benefit by helping to prioritise limited resources to those most in need. Harmonisation of data collection across the field, both in terms of predictors and outcomes measured, would facilitate validation efforts. There should be a greater consideration of biologically relevant and cognitive predictors based on our growing understanding of disease mechanisms, which could optimise prediction model performance. Finally, our review highlights considerable methodological pitfalls in much of the current literature. Future prediction-modelling studies should focus on methodological rigour with adherence to accepted best-practice guidance.^{Reference Wynants, Van Calster, Collins, Riley, Heinze and Schuit9,Reference Steyerberg and Harrell14,Reference Harrell38} Our goal in psychiatry should be to develop an innovative approach to care by using prediction models. Application of these approaches into clinical practice would enable rapid and targeted intervention, thereby limiting treatment-associated risks and reducing patient suffering.

Supplementary material

Supplementary material is available online at https://doi.org/10.1192/bjp.2021.219.

Data availability

Data are available from the corresponding author, S.P.L., upon reasonable request.

Author contributions

P.K.M. and R.L. formulated the research question and designed the study. R.L., S.P.L., L.T. and P.K.M. collected the data. R.L., S.P.L. and P.K.M. analysed the data and drafted the manuscript. L.T., G.V.G., S.J.W., S.-J.H.F., F.D. and J.C. critically evaluated and revised the manuscript.

Funding

R.L. is funded by the Institute for Mental Health Priestley Scholarship, University of Birmingham. S.P.L. is funded by a clinical academic fellowship from the Chief Scientist Office, Scotland (CAF/19/04). S.J.W. is funded by the Medical Research Council, UK (grant MR/K013599).

Declaration of interest

G.V.G. has received support from Horizon 2020 E-Infrastructures (H2020-EINFRA), the National Institute for Health Research (NIHR) Birmingham Experimental Cancer Medicine Centre (ECMC), NIHR Birmingham Surgical Reconstruction Microbiology Research Centre (SRMRC), the NIHR Birmingham Biomedical Research Centre, and the Medical Research Council Health Data Research United Kingdom (MRC HDR UK), an initiative funded by UK Research and Innovation, Department of Health and Social Care (England), the devolved administrations and leading medical research charities. J.C. has received grants from the Wellcome Trust and Sackler Trust, and honorariums from Johnson & Johnson. P.K.M. has received honorariums from Sunovion and Sage, and is a Director of Noux Technologies Limited. All other authors declare no competing interests.

Footnotes

Joint first authors.

References

Moreno-Küstner, B, Martín, C, Pastor, L. Prevalence of psychotic disorders and its association with methodological issues. A systematic review and meta-analyses. PLoS One 2018; 13: e0195687. doi:10.1371/journal.pone.0195687

Institute for Health Metrics and Evaluation (IHME). GBD Compare Data Visualization. IHME, University of Washington, 2021 (http://vizhub.healthdata.org/gbd-compare).

Lally, J, Ajnakina, O, Stubbs, B, Cullinane, M, Murphy, KC, Gaughran, F, et al. Remission and recovery from first-episode psychosis in adults: systematic review and meta-analysis of long-term outcome studies. Br J Psychiatry 2017; 211: 350–8. doi:10.1192/bjp.bp.117.201475

Darcy, AM, Louie, AK, Roberts, LW. Machine learning and the profession of medicine. JAMA 2016; 315(6): 551. doi:10.1001/jama.2015.18421

Hippisley-Cox, J, Coupland, C, Brindle, P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017; 357: j2099. doi:10.1136/bmj.j2099

Salazar de Pablo, G, Studerus, E, Vaquerizo-Serrano, J, Irving, J, Catalan, A, Oliver, D, et al. Implementing precision psychiatry: a systematic review of individualized prediction models for clinical practice. Schizophr Bull 2021; 47(2): 284–97. doi:10.1093/schbul/sbaa120

Steyerberg, EW, Vergouwe, Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014; 35(29): 1925–31. doi:10.1093/eurheartj/ehu207

Collins, GS, Reitsma, JB, Altman, DG, Moons, KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015; 350: g7594. doi:10.1136/bmj.g7594

Wynants, L, Van Calster, B, Collins, GS, Riley, RD, Heinze, G, Schuit, E, et al. Prediction models for diagnosis and prognosis of Covid-19: systematic review and critical appraisal. BMJ 2020; 369: 26.

Studerus, E, Ramyead, A, Riecher-Rössler, A. Prediction of transition to psychosis in patients with a clinical high risk for psychosis: a systematic review of methodology and reporting. Psychol Med 2017; 47: 1163–78. doi:10.1017/S0033291716003494

Rosen, M, Betz, LT, Schultze-Lutter, F, Chisholm, K, Haidl, TK, Kambeitz-Ilankovic, L, et al. Towards clinical application of prediction models for transition to psychosis: a systematic review and external validation study in the PRONIA sample. Neurosci Biobehav Rev 2021; 125: 478–92. doi:10.1016/j.neubiorev.2021.02.032

Sullivan, S, Northstone, K, Gadd, C, Walker, J, Margelyte, R, Richards, A, et al. Models to predict relapse in psychosis: a systematic review. PLoS One 2017; 12(9): e0183998. doi:10.1371/journal.pone.0183998

Moons, KGM, de Groot, JAH, Bouwmeester, W, Vergouwe, Y, Mallett, S, Altman, DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med 2014; 11(10): e1001744. doi:10.1371/journal.pmed.1001744

Steyerberg, EW, Harrell, FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol 2016; 69: 245–7. doi:10.1016/j.jclinepi.2015.04.005

Wolff, RF, Moons, KGM, Riley, RD, Whiting, PF, Westwood, M, Collins, GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170(1): 51–8. doi:10.7326/M18-1376

Moons, KGM, Wolff, RF, Riley, RD, Whiting, PF, Westwood, M, Collins, GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019; 170(1): W1–33. doi:10.7326/M18-1377

Page, MJ, Moher, D, Bossuyt, PM, Boutron, I, Hoffmann, TC, Mulrow, CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 2021; 372: n160.

Ajnakina, O, Agbedjro, D, Lally, J, Forti, M, Trotta, A, Mondelli, V, et al. Predicting onset of early- and late-treatment resistance in first-episode schizophrenia patients using advanced shrinkage statistical methods in a small sample. Psychiatry Res 2020; 294: 113527. doi:10.1016/j.psychres.2020.113527

Bhattacharyya, S, Schoeler, T, Patel, R, di Forti, M, Murray, RM, McGuire, P. Individualized prediction of 2-year risk of relapse as indexed by psychiatric hospitalization following psychosis onset: model development in two first episode samples. Schizophr Res 2021; 228: 483–92. doi:10.1016/j.schres.2020.09.016

Chua, YC, Abdin, E, Tang, C, Subramaniam, M, Verma, S. First-episode psychosis and vocational outcomes: a predictive model. Schizophr Res 2019; 211: 63–8. doi:10.1016/j.schres.2019.07.009

Demjaha, A, Lappin, JM, Stahl, D, Patel, MX, MacCabe, JH, Howes, OD, et al. Antipsychotic treatment resistance in first-episode psychosis: prevalence, subtypes and predictors. Psychol Med 2017; 47(11): 1981–9. doi:10.1017/S0033291717000435

de Nijs, J. The Outcome of Psychosis. Utrecht University, 2019 (https://dspace.library.uu.nl/bitstream/1874/376436/1/22_01_3_jessica_de_nijs_compleet_final.pdf).

Derks, EM, Fleischhacker, WW, Boter, H, Peuskens, J, Kahn, RS. Antipsychotic drug treatment in first-episode psychosis: should patients be switched to a different antipsychotic drug after 2, 4, or 6 weeks of nonresponse? J Clin Psychopharmacol 2010; 30(2): 176–80. doi:10.1097/JCP.0b013e3181d2193c

Flyckt, L, Mattsson, M, Edman, G, Carlsson, R, Cullberg, J. Predicting 5-year outcome in first-episode psychosis: construction of a prognostic rating scale. J Clin Psychiatry 2006; 67(6): 916–24. doi:10.4088/JCP.v67n0608

González-Blanch, C, Perez-Iglesias, R, Pardo-García, G, Rodríguez-Sánchez, JM, Martínez-García, O, Vázquez-Barquero, JL, et al. Prognostic value of cognitive functioning for global functional recovery in first-episode schizophrenia. Psychol Med 2010; 40(6): 935–44. doi:10.1017/S0033291709991267

Koutsouleris, N, Kahn, RS, Chekroud, AM, Leucht, S, Falkai, P, Wobrock, T, et al. Multisite prediction of 4-week and 52-week treatment outcomes in patients with first-episode psychosis: a machine learning approach. Lancet Psychiatry 2016; 3(10): 935–46. doi:10.1016/S2215-0366(16)30171-7

Leighton, SP, Krishnadas, R, Chung, K, Blair, A, Brown, S, Clark, S, et al. Predicting one-year outcome in first episode psychosis using machine learning. PLoS One 2019; 14(3): e0212846. doi:10.1371/journal.pone.0212846

Leighton, SP, Upthegrove, R, Krishnadas, R, Benros, ME, Broome, MR, Gkoutos, GV, et al. Development and validation of multivariable prediction models of remission, recovery, and quality of life outcomes in people with first episode psychosis: a machine learning approach. Lancet Digit Health 2019; 1(6): e261–70. doi:10.1016/S2589-7500(19)30121-9

Leighton, SP, Krishnadas, R, Upthegrove, R, Marwaha, S, Steyerberg, EW, Broome, MR, et al. Development and validation of a non-remission risk prediction model in first episode psychosis: an analysis of two longitudinal studies. Schizophr Bull Open 2021; 2(1): sgab041. doi:10.1093/schizbullopen/sgab041

Puntis, S, Whiting, D, Pappa, S, Lennox, B. Development and external validation of an admission risk prediction model after treatment from early intervention in psychosis services. Transl Psychiatry 2021; 11: 35. doi:10.1038/s41398-020-01172-y

Breitborde, NJK, Srihari, VH, Woods, SW. Review of the operational definition for first-episode psychosis. Early Interv Psychiatry 2009; 3: 259–65. doi:10.1111/j.1751-7893.2009.00148.x

National Institute for Health and Care Excellence (NICE). Implementing the Early Intervention in Psychosis Access and Waiting Time Standard: Guidance. NICE, 2016 (https://www.nice.org.uk/guidance/qs80/resources/implementing-the-early-intervention-in-psychosis-access-and-waiting-time-standard-guidance-2487749725).

Jääskeläinen, E, Juola, P, Hirvonen, N, McGrath, JJ, Saha, S, Isohanni, M, et al. A systematic review and meta-analysis of recovery in schizophrenia. Schizophr Bull 2013; 39(6): 1296–306. doi:10.1093/schbul/sbs130

Singh, SP, Javed, A. Early intervention in psychosis in low- and middle-income countries: a WPA initiative. World Psychiatry 2020; 19: 122. doi:10.1002/wps.20708

Lieberman, JA, First, MB. Psychotic disorders. N Engl J Med 2018; 379(3): 270–80. doi:10.1056/NEJMra1801490

Christodoulou, E, Ma, J, Collins, GS, Steyerberg, EW, Verbakel, JY, Van Calster, B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110: 12–22. doi:10.1016/j.jclinepi.2019.02.004

Beam, AL, Kohane, IS. Big data and machine learning in health care. JAMA 2018; 319: 1317–8. doi:10.1001/jama.2017.18391

Harrell, FE Jr. Regression Modeling Strategies. Springer International Publishing, 2015. doi:10.1007/978-3-319-19425-7

Vickers, AJ, van Calster, B, Steyerberg, EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagnostic Progn Res 2019; 3: 18. doi:10.1186/s41512-019-0064-7

Moons, KGM, Kengne, AP, Grobbee, DE, Royston, P, Vergouwe, Y, Altman, DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012; 98: 691–8. doi:10.1136/heartjnl-2011-301247

Steyerberg, EW. Clinical Prediction Models 2nd ed. Springer International Publishing, 2019. doi:10.1007/978-3-030-16399-0

Schünemann, HJ, Oxman, AD, Brozek, J, Glasziou, P, Jaeschke, R, Vist, GE, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 2008; 336(7653): 1106–10. doi:10.1136/bmj.39500.677199.AE

Birchwood, M, Todd, P, Jackson, C. Early intervention in psychosis: the critical period hypothesis. Br J Psychiatry 1998; 172(S33): 53–9. doi:10.1192/S0007125000297663

Farooq, S, Choudry, A, Cohen, D, Naeem, F, Ayub, M. Barriers to using clozapine in treatment-resistant schizophrenia: systematic review. BJPsych Bull 2019; 43(1): 8–16. doi:10.1192/bjb.2018.67

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram.

Table 1 Study characteristics

Table 2 Study methodology

Table 3 Performance metrics for best model per outcome in each study
