Improving prediction of psychosis in youth at clinical high-risk: pre-baseline symptom duration and cortical thinning as moderators of the NAPLS2 risk calculator

Background. Clinical implementation of risk calculator models in the clinical high-risk for psychosis (CHR-P) population has been hindered by heterogeneous risk distributions across study cohorts, which could be attributed to pre-ascertainment illness progression. To examine this, we tested whether the duration of attenuated psychotic symptom (APS) worsening prior to baseline moderated performance of the North American Prodrome Longitudinal Study 2 (NAPLS2) risk calculator. We also examined whether rates of cortical thinning, another marker of illness progression, bolstered clinical prediction models. Methods. Participants from both the NAPLS2 and NAPLS3 samples were classified as having either ‘long’ or ‘short’ symptom duration based on time since APS increase prior to baseline. The NAPLS2 risk calculator model was applied to each of these groups. In a subset of NAPLS3 participants who completed follow-up magnetic resonance imaging scans, change in cortical thickness was combined with the individual risk score to predict conversion to psychosis. Results. The risk calculator models achieved similar performance across the combined NAPLS2/NAPLS3 sample [area under the curve (AUC) = 0.69], the long duration group (AUC = 0.71), and the short duration group (AUC = 0.71). The short duration group was younger and had higher baseline APS severity than the long duration group. The addition of cortical thinning significantly improved prediction of conversion in the short duration group (AUC = 0.84), with a moderate improvement in prediction in the long duration group (AUC = 0.78). Conclusions. These results suggest that early illness progression differs among CHR-P patients, is detectable with both clinical and neuroimaging measures, and could play an essential role in the prediction of clinical outcomes.


Introduction
One of the primary goals of the clinical high-risk for psychosis (CHR-P) research paradigm is to develop prognostic models estimating an individual's risk for conversion to psychosis. Such models using clinical measures have been developed (Cannon et al., 2008, 2016; Fusar-Poli et al., 2017; Mechelli et al., 2017; Zhang et al., 2018b) and have achieved prognostic accuracies ranging from 65% to 80% in correctly discriminating eventual converters from non-converters (Sanfelici, Dwyer, Antonucci, & Koutsouleris, 2020; Worthington & Cannon, 2021). While these models have shown promise, and studies validating them in external samples have been published (Carrión et al., 2016; Koutsouleris et al., 2021; Osborne & Mittal, 2019; Zhang et al., 2018a), barriers to clinical implementation remain.
A significant barrier to ascertaining a new patient's level of risk is matching the patient's characteristics to the appropriate model and risk distribution. Existing risk calculators were developed in samples that differ in symptom severity, pathways of ascertainment, and age at ascertainment (Koutsouleris et al., 2021). Because the distributions of the predictor variables, as well as the rate of conversion, differ across such samples, applying the same algorithm will yield different predicted risk distributions, potentially resulting in different individual-level risk predictions for a newly ascertained case. A key step toward implementing prognostic models in clinical settings will be to identify characteristics that can feasibly and reliably differentiate risk distributions, allowing greater precision in outcome estimation and treatment allocation.
Key differences in symptom severity, conversion rates, and demographic variables could be attributed to the progression of prodromal symptoms prior to the baseline visit. In a recent effort to validate the North American Prodrome Longitudinal Study 2 (NAPLS2) risk calculator in the European-based PRONIA study (www.pronia.eu), validation was possible after statistical adjustments to account for the PRONIA sample's lower symptom severity, lower conversion rate, higher age, and higher neurocognitive performance compared with the North American-based NAPLS2 study (Koutsouleris et al., 2021). These two samples also differ in the criteria used to assess psychosis-risk syndromes at a broad level: the PRONIA study uses the Comprehensive Assessment of At-Risk Mental States (CAARMS) (Yung et al., 2005) criteria, and NAPLS uses the Criteria of Psychosis-risk Syndromes (COPS) (Miller et al., 2003). The two differ in that the CAARMS permits a broader range and duration of attenuated psychotic symptoms (APS). A subset of the PRONIA sample (ultra-high-risk) does match the inclusion criteria of the NAPLS sample, but this subset makes up only 20% of the larger sample. Further, these frameworks may be more or less well suited to differing pathways of ascertainment, whether from the general population as self-referrals or from other healthcare providers as clinician referrals. Thus, illness progression at the point of ascertainment may be captured differently in the two frameworks, resulting in different distributions of risk of conversion.
Evidence of differing rates of illness progression within the CHR-P population has also been detected in measures of cortical thickness. Steeper rates of cortical thinning have been observed among converters compared with non-converters (Cannon et al., 2015), and recent work indicates that accelerated cortical thinning is observable among converters compared with non-converters over intervals as short as 3 months on average, prior to psychosis onset (Collins et al., 2022). These changes could be attributed to disrupted neurodevelopmental processes (e.g. synaptic pruning), which may contribute to the development of psychosis (Cannon et al., 2015; Germann, Brederoo, & Sommer, 2021). The progression of these neural changes may also manifest in APS onset and worsening, and early detection of these changes provides an opportunity (and potential target) for early intervention. For the purpose of predicting outcomes, models using clinical measures alone have provided a relatively high level of accuracy, and the added value of neuroimaging measures in clinical prediction has yet to be demonstrated. Studies have examined the additive effects of neuroimaging measures in clinical prediction models with mixed results (Chung et al., 2019a; Koutsouleris et al., 2018); however, no study has examined the additive power of baseline cortical thickness and of cortical thickness change in the period preceding psychosis onset to clinical models predicting conversion to psychosis.
In this study, we examined the moderating effect of the duration of symptom progression prior to baseline on the resulting risk distribution as calculated by the NAPLS2 risk calculator. We hypothesized that a shorter duration of psychosis-risk symptom worsening would confer a higher risk of conversion and higher APS severity, whereas a longer duration of prodromal symptom worsening would confer a lower risk of conversion and lower APS severity (Chung et al., 2019b; Koutsouleris et al., 2014). If this approach is successful, it would support the use of cut-points specific to each risk distribution for clinical decision-making, which should produce comparable sensitivity and specificity across the strata. We also sought to determine whether tracking cortical thinning in the period preceding conversion could reveal a potential mechanism underlying this clinical stratification. As a secondary analysis, we hypothesized that rates of cortical thinning would differentially add to the predictive power of clinical measures, such that cortical thinning would have greater predictive power in the shorter duration group than in the longer duration group, given the hypothesized difference in rates of conversion. Evidence in support of these hypotheses would suggest that the duration of symptom worsening may serve as a meaningful variable on which to stratify CHR-P samples for risk estimation, and that processes underlying steeper rates of cortical thinning support this differentiation.

Participants
Participants were drawn from the second and third phases of the North American Prodrome Longitudinal Study (NAPLS2 and NAPLS3), which enrolled non-overlapping samples (Addington et al., 2012, 2022). NAPLS2 and NAPLS3 are eight- and nine-site observational consortium studies, respectively, examining predictors and mechanisms related to conversion to psychosis in the CHR-P population. Participants were individuals aged 12-35 years in NAPLS2 and 12-30 years in NAPLS3 who met criteria for a psychosis-risk syndrome as defined by the COPS (McGlashan, Walsh, & Woods, 2010) and as assessed by the Structured Interview for Psychosis-risk Syndromes (SIPS) (Addington et al., 2007; McGlashan, Walsh, & Woods, 2001). Exclusion criteria included any current or lifetime Diagnostic and Statistical Manual of Mental Disorders-IV diagnosis of a psychotic disorder, IQ < 70, the presence of a neurological disorder, or psychosis-risk symptoms caused by another axis I disorder. In NAPLS3, study visits occurred every 2 months for the first 8 months of the study and at 12, 18, and 24 months. In NAPLS2, study visits occurred every 6 months over the 2-year follow-up period. Participants from both NAPLS2 and NAPLS3 provided written informed consent for the study. The protocol and consent forms for each study were approved by the institutional review boards at each site.

Risk calculator assessments
In the original NAPLS2 risk calculator, eight clinical variables that were previously shown to predict conversion to psychosis were included: age; baseline severity of SIPS positive symptom items P1 and P2 (unusual thought content and suspiciousness); score on the Brief Assessment of Cognition in Schizophrenia (BACS) symbol coding (SC) test (Keefe et al., 2008); score on the Hopkins Verbal Learning Test-Revised (HVLT-R) (Benedict, Schretlen, Groninger, & Brandt, 1998); decline in social functioning during the prior year as measured by the Global Functioning: Social scale (GFS) (Cornblatt et al., 2007); stressful life events as measured by the Research Interview Life Events Scale (Dohrenwend, Askenasy, Krasnoff, & Dohrenwend, 1978); childhood traumas as measured by the Childhood Trauma and Abuse Scale (Janssen et al., 2004); and family history of psychotic disorder in a first-degree relative (Cannon et al., 2016).

Neuroimaging assessments
We incorporated measures of baseline cortical thickness and change in cortical thickness based on previous findings within the NAPLS3 study showing that rates of cortical thinning were accelerated in converters compared with non-converters and healthy controls (Collins et al., 2022). Regions of interest included in the percent change measure were originally identified using vertex-level linear mixed effects models assessing the diagnostic group-by-time interaction on longitudinal cortical thickness while controlling for age, sex, and scanner. These changes were detected primarily in left-hemisphere prefrontal regions of interest. Percent change in cortical thickness was calculated from the change in cortical thickness between scan 1 (baseline) and scan 2 (first follow-up) and normalized by the average number of months between scan 1 and scan 2 [mean (S.D.) interval = 2.93 (1.81) months]. For full details regarding magnetic resonance imaging (MRI) procedures, quality control, data processing, and scanner information, refer to Collins et al. (2022).
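As an illustration of the percent change measure described above, the following Python sketch computes a per-month rate of change in mean regional thickness for one participant. It assumes that thickness has already been averaged over the prefrontal regions of interest and that each participant's own scan interval is used for normalization; the function name and example values are hypothetical and are not taken from the NAPLS3 processing pipeline.

```python
def percent_change_per_month(thickness_scan1, thickness_scan2, months_between):
    """Percent change in mean ROI cortical thickness, expressed per month.

    Hypothetical helper: assumes both thickness arguments are mean thickness (mm)
    across the regions of interest and months_between is this participant's interval.
    """
    if thickness_scan1 <= 0 or months_between <= 0:
        raise ValueError("baseline thickness and scan interval must be positive")
    percent_change = 100.0 * (thickness_scan2 - thickness_scan1) / thickness_scan1
    return percent_change / months_between


# Hypothetical participant: 2.50 mm at baseline and 2.47 mm about 3 months later.
rate = percent_change_per_month(2.50, 2.47, 3.0)
print(f"{rate:.3f} % per month")  # negative values indicate thinning
```

Dividing by the interval puts participants scanned at different follow-up lags on a common per-month scale, which matters here because the scan 1 to scan 2 interval varied (S.D. of 1.81 months).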

Sample stratification
Participants from both the NAPLS2 and NAPLS3 samples were included in this study to create a larger, combined sample. This combined sample was first divided into training and validation samples using a 50-50 split. In the training sample, participants were grouped as having either 'long symptom duration' or 'short symptom duration' based on the median number of days since any positive symptom increase prior to baseline, as recorded in the SIPS interview. The training-sample median was then used to split the validation sample into long and short duration validation samples. Models were developed in the training set of each sample iteration (i.e. full TRAIN, short duration TRAIN, and long duration TRAIN) and tested in the corresponding validation set (i.e. full VALIDATION, short duration VALIDATION, and long duration VALIDATION).
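The stratification procedure can be sketched as follows. This is a minimal illustration, not the study code: it assumes a simple random 50-50 split (the text does not state whether the split was stratified by outcome), uses a hypothetical field name days_since_increase, and assigns participants exactly at the median to the short duration group, a tie rule the text does not specify. The point it shows is that the median is computed in the training half only and then reused unchanged in the validation half, so no information from the validation data enters the cut-point.

```python
import random
from statistics import median


def split_and_stratify(participants, seed=0):
    """50/50 train/validation split, then a median split on days since the most
    recent positive-symptom increase, with the median taken from the training half.

    `participants` is a list of dicts with a hypothetical key "days_since_increase".
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # simple random split (assumption: not outcome-stratified)
    half = len(shuffled) // 2
    train, valid = shuffled[:half], shuffled[half:]

    # The cut-point is estimated on the training half only and reused unchanged.
    cut = median(p["days_since_increase"] for p in train)

    def stratify(rows):
        # Assumption: participants exactly at the median go to the short group.
        short = [p for p in rows if p["days_since_increase"] <= cut]
        long_ = [p for p in rows if p["days_since_increase"] > cut]
        return short, long_

    short_train, long_train = stratify(train)
    short_valid, long_valid = stratify(valid)
    return {
        "cut_days": cut,
        "full": (train, valid),
        "short": (short_train, short_valid),
        "long": (long_train, long_valid),
    }


# Toy usage with synthetic durations (not study data).
toy = [{"id": i, "days_since_increase": (i * 37) % 365} for i in range(200)]
strata = split_and_stratify(toy, seed=1)
print(strata["cut_days"], len(strata["short"][1]), len(strata["long"][1]))
```

Because the validation half is split with a cut-point it did not contribute to, the long and short validation groups need not be exactly equal in size, which is consistent with the slightly unequal group sizes reported in the Results.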

Model training and validation
In line with the original NAPLS2 risk calculator, multivariable Cox proportional hazards regression models were developed to estimate an individual's likelihood of conversion to psychosis within a 2-year period. In the present study, to preserve degrees of freedom in the smaller samples after stratification and to facilitate future external validation, we used a pruned version of the risk calculator model (i.e. excluding stressful life events and trauma history, which were not significant predictors in the original risk calculator); similar pruned versions have been used in prior studies extending the original risk calculator findings (Chung et al., 2019a; Worthington et al., 2021). After pruning, the primary model included age, SIPS items P1 + P2, HVLT-R score, BACS SC score, change in social functioning, and family history of a psychotic disorder.
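To make the modeling step concrete, the sketch below fits a Cox proportional hazards model by Newton-Raphson maximization of the partial likelihood (Breslow handling of tied event times) and converts the fitted linear predictor into a predicted probability of conversion by a chosen horizon using the Breslow baseline hazard. It is a self-contained teaching example under these assumptions, not the software used for the NAPLS2 risk calculator; the function names and the toy data (times in months, a single covariate) are hypothetical.

```python
import math


def _solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def cox_score_info(beta, X, time, event):
    """Gradient and observed information of the Breslow partial log-likelihood."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for i in range(n):
        if not event[i]:
            continue
        risk_set = [j for j in range(n) if time[j] >= time[i]]  # still at risk at t_i
        s0 = sum(w[j] for j in risk_set)
        mean_x = [sum(w[j] * X[j][a] for j in risk_set) / s0 for a in range(p)]
        for a in range(p):
            grad[a] += X[i][a] - mean_x[a]
            for c in range(p):
                s2 = sum(w[j] * X[j][a] * X[j][c] for j in risk_set) / s0
                info[a][c] += s2 - mean_x[a] * mean_x[c]
    return grad, info


def fit_cox(X, time, event, max_iter=50, tol=1e-10):
    """Newton-Raphson maximisation of the Cox partial likelihood, starting at beta = 0."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = cox_score_info(beta, X, time, event)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def risk_by(t_horizon, x_new, beta, X, time, event):
    """Predicted probability of conversion by t_horizon, via the Breslow baseline hazard."""
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    h0 = 0.0  # cumulative baseline hazard up to t_horizon
    for i in range(len(X)):
        if event[i] and time[i] <= t_horizon:
            h0 += 1.0 / sum(w[j] for j in range(len(X)) if time[j] >= time[i])
    eta = sum(b * x for b, x in zip(beta, x_new))
    return 1.0 - math.exp(-h0 * math.exp(eta))


# Toy usage (hypothetical data): months to conversion or censoring, one covariate.
time = [5, 8, 12, 3, 20, 15, 7, 24, 10, 18]
event = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
X = [[0.3], [0.1], [0.2], [0.4], [0.05], [0.15], [0.35], [0.1], [0.12], [0.25]]
beta = fit_cox(X, time, event)
print(beta, risk_by(24, [0.3], beta, X, time, event))
```

The same functions accept several covariates per row (for example the six pruned predictors), but the quadratic loops are only suitable for illustration; established implementations such as the CoxPHFitter class in the Python lifelines package or coxph in the R survival package would be the practical choice.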
We first applied the pruned risk calculator in the full training sample to re-estimate the model coefficients in this combined sample, and assessed how well the model predicted eventual converters in the full validation sample. We then repeated this process within both the short and long duration training/validation samples. Performance in the validation samples was assessed using the area under the curve (AUC; 95% confidence interval constructed from 2000 bootstrap resamples) to measure discrimination between eventual converters and non-converters; sensitivity, the proportion of converters correctly classified; specificity, the proportion of non-converters correctly classified; and balanced accuracy (BAC), the mean of sensitivity and specificity. Model calibration was assessed using the Brier score (Brier, 1950) and the calibration slope. In all models and subsamples, the mean predicted risk in the training sample was used as the cut-point for determining classification accuracy.
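The discrimination and accuracy metrics listed above can be computed as in the following sketch. It treats conversion during follow-up as a binary outcome, computes the AUC as the Mann-Whitney probability that a converter receives a higher predicted risk than a non-converter, classifies cases at or above the cut-point as predicted converters, and forms a percentile bootstrap interval from case resamples. The at-or-above rule and the skipping of resamples that contain only one outcome class are assumptions, not details given in the text; all function names are hypothetical.

```python
import random


def auc(y, score):
    """Mann-Whitney AUC: P(converter scores higher than non-converter), ties count 0.5."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classify_metrics(y, risk, cutoff):
    """Sensitivity, specificity and balanced accuracy at a fixed risk cut-point."""
    tp = sum(1 for t, r in zip(y, risk) if t == 1 and r >= cutoff)
    fn = sum(1 for t, r in zip(y, risk) if t == 1 and r < cutoff)
    tn = sum(1 for t, r in zip(y, risk) if t == 0 and r < cutoff)
    fp = sum(1 for t, r in zip(y, risk) if t == 0 and r >= cutoff)
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {"sensitivity": sens, "specificity": spec, "bac": (sens + spec) / 2}


def brier(y, risk):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((r - t) ** 2 for t, r in zip(y, risk)) / len(y)


def bootstrap_auc_ci(y, score, n_boot=2000, seed=0):
    """Percentile 95% CI for the AUC from resampling cases with replacement."""
    rng = random.Random(seed)
    n, stats = len(y), []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one outcome class
            stats.append(auc(yb, [score[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

In use, the cut-point would be the mean of the training-sample predicted risks (for a hypothetical list train_risk, sum(train_risk) / len(train_risk)), applied unchanged to the validation predictions so that the threshold is never tuned on validation data.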

Incorporating cortical thickness
To test the potential added value of cortical thickness measures, we assessed the increase in predictive power gained by adding baseline cortical thickness and rate of cortical thinning (i.e. percent change in gray matter thickness over an average of 3 months) to the clinical model in the subset of NAPLS3 participants who had completed follow-up imaging assessments. This analysis was limited to NAPLS3 participants because follow-up imaging was more frequent in this sample than in NAPLS2 (imaging assessments were conducted every 2 months in NAPLS3 and every 12 months in NAPLS2), in line with the goal of using short-term cortical changes to predict future outcomes. Risk scores for each individual in the full, short duration, and long duration samples were calculated from the clinical models described above. The resulting risk score was entered into three Cox proportional hazards regression models, with and without neuroimaging measures, to predict conversion to psychosis: (1) risk score only; (2) risk score plus baseline cortical thickness; and (3) risk score plus percent change in cortical thickness. Because of the smaller sample sizes and numbers of converters after stratifying this sample, bootstrap resampling was used to internally validate these models in place of the train/test split used in the prior analysis step.
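One way to quantify the added value of cortical thinning is a paired bootstrap of the difference in AUC between model (1) and model (3), evaluated on the same resampled cases. The sketch below assumes that the combined model's linear predictor is a weighted sum of the clinical risk score and percent change in thickness, with weights taken from a fitted Cox model; the weights used in the toy example are hypothetical. Because the coefficients are held fixed rather than refit in each resample, this captures sampling uncertainty in the AUC difference but not the optimism introduced by model fitting, and the text does not describe the study's bootstrap procedure in enough detail to say which variant was used.

```python
import random


def auc(y, score):
    """Mann-Whitney AUC with ties counted as 0.5."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def combined_score(risk_score, thinning, b_risk, b_thin):
    """Linear predictor of a Cox model containing both terms (coefficients supplied)."""
    return [b_risk * r + b_thin * c for r, c in zip(risk_score, thinning)]


def paired_bootstrap_delta_auc(y, base, extended, n_boot=2000, seed=0):
    """Mean and percentile 95% CI of AUC(extended) - AUC(base) on shared resamples."""
    rng = random.Random(seed)
    n, deltas = len(y), []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # both outcome classes are needed for an AUC
            ext = auc(yb, [extended[i] for i in idx])
            bas = auc(yb, [base[i] for i in idx])
            deltas.append(ext - bas)
    deltas.sort()
    mean_delta = sum(deltas) / n_boot
    return mean_delta, (deltas[int(0.025 * n_boot)], deltas[int(0.975 * n_boot) - 1])


# Toy usage: risk scores from the clinical model and percent change per month
# (negative = thinning). The coefficients are hypothetical, not fitted values.
y = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
base = [0.05, 0.10, 0.40, 0.20, 0.30, 0.15, 0.12, 0.08, 0.35, 0.25]
thinning = [-0.1, 0.0, -0.6, -0.2, -0.5, -0.7, 0.1, -0.05, -0.4, -0.15]
extended = combined_score(base, thinning, b_risk=1.0, b_thin=-1.0)
print(paired_bootstrap_delta_auc(y, base, extended, n_boot=200, seed=2))
```

Pairing the two AUCs on each resample removes the shared case-mix variation, so the interval for the difference is typically narrower than one obtained by comparing two separately bootstrapped AUC intervals.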

Results
A total of 1300 participants from the NAPLS2 and NAPLS3 studies who had any follow-up data were included. Of these, 11.8% converted to psychosis. The median duration since the most recent positive symptom increase was 120 days. After stratification based on this criterion, the long symptom duration group consisted of 621 participants (9.8% conversion rate) and the short symptom duration group consisted of 618 participants (13.7% conversion rate). Overall, the short duration group was younger, had fewer years of education, and had higher baseline Scale of Prodromal Symptoms (SOPS) items P1 + P2 (unusual thought content and suspiciousness) than the long duration group. See Table 1 for full results. In both groups, converters had higher ratings on unusual thought content and suspiciousness, worse cognitive performance on the BACS SC and HVLT-R tests, and a higher proportion with a family history of a psychotic disorder. In the long duration group, but not the short duration group, converters experienced a greater decline in GFS functioning, and the converter group included a higher proportion of participants from racial minority groups than the non-converter group. See online Supplementary eTable 1 for full results.
When the pruned risk calculator model was applied to the full sample, lower HVLT-R scores and higher unusual thought content and suspiciousness scores significantly predicted conversion at the multivariable level. This pattern was seen in the short symptom duration group as well. In the long symptom duration group, greater decline in social functioning and higher levels of unusual thought content and suspiciousness significantly predicted conversion at the multivariable level. See Table 2 for full results.
Model performance was tested in the validation samples using the mean predicted risk score as the cutoff for conversion prediction in each subsample. In the full sample, the risk calculator achieved an AUC of 0.71 (BAC = 0.64, calibration slope = 1.29). In the long duration group, the model achieved an AUC of 0.74 (BAC = 0.65, calibration slope = 0.92), and in the short duration group, an AUC of 0.69 (BAC = 0.69, calibration slope = 1.58). Full results are reported in Table 3. This level of performance is consistent with that of existing risk calculators, notably the original NAPLS2 risk calculator (Cannon et al., 2016). Distributions of predicted risk scores for each stratification group are shown in Fig. 1, which illustrates the differences in the shapes and central tendencies of the risk distributions for shorter and longer duration cases. Receiver operating characteristic curves describing model performance are shown in Fig. 2.
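Calibration slopes such as those reported above are conventionally obtained by regressing the observed outcome on the logit of the predicted risk with logistic regression: a slope near 1 indicates appropriately spread predictions, a slope below 1 indicates predictions that are too extreme (a typical sign of overfitting), and a slope above 1 indicates predictions that are too compressed. The sketch below fits that two-parameter model by Newton-Raphson. Treating conversion as a binary outcome is an assumption, since the text does not state exactly how calibration was computed for the Cox models, and the function name is hypothetical.

```python
import math


def calibration_slope(y, risk, max_iter=50, tol=1e-10):
    """Slope b of the logistic recalibration model logit P(y = 1) = a + b * logit(risk)."""
    lp = [math.log(r / (1.0 - r)) for r in risk]  # requires 0 < risk < 1
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for yi, xi in zip(y, lp):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            g_a += yi - p          # score for the intercept
            g_b += (yi - p) * xi   # score for the slope
            w = p * (1.0 - p)
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += inverse(information) @ score, via the 2x2 closed form.
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return b
```

A useful check on any implementation is equivariance: doubling every logit of the predicted risks (making predictions more extreme) should halve the estimated slope exactly.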
A total of 274 participants (13.4% conversion rate) from NAPLS3 completed follow-up structural MRI scans and were included in the analysis assessing the added value of percent change in cortical thickness as a predictor of conversion to psychosis alongside the clinical risk calculator. When stratified by symptom duration, the long symptom duration imaging sample consisted of 136 participants (13.2% conversion rate) and the short symptom duration imaging sample consisted of 126 participants (13.5% conversion rate). See the online Supplement for comparisons between the short and long symptom duration groups. Results are reported in Table 3 and online Supplementary eTable 5. In the multivariable models that included imaging measures, only cortical thinning, not baseline cortical thickness, was a significant predictor of conversion in addition to the risk score in each of the samples. In the short duration group, adding cortical thinning to the predicted risk score resulted in a significant improvement in AUC and notable improvements in sensitivity and specificity compared with the original risk calculator model applied in the short duration group of the combined NAPLS2/NAPLS3 sample. In the long duration group, adding cortical thinning yielded a moderate improvement in performance compared with the original risk calculator model applied in the long duration group of the combined NAPLS2/NAPLS3 sample, with improved AUC and notably improved sensitivity and specificity.

Discussion
This is the first study to demonstrate a potential moderator of NAPLS2 risk calculator performance that could explain differences in risk distributions across cohorts, a finding further supported by a neuroimaging measure in an NAPLS3 subgroup. CHR-P participants with a shorter duration between APS increase and ascertainment may best align with risk distributions conferring a higher risk of conversion (mean risk score = 0.14), whereas those with a longer duration between prodromal symptom increase and ascertainment may best align with risk distributions conferring a lower risk of conversion (mean risk score = 0.09). Mirroring observations from the effort to replicate the NAPLS2 risk calculator in the PRONIA study, the mean risk scores of the short and long symptom duration groups in this study map onto the mean risk scores of the NAPLS2 and PRONIA studies, respectively (Koutsouleris et al., 2021) (i.e. prior to adjusting for sample differences in symptom severity and other risk factors), suggesting that this symptom duration approach to stratification may be generally applicable to CHR-P populations.
The mean differences in demographic and clinical variables between the long and short symptom duration groups, together with differences in which predictors were significant at the multivariable level, suggest that these groups represent notably different clinical presentations. The shorter symptom duration group may represent an acute onset and worsening of symptoms accompanied by a decline in cognitive functioning, whereas the longer symptom duration group may represent a steadier decline in both symptoms and social functioning.
Further, in the NAPLS3 subgroup, we found that cortical thinning adds to the predictive power of the risk calculator for all CHR-P patients (AUC improvement from 0.71 to 0.78), but that it differentially and significantly improves prediction in the shorter symptom duration group (AUC improvement from 0.71 to 0.84) compared with the longer symptom duration group (AUC improvement from 0.71 to 0.78). Notably, this improvement was not seen when the baseline cortical thickness measure was added to the risk calculator. These findings further support the validity of the distinction between the symptom duration groups and suggest that cortical thinning, as a putative mechanism of psychosis development, not only maps well onto clinical indicators of illness progression but also adds significant power to the prediction of psychosis onset. In the short symptom duration group, which has a higher rate of conversion to psychosis, underlying cortical changes may contribute to a more rapid and severe onset of prodromal symptoms, and eventual full-blown psychosis, than in the long symptom duration group.
Beyond neurodevelopmental changes, it will be important to determine the underlying social, environmental, and clinical factors that may contribute to these different clinical presentations. There is some evidence that pathways to ascertainment and referral sources to specialized CHR-P clinics result in heterogeneous risk enrichment (Fusar-Poli et al., 2016a, 2016b). These pathways and referral sources may reflect underlying factors such as access to affordable mental healthcare, healthcare resource utilization, and family support in seeking mental health treatment, which could interact with underlying neurodevelopmental changes (Patel, Leathem, Currin, & Karlsgodt, 2021) to produce the clinical presentations observed in this study. Future studies incorporating measures of social support, systemic support, and help-seeking behaviors could further elucidate the factors contributing to the timing of seeking treatment for prodromal symptoms.
In addition to improving our ability to predict outcomes, the observed patterns of clinical and neurodevelopmental changes could provide insight into treatment selection for CHR-P individuals. Currently, no psychosocial or pharmacological intervention for preventing or delaying the onset of psychosis has emerged as a preferred treatment (Addington, Devoe, & Santesteban-Echarri, 2019), likely in part because of heterogeneity in CHR-P cohorts that is not accounted for in intervention studies. Parsing this heterogeneity through improved outcome prediction could yield more precise results from treatment studies and interventions tailored to factors that can be ascertained at or near a baseline visit (Worthington & Cannon, 2021). Further investigation is warranted to determine whether different interventions are appropriate for the groups described in this study.

Strengths and limitations
A significant strength of this study was the ability to leverage the very large sample size of the combined NAPLS2/NAPLS3 studies. This allowed us not only to account for differences across the two samples, but also to create a training and validation framework to test the risk calculator models in the full and stratified samples. Nonetheless, it will be important to externally validate these findings in an independent sample. The motivation for this study stemmed from the observed differences in conversion rates and predicted risk scores between the NAPLS2 and PRONIA studies; thus, determining whether the moderator proposed here validates across these samples will be an important next step. Because we tested the models with neuroimaging measures only in the NAPLS3 sample, the sample size was substantially reduced, predisposing these prediction models to overfitting. Further, differences between participants who did and did not complete follow-up imaging in NAPLS3 may bias the results. The findings incorporating short-term cortical thinning should be externally validated in a larger sample to address sample bias and ensure the stability of these results.

Conclusion
This study is the first to provide evidence for a potential moderator of risk prediction in the CHR-P population, which could aid clinical implementation of risk calculators by matching new patients with appropriate risk distributions. Neuroimaging measures significantly bolstered prediction in the short symptom duration group and improved prediction in the long symptom duration group, and clinical differences were identified between these two groups. This pattern of results suggests that illness progression prior to and shortly after ascertainment not only differs among CHR-P patients, but could also play an essential role in the prediction of outcomes and in determining the appropriate risk distribution for prognostication of new cases.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0033291723002301
Financial support. This study was supported by the National Institute of Mental Health (grant U01MH081984 to Dr Addington; grant U01MH081928 to Dr Stone; grant U01MH081944 to Dr Cadenhead; grant U01MH081902 to Drs Cannon and Bearden; grant U01MH082004 to Dr Perkins; grant U01MH081988 to Dr Walker; grant U01MH082022 to Dr Woods; grant U01MH076989 to Dr Mathalon; grant U01MH081857 to Dr Cornblatt). This research was supported in part by a grant from the American Psychological Association to Ms. Worthington.
Competing interest. Dr Mathalon is a consultant for Boehringer-Ingelheim, Cadent Therapeutics, Recognify, and Syndisi. Dr Woods reports that during the last 36 months he has received sponsor-initiated research funding support from Teva, Boehringer-Ingelheim, Amarex, and SyneuRx. He has consulted to Boehringer-Ingelheim, New England Research Institute, and Takeda. He has been granted US patent no. 8492418 B2 for a method of treating prodromal schizophrenia with glycine agonists and has received royalties from Oxford University Press. All other authors report no financial relationships with commercial interests.
Ethical standard. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Figure 1. Predicted risk of conversion in the full NAPLS2/NAPLS3 sample, short symptom duration sample, and long symptom duration sample.

Figure 2. AUC for performance of predictor models without/with imaging variables.

Table 1. Mean comparisons of demographic and risk calculator variables for long and short symptom duration groups in the combined NAPLS2/NAPLS3 sample.
SOPS P1 + P2, Scale of Prodromal Symptoms: unusual thought content and suspiciousness; BACS, Brief Assessment of Cognition in Schizophrenia; HVLT-R, Hopkins Verbal Learning Test-Revised.
a All participant demographic characteristics are self-reported.
* Mean differences between groups are significant at the p < 0.05 level.

Table 2. Statistics for individual predictor variables in the pruned risk calculator Cox proportional hazards regression model predicting conversion to psychosis.
HR, hazard ratio; CI, confidence interval; SOPS items P1 + P2, Scale of Prodromal Symptoms: unusual thought content and suspiciousness; BACS, Brief Assessment of Cognition in Schizophrenia; HVLT-R, Hopkins Verbal Learning Test-Revised.
* The predictor variable is significant in the multivariable model at the p < 0.05 level.
** The predictor variable is significant in the multivariable model at the p < 0.01 level.

Table 3. Performance of models predicting conversion to psychosis using the pruned NAPLS2 risk calculator in the combined NAPLS2/NAPLS3 sample, and using the predicted risk scores plus cortical thickness measures in the NAPLS3 imaging sample.
AUC, area under the curve; CI, confidence interval; BAC, balanced accuracy.