Derivation and validation of risk prediction for posttraumatic stress symptoms following trauma exposure

Raphael Kim; Tina Lin; Gehao Pang; Yufeng Liu; Andrew S. Tungate; Phyllis L. Hendry; Michael C. Kurz; David A. Peak; Jeffrey Jones; Niels K. Rathlev; Robert A. Swor; Robert Domeier; Marc-Anthony Velilla; Christopher Lewandowski; Elizabeth Datner; Claire Pearson; David Lee; Patricia M. Mitchell; Samuel A. McLean; Sarah D. Linnstaedt

doi:10.1017/S003329172200191X

Published online by Cambridge University Press: 01 July 2022

Raphael Kim: Affiliation:
Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA; Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA; Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA; Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
Tina Lin*: Affiliation:
Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA; Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
Gehao Pang: Affiliation:
Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA; Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
Yufeng Liu: Affiliation:
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA; Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA; Department of Genetics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
Andrew S. Tungate: Affiliation:
Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA; Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
Phyllis L. Hendry: Affiliation:
Department of Emergency Medicine, University of Florida College of Medicine, Jacksonville, FL, USA
Michael C. Kurz: Affiliation:
Department of Emergency Medicine, University of Alabama, Birmingham, AL, USA
David A. Peak: Affiliation:
Department of Emergency Medicine, Massachusetts General Hospital, Boston, MA, USA
Jeffrey Jones: Affiliation:
Department of Emergency Medicine, Spectrum Health Butterworth Campus, Grand Rapids, MI, USA
Niels K. Rathlev: Affiliation:
Department of Emergency Medicine, Baystate State Health System, Springfield, MA, USA
Robert A. Swor: Affiliation:
Department of Emergency Medicine, Beaumont Hospital, Royal Oak, MI, USA
Robert Domeier: Affiliation:
Department of Emergency Medicine, St Joseph Mercy Health System, Ann Arbor, MI, USA
Marc-Anthony Velilla: Affiliation:
Department of Emergency Medicine, Sinai Grace, Detroit, MI, USA
Christopher Lewandowski: Affiliation:
Department of Emergency Medicine, Henry Ford Hospital, Detroit, MI, USA
Elizabeth Datner: Affiliation:
Department of Emergency Medicine, Albert Einstein Medical Center, Philadelphia, PA, USA
Claire Pearson: Affiliation:
Department of Emergency Medicine, Detroit Receiving, Detroit, MI, USA
David Lee: Affiliation:
Department of Emergency Medicine, North Shore University Hospital, Manhasset, NY, USA
Patricia M. Mitchell: Affiliation:
Department of Emergency Medicine, Boston University School of Medicine, Boston, MA, USA
Samuel A. McLean: Affiliation:
Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA; Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA; Department of Emergency Medicine, University of North Carolina, Chapel Hill, NC, USA
Sarah D. Linnstaedt*: Affiliation:
Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA; Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
*: Author for correspondence: Sarah D. Linnstaedt, E-mail: sarah_linnstaedt@med.unc.edu

Abstract

Background

Posttraumatic stress symptoms (PTSS) are common following traumatic stress exposure (TSE). Identification of individuals with PTSS risk in the early aftermath of TSE is important to enable targeted administration of preventive interventions. In this study, we used baseline survey data from two prospective cohort studies to identify the most influential predictors of substantial PTSS.

Methods

Self-identifying black and white American women and men (n = 1546) presenting to one of 16 emergency departments (EDs) within 24 h of motor vehicle collision (MVC) TSE were enrolled. Individuals with substantial PTSS (⩾33, Impact of Events Scale – Revised) 6 months after MVC were identified via follow-up questionnaire. Sociodemographic, pain, general health, event, and psychological/cognitive characteristics were collected in the ED and used in prediction modeling. Ensemble learning methods and Monte Carlo cross-validation were used for feature selection and to determine prediction accuracy. External validation was performed on a hold-out sample (30% of total sample).

Results

Twenty-five percent (n = 394) of individuals reported PTSS 6 months following MVC. Regularized linear regression was the top performing learning method. The top 30 factors together showed good reliability in predicting PTSS in the external sample (Area under the curve = 0.79 ± 0.002). Top predictors included acute pain severity, recovery expectations, socioeconomic status, self-reported race, and psychological symptoms.

Conclusions

These analyses add to a growing literature indicating that influential predictors of PTSS can be identified and risk for future PTSS estimated from characteristics easily available/assessable at the time of ED presentation following TSE.

Keywords

Machine learning; prediction; PTSD; trauma; risk factors

Type: Original Article
Information: Psychological Medicine, Volume 53, Issue 11, August 2023, pp. 4952-4961

DOI: https://doi.org/10.1017/S003329172200191X
Copyright: Copyright © The Author(s), 2022. Published by Cambridge University Press

Introduction

Exposure to traumatic events is common in life (Eastel et al., 2019; Kilpatrick et al., 2013). While most individuals recover following trauma exposure, a substantial subset develops adverse posttraumatic neuropsychiatric sequelae such as posttraumatic stress symptoms (PTSS). PTSS can cause tremendous suffering, functional impairment, disability, and high health care costs (Bleich & Solomon, 2004; Dobie et al., 2004; Gaskin & Richard, 2012; Haskell et al., 2010; Kessler, 2000; Lew et al., 2009; McNally & Frueh, 2013; Outcalt et al., 2015; Stewart, Ricci, Chee, Hahn, & Morganstein, 2003; Surís & Lind, 2008). Even though individuals who develop PTSS often present for emergency care or other health care in the immediate/early aftermath of an inciting event, no risk prediction tools are in regular use and the development of such tools is still at an early stage. Continued development of such tools is valuable because preventive interventions delivered in the early aftermath of trauma might be the most efficacious (Fritz et al., 2015; Kearns, Ressler, Zatzick, & Rothbaum, 2012; Litz, Gray, Bryant, & Adler, 2002; Shalev et al., 2016).

A number of high-quality studies have successfully identified survey or biological characteristics that predict PTSS, either as individual predictors or sets of items identified via ensemble machine learning-based methodologies (Freedman, Brandes, Peri, & Shalev, 1999; Galatzer-Levy, Karstoft, Statnikov, & Shalev, 2014; Karstoft, Statnikov, Andersen, Madsen, & Galatzer-Levy, 2015b; Kessler et al., 2014; Kleim, Ehlers, & Glucksman, 2007; Linnstaedt et al., 2019a; Linnstaedt, Zannas, McLean, Koenen, & Ressler, 2019b; Powers et al., 2014; Rosellini, Dussaillant, Zubizarreta, Kessler, & Rose, 2018; Schultebraucks et al., 2020; Shalev et al., 2019; Symes, Maddoux, McFarlane, & Pennings, 2016; Ziobrowski et al., 2021).
Identified characteristics have generally come from sociodemographic (Galatzer-Levy et al., 2014; Karstoft, Galatzer-Levy, Statnikov, Li, & Shalev, 2015a; Kessler et al., 2014; Powers et al., 2014), prior trauma (Karstoft et al., 2015b; Kessler et al., 2014; Symes et al., 2016), blood biomarker (Linnstaedt et al., 2019a, 2019b; Schultebraucks et al., 2020), and psychological or cognitive domains (Freedman et al., 1999; Galatzer-Levy et al., 2014; Karstoft et al., 2015a, 2015b; Kleim et al., 2007; Powers et al., 2014; Symes et al., 2016).
Specifically, examples of previously identified predictors of PTSS include feelings of worthlessness, peritraumatic stress, nightmares, worrying, racing heart, blood cell counts, gender, and preexisting depression (Galatzer-Levy et al., 2014; Karstoft et al., 2015b; Kessler et al., 2014; Powers et al., 2014; Schultebraucks et al., 2020; Ziobrowski et al., 2021). The continued development and exposition of predictive factors and tools is important for several reasons. First, an array of tools is needed because the optimal tool may vary greatly depending on the timing related to trauma, trauma type, patient population, type of intervention (e.g. affecting optimal sensitivity/specificity trade-off), time/resources available to administer the tool, and types of screening questions that can be asked (e.g. even if highly predictive within a tool, childhood trauma history might be a viable assessment in a therapist's office but not an emergency or primary care waiting room). In addition, different datasets invariably contain information regarding different types of patient characteristics, and therefore the evaluation of the most influential predictors using a variety of large, high-quality datasets allows the continued surfacing of promising predictors and tools.

In the current study, we sought to contribute to the continued development and exposition of predictive factors and tools for PTSS by identifying the optimal set of survey items that, at the time of emergency department (ED) evaluation after motor vehicle collision (MVC) trauma, predict substantial PTSS at 6 months. We utilized data from two longitudinal studies of MVC survivors [n = 776 Black Americans (Linnstaedt et al., 2016) and n = 770 White Americans (Platts-Mills et al., 2011)] that were performed by a common research team, with nearly identical methods and high follow-up rates (Linnstaedt et al., 2016; Platts-Mills et al., 2011). Available candidate predictors assessed in the ED included sociodemographic, trauma, reported pre-MVC health status, and peritraumatic pain and psychological symptom domains. The optimal set of predictors was derived using ensemble learning methods and validation was performed using data from a hold-out subsample of ED sites.

Methods

Cohorts

Data used in the current study were collected as part of two longitudinal cohort studies of MVC trauma survivors. These two studies enrolled individuals at one of 16 ED sites in the immediate aftermath of MVC and followed study participants over the course of 1 year. MVC trauma is one of the most common civilian traumatic stress exposures in industrialized nations, and similar to other forms of trauma, adverse posttraumatic neuropsychiatric sequelae are common (McLean et al., 2019). The first of the two studies enrolled only self-reporting White American individuals (between June 2011 and June 2014) and the second study enrolled only self-reporting Black American individuals (between July 2012 and July 2015). These two racial groups were enrolled separately to avoid population stratification effects in each individual cohort. Both sister studies shared the common goal of understanding recovery v. development of adverse posttraumatic neuropsychiatric sequelae following trauma exposure. They have been described thoroughly previously (Linnstaedt et al., 2016; Platts-Mills et al., 2011) and details are provided below. The studies were approved by Institutional Review Boards (IRBs) at all collaborating institutions and all participants provided written informed consent after receiving a complete description of the study. Trained research assistants at each ED site used web-based screeners and questionnaires to determine eligibility and perform assessments (described below).

Study design and population

Study design

We adopted a study design that leveraged the multiple study sites enrolling participants in the two studies. This study design is illustrated in Fig. 1 and is described in the ‘Site split study design’ section. In brief, enrollment sites were grouped into three geographic regions in each cohort [cohort 1: n = 361, 361, and 54, for geographic regions 1 (Michigan study sites), 2 (Northeastern US study sites), and 3 (Southeastern US study sites), respectively; cohort 2: n = 304, 152, and 314 for geographic regions 1, 2, and 3]. Participant data were then partitioned into training (70% of the data) and test sets (30% of data, data from different study sites than training data). The training dataset had equal numbers of participants from each cohort and equal representation from each geographical region while, to increase rigor, the test dataset included participant data that were shuffled randomly (i.e. potential non-equal race, sex, age, etc., representation). Two hundred permutations of these training and test sets were used to determine external validation metrics. Within the training set, feature selection was performed using 100 rounds of Monte Carlo cross-validation that identified the top 30 variables based on average rankings across internal validation metrics. The selection of 30 variables was determined via the one standard error rule (Chan, Pristach, Welte, & Russell, 1993). This rule indicated that the most parsimonious subset of variables with the least error (up to one standard error) is 30 variables (online Supplementary Fig. S2; mean error and standard error for 30 variables: 0.17 ± 0.003). These 30 variables were then used to assess AUC, accuracy, sensitivity, specificity, negative predictive values (NPVs), and positive predictive values (PPVs) in the hold-out test dataset.
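As an illustration of the one standard error rule described above, the following is a minimal sketch, not the authors' code. The function name `one_standard_error_choice` and the error values are hypothetical; the rule is read here as "pick the smallest subset size whose mean cross-validation error is within one standard error of the lowest mean error", which is the usual formulation.

```python
from math import sqrt
from statistics import mean, stdev


def one_standard_error_choice(errors_by_k):
    """Return the smallest k whose mean CV error is within one standard
    error of the lowest mean error.

    errors_by_k maps a candidate subset size k to the list of
    cross-validation errors observed for that k.
    """
    summary = {}
    for k, errs in errors_by_k.items():
        standard_error = stdev(errs) / sqrt(len(errs))
        summary[k] = (mean(errs), standard_error)

    # Subset size with the lowest mean error, and the tolerance around it.
    best_k = min(summary, key=lambda k: summary[k][0])
    best_mean, best_se = summary[best_k]
    threshold = best_mean + best_se

    # Most parsimonious subset size that is still within the tolerance.
    eligible = [k for k, (m, _) in summary.items() if m <= threshold]
    return min(eligible)


# Hypothetical error values, for illustration only.
cv_errors = {
    10: [0.24, 0.26, 0.25, 0.25],
    30: [0.17, 0.18, 0.17, 0.17],
    60: [0.16, 0.18, 0.17, 0.17],
}
chosen_k = one_standard_error_choice(cv_errors)
```

In this toy input the 60-variable model has the lowest mean error, but the 30-variable model falls within one standard error of it and is therefore preferred as the more parsimonious choice.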

Fig. 1. Schematic of the study design employed in the current study to achieve rigorous training and test sets for machine learning algorithms. Participant data were derived from two longitudinal studies of motor vehicle collision trauma survivors. Enrollment occurred across 16 emergency department (ED) sites in the Eastern United States (gray dots top panel). Geographic locations of these ED enrollment sites were grouped into three broad areas as defined by blue numbers for cohort 1, the White American cohort (Platts-Mills et al., 2011) and orange numbers for cohort 2, the Black American cohort (Linnstaedt et al., 2016). Participant data from each of these three geographic locations were then used to generate ‘site splits’ for training datasets (70% of the combined Black and White cohorts) and test datasets (30% of the combined cohort). As shown in the middle panel, training datasets were balanced across races and geographic locations. Within each training data site split, 100 rounds of Monte Carlo cross-validation were performed (represented by gray and green bars, bottom panel) to estimate variable selection probabilities and conduct feature selection. Using this methodology, average variable rankings were calculated, and the top variables were used for external validation within test datasets that were not constrained for race, sex, or geographic locations.

Motor vehicle collision study, cohort 1

The details of the first of our two MVC studies have been reported previously (Platts-Mills et al., 2011). In brief, individuals ⩾18 and ⩽65 years of age who presented to one of eight EDs in four no-fault insurance states (i.e. states that restrict one's right to seek compensation for pain or suffering that is associated with MVC: Michigan, Massachusetts, New York, and Florida) within 24 h of MVC, and who did not have a fracture (other than of a finger or toe) or other injury requiring hospital admission, were enrolled between June 2011 and June 2014. Additionally, to be enrolled, patients had to provide a telephone number for follow-up contact. Patients who were not alert and oriented per the treating clinician were excluded, as were pregnant patients, prisoners, patients unable to read and understand English, patients with substantial soft tissue injury, passengers on a bus, and patients taking opioids above a total daily dose of 30 mg of oral morphine or equivalent. In addition, enrollment was limited to self-identifying non-Hispanic White Americans. Informed consent was obtained from all participants and IRB approval was obtained at all study sites.

Motor vehicle collision study, cohort 2

The details of the second of our two MVC studies have also been reported previously (Linnstaedt et al., 2016). This prospective longitudinal study enrolled self-identifying Black American individuals ⩾18 and ⩽65 years of age who presented within 24 h of MVC to one of 11 EDs in six states/districts (Michigan, Pennsylvania, Florida, Alabama, Massachusetts, and Washington DC) between July 2012 and July 2015. In brief, individuals who did not have a fracture or other injury requiring hospital admission were screened for eligibility. Patients who were not alert and oriented were excluded, as were patients who did not self-identify as Black American, were pregnant, were prisoners, were unable to read and understand English, or were taking opioids above a total daily dose of 30 mg of oral morphine or equivalent. Furthermore, only non-Hispanic Black Americans were enrolled in the study. The study was approved by the IRB of all participating hospitals. Each participant provided written informed consent before enrollment.

Assessments collected at the time of trauma exposure (i.e. potential predictors)

All variables included as potential predictors are presented in online Supplementary Table S1. Descriptions of these assessments are provided in online Supplementary Methods.

Data cleaning, imputation, and variable reduction

The two MVC cohort datasets were cleaned and imputed separately and then merged into a final dataset. Cleaning, variable reduction, and imputation steps were adapted from previously published protocols (Kuhn & Johnson, 2013; Stekhoven & Bühlmann, 2012) and are summarized in online Supplementary Fig. S1. Briefly, we first removed variables with >10% missingness, and participants with >50% missing data. This resulted in a total of 966 variables in cohort 1 and 958 variables in cohort 2 (of these variables, >97% of them contained less than 5% missing data). We then used missForest, a random forest-based non-parametric method, to impute variables with missing values. Compared to other methods like MICE that individually fit data types, missForest can leverage all available data during imputation. Without imputation, many variables would be removed, thus harming data quality (e.g. there could be potential overconfidence in results and induced bias). Using these complete data, we then scaled continuous covariates [(0,1) range], removed variables with zero or low variance (i.e. those variables in which the fraction of unique values over the sample size was <10% and the ratio of the frequency of the most prevalent value to the frequency of the second most prevalent value was >19), and removed one of any pair of variables in high correlation with each other (i.e. |r| > 0.75) (e.g. number of alcoholic drinks consumed per week was correlated with alcohol consumed per day, therefore one of these variables was removed). Finally, variables not present in both cohorts (i.e. because a questionnaire was used in one cohort but not the other) were removed. A total of 160 variables remained and are provided in online Supplementary Table S1. All cleaning steps were performed using RStudio (version 4.0.0).
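The scaling and variable reduction steps can be sketched as follows. This is an illustrative Python sketch only (the study itself reports using R), and all function names and data are hypothetical. It assumes the low-variance thresholds are read as "fraction of unique values below 10%" together with "frequency ratio above 19", and that the first variable of a highly correlated pair is the one kept; the source does not say which member of a pair was dropped.

```python
from collections import Counter
from math import sqrt


def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def is_near_zero_variance(values, unique_cut=0.10, freq_cut=19.0):
    """Flag a column with few distinct values AND one dominant value."""
    counts = Counter(values).most_common(2)
    if len(counts) == 1:
        return True  # zero variance: a single distinct value
    unique_fraction = len(set(values)) / len(values)
    freq_ratio = counts[0][1] / counts[1][1]
    return unique_fraction < unique_cut and freq_ratio > freq_cut


def pearson_r(x, y):
    """Pearson correlation; assumes neither column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, cutoff=0.75):
    """Keep the first column of any pair with |r| above the cutoff."""
    kept = []
    for name in columns:
        if all(abs(pearson_r(columns[name], columns[k])) <= cutoff
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns, for illustration only.
columns = {
    "drinks_per_week": [0, 7, 14, 21, 3, 10],
    "drinks_per_day": [0, 1, 2, 3, 0.5, 1.5],
    "age": [25, 61, 33, 47, 52, 29],
}
kept = drop_correlated(columns)
```

Because `pearson_r` is undefined for constant columns, the low-variance filter should run before the correlation filter, which matches the order of steps given in the text.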

Site split study design

Instead of using our two datasets as separate discovery and validation datasets and given the nearly identical study design of the two studies and that they were comprised of self-identifying Blacks and Whites, respectively, we opted to combine them into one large final dataset and then hold out a subset of study sites as external validation sites (test data), with the rest of the sites used as our training data. This enabled us to include self-identified race as a candidate predictor, increasing generalizability of the study. Further, we opted to not evaluate using only one train-test (or hold out) split, but instead evaluate test performance over several train-test splits. In this way, we can better assess the generalizability of the models rather than rely on a single, potentially sensitive estimate.

To generate our train-test splits, we generated all possible combinations of study site splits that could fulfill a 70:30 split between internal training and external test data. Within the training data, we constrained the possible combinations of study sites by three metrics: ratio of self-identifying Black Americans to self-identifying White Americans was between 0.45 and 0.55, the ratio of women to men was between 0.45 and 0.55 and every training set had to have at least one study site from each major geographical location (defined as Michigan area, Northern east coast, and Southern east coast). These constraints resulted in 605 different combinations of possible training sets. Due to computational costs of running our machine learning pipelines on all 605 combinations, we randomly selected 200 of these splits on which to evaluate performance.
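The enumeration of constrained site splits can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the site records are invented, the race and sex constraints are interpreted as proportions of the training set (the text says "ratio"), and the tolerance around the 70% training fraction is a hypothetical parameter because the source does not say how close to 70:30 a split had to be.

```python
import random
from itertools import combinations


def ratio_ok(part, total, low=0.45, high=0.55):
    """True when part/total lies in the closed interval [low, high]."""
    return total > 0 and low <= part / total <= high


def valid_training_sets(sites, train_fraction=0.70, tolerance=0.05):
    """Enumerate subsets of sites that satisfy the split constraints.

    Each site is a dict with keys: name, region, n, n_black, n_women.
    """
    total_n = sum(s["n"] for s in sites)
    regions = {s["region"] for s in sites}
    valid = []
    for size in range(1, len(sites)):
        for combo in combinations(sites, size):
            n = sum(s["n"] for s in combo)
            if abs(n / total_n - train_fraction) > tolerance:
                continue  # not close enough to a 70:30 split
            if not ratio_ok(sum(s["n_black"] for s in combo), n):
                continue  # race balance violated
            if not ratio_ok(sum(s["n_women"] for s in combo), n):
                continue  # sex balance violated
            if {s["region"] for s in combo} != regions:
                continue  # a geographic region is missing
            valid.append(tuple(s["name"] for s in combo))
    return valid


# Hypothetical sites, for illustration only.
sites = [
    {"name": "A", "region": 1, "n": 100, "n_black": 50, "n_women": 50},
    {"name": "B", "region": 2, "n": 100, "n_black": 50, "n_women": 50},
    {"name": "C", "region": 3, "n": 100, "n_black": 50, "n_women": 50},
    {"name": "D", "region": 1, "n": 50, "n_black": 25, "n_women": 25},
    {"name": "E", "region": 2, "n": 150, "n_black": 90, "n_women": 75},
]
splits = valid_training_sets(sites)

# Randomly keep at most 200 of the valid splits, as in the text.
rng = random.Random(0)
chosen = rng.sample(splits, k=min(200, len(splits)))
```

The sites left out of each training tuple form the corresponding external test set, so test data always come from different sites than training data.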

Machine learning methods

K-nearest neighbors

K-nearest neighbors is a non-parametric supervised learning method for classification. Given a new input to classify, it looks for the majority class among the k-nearest data points in the training set, and chooses that majority class as the output (Kramer, 2013). It has been applied extensively in medical research (Ali, Neagu, & Trundle, 2019; Gallego, Pertusa, & Calvo-Zaragoza, 2018; Li et al., 2012; Shouman, Turner, & Stocker, 2012; Xing & Bei, 2020; Zhuang, Cai, Wang, Zhang, & Zheng, 2020).
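The majority-vote rule can be written in a few lines. The sketch below assumes Euclidean distance and uses invented toy data; the function name `knn_predict` is hypothetical, and the distance metric and value of k used in the study are not stated in this section.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Indices of training points ordered by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
near_origin = knn_predict(points, labels, (0.5, 0.5))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```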

Regularized regression

Regularized regression is least squares regression with either an L1 penalty (lasso regression), L2 penalty (ridge regression), or a combination of both (Elastic Net). Using regression allows for interpretable models, making it popular in the biomedical community, while still allowing for robust performance via regularization (Austin, Pan, & Shen, 2013; de Vlaming & Groenen, 2015; Kessler et al., 2017; Lund et al., 2019; Marafino, Boscardin, & Dudley, 2015; Odgers, Tellis, Hall, & Dumontier, 2016; Parker et al., 2009; Pavlou, Ambler, Seaman, De Iorio, & Omar, 2016; Privé, Aschard, & Blum, 2019).
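For reference, one common way to write the penalized objective is shown below. The symbols are assumptions introduced here for illustration: ℓ is the per-observation loss (squared error for least squares, negative log-likelihood for logistic regression), λ ⩾ 0 is the overall penalty strength, and γ in [0, 1] is the mixing weight (written γ to avoid confusion with the α used elsewhere in this article for model-specific hyperparameters). Whether the authors used exactly this parameterization is not stated.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
    + \lambda \left[ \frac{1-\gamma}{2}\, \lVert \beta \rVert_2^2
                     + \gamma\, \lVert \beta \rVert_1 \right]
```

Setting γ = 1 gives the lasso, γ = 0 gives ridge regression, and intermediate values give the Elastic Net.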

Random forest

Random forest is an ensemble non-parametric method that aggregates over several decision trees to make predictions. Random forests are popular for their ability to model complex and non-linear interactions of effects. Random forests have also proven successful in the biomedical community (Antoniadi, Galvin, Heverin, Hardiman, & Mooney, 2021; Bayramli et al., 2021; Chen & Ishwaran, 2012; Hu & Steingrimsson, 2018; Kim, Yoo, Oh, & Kim, 2013; Wongvibulsin, Wu, & Zeger, 2019).

Support vector machines

Support vector machine (SVM) for classification works by finding the optimal ‘separating hyperplane’ among the classes. Radial kernels were explored for our prediction task, but linear SVMs were the most performant. SVMs have also shown success in biomedical classification (Byun & Lee, 2002; Georgoulas, Stylios, & Groumpos, 2006; Kim et al., 2013; Mittag et al., 2012; Yokota, Endo, & Ohe, 2017).

Neural networks

Neural networks are mathematical models inspired by the brain. Information is transmitted across the network by taking a linear combination of inputs (with weights, as in regression analysis) and applying non-linear functions. This design allows for powerful approximations (Bishop, 1995; Cybenko, 1989). Here, we used a simplified version of published methods, i.e. a single-layer neural network within an ensemble, to guard against overfitting.

SuperLearner

SuperLearner is an ensemble method that finds the optimal weighting among methods of interest; its developers showed that, asymptotically, it performs as well as the best prediction algorithm among those tested (Gruber et al., 2020; Petersen et al., 2015; Polley & Van Der Laan, 2010; Torquati et al., 2022; Wyss et al., 2018). SuperLearner has been employed in a variety of biomedical applications.

Feature selection and assessment of model performance

For a given study site split (which specifies a train and test dataset), we built machine learning models to perform binary classification of PTSS. We compared the performance of regularized logistic regression (Brennstuhl, Tarquinio, & Montel, 2015; Maddoux, McFarlane, Symes, Fredland, & Feder, 2018), random forests (Nash, Ponto, Townsend, Nelson, & Bretz, 2013), linear SVM (Defrin et al., 2008), and SuperLearner (Creamer, Bell, & Failla, 2003) [where an ensemble of these methods, with k-nearest neighbors (Johansen, Wahl, Eilertsen, & Weisaeth, 2007) and single-layer neural network (Zlomuzica, Preusser, Schneider, & Margraf, 2015), was used]. In our pipeline, we treated the number of top covariates to use in the model, k, as a hyperparameter to be cross-validated together with the model-specific hyperparameters, α. These parameters were selected in a nested CV-like approach using the training data. For a fixed k, α was selected by evaluating several candidate values α_i (from a grid) using Monte Carlo cross-validation. Then, the final (k, α) pair was selected using the one-standard-error rule. To determine the top k covariates, we implemented stability selection from Shah and Samworth (Kind & Buckingham, 2018), which utilizes several rounds of Monte Carlo cross-validation in order to robustly estimate the probability of variable selection. In our procedure, if the support of covariate i was non-zero according to Lasso, then i was considered a signal variable. 
Once the relevant model hyperparameters were chosen from the training set, we trained the model then calculated the mean and standard error for a variety of performance metrics on the test set. The pipeline was repeated for each machine learning method considered.
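The one-standard error rule described above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the score values, and the tie-breaking convention (prefer fewer covariates, then a larger α, on the assumption that a larger α means a stronger penalty) are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_standard_error_choice(cv_scores):
    """Pick a (k, alpha) pair by the one-standard-error rule.

    cv_scores maps (k, alpha) -> list of validation scores, one per
    Monte Carlo cross-validation split (higher is better).
    """
    summary = {}
    for setting, scores in cv_scores.items():
        standard_error = stdev(scores) / sqrt(len(scores))
        summary[setting] = (mean(scores), standard_error)

    # Best mean score and its standard error define the tolerance band.
    best_setting = max(summary, key=lambda s: summary[s][0])
    best_mean, best_se = summary[best_setting]
    threshold = best_mean - best_se

    # Every setting whose mean falls within one standard error of the best.
    eligible = [s for s, (m, _) in summary.items() if m >= threshold]

    # Assumption: "simplest" = fewest covariates, then largest alpha.
    return min(eligible, key=lambda s: (s[0], -s[1]))


# Hypothetical AUCs from four splits for three candidate settings.
cv_scores = {
    (10, 0.1): [0.80, 0.81, 0.79, 0.80],
    (30, 0.1): [0.84, 0.85, 0.83, 0.84],
    (60, 0.1): [0.86, 0.83, 0.85, 0.84],
}
chosen = one_standard_error_choice(cv_scores)
print(chosen)  # (30, 0.1)
```

In this toy example the 60-covariate setting has the highest mean score, but the 30-covariate setting lies within one standard error of it and is therefore preferred as the more parsimonious model.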

Results

Participants

Participants included in the current study were those who completed 6-month follow-up questionnaires assessing PTSS outcomes (n = 1546). These individuals comprise >85% of enrolled individuals (88% of enrolled individuals in cohort 1 and 83% of enrolled individuals in cohort 2). Baseline characteristics of participants are shown in Table 1, and a comparison of individuals included in the current study analyses v. those individuals who were lost to follow-up is shown in online Supplementary Table S2. In both cohort 1 and cohort 2, most individuals were female and in their mid-30s. Education levels were higher in cohort 1, with 39% (n = 303/776) having received college or post-college education. This contrasts with cohort 2, where only 18% (n = 140/770) had college or post-college education. Collision characteristics were similar between the two groups. BMI was slightly higher in cohort 2. Twenty-five percent (n = 394/1546) of all participants reported PTSS 6 months following MVC (15% of White Americans and 36% of Black Americans).

Table 1. Baseline characteristics of study participants from two longitudinal studies of motor vehicle collision trauma survivors (n = 1546)

s.d., standard deviation; HS, high school; BMI, body mass index.

Feature selection and internal validation

Variable importance, determined by calculating mean variable selection probability from 200 randomly selected internal site splits, was used to rank the top 30 predictors of substantial PTSS 6 months after MVC (Fig. 2). The most influential predictors of substantial PTSS included acute pain, psychological, and somatic symptoms, self-reported race, and cognitions and expectations regarding symptoms/recovery. As shown in Table 2, average AUCs for internal validation ranged from 0.83 ± 0.003 (random forest, SVM, SuperLearner) to 0.85 ± 0.002 (regularized regression). Additionally, the top 30 variables were included in a linear regression model to determine the direction of effect of each predictor. Regression coefficients from these linear regression models are presented in online Supplementary Table S3.
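As a rough illustration of how a mean variable selection probability can be turned into a ranking, the sketch below counts how often each covariate is selected across repeated splits. The covariate names and the four example splits are hypothetical, and the alphabetical tie-break is an assumption made only so the output is deterministic; the study itself used 200 splits and Lasso-based selection.

```python
from collections import Counter


def selection_probabilities(selected_per_split):
    """Fraction of splits in which each covariate was selected."""
    n_splits = len(selected_per_split)
    counts = Counter(
        name for selected in selected_per_split for name in set(selected)
    )
    return {name: count / n_splits for name, count in counts.items()}


def top_k(probabilities, k):
    """Names of the k covariates with the highest selection probability."""
    # Ties are broken alphabetically (an illustrative convention).
    ranked = sorted(probabilities, key=lambda name: (-probabilities[name], name))
    return ranked[:k]


# Hypothetical selections (e.g. non-zero Lasso coefficients) from four splits.
splits = [
    {"pain_severity", "race"},
    {"pain_severity"},
    {"pain_severity", "race", "bmi"},
    {"pain_severity", "bmi"},
]
probs = selection_probabilities(splits)
print(probs["pain_severity"])  # 1.0
print(top_k(probs, 2))         # ['pain_severity', 'bmi']
```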

Fig. 2. The top 30 characteristics that predict 6-month posttraumatic stress symptom (PTSS) outcomes following motor vehicle collision (MVC) trauma exposure. These data were collected via patient self-report in the early peritraumatic period during emergency department (ED) assessment and enrollment into the two current longitudinal studies. Variables are listed in order of the most predictive (top, predictive probability = 0.78) to the least predictive (bottom, predictive probability = 0.34). For ease of interpretation, predictor characteristics were grouped into the broad category of pain (red), psychological symptoms (blue), sociodemographic characteristics (gray), details about the MVC event (black), and general health of the participant (white).

Table 2. Prediction of 6-month posttraumatic stress symptoms (PTSS) using demographic and questionnaire data collected in the emergency department following motor vehicle collision trauma (n = 1546)

Results presented are the average metrics calculated based on 200 stratified splits of the two cohorts into discovery and validation subsets (70% training and 30% test) based on enrollment study site (mean ± s.e.) as diagrammed in Fig. 1.

^a Linear support vector machines.

External validation and model performance

Following selection of the top 30 variables, we then used our hold-out sample (30% of the full dataset) to assess performance via external validation procedures. We found that regularized regression methods showed the strongest performance, with an average AUC of 0.79 ± 0.002 (Table 2 and Fig. 3).
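For readers less familiar with how an "AUC ± s.e." summary arises from repeated splits, the following sketch computes the AUC of a single split from its rank interpretation and then summarizes several per-split AUCs as mean and standard error. All numbers are hypothetical and the helper names are illustrative; this is not the study's evaluation code.

```python
from math import sqrt
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def mean_and_se(values):
    """Mean and standard error of the mean across repeated splits."""
    return mean(values), stdev(values) / sqrt(len(values))


# One hypothetical test split: two cases with PTSS (1), two without (0).
single_split_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(single_split_auc)  # 0.75

# Hypothetical AUCs from three repeated splits.
avg_auc, se_auc = mean_and_se([0.78, 0.80, 0.79])
print(f"{avg_auc:.2f} ± {se_auc:.3f}")
```

Because the standard error shrinks with the square root of the number of splits, a large number of repetitions (200 in this study) yields a narrow interval around the average AUC.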

Fig. 3. ROC curves showing the mean (blue line) and standard error (gray lines) associated with 200 iterations of external validation in the current study. These data represent the best-performing methodology, i.e. regularized regression, and indicate an AUC of 0.79 ± 0.002 for top variables predicting 6-month posttraumatic stress symptoms following motor vehicle collision trauma.

Discussion

Findings from this study add to a growing body of literature (Freedman et al., 1999; Galatzer-Levy et al., 2014; Karstoft et al., 2015a, 2015b; Kessler et al., 2014; Kleim et al., 2007; Linnstaedt et al., 2019a, 2019b; Powers et al., 2014; Rosellini et al., 2018; Schultebraucks et al., 2020; Shalev et al., 2019; Symes et al., 2016; Ziobrowski et al., 2021) indicating that characteristics obtainable in the early aftermath of trauma exposure identify vulnerability to substantial persistent PTSS. The 30 characteristics identified in these datasets together showed good internal (AUC = 0.85 ± 0.002) and externally validated prediction accuracy (AUC = 0.79 ± 0.002) for substantial PTSS at 6-month follow-up. Influential predictive domains in these datasets included peritraumatic pain and psychological symptoms, expectations of recovery and cognitions about pain, self-identified race, neighborhood socioeconomic status (Area Deprivation Index), and other sociodemographic characteristics.

The individual predictive characteristics identified in this study provide new insights into potentially highly influential predictive factors for the development of substantial chronic PTSS. First, it remains poorly appreciated that peritraumatic pain and somatic symptoms are highly predictive for PTSS. The fact that few studies have evaluated such factors for inclusion in predictive tools for PTSS is consistent with the traditionally siloed approach to the study of adverse posttraumatic neuropsychiatric sequelae such as PTSS, pain, somatic symptoms, and depression, despite the fact that these outcomes are highly co-morbid and that vulnerability to these disorders is shared (Feinberg et al., 2017; McLean, Clauw, Abelson, & Liberzon, 2005; McLean et al., 2019; Short et al., 2022). This literature, and our study findings, support the inclusion of peritraumatic pain and somatic symptoms in future studies interested in identifying/validating characteristics that individually or collectively best predict substantial chronic PTSS. In addition, this finding suggests the potential value of acute pain treatment to reduce the development of substantial PTSS, which has been identified in several studies (Holbrook, Galarneau, Dye, Quinn, & Dougherty, 2010; Saxe et al., 2001).

Interestingly, an individual's expectations of recovery – expected time to recover fully, and to recover physically – were also among the most powerful predictive factors. This finding has several implications. First, expectations of recovery are simple to assess and should be considered when trying to develop predictive tools for use in acute post-traumatic settings. Second, while expectations of time to physical recovery are no doubt influenced by individual circumstance (e.g. age 80 v. 18), self-efficacy (i.e. belief in one's capacity to implement behaviors necessary to attain an outcome) is an important driver of recovery expectations and is associated with more rapid fear extinction (Zlomuzica et al., 2015). Secondary preventive cognitive-behavioral interventions targeting self-efficacy (Nash et al., 2013) could improve outcomes for at-risk individuals.

To our knowledge, this study is the first to identify neighborhood socioeconomic status (SES) as a leading peritraumatic predictor of substantial PTSS. These findings are consistent with increasing appreciation that neighborhood SES has wide-ranging effects on health [e.g. via influences on stress system function (Do et al., 2011; Karb, Elliott, Dowd, & Morenoff, 2012), diet (Shahar, Shai, Vardi, Shahar, & Fraser, 2005), and educational and employment opportunities (Saifi & Mehmood, 2011; Vergunst et al., 2019)]. Because of its protean influences on health status and barriers to health improvement, neighborhood SES has been proposed as valuable to include in the medical record (Adler & Stead, 2015). Such inclusion would facilitate the examination of neighborhood SES as a predictor of adverse health outcomes and its potential use in bedside clinical decision tools.

In contrast to neighborhood SES, Black v. White self-identified race has been identified as an important peritraumatic predictor of substantial PTSS in previous studies (Alegría et al., 2013). That this construct is a top predictor of PTSS underscores the need to include diverse racial and ethnic groups in future longitudinal studies assessing predictors of PTSS. It also highlights the need to identify race/ethnicity-specific predictors [e.g. those related to discrimination (Brooks Holliday et al., 2020) and identity with one's race (Khaylis, Waelde, & Bruce, 2007)], as they might contribute substantial predictive power for identifying adverse outcomes of trauma in specific racial groups.

Strengths of this study include the inclusion of self-identified Black and White women and men, focus on a single homogeneous type of trauma exposure, identical study design between the two studies from which participant data were derived, high follow-up rates, and a diverse set of variables included as potential predictors in our models. Several limitations should also be noted when interpreting study results. First, self-identifying racial groups besides self-identifying Black and White Americans were not included. Therefore, the generalizability of our findings to other self-identifying racial groups is currently unknown. Second, despite the inclusion of a diverse set of predictors in the pool of candidate predictors, certain characteristics that have been shown to predict PTSS previously, such as previous trauma exposure (Adams et al., 2014; Ehring, Razik, & Emmelkamp, 2011; Karstoft et al., 2015b; Kessler et al., 2014), were not included. This is because these data were not collected from self-identifying White participants (those individuals in cohort 1). Third, despite high follow-up rates in both studies (though lower in the Black American cohort), bias could have been introduced via statistically significant differences in sex, age, and education of those who followed up v. those who were lost to follow-up. Fourth, while NPVs were high in all learning methods assessed, PPVs were low. This discrepancy could be due to the low prevalence (25%) of PTSS in this cohort, as low outcome prevalence often favors NPV (Steinberg, Fine, & Chappell, 2009). While low PPVs are not ideal, in the case of predicting PTSS, one could argue that testing positive while truly negative is less detrimental to treatment decisions than testing negative when truly positive.
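The dependence of PPV and NPV on outcome prevalence noted above follows directly from Bayes' rule and can be checked with a short calculation. In the sketch below, the sensitivity (0.70) and specificity (0.75) are hypothetical values chosen only for illustration; they are not estimates from this study. Only the 25% prevalence is taken from the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical classifier evaluated at the cohort prevalence of 25%.
ppv_25, npv_25 = predictive_values(0.70, 0.75, 0.25)
print(round(ppv_25, 2), round(npv_25, 2))  # 0.48 0.88

# The same classifier at a hypothetical prevalence of 50%.
ppv_50, npv_50 = predictive_values(0.70, 0.75, 0.50)
print(round(ppv_50, 2), round(npv_50, 2))  # 0.74 0.71
```

With sensitivity and specificity held fixed, lowering the prevalence from 50% to 25% raises NPV and lowers PPV, which is the pattern described in the fourth limitation.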

Future studies should continue to refine optimal, parsimonious sets of PTSS predictors, leveraging data from studies performed to date. Optimal sets of PTSS predictors will very likely differ based on trauma type, assessment timing in relation to trauma exposure, setting, and/or patient population, and may include biological characteristics and/or utilize tiering/targeting methods (e.g. utilize additional information only in individuals who cannot be risk stratified using a briefer set of predictors). Ultimately, the goal of this prediction work is to identify individual and collective predictors of PTSS and other adverse posttraumatic neuropsychiatric sequelae, both to gain understanding of potential risk factors and to aid in the development of decision support tools. Such tools will differ according to the factors influencing optimal predictors (e.g. trauma type, time from trauma, setting), and the optimal cut-off for such tools will also differ depending on the risks/benefits of the specific clinical decision the tool is intended to aid (e.g. risks/benefits of a specific secondary preventive intervention).

In conclusion, we identified promising individual predictors and a set of characteristics that effectively stratify individuals for risk of substantial PTSS 6 months following MVC. If further validated, these predictors could improve clinical efforts to identify vulnerable individuals at the time of ED presentation for secondary preventive interventions.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S003329172200191X

Acknowledgements

We would like to thank the study participants for taking part in these studies.

Financial support

Research reported in this publication was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) of the National Institutes of Health (NIH) under Award Number R01AR060852 (McLean), R01AR056328 (McLean), K01AR071504 (Linnstaedt), by the Rita Allen Foundation (Linnstaedt), and by the National Institute of Neurological Disorders and Stroke (NINDS) of the NIH under Award Number R01NS118563 (Linnstaedt and McLean). The content is solely the responsibility of the authors and does not necessarily represent the views of these funding agencies.

Conflict of interest

None.

Footnotes

These authors contributed equally to this work.

† Co-senior.

References

Adams, Z. W., Sumner, J. A., Danielson, C. K., McCauley, J. L., Resnick, H. S., Grös, K., … Ruggiero, K. J. (2014). Prevalence and predictors of PTSD and depression among adolescent victims of the Spring 2011 tornado outbreak. Journal of Child Psychology and Psychiatry, 55(9), 1047–1055. doi: 10.1111/jcpp.12220

Adler, N. E., & Stead, W. W. (2015). Patients in context – EHR capture of social and behavioral determinants of health. The New England Journal of Medicine, 372(8), 698–701. doi: 10.1056/NEJMp1413945

Alegría, M., Fortuna, L. R., Lin, J. Y., Norris, F. H., Gao, S., Takeuchi, D. T., … Valentine, A. (2013). Prevalence, risk, and correlates of posttraumatic stress disorder across ethnic and racial minority groups in the United States. Medical Care, 51(12), 1114–1123. doi: 10.1097/mlr.0000000000000007

Ali, N., Neagu, D., & Trundle, P. (2019). Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets. Springer Nature Applied Sciences, 1(12), 1559. doi: 10.1007/s42452-019-1356-9

Antoniadi, A. M., Galvin, M., Heverin, M., Hardiman, O., & Mooney, C. (2021). Prediction of caregiver quality of life in amyotrophic lateral sclerosis using explainable machine learning. Scientific Reports, 11(1), 12237. doi: 10.1038/s41598-021-91632-2

Austin, E., Pan, W., & Shen, X. (2013). Penalized regression and risk prediction in genome-wide association studies. Statistical Analysis and Data Mining, 6(4), 315–328. doi: 10.1002/sam.11183

Bayramli, I., Castro, V., Barak-Corren, Y., Madsen, E. M., Nock, M. K., Smoller, J. W., & Reis, B. Y. (2021). Temporally informed random forests for suicide risk prediction. Journal of the American Medical Informatics Association, 29(1), 62–71. doi: 10.1093/jamia/ocab225

Bishop, C. M. (1995). Neural networks for pattern recognition. New York, NY: Oxford University Press, Inc.

Bleich, A., & Solomon, Z. (2004). Evaluation of psychiatric disability in PTSD of military origin. The Israel Journal of Psychiatry and Related Sciences, 41(4), 268–276.

Brennstuhl, M. J., Tarquinio, C., & Montel, S. (2015). Chronic pain and PTSD: Evolving views on their comorbidity. Perspectives in Psychiatric Care, 51(4), 295–304. doi: 10.1111/ppc.12093

Brooks Holliday, S., Dubowitz, T., Haas, A., Ghosh-Dastidar, B., DeSantis, A., & Troxel, W. M. (2020). The association between discrimination and PTSD in African Americans: Exploring the role of gender. Ethnicity & Health, 25(5), 717–731. doi: 10.1080/13557858.2018.1444150

Byun, H., & Lee, S.-W. (2002). Applications of support vector machines for pattern recognition: a survey. Paper presented at the Pattern Recognition with Support Vector Machines, Berlin, Heidelberg.

Chan, A. W., Pristach, E. A., Welte, J. W., & Russell, M. (1993). Use of the TWEAK test in screening for alcoholism/heavy drinking in three populations. Alcoholism, Clinical and Experimental Research, 17(6), 1188–1192. doi: 10.1111/j.1530-0277.1993.tb05226.x

Chen, X., & Ishwaran, H. (2012). Random forests for genomic data analysis. Genomics, 99(6), 323–329. doi: 10.1016/j.ygeno.2012.04.003

Creamer, M., Bell, R., & Failla, S. (2003). Psychometric properties of the impact of event scale – revised. Behaviour Research and Therapy, 41(12), 1489–1496. doi: 10.1016/j.brat.2003.07.010

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303–314. doi: 10.1007/BF02551274

Defrin, R., Ginzburg, K., Solomon, Z., Polad, E., Bloch, M., Govezensky, M., & Schreiber, S. (2008). Quantitative testing of pain perception in subjects with PTSD – implications for the mechanism of the coexistence between PTSD and chronic pain. Pain, 138(2), 450–459. doi: 10.1016/j.pain.2008.05.006

de Vlaming, R., & Groenen, P. J. (2015). The current and future use of ridge regression for prediction in quantitative genetics. BioMed Research International, 2015, 143712. doi: 10.1155/2015/143712

Do, D. P., Diez Roux, A. V., Hajat, A., Auchincloss, A. H., Merkin, S. S., Ranjit, N., … Seeman, T. (2011). Circadian rhythm of cortisol and neighborhood characteristics in a population-based sample: The multi-ethnic study of atherosclerosis. Health & Place, 17(2), 625–632. doi: 10.1016/j.healthplace.2010.12.019

Dobie, D. J., Kivlahan, D. R., Maynard, C., Bush, K. R., Davis, T. M., & Bradley, K. A. (2004). Posttraumatic stress disorder in female veterans: Association with self-reported health problems and functional impairment. Archives of Internal Medicine, 164(4), 394–400. doi: 10.1001/archinte.164.4.394

Eastel, J. M., Lam, K. W., Lee, N. L., Lok, W. Y., Tsang, A. H. F., Pei, X. M., … Wong, S. C. C. (2019). Application of NanoString technologies in companion diagnostic development. Expert Review of Molecular Diagnostics, 19(7), 591–598. doi: 10.1080/14737159.2019.1623672

Ehring, T., Razik, S., & Emmelkamp, P. M. (2011). Prevalence and predictors of posttraumatic stress disorder, anxiety, depression, and burnout in Pakistani earthquake recovery workers. Psychiatry Research, 185(1–2), 161–166. doi: 10.1016/j.psychres.2009.10.018

Feinberg, R. K., Hu, J., Weaver, M. A., Fillingim, R. B., Swor, R. A., Peak, D. A., … McLean, S. A. (2017). Stress-related psychological symptoms contribute to axial pain persistence after motor vehicle collision: Path analysis results from a prospective longitudinal study. Pain, 158(4), 682–690. doi: 10.1097/j.pain.0000000000000818

Freedman, S. A., Brandes, D., Peri, T., & Shalev, A. Y. (1999). Predictors of chronic post-traumatic stress disorder. A prospective study. The British Journal of Psychiatry, 174(4), 353–359. doi: 10.1192/bjp.174.4.353

Fritz, J. M., Magel, J. S., McFadden, M., Asche, C., Thackeray, A., Meier, W., & Brennan, G. (2015). Early physical therapy vs usual care in patients with recent-onset low back pain: A randomized clinical trial. JAMA, 314(14), 1459–1467. doi: 10.1001/jama.2015.11648

Galatzer-Levy, I. R., Karstoft, K.-I., Statnikov, A., & Shalev, A. Y. (2014). Quantitative forecasting of PTSD from early trauma responses: A machine learning application. Journal of Psychiatric Research, 59, 68–76. doi: 10.1016/j.jpsychires.2014.08.017

Gallego, A.-J., Pertusa, A., & Calvo-Zaragoza, J. (2018). Improving convolutional neural networks’ accuracy in noisy environments using k-nearest neighbors. Applied Sciences, 8(11), 2086. Retrieved from https://www.mdpi.com/2076-3417/8/11/2086.

Gaskin, D. J., & Richard, P. (2012). The economic costs of pain in the United States. The Journal of Pain, 13(8), 715–724. doi: 10.1016/j.jpain.2012.03.009

Georgoulas, G., Stylios, C. D., & Groumpos, P. P. (2006). Predicting the risk of metabolic acidosis for newborns based on fetal heart rate signal classification using support vector machines. IEEE Transactions on Biomedical Engineering, 53(5), 875–884. doi: 10.1109/TBME.2006.872814

Gruber, S., Krakower, D., Menchaca, J. T., Hsu, K., Hawrusik, R., Maro, J. C., … Klompas, M. (2020). Using electronic health records to identify candidates for human immunodeficiency virus pre-exposure prophylaxis: An application of super learning to risk prediction when the outcome is rare. Statistics in Medicine, 39(23), 3059–3073. doi: 10.1002/sim.8591

Haskell, S. G., Gordon, K. S., Mattocks, K., Duggal, M., Erdos, J., Justice, A., & Brandt, C. A. (2010). Gender differences in rates of depression, PTSD, pain, obesity, and military sexual trauma among Connecticut War Veterans of Iraq and Afghanistan. Journal of Women's Health, 19(2), 267–271. doi: 10.1089/jwh.2008.1262

Holbrook, T. L., Galarneau, M. R., Dye, J. L., Quinn, K., & Dougherty, A. L. (2010). Morphine use after combat injury in Iraq and post-traumatic stress disorder. The New England Journal of Medicine, 362(2), 110–117. doi: 10.1056/NEJMoa0903326

Hu, C., & Steingrimsson, J. A. (2018). Personalized risk prediction in clinical oncology research: Applications and practical issues using survival trees and random forests. Journal of Biopharmaceutical Statistics, 28(2), 333–349. doi: 10.1080/10543406.2017.1377730

Johansen, V. A., Wahl, A. K., Eilertsen, D. E., & Weisaeth, L. (2007). Prevalence and predictors of post-traumatic stress disorder (PTSD) in physically injured victims of non-domestic violence. A longitudinal study. Social Psychiatry and Psychiatric Epidemiology, 42(7), 583–593. doi: 10.1007/s00127-007-0205-0

Karb, R. A., Elliott, M. R., Dowd, J. B., & Morenoff, J. D. (2012). Neighborhood-level stressors, social support, and diurnal patterns of cortisol: The Chicago Community Adult Health Study. Social Science & Medicine, 75(6), 1038–1047. doi: 10.1016/j.socscimed.2012.03.031

Karstoft, K.-I., Galatzer-Levy, I. R., Statnikov, A., Li, Z., Shalev, A. Y., & For Members of the Jerusalem Trauma Outreach and Prevention Study. (2015a). Bridging a translational gap: Using machine learning to improve the prediction of PTSD. BMC Psychiatry, 15(1), 30. doi: 10.1186/s12888-015-0399-8

Karstoft, K.-I., Statnikov, A., Andersen, S. B., Madsen, T., & Galatzer-Levy, I. R. (2015b). Early identification of posttraumatic stress following military deployment: Application of machine learning methods to a prospective study of Danish soldiers. Journal of Affective Disorders, 184, 170–175. doi: 10.1016/j.jad.2015.05.057

Kearns, M. C., Ressler, K. J., Zatzick, D., & Rothbaum, B. O. (2012). Early interventions for PTSD: A review. Depression and Anxiety, 29(10), 833–842. doi: 10.1002/da.21997

Kessler, R. C. (2000). Posttraumatic stress disorder: The burden to the individual and to society. The Journal of Clinical Psychiatry, 61 (Suppl 5), 4–12; discussion 13–14.

Kessler, R. C., Hwang, I., Hoffmire, C. A., McCarthy, J. F., Petukhova, M. V., Rosellini, A. J., … Bossarte, R. M. (2017). Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans health Administration. International Journal of Methods in Psychiatric Research, 26(3). doi: 10.1002/mpr.1575

Kessler, R. C., Rose, S., Koenen, K. C., Karam, E. G., Stang, P. E., Stein, D. J., … McLaughlin, K. A. (2014). How well can post-traumatic stress disorder be predicted from pre-trauma risk factors? An exploratory study in the WHO World Mental Health Surveys. World Psychiatry, 13(3), 265–274. doi: 10.1002/wps.20150

Khaylis, A., Waelde, L., & Bruce, E. (2007). The role of ethnic identity in the relationship of race-related stress to PTSD symptoms among young adults. Journal of Trauma & Dissociation, 8(4), 91–105. doi: 10.1300/J229v08n04_06

Kilpatrick, D. G., Resnick, H. S., Milanak, M. E., Miller, M. W., Keyes, K. M., & Friedman, M. J. (2013). National estimates of exposure to traumatic events and PTSD prevalence using DSM-IV and DSM-5 criteria. Journal of Traumatic Stress, 26(5), 537–547. doi: 10.1002/jts.21848

Kim, S. K., Yoo, T. K., Oh, E., & Kim, D. W. (2013). Osteoporosis risk prediction using machine learning and conventional methods. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2013, 188–191. doi: 10.1109/EMBC.2013.6609469

Kind, A. J., & Buckingham, W. R. (2018). Making neighborhood-disadvantage metrics accessible – The neighborhood atlas. The New England Journal of Medicine, 378(26), 2456–2458. doi: 10.1056/NEJMp1802313

Kleim, B., Ehlers, A., & Glucksman, E. (2007). Early predictors of chronic post-traumatic stress disorder in assault survivors. Psychological Medicine, 37(10), 1457–1467. doi: 10.1017/S0033291707001006

Kramer, O. (2013). K-nearest neighbors. In Kramer, O. (Ed.), Dimensionality reduction with unsupervised nearest neighbors (pp. 13–23). Berlin, Heidelberg: Springer Berlin Heidelberg.

Kuhn, M., & Johnson, K. (2013). Data pre-processing. In Applied predictive modeling (pp. 27–59). New York, NY: Springer.

Lew, H. L., Otis, J. D., Tun, C., Kerns, R. D., Clark, M. E., & Cifu, D. X. (2009). Prevalence of chronic pain, posttraumatic stress disorder, and persistent postconcussive symptoms in OIF/OEF veterans: Polytrauma clinical triad. Journal of Rehabilitation Research & Development, 46(6), 697–702. doi: 10.1682/jrrd.2009.01.0006

Li, C., Zhang, S., Zhang, H., Pang, L., Lam, K., Hui, C., & Zhang, S. (2012). Using the K-nearest neighbor algorithm for the classification of lymph node metastasis in gastric cancer. Computational and Mathematical Methods in Medicine, 2012, 876545. doi: 10.1155/2012/876545

Linnstaedt, S. D., Hu, J., Liu, A. Y., Soward, A. C., Bollen, K. A., Wang, H. E., … Velilla, M.-A. (2016). Methodology of AA CRASH: A prospective observational study evaluating the incidence and pathogenesis of adverse post-traumatic sequelae in African-Americans experiencing motor vehicle collision. BMJ Open, 6(9), e012222. doi: 10.1136/bmjopen-2016-012222CrossRef Google Scholar PubMed

Linnstaedt, S. D., Rueckeis, C. A., Riker, K. D., Pan, Y., Wu, A., Yu, S., … McLean, S. A. (2019a). microRNA-19b predicts widespread pain and posttraumatic stress symptom risk in a sex-dependent manner following trauma exposure. Pain, 161(1), 47–60. doi: 10.1097/j.pain.0000000000001709CrossRef Google Scholar

Linnstaedt, S. D., Zannas, A. S., McLean, S. A., Koenen, K. C., & Ressler, K. J. (2019b). Literature review and methodological considerations for understanding circulating risk biomarkers following trauma exposure. Molecular Psychiatry, 25(9), 1986–1999. doi: 10.1038/s41380-019-0636-5CrossRef Google Scholar PubMed

Litz, B. T., Gray, M. J., Bryant, R. A., & Adler, A. B. (2002). Early intervention for trauma: Current status and future directions. Clinical Psychology: Science and Practice, 9(2), 112–134. doi: 10.1093/clipsy.9.2.112Google Scholar

Lund, J. L., Kuo, T. M., Brookhart, M. A., Meyer, A. M., Dalton, A. F., Kistler, C. E., … Lewis, C. L. (2019). Development and validation of a 5-year mortality prediction model using regularized regression and Medicare data. Pharmacoepidemiology and Drug Safety, 28(5), 584–592. doi: 10.1002/pds.4769CrossRef Google Scholar PubMed

Maddoux, J., McFarlane, J., Symes, L., Fredland, N., & Feder, G. (2018). Using baseline data to predict chronic PTSD 48-months after mothers report intimate partner violence: Outcomes for mothers and the intergenerational impact on child behavioral functioning. Archives of Psychiatric Nursing, 32(3), 475–482. doi: 10.1016/j.apnu.2018.02.001CrossRef Google Scholar PubMed

Marafino, B. J., Boscardin, W. J., & Dudley, R. A. (2015). Efficient and sparse feature selection for biomedical text classification via the elastic net: Application to ICU risk stratification from nursing notes. Journal of Biomedical Informatics, 54, 114–120. doi: 10.1016/j.jbi.2015.02.003CrossRef Google Scholar PubMed

McLean, S. A., Clauw, D. J., Abelson, J. L., & Liberzon, I. (2005). The development of persistent pain and psychological morbidity after motor vehicle collision: Integrating the potential role of stress response systems into a biopsychosocial model. Psychosomatic Medicine, 67(5), 783–790. doi: 10.1097/01.psy.0000181276.49204.bbCrossRef Google Scholar PubMed

McLean, S. A., Ressler, K., Koenen, K. C., Neylan, T., Germine, L., Jovanovic, T., … Kessler, R. (2019). The AURORA study: A longitudinal, multimodal library of brain biology and function after traumatic stress exposure. Molecular Psychiatry, 25(2), 283–296. doi: 10.1038/s41380-019-0581-3CrossRef Google Scholar PubMed

McNally, R. J., & Frueh, B. C. (2013). Why are Iraq and Afghanistan War veterans seeking PTSD disability compensation at unprecedented rates? Journal of Anxiety Disorders, 27(5), 520–526. doi: 10.1016/j.janxdis.2013.07.002

Mittag, F., Büchel, F., Saad, M., Jahn, A., Schulte, C., Bochdanovits, Z., … Sharma, M. (2012). Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities. Human Mutation, 33(12), 1708–1718. doi: 10.1002/humu.22161

Nash, V. R., Ponto, J., Townsend, C., Nelson, P., & Bretz, M. N. (2013). Cognitive behavioral therapy, self-efficacy, and depression in persons with chronic pain. Pain Management Nursing, 14(4), e236–e243. doi: 10.1016/j.pmn.2012.02.006

Odgers, D. J., Tellis, N., Hall, H., & Dumontier, M. (2016). Using LASSO Regression to predict rheumatoid arthritis treatment efficacy. AMIA Joint Summits on Translational Science Proceedings, 2016, 176–183. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/27570666

Outcalt, S. D., Kroenke, K., Krebs, E. E., Chumbler, N. R., Wu, J., Yu, Z., & Bair, M. J. (2015). Chronic pain and comorbid mental health conditions: Independent associations of posttraumatic stress disorder and depression with pain, disability, and quality of life. Journal of Behavioral Medicine, 38(3), 535–543. doi: 10.1007/s10865-015-9628-3

Parker, J. S., Mullins, M., Cheang, M. C., Leung, S., Voduc, D., Vickery, T., … Hu, Z. (2009). Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology, 27(8), 1160–1167. doi: 10.1200/JCO.2008.18.1370

Pavlou, M., Ambler, G., Seaman, S., De Iorio, M., & Omar, R. Z. (2016). Review and evaluation of penalised regression methods for risk prediction in low-dimensional data with few events. Statistics in Medicine, 35(7), 1159–1177. doi: 10.1002/sim.6782

Petersen, M. L., LeDell, E., Schwab, J., Sarovar, V., Gross, R., Reynolds, N., … Bangsberg, D. R. (2015). Super learner analysis of electronic adherence data improves viral prediction and may provide strategies for selective HIV RNA monitoring. Journal of Acquired Immune Deficiency Syndromes, 69(1), 109–118. doi: 10.1097/QAI.0000000000000548

Platts-Mills, T. F., Ballina, L., Bortsov, A. V., Soward, A., Swor, R. A., Jones, J. S., … Rathlev, N. K. (2011). Using emergency department-based inception cohorts to determine genetic characteristics associated with long term patient outcomes after motor vehicle collision: Methodology of the CRASH study. BMC Emergency Medicine, 11(1), 14. doi: 10.1186/1471-227X-11-14

Polley, E. C., & Van Der Laan, M. J. (2010). Super Learner in prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 266. Retrieved from https://biostats.bepress.com/ucbbiostat/paper266

Powers, M. B., Warren, A. M., Rosenfield, D., Roden-Foreman, K., Bennett, M., Reynolds, M. C., … Smits, J. A. (2014). Predictors of PTSD symptoms in adults admitted to a Level I trauma center: A prospective analysis. Journal of Anxiety Disorders, 28(3), 301–309. doi: 10.1016/j.janxdis.2014.01.003

Privé, F., Aschard, H., & Blum, M. G. B. (2019). Efficient implementation of penalized regression for genetic risk prediction. Genetics, 212(1), 65–74. doi: 10.1534/genetics.119.302019

Rosellini, A. J., Dussaillant, F., Zubizarreta, J. R., Kessler, R. C., & Rose, S. (2018). Predicting posttraumatic stress disorder following a natural disaster. Journal of Psychiatric Research, 96, 15–22. doi: 10.1016/j.jpsychires.2017.09.010

Saifi, S., & Mehmood, T. (2011). Effects of socioeconomic status on student's achievement. International Journal of Social Sciences and Education, 1(2), 119–128.

Saxe, G., Stoddard, F., Courtney, D., Cunningham, K., Chawla, N., Sheridan, R., … King, L. (2001). Relationship between acute morphine and the course of PTSD in children with burns. Journal of the American Academy of Child & Adolescent Psychiatry, 40(8), 915–921. doi: 10.1097/00004583-200108000-00013

Schultebraucks, K., Shalev, A. Y., Michopoulos, V., Grudzen, C. R., Shin, S.-M., Stevens, J. S., … Galatzer-Levy, I. R. (2020). A validated predictive algorithm of post-traumatic stress course following emergency department admission after a traumatic stressor. Nature Medicine, 26(7), 1084–1088. doi: 10.1038/s41591-020-0951-z

Shahar, D., Shai, I., Vardi, H., Shahar, A., & Fraser, D. (2005). Diet and eating habits in high and low socioeconomic groups. Nutrition, 21(5), 559–566. doi: 10.1016/j.nut.2004.09.018

Shalev, A. Y., Ankri, Y., Gilad, M., Israeli-Shalev, Y., Adessky, R., Qian, M., & Freedman, S. (2016). Long-term outcome of early interventions to prevent posttraumatic stress disorder. The Journal of Clinical Psychiatry, 77(5), e580–e587. doi: 10.4088/JCP.15m09932

Shalev, A. Y., Gevonden, M., Ratanatharathorn, A., Laska, E., van der Mei, W. F., Qi, W., … International Consortium to Predict PTSD (2019). Estimating the risk of PTSD in recent trauma survivors: Results of the International Consortium to Predict PTSD (ICPP). World Psychiatry, 18(1), 77–87. doi: 10.1002/wps.20608

Short, N. A., Tungate, A. S., Bollen, K. A., Sullivan, J., D'Anza, T., Lechner, M., … McLean, S. A. (2022). Pain is common after sexual assault and posttraumatic arousal/reactivity symptoms mediate the development of new or worsening persistent pain. Pain, 163(1), e121–e128. doi: 10.1097/j.pain.0000000000002329

Shouman, M., Turner, T., & Stocker, R. (2012). Applying k-nearest neighbour in diagnosing heart disease patients. International Journal of Information and Education Technology, 2(3), 220–223. doi: 10.7763/IJIET.2012.V2.114

Steinberg, D. M., Fine, J., & Chappell, R. (2009). Sample size for positive and negative predictive value in diagnostic research using case-control designs. Biostatistics, 10(1), 94–105. doi: 10.1093/biostatistics/kxn018

Stekhoven, D. J., & Bühlmann, P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118. doi: 10.1093/bioinformatics/btr597

Stewart, W. F., Ricci, J. A., Chee, E., Hahn, S. R., & Morganstein, D. (2003). Cost of lost productive work time among US workers with depression. JAMA, 289(23), 3135–3144. doi: 10.1001/jama.289.23.3135

Surís, A., & Lind, L. (2008). Military sexual trauma: A review of prevalence and associated health consequences in veterans. Trauma, Violence, & Abuse, 9(4), 250–269. doi: 10.1177/1524838008324419

Symes, L., Maddoux, J., McFarlane, J., & Pennings, J. (2016). A risk assessment tool to predict sustained PTSD symptoms among women reporting abuse. Journal of Women's Health, 25(4), 340–347. doi: 10.1089/jwh.2015.5287

Torquati, M., Mendis, M., Xu, H., Myneni, A. A., Noyes, K., Hoffman, A. B., … Becerra, A. Z. (2022). Using the Super Learner algorithm to predict risk of 30-day readmission after bariatric surgery in the United States. Surgery, 171(3), 621–627. doi: 10.1016/j.surg.2021.06.019

Vergunst, F., Tremblay, R. E., Nagin, D., Algan, Y., Beasley, E., Park, J., … Côté, S. M. (2019). Association of behavior in boys from low socioeconomic neighborhoods with employment earnings in adulthood. JAMA Pediatrics, 173(4), 334–341. doi: 10.1001/jamapediatrics.2018.5375

Wongvibulsin, S., Wu, K. C., & Zeger, S. L. (2019). Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis. BMC Medical Research Methodology, 20(1), 1. doi: 10.1186/s12874-019-0863-0

Wyss, R., Schneeweiss, S., van der Laan, M., Lendle, S. D., Ju, C., & Franklin, J. M. (2018). Using super learner prediction modeling to improve high-dimensional propensity score estimation. Epidemiology, 29(1), 96–106. doi: 10.1097/ede.0000000000000762

Xing, W., & Bei, Y. (2020). Medical health big data classification based on KNN classification algorithm. IEEE Access, 8, 28808–28819. doi: 10.1109/ACCESS.2019.2955754

Yokota, S., Endo, M., & Ohe, K. (2017). Establishing a classification system for high fall-risk among inpatients using support vector machines. Computers Informatics Nursing, 35(8), 408–416. doi: 10.1097/CIN.0000000000000332

Zhuang, J., Cai, J., Wang, R., Zhang, J., & Zheng, W.-S. (2020). Deep kNN for medical image classification. Paper presented at the Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Cham.

Ziobrowski, H. N., Kennedy, C. J., Ustun, B., House, S. L., Beaudoin, F. L., An, X., … van Rooij, S. J. H. (2021). Development and validation of a model to predict posttraumatic stress disorder and major depression after a motor vehicle collision. JAMA Psychiatry, 78(11), 1228–1237. doi: 10.1001/jamapsychiatry.2021.2427

Zlomuzica, A., Preusser, F., Schneider, S., & Margraf, J. (2015). Increased perceived self-efficacy facilitates the extinction of fear in healthy participants. Frontiers in Behavioral Neuroscience, 9, 270. doi: 10.3389/fnbeh.2015.00270

Fig. 1. Schematic of the study design used to construct rigorous training and test sets for the machine learning algorithms. Participant data were derived from two longitudinal studies of motor vehicle collision trauma survivors. Enrollment occurred across 16 emergency department (ED) sites in the Eastern United States (gray dots, top panel). The geographic locations of these ED enrollment sites were grouped into three broad areas, marked by blue numbers for cohort 1, the White American cohort (Platts-Mills et al., 2011), and by orange numbers for cohort 2, the Black American cohort (Linnstaedt et al., 2016). Participant data from each of these three geographic locations were then used to generate ‘site splits’ into training datasets (70% of the combined Black and White cohorts) and test datasets (30% of the combined cohort). As shown in the middle panel, training datasets were balanced across races and geographic locations. Within each training data site split, 100 rounds of Monte Carlo cross-validation were performed (represented by gray and green bars, bottom panel) to estimate variable selection probabilities and conduct feature selection. Using this methodology, average variable rankings were calculated, and the top variables were used for external validation within test datasets that were not constrained for race, sex, or geographic location.
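The bottom panel of Fig. 1 describes estimating variable selection probabilities over 100 rounds of Monte Carlo cross-validation. The Python sketch below illustrates that general idea on synthetic data only. Several things here are assumptions for illustration and are not taken from the study: the function names `pearson_r` and `selection_probabilities`, the subsample fraction of 0.7 inside each round, the `top_k` cutoff, and the correlation-based selector, which stands in for the machine learning models listed in the Methods.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def selection_probabilities(X, y, n_rounds=100, subsample_frac=0.7, top_k=2, seed=0):
    """Monte Carlo cross-validation for feature selection.

    Each round draws a random subsample without replacement, ranks features
    by |correlation| with the outcome (a stand-in selector, assumed here),
    and marks the top_k features as selected. The returned list gives the
    fraction of rounds in which each feature was selected.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_sub = int(subsample_frac * n)
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n_sub)
        y_sub = [y[i] for i in idx]
        scores = [abs(pearson_r([X[i][j] for i in idx], y_sub)) for j in range(p)]
        ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_rounds for c in counts]


# Synthetic data (hypothetical): the outcome depends on features 0 and 1 only.
data_rng = random.Random(42)
n_participants, n_features = 300, 5
X = [[data_rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_participants)]
y = [2.0 * row[0] + 1.5 * row[1] + data_rng.gauss(0, 1) for row in X]

probs = selection_probabilities(X, y)
# Rank variables from most to least frequently selected.
ranking = sorted(range(n_features), key=lambda j: probs[j], reverse=True)
print("selection probabilities:", probs)
print("variable ranking:", ranking)
```

In this toy setting the two informative features should be selected in nearly every round, so they head the ranking. Averaging such rankings across several training splits would mirror the 'average variable rankings' step in the caption, although the exact averaging procedure is not specified there.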

Table 1. Baseline characteristics of study participants from two longitudinal studies of motor vehicle collision trauma survivors (n = 1546)

Table 2. Prediction of 6-month posttraumatic stress symptoms (PTSS) using demographic and questionnaire data collected in the emergency department following motor vehicle collision trauma (n = 1546)


