Predicting childhood ADHD-linked symptoms from prenatal and perinatal data in the ABCD cohort

This study investigates the capacity of pre/perinatal factors to predict attention-deficit/hyperactivity disorder (ADHD) symptoms in childhood. It also explores whether the predictive accuracy of a pre/perinatal model varies across groups in the population. We used the ABCD (Adolescent Brain Cognitive Development) cohort from the United States (N = 9,975). Pre/perinatal information and the Child Behavior Checklist were reported by the parent when the child was aged 9–10. Forty variables that are generally known by birth were entered as potential predictors, including maternal substance-use, obstetric complications and child demographics. Elastic net regression with 5-fold cross-validation was performed, and subsequently stratified by sex, race/ethnicity, household income and parental psychopathology. Seventeen pre/perinatal variables were identified as robust predictors of ADHD symptoms in this cohort. The model explained just 8.13% of the variance in ADHD symptoms on average (95% CI = 5.6%–11.5%). Predictive accuracy of the model varied significantly by subgroup, particularly across income groups, and several pre/perinatal factors appeared to be sex-specific. Results suggest we may be able to predict childhood ADHD symptoms with modest accuracy from birth. This study needs to be replicated using prospectively measured pre/perinatal data.


Introduction
Adverse prenatal and perinatal experiences are linked with neurodevelopmental disorders such as attention-deficit/hyperactivity disorder (ADHD). For instance, increased risk of ADHD has been associated with: maternal alcohol, smoking and drug-use in pregnancy (He et al., 2020; Lees et al., 2020; Paul et al., 2021), pregnancy complications such as anemia, genitourinary infections and preeclampsia (Mann & McDermott, 2011; Wiegersma et al., 2019), preterm birth, and low birth weight (Franz et al., 2018; Momany et al., 2018). Biological mechanisms suggested to explain these links have included maternal hypothalamic-pituitary-adrenal axis dysregulation, excess glucocorticoids, inflammation, and insufficient oxygen and blood supply to the fetus, which can alter fetal gene expression (Kim et al., 2015; Smith et al., 2016).
The causal order of such pre/perinatal risks is difficult to establish, as many co-occur (e.g., maternal smoking and low birth weight). Furthermore, both pre/perinatal adversities and child mental health are confounded by socioeconomic and demographic context. This paper therefore focuses on the pragmatic predictiveness of information known at birth, rather than on the causal order of these factors. We include various types of obstetric complication and substances used in pregnancy, but also basic demographic information known at birth such as parental age, sex, race/ethnicity of the child, and presence of a multiple birth. The primary aims of this study were to assess which of these factors best predicted ADHD-linked symptoms, and how well they can predict such symptoms, in a large cohort of 9- to 10-year-old children from the Adolescent Brain Cognitive Development study (ABCD; United States).
A secondary exploratory aim was to assess whether the accuracy of our predictive model varied for different subgroups in the population. The United States is a demographically heterogeneous nation, and prediction of ADHD symptoms from birth may be more accurate (and useful) for specific groups within this population. Socioeconomics, race/ethnicity or family psychiatric history may confound the relationship between prenatal adversity and ADHD risk, or may moderate the effects of prenatal adversity on ADHD risk. For instance, both ADHD and pre/perinatal complications are associated with lower socioeconomic status (Finch, 2003; Madden, 2014; Martinson & Reichman, 2016; Russell et al., 2016). Regarding race/ethnicity, Black women in the United States are at elevated risk of giving birth to preterm and low birth weight children compared to white women (Catov et al., 2015; Giscombé & Lobel, 2005). The latter-cited studies suggest maternal health issues such as gestational hypertension or diabetes, and stress related to experiences of racism, may explain these race/ethnicity-based discrepancies. Prenatal effects on ADHD risk may also be moderated by genetic (Brinksma et al., 2017; O'Donnell et al., 2017) or familial predisposition (Clarke et al., 2009). For example, Brinksma et al. (2017) found that low birth weight was more strongly associated with ADHD symptoms in children with a low activity (vs high activity) MAOA genotype. They also found that both pregnancy/delivery complications and maternal smoking interacted with 5-HTTLPR genotype to influence ADHD symptom severity. By predicting ADHD-like symptoms in groups stratified by income (e.g., low, middle, high), race/ethnicity, and family psychiatric history, we may be able to identify for whom the link between pre/perinatal factors and ADHD is most relevant.
We also wished to explore whether the predictive capacity of a pre/perinatal model of ADHD risk varied by sex. Several reviews suggest males are more susceptible than females to poor health outcomes following the same pre/perinatal adversities (DiPietro & Voegtline, 2017; Inkster et al., 2021). Two studies have shown that the effect of low birth weight on attention problems is moderated by sex such that the association is stronger in males (Dooley et al., 2022; Momany et al., 2017). While not formally testing moderation by sex, others have also found that the association between birth weight and ADHD symptom scales is stronger in males compared to females (McNicholas et al., 2016; Martel et al., 2007). It is currently not well understood why a "male vulnerability" might exist prenatally. Possible explanations include: a slower maturation of fetal organ systems in males; the foreign Y chromosome of males leading to greater placental immune response in the mother; and a greater prioritization of fetal growth over other repair functions in males (DiPietro & Voegtline, 2017). There is also a well-known sex difference (male > female) in the prevalence of ADHD diagnosis and symptoms (Murray et al., 2019; Willcutt, 2012). A pre/perinatal model of ADHD risk could have greater predictive capacity, and therefore be more useful, for males compared to females.
One of the challenges with pre/perinatal prediction models is that inputs are highly correlated (e.g., prenatal smoking and lower birth weight). Two main methodological approaches have been used to overcome this collinearity issue. The first is to observe the bivariate association between each perinatal predictor and ADHD outcome, and to include only significant predictors in a multivariate prediction model (Getahun et al., 2013;Schwenke et al., 2018;Sciberras et al., 2011;Silva et al., 2014;Willoughby et al., 2020). However, this does not capture inter-dependencies of multiple exposures, and can ignore important confounds (e.g., sex). A second popular approach is to perform dimension reduction, and to use the resultant set of latent factors to predict childhood ADHD (Milberger et al., 1997;Wiggs et al., 2016). While this acknowledges the correlated structure among predictors, it can be difficult to compare results across studies and to translate units of a latent factor to real-life measures. Similarly, a sum total of risks can be made to capture "cumulative prenatal adversity." Roffman et al. (2021) did so in the ABCD cohort and found that the number of prenatal adversities (0-8) was significantly and linearly associated with Child Behavior Checklist (CBCL) attention problems. Two or more prenatal adversities was associated with an 86% increased odds of high attention problems (T ≥ 60) compared with children with no prenatal adversity. However, the study did not report which prenatal factors were the best predictors of attention problems, and their solution was not internally validated. Elastic net regression has not yet been used to predict ADHD from pre/perinatal factors. This penalized regression algorithm can predict an outcome from a large number of predictors, and avoids overfitting by tuning penalty terms which reduce the number of predictors in the model. 
Elastic nets also tend to keep correlated predictors together, retaining or eliminating them as a group (Zou & Hastie, 2005) which may help to highlight broad groups of risk factors for intervention.
Improving prediction of ADHD is important for two reasons. First, given the strong genetic basis of ADHD, early identification of risk may be the best way to prevent or minimize symptoms (Fusar-Poli et al., 2019). Second, ADHD in childhood has long-lasting consequences for other areas of mental health and wellbeing throughout adolescent and adult life (Yoshimasu et al., 2012; Agnew-Blais et al., 2018). Early intervention may be efficient in minimizing the "snowballing" of issues and the development of complex cases. To date, studies that have used pre/perinatal data to predict variance in childhood ADHD symptoms have shown relatively low predictive capacity. For instance, Smidts and Oosterlaan (2007) found that prenatal factors (maternal age, disease, smoking or alcohol during pregnancy) explained a further 1%-4% of the variance in ADHD symptoms at 3-6 years, after demographic, socioeconomic, and parent psychiatric factors were accounted for. O'Donnell et al. (2017) explained 4%-10% of the variance in ADHD symptoms (age 4-15) using information known at birth (sex, birth weight, gestational age, maternal age, maternal perinatal anxiety, smoking and alcohol use), but also included family demographics, COMT genotype, and toddler parenting styles. In this study, we wanted to quantify the variance in childhood ADHD symptoms explained solely by information available at birth. Further, those previous studies pre-selected a small number of pre/perinatal variables (<15) to avoid overfitting their standard linear regressions. Elastic net regression provides a more data-driven approach to variable selection and model fitting, deciding which of many potential predictors are most relevant to the outcome.
Other models designed to predict ADHD diagnosis are either reliant on magnetic resonance imaging (MRI) data, or are not relevant to children. MRIs remain expensive and non-routine in child psychiatry assessment, and the current MRI-based classifiers are only capable of concurrently identifying children with ADHD who have already been clinically diagnosed (e.g., Sen et al., 2018). One prospective risk prediction model for adult ADHD has shown good predictive accuracy (Caye et al., 2019); however, childhood ADHD symptoms were used as a predictor in this model and the continuity of ADHD symptoms from childhood to adulthood is well known (Biederman et al., 2006). It may be more clinically useful to be able to predict ADHD at a younger age when interventions are most effective (Ornoy & Spivak, 2019; Sampaio et al., 2021).
While all pre/perinatal data were collected retrospectively in this study (age 9-10), all "pre/perinatal" variables are typically available by birth. Our study therefore investigates which pre/perinatal variables are most relevant to childhood ADHD symptoms, how well we can predict ADHD symptoms using such data, and how that predictive accuracy varies across subgroups within the population.

Participants
The Adolescent Brain Cognitive Development (ABCD) study is a large cohort study of children aged 9-10 from the United States (https://abcdstudy.org). The baseline data contain rich information on pregnancy and delivery, albeit retrospective, as well as detailed measures of the mental health of participants and their parents. Participating children were born between 2007 and 2009, making ABCD one of the most recently born large-scale child cohorts. The importance of choosing a relatively recently born cohort is underscored by significant changes over the past two decades in: rates of neonatal mortality (Goldenberg et al., 2008; Hug et al., 2019), maternal behavior during pregnancy (Cnattingius, 2004), and the prevalence of ADHD (Boyle et al., 2011).
Exclusion criteria imposed by ABCD researchers included contraindications to MRI, non-fluency in English, history of major neurological disorders, traumatic brain injury, extreme prematurity (<28 weeks gestation), and diagnoses of schizophrenia, moderate to severe autism spectrum disorder, intellectual disability, or substance-use disorder. In cases of sibling participants, the eldest was retained in the sample. All remaining subjects with outcome data were included in the analysis (N = 9,975).
The primary caregiver was the biological mother in 85% of cases. The remaining caregivers were biological fathers (10%), adoptive parents (2.5%), custodial parents (1%) and "other" (1.5%). A sensitivity analysis tested whether using biological mother reports only affected findings.
The 22 geographic locations that comprise the ABCD research sites are nationally distributed and were chosen in an attempt to capture the range of demographic and socioeconomic diversity of the United States. Within study sites, consenting parents and assenting children were primarily recruited through a probability sample of schools and summer camp programs and community volunteers.
The University of California at San Diego (San Diego, CA, USA) Institutional Review Board was responsible for the ethical oversight of the ABCD study. The secondary analysis of the data was approved by the Research Ethics Committee for the Royal College of Surgeons in Ireland.

Outcome: CBCL attention problems
The attention problems sub-scale of the CBCL was used to capture ADHD symptoms (Achenbach & Rescorla, 2001). This parent-rated questionnaire contains 119 items in total, 10 of which are summed to form an attention problems score. Items cover behaviors such as inattention, hyperactivity and impulsivity (item list in Supplementary Material). Items are rated on a 3-point Likert scale (0 = not at all true; 1 = somewhat true; 2 = very true). For most analyses, we treat CBCL attention problems as a continuous scale (0-20) and use the R² statistic to capture the predictive capacity of the model. However, we also predicted "likely ADHD diagnosis," defined as a score of 9 or above on this scale (equivalent to a T-score of 64 in males; 66 in females). A cutoff of 9/20 was used based on Lampert et al. (2004), who found this was the optimum cutoff for predicting ADHD in a mixed group of clinic-referred and general community children. A meta-analysis showed that this CBCL scale, with cutoffs around T = 65, can discriminate ADHD cases and controls with good accuracy (Chang et al., 2016; pooled sensitivity and specificity = 77% and 73%). The attention problems score shows good reliability as a general factor of its 10 constituent items in this sample (omega total = 0.86).
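The scoring rule described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analysis was written in R); the function names are hypothetical, and only the item range (0-2), the item count (10) and the cutoff (9) come from the text.

```python
from typing import Sequence

N_ITEMS = 10             # items in the attention problems sub-scale
LIKELY_ADHD_CUTOFF = 9   # raw-score cutoff for "likely ADHD diagnosis"


def attention_problems_score(items: Sequence[int]) -> int:
    """Sum ten item ratings (each 0, 1 or 2) into a 0-20 raw score."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(rating not in (0, 1, 2) for rating in items):
        raise ValueError("each item must be rated 0, 1 or 2")
    return sum(items)


def likely_adhd(score: int) -> bool:
    """Binary outcome used in the sensitivity analysis: score of 9 or above."""
    return score >= LIKELY_ADHD_CUTOFF
```

For example, four items rated "very true" plus one rated "somewhat true" give a raw score of 9, which just meets the cutoff.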
Prenatal and perinatal predictors
Any variable measured by the ABCD study that referred to the pre/perinatal period, or that would typically be known by birth (e.g., child's sex), was included as a potential predictor in the model. All variables were reported retrospectively by the primary caregiver in the cohort's Developmental History Questionnaire (list of variables available online: https://nda.nih.gov/data_structure.html?short_name=dhx01). Maternal retrospective reporting on pre/perinatal factors has been shown to be generally valid: Liu et al. (2013) found that maternal recall of most prenatal variables showed "substantial" to "perfect" agreement with medical records 8-10 years postpartum (κ = 0.60-1.00). Notable exceptions included substance-use when mothers provided continuous data (e.g., number of cigarettes per day) and certain medical problems during pregnancy (proteinuria, nausea and vomiting; κ ≤ 0.40). We therefore used dichotomous substance-use measures (yes/no). Rice et al. (2007) found maternal recall of birth weight, Neonatal Intensive Care Unit admission, delivery method, and gestational hypertension 4-9 years postpartum to show very good consistency with medical records. Finally, Ramos et al. (2020) found that maternal reports of substance-use and pregnancy complications at 9 months and 8 years postpartum were mostly consistent with one another (65%-98% agreement).
Birth information. Birth weight was reported by the parent in pounds and ounces, which was converted to kilograms. Some cases (N = 683) reported birth weight in ounces only; these values were removed as they were implausibly small and likely mis-entered. One individual was removed from the analysis due to an improbable birth weight for their gestational age (6 weeks early at 6.7 kg). Prematurity referred to the number of weeks prior to gestational week 40. Parental ages at birth (mother and father) were simplified into "years under 20" and "years over 35" given specific risks to the child associated with these cutoffs (Chang et al., 2014; Cleary-Goldman et al., 2005).
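A minimal Python sketch of these derived variables follows. The function names are hypothetical. Taking gestational age as the input, clipping at zero, and coding parental age as the distance from each cutoff are assumptions about how such variables could be built, not a description of the authors' R code.

```python
KG_PER_POUND = 0.45359237   # exact, by international definition of the pound
OUNCES_PER_POUND = 16


def birth_weight_kg(pounds: float, ounces: float = 0.0) -> float:
    """Convert a weight reported in pounds and ounces to kilograms."""
    return (pounds + ounces / OUNCES_PER_POUND) * KG_PER_POUND


def weeks_premature(gestational_weeks: float, term_weeks: float = 40.0) -> float:
    """Weeks born before gestational week 40 (assumed 0 if born at or after)."""
    return max(0.0, term_weeks - gestational_weeks)


def years_under_20(age_at_birth: float) -> float:
    """Assumed coding: years below age 20, or 0 for parents aged 20 or over."""
    return max(0.0, 20.0 - age_at_birth)


def years_over_35(age_at_birth: float) -> float:
    """Assumed coding: years above age 35, or 0 for parents aged 35 or under."""
    return max(0.0, age_at_birth - 35.0)
```

Under this coding, a parent aged 20 to 35 scores zero on both age variables, so the model can estimate separate slopes for younger and older parental age.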
Pregnancy and delivery complications. Binary data on 13 pregnancy complications were available (see Table 1). "Persistent proteinuria" was merged with "pre-eclampsia, eclampsia or toxemia" due to low prevalence (n = 47). Binary data on eight delivery complications were available (Table 1). Two summary variables, total pregnancy complications and total delivery complications, were created. Low rates of gestational rubella (n = 12), convulsions at birth (n = 13) and blood transfusion at birth (n = 37) were observed, so these were excluded from analysis as individual predictors, though they were counted within the sum totals. Caesarian section delivery and days spent in incubator were also included.
Maternal substance-use in pregnancy. Maternal smoking, alcohol use and drug-use referred to use of these substances at any point during pregnancy and at any frequency. Prevalence was low (<1%) for maternal use of oxycontin (n = 32), heroin/morphine (n = 20), cocaine/crack (n = 68) and "other" drugs (n = 91) so these were merged into an "other" drug class (n = 182) distinguishing them from marijuana/cannabis. Use of prenatal vitamin supplements was also included.

Subgroups
Child's race/ethnicity was captured by 5 groups: White, Black, Hispanic, Asian or Other; however, stratified prediction was only performed for White, Black and Hispanic subgroups due to small sample sizes for Asian and Other (Table 1). Household income referred to total annual income from all sources before tax/deductions. Low income was defined as below $50,000, middle income as $50,000-99,000 and high income as over $100,000 (corresponding to tertile split points). Familial liability for ADHD referred to (a) parental ADHD symptoms and (b) parental psychiatric history. Parent ADHD symptoms were captured by the Adult Self-Report attention problems scale (Achenbach & Verhulst, 2010), as completed by the primary caregiver. Low, moderate and high scorers on this scale were defined as scoring <50th percentile, 50th-80th percentile, or >80th percentile respectively. Parental psychiatric history was captured by the number of lifetime mental health issues (from a list of 10) among biological parents, and was reported by the primary caregiver. Parental psychiatric history was defined as none (0 issues), average (1 or 2 issues) and strong (3+ issues), with the middle category based on the average number of issues reported (median = 1; mean = 1.7). Further information on these grouping variables is available in supplementary material.

Statistics
R code for all data processing and analysis is available at http://rpubs.com/dooleyr/.

Elastic net
Elastic net regression was chosen for several reasons. First, we have a large number of potential predictors (40), and we wish to produce the most parsimonious model possible. Second, many of our pre/perinatal factors are correlated (various pregnancy complications, prenatal smoking, etc.), a situation that elastic net is well designed for (Zou & Hastie, 2005). Finally, we trained the elastic net within one section of the data, and validated/tested it in another (5-fold cross-validation), thus helping to ensure that the subset of predictors chosen by the elastic net generalizes to other samples, and not just to the one it was trained on. We used the glmnet package in R (cv.glmnet function), which uses a quadratic approximation to the log-likelihood and then a cyclical coordinate descent algorithm (Friedman et al., 2010). Simply put, the elastic net is a standard linear regression with additional limits on the number of predictors and the size of beta coefficients. The degree of coefficient shrinkage is decided jointly by two penalty parameters, "alpha" (value between 0 and 1) and "lambda" (value > 0). When alpha = 0, ridge regression is run, which retains all predictors in the final model albeit with shrunk coefficients, and when alpha = 1, Least Absolute Shrinkage and Selection Operator (LASSO) regression is run, which shrinks many coefficients to zero, thus retaining only a few predictors in the model. In elastic net regression, we try multiple values for alpha between 0 and 1 and choose the one which provides the best predictive accuracy. For intermediate values of alpha, both ridge and LASSO behaviors apply: some coefficients are shrunk while others are set to 0. The lambda value determines the extent of shrinkage applied to each coefficient. Given values for alpha and lambda, elastic net produces a regression equation with a subset of predictors, their B coefficients, and an R² reflecting how well the predicted outcome matched the actual outcome.
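To make the roles of alpha and lambda concrete, the following is a minimal pure-Python sketch of cyclical coordinate descent for a Gaussian elastic net, using the penalty form lambda × [alpha × |b| + (1 − alpha)/2 × b²] described by Friedman et al. (2010). It is an illustration under simplifying assumptions (no predictor standardization, a fixed number of passes, no lambda path), not the glmnet implementation and not the authors' code; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                alpha: float, lam: float,
                n_passes: int = 200) -> Tuple[float, List[float]]:
    """Fit intercept and coefficients by cyclical coordinate descent."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    # Centre predictors and outcome so the intercept can be recovered at the end.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]          # residuals when all betas are 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_passes):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue                        # constant column: leave at 0
            # Covariance of predictor j with the residual that excludes j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            # alpha scales the LASSO part (thresholding); 1 - alpha the ridge part.
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta
```

With alpha = 1 and a large lambda, the soft-threshold step sets every coefficient to zero; with alpha = 0, the threshold is zero and coefficients are only shrunk by the ridge term in the denominator, so all predictors are retained.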
Five-fold cross-validation was used to avoid overfitting. Five folds were chosen to provide ∼1000 individuals for validation and ∼1000 for testing in the primary analysis, samples which would likely contain a wide distribution of attention problems. The training-validation-test split was 80%-10%-10%. The optimal lambda value was found for 20 different alphas in the training set; we then used the validation set to identify, by grid search, the alpha-lambda combination with the smallest mean absolute error; finally, we tested predictive performance on the test set. Mean test set R² and the number of times each predictor had a non-zero B coefficient ("selection frequency") were calculated, as were the 95% confidence intervals for these metrics. This process was repeated for a different test set (n = 998), until all participants had been in the test set once. This 5-fold validation process in turn was repeated 20 times, leading to 100 test set results (i.e., 100 R² estimates; 100 versions of predictor coefficients). Figure S1 describes the 5-fold cross-validation process in greater detail. Predictors with a selection frequency of at least 95% (95/100 runs) were defined as robust.
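The selection-frequency rule can be sketched as follows. This is a hedged illustration in Python rather than the authors' R code: each run is assumed to be stored as a mapping from predictor name to fitted coefficient, and the predictor names in the usage note are placeholders.

```python
from typing import Dict, List, Sequence


def selection_frequency(runs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Proportion of runs in which each predictor has a non-zero coefficient."""
    names = runs[0].keys()
    return {
        name: sum(1 for run in runs if run.get(name, 0.0) != 0.0) / len(runs)
        for name in names
    }


def robust_predictors(runs: Sequence[Dict[str, float]],
                      threshold: float = 0.95) -> List[str]:
    """Predictors selected in at least `threshold` of runs (95/100 in the text)."""
    freq = selection_frequency(runs)
    return sorted(name for name, f in freq.items() if f >= threshold)
```

Given 100 runs in which a hypothetical predictor `male_sex` is non-zero every time and `birth_weight` is non-zero in only 40 runs, `robust_predictors` would return `male_sex` but not `birth_weight`.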
We use the terms risk versus protective factors to indicate positive versus negative association with attention problems. However, we note that these associations imply no causal direction.
Sampling weights were not used in this study given the limitations of our elastic net algorithm to use weights when predicting unseen data (i.e., validation and test sets).

Models
First, we used all available pre/perinatal information (40 variables; Table 1) to predict CBCL attention problems in the full sample.
Second, the prediction model was stratified by sex (male/female), race/ethnicity (White/Black/Hispanic), family income (high/mid/low), parent attention problems (high/mid/low) and parent psychiatric history (high/mid/none). Predictive accuracy, measured by test set R², was compared across subgroups visually, and with one-way analyses of variance or Welch's t-tests where appropriate. One hundred test set R² values were available for each analysis (e.g., 100 R² values for males and 100 for females), leading to difference tests between groups with 100 observations each. A more conservative p-threshold of 0.01 was used to interpret subgroup results given that the sample was stratified five times (Bonferroni correction = 0.05/5 = 0.01).
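As an illustration of the pairwise comparison, the sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom for two sets of test set R² values, along with the Bonferroni-corrected threshold. It is a Python sketch with hypothetical names, not the authors' R code, and it stops short of computing a p-value, which would require the t distribution.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple

# Five stratifications (sex, race/ethnicity, income, parent attention
# problems, parent psychiatric history) give a corrected threshold of 0.01.
BONFERRONI_ALPHA = 0.05 / 5


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance t-test."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation; generally non-integer.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Because the degrees of freedom depend on the two sample variances, they are usually fractional and fall below the pooled value of 198 whenever two groups of 100 R² values differ in spread.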
We opted to conduct stratified prediction rather than including interactions with each contextual variable, because it allowed us to train and optimize the elastic net for each subgroup. As such, models were tailored to each subgroup rather than applying a "one-model-fits-all" approach. Group-specific results are also easier to interpret and may be more translatable to community-specific interventions.
Three sensitivity analyses were performed: (a) predicting a binary outcome, CBCL attention problem score ≥ 9; (b) limiting the sample to cases where the respondent was the biological mother (n = 8,495); and (c) limiting the sample to cases with no missing data (n = 7,429). For (a), we used an elastic net with a binary logistic outcome and under-sampled those with CBCL scores < 9 (i.e., the majority class) such that the model could train on a balanced data set (N cases ∼ N controls ∼ 100). That is, the model is designed to discriminate cases from controls, given approximately 50:50 prevalence. Randomly under-sampling from the majority class is a way to avoid models becoming biased toward the majority class and producing a high proportion of false negatives (Mohammed et al., 2020).
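Random under-sampling of the majority class can be sketched in a few lines of Python. This is an illustrative version with hypothetical names and a fixed seed for reproducibility, not the authors' R code; labels are assumed to be coded 1 for cases (score ≥ 9) and 0 for controls.

```python
import random
from typing import List, Sequence


def undersample_majority(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return row indices of a balanced sample: all of the smaller class plus
    an equally sized random draw (without replacement) from the larger class."""
    rng = random.Random(seed)
    cases = [i for i, label in enumerate(labels) if label == 1]
    controls = [i for i, label in enumerate(labels) if label == 0]
    if len(cases) <= len(controls):
        minority, majority = cases, controls
    else:
        minority, majority = controls, cases
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)
```

The returned indices would then be used to subset only the training data, so that the classifier sees roughly equal numbers of cases and controls while it is being fitted.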

Imputation
Any predictor with ≥10% missing values was excluded. Validation and test set missing values were replaced with the training set mean to maintain complete independence of data. Mean imputation has been shown to perform similarly to multiple imputation approaches when the proportion of missingness is <10% (Shrive et al., 2006; Waljee et al., 2013). Furthermore, few variables had rates of missingness above 5% (maternal medication and alcohol use, infant incubator-use, paternal age; Table S9). The sensitivity analysis restricted to complete observations (described above) tested whether imputation affected results.
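The key point of this imputation scheme is that the fill-in values are computed from the training rows only and then reused for the held-out rows. A minimal Python sketch follows, with hypothetical function names and missing values represented as `None`; it is not the authors' R code.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def training_means(train_rows: Sequence[Row]) -> List[float]:
    """Per-column mean of the observed (non-missing) training values.

    Assumes every column has at least one observed training value, which
    the <10% missingness rule makes very likely in practice.
    """
    n_cols = len(train_rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in train_rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def impute(rows: Sequence[Row], means: Sequence[float]) -> List[List[float]]:
    """Replace missing entries with the supplied (training set) means."""
    return [[means[j] if value is None else value for j, value in enumerate(row)]
            for row in rows]
```

In use, `training_means` would be called once per training split and its output passed to `impute` for the validation and test rows, so that no information from held-out data enters the imputed values.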

Results
The pre/perinatal model, including sex and race/ethnicity, explained approximately 8% of the variance in CBCL attention problems at age 9 (mean R² = 8.13%; 95% CI = 5.61%-11.47%). Robust risk factors included: male sex, maternal illicit drug-use and smoking, total number of pregnancy and delivery complications, urinary tract infection (UTI), anemia and "other" pregnancy complication, medication-use, younger parental ages, and Black and "other" race/ethnicity (Figure 1). No specific type of delivery complication was robustly linked with attention problems, nor was C-section, birth weight or gestational age. Asian children had lower attention problem scores compared to White children, and non-singletons (e.g., twins) had lower scores compared to singleton-born children, controlling for all other factors.
Selection frequencies for all variables are shown in Table 1 while the mean coefficients for robust predictors are shown in Figure 1.
Note that the averaged B coefficients in Figure 1 represent the average change on the CBCL attention problem scale (0-20) for each unit change in the predictor, holding all other terms constant. There was no single "final" model; results represent the average of 100 runs of the prediction model (Figure S1). Mean values of the penalty parameters alpha and lambda were 0.46 and 0.18 respectively (Table S8). Bivariate correlations between all potential predictors showed multi-collinearity, particularly between the different types of obstetric complication, supporting the use of elastic net (Figure S6).

Subgroups
Model performance varied significantly across subgroups (Figure 2). Mean R 2 was lower for all subgroups compared to full-sample analysis, with the exception of the low-income group (Table 2).

Income
A strong difference in model performance was observed across income groups as determined by 1-way analysis of variance on test set R 2 (F(2,297) = 285.1, p < .001). The pre/perinatal model was most predictive of attention problems in children of low-income homes (mean R 2 = 9.61%), less predictive for middle-income homes (mean R 2 = 4.09%) and performed poorly for children of high-income homes (mean R 2 = 2.90%). Risk factors specific to the low-income setting included maternal anemia, UTI, "other" complication in pregnancy, younger paternal age, maternal smoking and alcohol use. Only three risk factors were identified as robust in high-income settings: male sex, maternal drug-use in pregnancy and total number of pregnancy complications. Asian race/ethnicity was protective in high-income families, while Hispanic race/ethnicity was protective in low-income families (Table S2).
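The degrees of freedom reported above follow from comparing three income groups with 100 test set R² values each: 3 − 1 = 2 between groups and 300 − 3 = 297 within groups. A minimal Python sketch of the one-way analysis of variance is given below as an illustration with a hypothetical function name; it is not the authors' R code and does not compute the p-value.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F, as reported for income, indicates that the spread between the subgroup mean R² values is large relative to the run-to-run spread within each subgroup.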

Race/ethnicity
Model R² differed significantly across racial/ethnic groups (F(2,297) = 36.93, p < .001). The pre/perinatal prediction model performed slightly better for Black children (mean R² = 7.10%) compared to White children (mean R² = 6.43%), though this difference was not significant (t(127.68) = −1.39, p = 0.17). Mean R² values were significantly higher for White and Black children compared to Hispanic children (mean R² = 3.61%; Figure 2; Table 2). Confidence intervals for R² included zero in Hispanic (95% CI = −1.85% to 6.94%) and Black children (95% CI = −2.90% to 14.22%), suggesting that pre/perinatal factors did not predict attention problems reliably in these groups. Low model precision was also evident from wide CIs in the sample of Black children (95% CI = −2.90% to 14.22%), particularly compared to White children (95% CI = 3.06% to 9.69%). Robust risk factors common to all race/ethnicity groups were: male sex, pregnancy complications and birth complications. Robust risk factors specific to White children included UTI and severe nausea in pregnancy, while those specific to Black children included exposure to gestational anemia and diabetes. Maternal marijuana-use was a robust risk factor for attention problems in Hispanic children only (Table S3).
Familial liability to ADHD
R² varied significantly across the different levels of parental psychiatric history (F(2,297) = 63.86, p < .001) and parent ADHD symptoms (F(2,297) = 15.6, p < .001). Variance explained by the model was low for children whose parents had no psychiatric history (mean R² = 2.88%), higher for those whose parents had a history of 1 or 2 issues (mean R² = 4.96%) and higher again for those whose parents had a strong history, that is, 3+ issues (mean R² = 5.98%). Similarly, pre/perinatal factors predicted attention problems significantly better in children whose parents had high attention problems themselves (mean R² = 5.32%) compared to those whose parents scored in the moderate range (mean R² = 4.30%) or the low range (mean R² = 3.75%; Figure 2; Table 2). Black race/ethnicity robustly predicted attention problems only in children whose parents had a strong psychiatric history or who scored in the moderate ADHD symptom range (Tables S4-5). Gestational anemia was a risk factor for attention problems specific to children whose parent had high ADHD scores (Table S5).

Sex
Pre/perinatal variables explained slightly more variance in attention problems in males (R² = 5.28%) compared to females (R² = 4.82%; Table 2); however, this difference was not significant at the corrected threshold of 0.01 (F(1,198) = 4.07, p = 0.045). More predictors were identified as robust for males versus females (12 vs. 8). Common risk factors among males and females were maternal drug-use and smoking during pregnancy, total number of pregnancy complications, gestational UTIs, severe nausea and younger maternal age. Common protective factors included being Asian and a non-singleton birth (Table 3).
Pre/perinatal predictors of attention problems that were stronger in males than females (according to selection frequency) included total number of delivery complications, maternal prescription medication-use, gestational anemia and "other" race/ethnicity. There were no female-specific risk factors; however, gestational diabetes was a protective factor in females that was not identified for males (Table 3).

Sensitivity analyses
Additional analyses tested (a) the clinical relevance of a pre/perinatal model, and whether findings were biased by (b) inaccurate retrospective reporting from non-maternal caregivers or (c) by imputation.
Variance in attention problems explained by the model did not change significantly (t(196.67) = 0.19, p = .85) when the data were limited to those reported by the biological mother (85% of cases; N = 8,495). R² dropped slightly from 8.13% (95% CI: 5.61%-11.47%) to 8.09% (95% CI: 5.71%-10.47%). The same predictors identified in the full sample were also identified in this biological-mother-only model, with the exception of Black race/ethnicity and use of non-cannabis drugs (Figure S5).
Variance explained in attention problems dropped from 8.13% in the full sample (95% CI: 5.61%-11.47%) to 6.72% (95% CI: 4.19%-9.43%) when the sample was limited to participants with full data (N = 7,429), a significant difference in R² (t(196.58) = 6.94, p < .001). The 11 robust predictors identified in the restricted analysis were: non-singleton birth (protective), male sex, total pregnancy and delivery complications, gestational nausea, UTI and anemia, younger maternal age, and maternal smoking, drug-use and prescription medication-use (Figure S4). Robust predictors chosen in the imputed data but not the restricted data, and therefore possibly influenced by imputation, were: Asian, Black and other race/ethnicity, maternal use of non-cannabis drugs, "other" pregnancy complication and younger paternal age (Figures 1 & S4).
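The fractional degrees of freedom in these comparisons (for example, t(196.58)) are consistent with an unequal-variance (Welch) t-test applied to two sets of 100 test-set R² values, although the test is not named in this section. The following is a minimal sketch under that assumption. The input values are simulated around the reported means with a hypothetical standard deviation and are not the study's data.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Simulated R² values (percent) over 100 test sets per model; illustrative only.
rng = random.Random(1)
ASSUMED_SD = 1.4  # hypothetical spread across test sets
full_sample = [rng.gauss(8.13, ASSUMED_SD) for _ in range(100)]
complete_cases = [rng.gauss(6.72, ASSUMED_SD) for _ in range(100)]

t_stat, df = welch_t(full_sample, complete_cases)
print(f"t({df:.2f}) = {t_stat:.2f}")
```

With 100 values per group, the Welch degrees of freedom fall between 99 and 198, approaching 198 when the two variances are similar, which is the pattern seen in the reported statistics.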

Discussion
Approximately 8% of the variance in age 9 attention problems could be explained by information generally known at birth such as sex, parental age, and prenatal exposures. However, there was considerable heterogeneity in the degree to which the model predicted attention problems across groups in the population. Pre/perinatal variables were more predictive of attention problems in low-income households compared to high- or middle-income households, more predictive in families with a stronger parental psychiatric history, and more reliable for White children compared to Black or Hispanic children (Figure 2). Predictive capacity (R²) did not differ significantly between males and females; however, several male-specific risk factors were identified, including maternal medication-use in pregnancy and total number of delivery complications (Table 3).

Comparison with other studies
Few other studies have reported R² statistics from pre/perinatal models predicting continuously measured ADHD symptoms. A study in the ALSPAC cohort (n = 6,969) explained 9.4% of the variance in ADHD symptoms at age 4, and 4.6% of the variance in symptoms at age 15; however, that model also included genotype and postnatal data such as parenting styles (O'Donnell et al., 2017). In another study, pre/perinatal factors accounted for 3.7% of the variance in impulsivity at 3-6 years, 1.5% of the variance in hyperactivity and <1% of the variance in inattention (Smidts & Oosterlaan, 2007). We explained 8% of the variance in ADHD symptoms using only information typically known at birth. Additively or multiplicatively combining such a model with postnatal variables such as early life adversity and parenting styles may improve prediction of childhood ADHD further (Huhdanpaa et al., 2021;Willoughby et al., 2020).
Figure 2. Subgroup variation in capacity to predict age 9 CBCL attention problems from pre/perinatal factors. R-squared averaged over 100 test sets. Error bars indicate ±1 standard deviation.
We found that a pre/perinatal model could discriminate those with clinically relevant attention problems (scores of 9 or above) better than chance, but with low accuracy (average AUC = 67.4%). Despite many studies using pre/perinatal variables to predict ADHD in childhood (probable or diagnosed), none to our knowledge report accuracy statistics on unseen data (i.e., internal validation). One exception is a prediction model in very preterm/low birth weight children, which can predict ADHD with good accuracy (AUC = 81%; Franz et al., 2022). This is a useful model for preterm/low birth weight children; however, it cannot capture the majority of children who develop ADHD (those not born very preterm/low birth weight). The model described in this study is not intended for clinical prediction, as it is based on cross-sectional data and has not been externally validated. However, it does provide an up-to-date estimate of the capacity of pre/perinatal information to predict childhood ADHD symptoms, and an overview of how that capacity may change across demographic groups (Figure 2). There remains a need to improve clinical risk prediction of ADHD in general population samples.
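To make the AUC figure concrete: AUC is the probability that a randomly chosen child above the clinical threshold receives a higher predicted score than a randomly chosen child below it, with 50% corresponding to chance. The sketch below computes it by pairwise comparison on a toy example. The scores and predictions are invented for illustration; only the cut-off of 9 is taken from the text.

```python
def auc(predicted, is_case):
    """Probability that a random case outranks a random non-case (ties count half)."""
    cases = [p for p, y in zip(predicted, is_case) if y]
    controls = [p for p, y in zip(predicted, is_case) if not y]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


CLINICAL_CUTOFF = 9  # scores of 9 or above, as stated in the text

# Hypothetical observed scores and model predictions for six children.
observed = [2, 11, 5, 9, 0, 14]
predicted = [3.1, 4.0, 3.9, 3.5, 2.2, 5.0]

is_case = [score >= CLINICAL_CUTOFF for score in observed]
toy_auc = auc(predicted, is_case)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise form is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on cohort-sized data.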

Pre/perinatal predictors of interest
Seventeen out of a possible 40 pre/perinatal variables were identified as robust predictors of CBCL attention problems.
Non-prescription drug-use was associated with a ∼1-point (∼5%) increase on the ADHD symptom scale and smoking was associated with a 0.69-point (3.45%) increase in attention problems, controlling for all other variables. The association between maternal smoking and ADHD in the child appears to be strongly confounded by familial factors and may not be causal (Rice et al., 2018;Thapar et al., 2009). The same concerns may extend to drug-use. However, the issue of causality does not necessarily invalidate the predictiveness of these variables. Unlike another investigation of the ABCD cohort (Paul et al., 2021), our study did not identify maternal cannabis-use as a risk factor for attention problems (Table 1), though cannabis-use did predict attention problems in Hispanic children (Table S3). Our findings suggest non-cannabis drugs drove the association between maternal drug-use and attention problems. While speculative, it may be relevant that the years in which ABCD participants were born (2005-2009) overlap with a time of substantially increased rates of overdose and mortality due to prescription opioids in the United States (Centers for Disease Control & Prevention, 2011).
Table notes. a Intercept refers to group mean on the CBCL attention problem scale when all other predictors are 0 or at the reference level (e.g., mean birth weight, White race, male/female average). b Number of pre/perinatal predictors with a selection frequency of 95% or more. c Restricted sample only includes participants with full data on all pre/perinatal variables.
The observation that total number of complications in pregnancy, use of any medications, and nausea were robust predictors of attention problems supports the theory that common factors among infections and illnesses (e.g., elevated cytokines, fever) underlie the association between obstetric complications and neurodevelopmental issues (Flinkkilä et al., 2016). On the other hand, UTIs and anemia in pregnancy also predicted ADHD risk independently of "total complications," nausea and medication-use. We found UTI during pregnancy was linked with a 2.8% increase (0.56 points) on the CBCL attention problems scale, and anemia was associated with a 1.75% increase (0.35 points). Another large US study found that maternal genitourinary infection during pregnancy was associated with a 29% increased odds of ADHD diagnosis in the child aged 8-9 (N = 84,721; Mann & McDermott, 2011). As UTI and anemia are preventable in many cases, results support a reappraisal and improvement of screening for asymptomatic bacteriuria and iron deficiencies during pregnancy. Subgroup analysis shows that anemia during pregnancy was a particularly strong risk factor in children who were male, Black, from low-income homes, and whose parents had high attention problems (Tables S2-6), suggesting strategies to prevent gestational anemia could be further targeted.
Birth weight was not a robust predictor of attention problems when included alongside other pre/perinatal factors. This is noteworthy given the high replicability of this association (Momany et al., 2018). The inclusion of pregnancy complications and maternal substance-use may have overshadowed the effects of birth weight on attention problems, though birth weight may still be a useful proxy for prenatal adversity. For instance, low birth weight has been shown to mediate the association between maternal smoking and ADHD (Brannigan et al., 2020).
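The paired point and percentage figures quoted in this section (for example, 0.69 points and 3.45% for smoking) are all consistent with expressing each coefficient as a share of a 20-point raw-score range. That range is inferred here from the reported pairs rather than stated in this section, so it should be read as an assumption. A short check:

```python
ASSUMED_SCALE_RANGE = 20.0  # inferred from the reported pairs, not stated in the text


def points_to_percent(points, scale_range=ASSUMED_SCALE_RANGE):
    """Express a raw-score difference as a percentage of the scale range."""
    return 100.0 * points / scale_range


# (points, percentage) pairs as reported in the text.
reported = {
    "maternal smoking": (0.69, 3.45),
    "gestational UTI": (0.56, 2.80),
    "gestational anemia": (0.35, 1.75),
}

for predictor, (points, percent) in reported.items():
    computed = points_to_percent(points)
    print(f"{predictor}: {points} points -> {computed:.2f}% (reported {percent}%)")
```

Under the same assumption, the roughly 1-point increase reported for non-prescription drug-use corresponds to roughly 5% of the scale.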

Variance by subgroup
The variance in ADHD symptoms explained in low-income homes was more than three times greater than in high-income homes (Figure 2). Socioeconomic disparities in adverse birth outcomes and ADHD risk are widely reported (Blumenshine et al., 2010;Russell et al., 2016). Others have reported that gestational diabetes (Nomura et al., 2012) and maternal depression (Herba et al., 2016) pose greater risk to the mental health of children from lower socioeconomic strata. We add to these findings, showing a combined pre/perinatal model is more predictive of childhood attention problems in lower income settings. Possible explanations include reduced access to quality healthcare, increased financial stress and the compounded effects of both. The largest jump in prediction accuracy across income groups was from middle to low incomes (Figure 2), suggesting that annual income below $50,000 ("low") may be a particularly relevant threshold for increased vulnerability to pre/perinatal risks in the United States. Descriptive statistics in Figure S3 show that UTIs in pregnancy, maternal drug-use, smoking and younger maternal ages at birth were more common in low-income households.
The association between pre/perinatal factors and attention problems also appeared to be moderated by race/ethnicity of the child. A meta-analysis found that the association between birth weight and ADHD symptoms in United States-based studies became stronger as the proportion of Black children in the sample increased (Momany et al., 2018). In this study, predictive capacity of the pre/perinatal model did not vary significantly between children of White and Black race/ethnicities (mean R² = 6.43% and 7.10%, respectively). However, there was large variability in the variance explained in attention problems for Black children (R² CI = −2.90% to 14.22%), potentially explained by smaller sample size relative to other racial/ethnic groups (n = 1,508; Table 2). The model performed particularly poorly in Hispanic children in terms of both reliability and precision (R² = 3.61%; 95% CI = −1.85% to 6.94%) despite reasonable sample size (n = 2,121) and average ADHD scores similar to those of other groups (Figure S2). Given many Hispanic/LatinX families in the United States have South American roots, it is relevant that South American samples show particularly weak associations between birth weight and ADHD symptoms, compared to samples from North America, Europe and Asia (Momany et al., 2018). It is not currently well understood why this is. Race/ethnicity-linked parental attitudes toward ADHD may influence CBCL scores (Bailey et al., 2014;Collins & Cleary, 2016) and observed associations with pre/perinatal variables. Other potential explanations for moderation by race/ethnicity may be unequal access to (or quality of) healthcare, and exposure to race-based discrimination (Bailey et al., 2014;Rosenthal et al., 2018).
Prediction of attention problems improved with increasing familial psychopathology (Table 2). Higher levels of parental psychopathology might reflect genetic predisposition and our results may reflect a gene-environment interaction. For instance, Brinksma et al. (2017) found a gene-environment interaction between 5-HTTLPR genotype (involved in serotonin signaling) and obstetric complications on childhood ADHD symptoms. However, it is also possible that our results reflect an environment-environment interaction or a gene-environment correlation. Parents with mental health issues may have difficulty attending to their child's needs (Dubber et al., 2015), and may be more likely to be substance users pre- and postnatally (Smedberg et al., 2015). Recent studies have suggested that the effects of environmental factors on ADHD risk may be largely captured by genetic transmission of risk from parents (Agnew-Blais et al., 2022;Pingault et al., 2022). Future studies should use intergenerational Mendelian randomization to test whether associations between pre/perinatal factors and attention problems are independent from such genetic processes.
While Figure 2 suggests any psychopathology in parents moderated model accuracy more strongly than parental attention problems, this should be interpreted with caution. Parental psychiatric history was based on the lifetime history of both parents (where available) while parental attention problems was based on the self-reported symptoms of one parent, leading to differences in the variance of each measure.
Being male was the most robust predictor of age 9 attention problems in the full sample and many subgroups (Tables S2-5). This is consistent with the well-replicated sex difference in the prevalence of ADHD and many other prediction studies (Getahun et al., 2013;Huhdanpaa et al., 2021). The pre/perinatal model was also slightly better at predicting attention problems in males (mean R² = 5.28%, SD = 1.51%) compared to females (mean R² = 4.82%, SD = 1.72%), supporting male fetal vulnerability theories (DiPietro & Voegtline, 2017;Inkster et al., 2021).
Delivery complications, gestational anemia, maternal medication-use and "other" race/ethnicity were more robust predictors of attention problems in males compared to females (Table 3). Consistent with our finding on anemia, Santa-Marina et al. (2020) found that the association between maternal iron deficiency in pregnancy and ADHD symptoms was stronger in male compared to female children. This may be explained by sex-specific differences in the effects of perinatal oxygen shortages on dopamine levels in the prefrontal cortex, gene methylation and behavior (Laplante et al., 2012;Wang et al., 2013). Slow heartbeat at birth specifically had the highest selection frequency of all delivery complications in males (72%; Table S6). Relevantly, females display a more adaptive heart rate increase during labor compared to males (Bernardes et al., 2009;DiPietro & Voegtline, 2017). Medications in pregnancy were also a stronger risk factor for males compared to females. Unfortunately, a standardized labeling procedure was not applied to medications in ABCD, with brand names (e.g., Prozac) and chemical names (e.g., fluoxetine) used interchangeably. Future studies should assess whether specific types of medication increase the risk of ADHD in a sex-dependent manner.
One unexpected finding in sex-stratified models was that gestational diabetes was robustly linked with lower attention problems in females, but not in males (Table 3). It is unknown whether this reflects a sex-moderated effect of diabetes, diabetes treatment, or another unmeasured variable such as maternal weight status. Relevantly, sex differences have been observed in neurodevelopment and cardiometabolic outcomes following gestational diabetes and insulin treatment in experimental animals (Sousa et al., 2020;Talbot & Dolinsky, 2019).
Differential vulnerabilities of males and females to certain gestational factors may explain why the effect of birth weight on ADHD symptoms is moderated by sex in some samples (Dooley et al., 2022;Momany et al., 2017), but not others (Lim et al., 2018;Pettersson et al., 2019).

Protective factors
Children of Asian background had lower attention problems compared to all other race/ethnicities (Figure S2; Figure 1). Our results are consistent with other studies suggesting Asian American children are significantly less likely than White American children to receive ADHD diagnosis and treatment (Shi et al., 2021;Wong & Landes, 2021). A meta-analysis showed that the association between birth weight and ADHD symptoms was larger in Asia than on other continents (North America, South America, Australia & New Zealand, Europe; Momany et al., 2018) which, when combined with our results, suggests distinct effects of race/ethnicity and place of residence (e.g., Asian Americans vs. Asians in Asia). Our findings also show that non-singleton birth is a robust protective factor (Figure 2). Twin-ship may offer a more supportive environment due to the close social contact with the co-twin postnatally (Pulkkinen et al., 2003).

Strengths and limitations
The main strengths of this study are its use of a large data set and the inclusion of a wide range of prenatal/perinatal factors. There are few studies in this area which possess both these methodological strengths. Second, use of elastic net regression and 5-fold validation to identify robust predictors of attention problems was novel, and appropriate for this set of correlated pre/perinatal predictors (Figure S6). Third, we used symptom scales as our outcome rather than diagnosis. While diagnosis is clinically informative, it ignores the extent of inter-individual variation and may overlook smaller pre-clinical effects. Finally, we stratified our analyses by social and economic factors, which tested generalizability of findings to subgroups of this population.
The primary limitation of this study is the retrospective reporting of pre/perinatal data and the reliance on parental report for exposure, outcome and covariate data. Such reports are subject to recall and rater bias. For instance, studies matching maternal retrospective reports with medical records show mismatch in rates of alcohol use and some complications such as nausea and proteinuria (Rice et al., 2007). A second limitation is the sampling bias associated with cohort studies. Missing data patterns were non-random, with those reporting on all variables more likely to have lower CBCL attention problems and fewer prenatal risk factors (Table S10). Third, some important pre/perinatal variables were not available in the data set (e.g., maternal stress). Fourth, CBCL attention problems did not discriminate between the different ADHD symptom domains, which could be differentially impacted by pre/perinatal factors. Finally, despite using machine learning approaches and elastic net feature selection, the model used is relatively simple. For instance, we did not test for interactions between pre/perinatal factors or account for clustering of participants within the 22 research sites. Once the predictive accuracy of an ADHD-prediction model is improved by future studies, the performance of various predictive modeling techniques should be compared (e.g., random forests, classification and regression trees).

Conclusion
We explained roughly 8% of the variance in age 9 ADHD-like symptoms using information available at most births. Almost half of the robust predictors identified were potentially preventable (maternal substance-use, anemia, UTI, young parental age at birth). Future studies will need to validate these results using prospectively collected data and try to improve prediction accuracy by comparing other, more complex, prediction techniques. These findings may inform interventions to prevent and minimize childhood ADHD symptoms, particularly in the United States.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0954579423000238
Acknowledgements. We gratefully acknowledge the contribution of participants and parents from the Adolescent Brain Cognitive Development (ABCD) study, as well as the researchers that conducted the research. We also thank the funding bodies that enabled this research. award numbers U01DA041022, U01DA041028, U01DA041048, U01DA041089, U01DA041106, U01DA041117, U01DA041120, U01DA041134, U01DA041148, U01DA041156, U01DA041174, U24DA041123, and U24DA041147. A full list of supporters is available at https://abcdstudy.org/nih-collaborators. None of the funding or supportive bodies of ABCD (NIH) take any responsibility for the views expressed or the outputs generated in this study.
Conflicts of interest. None.