Early environmental risk factors for neurodevelopmental disorders – a systematic review of twin and sibling studies

While neurodevelopmental disorders (NDDs) are highly heritable, several environmental risk factors have also been suggested. However, the role of familial confounding is unclear. To shed more light on this, we reviewed the evidence from twin and sibling studies. A systematic review was performed on case control and cohort studies including a twin or sibling within-pair comparison of neurodevelopmental outcomes, with environmental exposures until the sixth birthday. From 7,315 screened abstracts, 140 eligible articles were identified. After adjustment for familial confounding, advanced paternal age, low birth weight, birth defects, and perinatal hypoxia and respiratory stress were associated with autism spectrum disorder (ASD), and low birth weight, gestational age, and family income were associated with attention-deficit/hyperactivity disorder (ADHD), defined both categorically and dimensionally. Several previously suspected factors, including pregnancy-related factors, appeared to be explained by familial confounding. Most studies were conducted in North America and Scandinavia, pointing to a global research bias. Moreover, most studies focused on ASD and ADHD. This genetically informed review showed evidence for a range of environmental factors of potential causal significance in NDDs, but also points to a critical need for more genetically informed studies of good quality in the quest for the environmental causes of NDDs.

The causes of NDDs are multiple, both genetic and environmental (Martin, Taylor, & Lichtenstein, 2018; Taylor et al., 2019), but the exact causes driving atypical neurodevelopment remain poorly understood. Based on findings from twin and family studies, NDDs are considered highly heritable (Polderman et al., 2015; Posthuma & Polderman, 2013; Ronald & Hoekstra, 2011), with both common and rare genetic variants contributing to the phenotypes (Hansen & Rogers, 2013). While research has until recently focused mostly on genetic causes (Buxbaum & Hof, 2011; Demontis et al., 2019; Landrigan, Lambertini, & Birnbaum, 2012; Szatmari et al., 2007), heritability estimates leave room for environmental factors to play a role (Herbert, 2010; Pessah, Cherednichenko, & Lein, 2010; Shelton, Hertz-Picciotto, & Pessah, 2012; Zuk et al., 2012). In addition, for several NDDs, such as ASD and ADHD, clinical phenotypes, broader phenotypes, and traits of the conditions are continuously distributed in the general population, with overlapping etiologies and sources of variation (Martin et al., 2018). It is therefore important to examine NDD outcomes both categorically (diagnoses) and dimensionally (traits and symptoms), for two reasons. First, dimensional definitions may be more sensitive than categorical ones to subtle subclinical toxic effects; they may therefore enable more detailed exposure-response profiles and facilitate the testing of complex functional relationships between continuous behavioral measures and biological outcomes such as brain structure (Rauh & Margolis, 2016). Second, because the etiology of clinical phenotypes overlaps with that of subclinical phenotypes and condition traits, studying those traits may generate heuristic hypotheses to be tested in clinical samples.
Animal, human cell, and epidemiological studies suggest that a wide range of environmental risks affect neurodevelopment. Recently, prenatal maternal anemia has been associated with several NDDs, including intellectual disability (ID), ASD, and ADHD (Wiegersma, Dalman, Lee, Karlsson, & Gardner, 2019). In ASD, associations have been reported with advanced parental age, maternal valproate intake during pregnancy, toxic chemical exposure, maternal diabetes, enhanced steroidogenic activity, immune activation, possibly altered zinc-copper cycles, and treatment with selective serotonin reuptake inhibitors (SSRIs) during pregnancy (Bölte, Girdler, & Marschik, 2019). Environmental factors commonly linked to ADHD are food additives/diet, lead contamination, cigarette and alcohol exposure during pregnancy, and low birth weight (Banerjee, Middleton, & Faraone, 2007). For reading disabilities, Mascheretti, Andreola, Scaini, and Sulpizio (2018) found evidence that gestational age and birth weight are the most important pre- and perinatal risk factors, while findings for maternal cigarette smoking, family history of psychiatric and medical diseases, and risk of miscarriage were inconclusive. Prenatal alcohol consumption, diabetes, treatment with antidepressants, iodine or iron deficiency, and dietary fish, as well as postnatal depression, low birth weight, and neonatal problems, have all been linked to motor difficulties in childhood (Golding, Emmett, Iles-Caven, Steer, & Lingam, 2014). Pregnancy-related noxious exposures, particularly maternal smoking and prenatal life stressors, and lower birth weight may be more frequent in pregnancies of children who later develop Tourette's syndrome, and psychosocial stress influences tic severity (Hoekstra, Dietrich, Edwards, Elamin, & Martino, 2013).
With regard to developmental mechanisms, research from different disciplines has found alterations of key biological systems in NDDs, such as catecholaminergic imbalances, glutamatergic synapse function, chromatin remodelling, and ion channel pathways (Cristino et al., 2014; Geschwind & Levitt, 2007; Pinto et al., 2014). Changes to immunological, endocrinological, and gut-brain axis processes have also been suggested to be involved in causal pathways (Edmiston, Ashwood, & Van de Water, 2017; Kelly, Minuto, Cryan, Clarke, & Dinan, 2017).
Familial confounding is a major limitation of much of the current literature on environmental risk factors. Familial confounders are factors shared within a family, including both unmeasured shared environmental and genetic factors, that increase similarity between siblings. Although many of the above environmental factors have been shown to be associated with NDDs, many of the exposures are themselves, to a degree, heritable. Thus, it cannot be ruled out that the associations are driven by genetic links between exposure and outcome rather than by the environmental exposure itself. As discussed by van Dongen, Slagboom, Draisma, Martin, and Boomsma (2012) and D'Onofrio, Lahey, Turkheimer, and Lichtenstein (2013b), twin, sibling, and family studies, as compared to conventional case control studies, have the potential to disentangle the effects of an environmental exposure from genetic and unmeasured shared environmental factors. By comparing the risk of a given outcome in twins or siblings who are differentially exposed to a given factor, or conversely, comparing exposure across pairs who are discordant for the outcome, it is possible to adjust for many factors that are shared within the pairs. This has often been neglected in previous research on environmental factors in NDDs. Admittedly, confident causal inference requires far more than control for familial confounding (Hill, 1965; Sjölander & Zetterqvist, 2017). Still, this type of adjustment has proven highly useful in refuting proposed causal associations. For example, a meta-analysis by Mezzacappa et al. (2017) estimated the odds ratio (OR) for ASD to be 1.52 (95% CI, 1.09-2.12) for SSRI exposure during pregnancy. However, a later epidemiological study found that this association was to a large degree confounded by familial factors, since it was attenuated in a sibling comparison analysis.
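The logic of a sibling comparison for a dimensional outcome can be sketched as a within-family (fixed-effects) estimator: exposure and outcome are centred on each family's mean, so any factor that is constant within a family drops out. The example below is a minimal sketch under stated assumptions: the data are hypothetical and noise-free, the function names are illustrative, and it is not the model used by any particular included study.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_family_slope(records):
    """Slope after centring exposure and outcome on family means.

    records: iterable of (family_id, exposure, outcome) tuples.
    Anything constant within a family (shared genes, shared
    environment) is removed by the centring step.
    """
    groups = defaultdict(list)
    for fam, x, y in records:
        groups[fam].append((x, y))
    dx, dy = [], []
    for members in groups.values():
        mx = sum(x for x, _ in members) / len(members)
        my = sum(y for _, y in members) / len(members)
        for x, y in members:
            dx.append(x - mx)
            dy.append(y - my)
    return ols_slope(dx, dy)


# Hypothetical data: a family-level confounder c raises both exposure and
# a symptom score; the true within-family effect of exposure is 0.5.
TRUE_EFFECT = 0.5
records = []
for fam, c in enumerate([0.0, 1.0, 2.0, 3.0]):
    for d in (-0.5, 0.5):  # sibling-specific deviation in exposure
        x = c + d
        y = TRUE_EFFECT * x + 2.0 * c
        records.append((fam, x, y))

naive = ols_slope([x for _, x, _ in records], [y for _, _, y in records])
within = within_family_slope(records)
print(f"naive slope: {naive:.3f}, within-family slope: {within:.3f}")
```

In this constructed example the naive slope is inflated by the shared confounder, while the within-family slope recovers the assumed effect of 0.5. For categorical outcomes, the analogous approaches are typically conditional logistic regression or stratified Cox models.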
Likewise, regarding potential environmental risk factors for ADHD, a more recent review by Sciberras, Mulraney, Silva, and Coghill (2017) revealed a pattern indicating that the stronger the study design, especially with regard to genetic and familial confounders, the less likely it was to support an association between SSRI use in pregnancy and ADHD in offspring. Similarly, the strong association between smoking during pregnancy and ADHD seems to be better accounted for by genetic and familial factors than by a causal effect of smoking during pregnancy. This is key to understanding the rationale behind twin and sibling studies. Other analytical approaches assume that there are no alternative explanations for the associations among initial risks, mediating variables, and the outcome of interest (in this case, smoking during pregnancy and ADHD), although such explanations clearly exist. First, other environmental risks such as parental intellectual abilities, socioeconomic status (SES), and psychiatric problems also predict offspring ADHD; second, smoking during pregnancy is itself influenced by genetic factors (D'Onofrio et al., 2013b). The same holds true for many other environmental factors, which makes control for familial confounding crucial when trying to establish causal inference.
When comparing sibling and twin studies, within-pair comparisons among twins, in particular monozygotic (MZ) twins, offer the strongest basis for adjusting for familial confounding when studying environmental risks. Despite this, there are several valid reasons why sibling studies are sometimes preferred over twin studies. First, it is rarely possible to measure prenatal differences between twins sharing the same prenatal environment. Analyzing the within-pair association of a prenatal factor with a particular outcome would require separate prenatal exposure information for each twin, an often impossible demand. In some instances, as in the case of gestational age, there is no within-pair difference to measure. Therefore, for prenatal exposures, adjustment for familial confounding typically relies on comparing siblings from different pregnancies. Second, siblings are more common than twins, so sibling studies are easier to perform and larger cohorts can be collected. Third, replicating results in both twins and siblings indicates that findings from twin studies generalize beyond twins.
This systematic review spans exposures from pregnancy to early childhood, allowing investigation of the timing of potential environmental effects. Interestingly, studies have shown that the heritability of fetal growth rate changes across trimesters (Workalemahu et al., 2018) and that the heritability of autistic traits changes from childhood to early adulthood (Taylor, Gillberg, Lichtenstein, & Lundström, 2017). These examples point to the possibility that controlling for familial confounding may be differentially important at different stages of development.
The aim of this systematic review was to summarize the evidence from twin and family studies about the role of environmental risk factors for NDDs, defined both dimensionally and categorically, controlling for familial confounding, in order to inform researchers and funding agencies in both preclinical and applied areas of NDDs and to guide clinical management. The potential costs of environmental factors being incorrectly linked to NDDs, owing to a lack of control for familial confounding in research, include wasted public resources, unnecessary worry, misleading advice, and eroded public trust. The broad and systematic approach of this review, incorporating all NDDs according to the nomenclature of the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), allowed us to map a wide range of environmental factors postulated to be involved in their etiology and to identify factors that have not yet been sufficiently studied in relation to NDDs.

Method
This systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement (Moher, Liberati, Tetzlaff, & Altman, 2009). The protocol was registered in advance with PROSPERO (CRD42018079513).

Search strategy
A systematic literature search was performed by two librarians at Karolinska Institutet in October 2017 in the following databases: Medline (Ovid), PsycInfo (Ovid), Embase, Web of Science Core Collection and Cochrane Library. The search was updated in March 2019 for recently published articles. The complete search strategy for each database is available in Supplementary Appendix 1.

Eligibility
Study design: Case control and cohort studies including a twin or sibling comparison. Case control studies should include twins or siblings discordant for one or more NDDs, with the unaffected or less affected twin or sibling as the comparator. Cohort studies should include twins or siblings discordant for exposure, with one or more NDDs as the outcome.
Exposure: Any specified environmental factor, with exposure time up to the age of 5 years. Only studies with a specified environmental factor were included.
Outcome: One or more of the NDDs included in DSM-5 (ASD, ADHD, intellectual disability [ID], communication disorders [CD], specific learning disorder [SLD], developmental coordination disorder [DCD], and tic disorders [TD]). The conditions could be reported either categorically (diagnoses) or dimensionally (symptom or trait severity). Categorical outcomes were defined according to DSM-III, DSM-IV, DSM-5, International Classification of Diseases (ICD-9), ICD-10, or earlier diagnostic practices, and based on clinical assessment, medical registries, or diagnostic cut-offs on diagnostic instruments (APA, 1987, 2000, 2013; NCHS, 1990; WHO, 1992). Dimensional outcomes were defined using disorder-specific scales, or scales measuring constructs closely related to the respective conditions. Eligible studies had to report the within-pair association of the exposure with one or more NDDs, or with symptom or trait severity. Studies reporting only on heritability in general terms were excluded.
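For the case control design described above, with one affected and one unaffected sibling per pair, a standard within-pair estimate is the conditional (matched-pair) odds ratio, which uses only the pairs that are discordant for exposure. The sketch below assumes 1:1 matching and a Wald interval on the log scale; the counts and the function name are hypothetical, and the included studies may have used other models (for example, conditional logistic regression with covariates).

```python
import math


def matched_pair_or(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched case-sibling pairs.

    pairs: iterable of (case_exposed, sibling_exposed) booleans.
    Concordant pairs carry no information about the within-pair
    association and are ignored.
    """
    b = sum(1 for case, sib in pairs if case and not sib)  # only case exposed
    c = sum(1 for case, sib in pairs if sib and not case)  # only sibling exposed
    if b == 0 or c == 0:
        raise ValueError("need discordant pairs in both directions")
    odds_ratio = b / c
    se = math.sqrt(1 / b + 1 / c)  # standard error of the log matched-pair OR
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 pairs where only the case was exposed,
# 15 where only the sibling was exposed, and 55 concordant pairs.
pairs = ([(True, False)] * 30 + [(False, True)] * 15
         + [(True, True)] * 20 + [(False, False)] * 35)
or_, lower, upper = matched_pair_or(pairs)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because concordant pairs drop out, a large sibling sample can still yield a wide interval if few pairs differ in exposure, which is one reason within-pair estimates are often less precise than population-level ones.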
Publication type: Peer-reviewed articles published in English.

Study selection and data extraction
After removal of duplicates, the titles and abstracts of the studies retrieved from the search were screened using EndNote X8 and X9. The titles and abstracts of all references were screened independently by two reviewers. At this stage, a publication was excluded only if both reviewers agreed that it clearly did not meet the eligibility criteria. Publications judged to be of potential relevance by at least one reviewer were obtained in full text and assessed for eligibility independently by two reviewers. Disagreements at this stage were resolved by consensus, with a third reviewer consulted if necessary. The main study characteristics and results were extracted independently by two reviewers, using a data extraction sheet that was created, pilot tested, and modified based on the Cochrane EPOC Data Collection Checklist. Discrepancies were resolved by consensus. Items extracted included: author; publication year; country; study design; study cohort; sample size; sex; age; sibling or twin control; disorder(s) studied; environmental factor(s) studied; study methodology; recruitment method; completion rates; missing data; outcomes and type of measures; and the main results.

Risk of bias assessment
The overall risk of bias of each study was rated according to the Newcastle-Ottawa Scale (NOS) for case control and cohort studies (Wells et al., 2019). Three quality domains (selection, comparability, and exposure for case control studies or outcome for cohort studies) and additional subdomains according to the NOS were assessed. Subdomains for case control studies were: adequacy of the case definition, representativeness of the cases, selection of controls, definition of controls, comparability of cases and controls on the basis of the design or analysis, ascertainment of exposure, same method of exposure ascertainment in cases and controls, and nonresponse rate. Subdomains for cohort studies were: representativeness of the exposed cohort, selection of the nonexposed cohort, ascertainment of exposure, demonstration that the outcome of interest was not present at the start of the study, comparability of cohorts on the basis of the design or analysis, assessment of outcome, follow-up long enough for outcomes to occur, and adequacy of follow-up of cohorts. NOS scores range from 0 to 9, with one point given for each subdomain reaching a predefined quality threshold (except for comparability, for which a maximum of 2 points could be allotted). The scale was modified for twin and sibling studies so that only twin studies could reach the maximum of 2 points for comparability; consequently, sibling studies could not score more than 8 points. Studies with a score of 2 points or below were excluded. The quality of each study was assessed independently by two reviewers. If a discrepancy could not be resolved by consensus, a third reviewer was consulted.
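The modified scoring rule can be written out as a small function. This is a sketch under assumptions: the subdomain keys are hypothetical labels paraphrasing the NOS items listed above, and the reviewers' judgements for each subdomain are taken as given inputs rather than derived.

```python
# Hypothetical keys paraphrasing the NOS subdomains (comparability scored separately).
CASE_CONTROL_ITEMS = (
    "adequate_case_definition", "representative_cases", "selection_of_controls",
    "definition_of_controls", "ascertainment_of_exposure",
    "same_ascertainment_method", "nonresponse_rate",
)
COHORT_ITEMS = (
    "representative_exposed_cohort", "selection_of_nonexposed",
    "ascertainment_of_exposure", "outcome_absent_at_start",
    "assessment_of_outcome", "follow_up_long_enough", "adequate_follow_up",
)


def modified_nos_score(items_met, items, comparability_points, pair_type):
    """Score a study on the NOS as modified for twin and sibling designs.

    items_met: set of subdomain keys judged to reach the quality threshold.
    items: CASE_CONTROL_ITEMS or COHORT_ITEMS (one point each).
    comparability_points: reviewer rating of 0-2 for comparability.
    pair_type: "twin" or "sibling"; only twin studies may receive the
        second comparability point, so sibling studies top out at 8.
    """
    points = sum(1 for item in items if item in items_met)
    cap = 2 if pair_type == "twin" else 1
    return points + min(max(comparability_points, 0), cap)


def excluded(score):
    """Studies scoring 2 points or below were excluded from the review."""
    return score <= 2


# A sibling cohort study meeting every subdomain still scores 8.
print(modified_nos_score(set(COHORT_ITEMS), COHORT_ITEMS, 2, "sibling"))
```

Capping comparability rather than subtracting a point keeps the twin and sibling scales directly comparable on all other subdomains.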

Synthesis
Identified environmental factors were sorted chronologically (prenatal; perinatal/neonatal; and infancy/childhood) and grouped by category for readability. For studies with categorical NDD outcomes, the relevant estimated association(s) were extracted and sorted by the type of measure of association used (hazard ratios; odds ratios; relative risks; and other). When no estimated association was reported, available data were used to calculate one where possible. Since studies with dimensional measures routinely reported several estimated associations, these studies were evaluated to determine whether the overall findings signaled an association (yes; possibly; or no).
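Because estimates were reported in several formats, a common step when harmonizing them is to move to the log scale and recover the standard error from the reported 95% CI. The source does not describe doing this; the sketch below assumes a Wald interval that is symmetric on the log scale, the function names are hypothetical, and the input is the paternal-age HR of 1.39 (95% CI, 1.01-1.90) reported in the ASD results.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of a log ratio, assuming a Wald CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_z_and_p(estimate, lower, upper, z=1.96):
    """Wald z statistic and approximate two-sided p-value for a ratio and its 95% CI."""
    se = log_se_from_ci(lower, upper, z)
    z_stat = math.log(estimate) / se
    # For a standard normal variable, P(|Z| > z) = erfc(z / sqrt(2)).
    return z_stat, math.erfc(abs(z_stat) / math.sqrt(2))


# HR 1.39 (95% CI, 1.01-1.90) for advanced paternal age and ASD.
se = log_se_from_ci(1.01, 1.90)
z_stat, p = wald_z_and_p(1.39, 1.01, 1.90)
print(f"SE(log HR) = {se:.3f}, z = {z_stat:.2f}, p = {p:.3f}")
```

Rounding of published intervals means the recovered values are approximations; they are mainly useful for putting heterogeneous estimates on a common scale.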
A narrative synthesis of the eligible studies for each NDD was performed, with separate presentations of studies with categorical and dimensional outcomes. For environmental factors represented in more than one study of a specific condition, the importance of the exposure across studies was judged on the basis of the estimated associations and the risk of bias in the respective studies. Where appropriate, meta-analyses of the results on specific environmental factors and conditions were conducted, unless precluded by heterogeneity in the included studies' exposures, study characteristics, or data presentation (Higgins & Green, 2011).
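The source does not specify the meta-analytic model. As a sketch only, the function below assumes inverse-variance pooling of log ratios with the DerSimonian-Laird estimate of between-study variance (τ²), a common random-effects choice; the function name and the example inputs are hypothetical and are not results from this review.

```python
import math


def pool_dersimonian_laird(estimates, z=1.96):
    """Random-effects pooling of log ratio estimates (DerSimonian-Laird).

    estimates: list of (log_estimate, standard_error) tuples.
    Returns (pooled ratio, lower CI, upper CI, tau2).
    """
    w = [1 / se ** 2 for _, se in estimates]  # inverse-variance weights
    y = [est for est, _ in estimates]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for _, se in estimates]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled), math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled), tau2)


# Hypothetical within-pair ORs with their standard errors on the log scale.
example = [(math.log(1.5), 0.20), (math.log(2.0), 0.25), (math.log(1.2), 0.15)]
print(pool_dersimonian_laird(example))
```

When τ² is estimated as zero the result reduces to a fixed-effect inverse-variance estimate, so the same function covers both cases.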

Study selection
A total of 140 studies were identified for inclusion (Figure 1). The search provided 7,315 unique citations, and two additional studies were identified from reference lists in published articles. After review of the abstracts, 7,061 citations were discarded in the preliminary screening, mainly because they did not match the defined study design. The remaining 254 citations were examined in full text; of these, 114 did not meet the eligibility criteria and were excluded (see Supplementary Appendix 2 available online). All included studies reported on within-pair associations, referred to as "association" below.

Study characteristics
In total, 58 studies (22 cohort studies and 36 case control studies) on ASD were included (see Table 1 for full list of references). All studies used a categorical definition of ASD, except for one study that used a dimensional outcome (Ronald et al., 2010) and one that used both. The studies were published between 1971 and 2019, with a steady increase from the year 2000. The studies were predominantly conducted in Scandinavia (k = 25) and North America (k = 18). The majority were sibling studies (k = 51), and seven were twin studies. The number of cases in the case control studies ranged from 5 to 1,133, with a median of 72, while the number of analyzed siblings or twins in the cohort studies ranged from 68 to 2,665,666, with a median of 921. Prospectively collected data were used in all but two of the cohort studies and in approximately half of the case control studies. Information on age at diagnosis for the sibling subsamples was lacking in all but three of the cohort studies and in the majority of the case control studies. When considering the general methodology of the complete samples, however, the risk of misclassification bias due to age at diagnosis was deemed low. The sex distribution among differently exposed siblings or twins was not reported in the majority of the cohort studies, while in the case control studies the cases were predominantly male. The NOS score for study quality ranged from three to nine, with more consistently high scores for cohort studies. Typical reasons for downgrading study quality were ascertainment of exposure and definition of controls. See Table 1.

Prenatal exposure
The included studies examined a total of 42 prenatal exposures, 18 of which were investigated in more than one study (Table 2). Three factors had predominantly positive findings. Advanced paternal age was associated with ASD in three large population-based sibling cohort studies, with HR (95% CI) between 1.39 (1.01-1.90) and 3.45 (1.62-7.33) and F (3, 631) = 2.40, p = .049 (D'Onofrio et al., 2014b; Hultman et al., 2011; Parner et al., 2012), while a small sibling case control study with a higher risk of bias failed to replicate this finding (Hadjkacem et al., 2016). Similarly, two population-based twin cohort studies (Losh et al., 2012; Willfors et al., 2017) and two population-based sibling cohort studies (Class et al., 2014; Pettersson et al., 2019) found associations for low birth weight, with HR 2.44 (95% CI, 1.99-2.97), OR (95% CI) between 1.38 (1.31-1.44) and 3.25 (1.47-7.18), and Z = −2.20, p = .028, while three sibling case control studies with a higher risk of bias did not (Chien et al., 2018; Mason-Brothers et al., 1990; Oerlemans et al., 2016). A similar pattern was seen for birth defects, where two large population-based studies, one cohort and one case control, found an association with HR 1.3 (95% CI, 1.0-1.7) and OR 1.5 (95% CI, 1.0-2.3) (Dawson et al., 2009; Tillman et al., 2018), while one sibling case control study with a higher risk of bias did not (Mason-Brothers et al., 1990). Mixed findings were reported for antidepressant medication during pregnancy (k = 5 studies, only one of which reported a positive association), advanced maternal age (k = 4), rubella infection during pregnancy (k = 2), birth order (k = 2), gestational weight gain (k = 2), stress during pregnancy (k = 2), and composite scores of prenatal complications (k = 9).
No statistically significant within-pair associations were reported for maternal uterine bleeding (k = 4), maternal infection during pregnancy (k = 3), season of birth (k = 3), preeclampsia (k = 3), prenatal testosterone level (k = 2), urinary tract infection (k = 2), gestational diabetes (k = 2), and pre-pregnancy body mass index (k = 2). All these studies reported small effect sizes, except for the smaller of the two case control studies on urinary tract infection, which reported a medium effect size (Hadjkacem et al., 2016). An additional 24 factors were investigated in single studies. These studies found associations of ASD with measles and mumps infections during pregnancy, an interpregnancy interval of more than one year, in utero metal uptake (lead and manganese), low serum vitamin D level at birth, and parity greater than two.

Perinatal and neonatal exposure
Out of 19 perinatal and neonatal exposures, 17 were investigated in more than one study (Table 3). Twelve case control studies investigated composite scores of complications occurring during the neonatal period or earlier, of which nine reported an association. Predominantly positive findings were observed for hypoxia and respiratory stress. Hypoxia was measured in two case control studies, of twins (N = 274) and siblings (N = 941), respectively, both of which showed a significant association with ASD, with OR (95% CI) of 1.71 (1.08-2.71) and 1.81 (1.21-2.69) (Froehlich-Santino et al., 2014; Glasson et al., 2004). Similarly, three out of four case control studies on respiratory distress found an association, both in twins (N = 274) (Froehlich-Santino et al., 2014) and in siblings (N = 1,125) (Glasson et al., 2004; Hadjkacem et al., 2016; Piven et al., 1993). Small effect sizes, with OR (95% CI) between 1.64 (1.15-2.34) and 2.11 (1.27-3.51), were reported in all studies but one, in which the confidence interval was wide (Hadjkacem et al., 2016). Mixed findings were reported for preterm birth (k = 5 studies; one large population-based sibling cohort study reported an HR of 3.2 [95% CI, 2.6-4.0]), labor induction (k = 4), jaundice (k = 4), and low Apgar scores (k = 3). No statistically significant within-pair associations were reported for elective (k = 6) and emergency (k = 3) cesarean section, general anesthesia during labor (k = 3), breech presentation (k = 3), gestation of more than 42 weeks (k = 2), difficult labor (k = 2), umbilical cord around the neck (k = 2), and resuscitation (k = 2). All these studies reported small effect sizes, except for one small sibling case control study on difficult labor that reported a medium effect size (Hadjkacem et al., 2016). Single studies found associations of ASD with incubation and neonatal respiratory infection.
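The text classifies odds ratios as small or medium effect sizes without stating the conversion used. One common approach, assumed here rather than taken from the source, converts an OR to Cohen's d with Chinn's (2000) formula d = ln(OR)·√3/π and applies Cohen's conventional thresholds of 0.2, 0.5, and 0.8. Under that assumption, the ORs of 1.64 and 2.11 correspond to d ≈ 0.27 and d ≈ 0.41, both in the small range; the function names are hypothetical.

```python
import math


def or_to_cohens_d(odds_ratio):
    """Chinn (2000) conversion: d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def effect_size_label(odds_ratio):
    """Label |d| with Cohen's conventional 0.2 / 0.5 / 0.8 thresholds."""
    d = abs(or_to_cohens_d(odds_ratio))  # protective ORs (< 1) are labelled by magnitude
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


# Boundary odds ratios from the perinatal ASD results.
for odds_ratio in (1.64, 2.11):
    print(odds_ratio, round(or_to_cohens_d(odds_ratio), 2), effect_size_label(odds_ratio))
```

Under this convention an OR needs to exceed roughly 2.5 to count as a medium effect, which may explain why ORs around 2 are still described as small.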

Infancy and childhood exposure
Nine types of exposures in infancy and childhood were investigated (Table 3). Breastfeeding (k = 3 studies) and early exposure to antibiotics (k = 2) were the only factors included in more than one study. The studies on breastfeeding were small, were assessed as having a high risk of bias, and showed mixed results with wide confidence intervals (Brown et al., 2014; Burd et al., 1988; Manohar et al., 2018). No statistically significant within-pair association was observed for early exposure to antibiotics, with both sibling studies reporting small effect sizes (Grossi et al., 2018; Hamad et al., 2018). Single studies reported significant associations with recurrent infections in childhood, dysregulation during the first year of life, and medical events during the first 5 years of childhood.

Study characteristics
A total of 69 studies (53 cohort studies and 16 case control studies) on ADHD were included (see Tables 4 and 5 for full list of references). A categorical definition of ADHD was used in 30 studies, while 36 studies used dimensional outcomes and three studies applied both (Altink et al., 2008; Chatterji et al., 2014; Eilertsen et al., 2017). The studies were published between 1987 and 2019. As for ASD, the number of publications increased considerably during the last decade. The studies originated mainly from Scandinavia (k = 36) and North America (k = 19). A twin design was used in 13 of the studies, while the remaining 57 studies used siblings as controls. The number of cases in the case control studies ranged from 16 to 3,447, with a median of 233.5. In the cohort studies, the number of analyzed siblings or twins ranged from 28 to 2,665,666, with a median of 12,674. Prospectively collected data were used in 46 cohort studies and in four of the case control studies. Information on age at diagnosis for the sibling subsamples was lacking in all but two of the cohort studies on ADHD diagnosis. As for the ASD studies, when considering the general methodology of the whole samples, the risk of misclassification bias due to age at diagnosis was deemed low. Of the 39 studies using a dimensional outcome, 11 were performed on participants aged five years or younger. Several case control studies included a larger proportion of males among the cases than among the controls, and the sex distribution was often insufficiently reported. The NOS quality scores of the studies ranged from 3 to 9, with low scores more frequently seen in the case control studies. Common reasons for reduced scores were ascertainment of exposure and definition of controls. See Tables 4 and 5.

Prenatal exposure
The studies included 19 prenatal exposures, of which 10 were investigated in more than one study (Tables 6 and 7). Predominantly positive associations were observed for fetal growth/birth weight. For a categorical outcome, two large population-based sibling studies (Class et al., 2014; Pettersson et al., 2019) and a small twin study (N = 38) (Lehn et al., 2007) showed associations, including a reported HR of 2.44 (95% CI, 1.99-2.97), an OR of 2.36 (95% CI, 2.27-2.43), and t (18) = −1.99, p (one-tailed) = .031, while a study with a higher risk of bias reported no association (N = 1,464) (Chatterji et al., 2014). The same pattern was seen for a dimensional outcome, with two large population-based sibling studies (Jackson & Beaver, 2015; Lim et al., 2018) and two twin studies (N = 8,594) (Groen-Blokhuis et al., 2011; Hultman et al., 2007; Pettersson et al., 2015; Tore et al., 2018) reporting associations, while studies with a higher risk of bias reported no associations or mixed results (N = 2,581) (Asbury et al., 2006; Mascheretti et al., 2017). Mixed results were seen for smoking (k = 11 studies) and alcohol use (k = 3) during pregnancy, parental age (k = 6), and maternal depression (k = 2). Smoking was frequently studied, and, interestingly, somewhat different patterns emerged depending on the type of outcome. For a diagnosis of ADHD, predominantly no statistically significant within-pair associations were seen, with three large population-based sibling cohort studies (Obel et al., …) showing no association and one sibling case control study with a higher risk of bias reporting an association (N = 906) (Altink et al., 2008). For dimensional measures of ADHD symptomatology the results were less clear: one large population-based sibling cohort study (Gustavson et al., 2017) and two sibling case control studies (D'Onofrio et al., 2008; Ellingson et al., 2014) showed no association, two studies with a higher risk of bias reported an association (Altink et al., 2008; Mascheretti et al., 2017), and two with a lower risk of bias reported mixed results (Marceau et al., 2017). A population-based sibling cohort study found an association between alcohol use during pregnancy and a dimensional but not a categorical ADHD outcome, while one population-based case control study (D'Onofrio et al., 2007) and one cohort study (Ichikawa et al., 2018), both on siblings, found no association using dimensional measures. For parental age, one sibling case control study found no association with a categorical ADHD outcome (N = 476) (Oerlemans et al., 2016); one large population-based sibling cohort study found teenage birth to be protective, HR 0.81 (95% CI, 0.71-0.94), while another did not, HR 1.28 (95% CI, 0.94-1.73) (Chang et al., 2014; Hvolgaard Mikkelsen, Olsen, Bech, & Obel, 2017); and one sibling case control study with a higher risk of bias found an association with a small effect size between advanced maternal age and ADHD (N = 108) (Mimouni-Bloch et al., 2013).
Furthermore, one population-based sibling cohort study found a strong association between advanced paternal age and ADHD diagnosis, HR 13.13 (95% CI, 6.85-25.16) (D'Onofrio et al., 2014b). Two sibling cohort studies with moderate risk of bias, including participants aged five years or younger, showed conflicting results regarding associations between maternal depression and dimensional outcomes of ADHD (…; Nulman et al., 2015). No statistically significant within-pair associations were reported for antidepressive medication during pregnancy (k = 3 categorical and 2 dimensional studies), maternal infection (k = 2 categorical and 1 dimensional study), stress or adverse family life events during pregnancy (k = 2 categorical and 1 dimensional study), maternal weight (k = 2 categorical studies), and birth order (k = 2 categorical studies). All these studies reported small effect sizes, except for one small sibling case control study on stress during pregnancy, which reported a medium effect size, with a wide confidence interval, on ADHD diagnosis (Grizenko et al., 2012). An additional 10 environmental factors were investigated in single studies. These studies suggested associations of head circumference at birth and orofacial clefts with ADHD diagnosis, and of paracetamol exposure and possibly history of miscarriage with ADHD symptoms.
Perinatal and neonatal exposure
Out of 11 perinatal and neonatal exposures, the only factors included in more than one study were mode of delivery and gestational age (Tables 6 and 7). Regarding gestational age, two large sibling cohort studies found associations: one with a categorical outcome, HR 2.3 (95% CI, 2.0-2.8), and one with a dimensional outcome (Ask et al., 2018;D'Onofrio et al., 2013a). For mode of delivery, all studies used a categorical outcome of ADHD, and the results were mixed depending on the specific mode. One large population-based sibling cohort study showed an association with emergency cesarean section, HR 1.13 (95% CI, 1.01-1.26), but not with elective cesarean section or assisted vaginal delivery (Curran et al., 2016), while another large population-based sibling cohort study found no statistically significant within-pair association with either form of cesarean section (Axelsson et al., 2019). A small twin study with higher risk of bias also found no association (N = 32) (Pearsall-Jones et al., 2008). A single sibling study found associations of ADHD diagnosis with a composite score of pre-, peri-, and neonatal complications (Ben Amor et al., 2005). Regarding dimensional outcomes, three single studies found associations of attention problems with heart surgery, hypothyroidism, and neuroblastoma, respectively, and one study found an association between higher levels of phenylalanine exposure and executive functions.

Infancy and childhood exposure
Twelve different exposures in infancy and early childhood were investigated (Tables 6 and 7). Breastfeeding (k = 2 studies), low income or transient income decline (k = 2), meningitis (k = 2), and parenting (k = 2, based on the same cohort) were examined in more than one study. Positive associations were found for low income or transient income decline, with one large cohort study regarding ADHD diagnosis, HR 1.37 (95% CI, 1.07-1.75), and one large cohort study regarding dimensionally assessed externalizing behaviors (Ramanathan et al., 2017). Regarding breastfeeding, one sibling case control study with high risk of bias reported an association between lack of breastfeeding at 3 months and a categorical outcome of ADHD (N = 108) (Mimouni-Bloch et al., 2013), while another sibling case control study showed no association with ADHD symptoms (Mascheretti et al., 2017). Mixed results were found for meningitis and parenting, with all studies using dimensional outcomes. Single studies reported associations of ADHD diagnosis with parental divorce and maternal depression.

Intellectual disability
Study characteristics
A total of 26 studies (21 cohort studies and five case control studies) on ID or a dimensional measure of IQ were identified (see Tables 8 and 9). … included analyses based on IQ scores, and one study used both (Petik et al., 2012). The studies were published between 1965 and 2019 and were predominantly from North America (k = 16). A twin design was used in eight of the studies. The number of cases in the case control studies ranged from 49 to 3,296. In the cohort studies, the number of analyzed siblings or twins ranged from 24 to 20,471, with a median of 73. The data were prospectively collected in all of the studies. Age at assessment was reported in all but six of the studies. When reported, the sex distribution was less skewed than in the ASD and ADHD studies. The NOS scores ranged from 5 to 9, with the most common reason for downgrading being representativeness of the exposed cohort. See Tables 8 and 9.
Prenatal exposure
Seven prenatal exposures were identified (Tables 10 and 11). Fetal growth was investigated in six studies, with results that were consistent within, but differed between, studies of ID and studies of IQ. Two twin studies of high quality (N = 248) and a case control study (N = 1,464) with higher risk of bias showed no statistically significant within-pair association with a diagnosis of ID (Chatterji et al., 2014;Monset-Couchard et al., 2004;Steingass et al., 2013). For IQ, on the other hand, two twin studies (N = 144) and one sibling cohort study (N = 50) found associations (Bellido-González et al., 2007;Churchill, 1965;Kilbride et al., 2004). Suicide attempts with Tardyl during pregnancy were associated with ID and IQ in one sibling study; one large population-based sibling case control study suggested an increased risk for ID linked to nonoptimal gestational duration; and one large population-based sibling cohort study reported an increased risk for ID linked to orofacial clefts.
Perinatal, neonatal, infancy, and childhood exposure
Seven perinatal and neonatal exposures were investigated in single studies, of which not being breastfed on discharge from a special care unit was associated with ID (Tables 10 and 11). Seven different exposures in infancy and childhood were investigated, three of which were examined in more than one study, all with a dimensional outcome of IQ. Two small sibling cohort studies (N = 292) found an association between congenital hypothyroidism and IQ (Oerbeck et al., 2003;Rovet, 1986), while three cohort studies showed mixed results regarding malnourishment (Beardslee et al., 1982;Klein et al., 1975;Lloyd-Still et al., 1974), and two cohort studies found mixed results for congenital heart disease surgery (Ellerbeck et al., 1998;Schultz et al., 2017).

Developmental coordination disorder
Study characteristics
A total of 13 relevant studies (12 cohort studies and one case control study) were found for DCD (see Tables 8 and 9 for full list of references). A categorical definition was used in the case control study (Pearsall-Jones et al., 2008), while the cohort studies used dimensional outcomes of motor skills. The studies were published between 1974 and 2017 and were predominantly from North America (k = 8). A twin design was used in six of the studies. The case control study included 16 cases. In the cohort studies, the number of analyzed siblings or twins ranged from 28 to 3,590, with a median of 67. The data were prospectively collected in all the studies but one (Pearsall-Jones et al., 2008). Age at assessment was reported in all but two of the studies. When reported, no pattern of skewness in the sex distribution could be seen. The NOS scores ranged from 4 to 9, with the most common reasons for downgrading being representativeness of the exposed cohort and adequacy of follow-up of cohorts. See Tables 8 and 9.

Exposures
Twelve different exposures were identified for DCD or motor skills (see Tables 10 and 11 for full list of references). They were too few to be sorted chronologically. Fetal growth was found to be associated with motor skills in two twin cohort studies (N = 116) and one sibling cohort study (N = 50) (Kilbride et al., 2004;Monset-Couchard et al., 2004;Ylitalo et al., 1988). Two small sibling cohort studies (N = 292) found an association between congenital hypothyroidism and motor skills (Oerbeck et al., 2003;Rovet, 1986). Perinatal hypoxic risk was not significantly associated with motor skills in two studies from the same twin cohort (N = 56) (Raz et al., 1996, 1998). A single population-based cohort study showed an association with maternal paracetamol use during pregnancy (Brandlistuen et al., 2013), and one twin cohort study found an association with congenital heart disease surgery (Schultz et al., 2017), both using dimensional outcomes for motor skills. No other statistically significant within-pair associations were found.

Communication disorders
Study characteristics
Eight eligible studies (seven cohort studies and one case control study) were identified for CD (see Tables 8 and 9 for full list of references). A categorical definition of communication disorder was used in one study (Tillman et al., 2018), while the remaining studies used dimensional outcomes of language development. The studies were published between 1986 and 2018 and were predominantly from North America (k = 5). A twin design was used in two of the studies (Bishop, 1997;Schultz et al., 2017). There were 19 cases in the case control study (Bishop, 1997). In the cohort studies, the number of analyzed siblings or twins ranged from 28 to 16,275, with a median of 90. The data were prospectively collected in all the studies. Age at assessment was reported in all but two of the studies. When reported, no pattern of skewness in the sex distribution could be seen. The NOS scores ranged from 5 to 8, with the most common reason for downgrading being representativeness of the exposed cohort. See Tables 8 and 9.

Exposures
Seven different exposures were identified for CD or language skills (Tables 10 and 11). They were too few to be sorted chronologically. Congenital hypothyroidism was linked to lower language skills in one sibling cohort study (N = 202) (Rovet, 1986), while another sibling cohort study showed an inconsistent and weak association (N = 90) (Oerbeck et al., 2003). A single large population-based cohort study suggested that orofacial clefts were associated with a categorical outcome of CD (Tillman et al., 2018). A possible association was also observed for fetal growth in preterm infants (Kilbride et al., 2004). No other statistically significant within-pair associations were found.

Tic disorders
Study characteristics
Two studies were identified for TD (Tables 8 and 9). One of these was a large-scale Swedish cohort study based on registry data on siblings from the general population (Brander et al., 2017). The other was a retrospective case control study from the USA based on 16 MZ twin pairs (Hyde et al., 1992). Both studies had a NOS quality score of 7.

Exposures
Seven different exposures were identified for TD (Tables 10 and 11). They were too few to be sorted chronologically. Birth weight, the only exposure examined in both studies, was associated both with a diagnosis of tic disorder, HR 1.46 (95% CI, 1.06-2.01), and with symptom severity. No other statistically significant within-pair associations were found.

Specific learning disorder
No relevant studies were identified.

Discussion
Twin and sibling studies can help disentangle genetic and environmental contributions to the pathways underlying NDDs. In the current systematic review, we found evidence, beyond familial confounding, that advanced paternal age, low birth weight, birth defects, and perinatal hypoxia and respiratory stress are consistently associated with a diagnosis of ASD. We also found evidence that low birth weight, gestational age, and low family income or transient income decline during childhood are associated with ADHD, both categorically and dimensionally. There was some evidence for congenital hypothyroidism being associated with lower IQ, low motor skills, and possibly low language skills, but our confidence in these results is limited due to a higher risk of bias. While some studies suggested that low birth weight is associated with TD, tic symptom severity, and lower IQ, there was no association with a diagnosis of ID. Furthermore, we found no evidence that maternal uterine bleeding, maternal infection during pregnancy, season of birth, preeclampsia, prenatal testosterone level, urinary tract infection during pregnancy, gestational diabetes, prepregnancy body mass index, elective and emergency cesarean section, general anesthesia during labor, breech presentation, gestation longer than 42 weeks, difficult labor, umbilical cord around neck, resuscitation, and early exposure to antibiotics in childhood are associated with a diagnosis of ASD when familial confounding is taken into account; no evidence that antidepressive medication, maternal infection, and stress or adverse family life events during pregnancy are associated with ADHD defined both categorically and dimensionally, or that maternal weight, smoking during pregnancy, and birth order are associated with a diagnosis of ADHD; and no evidence that perinatal hypoxic risk is associated with low motor skills, when controlled for familial confounding. It is important to keep in mind that, in general, absence of evidence is not the same thing as evidence that no association exists. This is especially true when the empirical evidence is scarce, due to too few studies and/or small sample sizes. Regarding the associations of maternal uterine bleeding, preeclampsia, gestational diabetes, pre-pregnancy body mass index, and elective and emergency cesarean section with ASD, our results point in the direction of no association beyond familial confounding. The same is the case for the associations of antidepressive medication, maternal infection, maternal weight, and maternal smoking during pregnancy with a diagnosis of ADHD. For the remaining factors, no clear conclusion can be drawn. The most extensively studied factors with conflicting findings are the associations between ASD and antidepressive medication during pregnancy, advanced maternal age, preterm birth, labor induction, and neonatal jaundice, and the associations between ADHD, both categorically and dimensionally, and alcohol use during pregnancy and parental age. We found cross-disorder associations of low birth weight with categorical outcomes (ASD, ADHD, and TD), and cross-dimensional associations of congenital hypothyroidism (lower IQ, low motor skills, and possibly low language skills).
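The caution about absence of evidence can be made concrete for the common 1:1 sibling case control comparison. The sketch below is a minimal illustration, assuming a binary exposure and exactly one affected and one unaffected sibling per family: the within-pair (conditional) odds ratio uses only exposure-discordant pairs and equals the ratio of the two discordant counts, with an approximate 95% CI based on the standard error of the log odds ratio, sqrt(1/b + 1/c). The helper name `matched_pair_or` and all counts are hypothetical and are not data or methods taken from any reviewed study, several of which may have used other estimators (for example, conditional logistic regression with covariates or stratified Cox models).

```python
import math


def matched_pair_or(case_only_exposed: int, control_only_exposed: int, z: float = 1.96):
    """Within-pair odds ratio for 1:1 sibling case-control pairs (hypothetical helper).

    case_only_exposed:    pairs in which only the affected sibling was exposed
    control_only_exposed: pairs in which only the unaffected sibling was exposed
    Exposure-concordant pairs carry no within-pair information and are not needed.
    """
    if case_only_exposed <= 0 or control_only_exposed <= 0:
        raise ValueError("both discordant-pair counts must be positive")
    odds_ratio = case_only_exposed / control_only_exposed
    # Approximate standard error of log(OR) for matched pairs: sqrt(1/b + 1/c)
    se_log_or = math.sqrt(1 / case_only_exposed + 1 / control_only_exposed)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: same point estimate, tenfold difference in informative pairs
for b, c in [(30, 20), (300, 200)]:
    or_, lo, hi = matched_pair_or(b, c)
    print(f"{b + c} discordant pairs: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# 50 discordant pairs: OR 1.50 (95% CI 0.85-2.64)
# 500 discordant pairs: OR 1.50 (95% CI 1.25-1.79)
```

With the same point estimate, 50 informative pairs give an interval that includes 1, whereas 500 do not; a nonsignificant within-pair estimate from few discordant pairs is therefore weak evidence of no association.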
With familial confounding being controlled for, the findings of the current review may point to several possible mechanisms underlying the associations between NDDs and environmental factors. For ASD, it has been shown that the father's age at conception correlates with the number of de novo mutations in his children (Kong et al., 2012). De novo mutations are in turn linked to ASD, among other conditions, suggesting a possible genetic pathway (Neale et al., 2012;O'Roak et al., 2012;Sanders et al., 2012). For ADHD, pathways have been hypothesized to explain the association between low family income or family income decline during early childhood and ADHD in offspring. The supporting evidence includes a strong association between low SES and the prefrontal working memory system (Hackman, Farah, & Meaney, 2010), which has in turn been described as a neuropsychological ADHD endophenotype (Castellanos & Tannock, 2002). As for the pathways underlying the association of restricted fetal growth with ASD, ADHD, and TD, our cross-disorder finding is in line with a body of evidence linking fetal growth to these and several other psychiatric disorders. It has even been modelled that a general factor of psychopathology is linked to restricted fetal growth (Pettersson et al., 2019). Furthermore, birth weight differences have previously been linked to altered brain development (Walhovd et al., 2012), although the mediating mechanisms are unknown. As for the link between smoking during pregnancy and ASD, Hultman, Sparén, and Cnattingius (2002) reported an OR of 1.4 (95% CI 1.1-1.8), but as shown in the most recent study by Kalkbrenner et al. (2020), this link is better explained by familial confounding: maternal smoking is associated with numerous social and social-class-related factors, and genes may affect both exposure and outcome. This leads to the conclusion that factors considered to be environmental might not be strictly environmental, and it is therefore problematic to refer to them as 'nongenetic'. This has been pointed out before (Plomin, DeFries, Knopik, & Neiderhiser, 2016), and we suggest that future studies consider the genetic basis of 'environmental' factors more comprehensively in order to help us understand the etiology of NDDs.
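The logic of familial confounding can be illustrated with a deliberately noise-free toy example. The sketch below is a minimal illustration, assuming hypothetical sibling pairs in which a shared familial liability raises both the chance of exposure and the trait score, while the exposure itself has no effect: the naive comparison of exposed and unexposed individuals across families shows a difference, whereas the within-pair difference in exposure-discordant pairs is zero because the shared liability cancels out. All names (`FAMILIES`, `trait_score`, and so on) and numbers are invented for illustration. Real sibling designs only remove confounders that siblings share; they do not remove confounders that differ between siblings, such as the genetic variation full siblings do not share (on average about half of their segregating variants).

```python
from statistics import mean

LIABILITY_EFFECT = 2.0  # hypothetical: shared familial liability raises the trait score
EXPOSURE_EFFECT = 0.0   # hypothetical: the exposure itself has no causal effect

# Hypothetical sibling pairs: (shared familial liability, sibling 1 exposed, sibling 2 exposed).
# Exposure is more common in high-liability families, which creates familial confounding.
FAMILIES = [
    (0.0, 0, 0),
    (0.0, 0, 1),
    (1.0, 0, 1),
    (2.0, 1, 0),
    (2.0, 1, 1),
    (3.0, 1, 1),
]


def trait_score(liability: float, exposed: int) -> float:
    """Toy outcome model: baseline + familial liability + (null) exposure effect."""
    return 10.0 + LIABILITY_EFFECT * liability + EXPOSURE_EFFECT * exposed


def naive_difference(families) -> float:
    """Mean trait of exposed minus unexposed individuals, ignoring family membership."""
    exposed, unexposed = [], []
    for liability, e1, e2 in families:
        for e in (e1, e2):
            (exposed if e else unexposed).append(trait_score(liability, e))
    return mean(exposed) - mean(unexposed)


def within_pair_difference(families) -> float:
    """Mean exposed-minus-unexposed difference within exposure-discordant pairs only."""
    diffs = []
    for liability, e1, e2 in families:
        if e1 == e2:
            continue  # concordant pairs are uninformative for the within-pair contrast
        y1, y2 = trait_score(liability, e1), trait_score(liability, e2)
        diffs.append(y1 - y2 if e1 else y2 - y1)
    return mean(diffs)


print(f"naive difference:       {naive_difference(FAMILIES):.2f}")        # 2.51
print(f"within-pair difference: {within_pair_difference(FAMILIES):.2f}")  # 0.00
```

In this toy setup, setting the hypothetical `EXPOSURE_EFFECT` to a nonzero value moves the within-pair estimate to exactly that value, while the naive estimate still carries the confounding bias from the shared liability.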
As noted above, some apparent discrepancies were observed when comparing categorical and dimensional outcomes. First, in contrast to the associations with ASD, ADHD, and TD, fetal growth was not associated with an ID diagnosis, but was associated with IQ level. This points to the possibility that the mechanisms behind a clinical diagnosis of ID differ from those behind IQ level in the rest of the distribution. This is in line with the findings of Reichenberg et al. (2016), which suggested that profound ID is a distinct entity, with genetic and environmental influences different from those of milder ID. Second, regarding ADHD, it is interesting to note that for smoking during pregnancy, despite no evidence of an association with an ADHD diagnosis beyond familial confounding, three of the four studies with a positive association using dimensional outcomes noted a link to hyperactivity/impulsivity, but not to inattentiveness (Table 7). This suggests that these traits might have different underlying mechanisms. Although these dimensions are differentially implicated in neuropsychological impairment (Willcutt et al., 2012), the underlying mechanisms are still unclear.
The latter finding shows that dimensional outcomes, unlike categorical ones, can differentiate between symptom dimensions within the same condition. It has previously been shown that social and nonsocial traits in ASD are genetically dissociable (Happé & Ronald, 2008), and that hyperactivity/impulsivity and inattentiveness in ADHD have distinguishable underlying pathways (Castellanos, Sonuga-Barke, Milham, & Tannock, 2006;Kuntsi et al., 2014;Luo, Weibman, Halperin, & Li, 2019;Sonuga-Barke, 2005). This review cannot answer whether the same holds for environmental factors in ASD, since, first, only two of the included studies on ASD used a dimensional measure, and second, those two used only a combined measure of total ASD severity, not separated into social and nonsocial traits. Thus, it remains unclear whether social and nonsocial traits in ASD are environmentally dissociable. Although the value of a dimensional approach in NDD research is now undisputed, it is also important to keep in mind that dimensional data do not necessarily have clinical relevance, and there might be a qualitative shift in mechanisms along the symptomatic continuum.
Strikingly, while there is a wealth of studies on exposures in ASD and ADHD, and to some extent low IQ/ID, there is little research on other NDDs, with few studies on CD and none on specific learning disorders, despite these being common in the general population (Aschner & Costa, 2015;Bishop, 2010). This systematic review also points to a lack of geographic dispersion, with most twin and sibling studies being conducted in North America and Scandinavia, highly developed areas of the world with regard to both environmental regulations and health care. It may, for example, not be possible to generalize our finding that obstetrical complications are not associated with ASD to areas of the world with less developed obstetrical and neonatal care. Additional factors, not yet identified, could potentially be of relevance for NDDs in other parts of the world. The limited geographical spread points to the existence of a global research bias and divide for NDDs. According to Zhang et al. (2017), only 1.13% of the research productivity worldwide in the field of psychiatry originates from low and lower-middle income countries.
In this review of genetically informed studies, we found evidence, albeit with modest effect sizes, for several environmental factors potentially on the causal pathways for different NDDs, particularly ASD and ADHD. Other previously discussed factors were called into question, such as season of birth and a series of obstetrical- and pregnancy-related factors. Interestingly, a recent meta-analysis on birth by cesarean delivery by Zhang et al. (2019) came to a different result, with an odds ratio (OR) of 1.33 (95% CI, 1.25-1.41) for ASD from 27 studies and an OR of 1.17 (95% CI, 1.07-1.26) for ADHD from 13 studies. However, as the authors point out, the pattern of attenuation in sibling analyses suggested that the observed associations were likely due to familial confounding. Furthermore, our review found no evidence for antidepressive medication, maternal infection, and stress or adverse family life events during pregnancy being associated with ADHD beyond familial confounding.
This systematic review has a number of methodological strengths. First, the most prominent strength is its size, with 140 included articles. Second, it is the first systematic review in this growing field of research that tries to rule out familial confounding in the search for causal environmental factors for NDDs. Third, the broad approach of this review, covering NDDs in general rather than a single diagnosis, allowed us to follow threads regarding the diagnostic specificity of particular findings that would otherwise be hard to trace. Fourth, we included studies of both dimensional and categorical outcomes, addressing the possibility of different pathways for symptoms/traits and diagnoses. Fifth, the diversity of exposures covered, from pregnancy to early childhood, allowed us to relate our findings to the timing of the exposure.
A potential limitation of the present review is the inclusion of early studies on environmental factors dating back decades. With recent study designs and statistical methods, potential environmental factors for ASD such as rubella infection during pregnancy and labor induction have been found to be confounded by familial factors, in contrast to results from earlier studies with higher risk of bias (Tables 2 and 3). This shows that with incautiously applied family designs, we risk deeming risk factors free from familial confounding when, in fact, the full information that twins and siblings provide has not been used to account for familial confounding. This points to the need to use state-of-the-art methods for twin and family data, and it is therefore time to reevaluate potential environmental factors from past decades with a contemporary statistical approach. Another potential weakness of this review is that there are other ways to control for familial confounding than twin and sibling studies. In particular, multi-generational population-based cohorts can include not only siblings but also half-siblings and cousins, sometimes in a quasi-experimental design. Other ways to deal with familial confounding, as previously discussed, are based on adoption or in vitro fertilization (IVF) designs (D'Onofrio, 2014a). As explained by Harold et al. (2013), compared to family studies, these designs have the advantage of allowing further examination of associations between patterns of family interaction and child development, as they also allow control for passive gene-environment correlation. As Loehlin (2016) highlights, the ability of adoption studies to estimate the effects of the prenatal and the postnatal environment makes them well suited to investigating how familial confounding differentially applies to prenatal versus postnatal environmental risks. Furthermore, there is little control of comorbidity in the included studies.
This limitation could not be addressed by this review, owing to a lack of reporting of comorbidity in the primary studies examined. Future studies should be careful and comprehensive in mapping somatic and psychiatric comorbidity, which is frequent in NDDs (Pan, Tammimies, & Bölte, 2019;Plana-Ripoll et al., 2019) and may have a significant impact on developmental mechanisms. Another potential limitation is the discrepancy in age at diagnosis. Regarding ASD, most of the included cohort studies lacked specific information regarding the sibling subsamples. Despite this, the overall assessment of the included studies' methodologies leaves little room for misclassification bias. Regarding studies on ADHD, it is important to bear in mind that some of the results rely on dimensional outcomes at a young age, thereby introducing a risk of misclassification bias. Finally, while the results indicate that some previously suspected environmental factors are due to familial confounding, we once again caution against the general conclusion that absence of evidence of an association equals evidence of absence.

Conclusions and future directions
NDDs are common conditions, and although they are highly heritable, environmental factors do contribute to their causal pathways and associated impairment. Studies on suspected environmental factors often suffer from familial confounding bias, in which exposures are themselves heritable, with the risk of incorrectly linking them to NDDs, possibly leading to waste of public resources, unnecessary worry, misleading advice, and eroded public trust.
The conclusions from this comprehensive systematic review of twin and sibling studies are as follows. First, we found evidence, beyond familial confounding, that:
• advanced paternal age, low birth weight, birth defects, and perinatal hypoxia and respiratory stress are consistently associated with a diagnosis of ASD, and
• low birth weight, gestational age, and low family income or transient income decline during childhood are associated with ADHD, both categorically and dimensionally.
Second, our results point in the direction of no association beyond familial confounding for the associations of:
• maternal uterine bleeding, preeclampsia, gestational diabetes, pre-pregnancy body mass index, and elective and emergency cesarean section with ASD, and
• antidepressive medication, maternal infection, maternal weight, and maternal smoking during pregnancy with a diagnosis of ADHD.
Third, we found a substantial body of studies with conflicting findings regarding the associations of:
• antidepressive medication during pregnancy, advanced maternal age, preterm birth, labor induction, and neonatal jaundice with ASD, and
• alcohol use during pregnancy and parental age with ADHD, both categorically and dimensionally.
Fourth, there is a lack of geographic dispersion, with most twin and sibling studies being conducted in North America and Scandinavia. Additional factors, not yet identified, could potentially be of relevance for NDDs in other parts of the world. Finally, and perhaps most importantly, few reliable conclusions can be drawn for conditions other than ASD and ADHD. This is unfortunate, given the considerable frequency of other NDDs, and points to a critical need for more genetically informed studies of good quality in the quest for the environmental causes of NDDs.