Birth weight and cardiometabolic risk factors: a discordant twin study in the UK Biobank

Abstract
One of the longstanding debates in life-course epidemiology is whether an adverse intrauterine environment, often proxied by birth weight, causally increases the future risk of cardiometabolic disease. The use of a discordant twin study design, which controls for the influence of shared genetic and environmental confounding factors, may be useful to investigate whether this relationship is causal. We conducted a discordant twin study of 120 monozygotic (MZ) and 148 dizygotic (DZ) twin pairs from the UK Biobank to explore the potential causal relationships between birth weight and a broad spectrum of later-life cardiometabolic risk factors. We used a linear mixed model to investigate the association between birth weight and later-life cardiometabolic risk factors for twins, allowing for both within-pair differences and between-pair differences in birth weight. Of primary interest is the within-pair association between differences in birth weight and cardiometabolic risk factors, which could reflect an intrauterine effect on later-life risk factors. We found no strong evidence of association in MZ twins between the within-pair differences in birth weight and most cardiometabolic risk factors in later life, except for nominal associations with C-reactive protein and insulin-like growth factor 1. However, these associations were not replicated in DZ twin pairs. Our study provided no strong evidence for intrauterine effects on later-life cardiometabolic risk factors, which is consistent with previous large-scale studies of singletons testing the potential causal relationship. It does not support the hypothesis that adverse intrauterine environments increase the risk of cardiometabolic disease in later life.


Introduction
Cardiometabolic diseases are major contributors to mortality and morbidity worldwide, 1-3 yet current therapeutic strategies mostly target individuals after the onset of clinical symptoms. The Developmental Origins of Health and Disease (DOHaD) hypothesis postulates that an adverse environment in utero or in the early years of life causally increases the future risk of cardiometabolic disease. 4-7 Evidence in favour of this theory has primarily come from experimental studies on animals (reviewed in ref. 8), which may not generalise to humans, and observational epidemiological studies, which are susceptible to confounding and bias. 9 However, because randomised controlled trials, the gold standard for determining causality in humans, cannot be performed easily in this context, evidence for this hypothesis in humans is still debated.
Observational epidemiological studies of singleton individuals have reported associations between birth weight and later-life cardiometabolic outcomes. 10,11 However, it is unclear if these results reflect the influence of confounding factors or a true causal relationship. The discordant twin study design is an epidemiological method, which controls for genetic and shared environmental confounding factors, 12 and can be used to investigate causal relationships within the context of DOHaD. Monozygotic (MZ) twins share all of their genes identical by descent and are matched for many early-life environmental exposures (including intrauterine and perinatal factors). This means that any phenotypic differences within an MZ twin pair cannot reflect the influence of shared genetic or environmental factors, enabling the investigation of the impact of environmental experiences during development specific to each individual within a pair.
Birth weight has been used widely as a proxy for intrauterine growth and development. 13 Birth weight may be affected by pre-natal factors common to a twin pair (such as gestational age, maternal environment, common genetic factors) and pre-natal factors specific to each twin (such as twin-specific fetoplacental environmental conditions). The difference in birth weight between MZ twins in a pair (i.e. birth weight discordance) must therefore relate to specific factors affecting the growth of each individual fetus and cannot be a result of shared factors. Thus, the discordant twin design, including the within-between model, 14 allows the partitioning of the total association between birth weight and cardiometabolic disease risk factors into a part that operates across different twin pairs (referred to here as the "between-pair association") and a part that operates within members of the same twin pair (referred to here as the "within-pair association"). If cardiometabolic disease risk factors are associated with within-pair differences in the birth weight of MZ twins, then factors specific to each fetus, but not common factors, must be involved in the underlying causal pathway. Furthermore, these differences must be the result of environmental exposures that operate pre-natally. Thus, significant within-pair associations would be consistent with the hypothesis that (unshared) intrauterine factors causally increase cardiometabolic risk. In contrast, significant between-pair associations would suggest that shared confounding factors (genetics, common environmental factors) explain at least some of the relationship between birth weight and future cardiometabolic risk.
To further understand the role of genetics in the relationship between birth weight and later-life disease, it is also possible to compare the between and within twin pair estimates obtained from MZ and dizygotic (DZ) twin pairs separately. MZ twins are genetically identical, while DZ twins on average share half of their genes identical by descent. A stronger within-pair association in DZ compared to MZ twins would suggest that some of the relationship between birth weight and cardiometabolic risk factors can be explained by genetic factors.
The discordant twin design has previously been used to explore potential causal relationships between birth weight and cardiometabolic risk factors (systematically reviewed in ref. 15). However, this previous work has been affected by two major limitations. Firstly, most of the studies modelled twins as individuals, 15 rather than partitioning the between- and within-pair associations, so the observed associations were still subject to confounding factors that differ between twin pairs, 14 such as gestational age. Only a few studies to date have used the within-between model, and these investigated the relationship of birth weight with body mass index (BMI) and blood pressure (reviewed in ref. 15). Secondly, previous studies using the within-between model in the discordant twin design generally analysed cardiometabolic outcomes measured in early to middle age, many years before the typical onset of common cardiometabolic disease.
In the current manuscript, we build on these earlier studies by conducting a discordant twin study using the within-between model in up to 268 twin pairs from the UK Biobank (120 MZ twin pairs and 148 DZ twin pairs aged 40-69 years). We explore the relationship between birth weight and a broad spectrum of cardiometabolic risk factors in later life.

Methods
We used data from MZ and DZ twin pairs who participated in the UK Biobank, a study of over 500,000 individuals aged 40-69 years with a broad range of health-related information, including genome-wide genetic data and a whole spectrum of phenotypic data. 16 We examined the association between self-reported birth weight and a variety of cardiometabolic risk factors that were measured in later life, including systolic blood pressure (SBP), diastolic blood pressure (

Definition of ethnicity
Since birth weights vary by ethnicity, 17 we restricted our sample to "White British" participants. To identify the subset of the UK Biobank participants of European ancestry ("White British"), we conducted ancestry informative principal component (PC) analysis using samples from Phase 3 of the 1000 Genomes Project 18 as a reference. The UK Biobank samples were then projected into this PC space according to the single nucleotide polymorphism (SNP) loadings generated from the PC analysis. The UK Biobank participants' ancestry was classified using K-means clustering centred on the 26 different 1000 Genomes populations. Those clustering with the British/European (GBR/CEU) clusters were classified as having "White British" ancestry. The method has been described in detail elsewhere. 19,20

Phenotype preparation
Participants were invited to attend the assessment centres at three time points (baseline [initial assessment visit 2006-2010], and two follow-ups [first repeat assessment visit from 2012 to 2013 and imaging visit from 2014 to 2015]) in the UK Biobank. All recruited participants (~500,000) were invited to attend the baseline assessment, whereas only ~20,000 and ~7,000 participants were invited to the two follow-ups, respectively.
Invited participants had their SBP and DBP measured at three time points in the UK Biobank, using either an automated machine or a manual sphygmomanometer. The protocol was for each participant to have their blood pressure measurement taken twice, a few moments apart, at each follow-up. Two valid readings were available for most participants (>97%). For each follow-up, the average of the two SBP/DBP measures was calculated if both were recorded; otherwise, the one recorded measure was used. Individuals who had two readings with a difference of more than 4.56 standard deviations (SD) were excluded. If the participant reported taking blood pressure medication at the time of their measurement, then 15 mmHg and 10 mmHg were added to their SBP and DBP measurement, respectively. 21 If participants had an SBP/DBP measurement from the automated machine at baseline, then this measure was used for analysis; otherwise, we used SBP/DBP measured at either follow-up one or two, or the manual measurements from baseline, follow-up one or follow-up two (in this order).
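The averaging and medication adjustment described above can be illustrated with a short Python sketch. It is not the code used in the study; the function name `visit_blood_pressure` and its argument layout are hypothetical, and the exclusion of participants whose two readings differ by more than 4.56 SD is deliberately left out.

```python
from statistics import mean
from typing import Optional, Sequence

# Offsets stated in the text for participants on blood pressure medication.
SBP_MEDICATION_OFFSET = 15.0  # mmHg
DBP_MEDICATION_OFFSET = 10.0  # mmHg


def visit_blood_pressure(
    readings: Sequence[Optional[float]],
    on_medication: bool,
    offset: float,
) -> Optional[float]:
    """Summarise one visit: average the recorded readings, then add the offset.

    `readings` holds up to two values for a single visit; None marks a
    reading that was not recorded. (Hypothetical helper, for illustration.)
    """
    valid = [r for r in readings if r is not None]
    if not valid:
        return None  # nothing recorded at this visit
    value = mean(valid)  # mean of two readings, or the single one recorded
    return value + offset if on_medication else value


# Two SBP readings from a participant on medication.
sbp = visit_blood_pressure([130.0, 134.0], True, SBP_MEDICATION_OFFSET)
# One DBP reading from a participant not on medication.
dbp = visit_blood_pressure([80.0, None], False, DBP_MEDICATION_OFFSET)
```

The choice of which visit and device to use (automated baseline first, then the fallbacks listed above) would be applied after this per-visit summary.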
A blood sample was provided by the invited participants at two time points in the UK Biobank (baseline and the first follow-up), which was used to assay a panel of biomarkers by the UK Biobank using standard laboratory procedures (http://Biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=5636). If the participant reported taking cholesterol-lowering medications, LDL-C, total cholesterol, TG and ApoB were adjusted by dividing by the constants 0.7, 0.8, 0.8 and 0.75, respectively. 22-25 If the participant reported being on insulin at the time their blood sample was taken (0.6% of participants), they were excluded from the analyses of glycaemic biomarkers. If participants had a biomarker measurement at baseline, then this was used; otherwise, their biomarker measurement at follow-up was used. BMI, Lp(a), TG and CRP were natural log-transformed for analysis.
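A minimal Python sketch of the medication adjustment and log transformation follows. The function name `prepare_biomarker` and the trait labels used as dictionary keys are hypothetical. Applying the medication adjustment before the log transformation is an assumption, since the text does not state the order.

```python
import math

# Divisors stated in the text for participants on cholesterol-lowering medication.
LIPID_MEDICATION_DIVISORS = {
    "LDL-C": 0.7,
    "total cholesterol": 0.8,
    "TG": 0.8,
    "ApoB": 0.75,
}

# Traits that the text says were natural log-transformed.
LOG_TRANSFORMED = {"BMI", "Lp(a)", "TG", "CRP"}


def prepare_biomarker(name: str, value: float, on_lipid_medication: bool) -> float:
    """Return the analysis value for one biomarker measurement.

    Assumption: adjustment happens first, log transformation second.
    """
    if on_lipid_medication and name in LIPID_MEDICATION_DIVISORS:
        value = value / LIPID_MEDICATION_DIVISORS[name]
    if name in LOG_TRANSFORMED:
        value = math.log(value)
    return value


# LDL-C of 2.8 mmol/L in a treated participant is adjusted to about 4.0 mmol/L.
adjusted_ldl = prepare_biomarker("LDL-C", 2.8, True)
```

Traits outside both sets, such as HDL-C, pass through unchanged.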
Standing height and weight were measured in cm and kg, respectively, at two time points (baseline and the first follow-up). BMI was then calculated using the formula weight (kg) / height² (m²). If participants had a BMI measurement at baseline, then this was used; otherwise, their BMI at follow-up was used.
Participants were included in the analysis if they had self-reported their own birth weight at baseline or at least once during follow-up in UK Biobank. Self-reported birth weight from baseline, follow-up one or follow-up two was used (in this order) for analyses.
For each trait, all values more than 4.56 SD above or below the mean in the whole UKB cohort were set to missing prior to analysis.
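As an illustration of this outlier rule, the Python sketch below sets values beyond 4.56 SD of the mean to missing. The helper `mask_outliers` is hypothetical, and the use of the population SD (`pstdev`) rather than the sample SD is an assumption. In a cohort of roughly 500,000 the two are practically identical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence

SD_LIMIT = 4.56  # threshold stated in the text


def mask_outliers(
    values: Sequence[Optional[float]], limit: float = SD_LIMIT
) -> List[Optional[float]]:
    """Set values more than `limit` SD from the mean to None (missing).

    The mean and SD are computed over all observed values passed in, which
    stands in for the whole cohort here.
    """
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sd = pstdev(observed)  # assumption: population SD
    return [
        None if v is None or abs(v - mu) > limit * sd else v
        for v in values
    ]


# 99 typical values and one extreme value; only the extreme one is masked.
cleaned = mask_outliers([10.0] * 99 + [1000.0])
```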

Twin identification and zygosity determination
The UK Biobank released kinship coefficients and estimates of the proportion of SNPs with zero alleles shared identical-by-state (IBS0) generated by the KING software package, as described previously. 26 Twin pairs were identified using the previously described kinship coefficient ($\phi$; MZ: $\phi > 1/2^{3/2}$; DZ: $1/2^{3/2} > \phi > 1/2^{5/2}$), the fraction of markers for which they share no alleles (i.e. IBS0; DZ: IBS0 > 0.0012) 16,26 and reported date of birth (identical). After quality control, a total of 120 MZ and 148 DZ twin pairs who reported their own birth weight were available for analysis (not all twins had data available for each of the cardiometabolic risk factors of interest, so numbers may be smaller for specific analyses, see Table 1).
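The classification rule can be sketched in Python as below, using the conventional KING kinship boundaries of 1/2^(3/2) (about 0.354) and 1/2^(5/2) (about 0.177). The function `classify_pair` is hypothetical, and requiring an identical date of birth for MZ as well as DZ pairs is our reading of the text rather than a documented detail.

```python
from typing import Optional

MZ_KINSHIP_MIN = 2 ** -1.5  # 1 / 2^(3/2), about 0.354
DZ_KINSHIP_MIN = 2 ** -2.5  # 1 / 2^(5/2), about 0.177
DZ_IBS0_MIN = 0.0012        # separates full siblings from parent-offspring pairs


def classify_pair(kinship: float, ibs0: float, same_birth_date: bool) -> Optional[str]:
    """Return "MZ", "DZ" or None for a pair of related participants."""
    if not same_birth_date:
        return None  # ordinary siblings or other relatives, not twins
    if kinship > MZ_KINSHIP_MIN:
        return "MZ"
    if DZ_KINSHIP_MIN < kinship < MZ_KINSHIP_MIN and ibs0 > DZ_IBS0_MIN:
        return "DZ"
    return None


# A first-degree pair with a shared birth date and non-trivial IBS0.
label = classify_pair(0.25, 0.002, True)
```

The IBS0 condition matters because parent-offspring pairs share at least one allele at every marker, so their IBS0 is close to zero, whereas full siblings (including DZ twins) share no alleles at a fraction of markers.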

Statistical analysis
In the primary analysis, we tested the association between cardiometabolic risk and birth weight using a linear mixed model 14 of the form:

$$Y_{ij} = \beta_0 + \beta_W (X_{ij} - \bar{X}_i) + \beta_B \bar{X}_i + \beta_S\,\mathrm{sex}_{ij} + \beta_A\,\mathrm{age}_{ij} + \alpha_i + \epsilon_{ij}$$

where $Y_{ij}$ represents the cardiometabolic risk factor for the jth individual from the ith twin pair, $X_{ij}$ is the birth weight of individual j in twin pair i and $\bar{X}_i$ is the average birth weight for the twin pair i. In this formulation, $\beta_0$ represents an intercept term, $\beta_W$ is the within-pair coefficient, $\beta_B$ the between-pair coefficient, $\beta_S$ the fixed effect of sex, $\beta_A$ the fixed effect of age, $\alpha_i$ a random intercept for family membership and $\epsilon_{ij}$ is an individual-specific random error term. All analyses were performed in R (version 3.5.3). The model was fitted using restricted maximum likelihood (REML) in the 'lme4' package (version 1.1-21). MZ twins and DZ twins were analysed separately. To ensure our results were not influenced by violations of the normality of errors assumption, we conducted a sensitivity analysis using inverse normalised outcomes (i.e. the cardiometabolic risk factors, or $Y_{ij}$).
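To make the within-pair and between-pair terms concrete, the Python sketch below splits each twin's birth weight into the pair mean and the deviation from it, and computes a within-pair slope from the twin differences. This is a deliberate simplification and not the 'lme4' model described above. It ignores sex, age and the random intercept, and the function names are hypothetical. For complete pairs without covariates, regressing outcome differences on birth weight differences through the origin gives the usual fixed-effects within-pair estimate.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # values for twin 1 and twin 2 of one pair


def split_birth_weight(pair: Pair) -> Tuple[float, Pair]:
    """Return the pair mean and each twin's deviation from that mean."""
    pair_mean = (pair[0] + pair[1]) / 2.0
    return pair_mean, (pair[0] - pair_mean, pair[1] - pair_mean)


def within_pair_slope(exposures: Sequence[Pair], outcomes: Sequence[Pair]) -> float:
    """Slope of outcome differences on exposure differences (no intercept)."""
    numerator = 0.0
    denominator = 0.0
    for (x1, x2), (y1, y2) in zip(exposures, outcomes):
        dx, dy = x1 - x2, y1 - y2
        numerator += dx * dy
        denominator += dx * dx
    return numerator / denominator


# Toy data: outcome = 2 * birth weight + a pair-specific constant.
birth_weights: List[Pair] = [(2.0, 2.5), (3.0, 2.6)]
outcomes: List[Pair] = [(5.0, 6.0), (3.0, 2.2)]
slope = within_pair_slope(birth_weights, outcomes)  # approximately 2.0
```

The pair-specific constants (+1 and −3 in the toy data) cancel in the differences, which is the sense in which the within-pair term is free of confounding by factors shared within a pair.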
To compare our results with previous studies, we conducted a secondary analysis, where we regressed the cardiometabolic risk factor on birth weight in the entire sample of MZ and DZ twins while accounting for the paired nature of the data:

$$Y_{ij} = \beta_0 + \beta_C X_{ij} + \beta_S\,\mathrm{sex}_{ij} + \beta_A\,\mathrm{age}_{ij} + \alpha_i + \epsilon_{ij}$$

where $\beta_C$ is a regression coefficient estimating the change in $Y_{ij}$ for a one-unit change in $X_{ij}$, and the remaining terms are as defined for the primary model. The model was fitted using REML using the 'lme4' package to ensure the correct calculation of the standard errors. 14

We performed simulations to investigate the power of detecting within-pair associations. For each scenario, we generated 1,000 replicates where we analysed 120 MZ twin pairs (N = 240) who had simulated phenotypic data for their birth weight and a hypothetical cardiometabolic outcome. For each replicate, we generated the mean birth weight for each twin pair using a mean of 2.3 kg and a SD of 0.55 kg (i.e. similar to what we observe in UK Biobank). We set the mean and SD of the within-pair difference $(X_{ij} - \bar{X}_i)$ to 0 and 0.23, respectively (similar to empirical analysis). We generated the outcome variable for each individual using the following equation:

$$Y_{ij} = \beta_W (X_{ij} - \bar{X}_i) + \beta_B \bar{X}_i + \alpha_i + \epsilon_{ij}$$

where $Y_{ij}$ represents the cardiometabolic risk factor for the jth individual from the ith twin pair, $X_{ij}$ is the birth weight of individual j from twin pair i and $\bar{X}_i$ is the average birth weight for twin pair i. In this formulation, $\beta_W$ is the within-pair coefficient, and $\beta_B$ the between-pair coefficient. The $\alpha_i$ is a random intercept for family membership, which is a random normal variable for each family with a mean zero and a SD of 0.8 (similar to the empirical analysis). The $\epsilon_{ij}$ is an individual-specific random error term, which is a random normal variable with mean zero and variance selected to ensure that the simulated cardiometabolic outcome had unit variance asymptotically. Power was calculated as the proportion of tests that reached P < 0.05 and P < 0.0033 (i.e. the Bonferroni-corrected threshold, 0.05/15 phenotypes = 0.0033), respectively, under the alternative hypothesis when the true within-pair effect was non-zero.
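The data-generating step of the simulation described above could be sketched in Python as follows. This is an illustration under stated assumptions, not the simulation code used in the study. The function names are hypothetical, the two twins are given equal and opposite deviations from the pair mean, and the model-fitting and testing step (done with 'lme4' in the study) is omitted.

```python
import random
from typing import List, Tuple

N_PAIRS = 120          # MZ pairs per replicate
PAIR_MEAN_BW = 2.3     # kg, mean of the pair-average birth weight
PAIR_MEAN_SD = 0.55    # kg, SD of the pair-average birth weight
WITHIN_SD = 0.23       # kg, SD of a twin's deviation from the pair mean
FAMILY_SD = 0.8        # SD of the random family intercept


def residual_variance(beta_w: float, beta_b: float) -> float:
    """Residual variance that gives the outcome unit variance overall."""
    explained = (
        beta_w ** 2 * WITHIN_SD ** 2
        + beta_b ** 2 * PAIR_MEAN_SD ** 2
        + FAMILY_SD ** 2
    )
    if explained >= 1.0:
        raise ValueError("effects too large for a unit-variance outcome")
    return 1.0 - explained


def simulate_replicate(
    beta_w: float, beta_b: float, rng: random.Random
) -> List[List[Tuple[float, float]]]:
    """Return one replicate: per pair, two (birth_weight, outcome) tuples."""
    resid_sd = residual_variance(beta_w, beta_b) ** 0.5
    pairs = []
    for _ in range(N_PAIRS):
        pair_mean = rng.gauss(PAIR_MEAN_BW, PAIR_MEAN_SD)
        deviation = rng.gauss(0.0, WITHIN_SD)
        alpha = rng.gauss(0.0, FAMILY_SD)  # shared by both twins
        twins = []
        for sign in (1.0, -1.0):  # assumption: deviations mirror each other
            within = sign * deviation
            outcome = (
                beta_w * within + beta_b * pair_mean + alpha
                + rng.gauss(0.0, resid_sd)
            )
            twins.append((pair_mean + within, outcome))
        pairs.append(twins)
    return pairs


replicate = simulate_replicate(0.65, 0.0, random.Random(2024))
```

Power would then be estimated by fitting the within-between model to each of 1,000 such replicates and counting the proportion in which the test of the within-pair coefficient falls below the chosen P-value threshold.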

Results
The MZ twins (80% female) had a mean birth weight of 2.26 kg (SD = 0.59 kg), which was slightly lower than the mean birth weight in the 296 DZ twins (69% female; 2.33 kg [SD = 0.68 kg]; Table 1). In the main analysis of MZ twins, we identified nominal within-pair associations between birth weight and both CRP (log-transformed beta = −0.48 [95% CI, −0.83, −0.12], 48% increase in CRP per kg decrease in birth weight, P = 0.01) and insulin-like growth factor 1 (IGF-1; 1.86 nmol/L per kg increase in birth weight, [95% CI, 0.19, 3.54], P = 0.03; Table 2). However, there was no strong statistical evidence for either of these associations after correction for multiple testing (both P values exceeded the Bonferroni-corrected threshold of 0.05/15 = 0.0033). We did not observe within-pair associations between birth weight and any of the remaining cardiometabolic risk factors in MZ twins. No between-pair associations were observed in the analyses of birth weight and CRP or IGF-1 (Tables 1 and 2). In the DZ twin pairs, we did not observe a within-pair association with either CRP or IGF-1. A nominal within-pair association was observed between birth weight and BMI (log-transformed beta = 0.07 [95% CI, 0.02, 0.12], 7% increase in BMI per kg increase in birth weight, P = 0.01; Table 2). A within-pair association with BMI was not observed in MZ twins, indicating that this association might be due to the genetic differences between DZ twins. However, the association did not pass multiple testing correction, suggesting that the DZ association could be a type I error. The analysis using inverse normal transformed outcomes did not appreciably change the direction or statistical significance of these associations (Supplementary Table S1).
In the secondary analysis when treating twins as individuals, a nominal negative association was found between birth weight and CRP (log-transformed beta = −0.18 [95% CI, −0.33, −0.02], 18% increase in CRP per kg decrease in birth weight, P = 0.03, Supplementary Table S2).
Our power calculations (assuming Bonferroni correction) show that we have >80% power to detect a true within-pair effect where at least 2.24% of the phenotypic variance in the outcome (equivalent to an effect size $\beta_W$ = 0.65 on a standardised outcome with unit variance per one kilogram increase in the within-pair birth weight difference) is attributable to the within-pair birth weight difference. The effect size is comparable to the estimates in the sensitivity analysis of inverse normal transformed outcomes (Supplementary Table S1, e.g., CRP $\beta_W$ = −0.42). Without Bonferroni correction, we have >80% power to detect a within-pair effect as small as $\beta_W$ = 0.5 (1.32% variance explained; Supplementary Table S3).
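The correspondence between effect size and variance explained quoted above can be checked with simple arithmetic. For a unit-variance outcome, the share of variance attributable to the within-pair term is the squared coefficient times the variance of the within-pair birth weight difference (SD 0.23 kg in the simulations). The helper name in this Python sketch is hypothetical.

```python
WITHIN_SD = 0.23  # kg, SD of the within-pair birth weight difference


def variance_explained(beta_w: float, within_sd: float = WITHIN_SD) -> float:
    """Proportion of a unit-variance outcome explained by the within-pair term."""
    return beta_w ** 2 * within_sd ** 2


with_bonferroni = 100 * variance_explained(0.65)    # about 2.24 (%)
without_bonferroni = 100 * variance_explained(0.5)  # about 1.32 (%)
```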

Discussion
In the current study we investigated whether there was any evidence for developmental origins of a range of cardiometabolic risk factors using twin pairs from the UK Biobank. The pair-wise analysis of MZ twins showed no within-pair association between the birth weight differences and most cardiometabolic risk factors in later life, which contrasts with most conventional observational studies in twins. 15 However, our findings are mostly consistent with previous large-scale studies of singletons that investigated the relationship between intrauterine growth and later-life risk of cardiometabolic disease using Mendelian randomisation (MR). 20,27 Given that both MR and the discordant twin study design are more stringent in terms of testing causality than conventional observational studies of singleton individuals, this lack of evidence for a major causal effect is supported by at least two complementary study designs. Therefore, the current study provides no strong evidence to suggest that the observational association between low birth weight and higher cardiometabolic risk is attributable to intrauterine programming, as opposed to inherited genetics or postnatal confounding factors.
Few previous twin studies have used the within-between model to investigate the relationship between birth weight and later-life traits, other than BMI and SBP. A recent large-scale meta-analysis study investigated the relationship between birth weight and BMI measured between age 1 and 49 years in 27 twin cohorts. 28 They found a persistent positive within-pair association of birth weight with later BMI during childhood, but this positive association attenuated from late adolescence. Additionally, they observed no association between birth weight and BMI measured between 40 and 49 years in either MZ or DZ twins. 28 The authors proposed that the positive association of birth weight with BMI in childhood and adolescence may be partly due to the effects of intrauterine growth on the number of adipocytes and muscle cells, which remain constant into childhood and adolescence. They implied that the attenuated association between birth weight and BMI in later adulthood might be attributable to environmental factors that influence BMI independently of intrauterine effects.
In our study, where BMI was measured at 40-69 years of age, we replicated the previous finding of no within-pair association between birth weight and BMI in later adulthood. 28 We identified a positive within-pair association between birth weight and BMI in DZ twins but not in MZ twins. If replicated, this finding would suggest that genetic factors may play a more important role in the birth weight-BMI association than intrauterine programming in middle-to-late adulthood. Our finding is also in concordance with a large-scale GWAS meta-analysis performed in the UK Biobank and several other population-based cohorts, where genetic correlation analyses suggested that the phenotypic association between birth weight and BMI was primarily driven by genetic pleiotropy, probably predominantly through the fetal genome. 20,29

Similarly, we did not observe a within-pair association between birth weight and either SBP or DBP. This result is consistent with previous twin studies using the within-between model 15 and MR studies in singletons. 20,27 The lack of within-pair association between birth weight and the lipid or glycaemic biomarkers is also consistent with the previously reported non-significant genetic correlations between birth weight and those traits (total cholesterol, TG, HDL-C, LDL-C, non-fasting glucose, HbA1C) in later life. 12,20

We observed nominal associations of within-pair differences in birth weight with IGF-1 and CRP in later life in MZ twins, but the results did not pass correction for the multiple statistical tests we conducted, which indicates the results could be type I errors. In addition, neither of the associations was observed in DZ twins, which is unexpected given that DZ twins share more similarity to each other genetically (on average 0.5 of the genetic makeup) than independent individuals. We also showed a suggestive inverse association between birth weight and CRP in adulthood in the secondary analysis where we treated twins as individuals. This agrees with the observational associations in previous studies. 30 The power calculation further indicates that we do not have enough power to detect small-to-moderate effects explaining less than 2% of the phenotypic variance. The associations between birth weight and both IGF-1 and CRP in MZ twins, therefore, need to be replicated in larger cohorts.
The main strength of the present study is that we used the discordant twin design to partition phenotypic associations into between and within family components, which helps control for confounding by genetic factors, postnatal environmental factors and pre-natal environmental factors that are shared between twins. In this setting, we were able to test whether there is a causal relationship between birth weight and cardiovascular risk factors later in life by controlling not only the effect of an individual's genome on their own traits, but also indirect effects via the maternal genome. In addition, the UKB has enabled us to test the associations of birth weight and a wide range of cardiometabolic risk factors measured in middle and old age, by which age cardiometabolic diseases are highly prevalent.
There are several limitations with the current study. First, the number of twin pairs in the UK Biobank is much smaller than in many singleton cohorts that have been used to investigate the observational association between birth weight and later-life cardiometabolic risk. Therefore, our study may be underpowered to detect small causal effects of birth weight on these later-life risk factors. A replication study or meta-analysis in a larger twin cohort would be desirable. Second, although our twin study accounts for genetic and environmental confounding factors that are shared between twins, the results might not fully capture the causal effects operating in singleton populations because twins and singletons are exposed to different environments in utero. Thus, validation of the findings of twin studies using causal modelling in the general population (e.g. by using MR studies) should be implemented. Third, participants volunteered to participate in the UKB study, with a participation rate of 5.45%, 31 which could introduce selection biases by enrolling healthier people of higher socioeconomic status than the general population, 32 as well as issues regarding the generalisability of the findings. Fourth, although the nominal associations with CRP and IGF-1 did not pass multiple testing corrections, the Bonferroni method could be too conservative. Hence, the results need to be replicated in a larger sample. Fifth, the sample sizes differ between the outcomes due to either a failure to test the individual (e.g., declined by participants or poor venous access) or exclusion based on quality control checks. However, we believe that there is no major propensity for data to be missing that is related to either birth weight or the measures taken; hence, we expect the data to be missing at random. Sixth, the self-reported birth weights have not been validated by hospital records in UK Biobank. However, studies in other cohorts have demonstrated that self-reported birth weight was highly correlated with official records. 33,34 Finally, gestational age was not reported in the UK Biobank so could not be incorporated into our analysis. However, adjustment for gestational age would influence the between-pair association (and not the within-pair association) as both twins share the same gestational age, which would not influence our main conclusions.
In conclusion, the present findings provide no strong evidence that pre-natal environmental factors (for example, intrauterine growth restriction) specific to each twin are major determinants of the association between low birth weight and high cardiometabolic risk. Inherited genetic risk and/or postnatal environmental exposures might be of greater importance than intrauterine environmental exposures in the pathogenesis of cardiometabolic diseases. Our results require replication in larger twin cohorts.