Higher intake of milk has been associated with a modestly lower risk of incident stroke, but not CHD, in a meta-analysis of observational studies(Reference Riboli, Hunt and Slimani1) and more recently in EPIC-CVD(Reference Corgneau, Scher and Ritie-Pertusa2). However, potential confounding and reverse causation cannot be excluded(Reference Soedamah-Muthu and de Goede3) and therefore any preventive effect of milk consumption in the development of stroke remains debatable.
A Mendelian randomisation (MR) approach(Reference Soedamah-Muthu and de Goede3) could be used to elucidate the causality of this association. An MR study is essentially an instrumental variable (IV) analysis, using one or more genetic variants as IV. The assumptions in an MR study are as follows: (1) the genetic variant is associated with the determinant of interest, (2) the genetic variant is not associated with known or unknown confounders of the determinant–outcome relationship and (3) there is no pathway from the genetic variant to disease that does not include the exposure of interest.
A genetic variant (rs4988235, −13910C > T) near the lactase gene fulfils the first IV assumption, as this variant has been associated in European populations with adult lactase persistence (LP)(Reference Tong, Appleby and Key4,Reference Davey Smith and Hemani5) , that is, the continued capacity to produce lactase in the intestinal lumen in adulthood. In absence of this enzyme, people cannot break down lactose from dairy products, leading to gastro-intestinal symptoms when they consume milk. Alleles conferring LP have been associated with a higher intake of milk in most European-ancestry cohorts(Reference Itan, Jones and Ingram6–Reference Vissers, Sluijs and van der Schouw12). Furthermore, our previous MR analysis investigating diabetes risk in EPIC-InterAct did not show violation of the second IV assumption(Reference Travis, Appleby and Siddiq10).
Previous MR studies in people of European descent reported no association between LP-associated milk intake and CHD(Reference Lamri, Poli and Emery11,Reference Bergholdt, Nordestgaard and Varbo13) or with total CVD(Reference Ding, Huang and Bergholdt14). A potential causal association between milk consumption and stroke risk has not yet been tested using MR. We therefore used rs4988235 in an IV analysis to investigate whether there is a causal relationship between LP-predicted milk intake and risk of total and ischaemic stroke and CHD in studies of participants of European descent. In addition, we performed gene-outcome analyses for the association of rs4988235 with stroke and CHD using data from the EPIC studies, supplemented with data from large-scale genetic consortia without quantitative data on habitual milk intake.
We performed a one-sample MR analysis, further described as IV analysis, in studies for which we had access to individual participant information on habitual intake of milk and other foods, the presence of cardiovascular risk factors, and stroke and CHD outcomes, as well as information on rs4988235. We performed gene-outcome analysis using data from several studies lacking quantitative data on habitual milk intake. The different studies are further described below.
Data for instrumental variable analysis
We used sub-populations of the EPIC study: the EPIC-CVD case-cohort study and the Dutch EPIC-NL study.
Among participants with a stored blood sample available, a representative sub-cohort was selected(Reference Yang, Lin and Au Yeung15,Reference Smith, Coltell and Sorli16) . During follow-up of the EPIC study, incident stroke and CHD cases occurred and these participants were added to the EPIC-CVD case-cohort population(Reference Langenberg, Sharp and Forouhi17). We included participants with information on intake of milk ((semi-)skimmed or full-fat, regardless of fermentation) from region-specific diet questionnaires, the development and validity of which were reported previously(Reference Yang, Lin and Au Yeung15,Reference Danesh, Saracci and Berglund18,Reference Kroke, Klipstein-Grobusch and Voss19) , and on incident stroke and CHD among those who had data on rs4988235. The EPIC-CVD case-cohort study population for this analysis consists of 13 114 sub-cohort participants including 397 stroke cases and 521 CHD cases from eight European countries, plus additional cases of stroke (n 3737) and CHD (n 8985) (online Supplementary Fig. S1). Of the 4134 total stroke cases in the full case-cohort population, 2746 (66 %) were identified as ischaemic strokes. In EPIC-CVD, variant rs4988235 was genotyped using the Illumina HumanCore Exome Chip array (n 16 685), or genotype dosage was imputed for those who were genotyped on the Illumina HumanCoreExome, Illumina OmniExomeExpress or Illumina Quad660 array (n 8055, impute info∼0·42).
In the Dutch EPIC cohort(Reference Margetts and Pietinen20), we selected an additional random sub-cohort. Incident CHD and stroke cases during follow-up of the EPIC-NL study that occurred after the follow-up for EPIC-CVD was completed were added to obtain the EPIC-NL case-cohort. Data from 3331 participants with information on milk consumption ((semi-)skimmed or full-fat, regardless of fermentation), CHD and stroke incidence and imputed rs4988235 genotype were used for analysis, including 843 CHD and 567 total stroke cases, of which 410 (72 %) were identified as ischaemic strokes. We genotyped participants using the Illumina Global Screening Assay BeadArray and rs4988235 was imputed at high quality (impute info = 0·91).
The EPIC-CVD study cohort and additional participants from EPIC-NL are described in more detail in the online Supplemental Methods. The EPIC studies were approved by institutional ethical review boards. All participants gave written informed consent prior to inclusion.
Data for gene-outcome analysis
For the gene-outcome analysis, we included publically available data from various consortia, namely UK Biobank, CARDIoGRAM GWAS and MEGASTROKE. We additionally included gene-outcome associations from EPIC-CVD and EPIC-NL after exclusion of the population overlap with CARDIoGRAM and MEGASTROKE (online Supplementary Table S1). A description of these cohorts can be found in the online Supplemental Methods.
In descriptive analyses, we assessed baseline characteristics and dietary intakes in the EPIC-CVD and EPIC-NL sub-cohort. We also tested for deviation from Hardy–Weinberg equilibrium(Reference Beulens, Monninkhof and Verschuren21) for rs4988235.
We investigated IV assumptions(Reference Soedamah-Muthu and de Goede3), namely whether rs4988235 was reliably associated with milk consumption and whether it was associated with cardiometabolic risk factors. For continuous variables, we reported a β and 95 % CI and for dichotomous variables, we reported an OR and 95 % CI. We conducted linear regression analyses stratified by imputed and hard-call data and pooled the outcomes with fixed-effects meta-analysis within EPIC-CVD. The models were adjusted for sex, age, the first two genetic principal components (PC)(Reference Rodriguez, Gaunt and Day22) and study centre, assuming an additive effect of rs4988235(Reference Price, Patterson and Plenge23). Additional information can be found in the online Supplemental Methods. Dairy non-consumers were not excluded from analysis since avoidance of dairy could be influenced by rs4988235 genotype. None of the aforementioned variables was missing for any of the participants.
Next, we performed a two-stage least squares IV analysis(Reference Dzialanski, Barany and Engfeldt24), to investigate causality of the association of LP-predicted milk consumption with risk of stroke and CHD separately. LP-predicted milk consumption was calculated for each participant using rs4988235, genotyping platform, sex, age, the first two genetic PC and study centre as predictors in a linear regression model. LP-predicted milk consumption was scaled to obtain interpretable estimates of the hazard ratio for CHD and stroke per 25 g/d increase in milk consumption. We fitted a Prentice-weighted Cox proportional hazard model to take the case-cohort design into account(Reference Burgess, Small and Thompson25) with age as underlying time scale and adjusted for sex, the first two genetic PC and study centre. We tested the effect of including additional genetic PC in our model, but this reduced explained variance (R 2 statistic) of the model and did not affect effect estimates. Analyses were performed by country in EPIC-CVD. Within countries, the analysis was stratified according to hard call v. imputed data for rs4988235 and subsequently pooled across strata using fixed effects. Country-specific results from EPIC-CVD and the additional EPIC-NL results were then pooled with inverse variance weights in a random effect meta-analysis using restricted maximum likelihood estimation. The I 2 statistic was used to assess heterogeneity.
We repeated the aforementioned IV analysis under the assumption of dominance of LP(Reference Prentice26), considering only participants with a C/C genotype lactase non-persistent. We also performed a sensitivity analysis restricting our CHD outcomes to myocardial infarction. Since rs4988235 was associated with smoking in EPIC-NL, and smoking is unlikely to mediate an effect of milk consumption on CVD, we performed a post-hoc analysis, adjusting our MR estimate for smoking status (never v. ever smoker).
For the gene-outcome analysis regarding stroke, we used log OR and standard errors for the association of rs4988235 with stroke from MEGASTROKE (which already included the EPIC-CVD data), and from UK Biobank (adjusted for the first 10 PC). In addition, the gene-outcome association between rs4988235 and stroke was investigated in EPIC-NL using a Prentice-weighted Cox regression with age as underlying time scale and identical covariates as in our IV analysis. The β and standard errors from MEGASTROKE, UK Biobank and EPIC-NL were pooled with inverse variance weights in a random effect meta-analysis using restricted maximum likelihood estimation to obtain a final estimate for the association between rs4988235 and risk of stroke.
For the gene-outcome analysis regarding CHD, we used log odds and standard errors for the association of rs4988235 with CHD from CARDIoGRAM and UK Biobank (adjusted for first 10 PC). In addition, the gene-outcome association between rs4988235 and CHD was investigated in EPIC-CVD and EPIC-NL using a Prentice-weighted Cox regression with the same approach and adjustment for covariates as in our IV analysis. We excluded EPIC-Norfolk from the EPIC-CVD analysis since this cohort was included in CARDIoGRAM. The β and standard errors from CARDIOGRAM, UK Biobank and the combined EPIC studies were pooled with inverse variance weights in a random effect meta-analysis using restricted maximum likelihood estimation to obtain a relative risk for the association between rs4988235 and risk of CHD.
All analyses and data visualisations were performed in R(Reference Sahi27) version 3.4.1, using the R packages survival(28), metafor(Reference Borgan29) and forestplot(Reference Viechtbauer30).
The EPIC-CVD sub-cohort participants were on average 52 (sd 9) years old and consisted of 60·7 % women (online Supplementary Table S2). LP (rs4988235 C/T or T/T) ranged from 30·1 % in Italy to 95·1 % in Denmark. We observed deviation from Hardy–Weinberg equilibrium (at P < 0·05) in the total EPIC-CVD sub-cohort, and in Denmark and Sweden (online Supplementary Table S3). The EPIC-NL sub-cohort had an average age of 51 (sd 11) years and consisted of 80·1 % women (online Supplementary Table S4). LP occurred at a frequency of 90·3 % and genotypes did not deviate from Hardy–Weinberg equilibrium (P = 0·20).
Median milk intake was 160 g/d (interquartile range 39–295) in EPIC-CVD (Table 1), ranging from 32 (interquartile range 2–97) in Germany to 294 (interquartile range 149–440) in the UK (online Supplementary Table S5), and 219 g/d (interquartile range 74–410) in EPIC-NL (Table 2). Dietary intake stratified by LP genotype is reported in online Supplementary Tables S6 and S7.
Dietary intake data are described as median (p25, p75).
* β derived from linear regression model, investigating association between additional rs4988235 T alleles with dietary intake, adjusted for sex, age, two genetic PC and study centre.
† P-value of linear regression model.
‡ I 2 for fixed-effects meta-analysis of analyses among full EPIC-CVD sub-cohort stratified by participants with hard call and imputed rs4988235 genotype.
Dietary intake data are described as median (p25, p75).
* β and 95 % CI derived from linear regression model, adjusting for sex, age, two genetic PC, and study centre.
† Includes coffee, tea, sugar or artificially sweetened beverages, fruit juice and alcoholic beverages.
Checking instrumental variable assumptions
Each additional T allele was associated with higher milk intake (EPIC-CVD: β 13·7 g/d (95 % CI 8·4, 19·1); EPIC-NL: β 36·7 g/d (95 % CI 19·8, 53·6)), but not with intake of total energy or with dairy products other than milk. Each T allele was also associated with lower meat intake (EPIC-CVD: β − 7·3 g/d (95 % CI − 12·1, −2·5); EPIC-NL: β − 4·4 g/d (95 % CI − 8·0, −0·9)) (Tables 1 and 2).
In EPIC-CVD (online Supplementary Table S8), each T allele was associated with a higher BMI (β 0·15 kg/m2; 95 % CI 0·04, 0·27) and waist-to-hip ratio (β 2·4 × 10−3; 95 % CI 7·4 × 10−4, 4·5 × 10−3), lower HDL cholesterol (β –0·02 mmol/l; 95 % CI –0·03, –0·01), and lower odds of having a history of hypercholesterolaemia (OR 0·90; 95 % CI 0·83, 0·99). In EPIC-NL (online Supplementary Table S9), rs4988235 was associated with higher odds of being a never smoker (OR 1·19, 95 % CI 1·03, 1·37) (online Supplementary Table S9).
In additive IV analysis, genetically predicted milk intake was not associated with risk of total stroke (HRper 25 g/d 1·04, 95 % CI 0·94, 1·16, I 2 = 23 %) (Fig. 1) or ischaemic stroke (HRper 25 g/d 1·06, 95 % CI 0·93, 1·21, I 2 = 22 %) (online Supplementary Fig. S1), although CI are wide. We repeated analysis for total stroke under the assumption of a dominant effect of LP and results did not differ (online Supplementary Fig. S2). Additional adjustment for smoking status in the IV analysis of stroke risk in EPIC-NL yielded an HR of 1·15, 95 % CI 1·02, 1·29 as compared with an HR of 1·14, 95 % CI 1·02, 1·27 in original analysis. Gene-outcome analysis did not provide evidence for an association between rs4988235 and total stroke risk (OR 1·02, 95 % CI 0·99, 1·05, I 2 = 39 %) (Fig. 2).
Genetically predicted milk intake was not associated with risk of CHD, with a pooled HRper 25 g/d of 1·02 (95 % CI 0·96, 1·08, I 2 = 0 %) (Fig. 3), nor with risk of myocardial infarction (HRper 25 g/d 1·00, 95 % CI 0·93, 1·07, I 2 = 0 %). Results were not materially different when assuming a dominant effect of rs4988235 (online Supplementary Fig. S3). Additional adjustment for smoking status in the IV analysis of CHD risk in EPIC-NL yielded an HR of 1·00, 95 % CI 0·91, 1·11 as compared with an HR of 1·00, 95 % CI 0·91, 1·10 in original analysis. In gene-outcome analysis, we found no association of rs4988235 with CHD risk, with a pooled OR of 0·99 (95 % CI 0·95, 1·04, I 2 = 73 %) (Fig. 4).
In this IV and gene-outcome analysis among people of European descent, we did not find evidence for a causal relationship between milk consumption and risk of stroke or CHD.
Our study has several strengths. First, we used population-based cohorts with extensive phenotyping for our IV analysis. This allowed us to investigate the association between LP and intake of various (dairy) foods and cardiovascular risk factors, so we can attribute the observed association to a well-defined exposure(Reference Gordon31). In addition, we included a large number of participants in genome-wide association study (GWAS) and population studies to investigate the association between rs4988235 and risk of stroke and CHD.
There are also limitations to address. First, previous studies have observed an association between genetic variation in the lactase region and consumption of total dairy(Reference Vissers, Sluijs and van der Schouw12,Reference Ding, Huang and Bergholdt14) . One of these studies showed a 30·3 g/d (95 % CI 21·3, 29·3) higher total dairy intake per LP allele, and a 26·4 g/d (95 % CI 16·7, 36·2) higher milk consumption, among 20 028 US participants. In EPIC-CVD, each additional LP allele was associated with a 13·6 g/d (95 % CI 8·4, 18·8) higher milk intake on average, but not with dairy products other than milk (β 1·9 g/d (95 % CI −0·9, 4·7)). Milk likely explains most of the association with consumption of total dairy as milk has the highest lactose content(Reference Holmes, Ala-Korpela and Smith32). However, there may be differences in the association of LP with dairy products other than milk between populations and this could affect findings from our gene-outcome analysis, since not all dairy products are the same in macro- and micronutrient content(33).
Second, we used dietary questionnaires to estimate milk intake, which are prone to both non-differential and differential measurement error. People with clinical symptoms of lactose intolerance may be more precise in their recall of dairy consumption, leading to differential misclassification. However, we expect misclassification to be non-differential between participants with and without later CVD. It is also likely that milk consumption in an individual is time-varying, but this is less relevant for an MR study since direction of the population effect is expected to remain the same. Also, we observed modest associations of rs4988235 with dietary and cardiovascular risk factors in EPIC-NL (e.g. more smoking) and EPIC-CVD (e.g. less meat intake), but we consider it unlikely that these modest associations reflect violation of the second IV assumption, as discussed in more detail in our previous EPIC-InterAct analysis(Reference Travis, Appleby and Siddiq10). Third, imputation quality for rs4988235 was low in a subset of the EPIC-CVD population (n 8055, impute info∼0·42), so there may be LP genotype misclassification. Also, we observed deviation of rs4988235 genotype frequencies from Hardy–Weinberg equilibrium within EPIC-CVD, which could reflect selection bias is genetic population cohorts(Reference Beulens, Monninkhof and Verschuren21). However, this deviation can be explained by the widely varying allele frequencies for rs4988235 between different European ancestry populations, and we have adjusted for potential resulting population stratification via principal components(Reference Rodriguez, Gaunt and Day22). Lastly, outcome definitions for stroke and CHD were somewhat different between the cohorts used, but IV sensitivity analyses after restricting to ischaemic stroke and myocardial infarction did not alter conclusions. Due to aforementioned limitations of our study, we cannot exclude a small effect of milk consumption on risk of stroke or CHD.
Despite a modest association found between LP and higher WHR and BMI, which could be due to higher energy intake in LP persons(Reference Kettunen, Silander and Saarela34), we did not observe an association with genetically predicted milk consumption on CHD risk. This lack of association is in line with findings from a meta-analysis of prospective cohort studies(Reference Riboli, Hunt and Slimani1), a recent EPIC analysis(Reference Key, Appleby and Bradbury35) and a previous MR study among a Danish population(Reference Lamri, Poli and Emery11).
Regarding total and ischaemic stroke, we did not find support for a protective effect of milk consumption in the IV or gene-outcome analysis, although CI for the IV analysis were wide. The biological mechanism of a protective effect of milk intake on stroke risk is thought to work via hypertension(Reference Meschia, Bushnell and Boden-Albala36), potentially through intake of minerals such as Ca(Reference van Mierlo, Arends and Streppel37) and K(Reference Geleijnse, Kok and Grobbee38). We observed no association of LP with hypertension, or with systolic or diastolic blood pressure. In line with our findings, a previous MR study found no association between genetically predicted dairy consumption and hypertension(Reference Vissers, Sluijs and van der Schouw12). However, a recent meta-analysis of fifteen prospective cohort studies (4 381 604 participants, 25 377 stroke cases) reported an inverse association (RRper 200 g/d 0·92, 95 % CI 0·88, 0·97) between consumption of milk and risk of stroke(Reference Riboli, Hunt and Slimani1). This could be because aforementioned association was mainly driven by East Asian populations, whereas a null association was observed in Western populations. It could be hypothesised that people from East Asian descent benefit more from milk consumption due to genetic differences with people from European descent, or that increasing milk intake on top of a Western diet – characterised by a high consumption of refined cereals, sugars and vegetable oils, and by consumption of dairy products, fatty meats and salt(Reference Cordain, Eaton and Sebastian39) – does not have a beneficial effect on stroke risk. Another hypothesis could be that the association between milk intake and risk of stroke is non-linear, although this cannot fully explain our results since the meta-analysis suggests some protective effect of milk intake on stroke risk at any intake level(Reference Riboli, Hunt and Slimani1). We were unable to test the hypothesis of a non-linear association due to insufficient power to perform non-linear MR.
In conclusion, in IV analyses including 4611 total stroke and 9828 CHD cases and gene-outcome analyses including 50 804 stroke and 61 612 CHD cases, we did not find evidence of a causal relation between milk intake and risk of stroke or CHD in European populations. This suggests that the inverse association between milk intake and stroke in observational studies may be due to confounding. Given the stronger observational evidence for the association between milk intake and stroke in East Asian populations with generally lower milk intakes, future studies could focus on investigating the relationship between milk intake and stroke in people of East Asian descent, or on a potential non-linear association between milk intake and stroke.
We thank all EPIC participants and staff for their contribution to the study. We also thank staff from the EPIC-CVD coordinating centres for sample preparation and data handling. This research has been conducted using the UK Biobank Resource (application number 29916). Data on coronary artery disease have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.CARDIOGRAMPLUSC4D.ORG. I Sluijs was supported by a personal Dr. Dekker postdoctoral grant (2015T019) from the Netherlands Heart Foundation. NGF and FI acknowledge core Medical Research Council Epidemiology Unit support (MC_UU_12015/5) and NGF acknowledges NIHR Biomedical Research Centre Cambridge: Nutrition, Diet, and Lifestyle Research Theme (IS-BRC-1215-20014).
The InterAct project was funded by the EU FP6 programme (grant number LSHM_CT_2006_037197) and provided the biomarker data in the sub-cohort that was used in the current study. These analyses were supported by Cancer Research UK (C8221/A19170). The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by Danish Cancer Society (Denmark); German Cancer Aid, German Cancer Research Center (DKFZ), Federal Ministry of Education and Research (BMBF), Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); Health Research Fund (FIS) PI13/00061 (EPIC-Granada) and PI13/01162 (EPIC-Murcia), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, ISCIII Health Research Funds RD12/0036/0018 (cofounded by FEDER funds/European Regional Development Fund ERDF) (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and C8221/A19170 for EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (UK). EPIC-CVD has been supported by the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), the European Research Council (268834), the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG13/13/30194 and RG/18/13/33946) and the National Institute for Health Research (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust). The MEGASTROKE project received funding from sources specified at http://www.megastroke.org/acknowledgments.html.
L. E. T. V. analysed the data and drafted the manuscript. L. E. T. V., I. S. and Y. T. vdS. had access to all data for this study. L. E. T. V., I. S., Y. T. vdS., S. B., N. G. F., H. F., F. I., T. K. N., F. R., E. W., K. A., C. D., A. P. C., M. B. S., T. Y. N. T. and A. S. B. contributed to study conception, design and interpretation of data. All authors contributed to critical revision of the manuscript and approval of version to be published.
The authors have no conflict of interest to declare.
For supplementary materials referred to in this article, please visit https://doi.org/10.1017/S0007114521004244