Epidemiological evidence suggests that vitamin D may protect against colorectal, prostate, breast and oropharyngeal cancers(Reference Garland, Gorham and Mohr1–Reference Holick6) and other chronic diseases such as CVD(Reference Holick6–Reference Michos and Melamed9), diabetes(Reference Holick6, Reference Michos and Melamed9–Reference Pittas, Chung and Trikalinos11) and hip fractures(Reference Feskanich, Willett and Colditz12–Reference Cauley, Lacroix and Wu14). Plasma 25-hydroxyvitamin D (25(OH)D), the primary circulating form of vitamin D(Reference Holick6, Reference Holick15), is an accepted biomarker for measuring vitamin D status in clinical settings(Reference Horst, Reinhardt and Reddy16); however, it is strongly dependent on season of blood draw. Although 25(OH)D is fairly reproducible over 2–3 years(Reference Kotsopoulos, Tworoger and Campos17, Reference Giovannucci, Liu and Rimm18), one measurement is weaker in characterising longer-term exposure(Reference Hofmann, Yu and Horst19, Reference Jorde, Sneve and Hutchinson20). Furthermore, measuring 25(OH)D requires the availability of blood samples and monetary resources for laboratory assays, thereby limiting the feasibility of this approach for many large-scale epidemiological studies.
Individual determinants of vitamin D status, such as latitude and regional estimates of solar radiation(Reference Garland and Garland21–Reference Grant and Mohr23) or dietary assessment(Reference Anderson, Cotterchio and Vieth24–Reference Oh, Willett and Wu26), have been used as surrogates of vitamin D exposure, but each alone contributes a small proportion of 25(OH)D. An alternative approach to assess vitamin D status is to combine known determinants of circulating 25(OH)D to derive a predicted score from questionnaire data using measurements of plasma 25(OH)D available for a subset of the study population; reported r² from such predictive models have ranged from 0·21 to 0·42(Reference Giovannucci, Liu and Rimm18, Reference Liu, Meigs and Pittas27–Reference Chan, Jaceldo-Siegl and Fraser29). Although the r² between predicted and measured 25(OH)D has been used to assess the ‘validity’ of the predicted 25(OH)D approach, the r² in this context has limitations, given that a single measure is not a true ‘gold standard’ of long-term average 25(OH)D concentration. Such an ‘alloyed’ or imperfect ‘gold standard’ will underestimate true ‘validity’(Reference Wacholder, Armstrong and Hartge30–Reference Willett32). More importantly, a comparison of high v. low circulating 25(OH)D level in the population, which may be estimated by high and low predicted 25(OH)D, may be the more relevant factor for testing exposure–disease hypotheses in epidemiological studies.
In the present paper, we describe the development and validation of regression models to predict 25(OH)D based on the determinants of vitamin D status in three cohorts: the Nurses’ Health Study (NHS), the Nurses’ Health Study II (NHSII) and the Health Professionals Follow-up Study (HPFS). Predicted 25(OH)D scores have been used in several analyses within these cohorts(Reference Forman, Giovannucci and Holmes8, Reference Giovannucci, Liu and Rimm18, Reference Bao, Ng and Wolpin33, Reference Ng, Wolpin and Meyerhardt34) but, with the exception of HPFS(Reference Giovannucci, Liu and Rimm18), no formal validation has been conducted previously and the specific prediction models varied with each analysis. We also evaluated the reproducibility of plasma 25(OH)D over 10–11 years in NHS participants.
Participants were selected from three US prospective cohort studies. The NHS was established in 1976, when 121 700 female nurses aged 30–55 years completed a self-administered questionnaire on risk factors for cancer and other diseases. During 1989 and 1990, a total of 32 826 participants provided blood samples for analysis. The NHSII began in 1989 when 116 671 female nurses aged 25–42 years completed and returned a baseline questionnaire. Between 1996 and 1999, a total of 29 611 participants (aged 32–54 years) provided blood samples. The HPFS comprises 51 529 male dentists, optometrists, osteopaths, podiatrists, pharmacists and veterinarians aged 40–75 years at baseline in 1986. Blood samples were provided by 18 225 of these men during 1993 and 1994. Blood samples have been stored in liquid N2 freezers ( ≤ − 130°C) since collection. For all three cohorts, biennial questionnaires are sent to participants to update information on risk factors and to identify newly diagnosed diseases. Diet is assessed by a validated semiquantitative FFQ approximately every 4 years(Reference Willett, Sampson and Stampfer35–Reference Feskanich, Rimm and Giovannucci38).
Plasma 25(OH)D measurements were available from men and women who served as controls in previous nested case–control studies of chronic diseases. None of the participants had a history of cancer at the time of blood draw. For each cohort, we selected two independent samples: a ‘training’ sample was used to develop the 25(OH)D prediction model and a ‘test’ sample served as a validation data set. Training samples comprised controls from all completed and on-going nested case–control studies with 25(OH)D assay results when analyses began. Test samples were drawn from more recently established nested case–control studies as this project unfolded and additional plasma 25(OH)D assay results became available. Before exclusions for missing data, the training sets consisted of 2246 women in the NHS, 1646 women in the NHSII and 1255 men in the HPFS. An additional 818 women in the NHS, 479 women in the NHSII and 841 men in the HPFS were available for the test sets.
In 2000 and 2001, all women in the NHS who gave blood in 1989–1990 and were alive were invited to provide a second blood sample. Of the 18 473 women who participated in the second blood collection, 443 women with no history of cancer had measured 25(OH)D available at both time points. These samples were used to assess within-person variability of plasma 25(OH)D concentrations over 10–11 years.
This study was approved by the institutional review boards of the Harvard School of Public Health and Brigham and Women's Hospital. All participants gave written informed consent at enrolment.
Plasma 25(OH)D levels were determined by RIA or chemiluminescence immunoassay, as previously described(Reference Hollis39–Reference Wagner, Hanwell and Vieth41), between 1993 and 2010. The time between blood collection and 25(OH)D assay ranged from 3 to 20 years, with the majority of samples assayed within 14 years of blood collection. The stability of 25(OH)D in frozen plasma has been previously demonstrated, even for samples stored > 10 years(Reference Hollis42). Intra-assay CV from blinded, replicate, quality-control samples were < 15 % for twenty-three out of twenty-six laboratory batches; the highest CV was 17·6 %. Mean 25(OH)D concentrations in training samples were: 28·5 (sd 10·9) ng/ml (NHS, n 2079); 26·3 (sd 9·8) ng/ml (NHSII, n 1497) and 25·9 (sd 10·0) ng/ml (HPFS, n 911).
Using the training sample for each cohort, we fit a linear regression model to predict measured plasma 25(OH)D (continuous ng/ml) based on known or suspected determinants(Reference Giovannucci, Liu and Rimm18). Age (years), season of blood draw and laboratory batch were included as independent variables in all models to account for known extraneous variation. Other candidate predictor variables were energy-adjusted(Reference Willett, Howe and Kushi43) vitamin D intake from food, vitamin D intake from supplements, average annual UV-B flux – a composite measure of mean UV-B radiation level reaching the earth's surface that takes into account factors such as latitude, altitude and cloud cover – based on state of residence(Reference Scotto, Fears and Fraumeni44), race/ethnicity, BMI, leisure-time physical activity level, alcohol intake, geographic region of residence (North, South, Midwest, West), smoking history, hair colour, susceptibility to burn, ability to tan and number of lifetime sunburns. Menopausal status, post-menopausal hormone use and age at first birth were also considered for women in the NHS and NHSII. Data were obtained from questionnaires completed closest to blood draw date. Questionnaires were completed within ± 2 years of blood draw for ≥ 97 % of each sample; the median time period was 5 months before blood draw for the NHS, 3 months after blood draw for the NHSII, and 2 months before blood draw for the HPFS.
For each cohort, we first fit a multivariable linear regression model with all candidate predictors with P < 0·05 in univariate analyses adjusted for laboratory batch and age. Then, we eliminated non-significant (P ≥ 0·05) variables from the model, one at a time, based on the largest P value. The final multivariable prediction model includes all statistically significant predictors, plus age, season of blood draw and laboratory batch. The HPFS model is a refinement of the one previously published(Reference Giovannucci, Liu and Rimm18). The general form of the prediction model is: 25(OH)D = β₀ + β′X′, where β₀ represents the intercept and β′ represents the vector of coefficients associated with the vector of predictors, X′ (see Table 1).
MET, metabolic equivalents.
* Adjusted for laboratory batch.
† Age and season not used in predicted score calculation.
‡ UV-B flux category (in Robertson–Berger (RB) count × 10⁻⁴): NHS: 1 is > 113, 2 is 113, 3 is < 113; NHSII: 1 is 145–196, 2 is 115–144, 3 is 108–114, 4 is < 105; HPFS: 1 is 158–196, 2 is 137–154, 3 is 115–133, 4 is 105–113, 5 is < 105.
§ Mean values in extreme quintiles of physical activity (in MET h/week): NHS: 1 is 1·2, 5 is 41·4; NHSII: 1 is 1·6, 5 is 52·8; HPFS: 1 is 3·6, 5 is 87·4.
∥ Post-menopausal hormone use: NHS: 1 is pre-menopausal, 2 is post-menopausal, never user, 3 is post-menopausal, past user, 4 is post-menopausal, current user, 5 is post-menopausal, unknown use; NHSII: 1 is never user, 2 is past user, 3 is recent past user, 4 is current user, 5 is unknown use.
Regarding the final sets of predictors, we aimed for consistency between cohorts, while allowing for flexibility in cohort-specific models. Factors statistically significant for one cohort were considered for inclusion in models for the other cohorts regardless of statistical significance, given sufficient biological plausibility (e.g. UV-B flux). We excluded individuals with missing values for predictors except post-menopausal hormone use in the NHSII for which a missing category was created. The final prediction models were fit to 2079 women aged 42–69 years in the NHS, 1497 women aged 32–52 years in the NHSII and 911 men aged 46–81 years in the HPFS.
Based on the regression coefficients for each variable in the prediction model, we calculated a predicted 25(OH)D score for each individual in the test samples using personal data for covariates. Age, season of blood draw and laboratory batch were not used in the derivation of predicted 25(OH)D scores. Age was excluded because it is a strong risk factor for many chronic diseases; excluding it from the derived score retains the ability to control finely for potential confounding by age in epidemiological investigations. Predicted 25(OH)D scores were not calculated if predictor data were missing on the questionnaire closest to blood draw or the previous questionnaire (NHS, n 39; NHSII, n 34; HPFS, n 5). For the test samples, there were 779 women in the NHS, 445 women in the NHSII and 836 men in the HPFS with available 25(OH)D measurements and predicted 25(OH)D scores.
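The score calculation described above can be illustrated with a minimal sketch. The coefficient values and variable names below are hypothetical placeholders, not the fitted values from Table 1, and the treatment of the intercept is an assumption. The sketch shows only the general idea: sum the intercept and the retained predictor terms, skip the age, season and batch terms, and return no score when a predictor is missing.

```python
# Hypothetical coefficients for illustration only (not the values in Table 1).
COEFFICIENTS = {
    "intercept": 25.0,
    "bmi_30_plus": -4.0,
    "dietary_vitd_high": 2.0,
    "supplement_user": 1.5,
    "physical_activity_q5": 3.0,
    "age_per_year": -0.05,   # fitted in the model, not used in the score
    "season_winter": -3.0,   # fitted in the model, not used in the score
}

# Terms whose names start with these prefixes are left out of the score.
EXCLUDED_PREFIXES = ("age", "season", "batch")


def predicted_score(person, coefficients=COEFFICIENTS):
    """Return the predicted 25(OH)D score (ng/ml), or None if a predictor is missing."""
    score = coefficients["intercept"]
    for name, beta in coefficients.items():
        if name == "intercept" or name.startswith(EXCLUDED_PREFIXES):
            continue
        if name not in person:
            return None  # no score is calculated when predictor data are missing
        score += beta * person[name]
    return score


example = {
    "bmi_30_plus": 1,
    "dietary_vitd_high": 1,
    "supplement_user": 0,
    "physical_activity_q5": 1,
    "age_per_year": 60,   # ignored by the score
    "season_winter": 1,   # ignored by the score
}
example_score = predicted_score(example)
```

Because age and season do not enter the score, two people who differ only in those variables receive the same predicted value, which leaves age available as a separate adjustment variable in disease models.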
For validation, we compared predicted 25(OH)D and actual plasma 25(OH)D measurements in test samples. Laboratory batch-adjusted Spearman correlation coefficients were calculated to assess agreement between predicted score and actual 25(OH)D levels. We examined actual plasma 25(OH)D measurements according to decile of predicted 25(OH)D score(Reference Giovannucci, Liu and Rimm18, Reference Millen, Wactawski-Wende and Pettinger28) and cross-classified individuals by quintile of both predicted and actual 25(OH)D. Using previously published data from a nested case–control study that examined the association between plasma 25(OH)D and colorectal cancer in the NHS and HPFS(Reference Wu, Feskanich and Fuchs4), we calculated OR for colorectal cancer for a 10 ng/ml difference in measured 25(OH)D and then compared the results to analyses that used the predicted 25(OH)D score. In these analyses, we derived separate predicted scores at each questionnaire year based on current predictor data and calculated the average predicted 25(OH)D from 1986 – the year predicted scores were first derived – to date of diagnosis (or matched date for controls) as the main exposure variable. For both measured and predicted 25(OH)D, pooled estimates were calculated for the NHS and HPFS using a meta-analysis approach described by DerSimonian & Laird(Reference DerSimonian and Laird45).
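The pooling step can be sketched as follows. This is a generic implementation of the DerSimonian and Laird random-effects estimator applied to cohort-specific log OR; the function name and the input values are illustrative assumptions and are not taken from the cohort analyses.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool study-specific estimates (e.g. log OR) with a random-effects model.

    Returns (pooled estimate, its standard error, between-study variance tau^2).
    """
    variances = [se ** 2 for se in std_errors]
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q statistic and the moment estimator of tau^2
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2


# Illustrative inputs only: two cohorts with the same log OR and standard error.
log_or, se, tau2 = dersimonian_laird([math.log(0.8), math.log(0.8)], [0.2, 0.2])
pooled_or = math.exp(log_or)
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
```

When the cohort estimates agree, the estimated between-study variance is zero and the result reduces to the fixed-effect inverse-variance average.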
Finally, we evaluated the reproducibility of 25(OH)D measurements over 10–11 years among 443 women in the NHS with two blood measures, using a statistical approach previously described(Reference Kotsopoulos, Tworoger and Campos17). We calculated intraclass correlation coefficients (ICC) by dividing the between-person variance by the sum of the within- and between-person variances; a 95 % CI also was calculated. Using a mixed model, we adjusted for age (continuous) by including it as a fixed effect. ICC measures the fraction of total variation that is due to between-person variability. A high value for the ICC reflects a low within-person variation.
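As a simplified illustration of the ICC definition given above, the sketch below estimates the between- and within-person variance components from repeated measurements using one-way ANOVA. It is not the age-adjusted mixed model used in the analysis, and the function name and data are hypothetical.

```python
def icc_oneway(measurements):
    """ICC = between-person variance / (between-person + within-person variance).

    `measurements` is a list of per-person tuples, each with the same number
    of repeated values. No covariate adjustment is made (simplifying assumption).
    Raises ZeroDivisionError if all values are identical.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = sum(sum(p) for p in measurements) / (n * k)
    person_means = [sum(p) / k for p in measurements]

    # Mean squares between and within persons
    msb = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for p, m in zip(measurements, person_means) for x in p
    ) / (n * (k - 1))

    var_between = max(0.0, (msb - msw) / k)
    var_within = msw
    return var_between / (var_between + var_within)


# Hypothetical 25(OH)D values (ng/ml) at two time points for three people.
icc_perfect = icc_oneway([(10, 10), (20, 20), (30, 30)])
icc_noisy = icc_oneway([(10, 12), (20, 18), (30, 31)])
```

The first data set has no within-person variation, so the ICC equals 1; adding within-person noise in the second data set lowers it.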
Among NHS participants with two 25(OH)D measurements 10–11 years apart, we compared average plasma 25(OH)D concentration to average predicted 25(OH)D score over the same time period. We calculated Spearman correlation coefficients based on the residuals of plasma 25(OH)D measurements in each time period from a linear regression model to factor out effects of age and season of blood draw. Because random within-person error can attenuate correlations, we used data from the reproducibility sample to correct for these effects(Reference Beaton, Milner and Corey46, Reference Liu, Stamler and Dyer47).
All statistical tests were two-sided and analyses were performed using SAS version 9 for UNIX (SAS Institute, Inc.).
Using multivariable linear regression in the training set within each cohort, we identified the following independent predictors of age-adjusted plasma 25(OH)D levels: race, UV-B flux (NHS and HPFS only), dietary vitamin D intake, supplementary vitamin D intake, BMI, physical activity, alcohol intake (NHS and NHSII only), post-menopausal hormone use (NHS only) and season of blood draw (Table 1). Overall, the predictive models explained 25 % (NHSII), 28 % (HPFS) and 33 % (NHS) of the total variability in plasma 25(OH)D concentration. The strongest predictors of circulating 25(OH)D generally were race (a proxy for skin pigmentation) and BMI, followed by physical activity, dietary and supplementary vitamin D intake and UV-B flux (NHS and HPFS only). Season also was an important predictor of 25(OH)D, but is not used in the calculation of predicted 25(OH)D score because it reflects time of blood draw and is not a factor in determining long-term average between-person variation in 25(OH)D. Age was not a significant independent predictor of 25(OH)D in the NHS or HPFS, but a modest inverse association was observed in the NHSII.
Using the regression coefficients estimated in each training set, we calculated predicted 25(OH)D scores for participants in the corresponding test samples. The batch-adjusted Spearman correlation coefficients between predicted score and actual 25(OH)D level were 0·23 (95 % CI 0·16, 0·29) for the NHS, 0·40 (95 % CI 0·32, 0·47) for the NHSII and 0·24 (95 % CI 0·18, 0·30) for the HPFS (all P values < 0·0001). After further adjusting for age and season of blood draw, correlations were 0·23 (95 % CI 0·16, 0·29; NHS), 0·42 (95 % CI 0·34, 0·49; NHSII) and 0·30 (95 % CI 0·21, 0·37; HPFS). In all cohorts, actual plasma 25(OH)D levels generally rose with increasing decile of predicted 25(OH)D score (Fig. 1). The differences in mean actual 25(OH)D level between extreme deciles of predicted 25(OH)D score were 8·7 ng/ml (95 % CI 5·4, 11·9) for the NHS, 12·3 ng/ml (95 % CI 8·7, 16·0) for the NHSII and 8·7 ng/ml (95 % CI 5·5, 11·8) for the HPFS.
Because epidemiological studies often categorise exposures into quantiles for analysis, we cross-classified individuals in the validation samples by quintile of predicted 25(OH)D and measured plasma 25(OH)D levels to determine how well the predicted score performed in ranking individuals with respect to plasma levels. Between 24·8 % (NHS) and 29·9 % (NHSII) of individuals fell into identical quintiles of predicted and measured 25(OH)D. Using the predicted scores, the majority of individuals were classified in either the same quintile or the adjacent quintile of actual plasma 25(OH)D concentration (NHS: 59·8 %, NHSII: 66·5 %, HPFS: 61·4 %; Fig. 2). Only 5 % or less of participants in each cohort were in extreme opposite quintiles according to predicted and actual 25(OH)D. Among women in the lowest quintile (Q1) of actual plasma 25(OH)D in the NHS, 33 % were categorised in Q1 of the predicted score, 57 % were categorised in either Q1 or Q2 and 13 % were categorised in Q5. Among women in Q1 of actual plasma 25(OH)D in the NHSII, 44 % were categorised in Q1 of the predicted score, 66 % were categorised in either Q1 or Q2 and 8 % were categorised in Q5. Among men in Q1 of actual plasma 25(OH)D in the HPFS, 37 % were categorised in Q1 of the predicted score, 57 % were categorised in either Q1 or Q2 and 10 % were categorised in Q5.
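The cross-classification summarised above can be sketched as follows. The quintile rule (rank-based, ties broken by input order) and the function names are assumptions made for illustration; the data are synthetic.

```python
def quintiles(values):
    """Assign each value to a quintile (1-5) by rank; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank * 5 // len(values) + 1
    return result


def agreement(predicted, measured):
    """Proportion of individuals in the same, same-or-adjacent, and extreme opposite quintiles."""
    q_pred, q_meas = quintiles(predicted), quintiles(measured)
    diffs = [abs(a - b) for a, b in zip(q_pred, q_meas)]
    n = len(diffs)
    return {
        "same": sum(d == 0 for d in diffs) / n,
        "same_or_adjacent": sum(d <= 1 for d in diffs) / n,
        "extreme_opposite": sum(d == 4 for d in diffs) / n,
    }


# Synthetic example: perfectly concordant and perfectly reversed rankings.
measured = list(range(10))
concordant = agreement(measured, measured)
reversed_rank = agreement(list(reversed(measured)), measured)
```

Under perfect concordance everyone falls in the same quintile, while a fully reversed ranking places the lowest and highest measured quintiles in the opposite extremes of the predicted score.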
Based on data from a previously published case–control study of colorectal cancer in the NHS and HPFS(Reference Wu, Feskanich and Fuchs4), the pooled multivariable OR for a 10 ng/ml difference in measured 25(OH)D was 0·82 (95 % CI 0·66, 1·03). Using the average predicted 25(OH)D score in these analyses yielded an OR of 0·78 (95 % CI 0·41, 1·48).
In our reproducibility substudy in the NHS, the ICC for plasma 25(OH)D measured over 10–11 years was 0·50 (95 % CI 0·43, 0·57). Among these 443 women, the age- and season-adjusted Spearman correlation coefficient between average measured 25(OH)D based on two blood samples and long-term average predicted 25(OH)D over the same time period was 0·23. We corrected for within-person variation in plasma 25(OH)D to obtain a deattenuated correlation coefficient of 0·28.
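The text does not state the correction formula. A commonly used form, assumed here, multiplies the observed correlation by √(1 + λ/n), where λ is the ratio of within- to between-person variance, (1 − ICC)/ICC, and n is the number of replicate measurements per person. Under that assumption, the reported values (observed correlation 0·23, ICC 0·50, two blood samples) give a corrected correlation consistent with the reported 0·28.

```python
import math


def deattenuate(r_observed, icc, n_replicates):
    """Correct a correlation for random within-person error in one variable.

    Assumed formula: r_corrected = r_observed * sqrt(1 + lambda / n),
    with lambda = within-person variance / between-person variance.
    """
    variance_ratio = (1.0 - icc) / icc
    return r_observed * math.sqrt(1.0 + variance_ratio / n_replicates)


corrected = deattenuate(r_observed=0.23, icc=0.50, n_replicates=2)
print(round(corrected, 2))  # 0.28
```

The correction is modest here because averaging two measurements has already removed part of the within-person error.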
Using data from three US cohorts, we derived predicted 25(OH)D scores based on various factors that influence circulating levels. The determinants of circulating 25(OH)D we identified generally were consistent with the predictors reported by others(Reference Liu, Meigs and Pittas27–Reference Chan, Jaceldo-Siegl and Fraser29, Reference McCullough, Weinstein and Freedman48–Reference Brock, Huang and Fraser55). The set of predictors included in the final models explained only a proportion of the total variability in plasma 25(OH)D levels (i.e. 25–33 %). The r² for our prediction models were generally consistent with previously published models(Reference Liu, Meigs and Pittas27–Reference Chan, Jaceldo-Siegl and Fraser29). Millen et al. (Reference Millen, Wactawski-Wende and Pettinger28) reported a similar multivariable regression model with a comparable r² (0·21) and correlation between predicted and actual 25(OH)D (0·45) for the Women's Health Initiative. In the Framingham Offspring Study, Liu et al. (Reference Liu, Meigs and Pittas27) developed a model to predict a 25(OH)D score based on a similar set of predictors (r² 0·26), and in their validation study observed a correlation of 0·51 between predicted and actual levels. In the Adventist Health Study-2, Chan et al. (Reference Chan, Jaceldo-Siegl and Fraser29) reported r² of 0·22 and 0·33 for White and Black populations, respectively (0·42 combined); however, they did not compare predicted and actual 25(OH)D levels in an independent sample. Because only a small proportion of the total variability in plasma 25(OH)D levels is explained by identified predictors, predicted 25(OH)D scores cannot be interpreted as direct blood measurements of 25(OH)D to determine an individual's vitamin D sufficiency, insufficiency or deficiency status.
Vitamin D prediction models have potential strengths and limitations as exposure assessment tools. Our models and others’ have substantial unexplained variability, which probably can be attributed to error in the measurement of predictor variables and plasma 25(OH)D levels and lack of information about other important determinants of vitamin D status such as genetic factors(Reference Shea, Benjamin and Dupuis49, Reference Engelman, Fingerlin and Langefeld56) and actual UV exposure. While sun sensitivity characteristics (e.g. ability to tan, susceptibility to burn and number of lifetime sunburns) were not predictive of 25(OH)D in the NHS, data on personal sun exposure and sun behaviours (such as time spent outdoors and use of sunscreen or protective clothing), important determinants of circulating 25(OH)D, were not regularly collected in these cohorts. We examined leisure-time physical activity as a proxy for time spent outdoors and found this to be a significant predictor. The prediction models also include an estimate of average annual UV-B flux, a composite measure of mean UV-B radiation level based on latitude, altitude and cloud cover, which also was a significant determinant of circulating 25(OH)D.
Millen et al. (Reference Millen, Wactawski-Wende and Pettinger28) concluded that predicted 25(OH)D scores ‘do not adequately reflect serum 25(OH)D concentrations’. While we agree that predicted scores cannot substitute for blood measures in assessing current 25(OH)D level, we view the results of both studies as providing reasonable evidence that predicted 25(OH)D score is an acceptable marker of vitamin D status for the purposes of distinguishing a substantial range of vitamin D exposure in a given study population. In chronic disease epidemiology, the actual contrast between high and low exposure level over years or decades is particularly relevant. We calculated differences in measured 25(OH)D between extreme deciles of predicted score of 9–12 ng/ml, which represents the actual contrast in long-term 25(OH)D that can be studied in these populations. This difference corresponds to differences in vitamin D intakes of approximately 25–37·5 μg (i.e. 1000–1500 IU)/d(Reference Heaney, Davies and Chen57), and is considerably larger than what may be estimated using single surrogates of vitamin D exposure, such as dietary vitamin D intake, which explains a contrast of approximately 3 ng/ml in 25(OH)D between high and low intake (Table 1).
A single blood measurement of 25(OH)D has the advantage of being a direct measure of circulating 25(OH)D; however, it is substantially influenced by recent and acute exposures (e.g. beach vacation, season), which contributes to measurement error in estimating long-term 25(OH)D. Correlations between two direct 25(OH)D measures taken 2–14 years apart range from 0·42 to 0·72(Reference Kotsopoulos, Tworoger and Campos17–Reference Jorde, Sneve and Hutchinson20), reflecting that a single 25(OH)D measurement is not a true ‘gold standard’ of long-term 25(OH)D level. In the NHS, the ICC for plasma 25(OH)D measured 2–3 years apart was 0·72(Reference Kotsopoulos, Tworoger and Campos17); over 10–11 years, the ICC was 0·50. While an ICC of 0·50 indicates fair to good reproducibility of a biomarker(Reference Rosner58), the difference between the 2–3-year ICC and the 10–11-year ICC reflects lower reproducibility over a longer time period. Therefore, in our analyses and those by Millen et al. (Reference Millen, Wactawski-Wende and Pettinger28) and Liu et al. (Reference Liu, Meigs and Pittas27), correlation coefficients comparing predicted and actual 25(OH)D are probably underestimated because measured 25(OH)D is not a true ‘gold standard’ and because random within-person error in the measurement of both variables attenuates correlation coefficients(Reference Willett32). Because circulating plasma 25(OH)D is an imperfect measure of long-term 25(OH)D status, the comparison of mean actual 25(OH)D level by category of predicted score in validation analyses may better reflect the utility of predicted 25(OH)D scores to assess long-term 25(OH)D status. Although we assumed that the average of two plasma 25(OH)D measurements taken 10–11 years apart would better represent long-term 25(OH)D status, correlation coefficients were similar in the NHS sample with repeated measurements.
Another objection commonly raised about the predicted 25(OH)D score is that it may be confounded by its predictors (e.g. physical activity or BMI), which could be independent risk factors for disease(Reference Millen, Wactawski-Wende and Pettinger28). This criticism would also be true for plasma 25(OH)D levels, which inherently incorporate these factors. Importantly, including predictors of vitamin D status as covariates in multivariable models may represent over-adjustment because these variables are important determinants of 25(OH)D. Therefore, adjusting for these factors may be inappropriate. A potential advantage of using predicted 25(OH)D scores over measured 25(OH)D in analytical epidemiology is that a sensitivity analysis could be performed in which physical activity (or other predictor) is excluded from the score, thereby removing potential confounding by this factor. In practice, however, we did not observe evidence of confounding of predicted 25(OH)D by BMI or physical activity in previous analyses of colorectal cancer risk in the HPFS(Reference Giovannucci, Liu and Rimm18).
Predicted scores were derived based on data not collected for assessing vitamin D status; the predictive ability of derived scores would probably improve if additional determinants of circulating 25(OH)D, such as personal sun exposure behaviours, were incorporated. Random measurement error in predicted 25(OH)D is expected to attenuate measures of association with disease(Reference Willett32, Reference Thomas, Stram and Dwyer59); however, predicted scores should allow investigators to test a sizeable contrast in 25(OH)D between ‘low’ and ‘high’ exposure categories and will still be useful to detect moderate to strong vitamin D–disease associations. In our cohorts, we observe similar associations for various disease endpoints using plasma 25(OH)D and predicted 25(OH)D as the exposure variable, including hypertension(Reference Forman, Giovannucci and Holmes8), colorectal cancer incidence(Reference Giovannucci, Liu and Rimm18) and survival(Reference Ng, Wolpin and Meyerhardt34, Reference Ng, Meyerhardt and Wu60), pancreatic cancer incidence(Reference Bao, Ng and Wolpin33) (E Giovannucci, unpublished results) and prostate cancer incidence(Reference Giovannucci, Liu and Rimm18) (E Giovannucci, unpublished results). For example, although statistical power was reduced when we used average predicted 25(OH)D, we observed similar OR of approximately 0·8 for a 10 ng/ml difference in either plasma or predicted 25(OH)D for colorectal cancer based on data from a previous case–control study in the NHS and HPFS(Reference Wu, Feskanich and Fuchs4). In a much larger HPFS cohort analysis with 691 colorectal cancer cases, the relative risk for the same increment of predicted 25(OH)D was 0·63 (95 % CI 0·48, 0·83)(Reference Giovannucci, Liu and Rimm18), demonstrating that the loss in precision may be recovered by increasing the sample size in analyses using predicted scores.
For analyses of vitamin D and chronic diseases in these specific cohorts, predicted 25(OH)D scores can be derived for each participant at each questionnaire cycle. An advantage of longitudinal data is the availability of updated predictor information, allowing the predicted 25(OH)D score to change over time and potentially better capturing long-term average vitamin D status. Such studies would use data available from the full cohorts and complement biomarker analyses with smaller sample sizes. As noted by others(Reference Chan, Jaceldo-Siegl and Fraser29), prediction models developed in the NHS, NHSII and HPFS may not apply to other study populations because of underlying population differences and/or availability of data; however, similar models may be developed using the general approach described here and could be useful for investigating hypothesised associations between vitamin D status and disease. It is also possible, however, that the prediction models developed in these cohorts could perform well in populations with similar demographics (e.g. male and female populations with similar age, race and residential latitude distributions as the NHS, NHSII and HPFS); such applications would benefit from additional validation. In conclusion, predicted 25(OH)D scores may be a practical alternative for studying such associations in international and other settings where large-scale biomarker studies are not feasible.
The present study was supported by the National Institutes of Health (CA87969, CA49449, CA050385, CA67262 and CA55075). K. A. B. was supported by the Training Program in Environmental Health Sciences (T32 ES007155) and the Nutritional Epidemiology of Cancer Education and Career Development Program (R25 CA098566). E. G. secured funding, and with K. A. B., D. F., M. D. H. and F. L., conceived and designed this study. D. F. and E. G. oversaw the study's implementation and analytic strategy. K. A. B., A. H. E., K. W., D. F. and E. G. were involved in data collection while K. A. B., D. F., Y. L. and S. M. conducted the data analyses, with additional statistical support from A. H. E. and K. W. All authors contributed to interpretation of the results. K. A. B. wrote the first draft of the manuscript, which was critically revised and approved by all authors. The authors assert that they have no conflicts of interest. Finally, the authors thank Dr Walter Willett for his scientific input on this paper.