The relationship between dietary factors and cancer and other chronic diseases is best studied in prospective cohort studies, where diet and other environmental exposures are examined in large populations and pre-disease dietary patterns can be observed. The food-frequency questionnaire (FFQ) has become the primary instrument for measuring usual intake of foods, energy and nutrients in large-scale epidemiologic studies [1]. Although the FFQ may not be the most accurate method to estimate absolute intake, there is general agreement that FFQ data provide a sufficiently accurate means to evaluate relative intake within epidemiological studies [2]. In studying diet and disease relationships, disease outcomes are often compared across quartiles or quintiles of nutrient or energy intake to measure relative risk [3]. Therefore, the dietary assessment instrument has to rank individuals according to intake levels rather than accurately estimate the exact amount of individual foods or nutrients consumed.
In creating and scoring an FFQ, information about portion sizes for each food item is needed to estimate usual intake of nutrients. The two possibilities are either to elicit information about portion size from the participants or to assign a standard portion size uniformly to each food. The primary goal of this study was to assess the effect of separate portion size questions on the rank ordering in estimates of food servings and nutrient intake in our newly developed FFQ for use in the Southern Community Cohort Study (SCCS).
Materials and methods
FFQ
To develop an FFQ for the SCCS, we used an empirical strategy to identify a small set of food items that could discriminate between African-Americans and Whites and account for a major portion of total energy intake and key nutrients [4]. The questionnaire development strategy involved: (1) extracting food intake data for a population-representative sample of adult Southerners from the Third National Health and Nutrition Examination Survey (NHANES-III) dataset; (2) developing a food group coding system to allow us to select and evaluate a reasonable number of candidate foods; (3) examining the frequency of food use and the ability of each food category to discriminate between African-Americans and Whites; (4) assessing the selected foods for their contribution of important nutrients to dietary intake; (5) developing the FFQ, nutrient database and scoring software; and (6) evaluating the developed FFQ's performance in a pilot study [4]. The resulting instrument was an FFQ with 104 food items (where items could be single foods, e.g. banana, or a group of similar foods, e.g. steak and roast beef). To aid subject recall, these 104 items were organised into groups either by function, such as ‘breakfast foods’, ‘desserts and snacks’, ‘spreads and dressings’ and ‘beverages’, or by food group such as ‘fruits and fruit juices’, ‘vegetables’, ‘rice, beans and potatoes’, ‘pasta, pizza and soup’, ‘dairy foods’, ‘breads’ and ‘meats’.
The FFQ elicited information on usual frequency of consumption during the past year using the following scale: never, rarely, once a month, 2–3 times a month, once a week, 2–3 times a week, 4–6 times a week, once a day and two or more times a day. The FFQ also obtained estimates of portion size (small, medium or large) for each food item. We used the mean portion size consumed by the NHANES-III sample rounded to the nearest common unit to define a medium portion. Small and large portion sizes were then defined usually as one-half (small) or twice (large) the medium portion size. When completing the FFQ, subjects would be asked to report on both their usual frequency of consumption and usual portion size as small, medium or large. Portion sizes for each food were anchored to a specific amount of food (e.g. small = 1/2 cup, medium = 1 cup, large = 2 cups).
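The scoring described above can be sketched as a lookup from response category to servings per day. The daily-frequency equivalents below are illustrative midpoint assumptions (the text does not give the conversion values used), the names FREQ_PER_DAY, PORTION_MULTIPLIER and servings_per_day are hypothetical, and the half/double portion multipliers follow the "usually" rule stated above, which did not apply to every food.

```python
# Assumed daily-frequency equivalents for the nine response categories.
# These midpoints are illustrative assumptions, not values from the study.
FREQ_PER_DAY = {
    "never": 0.0,
    "rarely": 0.5 / 30,               # assumption: about once every two months
    "once a month": 1 / 30,
    "2-3 times a month": 2.5 / 30,
    "once a week": 1 / 7,
    "2-3 times a week": 2.5 / 7,
    "4-6 times a week": 5 / 7,
    "once a day": 1.0,
    "two or more times a day": 2.0,   # assumption: lower bound of the category
}

# Small is usually half and large usually twice the medium portion.
PORTION_MULTIPLIER = {"small": 0.5, "medium": 1.0, "large": 2.0}


def servings_per_day(frequency, portion="medium"):
    """Medium-portion equivalents per day for one food item.

    Leaving `portion` at its default scores the item as if every
    respondent had reported a medium portion.
    """
    return FREQ_PER_DAY[frequency] * PORTION_MULTIPLIER[portion]
```

Calling servings_per_day with the reported portion and again with the default gives the two scores (with and without portion size information) for the same response.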
The FFQ was part of a 23-page self-administered questionnaire developed to obtain information on potential cancer risk factors as well as the utilisation of health services. Questions were derived from those used in other surveys and/or developed anew to reflect issues of particular interest in the Southeast. Many questions were modelled after those used previously in epidemiological surveys conducted by the investigators, or by the National Cancer Institute and the National Center for Health Statistics [4], enabling comparison of exposure data across surveys.
SCCS pilot study sample
Our sample for analysis consisted of FFQs completed by 239 adults aged 40–79 years participating in the SCCS pilot study in Tennessee, Mississippi and Florida [4]. Subjects were recruited from drivers' licence, voter registration and commercial rosters, and from Community Health Centre patients. Prior to participation, all pilot study participants provided written informed consent using forms approved by the Institutional Review Boards of Vanderbilt University and Meharry Medical College.
Statistical analyses
For each individual, numbers of servings per day were calculated for each of 13 food groups (fruits; breakfast foods; vegetables; potatoes, rice and pasta; soups; legumes; meats; breads; condiments, dressings and spreads; dairy products; snacks, desserts and treats; beverages; and alcoholic beverages). Estimates of daily intakes of total energy and 18 nutrients were calculated based on the estimated nutrient content of the foods consumed multiplied by the numbers of servings per day. A medium portion size was considered as a serving for the purposes of computing food groups, with a small portion usually contributing one half of a serving and a large portion usually contributing two servings to the food group total. Energy-adjusted nutrients were calculated as the residuals from a regression of each nutrient value on total energy intake.
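A minimal sketch of the residual approach to energy adjustment, assuming a simple least-squares regression with the nutrient as the dependent variable and total energy as the predictor (the usual form of the residual method). The function name energy_adjust is hypothetical; the study itself used SPSS.

```python
def energy_adjust(nutrient, energy):
    """Residuals from a simple linear regression of nutrient on energy.

    Both arguments are equal-length sequences, one value per participant.
    The returned residuals are uncorrelated with total energy intake.
    """
    n = len(nutrient)
    mean_e = sum(energy) / n
    mean_v = sum(nutrient) / n
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (v - mean_v) for e, v in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_e
    # Residual = observed nutrient minus the value predicted from energy.
    return [v - (intercept + slope * e) for e, v in zip(energy, nutrient)]
```

A participant with a positive residual consumed more of the nutrient than would be predicted from their energy intake alone.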
Subjects were ranked with respect to food group, total energy and nutrient intakes using two algorithms. In the first algorithm, the estimates were calculated using portion size provided by the respondents in the FFQ. In the second, reported portion size was ignored in calculating food group servings per day, and nutrient estimates were calculated assuming a medium portion size for all subjects. Correlations between the ranks (Spearman's) from the two methods of estimation were then computed along with 95% confidence intervals (CIs). The degrees of association between the two estimates were also computed by Pearson correlations (r) with 95% CIs. CIs, standard deviations (SDs) and means of correlations were computed using the Fisher r–z transformation. The percentage variation in the dietary indices that is unexplained [(1 − r²) × 100] by omitting portion size in the FFQ was also calculated for the entire sample, then by race and gender subgroup. Because distributions of food groups and nutrients were not necessarily normal, we transformed the data using natural logarithms (to normalise the distribution) and recalculated the Pearson correlations to understand how departures from normality might influence our interpretation of the results. We also compared the two methods with respect to the use of nutrients to classify participants into quintiles using percentage agreement and the κ coefficient. SPSS for Windows v. 14.0 (SPSS Inc.) was used to conduct all statistical analyses.
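The correlation measures named above can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: all function names are hypothetical, ties receive average ranks, and the same standard error, 1/sqrt(n − 3), is applied on the Fisher z scale for both Pearson and Spearman coefficients.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via the Fisher r-z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def mean_correlation(rs):
    """Average correlations on the z scale, then transform back to r."""
    return math.tanh(sum(math.atanh(r) for r in rs) / len(rs))


def percent_unexplained(r):
    """Percentage of variation left unexplained: (1 - r^2) * 100."""
    return (1 - r ** 2) * 100
```

Here x would hold the estimates scored with reported portion sizes and y the estimates scored with a medium portion for everyone. As a check on the arithmetic, a correlation of 0.90 leaves (1 − 0.81) × 100 = 19% of the variation unexplained.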
Results
Of the 239 participants enrolled in the pilot study, 209 (88%) had food frequency responses that were considered valid. We eliminated participants who left >85 of the 104 items blank, whose estimated energy intake was <2511 kJ (600 kcal) or >33 488 kJ (8000 kcal) per day, or whose total energy intake was <62.8 kJ kg⁻¹ (15 kcal kg⁻¹) or >418.6 kJ kg⁻¹ (100 kcal kg⁻¹) of body weight.
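The exclusion rules above can be written as a single validity check. This is a sketch using the kcal thresholds quoted in the text; the function name is_valid_ffq and its parameter names are hypothetical.

```python
def is_valid_ffq(blank_items, energy_kcal, weight_kg):
    """Apply the three exclusion rules to one completed questionnaire.

    blank_items: number of the 104 food items left unanswered
    energy_kcal: estimated total energy intake per day (kcal)
    weight_kg:   body weight (kg)
    """
    if blank_items > 85:
        return False
    if energy_kcal < 600 or energy_kcal > 8000:
        return False
    kcal_per_kg = energy_kcal / weight_kg
    if kcal_per_kg < 15 or kcal_per_kg > 100:
        return False
    return True
```

For example, a questionnaire scored at 2200 kcal per day for an 80 kg respondent (27.5 kcal per kg) with 3 blank items passes all three checks.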
Table 1 shows the breakdown of the sample by ethnicity and gender, and gives mean age, body mass index and estimated values for total energy, macronutrient intake and percentage of energy from each macronutrient by gender and ethnicity, calculated using portion size ratings. Notable are the higher total energy and macronutrient intakes among African-Americans than Whites, although the percentage of calories derived from each of protein, carbohydrates and fats showed only minor differences by race.
SCCS – Southern Community Cohort Study; SD – standard deviation; BMI – body mass index.
Tables 2 and 3 show the Spearman rank order correlations (r_s) between servings per day of the different food groups (Table 2) and nutrients (Table 3) computed with and without portion size information along with 95% CIs. The mean (±SD) Spearman rank correlation among the food groups was 0.87 (±0.09), with the correlations between individual food groups ranging from 0.66 to 0.94. The mean Spearman correlation between nutrient intakes estimated with versus without portion size was 0.90 (±0.02), ranging from 0.81 to 0.94. Tables 2 and 3 also show Pearson correlations with 95% CIs. The mean (±SD) Pearson correlation was 0.84 (±0.15) for food groups (Table 2) and 0.90 (±0.01) for nutrients (Table 3). After transforming the data using the natural logarithm to normalise the nutrient distributions, the mean correlation for food groups was 0.85 (±0.07) and the mean correlation for nutrients was 0.92 (±0.01). The mean percentage loss of explained variation [(1 − r²) × 100] for food groups was 33% overall, being 28% for black females, 31% for white females, 35% for black males and 41% for white males. The mean percentage loss of explained variation for nutrients was 19% overall, being 18% for black and white females, 22% for black males and 28% for white males.
CI – confidence interval.
* Spearman rank order correlation between food group servings calculated with and without portion size.
† 95% confidence interval around the Spearman correlation.
‡ Pearson correlations between food group servings calculated with and without portion size.
§ 95% confidence interval around the Pearson correlation.
Table 4 presents an analysis of the stability of sorting participants into quintiles of nutrient intake. On average across a set of selected macronutrients and micronutrients, quintiles formed with and without portion sizes agreed exactly 61.2% of the time. Since misclassification by one quintile is less of a problem than larger classification errors, the percentage of cases that were placed into the same or an adjacent quintile was calculated. On average, this less stringent level of agreement was achieved 95.5% of the time. A κ coefficient was computed to assess the extent to which agreement in quintile classification is better than chance. The mean κ was 0.52, which was significantly better than chance (P < 0.0001) for all nutrients.
* The percentage of cases assigned to exactly the same quintile.
† The percentage of cases assigned to the same quintile or to an adjacent quintile.
‡ κ coefficient measuring the degree to which the cases are assigned to the same quintile above and beyond chance.
§ P-value of the test that κ is equal to zero (agreement is just by chance).
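The quintile comparison can be sketched as follows. The assumptions are that quintiles are formed by rank position (ties broken by sort order, a simplification) and that κ is the unweighted Cohen's κ; the function names are hypothetical and this is not the SPSS procedure used in the study.

```python
def quintiles(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    for position, index in enumerate(order):
        result[index] = position * 5 // n + 1
    return result


def agreement(q1, q2):
    """Return (exact, same-or-adjacent) agreement as proportions."""
    n = len(q1)
    exact = sum(a == b for a, b in zip(q1, q2)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(q1, q2)) / n
    return exact, adjacent


def cohen_kappa(q1, q2, categories=(1, 2, 3, 4, 5)):
    """Unweighted Cohen's kappa: agreement beyond that expected by chance."""
    n = len(q1)
    observed = sum(a == b for a, b in zip(q1, q2)) / n
    expected = sum((q1.count(c) / n) * (q2.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)
```

With five equally sized quintiles under both scoring methods, chance agreement is 0.2, so an exact agreement of 60% corresponds to κ = (0.6 − 0.2) / (1 − 0.2) = 0.5.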
Discussion
The SCCS will help identify potential causes of disparities between African-Americans and Whites with regard to cancer and other chronic illnesses. Diet is an important risk factor for many of these diseases. It is therefore important to develop a relatively simple dietary assessment tool that will be sensitive to gender and racial differences and that will be able to predict risk of morbidity and mortality. To accomplish these objectives, we adopted an empirical strategy to identify a set of foods that would form the core of our FFQ [4].
The primary test of agreement between the two methods, scoring the FFQ with and without portion size information, was to examine the similarity in rank ordering or classification into quintiles. Rank ordering was very consistent across a variety of food groups and estimated nutrients. The ability of the FFQ to form identical quintiles was good, and it was able to classify people into the same or an adjacent quintile very well.
A further look at the impact of omitting portion data was obtained through the analysis of Pearson correlations and percentage loss of explained variation by omitting portion sizes. The results were unchanged after transforming the data to normalise the distributions of food groups and nutrients. These analyses showed that omitting portion size resulted in greater loss of ability to explain variation for food groups than for nutrient estimates. Food groups were computed from just a few items (e.g. only the fruits or meats) and therefore variations in portion size in a few foods can have a larger impact on the overall variability of the estimated score than for nutrients which were computed using all 104 foods. A few food groups such as alcoholic beverages showed much greater loss of explained variability when portion size information was omitted. When a person does not consume alcohol, their response adds no variability to the overall score. When there are many who do not consume an item, variation in portion size among the people who do consume the item makes a greater contribution to the overall variation in scores. The loss in variation probably occurs through two processes. First, we assume that questions about portion size add at least some meaningful variability to the data by increasing accuracy of nutrient estimates. Secondly, portion size estimates may also contribute additional random variability to the data. Without a reference standard to assess true food intake, it is not possible to determine precisely the relative contribution of these two processes to the reduction in variability.
Our findings using a racially diverse sample are similar to those reported in predominantly Caucasian populations elsewhere. Block et al. [5] showed correlations between the same FFQ, scored with and without portion sizes, of around 0.9 for nutrients, and Clapp et al. [6] found correlation coefficients ranging from 0.73 for retinol to 0.92 for folacin. In a Danish study, FFQ data with and without individually estimated portion sizes were compared with weighed diet records [7]. Mean correlation coefficients for food groups and nutrients changed only slightly, indicating that little extra information could be obtained by additional questions about portion size.
The results of our analyses also suggest that there may be differences in the importance of portion size to overall variability by gender. The loss of variability by omitting portion size was greater for males than for females, suggesting that men show more between-person variability in portions than women. We believe that it may be possible to compensate for not having portion size data by creating and using gender-specific nutrient tables when scoring the FFQ. Stratification of standard portion sizes according to age and sex has been suggested [8]. In a recent validation study, the authors stated that low correlation coefficients for nutrient intake could be due to assignment of an overall portion size instead of gender-specific portion sizes [9]. In addition, including gender and race as covariates in epidemiological analyses can partially reduce the confounding effects of gender and race differences in portion size [10]. We plan to explore gender differences in a subsequent calibration study on a more representative, random sub-sample of participants recruited to the main study. Data from this calibration study will be used to evaluate strategies for scoring the FFQ that will recapture some of the variability sacrificed by not having separate portion size estimates.
The collection of valid individual portion size data requires that individuals are able to provide estimates of the amount of each food that is typically consumed. That people can do this well appears to be a questionable assumption. One study to validate individual portion size estimates compared an FFQ using photographs with 14-day weighed food records and revealed only a weak relationship between estimated and measured portion size [11]. Participants selecting small portion sizes seemed to underestimate, and those selecting large portion sizes seemed to overestimate, amounts actually consumed [11, 12].
The idea that there is a usual portion size for an individual is a further assumption that is implicitly made when inquiring about consumed amounts. However, the data on the proportion of intra- and inter-person variability of portion sizes cast doubt on this assumption [13]. In a study by Hunter et al. [14], the intra-individual variability in food intake in 61 of 68 items exceeded the inter-individual variability.
The inclusion of separate questions inquiring about portion sizes in an FFQ introduces one additional question for each food item into the questionnaire, thus effectively doubling the length of the FFQ. In addition to the accuracy of information on food and nutrient intake, questionnaire length and respondent burden have to be considered. In the ongoing SCCS, the FFQs are being obtained using computer-assisted personal interviews, telephone interviews or self-administered questionnaires. Having a separate question about usual portions for each food would greatly extend the length of the interview or time spent by a participant filling out the questionnaire. In one study looking at the effect of questionnaire design on completion rates, questionnaires extended in length by extra non-dietary questions and portion size questions resulted in a 20% higher total non-response rate compared with short forms [15]. A short FFQ including 97 items without questions on portion size except for a few items resulted in a 20 min completion time [16], and response rates for a semi-quantitative FFQ were higher than for questionnaires inquiring about portion size [9]. On the other hand, Subar et al. [17], who designed a questionnaire to be cognitively easier for study participants, concluded that shorter questionnaires may not always improve response rates. However, depending on the purpose of the data collection, the omission of separate portion size questions in favour of a simplified FFQ can be an advantage, especially in large epidemiological studies in which the questionnaire must be kept simple.
A potential limitation of this and other studies is that data on portion sizes and frequency of intake were collected simultaneously. Participants could therefore have traded off a larger portion size against a higher frequency of intake, and vice versa. We do not know how often participants made use of such substitutions, but we assume that substitution was rare because portion size and frequency options were very detailed. Another limitation was that the percentage (12%) of FFQs that were eliminated due to suspected invalid results was somewhat higher than desirable. Some participants just left many of the questions blank. While this resulted in their being eliminated from further analysis, leaving them in the analysis would have contributed to low estimates and erroneously increased variability. Some participants provided estimates of how often they eat various foods which resulted in unrealistically high or low estimates of usual nutrient intake. Finding such outlier values is not uncommon [18], since answering items on an FFQ is a complex psychological process. The participant has to take each food item, which sometimes represents several foods of a similar type, and try to think about how often he or she has eaten each food over the past year. Coming up with a frequency estimate involves knowledge of food, memory of recent events and the ability to integrate these memories.
In summary, this report describes an evaluation of how important portion sizes were in estimating the amount of food consumed and estimating usual nutrient intakes. We conclude that, to reduce the respondents' burden and to increase data completeness, the assignment of a constant portion size seems to be acceptable in the SCCS. The rank order and assignment of individuals to quintiles was not adversely affected by omitting portion sizes. The frequency with which individuals consume various foods remains the single most important component of usual energy and nutrient intake [19].
Acknowledgements
Sources of funding: This project was supported by NIH Grants RO1 CA92447, P60 DK20593 and 5P20 MD000516. M.S.B. was supported by HL67715.
Conflict of interest declaration: No author has any financial or other conflict of interest with the research contained in this publication.
Authorship responsibilities: All authors contributed to the design of the FFQ; D.G.S., M.S.B., M.K.H., L.B.S. and W.J.B. contributed to implementing the FFQ, analysing and interpreting the data, and writing the manuscript.
Acknowledgements: We thank Dr Jin-Mann Lin, Director of Bioinformatics Services of the Clinical Research Center at Meharry Medical College for help with interpretation of some biostatistical analyses. We also thank the late Ms M Marrs and Dr K Junior from the Matthew Walker Comprehensive Health Center for help with the pilot study, Sandra Goring, RD, MSPH, for conducting interviews with study participants, and John Drake, Data Manager, and Zudi Takizala for assistance in preparing the manuscript.