Handling missing data in an FFQ: multiple imputation and nutrient intake estimates

Mari Ichikawa; Akihiro Hosono; Yuya Tamai; Miki Watanabe; Kiyoshi Shibata; Shoko Tsujimura; Kyoko Oka; Hitomi Fujita; Naoko Okamoto; Mayumi Kamiya; Fumi Kondo; Ryozo Wakabayashi; Taiji Noguchi; Tatsuya Isomura; Nahomi Imaeda; Chiho Goto; Tamaki Yamada; Sadao Suzuki

doi:10.1017/S1368980019000168

Published online by Cambridge University Press: 26 February 2019

Mari Ichikawa: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Akihiro Hosono: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Yuya Tamai: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Miki Watanabe: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Kiyoshi Shibata: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan; Nagoya University of Economics, Inuyama, Japan
Shoko Tsujimura: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Kyoko Oka: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Hitomi Fujita: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Naoko Okamoto: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Mayumi Kamiya: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Fumi Kondo: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Ryozo Wakabayashi: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Taiji Noguchi: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
Tatsuya Isomura: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan; Clinical Study Support Inc., Nagoya, Japan; Institute of Medical Science, Tokyo Medical University, Tokyo, Japan
Nahomi Imaeda: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan; Department of Nutrition, Faculty of Wellness, Shigakkan University, Obu, Japan
Chiho Goto: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan; Department of Health and Nutrition, School of Health and Human Life, Nagoya-bunri University, Inazawa, Japan
Tamaki Yamada: Affiliation:
Okazaki City Medical Association, Public Health Center, Okazaki, Japan
Sadao Suzuki*: Affiliation:
Department of Public Health, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan
*Corresponding author: Email ssuzuki@med.nagoya-cu.ac.jp

Abstract

Objective

We aimed to examine missing data in FFQ and to assess their effects on estimating dietary intake by comparing multiple imputation with zero imputation.

Design

We used data from the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study. A self-administered questionnaire including an FFQ was administered at baseline (FFQ1) and at the 5-year follow-up (FFQ2). Missing values in FFQ2 were imputed with the corresponding FFQ1 values, by multiple imputation and by zero imputation.

Setting

A methodological sub-study of the Okazaki J-MICC study.

Participants

Of a total of 7585 men and women aged 35–79 years at baseline, we analysed data for 5120 participants who answered all items in FFQ1 and at least 50% of items in FFQ2.

Results

Among the 5120 participants, the proportion of missing data was 3·7%. The number of missing food items in FFQ2 varied with personal characteristics. Missing FFQ2 answers for rarely eaten food items were likely to correspond to zero intake in FFQ1. For most food items, the observed proportion of zero intake was similar to the probability that a missing value represented zero intake. Compared with zero imputation, multiple imputation gave total energy and nutrient estimates closer to the FFQ1 values, except for alcohol.

Conclusions

Our results indicate that missing FFQ values due to zero intake, that is, values missing not at random, can be predicted reasonably well from the observed data. Multiple imputation performed better than zero imputation for most nutrients and may be applied to FFQ data when the proportion of missing data is low.

Keywords

FFQ; Multiple imputation; Missing data; Item non-response

Information

Type: Research paper
Information: Public Health Nutrition, Volume 22, Issue 8, June 2019, pp. 1351–1360

DOI: https://doi.org/10.1017/S1368980019000168
Copyright: © The Authors 2019

The FFQ is a dietary assessment tool commonly used in epidemiological studies. However, data collection via FFQ has a serious drawback: a long, self-administered, mailed questionnaire invites non-response to individual items. Missing values directly affect the estimation of dietary intake from the questionnaire. Several approaches have been used to deal with this problem, including deleting cases with one or more missing values from the analysis (complete-case analysis) and filling in the missing data with single values such as the mean, median or mode (single imputation). Although zero imputation is a common strategy for handling missing data in FFQ⁽¹⁾, it is doubtful whether this method yields unbiased estimates⁽²⁻⁵⁾.

Two studies reported that missing food items did not represent zero intake when the foods were frequently consumed⁽⁶,⁷⁾. In another study, some food items were left blank because of inattention or carelessness⁽⁴⁾. Additionally, single imputation methods such as zero imputation treat the imputed values for non-response exactly like observed responses, which leads to underestimation of the variance⁽⁸⁾. Therefore, although zero imputation is reasonable in some situations, we believe it may give misleading results.

In the last decade, statistical methods for the analysis of missing data, and the importance of not ignoring missing values in the analysis, have gained attention among researchers. Rubin classified missing data into three categories: (i) missing at random (MAR); (ii) missing completely at random (MCAR); and (iii) missing not at random (MNAR)⁽⁸⁻¹⁰⁾. Data are MAR when the probability that the data are missing depends on the observed data but not on the missing data. MCAR is an important special case of MAR in which the probability that the data are missing does not depend on any data. In contrast, data are MNAR when the probability that the data are missing depends on the missing values themselves as well as on the observed data. For instance, data are MAR when older subjects tend to skip more questions than younger subjects and age is recorded, whereas data are MNAR when subjects whose salary is higher or lower than average leave salary questions unanswered because they are reluctant to disclose it. Some researchers have noted that missing data in FFQ are likely to be MNAR because some missing values may represent zero intake⁽¹,⁵⁾. If the many missing values that represent zero intake were removed, the remaining missing data in an FFQ might approach MAR. Because they cannot be removed in practice, we considered how to predict missing values due to zero intake within the imputation process.
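The three mechanisms can be illustrated with a small simulation. The Python sketch below is purely hypothetical (invented intake values and missingness probabilities, not study data): intake is higher in an older age group, and each mechanism sets the probability that a value is missing. Under MCAR and MAR, averaging the observed values within each always-observed age group and weighting by group size recovers the true mean; under MNAR it does not, because missingness depends on the unseen intake itself.

```python
import random
import statistics


def simulate(mechanism, n=100_000, seed=1):
    """Return (true mean, complete-case mean, age-adjusted mean) of a toy intake.

    All numbers are hypothetical and chosen only to illustrate the mechanisms.
    """
    rng = random.Random(seed)
    rows = []  # (older, intake, observed)
    for _ in range(n):
        older = rng.random() < 0.5                       # age group: always observed
        intake = rng.gauss(3.0 if older else 2.0, 1.0)   # higher intake in the older group
        if mechanism == "MCAR":
            p_miss = 0.2                                 # unrelated to any data
        elif mechanism == "MAR":
            p_miss = 0.4 if older else 0.1               # depends only on observed age group
        elif mechanism == "MNAR":
            p_miss = 0.4 if intake < 2.5 else 0.1        # depends on the unseen intake itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((older, intake, rng.random() >= p_miss))

    true_mean = statistics.fmean(r[1] for r in rows)
    complete_case = statistics.fmean(r[1] for r in rows if r[2])
    # Observed mean within each age group, weighted by the group's share of the
    # full sample: unbiased when missingness depends only on the age group.
    adjusted = 0.0
    for group in (False, True):
        members = [r for r in rows if r[0] == group]
        observed = [r[1] for r in members if r[2]]
        adjusted += len(members) / len(rows) * statistics.fmean(observed)
    return true_mean, complete_case, adjusted


for mech in ("MCAR", "MAR", "MNAR"):
    t, cc, adj = simulate(mech)
    print(f"{mech}: true={t:.3f}  complete-case={cc:.3f}  age-adjusted={adj:.3f}")
```

Under MAR the plain complete-case mean is still biased in this toy model, because older participants eat more and also skip more often; conditioning on the observed age group removes that bias, which is the same logic that lets an imputation model exploit observed covariates.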

In the literature, missing data are called indirect MNAR when the observed data can predict the reason for missingness⁽⁸⁻¹⁰⁾. For example, when subjects with a higher or lower salary than average are likely to skip a salary question, the missing data are MNAR. If correlates of salary, such as job and career, are also observed, the missing salary becomes predictable, and these indirect MNAR data can be treated as MAR. In FFQ, missing values for food items that are infrequently consumed tend to represent zero intake, although this may vary by food item⁽³,⁵⁻⁷⁾. Additionally, more information helps to predict missing values in multiple imputation⁽¹¹⁾. The observed data show how often each food item is or is not consumed. When the proportion of missing data is relatively low, the observed data can partly predict the probability of zero intake, and the observed distribution and pattern of intake can help to predict the missing values. For these reasons, missing data in FFQ may be indirect MNAR.

For data that are MAR or MCAR, multiple imputation is the preferred method of handling missing data and is recommended by several journals and medical research guidelines⁽¹²,¹³⁾. It has several advantages over conventional methods, such as avoiding loss of sample size and accounting for the uncertainty of imputed values, but only a few nutritional epidemiological studies have applied it to missing data in FFQ⁽¹⁴,¹⁵⁾.

We aimed to examine missing data in FFQ, especially missing values due to zero intake, and to assess their effects on total energy and nutrient estimates. Using data from the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study, we investigated the missing data mechanism of the FFQ and compared the nutrient estimates obtained with multiple imputation and zero imputation. We did not conduct a resurvey; instead, baseline values served as the reference.

Methods

Okazaki J-MICC study

The present study was conducted as part of the J-MICC study⁽¹⁶⁾. We recruited 7580 men and women (aged 35–79 years) between 2007 and 2011 from the Okazaki Public Health Center, Okazaki, Aichi, Japan. The self-administered health and lifestyle questionnaire, which contained an FFQ, was mailed before the health check-up at the centre. At baseline, all participants underwent the health check-up and submitted the questionnaire. The submitted questionnaires were checked three times by the investigators, and participants who had not answered completely (e.g. blanks or inconsistent answers) were queried whenever possible. The second survey, for the 5-year follow-up, was conducted between 2013 and 2017; the mean interval from baseline was 5·3 years. At the second survey, some participants underwent the health check-up and submitted the questionnaire in the same way, whereas others returned the questionnaire only by mail; these mailed questionnaires were neither checked nor resurveyed. Of all participants, 5321 completed both the baseline (FFQ1) and second (FFQ2) surveys. Data for the 5151 participants (96·8%) who fully completed FFQ1 were included in the present analysis.

The study protocol was approved by the institutional review board of the Nagoya University Graduate School of Medical Science and Medical School. Written informed consent was obtained from each participant.

FFQ

The Department of Public Health, Nagoya City University Graduate School of Medical Science and Medical School, developed and validated the FFQ⁽¹⁷⁻²⁰⁾. Participants were asked how often, on average over the previous year, they had consumed forty-seven foods and beverages. The alcohol item was divided into ten subtypes, each with six possible responses ranging from ‘never/rarely’ to ‘every day’. Because energy intake in the Japanese diet is derived mainly from staple foods, each staple food item (rice, bread and noodles) was asked separately for each meal (breakfast, lunch and dinner), with six possible responses similar to those for the alcohol subtypes. For the other food and beverage items, there were eight possible responses ranging from ‘never/rarely’ to ‘3 or more times per day’. A portion size (e.g. 1 cup, or 1 slice of bread) was also specified for each staple food by meal and for each alcohol subtype. Nutrient intake was calculated from a total of eighty-one questions (62 frequency questions = 10 alcohol + 9 staple food + 43 other food items; 19 portion-size questions = 10 alcohol + 9 staple food items) by converting each frequency response to a daily intake frequency (‘never/rarely’ = 0; ‘1–3 times per month’ = 0·1; ‘1–2 times per week’ = 0·2; ‘3–4 times per week’ = 0·5; ‘5–6 times per week’ = 0·8; ‘once per day’ = 1·0; ‘twice per day’ = 2·0; ‘3 or more times per day’ = 3·0).
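A minimal Python sketch of how frequency and portion-size answers could be turned into a daily amount. The frequency-to-daily-value mapping and the question counts come from the text; the food-composition values, item names and helper functions are hypothetical, since the study's own calculation program is not described here. Items without a portion-size question are assumed to count as one standard unit.

```python
# Daily intake frequency for each frequency response (values from the text;
# ASCII hyphens are used in the labels).
DAILY_FREQUENCY = {
    "never/rarely": 0.0,
    "1-3 times per month": 0.1,
    "1-2 times per week": 0.2,
    "3-4 times per week": 0.5,
    "5-6 times per week": 0.8,
    "once per day": 1.0,
    "twice per day": 2.0,
    "3 or more times per day": 3.0,
}

# Question counts from the text.
N_FREQUENCY = 10 + 9 + 43   # alcohol subtypes + staple foods by meal + other foods
N_PORTION = 10 + 9          # alcohol subtypes + staple foods by meal
N_QUESTIONS = N_FREQUENCY + N_PORTION  # 81


def item_daily_units(frequency, portion_units=1.0):
    """Standard units consumed per day: daily frequency x portion size."""
    return DAILY_FREQUENCY[frequency] * portion_units


def daily_nutrient(responses, nutrient_per_unit):
    """Sum one nutrient over items.

    responses: {item: (frequency label, portion units)}
    nutrient_per_unit: {item: nutrient in one standard unit}  (hypothetical values)
    """
    return sum(
        item_daily_units(freq, units) * nutrient_per_unit[item]
        for item, (freq, units) in responses.items()
    )


example = daily_nutrient(
    {"rice (dinner)": ("once per day", 1.5), "broccoli": ("1-2 times per week", 1.0)},
    {"rice (dinner)": 250.0, "broccoli": 20.0},  # hypothetical kcal per unit
)
print(example)  # = 1.0*1.5*250 + 0.2*1.0*20
```

Keeping the mapping in one table makes it easy to see that ‘never/rarely’ contributes exactly zero, which is why a zero-imputed blank removes the item from the nutrient total.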

In the present analysis, energy intake from alcohol was calculated only for current alcohol drinkers. For nutrient calculations, participants were excluded if more than forty of the eighty-one questions in FFQ2 were unanswered or if their total energy intake in either FFQ was below 2510 kJ/d (600 kcal/d) or above 14 644 kJ/d (3500 kcal/d), following previous studies⁽⁵,⁷⁾. These criteria were intended to exclude, for example, participants unwilling to answer the FFQ and those who over- or under-reported their dietary intake.
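The exclusion rules can be written as a single check. This is a sketch of the stated criteria only; the function name is hypothetical, and the kJ limits are derived from the kcal limits with the conversion factor 4·184 kJ/kcal, which reproduces the 2510 and 14 644 kJ/d values in the text.

```python
KJ_PER_KCAL = 4.184

LOWER_KJ = 600 * KJ_PER_KCAL    # about 2510 kJ/d
UPPER_KJ = 3500 * KJ_PER_KCAL   # about 14 644 kJ/d
MAX_MISSING = 40                # of the 81 FFQ2 questions


def include_participant(n_missing_ffq2, energy_kj_ffq1, energy_kj_ffq2):
    """True if the participant passes both exclusion criteria described in the text.

    n_missing_ffq2 counts unanswered FFQ2 questions after permitted blanks
    have been set to zero; energies are total energy intakes in kJ/d.
    """
    if n_missing_ffq2 > MAX_MISSING:
        return False
    return all(LOWER_KJ <= e <= UPPER_KJ for e in (energy_kj_ffq1, energy_kj_ffq2))


print(include_participant(3, 7082, 6834))    # True
print(include_participant(41, 7082, 6834))   # False: too many missing answers
```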

Missing values and imputation

Missing alcohol items were permitted if the participants were not current alcohol drinkers; missing portion size was permitted if the participants answered with the frequency ‘never/rarely’ for the item. These permitted missing values were replaced with zero before imputations.
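The permitted-missing rule amounts to a small substitution step applied before any imputation. In this hypothetical Python helper, None marks a blank answer, frequencies are daily intake frequencies (0 for ‘never/rarely’), and the helper is meant for items that carry a portion-size question (alcohol subtypes and staple foods by meal). Blanks that are not permitted stay None and are left for the imputation methods.

```python
def fill_permitted_missing(frequency, portion, *, is_alcohol, current_drinker):
    """Set permitted blanks to zero; leave other blanks as None for imputation.

    frequency: daily intake frequency (0.0 = 'never/rarely') or None if blank
    portion:   portion size in units, or None if blank
    Intended for items with a portion-size question (alcohol subtypes, staple foods).
    """
    if is_alcohol and not current_drinker:
        # Non-drinkers may leave alcohol items blank: treat blanks as zero intake.
        frequency = 0.0 if frequency is None else frequency
        portion = 0.0 if portion is None else portion
    elif frequency == 0.0 and portion is None:
        # 'never/rarely' was answered, so a blank portion size is permitted.
        portion = 0.0
    return frequency, portion


print(fill_permitted_missing(None, None, is_alcohol=True, current_drinker=False))
print(fill_permitted_missing(None, None, is_alcohol=True, current_drinker=True))
```

In the second call the participant is a current drinker, so both blanks remain and would be filled later by FFQ1, zero or multiple imputation.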

We filled the missing values in FFQ2 using the following imputation methods.

1. FFQ1 imputation (baseline value): missing values were imputed with the baseline value of the same individual in FFQ1.
2. Zero imputation: missing values were imputed with the frequency ‘never/rarely’ and portion size ‘1 unit’ (standard size).
3. Multiple imputation: we conducted multiple imputation by chained equations⁽²¹,²²⁾. Missing values for frequency and portion size in the FFQ were imputed as continuous variables (daily intake frequency and portion size per unit, respectively). Sixty data sets were created, giving a relative efficiency of >0·99 for each variable; relative efficiency indicates whether a sufficient number of data sets has been created⁽²³⁾. Although ten to twenty data sets are commonly recommended, a larger number is expected to give more stable estimates and more valid significance tests. The imputation model included the following variables known or suspected to be associated with the missing values or outcomes⁽⁶,⁷⁾: age at the second survey (years, continuous); sex (men, women); alcohol drinking (current, former, never); smoking (current, former, never); questionnaire submission (at the centre, by mail); work (full-time workers, other workers, non-workers); education (elementary school/junior high school, high-school graduate, college or more); BMI (kg/m², continuous); and leisure-time physical activity (MET-h/week, continuous, where MET is metabolic equivalent of task). Missing values for these covariates were also imputed. After the nutrient calculations, the results were combined into a single estimate with its standard error.
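Pooling the imputed data sets and judging how many are needed both rest on Rubin's combining rules. The Python sketch below is a generic illustration, not the SAS PROC MIANALYZE implementation: it averages the per-data-set estimates, adds the within- and between-imputation variances, and computes relative efficiency as 1/(1 + γ/m), where γ is the fraction of missing information and m the number of data sets. For simplicity γ uses the large-sample formula without a degrees-of-freedom correction, so values may differ slightly from software output; the input numbers are invented.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Combine m per-data-set estimates and standard errors with Rubin's rules.

    Returns (pooled estimate, pooled standard error, fraction of missing information).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two estimates with matching standard errors")
    q_bar = statistics.fmean(estimates)                       # pooled point estimate
    within = statistics.fmean(se ** 2 for se in std_errors)   # W: mean sampling variance
    between = statistics.variance(estimates)                  # B: spread across data sets
    total = within + (1 + 1 / m) * between                    # T: total variance
    # Large-sample fraction of missing information (no degrees-of-freedom correction).
    fmi = (1 + 1 / m) * between / total if total > 0 else 0.0
    return q_bar, math.sqrt(total), fmi


def relative_efficiency(fmi, m):
    """Efficiency of m imputations relative to infinitely many: 1 / (1 + fmi / m)."""
    return 1 / (1 + fmi / m)


estimate, se, fmi = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])  # toy numbers
print(round(estimate, 3), round(se, 3), round(fmi, 3))
print(round(relative_efficiency(0.6, 20), 3), round(relative_efficiency(0.6, 60), 3))
```

With γ = 0·6, twenty data sets give a relative efficiency of about 0·971, whereas sixty give about 0·990, which illustrates why more than the customary ten to twenty data sets may be needed to exceed 0·99 when the fraction of missing information is large.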

Statistical analysis

We calculated the mean number of missing values in FFQ2 by category of background variables and tested differences between categories (excluding the missing category) by one-way ANOVA. For each food item, we present the number and proportion of participants who responded with the frequency ‘never/rarely’ and with the portion size ‘1 unit’ in FFQ2. For sensitivity analysis, we calculated the mean total energy and nutrient intakes for complete cases and for all participants in FFQ2. We also calculated mean total energy and nutrient intakes stratified by how the questionnaire was submitted (at the centre or by mail). Differences in total energy and nutrient intakes from zero imputation and from multiple imputation, compared with FFQ1 imputation, were tested by paired t tests. All data analyses and imputations were performed using SAS version 9.4; the PROC MI and PROC MIANALYZE procedures were used for multiple imputation and for combining the results.
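A paired t test compares two imputation methods through each participant's difference between the two estimates of the same intake. The Python sketch below computes only the t statistic and its degrees of freedom (a p value would additionally need the t distribution from a statistics library). How the study combined paired tests across the sixty imputed data sets is not described; one possible approach, stated here only as an assumption, is to compute the mean difference and its standard error in each data set and pool them with Rubin's rules. The intake values are invented.

```python
import math
import statistics


def paired_t(reference, comparison):
    """Paired t statistic and degrees of freedom for comparison - reference."""
    if len(reference) != len(comparison) or len(reference) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [c - r for r, c in zip(reference, comparison)]
    n = len(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    if se_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.fmean(diffs) / se_diff, n - 1


# Toy energy intakes (kcal/d) for four participants: reference vs. another method.
t_stat, df = paired_t([1700.0, 1650.0, 1800.0, 1720.0], [1690.0, 1655.0, 1780.0, 1700.0])
print(round(t_stat, 3), df)
```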

Results

Of the 5151 participants who completely answered FFQ1, 3673 (71·3%) responded to all questions in FFQ2 (Table 1). In FFQ2, the mean number of missing values among the 5151 participants was 3·3 per participant (4·1% of the data matrix), and 99·4% of participants had forty or fewer missing values out of the eighty-one questions after allowing for the permitted missing values. Among participants with at least one missing value, the largest group had exactly one missing value. The number of participants decreased as the number of missing values increased, although there was a slight rise among those with nine to twelve missing values. Thirty-one participants were excluded because their FFQ2 had more than forty missing values.
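The ‘data matrix’ proportion is the share of all participant-by-question cells that are missing, which equals the mean number of missing answers per participant divided by the number of questions: 3·3/81 ≈ 4·1%. A hypothetical Python helper that summarizes per-participant missing counts in the same way as Table 1 might look like this; the input list is invented.

```python
from collections import Counter


def missing_summary(missing_counts, n_questions=81, max_missing=40):
    """Summarize per-participant numbers of missing FFQ2 answers.

    missing_counts: one count per participant, after permitted blanks are set to zero.
    """
    n = len(missing_counts)
    mean_missing = sum(missing_counts) / n
    return {
        "share_complete": sum(c == 0 for c in missing_counts) / n,
        "mean_missing": mean_missing,
        "share_of_data_matrix": mean_missing / n_questions,  # missing cells / all cells
        "n_excluded": sum(c > max_missing for c in missing_counts),
        "distribution": Counter(missing_counts),              # counts per number missing
    }


summary = missing_summary([0, 0, 1, 3, 45])  # invented counts for five participants
print(summary["share_complete"], summary["n_excluded"])
```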

Table 1

Distribution of missing values in FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

Table 2 shows personal characteristics that might be associated with the number of missing values in FFQ2. The mean number of missing values in FFQ2 increased with age. Similarly, participants with a lower education level or a higher physical activity level left more questions unanswered in FFQ2 than those with a higher education level or lower physical activity. In addition, men, current alcohol drinkers, former smokers and non-workers tended to have more missing values in FFQ2. As expected, participants who submitted the questionnaire by mail left many more blanks in FFQ2 than those who submitted it at the centre.

Table 2

Characteristics related to missing values in FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

† P value when the missing category is excluded.

We investigated the proportion of the frequency response ‘never/rarely’ in (i) FFQ2 and (ii) FFQ1 among participants who had a missing value in FFQ2 (Table 3). When proportion (i) was over 70%, proportion (ii) was more than 80%. For most food items, proportions (i) and (ii) were almost the same, but some food items (e.g. chu-hi, bread at lunch, broccoli) had a higher proportion (ii) than proportion (i).
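Proportions (i) and (ii) can be computed per food item as sketched below. This illustrative Python helper (names and toy data are hypothetical) relies on the study design: FFQ1 is complete, so a participant's FFQ1 answer serves as a proxy for the unseen FFQ2 answer. If (ii) is close to (i), the observed share of ‘never/rarely’ predicts how often a missing answer is zero intake, which is the MAR-like situation; if (ii) is clearly higher, missing answers are zero more often than the observed data suggest.

```python
NEVER = 0.0  # daily intake frequency coded for 'never/rarely'


def zero_intake_proportions(ffq1, ffq2):
    """Return proportions (i) and (ii) for one food item.

    ffq1, ffq2: daily intake frequencies per participant, in the same order;
    None marks a missing FFQ2 answer (FFQ1 is complete by design).
    (i)  share of 'never/rarely' among observed FFQ2 answers
    (ii) share of 'never/rarely' in FFQ1 among participants missing in FFQ2
    """
    observed = [f2 for f2 in ffq2 if f2 is not None]
    baseline_of_missing = [f1 for f1, f2 in zip(ffq1, ffq2) if f2 is None]
    if not observed or not baseline_of_missing:
        raise ValueError("need both observed and missing FFQ2 answers")
    p_i = sum(f == NEVER for f in observed) / len(observed)
    p_ii = sum(f == NEVER for f in baseline_of_missing) / len(baseline_of_missing)
    return p_i, p_ii


# Toy data for six participants (hypothetical).
ffq1 = [0.0, 0.0, 0.2, 0.0, 1.0, 0.0]
ffq2 = [0.0, None, 0.2, None, 1.0, 0.0]
print(zero_intake_proportions(ffq1, ffq2))
```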

Table 3

Frequency of intake: response of ‘never/rarely’ in FFQ1 and FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

† Proportions when missing values in FFQ2 are excluded for each item.

‡ Proportions to the number of missing values in FFQ2.

§ Alcoholic drinks are only for current alcohol drinkers at the second survey (n 2702).

We also investigated the proportion of the portion size ‘1 unit’ in (i) FFQ2 and (ii) FFQ1 among participants who had a missing value in FFQ2 (Table 4). Participants who answered ‘never/rarely’ for an item were excluded from the analysis of that item in Table 4, because including them would split the distribution of portion size into zero intake and actual portions. Most participants responded with ‘1 unit’ in FFQ2. When proportion (i) was over 70%, proportion (ii) was more than 60%. For all food items except rice (bowl at dinner) and whiskey (double), proportion (ii) was slightly lower than proportion (i).

Table 4

Portion size of intake: response of ‘1 unit’ in FFQ1 and FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

† Excluded when the frequency of each item is ‘never/rarely’. Proportions when missing values in FFQ2 are excluded for each item.

‡ Proportions to the number of missing values in FFQ2.

§ Alcoholic drinks are only for current alcohol drinkers at the second survey (n 2702).

Table 5 shows the estimated intakes of total energy and selected nutrients from FFQ1 and FFQ2. In the complete-case analysis, total energy decreased from 7082 kJ/d (1693 kcal/d) in FFQ1 to 6834 kJ/d (1633 kcal/d) in FFQ2, with slight changes in the intakes of protein, fat, carbohydrate and dietary fibre. Among all participants, zero-imputed FFQ2 gave the lowest total energy and FFQ1 the highest. The results of multiple imputation were almost identical to those of FFQ1 imputation; in particular, the differences from FFQ1 imputation in total energy and carbohydrate (g) were smaller for multiple imputation than for zero imputation. No significant differences between multiple imputation and FFQ1 imputation were observed for total energy, protein (% energy), fat (% energy), carbohydrate (g and % energy) or dietary fibre (g/1000 kcal). Fewer than forty of the 5120 participants were excluded because their total energy intake was <2510 kJ/d (<600 kcal/d) or >14 644 kJ/d (>3500 kcal/d).

Table 5

Estimated dietary intakes of total energy and selected nutrients from FFQ1 and FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

n indicates the number of participants who had forty or fewer missing values and total energy intake between 2510 kJ/d (600 kcal/d) and 14 644 kJ/d (3500 kcal/d). n_alc indicates the number of participants among the n participants who consumed >0 g alcohol/d. Energy from alcohol was not included in the total energy.

*P<0·05, **P<0·01, ***P<0·001 for paired t test between imputation methods.

As a sensitivity analysis, participants were stratified by mode of questionnaire submission (see online supplementary material). As expected, among participants who submitted the questionnaire at the centre, the results were virtually identical across imputation methods, because the proportion of missing values was very small (0·3% of the data matrix; Supplemental Table 1). Among participants who submitted by mail, for whom missing values made up 11·5% of the data matrix, multiple imputation gave total energy and nutrient intakes (except alcohol) closer to FFQ1 imputation than zero imputation did, although these differences were larger than those among all participants (Supplemental Table 2).

Discussion

Using data from the second survey of the Okazaki J-MICC study, we examined missing data in the FFQ and compared total energy and nutrient intakes across imputation methods. Missing food items that are not eaten often were likely to represent zero intake, and the number of missing values was associated with personal characteristics. The total energy and nutrient intakes estimated by multiple imputation differed slightly from those estimated by single imputation (zero or baseline value); in particular, multiple imputation gave almost the same estimates as FFQ1 imputation (imputing baseline values). Our findings suggest that multiple imputation is a reasonable choice for missing data in FFQ when the proportion of missing data is relatively low.

To the best of our knowledge, only two studies have investigated the potential effects of multiple imputation on missing data in FFQ. Barzi et al. applied repeated measures of the consumption of five food items in the GISSI-Prevenzione study and suggested that multiple imputation is likely to provide valid estimates⁽¹⁴⁾. Fraser and Yan introduced guided multiple imputation, which fills in a random sub-sample of initially missing data and uses this extra information to guide the imputation model⁽¹¹⁾. Compared with the other methods (complete-case analysis, zero imputation and multiple imputation), the food intake frequencies showed a moderate difference. Although both studies shared the problem of ascertaining whether the missing FFQ data were actually MAR, their low proportions of missing values probably minimized bias.

We investigated the mechanism of missing data in FFQ by comparison with baseline values. For most food items, the proportion of zero intake in the observed data was similar to the probability of zero intake among the missing data, whereas for some food items it was not. The former indicates that missing values due to zero intake can be predicted from the observed distribution of how rarely a particular food is consumed in the population, so the data can be treated as MAR rather than MNAR. The latter suggests that factors other than zero intake influenced missingness; that is, missing data in FFQ may be MNAR. These findings are consistent with previous studies showing that missing values for food items that are not eaten often are likely to represent zero intake⁽⁵⁻⁷⁾. For this reason, it would be reasonable to impute zero at least for rarely eaten foods. However, zero imputation can introduce bias because it is implausible that all missing values are zero.

Additionally, the estimates of total energy and nutrient intakes from multiple imputation were almost equivalent to those from FFQ1 imputation, which suggests that multiple imputation can predict missing values due to zero intake. If the ‘true’ values of the missing food items could be determined, they might lie between the values given by zero imputation and FFQ1 imputation. Because the proportion of missing values in our data was relatively low, there was little room for even moderate differences between imputation methods to appear, which made the comparison difficult. To address this, we carried out a stratified analysis, which showed that multiple imputation gave total energy and nutrient intakes, except for alcohol, closer to the FFQ1 values than zero imputation did. Multiple imputation might therefore provide more accurate estimates when the proportion of missing data is less than 11·5%. Several reasons may explain the exception for alcohol. First, missing alcohol values seemed harder to predict and impute than other items because the alcohol item consists of detailed questions; additional covariates may be needed for more accurate imputation. Second, missing alcohol values were more likely to represent zero intake than missing values for other food items.

Furthermore, imputing a single value ignores the uncertainty of the imputed values and can lead to biased results because the variance is underestimated⁽⁸⁾. Given current practice in medical research, it is important to handle missing data appropriately and to avoid misleading results. Our findings suggest that multiple imputation may be used to handle missing data in future studies involving FFQ.

The present study has several limitations. First, we treated baseline values in FFQ1 as the reference and did not perform a resurvey. Some studies have shown that total energy and food intakes decrease with age⁽²⁴,²⁵⁾, and our complete-case analysis also showed a decrease in total energy and nutrient intakes over a mean follow-up of 5·3 years, so baseline values are an imperfect reference for FFQ2. To improve the accuracy of multiple imputation, additional variables in the imputation model, or item-level and score-level multiple imputation⁽²⁶,²⁷⁾, may need to be investigated. However, because nutrient estimates differed only slightly among imputation methods, this was not a major problem in the present study.

Second, we used complete-case data for FFQ1. Complete-case analysis may lead to biased results unless the missing data are MCAR⁽⁸⁾. However, all baseline questionnaires were reviewed, and 96·8% of our participants completely answered FFQ1. Thus, the missing data in FFQ1 were considered to be MCAR and unlikely to have affected our results.

Third, the questionnaires in the second survey were submitted in two different ways. Approximately 70% were reviewed at the participants' health check-up, whereas approximately 30% were submitted by mail and were not reviewed. Consequently, a higher proportion of our participants completely answered FFQ2 than in some previous studies (34–41%)⁽⁷,²⁸⁾, which would make the imputation more plausible and less prone to the undesirable results that multiple imputation can produce⁽²⁹⁾.

Conclusion

Zero imputation was a reasonable choice for a limited number of food items, and only when missing values were due to zero intake. To handle missing data in FFQ appropriately, we recommend multiple imputation, which takes account of the uncertainty of imputed values. The present study indicated that multiple imputation is applicable to missing data in FFQ when the proportion of missing data is relatively low. In particular, if missing data exceed about 10% of the data, researchers should carefully consider how the imputation method affects the dietary intake estimates used in epidemiological risk analyses of disease outcomes.

Supplementary material

To view supplementary material for this article, please visit https://doi.org/10.1017/S1368980019000168

Acknowledgements

Acknowledgements: The authors would like to thank all the staff and participants for supporting this study. Financial support: This study was supported in part by the Grants-in-Aid for Scientific Research on Priority Areas of Cancer (No. 17015018), on Innovative Areas (No. 221S0001), and JSPS KAKENHI Grant Number JP16H06277 from the Japanese Ministry of Education, Culture, Sports, Science and Technology. The Japanese Ministry of Education, Culture, Sports, Science and Technology had no role in the design, analysis or writing of this article. Conflict of interest: None declared. Authorship: S.S. was responsible for the design and funding of the present study. T.Y. is the director of the Public Health Center of the Okazaki City Medical Association. N.I. and C.G. developed and provided the calculation program for the FFQ. A.H. took the lead role in collecting data. K.S., S.T., M.I., Y.T., M.W., K.O., H.F., N.O., M.K., F.K., R.W. and T.N. also contributed to data collection. M.I. performed statistical analysis. S.S. and T.I. advised on drafting the manuscript. All authors have checked and approved the final version of the manuscript submitted and any revised version. Ethics of human subject participation: The study protocol was approved by the institutional review board of the Nagoya University Graduate School of Medical Science and Medical School. Written informed consent was obtained from each participant.

References

1. Lamb, KE, Olstad, DL, Nguyen, C et al. (2017) Missing data in FFQs: making assumptions about item non-response. Public Health Nutr 20, 965–970.

2. Ahn, Y, Paik, HY & Ahn, YO (2006) Item non-responses in mailed food frequency questionnaires in a Korean male cancer cohort study. Asia Pac J Clin Nutr 15, 170–177.

3. Hansson, LM & Galanti, MR (2000) Diet-associated risks of disease and self-reported food consumption: how shall we treat partial nonresponse in a food frequency questionnaire? Nutr Cancer 36, 1–6.

4. Kuskowska-Wolk, A, Holte, S, Ohlander, EM et al. (1992) Effects of different designs and extension of a food frequency questionnaire on response rate, completeness of data and food frequency responses. Int J Epidemiol 21, 1144–1150.

5. Parr, CL, Hjartaker, A, Scheel, I et al. (2008) Comparing methods for handling missing values in food-frequency questionnaires and proposing k nearest neighbours imputation: effects on dietary intake in the Norwegian Women and Cancer study (NOWAC). Public Health Nutr 11, 361–370.

6. Fraser, GE, Yan, R, Butler, TL et al. (2009) Missing data in a long food frequency questionnaire: are imputed zeroes correct? Epidemiology 20, 289–294.

7. Michels, KB & Willett, WC (2009) Self-administered semiquantitative food frequency questionnaires: patterns, predictors, and interpretation of omitted items. Epidemiology 20, 295–301.

8. Schafer, JL & Graham, JW (2002) Missing data: our view of the state of the art. Psychol Methods 7, 147–177.

9. Enders, CK (2010) Applied Missing Data Analysis. New York: Guilford Press.

10. Rubin, DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91, 473–489.

11. Fraser, G & Yan, R (2007) Guided multiple imputation of missing data: using a subsample to strengthen the missing-at-random assumption. Epidemiology 18, 246–252.

12. Klebanoff, MA & Cole, SR (2008) Use of multiple imputation in the epidemiologic literature. Am J Epidemiol 168, 355–357.

13. Ware, JH, Harrington, D, Hunter, DJ et al. (2012) Missing data. N Engl J Med 367, 1353–1354.

14. Barzi, F, Woodward, M, Marfisi, RM et al. (2006) Analysis of the benefits of a Mediterranean diet in the GISSI-Prevenzione study: a case study in imputation of missing values from repeated measurements. Eur J Epidemiol 21, 15–24.

15. Rizzo, NS, Sabate, J, Jaceldo-Siegl, K et al. (2011) Vegetarian dietary patterns are associated with a lower risk of metabolic syndrome: the Adventist Health Study 2. Diabetes Care 34, 1225–1227.

16. Hamajima, N & J-MICC Study Group (2007) The Japan Multi-Institutional Collaborative Cohort Study (J-MICC Study) to detect gene–environment interactions for cancer. Asian Pac J Cancer Prev 8, 317–323.

17. Tokudome, S, Goto, C, Imaeda, N et al. (2004) Development of a data-based short food frequency questionnaire for assessing nutrient intake by middle-aged Japanese. Asian Pac J Cancer Prev 5, 40–43.

18. Tokudome, Y, Goto, C, Imaeda, N et al. (2005) Relative validity of a short food frequency questionnaire for assessing nutrient intake versus three-day weighed diet records in middle-aged Japanese. J Epidemiol 15, 135–145.

19. Goto, C, Tokudome, Y, Imaeda, N et al. (2006) Validation study of fatty acid consumption assessed with a short food frequency questionnaire against plasma concentration in middle-aged Japanese people. Scand J Food Nutr 50, 77–82.

20. Imaeda, N, Goto, C, Tokudome, Y et al. (2007) Reproducibility of a short food frequency questionnaire for Japanese general population. J Epidemiol 17, 100–107.

21. van Buuren, S (2007) Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res 16, 219–242.

22. Van Buuren, S & Oudshoorn, C (2000) Multivariate Imputation by Chained Equations: MICE V1.0 User’s Manual. TNO Report no. PG/VGZ/00.038. Leiden: TNO Prevention and Health.

23. Enders, CK (2010) Applied Missing Data Analysis, pp. 212–213. New York: Guilford Press.

24. Otsuka, R, Kato, Y, Nishita, Y et al. (2016) Age-related changes in energy intake and weight in community-dwelling middle-aged and elderly Japanese. J Nutr Health Aging 20, 383–390.

25. Wakimoto, P & Block, G (2001) Dietary intake, dietary patterns, and changes with age: an epidemiological perspective. J Gerontol A Biol Sci Med Sci 56, Spec. No. 2, 65–80.

26. Gottschall, AC, West, SG & Enders, CK (2012) A comparison of item-level and scale-level multiple imputation for questionnaire batteries. Multivariate Behav Res 47, 1–25.

27. Plumpton, CO, Morris, T, Hughes, DA et al. (2016) Multiple imputation of multiple multi-item scales when a full imputation model is infeasible. BMC Res Notes 9, 45.

28. Caan, B, Hiatt, RA & Owen, AM (1991) Mailed dietary surveys: response rates, error rates, and the effect of omitted food items on nutrient values. Epidemiology 2, 430–436.

29. White, IR, Royston, P & Wood, AM (2011) Multiple imputation using chained equations: issues and guidance for practice. Stat Med 30, 377–399.

Table 1 Distribution of missing values in FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

Table 2 Characteristics related to missing values in FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

Table 3 Frequency of intake: response of ‘never/rarely’ in FFQ1 and FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

Table 4 Portion size of intake: response of ‘1 unit’ in FFQ1 and FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

Table 5 Estimated dietary intakes of total energy and selected nutrients from FFQ1 and FFQ2 of the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study

Ichikawa et al. supplementary material

Tables S1 and S2
