A comparison of heuristic and model-based clustering methods for dietary pattern analysis

  • Benjamin Greve (a1) (a2), Iris Pigeot (a1) (a2), Inge Huybrechts (a3) (a4), Valeria Pala (a5) and Claudia Börnhorst (a1)...

Abstract

Objective

Cluster analysis is widely applied to identify dietary patterns. A newer method based on Gaussian mixture models (GMM) appears to be more flexible than the commonly applied k-means and Ward’s methods. The present paper compares these three clustering approaches to find the most appropriate one for clustering dietary data.

Design

The clustering methods were applied to simulated data sets with different cluster structures, so that their performance could be compared against the known true cluster membership of each observation. In addition, the three methods were applied to FFQ (food frequency questionnaire) data collected from 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice.
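
Comparing a clustering result with a known true membership requires an agreement measure that ignores how the cluster labels happen to be numbered. The abstract does not name the measure used; the sketch below assumes the adjusted Rand index, a common choice for this purpose. The function name `adjusted_rand_index` and the toy label vectors are illustrative only and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Chance-corrected agreement between two partitions of the same items.

    Returns 1.0 for identical partitions (up to relabelling) and values
    near 0 for agreement no better than expected by chance.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)

    # Numbers of item pairs placed together: in both partitions, in the
    # true partition, and in the predicted partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    # Same partition with swapped label names: perfect agreement.
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))            # 1.0
    # One true cluster split in two: partial agreement.
    print(round(adjusted_rand_index(truth, [0, 0, 1, 2]), 3))  # 0.571
```

In a simulation of the kind described, such an index would be computed once per method and per simulated data set, and the methods ranked by their values.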

Results

The GMM outperformed the other methods in the simulation study in 72–100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward’s methods, k-means performed better in 64–100 % of cases. Applied to real data, all methods identified three similar dietary patterns, which may be roughly characterized as a ‘non-processed’ cluster with a high consumption of fruits, vegetables and wholemeal bread, a ‘balanced’ cluster with only slight preferences for single foods and a ‘junk food’ cluster.

Conclusions

The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means method seems to be a good alternative: it is easier to use and gave similar results when applied to real data.
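
The flexibility in volume, shape and orientation can be made concrete through the eigenvalue decomposition of the component covariance matrices that is standard in the model-based clustering literature. The notation below (K, \pi_k, \mu_k, \Sigma_k, \lambda_k, D_k, A_k) follows that literature and is an assumption here; the abstract itself gives no formulas.

```latex
f(x) = \sum_{k=1}^{K} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top}
```

Here \phi is the multivariate Gaussian density and \pi_k are mixing proportions that sum to one. The scalar \lambda_k governs the volume of cluster k, the diagonal matrix A_k (scaled so that its determinant is 1) governs its shape, and the orthogonal matrix D_k of eigenvectors governs its orientation. Restricting every component to \Sigma_k = \lambda I yields spherical clusters of equal volume, which is the setting k-means is commonly associated with; letting \lambda_k, A_k and D_k vary by cluster is what allows the mixture model to adapt to other structures.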

Corresponding author

*Corresponding author: Email boern@bips.uni-bremen.de

Public Health Nutrition
  • ISSN: 1368-9800
  • EISSN: 1475-2727