Cluster analysis of polyphenol intake in a French middle-aged population (aged 35–64 years)

Polyphenols have been suggested as protective factors for a range of chronic diseases. However, studying the impact of individual polyphenols on health is hindered by the intrinsic inter-correlations among polyphenols. Alternatively, studying foods rich in specific polyphenols fails to grasp the ubiquity of these components. Studying overall dietary patterns would allow for a more comprehensive description of polyphenol intakes in the population. Our objective was to identify clusters of dietary polyphenol intakes in a French middle-aged population (35–64 years old). Participants from the primary prevention trial SUpplementation en VItamines et Minéraux AntioXydants (SU.VI.MAX) study were included in the present cross-sectional study (n 6092; 57·8 % females; mean age 48·7 (sd 6·4) years). The fifty most consumed individual dietary polyphenols were divided into energy-adjusted tertiles and introduced in a multiple correspondence analysis (MCA), leading to comprehensive factors of dietary polyphenol intakes. The identified factors discriminating polyphenol intakes were used in a hierarchical clustering procedure. Four clusters were identified, corresponding broadly to clustered preferences for their respective food sources. Cluster 1 was characterised by high intakes of tea polyphenols. Cluster 2 was characterised by high intakes of wine polyphenols. Cluster 3 was characterised by high intakes of flavanones and flavones, corresponding to high consumption of fruit and vegetables, and more broadly to a healthier diet. Cluster 4 was characterised by high intakes of hydroxycinnamic acids, but was also associated with alcohol consumption and smoking. Profiles of polyphenol intakes allowed for the identification of meaningful combinations of polyphenol intakes in the diet.

authors have bypassed this limitation by assessing the impact of polyphenol-rich foods (e.g. tea or cocoa) on health (27,28). However, studying specific foods is subject to the same kind of limitation, as foods are eaten in combination, within meals (29). Therefore, synergistic or interactive effects of polyphenols are likely to arise not only within one food source but also between the various foods eaten together (13). To overcome the limitations of investigating individual polyphenols or foods, some have argued for the investigation of overall dietary patterns (30,31).
Studying overall dietary patterns would allow for a more comprehensive description of combinations of individual polyphenols within an individual's diet. Our objective was to identify mutually exclusive groups of individuals according to their dietary polyphenol intakes in a French middle-aged population, in order to identify meaningful combinations of individual polyphenol intakes within the diet.

Population
Subjects included in the present study were selected from participants in the SUpplementation en VItamines et Minéraux AntioXydants (SU.VI.MAX) study. Briefly, middle-aged participants from the general population (35–64 years old) were included in 1994–1995 in a randomised, double-blind, placebo-controlled, primary prevention trial (Trial Registration clinicaltrials.gov no. NCT00272428) designed to evaluate the effect of a planned 8-year supplementation with antioxidant vitamins and minerals at nutritional doses on the incidence of CVD and cancer (32). The present study is a cross-sectional observational study using baseline data from the SU.VI.MAX study.

Ethics
The SU.VI.MAX study was approved by the Ethics Committee for Studies with Human Subjects of Paris-Cochin Hospital (no. 706) and the Commission Nationale Informatique et Liberté (no. 334641). All subjects gave written informed consent to participate in the study.

Dietary data assessment
Dietary assessment was carried out via repeated 24 h records (1994–1996), collected by computerised questionnaires using the Minitel Telematic Network loaded with study-specific software, as described before (32). The Minitel was a small terminal widely used in France as an adjunct to the telephone. Dietary collection dates were randomised and fixed for each participant so that each day of the week and all seasons were covered. A validated instruction manual was used to code food portions, including more than 250 generic items, corresponding to 1000 specific foods (33). Foods were classified into thirty-two food groups. A French published food composition table was used to calculate nutrient intakes from 24 h dietary records (34). A specific food composition table was used to compute dietary polyphenol intake, based on the published Phenol-Explorer Database (www.phenol-explorer.eu) (1). The database contains food composition data for all known polyphenols (flavonoids, phenolic acids, lignans, stilbenes and other minor polyphenols) in foods, including data on glycosides and esters, for a total of 502 polyphenols (1). The contents of individual polyphenols in foods were determined by chromatography (most often reverse-phase HPLC or gas chromatography), except for proanthocyanidins >4-mers, for which content data obtained by normal-phase HPLC were used.
Subjects with at least six dietary records available in the first 2 years of the study (1994–1996), including at least three during the autumn-winter months and three during the spring-summer months, were included in the present study. The number of dietary records retained and their balance between seasons were chosen in order to take into account day-to-day and seasonal intra-individual variability in food intake.
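Computing polyphenol intake from a composition table amounts to multiplying the grams of each food eaten by its polyphenol content per 100 g and summing over foods. The sketch below assumes a toy in-memory table; the food names, compound keys and contents are hypothetical placeholders, not values taken from Phenol-Explorer.

```python
# Hypothetical composition table (mg per 100 g of food as eaten).
# Placeholder numbers for illustration only; NOT Phenol-Explorer values.
COMPOSITION_MG_PER_100G = {
    "black_tea_infusion": {"epicatechin": 2.0, "quercetin_3_rutinoside": 1.5},
    "orange_juice": {"hesperidin": 25.0},
    "red_wine": {"quercetin_3_rutinoside": 0.8, "catechin": 7.0},
}


def record_intake(record: dict[str, float]) -> dict[str, float]:
    """Return mg of each polyphenol supplied by one 24 h record.

    `record` maps a food name to the grams eaten that day; foods absent
    from the composition table contribute nothing.
    """
    totals: dict[str, float] = {}
    for food, grams in record.items():
        for compound, mg_per_100g in COMPOSITION_MG_PER_100G.get(food, {}).items():
            totals[compound] = totals.get(compound, 0.0) + grams * mg_per_100g / 100.0
    return totals


# One simulated 24 h record (grams per food)
day = {"black_tea_infusion": 400.0, "red_wine": 150.0, "bread": 120.0}
intake = record_intake(day)
print(intake)
```

Keeping contents per 100 g mirrors how composition tables are usually expressed; a compound supplied by several foods (here quercetin-3-rutinoside from tea and wine) is simply summed.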
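The inclusion rule above can be sketched as a simple filter on record dates. The text does not state which months count as autumn-winter, so the October–March split used here is an assumption, as are the function and constant names.

```python
from datetime import date

# Assumption: October-March = autumn-winter, April-September = spring-summer.
AUTUMN_WINTER_MONTHS = {10, 11, 12, 1, 2, 3}


def is_eligible(record_dates: list[date], min_total: int = 6, min_per_season: int = 3) -> bool:
    """Apply the inclusion rule: >= 6 records, with >= 3 in each seasonal half-year."""
    autumn_winter = sum(d.month in AUTUMN_WINTER_MONTHS for d in record_dates)
    spring_summer = len(record_dates) - autumn_winter
    return (
        len(record_dates) >= min_total
        and autumn_winter >= min_per_season
        and spring_summer >= min_per_season
    )


balanced = [date(1995, m, 10) for m in (1, 2, 11, 5, 7, 9)]        # 3 + 3 records
unbalanced = [date(1995, m, 10) for m in (1, 2, 3, 11, 12, 6, 7)]  # 5 + 2 records
print(is_eligible(balanced), is_eligible(unbalanced))
```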

Sociodemographic and anthropometric data
Data on educational level (primary, secondary, higher education), physical activity (irregular, <1 h equivalent walking/d, ≥1 h equivalent walking/d) and smoking status, including cigarettes, cigars and pipes (never smoked, former smoker, current smoker), were obtained through self-administered questionnaires at baseline.
Anthropometric measurements were taken at a clinical examination 1 year after inclusion in the SU.VI.MAX study. Weight was measured with subjects in light clothing and without shoes to the nearest 0·1 kg, and height was measured to the nearest 1 cm with a wall-mounted stadiometer under the same conditions. When measured weight and height were not available, self-reported weight and height were used instead (n 988; 16·2 %).

Statistical analysis
BMI was calculated as weight (in kg) divided by the square of height (in m).
Adherence to the traditional Mediterranean diet was computed using the Mediterranean Diet Score, as described by Trichopoulou et al. (35) .
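A minimal sketch of the Mediterranean Diet Score as we understand the Trichopoulou et al. definition: 1 point per beneficial component (vegetables, legumes, fruits and nuts, cereals, fish, monounsaturated:saturated lipid ratio) at or above the sex-specific median, 1 point per detrimental component (meat, dairy products) below it, and 1 point for alcohol within a sex-specific range, giving 0–9. This simplified version assumes intakes already in g/d, scores ties at the median as "at or above", omits any energy standardisation, and uses hypothetical dictionary keys.

```python
from statistics import median

BENEFICIAL = ("vegetables", "legumes", "fruits_nuts", "cereals", "fish", "mufa_sfa_ratio")
DETRIMENTAL = ("meat", "dairy")
# Sex-specific alcohol window in g/d
ALCOHOL_RANGE = {"M": (10.0, 50.0), "F": (5.0, 25.0)}


def mediterranean_diet_scores(subjects: list[dict]) -> list[int]:
    """Return a 0-9 score per subject; medians are computed within each sex."""
    medians = {}
    for sex in {s["sex"] for s in subjects}:
        group = [s for s in subjects if s["sex"] == sex]
        medians[sex] = {c: median(s[c] for s in group) for c in BENEFICIAL + DETRIMENTAL}
    scores = []
    for s in subjects:
        m = medians[s["sex"]]
        score = sum(s[c] >= m[c] for c in BENEFICIAL)   # beneficial: at/above median
        score += sum(s[c] < m[c] for c in DETRIMENTAL)  # detrimental: below median
        low, high = ALCOHOL_RANGE[s["sex"]]
        score += low <= s["alcohol"] <= high            # moderate alcohol intake
        scores.append(int(score))
    return scores


women = [
    {"sex": "F", "vegetables": 400, "legumes": 30, "fruits_nuts": 400, "cereals": 300,
     "fish": 60, "mufa_sfa_ratio": 1.5, "meat": 50, "dairy": 100, "alcohol": 10},
    {"sex": "F", "vegetables": 100, "legumes": 5, "fruits_nuts": 100, "cereals": 100,
     "fish": 10, "mufa_sfa_ratio": 0.8, "meat": 200, "dairy": 400, "alcohol": 0},
]
scores = mediterranean_diet_scores(women)
print(scores)
```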
Mean daily intake of each nutrient and polyphenol was calculated for each subject across their 24 h dietary records. Then, median intakes of individual polyphenols were computed for the whole population, and the fifty most consumed polyphenols (according to median intake) were considered for the subsequent analysis of clusters of polyphenol intakes. The objective of the analysis was to group individuals into mutually exclusive groups according to their overall intakes of the fifty selected polyphenols.
Polyphenol intakes, nutrient intakes and food group consumption were energy-adjusted using the residual method (36). Energy-adjusted intakes of the fifty selected individual polyphenols were divided into tertiles and then introduced as input variables in a multiple correspondence analysis (MCA). Factors extracted from the MCA were selected for a subsequent cluster analysis on the basis of the inertia (% of the initial variability) they explained: dimensions were retained if they represented >7 % of total inertia. The number of clusters to include in the model was selected using plots of the semi-partial R² and the cubic clustering criterion by number of clusters.
The identified clusters of polyphenol intakes were described in terms of individual polyphenol intakes, sociodemographic, lifestyle and anthropometric data and, finally, dietary intake (nutrients and food groups). All results for individual polyphenol intakes, nutrient intakes or food group consumption are presented as mean energy-adjusted values. Clusters were compared using χ² tests for categorical variables and ANOVA for continuous variables, as the variables, in particular the energy-adjusted residuals, were normally distributed.
All tests were two-sided, and P < 0·001 was considered statistically significant to account for multiple comparisons. SAS version 9.3 (SAS Institute, Inc.) was used for the analyses.
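The selection step described above (per-subject mean over records, then population median, then top-ranked compounds) can be sketched in plain Python. The data layout (a dict of subjects, each holding a list of per-record compound-to-mg dicts) and the function name are assumptions for illustration; a compound absent from a record is counted as 0 mg.

```python
from statistics import mean, median


def top_polyphenols(records_by_subject: dict[str, list[dict[str, float]]], k: int = 50):
    """Rank compounds by population median of subjects' mean daily intake."""
    compounds = sorted({c for recs in records_by_subject.values() for r in recs for c in r})
    # Mean daily intake per subject; a compound missing from a record counts as 0 mg
    subject_means = {
        sid: {c: mean(r.get(c, 0.0) for r in recs) for c in compounds}
        for sid, recs in records_by_subject.items()
    }
    pop_median = {c: median(m[c] for m in subject_means.values()) for c in compounds}
    ranked = sorted(compounds, key=lambda c: pop_median[c], reverse=True)
    return ranked[:k], subject_means


# Hypothetical compounds A, B, C recorded over two days for three subjects
records = {
    "s1": [{"A": 10.0, "B": 2.0}, {"A": 20.0}],
    "s2": [{"A": 5.0, "C": 1.0}, {"B": 6.0}],
    "s3": [{"B": 4.0}, {"A": 12.0, "B": 4.0}],
}
top, means = top_polyphenols(records, k=2)
print(top)
```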
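The preprocessing and factor-extraction steps above can be sketched with NumPy on simulated data: residual-method energy adjustment (regress intake on energy, then re-centre on the intake predicted at mean energy), tertile coding, and MCA computed as a correspondence analysis of the indicator table. This is an illustrative reimplementation, not the SAS procedure used in the study; inertia percentages depend on the MCA variant (indicator matrix, Burt table or adjusted inertias), so the 7 % cut-off applied to this toy version is only indicative.

```python
import numpy as np


def energy_adjust(intake: np.ndarray, energy: np.ndarray) -> np.ndarray:
    """Residual method: residual of intake ~ energy plus the intake predicted
    at mean energy, i.e. intake - slope * (energy - mean(energy))."""
    slope, _intercept = np.polyfit(energy, intake, 1)
    return intake - slope * (energy - energy.mean())


def tertile_codes(x: np.ndarray) -> np.ndarray:
    """Code each value 0, 1 or 2 according to the sample tertile it falls in."""
    cuts = np.quantile(x, [1 / 3, 2 / 3])
    return np.digitize(x, cuts)


def mca(codes: np.ndarray, n_categories: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """MCA as correspondence analysis of the indicator (disjunctive) table.

    codes: (n_subjects, n_variables) integer category codes.
    Returns principal inertias and subjects' principal coordinates.
    """
    n, q = codes.shape
    z = np.zeros((n, q * n_categories))
    # One indicator column per (variable, category) pair
    z[np.arange(n)[:, None], np.arange(q) * n_categories + codes] = 1.0
    p = z / z.sum()
    r = p.sum(axis=1)  # row masses (all equal to 1/n)
    c = p.sum(axis=0)  # category masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)  # standardised residuals
    u, sigma, _vt = np.linalg.svd(s, full_matrices=False)
    row_coords = (u * sigma) / np.sqrt(r)[:, None]
    return sigma**2, row_coords


rng = np.random.default_rng(0)
n_subjects, n_polyphenols = 300, 10
energy = rng.normal(2000.0, 400.0, n_subjects)  # simulated kcal/d
raw = rng.lognormal(size=(n_subjects, n_polyphenols)) * (energy[:, None] / 2000.0)

adjusted = np.column_stack([energy_adjust(raw[:, j], energy) for j in range(n_polyphenols)])
codes = np.column_stack([tertile_codes(adjusted[:, j]) for j in range(n_polyphenols)])
inertia, coords = mca(codes)
pct = 100.0 * inertia / inertia.sum()
retained = coords[:, pct > 7.0]  # dimensions above the 7 % threshold
print(np.round(pct[:5], 1), retained.shape)
```

With Q variables of three categories each, the total inertia of the indicator table is (3Q − Q)/Q = 2, which is a convenient sanity check on the decomposition.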
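The cluster comparisons described above can be sketched with SciPy on simulated data: one-way ANOVA for a continuous variable and a χ² test of independence for a categorical one, each judged against the P < 0·001 threshold. Variable names and the simulated effects are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway

ALPHA = 0.001  # significance threshold stated in the text


def anova_by_cluster(values: np.ndarray, clusters: np.ndarray):
    """One-way ANOVA of a continuous variable across clusters."""
    groups = [values[clusters == g] for g in np.unique(clusters)]
    f_stat, p_value = f_oneway(*groups)
    return f_stat, p_value, p_value < ALPHA


def chi2_by_cluster(categories: np.ndarray, clusters: np.ndarray):
    """Chi-squared test of independence between a categorical variable and cluster."""
    table = np.array([[np.sum((clusters == g) & (categories == c)) for c in np.unique(categories)]
                      for g in np.unique(clusters)])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof, p_value < ALPHA


rng = np.random.default_rng(2)
clusters = np.repeat([1, 2, 3, 4], 200)
bmi = rng.normal(24.0, 3.0, clusters.size) + (clusters == 2) * 2.0       # simulated BMI
smoker = rng.random(clusters.size) < np.where(clusters == 4, 0.4, 0.1)  # simulated smoking

f_stat, p_bmi, bmi_significant = anova_by_cluster(bmi, clusters)
chi2, p_smoke, dof, smoke_significant = chi2_by_cluster(smoker, clusters)
print(p_bmi, p_smoke, dof)
```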

Results
Among the 13 017 subjects included in the initial SU.VI.MAX study, 6092 had at least six dietary records available (Fig. 1), with at least three in the spring-summer months and three in the autumn-winter months, and were included in the study (mean number of dietary records 11·0 (SD 2·1); mean number of spring-summer records 5·0 (SD 1·2); mean number of autumn-winter records 6·0 (SD 1·6)). The sample included 57·8 % women, with a mean age of 48·7 (SD 6·4) years.
The fifty most consumed polyphenols belonged mostly to the flavonoids and phenolic acids. Selected individual polyphenols included flavanols (six catechins and nine proanthocyanidins: six individual dimers or trimers and three measured by normal-phase HPLC by degree of polymerisation), one dihydroflavonol, four anthocyanins, three flavanones, three flavones, eight flavonols, two hydroxybenzoic acids, thirteen hydroxycinnamic acids and one other polyphenol (tyrosol). No lignans or stilbenes were represented in the selection.
Four main factors were extracted from the MCA procedure, together explaining 39 % of total inertia. Plots of the cubic clustering criterion and semi-partial R² by number of clusters identified four clusters of dietary polyphenol intakes as the best solution.
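The choice of the number of clusters from the semi-partial R² plot can be sketched as follows. The text does not name the linkage method; Ward's method is assumed here because it is commonly paired with semi-partial R², and the cubic clustering criterion is not reproduced. The semi-partial R² for k clusters is the share of total sum of squares lost when the k + 1 cluster partition is merged into k clusters; factor scores are simulated.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def within_ss(x: np.ndarray, labels: np.ndarray) -> float:
    """Total within-cluster sum of squares."""
    return float(sum(((x[labels == g] - x[labels == g].mean(axis=0)) ** 2).sum()
                     for g in np.unique(labels)))


def semipartial_r2(scores: np.ndarray, max_k: int = 8) -> dict[int, float]:
    """Semi-partial R^2 lost when merging k + 1 clusters into k (Ward tree, assumed)."""
    tree = linkage(scores, method="ward")
    total_ss = within_ss(scores, np.ones(len(scores), dtype=int))
    w = {k: within_ss(scores, fcluster(tree, t=k, criterion="maxclust"))
         for k in range(1, max_k + 2)}
    return {k: (w[k] - w[k + 1]) / total_ss for k in range(1, max_k + 1)}


rng = np.random.default_rng(1)
centres = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]], dtype=float)
scores = np.vstack([c + rng.normal(size=(50, 4)) for c in centres])  # simulated factor scores
sprsq = semipartial_r2(scores)
labels = fcluster(linkage(scores, method="ward"), t=4, criterion="maxclust")
print({k: round(v, 3) for k, v in sprsq.items()})
```

A sharp jump in semi-partial R² between k = 4 and k = 3 (merging two genuinely distinct groups) is the kind of signal that points to four clusters.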
Sociodemographic and lifestyle characteristics of participants according to clusters of dietary polyphenol intakes are shown in Table 3. Compared with other clusters, cluster 1 had the lowest percentage of subjects with no diploma or primary education and the highest mean age. Cluster 2 had the highest percentage of subjects with irregular physical activity and the highest mean BMI. Cluster 3 had the highest percentage of subjects with university education, the lowest percentage of smokers, the lowest mean age and the lowest BMI. Cluster 4 had the highest percentage of subjects with no diploma or primary education, the highest percentage of smokers and the highest percentage of subjects with ≥1 h equivalent walking/d.
Consumption of food groups across clusters is shown in Table 4. Cluster 1 was characterised by very high consumption of tea and low consumption of coffee, sweetened beverages, starchy foods (pasta, rice, potatoes), meat, processed meat and snacks (sweet and savoury). Cluster 2 was characterised by high consumption of wine, low consumption of bread and legumes, dairy products, fruit and vegetables and fish, and low Mediterranean Diet Scores. Cluster 3 was characterised by high consumption of almost all food groups, in particular fruit, vegetables, fish, milk and dairy products and starchy foods, but also sweetened beverages and snacks, and by a high Mediterranean Diet Score. It also had the lowest consumption of wine. Cluster 4 was characterised by high consumption of coffee, spirits and beer, meat and processed meat, and low consumption of tea, milk and breakfast cereals.
Nutrient intakes across clusters are shown in Table 5. Cluster 1 had the lowest energy intake and the lowest intakes of saturated and polyunsaturated fat, vitamins E and B9, Ca, Na and Fe. Cluster 2 had the lowest energy intake from carbohydrates and the highest energy intake from proteins and alcohol, as well as the lowest intakes of dietary fibre, β-carotene, vitamin C and vitamin D. Cluster 3 had the highest energy intake, the highest energy intake from carbohydrates and fat, the lowest energy intake from proteins and alcohol, and the highest intakes of all nutrients: fibre, saturated and polyunsaturated fat, vitamins and trace elements. Participants in cluster 4 did not have particularly high or low nutrient intakes compared with other clusters.

Discussion
Using detailed dietary data from a large sample from the general population, we were able to identify specific profiles of polyphenol intake. Because most polyphenol compounds share food sources, estimating the independent effects of dietary components is prone to multicollinearity and is therefore challenging. It is thus difficult to disentangle the potential effect of each individual polyphenol, and of overall dietary polyphenols, from the effect of other food constituents (such as antioxidant vitamins and minerals) provided by the same vector food.
Our results show that individual polyphenols from a single subclass can be associated with different patterns of intake (for example, individual flavonols are associated with three different clusters). However, to date, polyphenol intakes have mostly been investigated as classes and subclasses (flavonoids, anthocyanins or 'coffee polyphenols'), focusing on major contributing foods or relating their intake to specific health outcomes (7,9–14,37,38). In the light of our results, such an approach could merge distinct dietary behaviours into a single indicator. Our study shows that taking dietary patterns of polyphenols into account is crucial, given the observed associations between polyphenol intakes within individuals' diets. To our knowledge, this is the first study investigating clusters of polyphenol intakes in a general population sample. A holistic approach combining polyphenol intakes into a single a priori score has recently been developed, and has been shown to be associated with the Mediterranean diet and low-grade inflammation (39,40). Such an approach is complementary to a posteriori patterns, as it builds on current knowledge of the relationships between polyphenol intakes and health, whereas our approach aimed at investigating naturally occurring associations of dietary polyphenols within the population's diets. Interestingly, tea appeared as a major discriminant factor for cluster identification.
Polyphenols contained in tea include catechins (tea contributing from 15 to 63 % of intake), procyanidin dimers and trimers (tea contributing from 12 to 48 % of intake), flavonols (kaempferol and quercetin compounds, tea contributing from 3 to 56 % of intake) and theaflavins (tea contributing 100 % of their intake). However, only four catechins out of the fifty individual polyphenols selected for analysis had tea as their main food source; for these, tea contributed about 55 % of intake each (see Supplementary Tables S1 and S2). Besides, cluster 1 was characterised by high intakes of other polyphenols (proanthocyanidins and quercetin compounds) for which tea was only a minor contributor. Conversely, theaflavins, flavonoids specific to tea that are formed during tea-leaf processing, were not consumed in sufficiently high amounts in the population to be included among the individual polyphenols selected for the MCA procedure (41). Tea is one of the major sources of polyphenols in Western diets (7,9,12,37). In a study using a sample of subjects from the National Health and Nutrition Examination Survey (NHANES), 21·3 % of the population consumed tea (37), yet tea was the major polyphenol source for the whole sample (9). Consistent with our results, tea consumers had higher intakes of flavonols and catechins (37). Moreover, sociodemographic profiles identified in the NHANES were similar to those from our study, as tea consumers were more likely to be women and older; however, they also tended to have lower levels of physical activity, which was not observed in our cluster 1 (37). Similarly, in the European Prospective Investigation into Cancer and Nutrition (EPIC) study, determinants of theaflavin intake were a higher educational level and a lower BMI (11).
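Contribution percentages such as those quoted for tea are commonly computed as population-level shares: the amount of a compound supplied by one food, summed over all subjects, divided by the total population intake of that compound. The sketch below assumes that definition (the text does not spell it out) and uses made-up numbers and a hypothetical function name.

```python
def source_contributions(intake_by_food: list[dict[str, float]]) -> dict[str, float]:
    """intake_by_food: one dict per subject, {food: mg of ONE polyphenol from that food}.

    Returns {food: % of the total population intake of that polyphenol}.
    """
    totals: dict[str, float] = {}
    for subject in intake_by_food:
        for food, mg in subject.items():
            totals[food] = totals.get(food, 0.0) + mg
    grand_total = sum(totals.values())
    return {food: 100.0 * mg / grand_total for food, mg in totals.items()}


# Two simulated subjects, intake of a single catechin (mg/d) by food source
shares = source_contributions([{"tea": 30.0, "apple": 10.0}, {"tea": 20.0, "dark_chocolate": 40.0}])
print(shares)
```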
Tea and coffee consumption were inversely related across clusters. Coffee and tea compete with each other worldwide, with countries tending to prefer one beverage or the other (42). However, such competing patterns of consumption have not always been identified at the individual level. Frary et al. analysed patterns of beverage consumption in the USA and, among the six clusters identified, one included consumers of both tea and coffee (43). However, although the USA shares Europe's preference for coffee over tea (42), overall beverage consumption patterns differ (44). Cluster 4 was characterised by high intakes of coffee, beer and spirits, associated with smoking. Paired associations between alcohol and tobacco have long been identified (45), as has a clustering of risky behaviours in smokers: smokers tend to have higher intakes of alcohol, unhealthier diets and lower levels of physical activity (46,47). Paired associations have also been observed between coffee intake and smoking, and between alcohol consumption and coffee intake (45,48). Consistent with our results, associations between alcohol intake, smoking status and coffee consumption were observed in the French prospective cohort study E3N-EPIC (Etude Epidémiologique auprès des femmes de la Mutuelle Générale de l'Education Nationale-European Prospective Investigation into Cancer and Nutrition) (49). Moreover, as in cluster 4, alcohol consumption was associated with intakes of processed meat (49). This clustered association between coffee, smoking and alcohol could in part explain the conflicting results reported for the association between coffee intake and CVD (50,51). Cluster 2 was characterised by high consumption of wine, intermediate consumption of both coffee and tea and low consumption of other food groups. Cluster 3 was characterised by high consumption of almost all food groups considered.
Our results confirm that findings based on single sources of polyphenols, such as tea, coffee or wine, should be interpreted with caution, given the strong correlations observed between intakes of these polyphenols. Approaching these correlations through the identification of intake profiles should provide complementary information on the associations between polyphenols and mortality and health events, taking into account interactions and confounding by sociodemographic factors (sex in particular).
The most consumed polyphenols identified in our sample were in accordance with previous results from the same population (13). In the EPIC study, which also partly used data from the Phenol-Explorer Database, intakes of catechins in the French sample were somewhat lower than, but comparable with, intakes in our study (11). Comparison with other populations is, however, difficult, owing to heterogeneity in the food composition data used (for example, in the Finnish cohort, only aglycone compounds were considered in the analyses) (12).
Strengths of our study include the use of very detailed and validated dietary information, from repeated 24 h dietary records. Seasonality in intakes was taken into account, through the balanced number of dietary records available for each subject. Moreover, we used comprehensive data from the Phenol-Explorer Database, which builds on current scientific literature to expand data on individual polyphenol content of foods (2) .
The present study is subject to limitations. Dietary assessment is based on self-reported data, and is therefore subject to errors in portion-size estimation and to social desirability or recall bias. The polyphenol content of foods can vary considerably depending on factors such as the degree of ripeness of fruits and vegetables, sunlight exposure, storage conditions or culinary methods, which could not be assessed in our sample (7). Moreover, the Phenol-Explorer Database is based on English-language scientific publications on the polyphenol content of foods, which could introduce a selection bias by excluding some data published in other languages, including data from some European countries. Finally, our sample was selected from a study of middle-aged participants conducted in 1994–1996, and profiles of polyphenol intakes in the general population could have changed since then. In addition, the analyses were conducted on a subsample of the cohort, which limits the representativeness of our results. Replication in a different population would help confirm our findings.

Conclusion
Patterns of polyphenol intakes were identified from a large sample from the general population. Future studies should investigate the association between such patterns and subsequent health outcomes, in order to identify meaningful combinations of polyphenols within the diet.

Supplementary material
The supplementary material for this article can be found at http://dx.doi.org/10.1017/jns.2016.16