Health behaviours as predictors of the Mediterranean diet adherence: a decision tree approach

Objective: This study aimed to identify health behaviours that determine adolescent’s adherence to the Mediterranean diet (MD) through a decision tree statistical approach. Design: Cross-sectional study, with data collected through a self-fulfilment questionnaire with five sections: (1) eating habits; (2) adherence to the MD (KIDMED index); (3) physical activity; (4) health habits and (5) socio-demographic characteristics. Anthropometric and blood pressure data were collected by a trained research team. The Automatic Chi-square Interaction Detection (CHAID) method was used to identify health behaviours that contribute to a better adherence to the MD. Setting: Eight public secondary schools, in Algarve, Portugal. Participants: Adolescents with ages between 15 and 19 years (n 325). Results: According to the KIDMED index, we found a low adherence to MD in 9·0 % of the participants, an intermediate adherence in 45·5 % and a high adherence in 45·5 %. Participants that regularly have breakfast, eat vegetable soup, have a second piece of fruit/d, eat fresh or cooked vegetables 1 or more times a day, eat oleaginous fruits at least 2 to 3 times a week, and practice sports and leisure physical activities outside school show higher adherence to the MD (P < 0·001). Conclusions: The daily intake of two pieces of fruit and vegetables proved to be a determinant health behaviour for high adherence to MD. Strategies to promote the intake of these foods among adolescents must be developed and implemented.

adherence to the MD with a higher degree of protection against all-cause mortality, highlighting its protective role against the development of diseases (17)(18)(19)(20) , adherence to MD has decreased as a result of the incorporation of Western country habits, following globalisation (21) . The decreased adherence to MD observed in Mediterranean populations is also occurring in Portugal (22) .
Among children and adolescents, epidemiological evidence suggests that dietary patterns in Mediterranean countries are changing rapidly, with an increase in the consumption of animal products, saturated fat, highly processed/energy-dense foods, ready-to-eat products and a decline in the intake of plant-based foods (23)(24)(25) . Adolescence is a key period in life, and it involves multiple physiological and psychological changes that affect nutritional needs and habits (26) . Adolescents have increased energy and nutrient needs, including amino acids for growth of striated muscle, as well as Ca and vitamin D to accommodate bone growth (27,28) . Adherence to MD, it is associated with an enhanced nutritional adequacy (11) and is also recognised for preventing overweight, obesity (29) and in reducing waist circumference in adolescents (30,31) In Portugal, the latest data from the National Food and Physical Activity Survey, IAN-AF 2015-2016, reports that 78 % of adolescents (aged between 10 and 17 years) does not meet the WHO recommendation to consume more than 400 g/d of fruit and vegetables (equivalent to five or more servings/d). The daily consumption of more than 50 g of processed meat is observed in 6·3 % of the population (11·6 % in adolescents). Only 36 % of people between 15 and 21 years of age are considered physically active, complying with the current recommendations for the practice of 'health-promoting physical activity' (32) .
Machine learning is a method focused on the development of algorithms that are particularly useful for data mining (33) . Data mining is a term to describe the process of analysing large databases in search of interesting and previously unknown patterns (34) . Decision tree methodology is a commonly used data mining method to establish classification systems based on multiple covariates or for developing prediction algorithms for a target variable. The algorithm is non-parametric and can efficiently deal with large, complex datasets without imposing a complicated parametric structure (35) . Decision tree structure comprises a hierarchically organised set of data groups, called nodes, which are interconnected by tree branches. The beginning of the hierarchical structure is the root of the tree (i.e. root node). It refers to the dependent variable and includes all the observed data that are to be divided into classes during the process of model development (36) . Machine learning uses the combination of artificial intelligence algorithm with statistical analysis for decisionmaking, which can be useful to support dietary behaviours understanding and improvement.
This study aims to identify which health behaviours contribute to a higher adherence to MD and provide a framework for the development of food education interventions aimed at adolescents through decision-making processes using classification trees.

Sampling and participants
This study focused on tenth-grade students, recruited through a randomised, multi-stage, stratified sample of schools, constructed from a sampling frame comprising all secondary schools in the region of the Algarve, Portugal. After following the procedures proposed by Fleiss et al. (37) for calculating minimum sample size, with 90 % power and 0·05 statistical significance, and considering official data provided by the Regional Directorate of Education regarding the number of registered tenth-grade students in the Algarve, we selected, in the first stage, a random sample of eight schools, stratified by their typescience or humanities curriculum schools and technological-professional schools; in the second stage, we randomly selected fourteen classes from science or humanities curriculum and nine classes from technologicalprofessional schools. All students from these classes were invited to be a part of the study and a formal, written consent, was sent to their legal guardians.
The calculations for sample size suggested a minimum of 454 participants to achieve representativity. The classes randomly selected for recruitment had a total of 545 enrolled students. From the original sample of 545 students, 325 (59 6 %) were authorised by their guardian to be a part of the study and agreed to proceed to the data collection phase. Exclusion criteria were (1) pregnancy and (2) physical disability, mental illness, or other condition that affected the ability to fulfil the data collection questionnaire or the validity of anthropometric measurements. Data collection was conducted from April to June 2018. No participants were excluded on account of these criteria.
Data assessment and variables Data were collected through a self-fulfilment questionnaire regarding food, lifestyle and health. This tool has five sections: (1) dietary habits; (2) adherence to the MD (KIDMED index); (3) physical activity; (4) sleep and oral hygiene; and (5) socio-demographic characteristics. Except for section 2, the questions were developed specifically for this study, based on best nutritional epidemiology practices and also on national and international tools, namely the data collection questionnaires and protocols used in the Health Behaviour in School-Aged Children (HBSC) (38) , Childhood Obesity Surveillance Initiative (COSI) (39) , and the Portuguese National Food, Nutrition and Physical Activity Survey (IAN-AF) (40) . These tools were consistently found reliable in diverse settings and different populations.
The data collection tool was pre-tested in a homologous, non-random sample of 22 volunteers that were debriefed after fulfilling the questionnaire. This pre-test resulted in small edits and minor corrections to the questionnaire that were incorporated in the final version used in this study.
Anthropometric and blood pressure data were also collected, using procedures documented in a field manual and after specific training for all members of the team. Data included weight, height, waist and hip circumference, and systolic and diastolic blood pressure.
In all questions regarding dietary habits, physical activity, sleep and oral hygiene, participants were asked to consider the last 12 months as a frame of reference for their answers.

Dietary habits
Dietary habits were assessed through close-ended questions, either multiple choice, 'yes/no' questions, Likert-type sentences for assessing agreement or questions for quantification (e.g. number of daily meals). As the goal for this assessment was to determine specific dietary habits, to be analysed per question and not by the means of a composite scale or index, questions were focused on partaking specific meals, their setting (e.g. 'Do you usually have breakfast?'; 'Where do you usually have breakfast?' and 'Who usually has breakfast with you?') and the usual foods included in each meal, selected from a list and with the ability of participants adding their own foods. The adoption of a specific kind of diet (vegan, meat-free, Halal, etc.) and the amount of water usually ingested during the day were also assessed with multiple choice questions, with the ability of participants adding other answers. The intake of alcoholic beverages was assessed using a standard, semi-quantitative, food frequency scale.

Adherence to the Mediterranean Diet
The adherence to the MD was assessed with the Portuguese version of the KIDMED index proposed by Mateus, MP (41) . The score for this index ranges from 0 to 12 and it is based on 16 'yes/no' questions. Questions denoting a negative connotation with the MD are assigned a score of -1 and those positively associated with the MD are assigned a score of þ1. The sum of the scores is then categorised in three levels: level 1high adherence to MD (≥ 8 points); level 2intermediate adherence to MD (4-7 points) and level 3low adherence to MD (≤ 3 points), as proposed by the original authors of the KIDMED index (42,43) .
Physical activity, sleep and oral hygiene Physical activity was assessed through nineteen multiple choice questions, aimed at gaining information regarding the usual means of transportation to and from school, the distance between home and school, the type of physical activity in school hours and as a form of leisure. We did not intend to quantify physical activity but instead identify participants who engage in any kind of physical activity and identify the usual forms of physical activity.
Participants were also asked to about their usual time of waking up, usual time of going to bed and usual frequency of brushing their teeth.

Socio-demographic variables and anthropometrics
For the socio-demographic characterisation, we collected information regarding date and place of birth, gender, nationality, area of residence, as well as household composition, age, nationality, place of birth, education and profession of parents/guardians. Anthropometric measurements were made using reference methodologies (44) by trained researchers. Weight was assessed to the nearest decigram (0·1 kg) with a Seca 877 ® digital scale. Height was measured to the nearest millimetre (0·1 cm) with a portable Seca 217 ® stadiometer, with the participant's head in the Frankfurt plane. Weight and height were measured twice, with participants in bare feet and light clothes. The waist circumference was measured up to the nearest millimetre (0·1 cm), at the midpoint between the iliac crest and the lower costal margin, using a flexible and non-deformable Teflon tape. BMI was calculated using the formula Weight/Height 2 (kg/m 2 ). The BMI was used to determine the BMI/age percentile, and participants were categorised as underweight (≤5th percentile), normal weight (>5th to ≤85th percentile), overweight (>85th percentile) and obese (≥95th percentile), according to the WHO growth standards for children and adolescents (45) .
Statistical analysis IBM-SPSS ® software version 25.0 was used for statistical analysis. Data were described by absolute and relative frequencies, and mean, median (MD), standard deviation and interquartile range were computed whenever appropriate. Data were normalised by logarithmic or arcsine transformation when results were expressed as percentages. To evaluate the impact of MD adherence on each gender, while considering anthropometric data and health behaviours, we performed a one-way ANOVA multiple comparison test (Tukey, P < 0·05). For the variables that did not follow a normal distribution, non-parametric tests were computed, such as Kruskal-Wallis, Chi-square, Mann-Whitney and Spearman's correlation coefficient (P < 0·05).
We performed a machine learning procedure as a complement to other statistical analyses in order to further study the relationship between adherence to MD and other variables. Since the variables are potentially correlated with each other, a decision tree was applied through the algorithm CHAID (Automatic Chi-square Interaction Detection) (46) . These tree models classify cases into groups or predict values of a dependent variable (criterion), based on values or categories of the independent variables (predictors). The criterion used was the classification of individuals having low, intermediate or high adherence to MD in relation to the following factors: dietary habits; items of KIDMED index; sleep and oral hygiene; school sports; out of school sports; leisure physical activity; daily hours of sedentary activities; socio-demographic data; and anthropometric and blood pressure measures. A maximum tree depth of three levels was generated by the algorithm with a minimum number of fifty initial (parent) nodes, thirty terminal (child) nodes, and Splitting Nodes criteria of 0·05. The algorithm was able to differentiate the groups with low, intermediate or high adherence to MD obtained by KIDMED index categories (dependent variable in node 0). All other questionnaire variables were considered independent variables. Variables were not present in the decision tree if a regression could not be generated by the algorithm; therefore, no homogenous groups were formed, and the variable was not represented in the decision tree (e.g. sleep duration). Therefore, when the algorithm can generate a new level of ramification from a node, it suggests that the new groups formed have significantly different levels of adherence to MD.

Sample characteristics
The participants in our study (n 325) were between 15 and 19 years old, with a mean age of 16·4 years (SD 0·89). Fifty-three per cent (n 172) reported as female and 47 % (n 153) as male.
Regarding BMI/age percentile categories, 79·4 % (n 258) of the participants had normal weight and 19·7 % (n 64) were overweight. Girls had higher mean values for hip circumference than boys (Table 1). Boys were taller, heavier, with higher waist circumference and systolic blood pressure than girls (Table 1). Table 2 shows data on dietary habits. Boys reported a higher frequency for breakfast and after dinner meals (P < 0·05); girls showed a higher frequency of mid-morning meals, higher consumption of salad, vegetables and fruit at lunch and dinner, and higher median number of intermediate meals (P < 0·05). Bread is the most consumed food item at breakfast, mid-morning and at mid-afternoon meals, in both genders ( Table 2). Table 3 presents data regarding physical activity. On the overall, when not at school, boys spend more time in sports activities than girls.

Mediterranean diet and health behaviours
The distribution of the KIDMED index categories shows a low adherence to MD group in 9·0 % (n 29), intermediate adherence in 45·5 % (n 148) and high adherence in 45·5 % (n 148) of the participants (Table 4). We did not find gender differences in adherence to MD (P = 0·306), but some diet and health behaviours were positively associated with MD. Participants that regularly have breakfast, eat vegetable soup at lunch or dinner, and practice sports and leisure physical activities outside school show a higher median score in the KIDMED index (P < 0·001) ( Table 5).
Further analyses using the KIDMED index score show a positive correlation between this variable and the number of daily meals (r spearman = 0·195, P < 0·001) and a negative correlation with sedentary hours per week (r spearman = -0·175, P = 0·002). No correlations between KIDMED index score and anthropometric characteristics or blood pressure were found (P > 0·05).
No statistically differences between gender and adherence to the MD were found when the information is considered altogether (Table 1). However, due to the thoroughly investigated metabolic differences between genders, data were split, and an ANOVA test was performed in each gender separately by level of MD Adherence (Fig. 1), which clearly reveals significant differences in the effect of MD adherence in the analysed parameters in each gender. Males with low adherence to MD had significantly higher waist and hip circumference ( Fig. 1(c) and (d)). Anthropometric variables and MD were not associated in girls (P > 0·05), but girls with high adherence to the MD showed significantly higher healthy behaviours, such as sports outside school and leisure activities ( Fig. 1(e) and (f)). In both genders, higher adherence to MD (intermediate and high) is associated with higher adherence to sports outside school and leisure activities ( Fig. 1(e) and (f)).
To further analyse the association between adherence to MD and health behaviours, all of the variables related to health behaviours (dietary habits, sleep and oral hygiene, physical activity, socio-demographic data, anthropometric measures and blood pressure) were used in a decision tree analysis through the CHAID method (Fig. 2). This model translated 72 % of correct ratings, which reveals predictive robustness. Considering the categories of the KIDMED index (dependent variable), dietary habits were the only factors that discriminate the categories of adherence to the MD (Fig. 2). The CHAID nodes showed that the daily intake of a second piece of fruit (node 2) is more prevalent in participants with a high adherence to MD (80·5 %, n 99). All terminal nodes (6, 12 and 13) derived from node 2 showed a higher prevalence of individuals with high adherence to the MD. The intake of nuts at least 2 to 3 times/week (node 6) and the intake of fresh or cooked vegetables more than once a day (node 13) showed a prevalence of 94·3 % (n 50) and 88·3 % (n 30), respectively, in participants in high adherence to MD. In the absence of the daily intake of a second piece of fruit,  only participants who simultaneously consume fresh fruit for breakfast and vegetable soup for lunch (64·5 %, n 20) show high adherence to MD (node 9). In the remaining terminal nodes, participants who do not consume a second piece of fruit daily, despite other intakes, show intermediate adherence to the MD (Fig. 2). The terminal nodes obtained by the decision tree analysis through CHAID are summarised in Table 7.

Discussion
Our data show a lower prevalence for overweight (19·7 %) and obesity (2·5 %) than the data reported in the Portuguese national inquiry on food and physical activity, IAN AF 2015-2016, which suggests a national prevalence in adolescents of 23·6 % for overweight and 8·7 % for obesity (32) . On the overall, our sample showed an intermediate and high adherence to the MD, a characteristic that is associated in the literature with better nutritional adequacy (11) , lower rates of overweight, obesity (29) and lower waist circumference in adolescents (30,31) . Our data and the cross-sectional nature of this study do not allow the analysis of cause and effect, but our results add to the body of evidence suggesting a positive statistical association between the MD and health. Some of the consequences of obesity are gendered, with some evidence that the risk of mortality is higher in adult  males who were obese during their adolescence (47) . Male adolescents with low adherence to MD had significantly higher waist and hip circumference (P < 0·001). As waist circumference provides a simple and practical anthropometric measure to assess central adiposity (48) , these data suggest that, in our sample, heavier males have a higher abdominal adiposity and may also suggest that low adherence to a healthy diet can lead to an early onset of overweight and increase the cardiometabolic risk in male adolescents (49) . Voltas et al. (50) have found no relationship between anthropometric data and MD adherence during adolescence and state that this association may be reported later in life due to the low adherence to MD that may lead to negative anthropometric effects in the long term, such as higher BMI (50) . On the other hand, the MediLIFE Index study suggests that a better adherence to a Mediterranean lifestyle was associated with lower likelihood of being overweight, obese or having abdominal obesity, particularly in those with the highest adherence (51) . Physical activity must also be considered in interpreting these data. In our study, participants with higher adherence to MD (intermediate and high) were more active and these  results are in accordance with other studies carried out in southern Italy and Spain, which found that greater adherence to MD was associated with a healthier lifestyle and a higher level of physical activity in adolescents (52,53) . Our results reinforce the role of adequate nutrition and regular physical activity for preventing obesity and shaping the health-related quality of life in adolescents, especially when most recently reviewed studies conducted in southern European countries report that approximately half the children and adolescents show a low adherence to MD (54) .
Our study showed low adherence to MD in 9·0 % of individuals, higher than the value found by Adelantado-Renau et al. in Spanish adolescents (5·2 %) and, lower than the prevalence found in a study conducted with Sevillian adolescents (16·8 %) (55) .
The use of the CHAID method allowed us to find that adherence to MD is more associated with dietary behaviours than all other health behaviours. Our results show that a daily intake of at least two pieces of fruit is the best predictor of 'high adherence' to the MD. When a daily second piece of fruit is not consumed, 'high adherence' is Regarding the consumption of fruit, 31·4 % of our sample do not consume fruit daily. This rate is higher than the one found in the HBSC 2018 study, which reports a prevalence of 11·5 % for rarely or never consuming fruit (56) , but lower than the prevalence of 78 % for low consumption of fruit and vegetables by Portuguese adolescents, reported in the IAN-AF (32) .
The daily consumption of a second piece of fruit, the variable with the highest differentiating power in our CHAID analysis, was reported by 37·8 % of our sample (n 123). This prevalence was lower than the one reported by Rito et al. (47·9 %) for the same variable, in another recent study also developed in Portugal (57) .
Our study shows that 54·3 % (n 153) of the individuals eat fruit at breakfast and 48·9 % (n 159) eat vegetable soup at lunch. Fruit consumption is one of three elements, together with consumption of cereals and dairy products, that constitute a healthy breakfast (58) which is associated with higher adherence to the MD (59) . Vegetables have a strong presence in MD and vegetable soup, a tradition in Portuguese food, has an important contribution to the daily intake of vegetables (60) , allowing to achieve the WHO Fig. 2 Decision tree obtained through CHAID method to predict which health behaviours contribute to better adherence to the MD in secondary school students. Statistical significance is represented in each tree node, when the tree ramification stops no significant differences are observed within the group. Each node is divided into a group with a significantly higher presence of the prementioned characteristic (e.g. consumption of a second piece of fruit every day) referred as 'Yes' or significantly lower presence of individuals with the same characteristic referred as 'No'. When the tree does not grow from a terminal or a characteristic is not mentioned, means that there are no statistical differences among the analysed categories of KIDMED. Node 5 <absent> represent the individuals that eat a second piece of fruit daily but do not take breakfast Aconsumption of the second piece of fruit every day; Bfruit consumption at breakfast; Coleaginous fruits consumption at least 2 to 3 times/week; Dsoup consumption at lunch; Epulses consumption more than once a week; Fconsumption of fresh or cooked vegetables more than once a day.
recommendation for daily consumption of 400 g of fruit and vegetables to help prevent chronic disease and nutritional deficiencies (12) . Our results are in accordance with the ones from other studies, which suggest that the promotion of healthy eating habits among adolescents should be a priority. In a study by Chacón-Cuberos et al. (52) , adolescents who followed healthy eating habits showed academic benefits, such as better organisational habits, critical thinking, effort and study habits. Boing et al. (61) highlight that school environment interventions may be effective for promoting healthy behaviours and reinforce the importance of the school context on adolescent health. School can facilitate the engagement in healthy behaviour by offering and stimulating opportunities to be healthy (e.g. offer healthy options in the cafeteria, etc) and by promoting barriers to unsafe/unhealthy behaviour (e.g. not selling energy-dense foods in the cafeteria) (61,62) . Aarestrup AK et al. (63) found that schools that promote barriers to unsafe/unhealthy behaviours, while having teachers and students who value that policy, show higher rates of programme implementation and acceptance for change. In Portugal, the results of the Eat Mediterranean programme showed a better adherence to the MD after an intervention of educational sessions promoting MD, which proved successful in changing dietary patterns among adolescents. This programme showed an increase in the number of individuals who eat a second piece of fruit every day, as well as an increase in intake of fresh or cooked vegetables and oleaginous fruits (57) .

Strengths and limitations of this study
Our aim was to identify behaviours associated with the MD through a decision tree model, in a representative sample of Algarve adolescents. We consider the strength of this work to be the use of the CHAID method and our results suggest that CHAID can be used in risk analysis and target segmentation for the pre-detection and management of low adherence to MD. According to Song and Lu (35) 'the decision tree method is a powerful statistical tool for classification, prediction, interpretation and data manipulation that has several potential applications in health research'. The use of decision tree models based on machine learning techniques is a powerful and robust statistically tool for big data analysis, and to support decision-making in the health and human nutrition field.
The main limitations of our study are related with the self-report nature of dietary behaviours and with the limited dietary data collected besides the KIDMED index. Our methods for assessing dietary data do not allow for a proper food frequency or quantitative analysis, and we did not collect information regarding other variables widely known to determine food choice in adolescents, such as parental and peer social support, or other home environment variables.
Unfortunately, we also did not achieve sample size representativity. Our calculations for sample size suggested a minimum of 454 participants and 545 students were invited to be a part of the study. We collected written authorisations from legal guardians of the potential participants in a number far exceeding the minimum sample size. Nevertheless, on the day that data collection was scheduled, a significant number of adolescents, despite having written authorisation from their guardian and having previously agreed to be a part of the study, declined the procedure needed to collect anthropometric data. An analysis (not included in this paper) on the socio-demographic characteristics of the students that declined anthropometric assessment, showed that there were no statistically significant differences between these students and those in the final sample. As per our study protocol, students that did not complete the anthropometric assessment were excluded from the final sample, which totalled 325 participants. This can limit the generalisation of our data, but the non-parametric statistical decision tree approach in this paper can safely be used with our final sample size and variables. Furthermore, the similarities in socio-demographic profile between our final sample and both the non-participants and the overall adolescent population in the Algarve (based on census data from the National Statistics Institute, not show, as it is not within the scope of this paper) suggest that, despite the limitation, our data can contribute to an overall assessment of this region's adolescents.

Conclusions
The consumption of nuts and fresh or cooked vegetables was associated with a high adherence to the MD. Through the decision tree methodology, the daily consumption of two portions of fruit and vegetables proved to be the best determinant for high adherence to the MD.
Although adherence to a MD can also be influenced by factors such as parental and peer social support, or by the overall healthfulness of home food environment, schools can provide the framework to a healthy, Mediterranean way of eating, if adequate interventions can be put in place.
The use of the CHAID multivariate tree analysis technique, when accompanied by other statistical analyses, is a promising tool in nutrition-related studies.
The Algarve is a region with traditionally Mediterranean eating habits, and our results allowed us to identify specific habits of MD adherence that should be encouraged in adolescents. Future interventions can be tailored considering the promotion of the daily consumption of two portions of fruit, and vegetables, in school's settings, to promote a higher adherence to MD and contribute to improving the health and quality of life of adolescents.