Assessing food and nutrition literacy in children and adolescents: a systematic review of existing tools

Objective: Food literacy (FL) and nutrition literacy (NL) are concepts that can help individuals to navigate the current food environment. Building these skills and knowledge at a young age is important for skill retention, confidence in food practices and supporting lifelong healthy eating habits. The objectives of this systematic review were to: (i) identify existing tools that measure FL and NL among children and/or adolescents and (ii) describe the psychometric properties. Design: A 4-phase protocol was used to systematically retrieve articles. The search was performed in May 2021. Study characteristics and psychometric properties were extracted, and a narrative synthesis was used to summarise findings. Risk of bias was assessed using the COSMIN checklist. Setting: Six databases were searched to identify current tools. Participants: Children (2–12 years) and adolescents (13–18 years) participated in this study. Results: Twelve tools were identified. Three tools measured FL, 1 tool measured NL, 4 tools measured both FL and NL, and 4 tools measured subareas of NL—more specifically, critical NL, food label and menu board literacy. Most tools were self-reported, developed based on a theoretical framework and assessed some components of validity and/or reliability for a specific age and ethnic group. The majority of tools targeted older children and adolescents (9–18 years of age), and one tool targeted preschoolers (3–6 years of age). Conclusions: Most widely used definitions of FL and NL do not acknowledge life-stage specific criterion. Continued efforts are needed to develop a comprehensive definition and framework of FL and NL appropriate for children, which will help inform future assessment tools.

determine intake' (6) . In this sense, being more food and nutrition literate provides one with the necessary aptitude and abilities to help navigate our current food environment. Indeed, FL and NL have been identified as key components in the promotion and maintenance of healthy dietary practices (7)(8)(9) . Both FL and NL are included in this review to provide a comprehensive assessment of existing tools (8) .
Supporting children in developing FL and NL, starting as early as in preschool years, can shape their food choicessince building these aptitudes and capabilities at a young age may be important for skill retention (10) , confidence in food practices (11) and supporting healthy eating habits later in life (12,13) . A 10-year longitudinal study observed that those with higher cooking skills in emerging adulthood prepared meals with vegetables more frequently and consumed fast food less often as adults (12) . Research has also shown that, among a sample University students, the strongest predictor of food skills was meal preparation as a teenager (14) . While these studies underscore the importance of acquiring FL and NL early in life, there is currently no universally accepted tool to measure either FL or NL among children and adolescents.
NL and FL have been studied more extensively in the adult population, and a systematic review by Yuen, Thompson & Gardiner (15) identified 13 tools targeting adults. In addition, a scoping review by Amouzandeh and colleagues (16) summarised the psychometric properties of 12 instruments that were used to assess FL among adults. While these reviews identified existing measures of FL and NL in adults, to our knowledge, no systematic review has been conducted to identify tools for children and adolescents. To ensure they are appropriate for use, NL and FL assessment tools must undergo validity testing, that is, to assess the extent to which the measure accurately reflects the intended variable, in addition to reliability testing, which refers to the consistency of a measure. Identifying valid and reliable tools that assess FL and NL in childhood can help facilitate research that examines the long-term impact of building these skills, abilities and aptitudes early in life.
The objectives of this systematic review were to: (i) identify existing tools that measure FL and NL among children and/or adolescents and (ii) describe the validity and reliability of these tools. Findings from this review will help to identify tools for use in FL and NL research with children as well as identify potential gaps in existing FL and NL measurement.

Methods
This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (17) . This systemic review was registered with PROSPERO (CRD42021241819).

Search strategy
A systematic search of the literature was completed by 2 review authors (N.C. and M.P.). The search was last performed in May 2021. The following 6 databases were used: PubMed, Ovid MEDLINE, ScienceDirect, Web of Science, CINAHL plus and PsycInfo. Key words were identified through the expert guidance of a research librarian in addition to being based on previous literature reviews (15,16,18) . The search terms included: preschool* OR child* OR adolescen* OR teen* AND 'food literacy' OR 'food skills' OR 'nutrition literacy'. The complete search strategy for each database is provided on the Online Supplementary Table S1. Forward citation searching was also used on the articles included in the final review.

Eligibility criteria
Articles were included if they were primary research, described the development of a tool explicitly assessing FL or NL and assessed some form of validity (including content validity, face validity or structural/construct validity) or reliability (including internal consistency or test-retest reliability) testing. Only tools targeting children (2-12 years old) and/or adolescents (13-18 years old) were included. In the case that the ages extended beyond these ranges, if the mean age of the sample was ≥2 or ≤18 years, articles were included. Publications were excluded if an English language version was not available and if they were grey literature (e.g. theses, reports).

Study selection
A 4-phase screening process was applied to identify articles to be included in the review. After completing the search, articles were exported and saved into Mendeley (Version 1.19.4). Duplicates were removed. The titles and abstracts of the remaining articles were then screened by 2 reviewers (N.C. and M.P.), independently. After excluding articles based on the eligibility criteria in review of the titles and abstracts, the remaining studies were then further examined by a review of the full text. Any differences across reviewers (N.C. and M.P.) throughout the screening and selection process were discussed and resolved.

Data extraction and analysis
From each article that was included in the review, the following information was extracted into a table: (1) key characteristics of the identified tools (author, year of publication, country of origin, tool name, purpose, conceptual framework, number of items and constructs assessed, scoring details, method of development, method of administration, sample characteristics); and (2) psychometric properties (content validity, face validity, structural/construct validity, reliability). Missing or unclear information was also identified and reported. One review author (N.C.) extracted the data and another author (M.P.) reviewed the data. A narrative synthesis of the included studies was conducted to summarise and analyse findings.

Risk of bias assessment
Risk of bias was assessed using the COSMIN risk of bias checklist (19) . The assessment was completed independently by 2 reviewers (N.C. and M.P). Answers were compared and discussed to reach consensus. The COSMIN checklist consists of ten areas of consideration: outcome measure development, content validity, structural/construct validity, internal validity, cross-cultural validity, reliability, measurement error, criterion validity, hypothesis testing for construct validity and responsiveness (19) . Each area includes different items that are assessed using the following scoring: 'Very Good', 'Adequate', 'Doubtful', 'Inadequate' or 'Not Applicable' (19) . The score 'Not Reported' was indicated when the details could not be found in the article. For each area of consideration, the lowest score across the area items was used to evaluate the overall quality of the measurement property (19) . Because the COSMIN checklist was developed for evaluating tools designed to measure symptom or functional health status, the detail and rigour described for the outcome measure development and content validity areas of the checklist exceeded what is typically presented for assessment of health behaviours, such as FL and NL. Thus, these areas of the checklist were not assessed, and were instead narratively summarized.

Results
After the 4-phase protocol, a total of 12 tools were included in this systematic review (Fig. 1). In the initial search, 1203 articles were identified, of which 747 remained once duplicates were removed. Upon screening the titles and abstracts, 716 were excluded and the remaining 31 were assessed for eligibility by reviewing the full text. After the full-text review, 19 articles were excluded. The reasons for exclusion included: the article was outside the scope of the review (n 445), not a primary research article (n 128), did not include children or adolescents as the study population (n 102), did not provide information on the development or validation of the FL or NL measure (n 45), or not in English (n 15). One article was retrieved in the forward citation step (n 1). Some studies appeared to meet the inclusion criteria, but were excluded, such as the study by Gartaula et al. (20) . While FL was evaluated among students, their definition only considered knowledge relating to cultivation practices and the processing of finger millet (20) . In addition, multiple publications (21)(22)(23)(24) have been released using the Preschool-FLAT and the FNLIT; where, only the validation studies (25,26) were included in this review.

Summary of existing tools
The 12 tools that assessed FL and NL among children and adolescents are summarized in Table 1. Three of the 12 tools measured FL (26)(27)(28) , 4 tools measured both FL and NL (25,(29)(30)(31) , and the other 5 tools measured NL or subareas of NL (32)(33)(34)(35) . More specifically, 2 of the NL tools assessed critical NL (34,35) , 1 of the tools measured menu board literacy (33) and the other tool assessed food label literacy (36) . Critical NL focuses specifically on critically appraising and understanding nutrition information (34) , while menu board literacy targets understanding of menu board nutrition information (33) and food label literacy measures one's ability to interpret the information on food labels (36) . The studies describing the 12 tools were published between 2012 and 2021 and were developed in 7 geographical regions: 2 from Norway (34,35) , 3 from the USA (28,33) , 1 from Denmark (27) , 1 from Thailand (32) , 3 from Iran (25,30,31) , 1 from Italy (26) and 1 from China (29) . The purpose of most of the studies was to develop and test the validity and/or reliability of a tool that assesses either FL or NL. However, 2 of the studies (27,35) tested the validity and reliability of a previously developed tool, while one study (30) modified and updated a previously existing tool (25) . The tools were generally   tailored to a specific population. For instance, the TFLAC (28) was developed for low-to-middle income, ethnically and racially diverse school-aged children living in the USA (grades 4-5). One tool was intended for preschoolers (3-6 years of age) (26) , 5 tools were designed for school-aged children (7-12 years of age) (25,28,30,33,36) , and 5 tools were aimed at adolescents (12-18 years of age) (27,31,32,34,35) . One tool was developed for school-aged children and adolescents (7-17 years of age) but was only validated among adolescents between ages 13-15 years (29) . While 3 of the studies did not specify a conceptual framework or model on which the measure was based (28,33,36) , 7 of the tools assessing NL (25,(29)(30)(31)(32)34,35) were based on Nutbeam's health literacy model (37) and 2 were based on the models of FL (26,27) by Benn (38) and Vidgen and Gallegos (6) , respectively. Among the 5 tools that focused on NL, the number of constructs assessed in the tools ranged from 2 to 5 and the number of items ranged from 5 to 61. The Thai-NLAT (32) was the most comprehensive in assessing NL with 5 constructs: macronutrientsmicronutrients and health, nutrition and energy balance, decision-making on nutrition information, food processing and food safety. The number of constructs assessed in the FL tools and the 4 FL and NL tools ranged from 5 to 6 and included from 20 to 60 items. Most of the tools included similar constructs including nutrition-related knowledge as well as functional skills, like cooking skills. An exception was the Preschool-FLAT (26) , which assessed concepts that are not covered in other measures, such as assessing the relationship of body weight and food and children's ability to assemble traditional foods to create a meal.
Four tools (25,29,31,32) did not report the method of administration, 5 tools (26,28,30,33,36) used a paper and pen format, while 3 tools (27,34,35) were administered electronically. All tools were self-reported by the students except for the Preschool-FLAT (26) which was interview led by the child's educator.
Scoring differed across tools including a combination of either Likert-scaled or true or false, and correct or incorrect questions. Five out of 11 tools incorporated task-based items, e.g. creating a healthy meal using images (26,27,29,33,36) . Some of the tools did not indicate how the score was interpreted (25,31,33,35) but for the most part, higher scores were indicative of higher FL or NL.
Ratings for the risk of bias assessment are presented in Supplemental Table S2. Most of the included studies showed a 'Very Good' methodological quality for the psychometric properties which were tested, including all 12 studies sufficiently evaluating internal consistency and 8 studies assessing structural/construct validity (25)(26)(27)(29)(30)(31)34,35) . Moreover, 7 studies (25,27,28,30,31,33,36) tested reliability; however, 3 articles (27,28,33) had a narrow time window between test and retest (e.g. 1 week), where a 'Doubtful' score was subsequently given. Furthermore, Preschool-FLAT (26) received an 'Inadequate' score for responsiveness, since there was insufficient information about the intervention. No studies reported cross-cultural validity, measurement error or criterion validity. Overall, all 12 studies generally showed low risk of bias.

Discussion
This systematic review provides a comprehensive overview of published tools designed to measure FL and NL among children and adolescents. Twelve tools assessed some components of validity and/or reliability for specific target populations. The majority of tools targeted older children and adolescents (9-18 years of age), and one tool targeted preschoolers (3-6 years of age). FL and NL have Not reported Structural/Construct validity assessed using exploratory factor analysis to explore whether the statements in the questionnaire reflected the conceptual framework, in addition to confirmatory factor analysis The Kaiser-Meyer-Olkin test showed sampling adequacy (KMO = 0·738), and Bartlett's test confirmed that factor analysis was appropriate (P < 0·001) All 50 items: Cronbach α = 0·70 Internal consistency across 5 subscales ranged from α = 0·15-0·45 Not reported

FL Tool Stjernqvist (2021)
Content validity assessed and validated by a group of experts based on Benn's (2014) model of FL (38) Face validity assessed using 2 focus group interviews with 12 schoolchildren Structural/Construct validity assessed using confirmatory factory analysis in reducing items and comparing models Convergent validity assessed using a health literacy instrument for schoolaged children (2016) (40) ; significant positive association for the total FL scale (β = 9·82, P < 0·001) and its 5 competencies Convergent validity assessed using food intake as an outcome with a food frequency index (2014); significant association for the total FL scale (β = 2·32, P < 0·001) and its 5 competencies All 37 items: Cronbach α = 0·85 Internal consistency across 5 subscales ranged from α = 0·50-0·73 All 60 items: Cronbach α = 0·84 Internal consistency across all subscales ranged from α = 0·71-0·82, except for the subscales 'critical analysis of information' (α = 0·64) and 'food label reading skills' α = 0·56) Test re-test all 60 items: ICC = 0·93 Test re-test of 6 subscales: ICC = 0·59-0·90  (9) concepts for NL in addition to assessing itemcontent validity Face validity assessed using cognitive interviews among 10 Thai adolescents Structural/Construct validity assessed using Known Group Technique; the healthy group (based on energy distribution and sugar intake) had nutrition literacy scores significantly higher than the unhealthy group Convergent validity assessed using the coefficient correlation analysis between the THAI-NLAT and the Thai Healthy Eating Index; significant positive relation between energy distribution and THAI-NLAT score of energy balance (r = 0·131, P = 0·021) All 61 items: KR-20 = 0·83 Internal consistency assessed through item-subscale correlation, and ranged from r = 0·627-0·781(P < 0·01)  primarily been studied among adults (15,16) and research examining the use and applicability of these concepts among children and adolescents is still emerging. To our knowledge, this review is the first to systematically assess what FL and NL tools have been developed among children and adolescents. All the tools underwent some testing of validity. Five tools were validated for most types of validity, including content, face and construct (25,27,(30)(31)(32) , showing a rigorous process in designing and testing these tools. Most of the tools demonstrated acceptable to good internal consistency (44) . However, 6 tools (25,(27)(28)(29)31,32) had subscales with either poor or questionable internal consistency scores. This suggests that further adaptions may be needed to improve consistency among these subscales. Six out of 12 tools assessed test-rest reliability (25,27,28,30,31,36) and were able to show that these tools could sufficiently replicate a similar result over time. Only 3 tools assessed whether they could detect change after an intervention or a nutrition education program (26,33,36) and 2 of these tools focused specifically on sub-areas of NL. The Thai-NLAT (32) , the FNLAT (31) , the FNLIT (25) and its adaptions (30) have been shown to have the strongest psychometric properties. However, important factors need to be considered when using these tools, such as the context and culture within which these tools were developed and tested.
FL and NL are highly contextual (3) , influenced by the geography and its food system as well as the cultural and social context. The current tools have been developed for specific populations from countries around the world. For instance, the TFLAC (28) has one item that focuses on food groups specific to their national food guidelines, whereas the FNLIT (25) includes foods that are contextually and culturally appropriate such as Tafi, which is a type of chocolate. As suggested by Beaton et al. (45) , a cross-cultural adaptation and validation of these existing tools is needed to ensure tools are appropriate for specific populations who might differ in language, cultural background as well as public health messaging.
In addition, FL and NL need to be conceptualised in an age-dependent manner, where younger children are not expected to develop the same level of complex skills as older teens or adults given their stages of cognitive development and reduced level of independence to make nutrition decisions. As mentioned, most of the existing tools targeted older children (9-18 years old), while only one tool targeted the preschoolers (3-6 years old). Tools developed for younger children such as the Preschool-FLAT (26) are centered around general concepts like recognizing foods and building balanced plates. Using task-based approaches, specific questions that address these general concepts include: 'Paint the foods suitable for lunch/dinner', and 'Compose a typical Sicilian meal by choosing foods among those shown'. On the other hand, tools developed for older teenagers such as the FNLAT (31) focus on more complex nutrition knowledge and food choices that are more appropriate for this age group. Examples of questions include: 'Which of the following foods are good sources of Calcium?', and 'If I go to grocery stores independently, I can easily ask the seller for the information I need.' Given the small number of current tools available and the increase in FL and NL programs focusing on children and adolescents in recent years (46) , research should focus on developing and validating tools assessing FL and NL among children and/or adolescents, especially in younger children.
While FL and NL are related, there are important distinctions between their respective definitions. NL focuses largely on the knowledge and aptitude to obtain, interpret and use nutrition information whereas, FL is more comprehensive and encompasses a collection of inter-related knowledge, skills and behaviours, and extends to other determinants that may influence food decisions, like social and cultural factors (4) . This is reflected in the type of items and constructs included in the FL and NL tools. For example, 4 of the 12 tools included in this review measured both FL and NL (25,(29)(30)(31) , their constructs assessed 2 universal domains, including both knowledge and skills. In comparison, the tools that only measured NL (32)(33)(34)(35)(36) , focused on cognitive competenciesfor instance, Thai-NLAT (32) included knowledge-based questions evaluating one's understanding of nutrition information, food safety and energy balance.
Most definitions of FL and NL do not acknowledge lifestage specific criterion. The widely used definition of FL, conceptualised by Vidgen and Gallegos (6) , consists of 4 domains, some of which might not be applicable for children and adolescents. For instance, the plan and manage domain assesses one's ability to make feasible food decisions in balancing personal needs, like nutrition, taste and hunger, with available resources (6) . This domain may not be appropriate for young children who have minimal independence over their food decisions. The Preschool-FLAT was informed by FL definition described by Vidgen and Gallegos (6) and the authors stated they modified this definition to incorporate knowledge and skills suitable for preschoolers but provided limited detail on how those concepts were derived. While the emerging framework developed by Slater et al. (47) identified FL dimensions suitable for youth, none of the identified tools used this youthfocused framework. Continued efforts are needed to develop a comprehensive definition and framework of FL and NL appropriate for children. These efforts will help facilitate and inform developmentally appropriate assessment tools in the future.
An important limitation in the current tools assessing FL and NL among adults (15,16) is the lack of objective, i.e. taskbased, assessment of FL or NL. Five of the tools included in this current review used task-based assessment approach. The Preschool-FLAT (26) incorporated visuals to assess certain FL constructsfor instance, one item asks the respondents to select which picture of food is the healthier option or which portion size is either small, medium or big. Similarly, the FLLANK assesses one's ability to choose healthier food options given information from a nutrition facts label or an ingredients list (36) . The respondent is given 2 choices and is asked to determine which option is more nutritious. Incorporating task-based items can help reduce reporting and social desirability bias (48) and, thus, future FL and NL measures for children should include both subjective, i.e. self or proxy reports and objective, i.e. task-based activities.
Findings from this review should be interpreted with some limitations. First, only articles that explicitly developed FL or NL assessment tools were included. It is possible that relevant articles may have been missed if they did not directly mention assessing literacy. However, our search strategy did include the term 'food skills' to help further yield an extensive pool of studies. Due to the variability and specificity across target audiences and the various frameworks used to inform their development, comparability across tools was limited. Despite these limitations, this review provides a current snapshot of existing tools assessing FL and NL among children and adolescents.

Conclusion
Our systematic review identified 12 tools that measure FL or NL among children and adolescents in which 4 of these tools assess specific subareas of NL. The majority of tools targeted older children and adolescents (9-18 years of age) and only one tool targeted preschoolers (3-6 years of age). Most of the tools assessed validity or reliability within a specific target population. In addition, most FL and NL frameworks have been contextualised among adults. Considering cognitive development as well as constraints on children's food decisions due to their limited independence, a comprehensive and developmentally appropriate definition and framework of FL and NL is needed for children. These efforts will help inform FL and NL assessment tools in the future (9,(39)(40)(41) .