
Developing and validating machine learning algorithms to predict various indices of diet quality among a socio-economically disadvantaged group

Published online by Cambridge University Press:  04 March 2026

Mélina Côté
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
Marianne Rochette
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
Catherine Laramée
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
Annie Lapointe
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
Sharon I. Kirkpatrick
Affiliation:
School of Public Health Sciences, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Simone Lemieux
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
Sophie Desroches
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
Ariane Bélanger-Gravel
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; Département d’information et de communication, Faculté des lettres et des sciences humaines, Université Laval, Québec, QC G1V 0A6, Canada
Benoît Lamarche*
Affiliation:
Centre Nutrition, Santé et Société (NUTRISS), Institut sur la nutrition et les aliments fonctionnels (INAF), Université Laval, Québec, QC G1V 0A6, Canada; École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec, QC G1V 0A6, Canada
* Corresponding author: Benoît Lamarche; Email: benoit.lamarche@fsaa.ulaval.ca

Abstract

Public health research faces challenges in recruiting socio-economically disadvantaged groups. This study evaluated whether machine learning (ML) algorithms developed using data from a general population could predict indices of diet quality among a socio-economically disadvantaged group. Data from 5367 adults (77·5 % females) in the NutriQuébec project on 122 variables potentially associated with dietary intakes were used. Dietary intakes were measured using a web-based 24-h recall. Participants were categorised by fifths of a deprivation score based on income, education and material and social deprivation. Participants in the first four fifths formed the general NutriQuébec sample (n 4180) and those in the highest fifth formed the high deprivation sample (n 1187). Three indices of diet quality, each dichotomised as ‘high’ or ‘low’, were used: vegetable and fruit consumption (VFC, ≥ 5·0 reference amounts (RA)/d), ‘other foods’ consumption, that is, consumption of foods not recommended in Canada’s Food Guide 2019 (OFC, > 5·0 RA/d) and overall diet quality measured using the Healthy Eating Food Index-2019 (HEFI-2019, > 48·9 points). The algorithms developed and tested in the general NutriQuébec sample predicted high VFC, OFC and HEFI-2019 with accuracies of 0·60 (95 % CI 0·58, 0·62), 0·58 (95 % CI 0·56, 0·60) and 0·61 (95 % CI 0·59, 0·63), respectively. In the high deprivation sample, the algorithms predicted the diet quality indices with comparable accuracies (VFC, 0·69, 95 % CI 0·67, 0·71; OFC, 0·56, 95 % CI 0·54, 0·58; HEFI-2019, 0·66, 95 % CI 0·65, 0·67). ML algorithms trained to predict three diet quality indices in the general NutriQuébec sample were applicable to a high deprivation group.
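The sample definition and outcome coding described in the abstract can be illustrated with a minimal sketch. Only the three cut-offs come from the abstract; the `Participant` record, its field names and the coding of fifths as integers 1 to 5 (5 being the most deprived) are assumptions made for illustration and are not the authors' data structures.

```python
from dataclasses import dataclass

# Cut-offs quoted in the abstract; everything else in this sketch is hypothetical.
VFC_CUTOFF = 5.0    # reference amounts (RA) per day; "high" if >= cut-off
OFC_CUTOFF = 5.0    # RA per day; "high" if > cut-off
HEFI_CUTOFF = 48.9  # HEFI-2019 points; "high" if > cut-off


@dataclass
class Participant:
    vfc_ra_per_day: float
    ofc_ra_per_day: float
    hefi_2019: float
    deprivation_fifth: int  # assumed coding: 1 = least deprived, 5 = most deprived


def label_outcomes(p):
    """Return the three binary outcomes: True means 'high', False means 'low'."""
    return {
        "VFC": p.vfc_ra_per_day >= VFC_CUTOFF,
        "OFC": p.ofc_ra_per_day > OFC_CUTOFF,
        "HEFI-2019": p.hefi_2019 > HEFI_CUTOFF,
    }


def split_by_deprivation(participants):
    """Fifths 1-4 form the general sample; fifth 5 forms the high deprivation sample."""
    general = [p for p in participants if p.deprivation_fifth <= 4]
    high = [p for p in participants if p.deprivation_fifth == 5]
    return general, high


if __name__ == "__main__":
    example = Participant(vfc_ra_per_day=5.0, ofc_ra_per_day=5.0,
                          hefi_2019=49.0, deprivation_fifth=5)
    print(label_outcomes(example))
```

Note the asymmetry in the cut-offs: a participant at exactly 5·0 RA/d counts as high for VFC (≥) but as low for OFC (>).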

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The Nutrition Society

Figure 1. Study flow chart and schematic representation of the data modelling steps. In step 1, the random forest (RF) algorithms are trained to predict the three diet quality indices using data from 75 % of participants with low deprivation scores in the general NutriQuébec sample. In step 2, the performance of the RF algorithms is evaluated using data from the remaining 25 % of participants with low deprivation scores in the general NutriQuébec sample. In step 3, the RF algorithms are validated in the sample of participants with high deprivation scores (validation set).
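The three steps in Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the split is a simple seeded shuffle (the caption does not say whether the split was stratified), and `fit` and `score` are placeholders for whatever model-fitting and evaluation routines are used.

```python
import random


def train_test_split_75_25(records, seed=0):
    """Shuffle a copy of `records` and return (train, test) with a 75/25 split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.75 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def three_step_flow(general, high_deprivation, fit, score, seed=0):
    """Run the train / test / validate sequence with caller-supplied callables.

    fit(train) -> model and score(model, data) -> number are hypothetical hooks.
    """
    train, test = train_test_split_75_25(general, seed)
    model = fit(train)                                  # step 1: train on 75 %
    test_score = score(model, test)                     # step 2: test on 25 %
    validation_score = score(model, high_deprivation)   # step 3: validate
    return test_score, validation_score


if __name__ == "__main__":
    general_ids = list(range(4180))  # size of the general sample in the abstract
    train, test = train_test_split_75_25(general_ids)
    print(len(train), len(test))
```

The key design point is that the high deprivation sample is never seen during training or testing; it is used only once, in step 3.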


Table 1. Sociodemographic characteristics of the participants in the overall NutriQuébec sample, the general NutriQuébec sample (train-test sets) and the high deprivation sample (validation set)


Figure 2. Mean (95 % CI) points of the total deprivation score (/20) and subscores per fifth of the total deprivation score (see methods for details).


Table 2. Prediction performance of ML algorithms to predict high vegetable and fruit consumption, ‘other foods’ consumption and overall diet quality in the test and validation sets
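As a reminder of how an accuracy and its 95 % CI can be obtained for a binary classifier, here is a minimal sketch using the normal-approximation (Wald) interval. This excerpt does not state how the authors computed their intervals, so the method and the function name `accuracy_with_ci` are assumptions.

```python
import math


def accuracy_with_ci(y_true, y_pred, z=1.96):
    """Return (accuracy, lower, upper) using a Wald interval (an assumed method)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near the bounds.
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


if __name__ == "__main__":
    truth = [1] * 60 + [0] * 40
    predicted = [1] * 100
    print(accuracy_with_ci(truth, predicted))
```

Other interval methods (for example, bootstrap percentiles) generally give slightly different bounds from this approximation.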


Figure 3. Ten most discriminant predictor variables among all the variables retained by the random forest (RF) algorithms to predict the three diet quality indices and their corresponding relative contribution to the model. Relative contribution to the model corresponds to the relative contribution of each predictor to reducing model impurity across 100 bootstrap RF models. Higher percentages indicate a greater relative reduction in model impurity and thus a greater influence on the model predictions. All variables contributed independently in each algorithm. Each of the variables retained in the RF algorithms was derived from different questions in the different questionnaires. HEFI-2019, Healthy Eating Food Index-2019; RA, reference amounts, see methods for details.
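One way to turn per-model impurity-based importances into the relative contributions (%) shown in Figure 3 is sketched below. It assumes each bootstrap model yields a mapping from variable name to its impurity reduction, and that a variable absent from a model contributes zero; the function name and the example variable names are hypothetical, and the authors' exact aggregation may differ.

```python
def relative_contributions(importances_per_model, top_k=10):
    """Pool impurity reductions across models and express them as percentages.

    importances_per_model: list of dicts {variable_name: impurity_reduction},
    one dict per bootstrap model. Returns the top_k (name, percent) pairs,
    sorted from most to least influential.
    """
    totals = {}
    for importances in importances_per_model:
        for name, value in importances.items():
            totals[name] = totals.get(name, 0.0) + value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total impurity reduction must be positive")
    percents = {name: 100.0 * value / grand_total for name, value in totals.items()}
    ranked = sorted(percents.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Two toy 'bootstrap models' with made-up variable names.
    models = [
        {"age": 0.6, "cooking_skills": 0.4},
        {"age": 0.8, "cooking_skills": 0.2},
    ]
    for name, pct in relative_contributions(models):
        print(f"{name}: {pct:.1f} %")
```

Summing before normalising is equivalent to averaging across models, since every variable is divided by the same number of models.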

Supplementary material

Côté et al. supplementary material 1 (File, 1.2 MB)
Côté et al. supplementary material 2 (File, 172.9 KB)
Côté et al. supplementary material 3 (File, 24.8 KB)
Côté et al. supplementary material 4 (File, 30.6 KB)
Côté et al. supplementary material 5 (File, 24.4 KB)