
Automatic food detection in egocentric images using artificial intelligence technology

Published online by Cambridge University Press: 26 March 2018

Wenyan Jia*
Affiliation:
Department of Neurosurgery, University of Pittsburgh, 3520 Forbes Avenue, Suite 202, Pittsburgh, PA 15213, USA
Yuecheng Li
Affiliation:
Department of Neurosurgery, University of Pittsburgh, 3520 Forbes Avenue, Suite 202, Pittsburgh, PA 15213, USA
Ruowei Qu
Affiliation:
Department of Neurosurgery, University of Pittsburgh, 3520 Forbes Avenue, Suite 202, Pittsburgh, PA 15213, USA; Department of Biomedical Engineering, Hebei University of Technology, Tianjin, People’s Republic of China
Thomas Baranowski
Affiliation:
Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
Lora E Burke
Affiliation:
School of Nursing, University of Pittsburgh, Pittsburgh, PA, USA
Hong Zhang
Affiliation:
Image Processing Center, Beihang University, Beijing, People’s Republic of China
Yicheng Bai
Affiliation:
Department of Neurosurgery, University of Pittsburgh, 3520 Forbes Avenue, Suite 202, Pittsburgh, PA 15213, USA
Juliet M Mancino
Affiliation:
School of Nursing, University of Pittsburgh, Pittsburgh, PA, USA
Guizhi Xu
Affiliation:
Department of Biomedical Engineering, Hebei University of Technology, Tianjin, People’s Republic of China
Zhi-Hong Mao
Affiliation:
Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Mingui Sun
Affiliation:
Department of Neurosurgery, University of Pittsburgh, 3520 Forbes Avenue, Suite 202, Pittsburgh, PA 15213, USA; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
*Corresponding author: Email jiawenyan@gmail.com

Abstract

Objective

To develop an artificial intelligence (AI)-based algorithm that can automatically detect food items in images acquired by an egocentric wearable camera for dietary assessment.

Design

To study human diet and lifestyle, large sets of egocentric images were acquired with a wearable device, called eButton, from free-living individuals. eButton data set 1 consisted of 3900 images of real-world activities, manually selected from the recordings of thirty subjects. eButton data set 2 contained 29 515 images acquired from one research participant during a week-long unrestricted recording. These images covered both food- and non-food-related real-life activities, such as dining at home and in restaurants, cooking, shopping, gardening, housekeeping chores, taking classes and exercising at the gym. All images in both data sets were classified as food or non-food images based on their tags, which were generated by a convolutional neural network.
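To make the classification scheme concrete, the decision rule can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the FOOD_TAGS vocabulary, the is_food_image() helper and the default threshold are hypothetical, whereas in the study the per-image tags come from a convolutional neural network (Clarifai; see Fig. 1) and the food-associated tag list is learned from labelled training images.

```python
# Minimal sketch of tag-based food/non-food detection. FOOD_TAGS and
# is_food_image() are illustrative stand-ins; the study derives the
# food-associated tag list from labelled training images and obtains
# per-image tags from a cloud CNN service (Clarifai).
FOOD_TAGS = {"food", "meal", "dinner", "lunch", "plate", "drink", "coffee"}

def is_food_image(tags, k=2):
    """Classify an image as a food image when at least k of its
    CNN-generated tags belong to the food-associated tag set."""
    matches = sum(1 for t in tags if t.lower() in FOOD_TAGS)
    return matches >= k

# Example: tags returned for one egocentric image
print(is_food_image(["table", "meal", "plate", "indoors"]))  # True (2 matches)
```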

Results

A cross data-set test was conducted on eButton data set 1, in which one half of the data set was used for training and the other half for testing, and then the two halves were swapped. The overall accuracy of food detection in the two cases was 91·5 % and 86·4 %, respectively. For eButton data set 2, 74·0 % sensitivity and 87·0 % specificity were obtained when both ‘food’ and ‘drink’ images were considered food images. Alternatively, when only ‘food’ items were considered, the sensitivity and specificity reached 85·0 % and 85·8 %, respectively.

Conclusions

The AI technology can automatically detect foods with reasonable accuracy in low-quality, real-world egocentric images acquired by a wearable camera, reducing both the data-processing burden and privacy concerns.

Information

Type
Research paper
Copyright
© The Authors 2018 

Fig. 1 (colour online) Examples of the tags generated by Clarifai for eButton-acquired images. Some of the tags are correct, but not all; the tags shown in red appear questionable


Fig. 2 (colour online) Word clouds of a set of tag histograms from fifty food images (a) and fifty non-food images (b)
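The tag histograms visualised in Fig. 2 can be built by counting tag frequencies over each class of training images. The sketch below works under that assumption; the nested tag lists are toy stand-ins for the Clarifai output on the fifty images per class.

```python
# Build per-class tag histograms (the data behind the Fig. 2 word
# clouds). The nested tag lists below are toy stand-ins for the CNN
# tags of the fifty food and fifty non-food training images.
from collections import Counter

food_image_tags = [["meal", "plate", "table"], ["dinner", "food", "plate"]]
nonfood_image_tags = [["street", "car"], ["desk", "laptop", "screen"]]

food_hist = Counter(t for tags in food_image_tags for t in tags)
nonfood_hist = Counter(t for tags in nonfood_image_tags for t in tags)
print(food_hist.most_common(2))  # the most frequent tags dominate the cloud
```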


Table 1 Definitions of true positive (TP), true negative (TN), false positive (FP) and false negative (FN)
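From the four counts defined in Table 1, the reported metrics follow directly. A short sketch; the counts passed in below are placeholders, not study results.

```python
# Evaluation metrics derived from the Table 1 counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # proportion of true food images detected
    specificity = tn / (tn + fp)  # proportion of non-food images rejected
    precision = tp / (tp + fp)    # proportion of detections that are food
    return accuracy, sensitivity, specificity, precision

print(metrics(tp=40, tn=45, fp=5, fn=10))  # placeholder counts
```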


Fig. 3 (colour online) Typical images in the Food-5K data set. Left four images are labelled as food images; the right four are non-food images


Fig. 4 (colour online) Misclassified images in the Food-5K data set. Left four were misclassified as food images; the right nine were misclassified as non-food images


Table 2 Classification results on the Food-5K data set


Fig. 5 (colour online) Examples in eButton data set 1. Images in the top four rows are labelled as food images, and those in the bottom four rows are non-food images. Compared with the images in Fig. 3, these egocentric images are clearly more difficult to classify


Table 3 Categories and number of images in the eButton food/non-food data set


Fig. 6 (colour online) The effect of threshold k (k = 1, 2, 3 and 4) on the sensitivity, specificity and precision in the cross data-set evaluation of eButton data set 1: (a) case 1 (training: session 2, testing: session 1); (b) case 2 (training: session 1, testing: session 2)
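The trade-off that Fig. 6 plots can be reproduced in miniature by sweeping k with the hypothetical decision rule sketched earlier; all names and data below are toy stand-ins, not the authors' code or results.

```python
# Illustrative sweep of threshold k (cf. Fig. 6): a larger k demands
# more matching food tags per image, typically raising specificity and
# precision at the cost of sensitivity.
FOOD_TAGS = {"food", "meal", "dinner", "plate", "drink"}

def is_food_image(tags, k):
    return sum(t in FOOD_TAGS for t in tags) >= k

# Toy (tags, true label) pairs; the study evaluates on eButton images.
samples = [
    (["meal", "plate", "table"], True),
    (["drink", "desk", "laptop"], False),
    (["food", "dinner", "plate", "drink"], True),
    (["street", "car", "tree"], False),
]
for k in (1, 2, 3, 4):
    correct = sum(is_food_image(tags, k) == label for tags, label in samples)
    print(f"k={k}: {correct}/{len(samples)} correct")
```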


Fig. 7 (colour online) Tag-based classification results in the cross data-set evaluation of eButton data set 1, with misclassified examples (top: images misclassified as non-food; bottom: images misclassified as food): (a) case 1 (training: session 2, testing: session 1); (b) case 2 (training: session 1, testing: session 2)


Table 4 Accuracy of food image detection in different categories of the eButton food/non-food data set


Table 5 Durations of recording and numbers of images in the seven-day study


Fig. 8 (colour online) (a) Burden index, (b) sensitivity and (c) specificity in the eButton one-week data set with changing k (one curve per day, days 1–7)


Fig. 9 (colour online) Several images in the eButton one-week data set annotated as ‘drink’ because a cup can be seen on the table or in hand


Fig. 10 (colour online) Examples of blurred images (top row) and images with a small portion of food (bottom row)


Fig. 11 (colour online) (a) Results of food detection for the eButton one-week data set: the red bars represent true ‘food’ images, the green bars true ‘drink’ images and the grey bars the detected food images. (b) A zoomed-in view of the last meal of each day


Fig. 12 (colour online) Classification results in the eButton one-week data set when only ‘food’ images were considered as food images (one curve per day, days 1–7)

Supplementary material

Jia et al. supplementary material 1 (PDF, 654.9 KB)