
The use of sentiment and emotion analysis and data science to assess the language of nutrition-, food- and cooking-related content on social media: a systematic scoping review

Published online by Cambridge University Press:  30 March 2023

Annika Molenaar
Affiliation:
Department of Nutrition, Dietetics and Food, Monash University, Level 1, 264 Ferntree Gully Road, Notting Hill, VIC 3168, Australia
Eva L Jenkins
Affiliation:
Department of Nutrition, Dietetics and Food, Monash University, Level 1, 264 Ferntree Gully Road, Notting Hill, VIC 3168, Australia
Linda Brennan
Affiliation:
School of Media and Communication, RMIT University, 124 La Trobe St, Melbourne VIC 3004, Australia
Dickson Lukose
Affiliation:
Monash Data Futures Institute, Monash University, Level 2, 13 Rainforest Walk, Monash University, Clayton VIC 3800, Australia
Tracy A McCaffrey*
Affiliation:
Department of Nutrition, Dietetics and Food, Monash University, Level 1, 264 Ferntree Gully Road, Notting Hill, VIC 3168, Australia
*Corresponding author: Tracy A McCaffrey, email: tracy.mccaffrey@monash.edu

Abstract

Social media data are rapidly evolving and accessible, which presents opportunities for research. Data science techniques, such as sentiment or emotion analysis which analyse textual emotion, provide an opportunity to gather insight from social media. This paper describes a systematic scoping review of interdisciplinary evidence to explore how sentiment or emotion analysis methods alongside other data science methods have been used to examine nutrition, food and cooking social media content. A PRISMA search strategy was used to search nine electronic databases in November 2020 and January 2022. Of 7325 studies identified, thirty-six studies were selected from seventeen countries, and content was analysed thematically and summarised in an evidence table. Studies were published between 2014 and 2022 and used data from seven different social media platforms (Twitter, YouTube, Instagram, Reddit, Pinterest, Sina Weibo and mixed platforms). Five themes of research were identified: dietary patterns, cooking and recipes, diet and health, public health and nutrition and food in general. Papers developed a sentiment or emotion analysis tool or used available open-source tools. Accuracy to predict sentiment ranged from 33·33% (open-source engine) to 98·53% (engine developed for the study). The average proportion of sentiment was 38·8% positive, 46·6% neutral and 28·0% negative. Additional data science techniques used included topic modelling and network analysis. Future research requires optimising data extraction processes from social media platforms, the use of interdisciplinary teams to develop suitable and accurate methods for the subject and the use of complementary methods to gather deeper insights into these complex data.

Type
Review Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of The Nutrition Society

Introduction

Poor nutritional status and the associated consequences such as the development of non-communicable diseases (NCDs) contribute to the overall global burden of disease(Reference Vos, Abajobir and Abate1). Beyond the potential physical consequences, consuming a nutritionally poor diet has links to poor mental wellbeing and mental health(Reference Firth, Gangwisch and Borisini2,Reference Owen and Corfe3) . To encourage the uptake of healthy eating behaviours, the environment in which people are influenced, including the physical built environment, social environment and the online environment, needs to make healthy eating the desirable and attainable option(Reference Schwartz, Just and Chriqui4). There has been increasing public dismissal of the credibility of nutrition information from experts(Reference Penders5,Reference Penders, Wolters and Feskens6) . People are alternatively using social media as a source of nutrition and health information(Reference Adamski, Truby and Klassen7,Reference Lynn, Rosati and Leoni Santos8) or motivation(Reference Vaterlaus, Patten and Roche9) and often trust this information more than expert sources(Reference Jenkins, Ilicic and Barklamb10). Social media can be defined as ‘web-based services that allow individuals, communities and organisation to collaborate, connect, interact and build community by enabling them to create, co-create, modify, share and engage with user-generated content that is easily accessible’(Reference Sloan and Quan-Haase11). Commonly, the people and accounts sharing nutrition information (often referred to as social media influencers) promote an idealised lifestyle and unrealistic body types and eating habits(Reference Carrotte, Prichard and Lim12), such as following a restricted diet (e.g. keto, paleo or clean eating)(Reference Amidor13). 
Much of this information is not evidence based and does not follow dietary guidelines(Reference Ramachandran, Kite and Vassallo14), consequently perpetuating misinformation and providing conflicting information about nutrition(Reference Carpenter, Geryk and Chen15). Additionally, this information is often being created by individuals without formal nutrition, dietetic or health qualifications(Reference Adamski, Truby and Klassen7) and is being spread through a range of distinct sub-communities on social media composed of people from a range of backgrounds(Reference Lynn, Rosati and Leoni Santos8). As dietary advice varies by demographic and medical conditions, there is a benefit of sharing information within the specific sub-communities; however, it is not possible to predict how they interpret or act on this advice on social media. With little or no regulation of the content on social media around nutrition and food(Reference Freeman, Kelly and Vandevijvere16), it is imperative that evidenced-based information be amplified to counter the spread of misinformation and encourage healthy eating behaviours(Reference Ramachandran, Kite and Vassallo14,Reference Lofft17,Reference Wang, McKee and Torbica18) . It is also important to understand the extent of the information that is being spread and the conversations that are being had on social media about nutrition and food to develop strategies to counter it and promote healthy eating.

Social media content (Table 1: Glossary of terms) from social media platforms such as Twitter, Instagram and Facebook is one form of non-traditional real-time data that is being used in addition to, or as a replacement for, traditional research data collection methods such as randomised controlled trials, particularly to gather patients’ perspectives(Reference McDonald, Malcolm and Ramagopalan34). Traditional research methods are costly, time consuming and burdensome on participants, whereas social media is habitually used by participants to express their opinions and its use in research can reduce the burden on both participant and researcher(Reference McDonald, Malcolm and Ramagopalan34). Social media usage is prolific, with approximately 70% of American adults(35), 56% of European adults(36) and 79% of Australian adults(37) using social media sites in 2018–2020. Behaviours, attitudes and perceptions of the public are readily available on social media and can be used to understand complex problems(Reference Sloan and Quan-Haase11). Social media has been previously used as a part of intervention studies which aimed at promoting and encouraging healthy eating(Reference Chau, Burgermaster and Mamykina38,Reference Klassen, Douglass and Brennan39). Pre-existing social media data have been collected and analysed to investigate dietary behaviours(Reference Stirling, Willcox and Ong40) and to determine the types of social media posts and users who post that receive the most engagement by social media users(Reference Klassen, Borleis and Brennan29,Reference Barklamb, Molenaar and Brennan41). Social media can also be used in real time for surveillance monitoring in areas such as disease outbreaks, medication safety, individual wellbeing and diet success(Reference Paul, Sarker and Brownstein42). However, previous research on social media in relation to nutrition has largely focused on output metrics of engagement online (e.g. likes, comments, shares) on a small scale (between nine social media profile pages and 736 social media posts) with use of manual analysis(Reference Lynn, Rosati and Leoni Santos8,Reference Ramachandran, Kite and Vassallo14,Reference Klassen, Borleis and Brennan29,Reference Barklamb, Molenaar and Brennan41). Nutrition research has less frequently explored large social media datasets and the breadth of the public’s opinions and emotions expressed in social media posts.

Table 1. Glossary of terms

Natural language processing (NLP) methods (Table 1) allow the analysis of large amounts of social media data to a deep level that goes beyond engagement and explores the opinions and ‘real life’ experiences of the social media users(Reference Farzindar and Inkpen43). Social media data are often text based and written by human users, therefore comprising their ‘natural language’. The number of social media posts about a certain topic and consequently the number of words in all those social media posts combined is vast. Thus, it is important to find a technique to analyse the data in a way that reduces time and human burden. Methods utilising NLP use computational techniques to learn, understand and produce human language content(Reference Hirschberg and Manning26). These NLP methods can use machine learning techniques (Table 1) to perform a range of textual analyses, such as tracking trending topics and identifying opinions and beliefs around different topics through topic modelling (Table 1) and identifying different social networks of people through social network analysis (Table 1)(Reference Hirschberg and Manning26). To gather social media information to be analysed through NLP techniques, the researchers need to mine or use web-scraping techniques to gather the data. This may be done through an application programming interface (API) (Table 1), which is a software intermediary that allows two applications to talk to each other in order to exchange information(19), and in this case gather social media data. These applications allow researchers to gather amounts of data that would be otherwise unavailable in large quantities or in an automated and efficient way(Reference Hirschberg and Manning26).

One NLP technique that has been used to analyse opinions and attitudes on social media is sentiment or emotion analysis (Table 1). Sentiment or emotion analysis, sometimes referred to as opinion mining, uses written natural language to analyse the opinions, sentiments, attitudes and emotions embodied within the text(Reference Liu28). Sentiment or emotion analysers can be based on machine learning or rule-based techniques. A machine learning approach typically uses either a subset of the sentiment or emotion coded text data, or a lexicon (Table 1) with words assigned to their corresponding sentiment or emotion, which is used to build and train a machine learning model to classify the sentiment of the text(Reference Liu28). Other sentiment or emotion analysers use rule-based techniques or pattern libraries where patterns of sentiment and words are matched. Words and symbols within the natural language text are assigned a polarity, often on a scale of positive/very positive to negative/very negative. Sentiment or emotion analysis can be performed with a range of NLP and machine learning tools, from lexicon and rule-based tools such as Valence Aware Dictionary and Sentiment Reasoner (VADER) Sentiment(Reference Hutto and Gilbert44), support vector machine algorithms(Reference Zainuddin and Selamat45) and Naïve Bayes algorithms(Reference Narayanan, Arora and Bhatia46) to models based on convolutional and deep neural networks(Reference Ain, Ali and Riaz47) (for glossary of terms, see Table 1). Once the system for collection of data and analysis is set up, it can be a relatively quick way of interpreting large amounts of natural language data, typically tens of thousands or millions of posts, which would traditionally be a very time-consuming and labour-intensive process.
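As a rough illustration of the lexicon and rule-based approach described above, the following Python sketch assigns each word a polarity weight, flips the weight after a negation word and maps the summed score to a positive, neutral or negative label. The lexicon entries, weights, negation list and threshold are hypothetical values chosen for illustration; this is not the implementation of VADER or of any tool used in the reviewed studies.

```python
import re

# Toy lexicon: hypothetical words and weights, for illustration only.
LEXICON = {
    "delicious": 2.0, "love": 2.0, "tasty": 1.5, "healthy": 1.0,
    "bland": -1.0, "unhealthy": -1.5, "hate": -2.0, "awful": -2.0,
}
# Hypothetical negation words that flip the polarity of the next word.
NEGATIONS = {"not", "no", "never"}


def polarity_score(text):
    """Sum the lexicon weights of the words in `text`, flipping after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    return score


def classify(text, threshold=0.5):
    """Map the summed score to a polarity label using an assumed threshold."""
    score = polarity_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["I love this delicious vegan curry",
                 "This recipe is not tasty",
                 "Cooking dinner tonight"]:
        print(post, "->", classify(post))
```

A machine learning analyser would instead learn such weights from manually coded examples, which is one reason why tools developed for a specific topic may classify that topic more accurately than a general-purpose lexicon.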

The use of sentiment or emotion analysis has increased with the popularity of social media, as social media data provide a never-before-seen amount of information about a range of different people’s and communities’ opinions, attitudes and experiences(Reference Liu28). Sentiment or emotion analysis techniques are constantly evolving and have the potential to use the vast amount of nutrition- and food-related information that is present on social media, with over 113 million posts on Instagram using the hashtag #healthyfood as of 28 February 2023. Sentiment or emotion analysis helps to understand the sentiment and emotion behind the social media conversations in a consistent systematised way and to a scale that manual text or language could not achieve. Sentiment or emotion analysis provides another perspective beyond social media analytics by considering what was said about the topic. Sentiment or emotion analysis has been applied in many areas, including product or service reviews(Reference Fang and Zhan48), politics and political events such as elections(Reference Ramteke, Shah and Godhia49), healthcare(Reference Gohil, Vuik and Darzi50) and health and wellbeing(Reference Zunic, Corcoran and Spasic51). However, it is currently unclear how well sentiment or emotion analysis techniques that have been used in other contexts apply to nutrition and food and how sentiment or emotion analysis has been used to analyse nutrition and food related social media data. Therefore, the aim of this scoping literature review is to explore the use of the NLP technique of sentiment or emotion analysis to analyse social media content related to nutrition, food and cooking. The key objectives of this scoping review were to:

  1. Classify the areas of nutrition, food and cooking that have been explored using sentiment or emotion analysis to assess healthy eating habits and dietary patterns.

  2. Classify the techniques used to undertake sentiment or emotion analysis.

  3. Determine the potential efficacy of using sentiment or emotion analysis on nutrition-, food- and cooking-related content.

  4. Identify other data science techniques used alongside sentiment or emotion analysis and future research directions for sentiment and emotion analysis in the area of nutrition, cooking and food.

Methods

This systematic scoping review was conducted according to the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement(Reference Page, McKenzie and Bossuyt52) and PRISMA extension for scoping reviews(Reference Tricco, Lillie and Zarin53). A systematic scoping review was chosen as a more appropriate method than a systematic review because the aims were to identify the areas in which the technique of sentiment analysis has been used and to identify the key characteristics of the papers, including the types of methods and outcomes(Reference Munn, Peters and Stern54). Initial searching identified no previous literature reviews or literature review protocols with the same purpose related to nutrition, food and cooking social media sentiment analysis. Following the PRISMA statement and PRISMA extension for scoping reviews, the scoping review was conducted using the following steps: (1) development of rationale and objectives; (2) determining eligibility criteria; (3) developing, testing and iterating a literature database search strategy; (4) screening papers for eligibility; (5) charting/extraction of the data; (6) synthesis of results. This review was registered with Open Science Framework (DOI: 10.17605/OSF.IO/2UW3E).

Inclusion and exclusion criteria

Types of studies

Quantitative and mixed-methods studies were considered for inclusion. Academic research in the form of journal articles and conference papers was considered eligible.

Types of intervention(s)/phenomena of interest

The PICOTS framework was used to determine the inclusion and exclusion criteria and subsequent search terms, as it is often used alongside PRISMA and in the area of nutrition. Details of the inclusion criteria are given in the PICOTS table (Table 2). Studies which used sentiment and/or emotion analysis to classify sentiment or emotions of social media data related to nutrition, food and cooking were considered eligible. Sentiment analysis methods had to involve computational classification of sentiment into different polarities (e.g. positive, negative and neutral) and not solely manual sentiment or emotion analysis. Data analysed in the studies must have been from a social networking (e.g. Facebook), media sharing (e.g. YouTube, Pinterest), social news (e.g. Reddit), blogs and forums (e.g. Wordpress) or microblogging (e.g. Twitter, Tumblr) social media platform as defined by Sloan et al.(Reference Sloan and Quan-Haase11). The social media data needed to be related to nutrition, food, healthy eating or cooking.

Table 2. PICOTS summary table

Studies which looked at social media data related to food product, food delivery, restaurant or brand reviews and marketing of food were not included as they did not specifically relate to healthy eating or eating habits. Studies around weight loss, obesity or health conditions were not included unless they focused on a related diet or nutrition aspect as well. Social media data that focused solely on dietary supplements were not included. Studies which looked at social media data related to foodborne illness and food safety (e.g. genetically modified food and safety) were considered out of scope for this review. Papers had to be published in English. No date limit was applied.

Types of outcomes

To be eligible, papers could report outcomes related to the number or percentage of social media posts that were classified as different sentiments or emotions. If studies focused on the development of a sentiment or emotion analysis engine or method, eligible outcomes included the accuracy of that developed method to classify the sentiment or emotion of the social media data. Studies which included outcomes which compared the accuracy of multiple sentiment or emotion analysis methods were also eligible for inclusion.

Literature search strategy

Nine databases from both health and computer science were searched using the same search terms for relevant papers (Ovid MEDLINE, PubMed, Scopus, Emerald, INSPEC, Compendex, ACM Digital Database, IEEE and Computer Science Database) on 5 November 2020 and an updated search on 18 January 2022. These databases were chosen due to their coverage and popularity for use in both the areas of nutrition and computer science and as they contained key papers identified as eligible for inclusion from initial test searches.

Search terms included terms for sentiment analysis (e.g. ‘sentiment analysis’, ‘sentiment classification’, ‘emotion analysis’, ‘opinion mining’ combined with OR) AND terms related to social media (e.g. ‘Social media’, ‘Social network*’, ‘Facebook’, ‘Instagram’ combined with OR) AND terms related to nutrition and food (e.g. ‘Nutr*’, ‘Healthy eating’, ‘Diet*’ combined with OR). These search terms were chosen after multiple iterations to cover the three key aspects necessary for a paper to be included: using social media data, being nutrition, food or cooking related and using sentiment or emotion analysis. Synonyms and related techniques for sentiment analysis were identified, and test searches were used to assess the scope and relevance of the papers retrieved using different terms. Searches were restricted to English language only. For the full search strategy, see Appendix 1.
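The Boolean structure of the search (terms within a concept combined with OR, and the three concepts combined with AND) can be sketched in Python as follows. Only the example terms quoted above are used, the helper names `or_group` and `build_query` are hypothetical, and the quoting convention is an assumption; each database has its own query syntax, and the full strategy is the one given in Appendix 1.

```python
def or_group(terms):
    """Join the terms of one concept with OR, quoting multi-word phrases."""
    formatted = ['"{}"'.format(t) if " " in t else t for t in terms]
    return "(" + " OR ".join(formatted) + ")"


def build_query(*concepts):
    """Combine the OR-groups of all concepts with AND."""
    return " AND ".join(or_group(terms) for terms in concepts)


# Example terms quoted in the text; not the full search strategy.
sentiment_terms = ["sentiment analysis", "sentiment classification",
                   "emotion analysis", "opinion mining"]
social_media_terms = ["Social media", "Social network*", "Facebook", "Instagram"]
nutrition_terms = ["Nutr*", "Healthy eating", "Diet*"]

query = build_query(sentiment_terms, social_media_terms, nutrition_terms)

if __name__ == "__main__":
    print(query)
```

Keeping each concept as a separate list makes it straightforward to add or remove synonyms during test searches without disturbing the overall AND structure.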

Data management

Results from each of the databases were imported into Endnote. The Endnote file was then imported into Covidence software for duplicate removal, title and abstract and full text screening (Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia).

Data screening/study selection

Two reviewers (A.M. and E.J.) independently screened the titles and abstracts of each article for potential eligibility. The full text of those that were considered potentially eligible in title and abstract screening were independently screened by the same two reviewers (A.M. and E.J.). Any disagreement between the reviewers was either discussed until a consensus was reached or was resolved by a third reviewer (T.A.M.).

Data extraction

Data from each eligible study were extracted, charted and stored in an Excel spreadsheet (Appendix 2). The Excel spreadsheet used for data charting was developed and iterated on the basis of feedback from authors and from information that was presented in the included studies. Data charting was undertaken independently by the lead author (A.M.). Data extracted included details about the types of articles, author disciplines, aims of the study, social media platform, social media data extraction methods, amount of social media data collected, sentiment analysis procedures, other analysis methods, results for sentiment analysis and other analyses and outcomes of significance to the research question of this review. Additionally, due to the research objective of identifying other data science techniques that can be used for social media data analysis in this area, data were extracted related to other analysis techniques used and the overall results of these analyses.

Data synthesis

A narrative synthesis was undertaken to summarise findings of the included studies. Quality appraisal was not conducted due to this being a systematic scoping review and of an exploratory nature, and therefore we were not evaluating the clinical effectiveness or assessing feasibility of an intervention(Reference Munn, Peters and Stern54).

Results

A total of 7325 papers were collected from the nine databases (Fig. 1). Of the 4303 papers included in title and abstract screening after duplicate removal, 4232 were considered irrelevant after first-pass screening. Excluded papers included those that used data from websites that were not social media platforms, papers focusing on health data that were not specifically nutrition, food or cooking related, papers which used other NLP methods but did not conduct sentiment analysis, and papers that focused on specific food products and the marketing of those products. The full texts of seventy-one papers were screened, of which thirty-seven papers met the inclusion criteria and were included in the review. Papers that met most of the inclusion criteria but were ultimately excluded included that of Pugsee et al.(Reference Pugsee and Niyomvanich56), which used data from a website that included comments on recipes but would not be classified as a social media platform (as defined in the Methods), and that of Mazzocut et al.(Reference Mazzocut, Truccolo and Antonini57), who conducted manual rather than computational analysis of sentiment.
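The screening counts reported above can be checked with simple arithmetic. The short Python sketch below derives the intermediate numbers that are not stated explicitly; it assumes that duplicates were the only records removed before title and abstract screening, and the variable names are illustrative.

```python
# Counts reported in the text.
identified = 7325                # records retrieved from the nine databases
screened_title_abstract = 4303   # records remaining after duplicate removal
excluded_first_pass = 4232       # judged irrelevant at title/abstract screening
included = 37                    # papers meeting the inclusion criteria

# Derived counts (assumes only duplicates were removed before screening).
duplicates_removed = identified - screened_title_abstract
full_text_screened = screened_title_abstract - excluded_first_pass
excluded_full_text = full_text_screened - included

if __name__ == "__main__":
    print("Duplicates removed:", duplicates_removed)
    print("Full texts screened:", full_text_screened)
    print("Excluded at full text:", excluded_full_text)
```

The derived figure of seventy-one full texts agrees with the number reported in the text.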

Fig. 1. PRISMA flow diagram of systematic scoping review on sentiment analysis and data science to assess the language of nutrition-, food- and cooking-related content on social media(Reference Moher, Liberati and Tetzlaff55).

Characteristics of papers

Of the thirty-seven papers included, twenty-four were journal articles, ten were conference proceedings, one was a book chapter, one was a preprint publication and one was a technical report (Table 3). The results from one study were reported in both a journal article(Reference Zhou and Zhang86) and conference proceedings(Reference Zhou and Zhang85). Characteristics of the papers can be found in Table 3. The authors of the papers were affiliated with a range of countries, with the most common including the United States (n = 13), followed by India (n = 3), Ireland (n = 3), Spain (n = 3), South Korea (n = 3), Algeria (n = 2), China (n = 2), Indonesia (n = 2), Japan (n = 2), Poland (n = 2), Czech Republic (n = 1), Iran (n = 1), Latvia (n = 1), New Zealand (n = 1), Portugal (n = 1), the Netherlands (n = 1) and the United Kingdom (n = 1).

Table 3. Characteristics of studies by social media platform

Fifteen papers (40·5%) had interdisciplinary authors from both health and computer science and technology fields, while sixteen (43·2%) had authors from only computer science/technology disciplines and four (10·8%) had interdisciplinary authors that did not include people with health backgrounds. The conference proceedings were primarily published by authors from computer science disciplines.

Characteristics of social media data

The majority of the papers (n = 25) used data from Twitter, followed by YouTube (n = 7) and blogs (n = 3). Less commonly used were Sina Weibo (n = 2), Facebook (n = 1), Instagram (n = 1), Reddit (n = 1), Pinterest (n = 1) and WhatsApp (n = 1). Papers primarily used one social media platform for all their data collection (n = 32), whereas five used a combination of platforms, often both social media and other websites (news sites, forums, PubMed). One study(Reference Kashyap and Nahapetian64) identified Twitter users of interest and collected data from only those users, three papers collected comments from only the top YouTube channels in the area such as cooking(Reference Donthula and Kaushik81,Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84) , while others collected data through filtering posts using keywords that were relevant to their study.
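Keyword filtering of the kind described above can be sketched as a case-insensitive substring match over collected post text. The posts and keywords below are invented for illustration and the function names are hypothetical; the reviewed studies typically applied their keywords through the platform APIs rather than to an in-memory list.

```python
def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_posts(posts, keywords):
    """Keep only the posts that mention at least one keyword."""
    return [post for post in posts if matches_keywords(post, keywords)]


# Invented example posts and keywords, for illustration only.
posts = [
    "Trying a new gluten-free bread recipe today",
    "Traffic was terrible this morning",
    "Is organic food really worth the price? #HealthyFood",
]
keywords = ["gluten-free", "organic", "healthy"]

relevant = filter_posts(posts, keywords)

if __name__ == "__main__":
    for post in relevant:
        print(post)
```

Substring matching also captures hashtags such as #HealthyFood, but it can admit irrelevant posts, which is why studies commonly cleaned and filtered the collected data for relevance before analysis.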

The areas of nutrition, food and cooking covered across papers varied widely across five main themes and ten sub-themes (Fig. 2). The first theme involved studies looking at dietary patterns including the four sub-themes of ‘general dietary patterns and choices’ including dietary preferences and attitudes (n = 5)(Reference Dondokova, Aich and Hee-Cheol61,Reference Vydiswaran, Romero and Zhao75,Reference Vydiswaran, Romero and Zhao76,Reference Zhou and Zhang85,Reference Zhou and Zhang86), ‘organic and sustainable food’ (n = 5)(Reference Brzustewicz and Singh59,Reference Rintyarna, Salamatu and Nazmona67,Reference Singh and Glińska-Neweś72,Reference Meza and Yamanaka83,Reference Pilař, Stanislavská and Rojík87), ‘veganism’ including both vegan diet and lifestyle (n = 1)(Reference Jennings, Danforth and Dodds62) and ‘gluten-free diet’ (n = 1)(Reference Rivera, Warren and Curran88). The second theme involved ‘cooking and recipes’ (n = 6)(Reference Benkhelifa, Laallam and Mostafa79–Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84,Reference Cheng, Lin and Wang89). The third theme involved ‘diet and health’ including the three sub-themes of diet and health conditions with the health conditions including general health status, diabetes and bowel disease (n = 4)(Reference Kashyap and Nahapetian64,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Shaw and Karami71,Reference Ramsingh and Bhuvaneswari93), diet and obesity (n = 2)(Reference Kim, Park and Song90,Reference Yeruva, Junaid and Lee94) and diet and weight loss (n = 2)(Reference Shadroo, Nejad and Bali70,Reference Kim and Oh91).
The fourth theme involved public health including two sub-themes: public health policy and programmes in the areas of school meals, food security and sugar consumption (n = 3)(Reference Bridge, Flint and Tench58,Reference Kang, Wang and Zhang63,Reference Scott, Jihwan and Chappelka69) and food prices (n = 1)(Reference Surjandari, Naffisah and Prawiradinata74). The fifth theme involved nutrition and food in general (n = 6)(Reference Pindado and Barrena66,Reference Saura, Reyes-Menendez and Thomas68,Reference Sprogis and Rikters73,Reference Widener and Li77,Reference Yeruva, Junaid and Lee78,Reference Masih92) including the sub-theme food and mood (n = 1)(Reference Dixon, Jakić and Lagerweij60). This theme also covered topics ranging from different health foods to diets, food trends and foods considered healthy and unhealthy. The papers in the cooking and recipes theme were all conducted using YouTube data and often used similar techniques or were by the same authors.

Fig. 2. Themes and sub-themes of topics across studies.

The aims and objectives of the studies varied widely and included: gathering social media users’ sentiment and opinions on their topic (n = 15)(Reference Jennings, Danforth and Dodds62,Reference Kang, Wang and Zhang63,Reference Saura, Reyes-Menendez and Thomas68–Reference Shadroo, Nejad and Bali70,Reference Singh and Glińska-Neweś72–Reference Vydiswaran, Romero and Zhao76,Reference Zhou and Zhang85–Reference Rivera, Warren and Curran88,Reference Masih92), building a sentiment classification system for their social media data (n = 8)(Reference Shaw and Karami71,Reference Yeruva, Junaid and Lee78,Reference Benkhelifa, Laallam and Mostafa79,Reference Donthula and Kaushik81,Reference Shah, Kaushik and Sharma84,Reference Kim, Park and Song90,Reference Ramsingh and Bhuvaneswari93,Reference Yeruva, Junaid and Lee94), exploring their topic area and who is discussing it (n = 6)(Reference Bridge, Flint and Tench58,Reference Brzustewicz and Singh59,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Pindado and Barrena66,Reference Widener and Li77,Reference Meza and Yamanaka83), understanding food consumption patterns and emotion (n = 4)(Reference Dixon, Jakić and Lagerweij60,Reference Dondokova, Aich and Hee-Cheol61,Reference Rintyarna, Salamatu and Nazmona67,Reference Cheng, Lin and Wang89) and building an online system or application to apply sentiment findings (n = 1)(Reference Kim and Oh91). Other studies focused on developing a system to recommend recipes based on sentiment (n = 1)(Reference Benkhelifa, Bouhyaoui and Laallam80), monitoring health status (n = 1)(Reference Kashyap and Nahapetian64) and exploring potential applications for machine learning in their topic area (n = 1)(Reference Kaur, Kaushik and Sharma82).
Those studies that focused on developing a methodology were more likely to build their own classification system than use an open-source tool, and their results were more likely to focus on testing of their model or framework rather than exploring what the data of their topic area were saying.

For those papers that used Twitter, the data were collected through either the Twitter API (n = 12, 48%) (Table 1), which uses archived Twitter data, or the Twitter Streaming API (n = 6), the live collection version of the API (Table 4). Other methods to extract Twitter data included the Decahose streaming API, which provides a random sample of 10% of all public Twitter messages. The YouTube API was used for all papers using YouTube. Social media data were collected primarily after 2010, with only three papers collecting data from before 2010. Data were collected within certain time periods; however, seven papers did not report the range of dates. Of those studies that reported dates, the timeframe in which social media data were collected ranged from a 5-day period to a 9-year and 4-month-long period. Data were collected from a specific location in seven papers, the United States(Reference Kang, Wang and Zhang63,Reference Scott, Jihwan and Chappelka69,Reference Widener and Li77,Reference Yeruva, Junaid and Lee78), China(Reference Zhou and Zhang86) and India(Reference Ramsingh and Bhuvaneswari93), with other papers not specifying a location or focusing on social media across the world. Some papers collected only data that were published in a specific language, including English(Reference Dixon, Jakić and Lagerweij60,Reference Jennings, Danforth and Dodds62,Reference Kang, Wang and Zhang63,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Saura, Reyes-Menendez and Thomas68,Reference Shadroo, Nejad and Bali70–Reference Singh and Glińska-Neweś72,Reference Meza and Yamanaka83,Reference Kim and Oh91), Hinglish (Hindi/English)(Reference Donthula and Kaushik81,Reference Kaur, Kaushik and Sharma82), Marglish (Marathi/English) or Devanagiri(Reference Shah, Kaushik and Sharma84) and Latvian(Reference Sprogis and Rikters73), while others did not specify.

Table 4. Social media data collection, sentiment analysis techniques and key findings by social media platform

API, application programming interface; ASA, Assemble Sentiment Analysis; LIWC, linguistic inquiry and word count; SD, standard deviation; VADER, Valence Aware Dictionary and sEntiment Reasoner.

Only seven papers reported how many unique social media users contributed to the body of social media data they collected, with these studies collecting data from either Twitter or Instagram(Reference Kang, Wang and Zhang63Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Shadroo, Nejad and Bali70,Reference Vydiswaran, Romero and Zhao75,Reference Vydiswaran, Romero and Zhao76,Reference Pilař, Stanislavská and Rojík87). The number of unique contributors ranged from 120 to 355 856 users; averaged across papers, 133 670 users contributed to the final samples of included data. Papers that used Twitter analysed between 700 and six million tweets that had been filtered for relevance and cleaned for data analysis (819 791 tweets on average across papers). Papers that researched YouTube commentary analysed between 1065 and 42 551 comments from videos (11 144 comments on average across papers).

Characteristics of sentiment analysis

The techniques used to classify the sentiment of the social media text data can be found in Table 4 (for glossary of terms, see Table 1). The sentiment analysis techniques used included various Naïve Bayes/Bayesian methods (n = 7), support vector machines (n = 5), VADER (n = 6), decision trees (n = 3), linguistic inquiry and word count (LIWC) (n = 3), the Syuzhet package (n = 3), neural networks (multi-layer perceptron, recurrent) (n = 2), random forest (n = 2) and logistic regression (n = 2). Some papers used open-source sentiment software packages, that is, VADER(Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Rintyarna, Salamatu and Nazmona67,Reference Scott, Jihwan and Chappelka69,Reference Yeruva, Junaid and Lee78,Reference Cheng, Lin and Wang89,Reference Yeruva, Junaid and Lee94), SentiStrength(Reference Meza and Yamanaka83), CoreNLP(Reference Yeruva, Junaid and Lee94), Sentiment140(Reference Dixon, Jakić and Lagerweij60), PHPInsight(Reference Shadroo, Nejad and Bali70), TextBlob(Reference Yeruva, Junaid and Lee94), MeaningCloud (an Excel plug-in)(Reference Bridge, Flint and Tench58) and an open-source model developed previously by Colnerič et al.(Reference Kim and Oh91,Reference Colnerič and Demšar105). Six papers(Reference Bridge, Flint and Tench58,Reference Kashyap and Nahapetian64,Reference Sprogis and Rikters73,Reference Benkhelifa, Laallam and Mostafa79,Reference Benkhelifa, Bouhyaoui and Laallam80,Reference Rivera, Warren and Curran88) employed manual sentiment classification to verify a subset of the classifications or to provide training data for the sentiment classification method. Classifications differed between computational and manual analysis, with Bridge et al.(Reference Bridge, Flint and Tench58) finding that 64% of their tweets were negative through MeaningCloud computational analysis compared with 52% through manual analysis.
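As a minimal sketch of how a supervised technique such as Naïve Bayes can be trained on manually classified text, the following Python example fits a multinomial Naïve Bayes model with add-one smoothing. The training posts, labels and function names are invented for illustration and are not taken from any of the included papers; a real study would use a much larger labelled dataset and more careful pre-processing.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_posts):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    class_counts = Counter()            # number of posts per sentiment label
    word_counts = defaultdict(Counter)  # word frequencies per sentiment label
    vocabulary = set()
    for text, label in labelled_posts:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    class_counts, word_counts, vocabulary = model
    total_posts = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_posts in class_counts.items():
        score = math.log(n_posts / total_posts)  # log prior of the label
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical manually labelled posts, invented for this sketch.
training_posts = [
    ("love this healthy salad recipe", "positive"),
    ("delicious dinner so happy", "positive"),
    ("hate the new sugar tax", "negative"),
    ("awful bland food", "negative"),
]
model = train_naive_bayes(training_posts)
print(classify("happy with this delicious recipe", model))
print(classify("awful tax", model))
```

Because the model learns only from the labelled examples it is given, the quality and topical relevance of the manually classified training data directly shape the predictions.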

Papers either used currently available sentiment analysis techniques as they are, used modified versions of currently available techniques or created new techniques or algorithms for use in their study (Table 4). Thirteen papers(Reference Kashyap and Nahapetian64,Reference Scott, Jihwan and Chappelka69,Reference Sprogis and Rikters73,Reference Surjandari, Naffisah and Prawiradinata74,Reference Benkhelifa, Laallam and Mostafa79Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84,Reference Rivera, Warren and Curran88,Reference Masih92Reference Yeruva, Junaid and Lee94) used a combination of methods, which they compared to ascertain their accuracy and the most appropriate method to use for their topic area and type of data (Table 4). Five studies(Reference Sprogis and Rikters73,Reference Surjandari, Naffisah and Prawiradinata74,Reference Donthula and Kaushik81,Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84) compared different word embedding and vectorisation techniques (for pre-processing of the data, see Table 1) alongside different sentiment classification techniques, while the others compared only different sentiment classification techniques. Four papers(Reference Donthula and Kaushik81,Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84,Reference Ramsingh and Bhuvaneswari93) used term frequency–inverse document frequency (TF-IDF) (Table 1) to vectorise words within the social media text, a pre-processing step that assigns each word a numerical weight on the basis of how often it occurs in a piece of text relative to how common it is across the dataset, so that the data can be analysed.
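The TF-IDF weighting described above can be illustrated with a short Python sketch. It uses one common textbook form (term frequency multiplied by the natural log of the inverse document frequency); the included papers may have used smoothed or otherwise modified variants, and the example posts and the function name `tf_idf` are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dictionary per document.

    tf  = count of the word in the document / number of words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))  # count each word once per document
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


# Hypothetical posts, invented for this sketch.
posts = ["vegan food recipe", "tasty vegan food", "fast food"]
vectors = tf_idf(posts)
for post, vector in zip(posts, vectors):
    print(post, {word: round(weight, 3) for word, weight in vector.items()})
```

In this toy dataset the word "food" appears in every post and so receives a weight of zero, whereas "recipe", which appears in only one post, receives the highest weight in the first post. This is why TF-IDF tends to emphasise words that distinguish one piece of text from the rest of the dataset.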

Four papers focused on the development of a sentiment classification technique and therefore reported only the efficacy of the different methods they developed, such as the recall, precision and F-measure for predicting the sentiment of the text, rather than reporting the actual proportion of text classified into the sentiment categories(Reference Benkhelifa, Laallam and Mostafa79,Reference Donthula and Kaushik81,Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84). Accuracy was reported in seven studies, which often involved purpose-built sentiment engines and compared multiple methods to ascertain the most accurate in their topic area(Reference Saura, Reyes-Menendez and Thomas68,Reference Benkhelifa, Laallam and Mostafa79Reference Kaur, Kaushik and Sharma82,Reference Shah, Kaushik and Sharma84,Reference Zhou and Zhang86). Only one paper which used an open-source tool reported accuracy(Reference Yeruva, Junaid and Lee94). Across studies, accuracy was on average 73·6% and ranged from 33·33% for predicting negative sentiment of obesity and healthy eating tweets using CoreNLP, an open-source tool(Reference Yeruva, Junaid and Lee94), to 98·53% for predicting overall sentiment of cooking YouTube videos using a multi-layer perceptron neural network(Reference Donthula and Kaushik81). Neural network sentiment engines and support vector machines generally performed better than Naïve Bayes and decision tree sentiment engines.
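For readers less familiar with these efficacy measures, the sketch below shows how accuracy, precision, recall and F-measure are conventionally calculated when predicted sentiment labels are compared with manually assigned labels. The label lists and the function name are hypothetical and do not reproduce data from any included paper.

```python
def classification_metrics(true_labels, predicted_labels, target):
    """Overall accuracy, plus precision, recall and F-measure for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == target and p == target)
    false_pos = sum(1 for t, p in pairs if t != target and p == target)
    false_neg = sum(1 for t, p in pairs if t == target and p != target)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical manual (reference) and predicted labels for six posts.
manual = ["neg", "neg", "neg", "pos", "neu", "pos"]
predicted = ["neg", "pos", "neu", "pos", "neu", "pos"]
metrics = classification_metrics(manual, predicted, target="neg")
print(metrics)
```

In this invented example, overall accuracy is moderate while recall for the negative class is low, which illustrates why a single headline accuracy figure can mask poor performance for one sentiment category.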

Of the fourteen papers(Reference Bridge, Flint and Tench58,Reference Kang, Wang and Zhang63,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Rintyarna, Salamatu and Nazmona67,Reference Shadroo, Nejad and Bali70,Reference Sprogis and Rikters73Reference Vydiswaran, Romero and Zhao75,Reference Yeruva, Junaid and Lee78,Reference Rivera, Warren and Curran88Reference Kim, Park and Song90,Reference Masih92,Reference Ramsingh and Bhuvaneswari93) that reported the percentage or amount of their overall data within each sentiment, the percentage of classifications ranged from 12·9% to 81% for positive (average across papers 38·8%), 8% to 82% for neutral (average across papers 46·6%) and 5% to 76·9% for negative (average across papers 28·0%); however, not all papers reported sentiment for all classifications (Fig. 3). Some papers had higher proportions of positive classifications either overall or by category, in topic areas such as food and mood(Reference Dixon, Jakić and Lagerweij60), dietary patterns and choices(Reference Dondokova, Aich and Hee-Cheol61,Reference Vydiswaran, Romero and Zhao75) , veganism(Reference Jennings, Danforth and Dodds62), organic food(Reference Rintyarna, Salamatu and Nazmona67,Reference Pilař, Stanislavská and Rojík87) , diet and health conditions(Reference Ramsingh and Bhuvaneswari93), cooking(Reference Benkhelifa, Bouhyaoui and Laallam80) and nutrition and food in general(Reference Widener and Li77,Reference Yeruva, Junaid and Lee78) , across Twitter(Reference Dixon, Jakić and Lagerweij60Reference Jennings, Danforth and Dodds62,Reference Rintyarna, Salamatu and Nazmona67,Reference Vydiswaran, Romero and Zhao75,Reference Widener and Li77,Reference Yeruva, Junaid and Lee78) , Instagram(Reference Pilař, Stanislavská and Rojík87), YouTube(Reference Benkhelifa, Bouhyaoui and Laallam80) and multiple platforms(Reference Ramsingh and Bhuvaneswari93). 
Other papers had higher proportions of negative classifications in topic areas such as bowel disease and diet(Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65), diet and lifestyle as risk factors for diabetes(Reference Shaw and Karami71), sugar tax(Reference Bridge, Flint and Tench58) and food prices(Reference Surjandari, Naffisah and Prawiradinata74), all of which used Twitter data. Some had primarily neutral classifications in the topic areas of public health programmes(Reference Kang, Wang and Zhang63), diet and obesity(Reference Kim, Park and Song90), gluten-free diet(Reference Rivera, Warren and Curran88), diet and weight loss(Reference Shadroo, Nejad and Bali70), health foods such as organic, non-GMO(Reference Masih92) and nutrition and food in general(Reference Sprogis and Rikters73), across Twitter(Reference Kang, Wang and Zhang63,Reference Shadroo, Nejad and Bali70,Reference Sprogis and Rikters73) , Reddit(Reference Rivera, Warren and Curran88) and multiple platforms(Reference Kim, Park and Song90,Reference Masih92) .
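The average proportions reported above sum to more than 100% (38·8% + 46·6% + 28·0% = 113·4%). One plausible explanation, assumed here rather than stated in the source, is that each average was calculated only across the papers that reported that sentiment class. The Python sketch below uses invented percentages to show how per-class averages over different subsets of papers need not sum to 100%.

```python
def mean_by_class(paper_results):
    """Average each sentiment class over only the papers that reported it."""
    totals, counts = {}, {}
    for result in paper_results:
        for label, percentage in result.items():
            totals[label] = totals.get(label, 0.0) + percentage
            counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical papers, invented for this sketch.
papers = [
    {"positive": 60.0, "neutral": 30.0, "negative": 10.0},  # all classes reported
    {"positive": 50.0, "negative": 50.0},                   # no neutral class reported
    {"neutral": 80.0},                                      # only neutral share reported
]
averages = mean_by_class(papers)
print(averages)
print(sum(averages.values()))
```

Here the class averages are 55% positive, 55% neutral and 30% negative, which total 140% even though no individual paper exceeds 100%.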

Fig. 3. Proportion of sentiment classifications (positive, negative, neutral) across studies by social media platform.

Other analyses performed

Many of the papers applied other NLP or machine learning methods alongside sentiment analysis, often to perform some form of analysis of subjective classifications of data. There were nine papers that classified the social media data by the healthiness or nutrition content of the food or subject of the text(Reference Kashyap and Nahapetian64,Reference Saura, Reyes-Menendez and Thomas68,Reference Vydiswaran, Romero and Zhao75Reference Yeruva, Junaid and Lee78,Reference Benkhelifa, Bouhyaoui and Laallam80,Reference Cheng, Lin and Wang89,Reference Yeruva, Junaid and Lee94). A health score was commonly based on a set of pre-defined ‘healthy’ and ‘unhealthy’ words or topics that the authors used to classify the health score of individual social media text entries(Reference Kashyap and Nahapetian64,Reference Vydiswaran, Romero and Zhao75Reference Yeruva, Junaid and Lee78,Reference Yeruva, Junaid and Lee94). Health scores were also assigned through topic modelling(Reference Saura, Reyes-Menendez and Thomas68) or the classification of different aspects of a YouTube video, with ‘healthy’ being one aspect(Reference Benkhelifa, Bouhyaoui and Laallam80). Health scores were sometimes used in combination with sentiment analyses, reporting sentiment classifications for ‘healthy’ and ‘unhealthy’ social media content (Table 4). Of those papers that reported a health score, neither the ‘healthy’ nor the ‘unhealthy’ text was consistently more likely to be positive, negative or neutral; however, more papers appeared to have a higher proportion of positive data overall for both ‘healthy’ and ‘unhealthy’ text. 
Other studies determined the nutritional content of the social media posts(Reference Saura, Reyes-Menendez and Thomas68,Reference Cheng, Lin and Wang89,Reference Ramsingh and Bhuvaneswari93) , for example analysing the nutritional content of recipes from Pinterest(Reference Cheng, Lin and Wang89) and glycaemic index of food mentioned across a range of platforms(Reference Ramsingh and Bhuvaneswari93).
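A word-list-based health score of the kind described above could be sketched as follows. The word lists, scoring rule and example posts are hypothetical and far simpler than those used in the included papers; the sentiment labels are assumed to come from a separate sentiment analysis step.

```python
import re
from collections import Counter

# Hypothetical word lists, invented for this sketch.
HEALTHY_WORDS = {"salad", "vegetables", "fruit", "wholegrain"}
UNHEALTHY_WORDS = {"soda", "fries", "candy", "donut"}


def health_label(text):
    """Label a post by counting 'healthy' and 'unhealthy' words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = (sum(token in HEALTHY_WORDS for token in tokens)
             - sum(token in UNHEALTHY_WORDS for token in tokens))
    if score > 0:
        return "healthy"
    if score < 0:
        return "unhealthy"
    return "unclassified"


# Hypothetical posts paired with sentiment labels from a separate step.
posts = [
    ("Fresh salad and fruit for lunch", "positive"),
    ("Fries and soda again, feeling sluggish", "negative"),
    ("Best donut in town!", "positive"),
    ("Meeting friends tonight", "neutral"),
]

# Cross-tabulate health label against sentiment.
cross_tab = Counter((health_label(text), sentiment) for text, sentiment in posts)
for (health, sentiment), n_posts in sorted(cross_tab.items()):
    print(health, sentiment, n_posts)
```

The cross-tabulation mirrors the way some papers reported sentiment separately for ‘healthy’ and ‘unhealthy’ content; in the invented data, positive sentiment appears for both categories.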

Topic modelling was another NLP method used across fifteen papers(Reference Brzustewicz and Singh59,Reference Dondokova, Aich and Hee-Cheol61,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Saura, Reyes-Menendez and Thomas68Reference Singh and Glińska-Neweś72,Reference Vydiswaran, Romero and Zhao76,Reference Zhou and Zhang85,Reference Zhou and Zhang86,Reference Rivera, Warren and Curran88,Reference Cheng, Lin and Wang89,Reference Masih92,Reference Yeruva, Junaid and Lee94) to statistically group the social media text data into different clusters with related words that commonly occur together to form topics. The most commonly used topic modelling technique was Latent Dirichlet Allocation (LDA) (Table 1), which was used across nine papers(Reference Brzustewicz and Singh59,Reference Saura, Reyes-Menendez and Thomas68,Reference Shadroo, Nejad and Bali70Reference Singh and Glińska-Neweś72,Reference Zhou and Zhang85,Reference Zhou and Zhang86,Reference Rivera, Warren and Curran88,Reference Yeruva, Junaid and Lee94). Emotion analysis, which looks beyond positive/negative sentiment at the more nuanced emotion (e.g. joy, sadness, surprise), was performed in only three studies(Reference Brzustewicz and Singh59,Reference Singh and Glińska-Neweś72,Reference Kim and Oh91), with two of these studies(Reference Brzustewicz and Singh59,Reference Singh and Glińska-Neweś72) using the NRC lexicon, which is an open-source emotion analysis tool. 
Social network analysis(Reference Bridge, Flint and Tench58,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Meza and Yamanaka83) and clustering techniques(Reference Brzustewicz and Singh59,Reference Dondokova, Aich and Hee-Cheol61,Reference Pindado and Barrena66,Reference Scott, Jihwan and Chappelka69,Reference Kaur, Kaushik and Sharma82,Reference Pilař, Stanislavská and Rojík87Reference Cheng, Lin and Wang89) were used to explore relationships between and categorise the social media users or topics within the data. Other analyses performed included changes in the sentiment or topic of the social media data over time(Reference Jennings, Danforth and Dodds62,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Shadroo, Nejad and Bali70,Reference Sprogis and Rikters73,Reference Rivera, Warren and Curran88,Reference Kim, Park and Song90,Reference Masih92) , with some studies considering world events at the time such as disease outbreaks, prominent discussions of the topic of interest in the media and food price increases(Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Sprogis and Rikters73,Reference Rivera, Warren and Curran88) , differences in sentiment or topic across different geo-locations(Reference Dixon, Jakić and Lagerweij60,Reference Kang, Wang and Zhang63,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Pindado and Barrena66,Reference Scott, Jihwan and Chappelka69,Reference Shadroo, Nejad and Bali70,Reference Widener and Li77,Reference Yeruva, Junaid and Lee78,Reference Meza and Yamanaka83,Reference Zhou and Zhang86,Reference Masih92) and gender differences(Reference Kang, Wang and Zhang63,Reference Zhou and Zhang86) . As data were collected before and during the coronavirus disease 2019 (COVID-19) pandemic, there were three studies(Reference Brzustewicz and Singh59,Reference Rintyarna, Salamatu and Nazmona67,Reference Sprogis and Rikters73) which had some focus on the pandemic. 
Two of these studies(Reference Brzustewicz and Singh59,Reference Rintyarna, Salamatu and Nazmona67) had mostly positive sentiment despite data being collected during the pandemic, and one study noted a peak in discussion of certain food groups during panic buying at the start of the pandemic(Reference Sprogis and Rikters73).

Societal and practical implications of papers

All but two papers(Reference Kashyap and Nahapetian64,Reference Donthula and Kaushik81) discussed some broader societal and practical implications of their findings or their sentiment analysis techniques for future use in data science. These varied in detail and breadth, with more detail generally provided in papers including interdisciplinary authors. These implications included the following: gathering large-scale data using a platform consumers already use, discovering and being able to monitor popular foods, eating habits and trends across time and across the world, and assessing public concerns and attitudes and the framing of the debate around issues such as public health policy. Other implications included being able to identify stakeholders and key influencers in different topic areas, detecting communities who discuss certain topics, understanding any common misconceptions around nutrition and understanding strategies to effectively communicate with the intended audience and encourage behaviour change that will be positively received.

Regarding issues such as ethics and privacy of social media data use, only three papers discussed ethics. One paper stated that YouTube data are publicly available so ethics approval to use the data was not required(Reference Meza and Yamanaka83), another stated that while Twitter’s data are publicly available they still sought ethical approval for their research(Reference Bridge, Flint and Tench58) and another discussed not using verbatim tweet examples in the research due to ethical concerns(Reference Vydiswaran, Romero and Zhao76). Only three studies discussed privacy, noting the lack of personal information about the social media users, such as gender and location, due to the privacy and data access policies of the social media platforms(Reference Kang, Wang and Zhang63,Reference Zhou and Zhang85,Reference Zhou and Zhang86). The potential for bias in the data or data analysis methods was discussed in ten papers. This included sampling bias of the people using social media to discuss this topic and how they did that (i.e. by using hashtags) versus non-users or people not discussing that specific topic(Reference Bridge, Flint and Tench58,Reference Dixon, Jakić and Lagerweij60,Reference Jennings, Danforth and Dodds62,Reference Vydiswaran, Romero and Zhao75Reference Widener and Li77), bias in the labelling of sentiment in the training data for the machine learning sentiment analyser(Reference Scott, Jihwan and Chappelka69,Reference Shah, Kaushik and Sharma84), researcher bias in manual annotation of sentiment or topics and using multiple researchers to compare annotations to reduce this bias(Reference Bridge, Flint and Tench58,Reference Kang, Wang and Zhang63,Reference Rivera, Warren and Curran88) and media bias (left versus right) and the corresponding sentiment(Reference Scott, Jihwan and Chappelka69).

Discussion

This systematic scoping review explored the academic literature related to the use of sentiment analysis of social media data in the area of nutrition, food and cooking. Of the thirty-seven papers that met the inclusion criteria, the range of nutrition-related topics varied widely, including areas such as dietary patterns and choices, cooking, diet and health conditions, and public health policy and programmes. Papers either focused on the development and methodology for creating a sentiment analysis tool for their respective topic of interest or used already available tools for sentiment analysis, sometimes modifying these to suit their needs. Only seven papers looked at the accuracy, precision or recall of the sentiment engine for their data to correctly identify the sentiment of the social media text. In general, using sentiment analysis on nutrition, food and cooking social media data helped with understanding of the data, but the efficacy of the techniques varied widely. The accuracy of the engine to predict sentiment across papers ranged from neural network engines having the highest accuracy of up to 98·53% to the open-source tool CoreNLP having the lowest accuracy of 33·33%. Alongside sentiment analysis, other analyses were conducted to gather further information on the social media text such as topic modelling, changes over time, network analysis and classification of the healthiness or nutrition content of the foods mentioned within the social media posts.

The included papers assessed a large range of nutrition-, food- and cooking-related topics, from attitudes of individuals in relation to their own eating to public health policy and programmes around nutrition. A previous review on sentiment analysis of health and wellbeing content found a similar variation in topics discussed, with their papers focusing on quality of life, cancer, mental health, chronic conditions, pain, eating disorders and addiction(Reference Zunic, Corcoran and Spasic51). Gohil et al. found twelve healthcare-related papers using sentiment analysis in their review focusing on public health, emergency medicine and disease(Reference Gohil, Vuik and Darzi50). None of the papers in these previous sentiment analysis reviews on health and wellbeing(Reference Gohil, Vuik and Darzi50,Reference Zunic, Corcoran and Spasic51) focused specifically on nutrition, cooking or food. The range of nutrition and health topic areas included in the current review and the reviews by Zunic et al. and Gohil et al. speaks to the breadth of the area. However, this breadth and the particular nuances in language related to the specific topics (e.g. public health versus cooking) make it difficult to draw conclusions about the efficacy of using sentiment analysis in specific topic areas. The breadth of topics in nutrition science and health that have been covered in this review and previous reviews is dissimilar to other applications of sentiment analysis focusing on reviews and products. The review- and product-related data are generally more homogeneous, with the social media posts analysed all giving their opinions on the same topic(Reference Shivaprasad and Shetty109). This makes for greater comparability between papers, unlike in the current review, where the topics and social media data were heterogeneous. 
Commonly, open-source sentiment analysis tools are trained using this homogeneous product review or unspecific social media data(Reference Gohil, Vuik and Darzi50) and therefore may not be suitable for the specific nuanced language of nutrition and health social media data.

Across the current review and previous reviews on sentiment analysis in the areas of healthcare(Reference Gohil, Vuik and Darzi50) and health and wellbeing(Reference Zunic, Corcoran and Spasic51), a range of different tools were used, from purpose-built models using methods such as support vector machines, Naïve Bayes learning and decision trees to open-source freely available tools and commercial software. While the accuracy of purpose-built sentiment engines was more likely to be tested, the accuracy of open-source tools to predict sentiment for health-related topics is largely either unexplored or low. In the review by Gohil et al. on healthcare sentiment analysis, no papers applying open-source tools tested accuracy(Reference Gohil, Vuik and Darzi50), and in the current review, accuracy was tested for only one open-source tool with a resulting accuracy of only 33·33%. Similar to the current review, accuracy was not routinely reported in the papers included in reviews of sentiment analysis in the areas of health and wellbeing(Reference Zunic, Corcoran and Spasic51) and healthcare(Reference Gohil, Vuik and Darzi50). The accuracy of sentiment engines using purpose-built or modified tools such as support vector machines, Naïve Bayes classifiers and decision trees to predict sentiment of health and wellbeing data in a previous review on sentiment analysis was on average 79·8%(Reference Zunic, Corcoran and Spasic51), which was slightly higher than the average of 73·6% from the current review. These purpose-built or modified tools are more likely to be trained with data relevant to the topic area, making them more accurate; however, they require specialist computer science knowledge to create and run, so are not accessible to all. In comparison, the open-source tools are more accessible without specialist knowledge, but the lexicons used do not appear to be appropriate for all topic areas. 
For example, the large lexical database WordNet, which is commonly used, does not include a general health, medical or nutrition domain(Reference Bentivogli, Forner and Magnini110). This limits the benefit of using such open-source tools that are not altered for specific contexts, as they may be unlikely to capture the nuances in language and classify the sentiment of nutrition or health data appropriately. It is important that accuracy is measured when using a pre-built or open-source sentiment analysis method in a new context to ensure it is a suitable method.
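The effect of applying a general-purpose lexicon to nutrition text can be illustrated with a toy lexicon-based classifier in Python. The lexicon entries, the domain override and the example post are all invented for this sketch and do not come from WordNet or from any tool named in this review; the point is only that a word such as "fat" may carry negative polarity in general usage but be neutral when it refers to a nutrient.

```python
import re

# Hypothetical general-purpose polarity lexicon (word -> polarity).
GENERAL_LEXICON = {
    "good": 1, "love": 1, "great": 1,
    "bad": -1, "hate": -1, "fat": -1, "sick": -1,
}

# Hypothetical domain adjustment: 'fat' as a nutrient is treated as neutral.
NUTRITION_OVERRIDES = {"fat": 0}


def lexicon_sentiment(text, lexicon):
    """Sum word polarities and map the total to a sentiment label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


adapted_lexicon = {**GENERAL_LEXICON, **NUTRITION_OVERRIDES}
post = "This yoghurt has 3 g of fat per serve"

print(lexicon_sentiment(post, GENERAL_LEXICON))  # classified with the general lexicon
print(lexicon_sentiment(post, adapted_lexicon))  # classified with the adapted lexicon
```

With the general lexicon the factual nutrition statement is labelled negative, whereas the domain-adjusted lexicon labels it neutral. Checking classifications against manually coded examples from the new context is one way to detect this kind of mismatch.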

Sentiment analysis is an interdisciplinary field as it is used by and is optimised with the input from experts from linguistics, NLP, machine learning, computer science, psychology and sociology(Reference Ligthart, Catal and Tekinerdogan111). Specialist knowledge in the area of computer science and technology is critical to develop a sentiment tool that can be trained using pre-coded data from the topic area of interest. However, it is also important to have context and subject matter experts to assist with the development of the sentiment analysis methods due to the particular context and language used when discussing nutrition and health. In the current review, 40·5% of the papers had interdisciplinary authors from both health and computer science and technology fields. A previous review on social media analytics’ use in nutrition found a third of papers had interdisciplinary authors; however, only two out of thirty-five papers involved authors from a nutrition background(Reference Stirling, Willcox and Ong40). Of those interdisciplinary papers in this current review, the collaboration between nutrition subject experts and computer science allowed for the development of new ontologies or dictionaries specific to diet and obesity(Reference Kim, Park and Song90), cooking(Reference Kaur, Kaushik and Sharma82) and medical terms(Reference Rivera, Warren and Curran88). Two other papers used their interdisciplinary team to expand the previously existing linguistic inquiry and word count (LIWC) dictionary to include food-specific sentiment words that were relevant to their context(Reference Vydiswaran, Romero and Zhao75,Reference Vydiswaran, Romero and Zhao76) . Of the papers which were not multi-disciplinary in the current review, previously developed lexicons and dictionaries were most commonly used. 
However, to ensure that the word polarity of previously developed sentiment analysis tools is relevant to the new domain or topic of interest, cross-domain sentiment alignment is necessary(Reference Hao, Mu and Hong112). To successfully apply sentiment analysis techniques developed by linguists and computer scientists in healthcare, it is imperative that health professionals are also involved due to differing lexicons and interpretations of the sentiment of words from their different contexts.

Previous research using social media data by health professionals has often applied manual coding of content of the text and/or images of up to 5000 social media posts(Reference Klassen, Borleis and Brennan29,Reference Stirling, Willcox and Ong40,Reference Barklamb, Molenaar and Brennan41). In the current review, only six papers(Reference Bridge, Flint and Tench58,Reference Kashyap and Nahapetian64,Reference Sprogis and Rikters73,Reference Benkhelifa, Laallam and Mostafa79,Reference Benkhelifa, Bouhyaoui and Laallam80,Reference Rivera, Warren and Curran88) used a form of manual coding to verify a subset of the classifications or to provide training data for the sentiment classification method. Gohil et al.(Reference Gohil, Vuik and Darzi50) previously found that six of twelve papers analysing sentiment of healthcare Tweets used a manually annotated sample and four of these used this sample to train their sentiment classifier. Manual classification has also been used as the sole sentiment analysis method in a study on nutrition as a complementary medicine by Mazzocut et al.(Reference Mazzocut, Truccolo and Antonini57), but was used on a smaller scale of only 423 data points. Sentiment analysis provides an opportunity to move beyond manual coding and to analyse previously unfeasible amounts of social media data in a systematic way and in much less time(Reference Liu28). Humans have bias which is useful for applying context to the data with individual epistemologies and viewpoints, but this can make inter-rater reliability low when manually coding(Reference McDonald, Schoenebeck and Forte113). While not able to take into account some complexities of context and specificities in language, a sentiment analyser can be trained with relevant data to predict sentiment with reasonable accuracy(Reference Liu28). 
However, due to sentiment analysis models being trained with ‘real-world’ human social media data, they have biases towards what people typically say on social media(Reference Caliskan, Bryson and Narayanan114). For example, if a topic is generally discussed in a negative way, there would be more training data linking a negative sentiment to that topic and therefore the predicted sentiment for new data related to the topic would have a more negative sentiment score(Reference Caliskan, Bryson and Narayanan114).
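Inter-rater reliability in manual coding, as mentioned above, is often quantified with a chance-corrected agreement statistic. The sketch below calculates Cohen's kappa for two coders; this statistic is chosen here as a common example and is not reported as the measure used in the included papers, and the coder labels are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n_items = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n_items
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters labelled independently at their own rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n_items ** 2)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment codes from two manual coders for eight posts.
coder_1 = ["pos", "pos", "neg", "neg", "neu", "neu", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "neu", "pos", "pos", "neg"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this invented example the coders agree on six of eight posts (75%), but after correcting for the agreement expected by chance the kappa value is noticeably lower, at about 0·61.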

Sentiment analysis is rapidly evolving with constant improvements in the techniques and algorithms with developments in machine learning which will enhance accuracy of sentiment classification and the ability to apply these methods to different topics outside product reviews(Reference Mäntylä, Graziotin and Kuutila115). This rapid evolution makes for difficulties in generalising the applicability of sentiment analysis methods for future use, as a method that was useful 5 years ago may no longer be useful due to the constant changes and improvements, as well as the changes in social media. Additionally, being able to filter out inherent (direct or indirect) biases in the training datasets has been and will continue to be one of the biggest challenges in utilising machine learning techniques. Only ten out of the thirty-seven papers in this review discussed potential biases in the social media data themselves or the sentiment analysers and potential methods to overcome the bias. Today, there is much higher awareness of these potential biases, and new guidelines are being outlined to mitigate bias(Reference Manyika, Silberg and Presten116).

While it is useful to understand the sentiment of text to understand people’s emotions behind differing topics, to gain a deeper understanding of the text data, other NLP and subjectivity analysis techniques (which include sentiment and emotion analysis) can be used in conjunction as well as quantitative social media analytics (Fig. 4). In this review, most papers did not just conduct sentiment analysis but also looked at other subjectivity analysis and NLP techniques such as topic modelling, topic evolution, sentiment evolution, emotion analysis and network analysis. Sentiment analysis is limited in its classifications, which usually fall on a three- or five-point scale from positive (or very positive) to negative (or very negative)(Reference Liu28) and may not be indicative of more complex emotions. Emotion analysis, which was conducted in only three studies in the current review, goes further than positive and negative to classifying text into, for example, eight emotions: anger, disgust, fear, joy, sadness, trust, anticipation and surprise, on the basis of emotions defined by Plutchik(Reference Plutchik106). Emotion analysis is considered more complicated than sentiment analysis but has been successfully performed using similar techniques to sentiment analysis, such as using a neural network(Reference Colnerič and Demšar105) or other machine learning techniques. Topic modelling, which creates topics using probabilistic algorithms such as LDA, was the most commonly performed additional analysis. Topic modelling can be useful to group similar text-based data into themes to understand and summarise large text-based datasets in a more nuanced way and to explore the relationships between themes and changes over time through topic evolution(Reference Blei117). 
Network analysis was also used in three papers(Reference Bridge, Flint and Tench58,Reference Pérez-Pérez, Perez-Rodriguez and Fdez-Riverola65,Reference Meza and Yamanaka83) , which is a useful way to explore relationships between social media users and how the content or people on social media discussing the topic are interconnected(Reference Prell30). Those papers which focused on developing a sentiment analysis technique and comparing different algorithms for the most accurate prediction did not often conduct other analyses. Looking beyond sentiment resulted in a more in-depth view of the data, what topics were being discussed, changes in sentiment/topic over time and with different events in time and the community and the influencers that were discussing their topic.
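As a simple illustration of the kind of relationship that network analysis can reveal, the sketch below calculates in-degree centrality from a list of user mentions, a basic indicator of which accounts attract the most attention in a conversation. The user names and mention pairs are invented, and the included papers may have used different network measures and dedicated software.

```python
from collections import Counter


def in_degree_centrality(mentions):
    """Share of the other users in the network who mention each user."""
    users = {user for pair in mentions for user in pair}
    # Keep one edge per (author, mentioned) pair and ignore self-mentions.
    unique_edges = {(author, mentioned) for author, mentioned in mentions
                    if author != mentioned}
    in_degree = Counter(mentioned for _, mentioned in unique_edges)
    n_others = len(users) - 1
    if n_others <= 0:
        return {user: 0.0 for user in users}
    return {user: in_degree[user] / n_others for user in users}


# Hypothetical (author, mentioned user) pairs extracted from posts.
mentions = [
    ("ann", "dietitian_d"),
    ("bob", "dietitian_d"),
    ("cat", "dietitian_d"),
    ("ann", "bob"),
    ("dietitian_d", "cat"),
]
centrality = in_degree_centrality(mentions)
most_mentioned = max(centrality, key=centrality.get)
print(most_mentioned, centrality[most_mentioned])
```

In this toy network the account mentioned by every other user has a centrality of 1·0, which is the sort of signal that could be used to identify potential key influencers for a topic.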

Fig. 4. An overview of social media data analysis techniques which were used across studies in combination with sentiment or emotion analysis to provide more nuanced insights into social media data.

Social media as a data source provides a unique view into unfiltered, real-time conversations that are constantly evolving(Reference Morstatter and Liu118). Because of the breadth of topics it covers, it is useful for exploratory and discovery research(Reference Zhang, Cao and Wang119). However, there are limitations to using social media as a data source for research. Social media platforms are not necessarily representative of the general population(Reference Ribeiro, Benevenuto and Zagheni120), and the data sample collected may not be representative of what is actually being said on the platform overall(Reference Morstatter and Liu118). In the current review, Twitter was the most commonly used platform. Twitter and Facebook users have been found to be generally younger and more highly educated than non-users, and more likely to be interested in politics, particularly with more left-leaning political beliefs(Reference Mellon and Prosser121). Access to certain data is also limited: social media APIs restrict the amount of data that can be collected, and some are accessible only to organisations that have partnerships with the platform or that pay a fee. Twitter has an accessible and free API, which may be one reason it was the most commonly used platform in this review and in previous research(Reference Zunic, Corcoran and Spasic51). However, the Twitter API also introduces bias in what data can be retrieved, as the Twitter Streaming API provides only a sample of the data(Reference Morstatter and Liu118). The amount of data that can be collected, particularly through a free API, is sometimes limited, and geo-location and other user demographics such as age and gender are not always available; for example, users can switch precise location on or off, with the default being off(Reference Cesare, Grant and Hawkins122).
Finally, it is important to note the potential ethical implications of using social media data; only six studies in the review discussed the ethics or privacy of social media data. Although only publicly available social media posts are used and users agree, through platform user agreements, to their data being used for research purposes, they may not know exactly what their data are being used for(Reference Golder, Ahmed and Norman123). There are potential risks to privacy and confidentiality; it is therefore imperative that careful consideration be given to the ethical concerns of using these data(Reference Benton, Coppersmith and Dredze124) and that the potential benefits of the research outweigh any potential harm(Reference Hammack125).

Limitations

Limitations of this review include the absence of a quality assessment of the included papers, as this was a systematic scoping review(Reference Munn, Peters and Stern54). The review included conference proceedings, which are widely used in the computer science field, as well as journal articles; these publication types differ in their reporting requirements and quality. Some papers may have been missed owing to the specific databases and search terms used. Only papers published in English were included, and therefore the results may be affected by information bias. We collected papers on nutrition, food and cooking broadly and, owing to the heterogeneity of topics published in this area, only limited conclusions can be drawn about the accuracy of sentiment analysis in specific areas of nutrition. Both sentiment analysis and social media are rapidly evolving fields, and therefore this scoping review captures the area at only one point in time.

Recommendations for future research

On the basis of our experiences during the data extraction and synthesis of results, we suggest that future research using sentiment analysis, or subjectivity analysis more generally, could benefit from the following:

  1. Interdisciplinary teams that include computer scientists and subject matter experts, with the subject matter experts involved in refining the sentiment lexicons and interpreting the findings;

  2. Development of topic-specific sentiment or emotion lexicons, as the sentiment of a word may differ from one topic to another (e.g. ‘heart’ is neutral in a medical context but positive in a general context), and analysis of how accurately sentiment analysis techniques using the updated lexicons predict sentiment in that topic area;

  3. Combined use of subjectivity analysers and other techniques, such as topic modelling and network analysis, to gain a deeper understanding of the data and their potential future implications;

  4. Clearer reporting of methodology including social media search terms used to retrieve data, date range of searches, procedures to mitigate bias in training datasets and discussion of ethical practice, particularly in relation to privacy; and

  5. Consideration of the influence of world or local events on the social media conversation across specific date ranges and the change of conversations across time.
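
The second recommendation can be sketched in a few lines of Python: a general-purpose lexicon is adjusted with domain-specific overrides, and the accuracy of each version is checked against hand-labelled examples. This is a minimal sketch under stated assumptions; the lexicon entries, the override for ‘heart’, the labelled examples and the function names are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical general-purpose polarity lexicon (assumption for illustration).
GENERAL_LEXICON = {"heart": 1, "sweet": 1, "fat": -1}
# Hypothetical override a subject matter expert might supply for a
# nutrition/health context, where 'heart' is treated as neutral.
DOMAIN_OVERRIDES = {"heart": 0}


def classify(text, lexicon):
    """Sum word polarities and map the total onto a three-point scale."""
    score = sum(lexicon.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(samples, lexicon):
    """Share of hand-labelled samples whose predicted label matches."""
    correct = sum(classify(text, lexicon) == label for text, label in samples)
    return correct / len(samples)


# Hypothetical hand-labelled posts standing in for an annotated test set.
LABELLED = [
    ("heart disease risk and diet", "neutral"),
    ("sweet dessert recipe", "positive"),
    ("too much fat", "negative"),
    ("heart health guidelines", "neutral"),
]

# Domain lexicon = general lexicon with the expert overrides applied on top.
domain_lexicon = {**GENERAL_LEXICON, **DOMAIN_OVERRIDES}

print(accuracy(LABELLED, GENERAL_LEXICON))  # 0.5
print(accuracy(LABELLED, domain_lexicon))   # 1.0
```

Reporting accuracy for both lexicons on the same labelled set, as above, is one simple way to show whether a domain adjustment actually improves sentiment prediction in the topic area.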

Conclusions

Social media data are useful for obtaining a more nuanced understanding of what social media users are saying and sharing. However, research needs to go beyond traditional quantitative social media metrics, such as likes and comments, and incorporate a range of subjectivity analysis and NLP methods. Owing to the large volume of social media data, automated analysis techniques are needed. Sentiment analysis methods have been applied to nutrition-, food- and cooking-related content and achieved relatively high accuracy in assessing sentiment (in the limited number of papers that assessed accuracy). High accuracy was often achieved when the authors built their own algorithm to suit the data, which required expertise in computer science and technology. Open-source and publicly available sentiment analysis methods were also used; however, papers that used them often did not test the accuracy of the sentiment predictions, or reported low accuracy, potentially because the lexicon was based on a context other than nutrition or health. The meaning of terms is often subject specific, and therefore involving subject matter experts (e.g. in nutrition) would help ensure that the textual data analysis is relevant to the topic. Although sentiment analysis was shown to be useful for analysing social media data, papers that used other NLP or machine learning techniques gained a more nuanced understanding of their data beyond sentiment. Interdisciplinary work is key to the successful implementation of machine learning, subjectivity analysis and NLP methods that are rigorous, accurate and relevant to the specific field (e.g. nutrition) and that yield findings with practical and societal implications.

Supplementary material

To view supplementary material for this article, please visit https://doi.org/10.1017/S0954422423000069.

Acknowledgements

We would like to thank Professor Wray Buntine for his assistance with search term development and Monash librarian Mario Sos for his assistance with search term development and database searching techniques.

Financial support

Annika Molenaar and Eva Jenkins are supported by Australian Government Research Training Program Scholarships.

Conflicts of Interest

None declared.

Authorship

Annika Molenaar was involved in the planning of the review, search term development, database searching, assessment of papers for eligibility, data extraction and writing the article. Eva L. Jenkins was involved in the assessment of papers for eligibility and reviewing the written manuscript. Linda Brennan was involved in the planning of the review and reviewing the written manuscript. Dickson Lukose was involved in reviewing the written manuscript. Tracy McCaffrey was the senior author involved in planning the review, overseeing the database searching and reviewing the written manuscript.

References

Vos, T, Abajobir, AA, Abate, KH et al. (2017) Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet 390, 1211–1259. doi: 10.1016/S0140-6736(17)32154-2.
Firth, J, Gangwisch, JE, Borsini, A et al. (2020) Food and mood: how do diet and nutrition affect mental wellbeing? BMJ 369. doi: 10.1136/bmj.m2382.
Owen, L & Corfe, B (2017) The role of diet and nutrition on mental health and wellbeing. Proc Nutr Soc 76, 425–426. doi: 10.1017/S0029665117001057.
Schwartz, MB, Just, DR, Chriqui, JF et al. (2017) Appetite self-regulation: environmental and policy influences on eating behaviors. Obesity 25, S26–S38. doi: 10.1002/oby.21770.
Penders, B (2018) Why public dismissal of nutrition science makes sense: post-truth, public accountability and dietary credibility. Br Food J. doi: 10.1108/BFJ-10-2017-0558.
Penders, B, Wolters, A, Feskens, EF et al. (2017) Capable and credible? Challenging nutrition science. Eur J Nutr 56, 2009–2012. doi: 10.1007/s00394-017-1507-y.
Adamski, M, Truby, H, Klassen, MK et al. (2020) Using the internet: nutrition information-seeking behaviours of lay people enrolled in a massive online nutrition course. Nutrients 12, 750. doi: 10.3390/nu12030750.
Lynn, T, Rosati, P, Leoni Santos, G et al. (2020) Sorting the healthy diet signal from the social media expert noise: preliminary evidence from the healthy diet discourse on twitter. Int J Environ Res Public Health 17, 8557. doi: 10.3390/ijerph17228557.
Vaterlaus, JM, Patten, EV, Roche, C et al. (2015) #Gettinghealthy: the perceived influence of social media on young adult health behaviors. Comput Human Behav 45, 151–157. doi: 10.1016/j.chb.2014.12.013.
Jenkins, EL, Ilicic, J, Barklamb, AM et al. (2020) Assessing the credibility and authenticity of social media content for applications in health communication: scoping review. J Med Internet Res 22, e17296. doi: 10.2196/17296.
Sloan, L & Quan-Haase, A (2017) The SAGE Handbook of Social Media Research Methods. Thousand Oaks, CA: SAGE.
Carrotte, ER, Prichard, I & Lim, MSC (2017) “Fitspiration” on social media: a content analysis of gendered images. J Med Internet Res 19, e95. doi: 10.2196/jmir.6368.
Amidor, T (2020) Popular Diet Trends: Instagram Diet Trends. https://www.todaysdietitian.com/newarchives/0520p16.shtml (accessed April 12 2021).
Ramachandran, D, Kite, J, Vassallo, AJ et al. (2018) Food trends and popular nutrition advice online–implications for public health. Online J Public Health Inf 10. doi: 10.5210/ojphi.v10i2.9306.
Carpenter, DM, Geryk, LL, Chen, AT et al. (2016) Conflicting health information: a critical research need. Health Expect 19, 1173–1182. doi: 10.1111/hex.12438.
Freeman, B, Kelly, B, Vandevijvere, S et al. (2016) Young adults: beloved by food and drink marketers and forgotten by public health? Health Promot Int 31, 954–961. doi: 10.1093/heapro/dav081.
Lofft, Z (2020) When social media met nutrition: how influencers spread misinformation, and why we believe them. Health Sci Inq 11, 56–61. doi: 10.29173/hsi319.
Wang, Y, McKee, M, Torbica, A et al. (2019) Systematic literature review on the spread of health-related misinformation on social media. Soc Sci Med 240, 112552. doi: 10.1016/j.socscimed.2019.112552.
IBM Cloud Education (2020) Application Programming Interface (API). https://www.ibm.com/cloud/learn/api (accessed April 13 2021).
Kok, JN, Boers, EJ, Kosters, WA et al. (2009) Artificial intelligence: definition, trends, techniques, and cases. Artif Intell 1, 270–299.
Li, S (2018) Topic Modeling and Latent Dirichlet Allocation (LDA) in Python. https://towardsdatascience.com/topic-modeling-and-latent-dirichlet-allocation-in-python-9bf156893c24 (accessed June 17 2021).
Guthrie, L, Pustejovsky, J, Wilks, Y et al. (1996) The role of lexicons in natural language processing. Commun ACM 39, 63–72. doi: 10.1145/234173.234204.
Pennebaker, JW, Boyd, RL, Jordan, K et al. (2015) The Development and Psychometric Properties of LIWC2015. Austin: University of Texas at Austin.
SAS (2021) Machine Learning What It Is and Why It Matters. https://www.sas.com/en_au/insights/analytics/machine-learning.html (accessed June 17 2021).
Misra, S & Li, H (2019) Noninvasive fracture characterization based on the classification of sonic wave travel times. Mach Learn Subsurf Charact, 243–287. doi: 10.1016/b978-0-12-817736-5.00009-0.
Hirschberg, J & Manning, CD (2015) Advances in natural language processing. Science 349, 261–266. doi: 10.1126/science.aaa8685.
Goldberg, Y (2017) Neural network methods for natural language processing. Synth Lect Human Lang Technol 10, 1–309.
Liu, B (2012) Sentiment analysis and opinion mining. Synth Lect Human Lang Technol 5, 1–167.
Klassen, KM, Borleis, ES, Brennan, L et al. (2018) What people “like”: analysis of social media strategies used by food industry brands, lifestyle brands, and health promotion organizations on Facebook and Instagram. J Med Internet Res 20, e10227. doi: 10.2196/10227.
Prell, C (2012) Social Network Analysis: History, Theory and Methodology. Thousand Oaks, CA: SAGE.
Cambridge University Press (2008) Stemming and Lemmatization. https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html (accessed June 17 2021).
Chang, C-C & Lin, C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2, 1–27. doi: 10.1145/1961189.1961199.
Nguyen, E (2014) Chapter 4 – text mining and network analysis of digital libraries in R. In Data Mining Applications with R, pp. 95–115 [Zhao, Y and Cen, Y, editors]. Boston, MA: Academic Press.
McDonald, L, Malcolm, B, Ramagopalan, S et al. (2019) Real-world data and the patient perspective: the PROmise of social media? BMC Med 17, 1–5. doi: 10.1186/s12916-018-1247-8.
Pew Research Center (2021) Social Media Use in 2021. https://www.pewresearch.org/internet/2021/04/07/social-media-use-in-2021/ (accessed June 17 2021).
Eurostat (2019) Are you using social networks? https://ec.europa.eu/eurostat/web/products-eurostat-news/-/EDN-20190629-1 (accessed June 17 2021).
Yellow (2020) Yellow social media report 2020 – consumers. https://www.yellow.com.au/social-media-report/ (accessed June 17 2021).
Chau, MM, Burgermaster, M & Mamykina, L (2018) The use of social media in nutrition interventions for adolescents and young adults—a systematic review. Int J Med Inf 120, 77–91. doi: 10.1016/j.ijmedinf.2018.10.001.
Klassen, KM, Douglass, CH, Brennan, L et al. (2018) Social media use for nutrition outcomes in young adults: a mixed-methods systematic review. Int J Behav Nutr Phys Act 15, 1–18. doi: 10.1186/s12966-018-0696-y.
Stirling, E, Willcox, J, Ong, K-L et al. (2020) Social media analytics in nutrition research: a rapid review of current usage in investigation of dietary behaviours. Public Health Nutr, 1–42. doi: 10.1017/S1368980020005248.
Barklamb, AM, Molenaar, A, Brennan, L et al. (2020) Learning the language of social media: a comparison of engagement metrics and social media strategies used by food and nutrition-related social media accounts. Nutrients 12, 2839. doi: 10.3390/nu12092839.
Paul, MJ, Sarker, A, Brownstein, JS et al. (2016) Social media mining for public health monitoring and surveillance. Biocomput 2016: Proc Pacific Symp, 468–479. doi: 10.1142/9789814749411_0043.
Farzindar, A & Inkpen, D (2017) Natural language processing for social media. Synth Lect Human Lang Technol 10, 1–195. doi: 10.2200/S00809ED2V01Y201710HLT038.
Hutto, CJ & Gilbert, E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. Proc Int AAAI Conf Web Soc Media 8, 216–225. doi: 10.1609/icwsm.v8i1.14550.
Zainuddin, N & Selamat, A (2014) Sentiment analysis using support vector machine. 2014 Int Conf Comput Commun Control Technol (I4CT), 333–337. doi: 10.1109/I4CT.2014.6914200.
Narayanan, V, Arora, I & Bhatia, A (2013) Fast and accurate sentiment classification using an enhanced Naive Bayes model. Int Conf Intell Data Eng Automat Learn, 194–201. doi: 10.48550/arXiv.1305.6143.
Ain, QT, Ali, M, Riaz, A et al. (2017) Sentiment analysis using deep learning techniques: a review. Int J Adv Comput Sci Appl 8, 424. doi: 10.14569/IJACSA.2017.080657.
Fang, X & Zhan, J (2015) Sentiment analysis using product review data. J Big Data 2, 1–14. doi: 10.1186/s40537-015-0015-2.
Ramteke, J, Shah, S, Godhia, D et al. (2016) Election result prediction using Twitter sentiment analysis. 2016 Int Conf Invent Comput Technol (ICICT) 1, 1–5. doi: 10.1109/INVENTIVE.2016.7823280.
Gohil, S, Vuik, S & Darzi, A (2018) Sentiment analysis of health care tweets: review of the methods used. JMIR Public Health Surveill 4, e43. doi: 10.2196/publichealth.5789.
Zunic, A, Corcoran, P & Spasic, I (2020) Sentiment analysis in health and well-being: systematic review. JMIR Med Inf 8, e16023. doi: 10.2196/16023.
Page, MJ, McKenzie, JE, Bossuyt, PM et al. (2021) Updating guidance for reporting systematic reviews: development of the PRISMA 2020 statement. J Clin Epidemiol. doi: 10.1016/j.jclinepi.2021.02.003.
Tricco, AC, Lillie, E, Zarin, W et al. (2018) PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 169, 467. doi: 10.7326/M18-0850.
Munn, Z, Peters, MD, Stern, C et al. (2018) Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol 18, 1–7. doi: 10.1186/s12874-018-0611-x.
Moher, D, Liberati, A, Tetzlaff, J et al. (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6, e1000097. doi: 10.1136/bmj.b2535.
Pugsee, P & Niyomvanich, M (2015) Comment analysis for food recipe preferences. 2015 12th Int Conf Electr Eng/Electron, Comput, Telecommun Inf Technol (ECTI-CON), 1–4. doi: 10.1109/ECTICon.2015.7207119.
Mazzocut, M, Truccolo, I, Antonini, M et al. (2016) Web conversations about complementary and alternative medicines and cancer: content and sentiment analysis. J Med Internet Res 18, e120. doi: 10.2196/jmir.5521.
Bridge, G, Flint, SW & Tench, R (2021) A mixed-method analysis of the #SugarTax debate on Twitter. Public Health Nutr 24, 3537–3546. doi: 10.1017/S1368980021000938.
Brzustewicz, P & Singh, A (2021) Sustainable consumption in consumer behavior in the time of COVID-19: topic modeling on Twitter data using LDA. Energies 14, 5787. doi: 10.3390/en14185787.
Dixon, N, Jakić, B, Lagerweij, R et al. (2012) FoodMood: measuring global food sentiment one tweet at a time. Proc Int AAAI Conf Web Soc Media 6, 27.
Dondokova, A, Aich, S, Hee-Cheol, K et al. (2019) A text mining approach to study individuals’ food choices and eating behavior using twitter feeds. Front Comput Theory, Technol Appl (FC 2018), 3–6 July 2018, 520–527. doi: 10.1007/978-981-13-3648-5_60.
Jennings, L, Danforth, CM, Dodds, PS et al. (2019) Exploring perceptions of veganism. arXiv, 21.
Kang, Y, Wang, Y, Zhang, D et al. (2017) The public’s opinions on a new school meals policy for childhood obesity prevention in the U.S.: a social media analytics approach. Int J Med Inf 103, 83–88. doi: 10.1016/j.ijmedinf.2017.04.013.
Kashyap, R & Nahapetian, A (2014) Tweet analysis for user health monitoring. 2014 4th Int Conf Wireless Mobile Commun Health, 348–351. doi: 10.4108/icst.mobihealth.2014.257537.
Pérez-Pérez, M, Perez-Rodriguez, G, Fdez-Riverola, F et al. (2019) Using Twitter to understand the human bowel disease community: exploratory analysis of key topics. J Med Internet Res 21, e12610. doi: 10.2196/12610.
Pindado, E & Barrena, R (2020) Using Twitter to explore consumers’ sentiments and their social representations towards new food trends. Br Food J. doi: 10.1108/BFJ-03-2020-0192.
Rintyarna, BS, Salamatu, M, Nazmona, M et al. (2021) Mapping acceptance of Indonesian organic food consumption under COVID-19 pandemic using sentiment analysis of Twitter dataset. J Theor Appl Inf Technol 99, 1009–1019.
Saura, JR, Reyes-Menendez, A & Thomas, SB (2020) Gaining a deeper understanding of nutrition using social networks and user-generated content. Internet Interv 20, 100312. doi: 10.1016/j.invent.2020.100312.
Scott, D, Jihwan, O, Chappelka, M et al. (2018) Food for thought: analyzing public opinion on the supplemental nutrition assistance program. J Technol Human Serv 36, 37–47. doi: 10.1080/15228835.2017.1416514.
Shadroo, S, Nejad, MY, Bali, AO et al. (2020) A comparison and analysis of the Twitter discourse related to weight loss and fitness. Netw Model Anal Health Inf Bioinf 9. doi: 10.1007/s13721-020-00228-9.
Shaw, G Jr & Karami, A (2017) Computational content analysis of negative tweets for obesity, diet, diabetes, and exercise. Proc Assoc Inf Sci Technol 54, 357–365. doi: 10.1002/pra2.2017.14505401039.
Singh, A & Glińska-Neweś, A (2022) Modeling the public attitude towards organic foods: a big data and text mining approach. J Big Data 9, 1–21. doi: 10.1186/s40537-021-00551-6.
Sprogis, U & Rikters, M (2020) What can we learn from almost a decade of food tweets. Proc 9th Conf Human Lang Technol – Baltic Perspect (Baltic HLT 2020) 328, 191–198. doi: 10.48550/arXiv.2007.05194.
Surjandari, I, Naffisah, MS & Prawiradinata, MI (2015) Text mining of Twitter data for public sentiment analysis of staple foods price changes. J Ind Intell Inf 3, 253–257. doi: 10.12720/jiii.3.3.253-257.
Vydiswaran, VGV, Romero, DM, Zhao, X et al. (2018) “Bacon bacon bacon”: food-related tweets and sentiment in Metro Detroit. Proc Twelfth Int AAAI Conf Web Soc Media (ICWSM 2018) 12, 692–695.
Vydiswaran, VGV, Romero, DM, Zhao, X et al. (2020) Uncovering the relationship between food-related discussion on Twitter and neighborhood characteristics. J Am Med Inf Assoc: JAMIA 27, 254–264. doi: 10.1093/jamia/ocz181.
Widener, MJ & Li, W (2014) Using geolocated Twitter data to monitor the prevalence of healthy and unhealthy food references across the US. Appl Geogr 54, 189–197. doi: 10.1016/j.apgeog.2014.07.017.
Yeruva, VK, Junaid, S, Lee, Y et al. (2017) Exploring social contextual influences on healthy eating using big data analytics. 2017 IEEE Int Conf Bioinf Biomed (BIBM), 1507–1514. doi: 10.1109/BIBM.2017.8217885.
Benkhelifa, R, Laallam, FZ, Mostafa, M et al. (2018) Opinion extraction and classification of real-time YouTube cooking recipes comments. Int Conf Adv Mach Learn Technol Appl (AMLTA2018) 723, 395–404. doi: 10.1007/978-3-319-74690-6_39.
Benkhelifa, R, Bouhyaoui, N & Laallam, FZ (2019) A real-time aspect-based sentiment analysis system of YouTube cooking recipes. Studies Comput Intell 801, 233–251. doi: 10.1007/978-3-030-02357-7_11.
Donthula, SK & Kaushik, A (2019) Man is what he eats: a research on Hinglish sentiments of YouTube cookery channels using deep learning. Int J Recent Technol Eng 8, 930–937. doi: 10.35940/ijrte.B1153.0982S1119.
Kaur, G, Kaushik, A & Sharma, S (2019) Cooking is creating emotion: a study on Hinglish sentiments of YouTube cookery channels using semi-supervised approach. Big Data Cognit Comput 3, 1–19. doi: 10.3390/bdcc3030037.
Meza, XV & Yamanaka, T (2020) Food communication and its related sentiment in local and organic food videos on YouTube. J Med Internet Res 22. doi: 10.2196/16761.
Shah, SR, Kaushik, A, Sharma, S et al. (2020) Opinion-mining on Marglish and Devanagari comments of YouTube cookery channels using parametric and non-parametric learning models. Big Data Cognit Comput 4, 1–19. doi: 10.3390/bdcc4010003.
Zhou, Q & Zhang, C (2017) Detecting dietary preference of social media users in China via sentiment analysis. Proc Assoc Inf Sci Technol 54, 5-pp. doi: 10.1002/pra2.2017.14505401062.
Zhou, Q & Zhang, C (2018) Detecting users’ dietary preferences and their evolutions via Chinese social media. J Database Manag 29, 89–110. doi: 10.4018/JDM.2018070105.
Pilař, L, Stanislavská, LK, Rojík, S et al. (2018) Customer experience with organic food: global view. Emirates J Food Agric 30, 918–926. doi: 10.9755/ejfa.2018.v30.i11.1856.
Rivera, I, Warren, J & Curran, J (2016) Quantifying mood, content and dynamics of health forums. Proc Australas Comput Sci Week Multiconf 67, 1–10. doi: 10.1145/2843043.2843379.
Cheng, X, Lin, S-Y, Wang, K et al. (2021) Healthfulness assessment of recipes shared on Pinterest: natural language processing and content analysis. J Med Internet Res 23, e25757. doi: 10.2196/25757.
Kim, AR, Park, HA & Song, TM (2017) Development and evaluation of an obesity ontology for social big data analysis. Healthc Inf Res 23, 159–168. doi: 10.4258/hir.2017.23.3.159.
Kim, J & Oh, U (2019) EmoWei: emotion-oriented personalized weight management system based on sentiment analysis. 2019 IEEE 20th Int Conf Inf Reuse Integr Data Sc (IRI), 30 July–1 Aug 2019, 342–349. doi: 10.1109/IRI.2019.00060.
Masih, J (2021) Understanding health-foods consumer perception using big data analytics. J Manag Inf Decis Sci 24, 1–15.
Ramsingh, J & Bhuvaneswari, V (2018) An efficient Map Reduce-Based Hybrid NBC-TFIDF algorithm to mine the public sentiment on diabetes mellitus – a big data approach. J King Saud Univ – Comput Inf Sci. doi: 10.1016/j.jksuci.2018.06.011.
Yeruva, VK, Junaid, S & Lee, Y (2019) Contextual word embeddings and topic modeling in healthy dieting and obesity. J Healthc Inf Res 3, 159–183. doi: 10.1007/s41666-019-00052-5.
Jockers, M (2020) Introduction to the Syuzhet Package. https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html (accessed May 20 2022).
Go, A, Bhayani, R & Huang, L (2009) Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford 1, 2009.
Hu, M & Liu, B (2004) Mining and summarizing customer reviews. Proc Tenth ACM SIGKDD Int Conf Knowl Discov Data Mining, 168–177. doi: 10.1145/1014052.1014073.
Paltoglou, G & Thelwall, M (2012) Twitter, MySpace, Digg: unsupervised sentiment analysis in social media. ACM Trans Intell Syst Technol 3. doi: 10.1145/2337542.2337551.
Baccianella, S, Esuli, A & Sebastiani, F (2010) Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. LREC 10, 2200–2204.
Bird, S, Klein, E & Loper, E (2009) Natural Language Processing with Python. Sebastopol, CA: O’Reilly.
Pinnis, M (2018) Latvian Tweet corpus and investigation of sentiment analysis for Latvian. Human Lang Technol – Baltic Perspect 307, 112–119. doi: 10.3233/978-1-61499-912-6-112.
Kudo, T & Richardson, J (2018) SentencePiece: a simple and language independent subword tokenizer and detokenizer for Neural Text Processing. arXiv:1808.06226 [cs]. doi: 10.48550/arXiv.1808.06226.
Thelwall, M, Buckley, K & Paltoglou, G (2012) Sentiment strength detection for the social web. J Am Soc Inf Sci Technol 63, 163–173. doi: 10.1002/asi.21662.
Gee Whiz Labs Inc (2011) Adjective list. http://www.keepandshare.com/doc/12894/adjective-list (accessed May 20 2022).
Colnerič, N & Demšar, J (2018) Emotion recognition on twitter: comparative study and training a unison model. IEEE Trans Affect Comput 11, 433–446. doi: 10.1109/TAFFC.2018.2807817.
Plutchik, R (1990) Emotions and psychotherapy: a psychoevolutionary perspective. In Emotion, Psychopathology, and Psychotherapy, pp. 3–41 [R Plutchik and H Kellerman]. Cambridge, MA: Academic Press.
Manning, CD, Surdeanu, M, Bauer, J et al. (2014) The Stanford CoreNLP natural language processing toolkit. Proc 52nd Annu Meet Assoc Comput Linguist: System Demonst, 55–60. Baltimore, MD: Association for Computational Linguistics.
Loria, S, Keen, P, Honnibal, M et al. (2014) TextBlob 0.16.0 documentation. In Textblob: Simplified Text Processing. https://textblob.readthedocs.io/en/dev/index.html
Shivaprasad, T & Shetty, J (2017) Sentiment analysis of product reviews: a review. 2017 Int Conf Invent Commun Comput Technol (ICICCT), 298–301. doi: 10.1109/ICICCT.2017.7975207.
Bentivogli, L, Forner, P, Magnini, B et al. (2004) Revising the wordnet domains hierarchy: semantics, coverage and balancing. Proc Workshop Multiling Linguist Resour, 94–101.
Ligthart, A, Catal, C & Tekinerdogan, B (2021) Systematic reviews in sentiment analysis: a tertiary study. Artif Intell Rev, 1–57. doi: 10.1007/s10462-021-09973-3.
Hao, Y, Mu, T, Hong, R et al. (2019) Cross-domain sentiment encoding through stochastic word embedding. IEEE Trans Knowl Data Eng 32, 1909–1922. doi: 10.1109/TKDE.2019.2913379.
McDonald, N, Schoenebeck, S & Forte, A (2019) Reliability and inter-rater reliability in qualitative research: Norms and guidelines for CSCW and HCI practice. Proc ACM Human-Comput Interact 3, 1–23. doi: 10.1145/3359174.
Caliskan, A, Bryson, JJ & Narayanan, A (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186. doi: 10.1126/science.aal4230.
Mäntylä, MV, Graziotin, D & Kuutila, M (2018) The evolution of sentiment analysis—a review of research topics, venues, and top cited papers. Comput Sci Rev 27, 16–32. doi: 10.1016/j.cosrev.2017.10.002.
Manyika, J, Silberg, J & Presten, B (2019) What do we do about the biases in AI? https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai (accessed July 5 2021).
Blei, DM (2012) Probabilistic topic models. Commun ACM 55, 77–84. doi: 10.1145/2133806.2133826.
Morstatter, F & Liu, H (2017) Discovering, assessing, and mitigating data bias in social media. Online Soc Netw Media 1, 1–13. doi: 10.1016/j.osnem.2017.01.001.
Zhang, Y, Cao, B, Wang, Y et al. (2020) When public health research meets social media: knowledge mapping from 2000 to 2018. J Med Internet Res 22, e17582. doi: 10.2196/17582.
Ribeiro, FN, Benevenuto, F & Zagheni, E (2020) How biased is the population of Facebook users? Comparing the demographics of Facebook users with census data to generate correction factors. 12th ACM Conf Web Sci, 325–334. doi: 10.48550/arXiv.2005.08065.
Mellon, J & Prosser, C (2017) Twitter and Facebook are not representative of the general population: political attitudes and demographics of British social media users. Res Politics 4, 2053168017720008. doi: 10.1177/2053168017720008.
Cesare, N, Grant, C, Hawkins, JB et al. (2017) Demographics in social media data for public health research: does it matter? arXiv preprint arXiv:1710.11048. doi: 10.48550/arXiv.1710.11048.
Golder, S, Ahmed, S, Norman, G et al. (2017) Attitudes toward the ethics of research using social media: a systematic review. J Med Internet Res 19, e195. doi: 10.2196/jmir.7082.
Benton, A, Coppersmith, G & Dredze, M (2017) Ethical research protocols for social media health research. Proc First ACL Workshop Ethics Nat Lang Process, 94–102. doi: 10.18653/v1/W17-1612.
Hammack, CM (2019) Ethical use of social media data: beyond the clinical context. Hastings Center Report 49, 40–42. doi: 10.1002/hast.979.

Table 1. Glossary of terms

Table 2. PICOTS summary table

Fig. 1. PRISMA flow diagram of systematic scoping review on sentiment analysis and data science to assess the language of nutrition-, food- and cooking-related content on social media (55).

Table 3. Characteristics of studies by social media platform

Fig. 2. Themes and sub-themes of topics across studies.

Table 4. Social media data collection, sentiment analysis techniques and key findings by social media platform

Fig. 3. Proportion of sentiment classifications (positive, negative, neutral) across studies by social media platform.

Fig. 4. An overview of social media data analysis techniques which were used across studies in combination with sentiment or emotion analysis to provide more nuanced insights into social media data.

Supplementary material

Molenaar et al. supplementary material 1 (File, 18.8 KB)
Molenaar et al. supplementary material 2 (File, 147.8 KB)