Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we reviewed research supported by Coca-Cola funding using the Web of Science Core Collection database. Starting in 2008, Thomson Reuters added funding acknowledgement and competing interest statements to all the bibliographic records of the Science Citation Index Expanded. The database captures the funding/competing interest paratextual information in the published version of an article, distinguishes between a conflict of interest and a funding statement, and identifies the entities that are acknowledged as providing funding for the article, saving the user from having to read the statements and identify the funding sources manually. These changes, in contrast to other existing databases, enable users to search the database for text strings (e.g. names of corporations) in the funding acknowledgement section, whether the entity appears as a funding agency or simply as part of a declared conflict of interest.
To our knowledge, Web of Science is the only bibliographic database to index this information on a large scale (Scopus developed a similar algorithm, but with a considerably lower coverage of publications and for a shorter time period; and, more recently, PubMed started adding this information to the metadata of the publication records it indexes).
To retrieve metadata from the literature searched, we developed a web scraping tool that crawls the URL address of any search run in the Core Collection database of Web of Science. Our algorithm, written for R software, runs sequentially over each study page in the search results, parses the HTML code and scrapes user-defined fields for each publication (e.g. title, abstract, authors and affiliated institutions), including the funding/competing interest statement and a table compiled by Web of Science that lists all the entities that provided funding for the article, as reported by the authors (the R script for the algorithm is provided in online supplementary material 2).
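The field-extraction step can be illustrated with a minimal Python sketch. This is not the R script provided in online supplementary material 2: the markup below is invented for illustration and does not reproduce the real Web of Science page structure, and the class names (`title`, `authors`, `funding_text`) and the helper `scrape_record` are hypothetical.

```python
from html.parser import HTMLParser


class FieldScraper(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field.

    Simplification: assumes the wanted elements contain plain text only
    (no nested tags), which is enough for this illustration.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)   # user-defined fields to keep
        self.current = None         # field currently being read, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self.current = cls
            self.record[cls] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.record[self.current] += data

    def handle_endtag(self, tag):
        self.current = None


def scrape_record(html, wanted):
    """Parse one record page and return the requested fields as a dict."""
    parser = FieldScraper(wanted)
    parser.feed(html)
    return {field: text.strip() for field, text in parser.record.items()}


# Invented markup standing in for a single record page (hypothetical).
page = """
<div class="title">Example study title</div>
<div class="authors">Doe J; Roe R</div>
<div class="funding_text">Supported by an example funder.</div>
"""
record = scrape_record(page, ["title", "funding_text"])
```

Running the same function over each record page in a result set, one page at a time, mirrors the sequential pass described above.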
We searched for all studies that included the string ‘cola’ in the ‘funding text’ field, which indexes the entire funding acknowledgement section as reported in the published manuscript (see Fig. 1 and Appendix 1 for search strategy). This broad search strategy identified 779 articles, published between 2008 and June 2016, and included articles that acknowledged both direct funding and competing interests involving The Coca-Cola Company and all its subsidiaries. In addition, the broad search term ‘cola’ also yielded studies funded by other companies, such as ‘Pepsi-Cola’.
Fig. 1 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram for the present systematic review. This PRISMA diagram describes the study selection steps and identifies at what stage we arrived at analytical Samples 1 and 2. Seven hundred and seventy-nine records (i.e. publications) were retrieved from a search on the ‘funding text’ field in Web of Science for any mention of ‘Cola’. From these records, 318 were excluded for not meeting the screening criterion (i.e. that the study acknowledges direct receipt of funding from Coca-Cola). After exclusion, we arrive at Sample 1, which contains all studies funded by the Coca-Cola brand. We subset from Sample 1 only those studies funded by The Coca-Cola Company, its affiliates in the USA and those subsidiaries that published transparency lists; this leads to the exclusion of seventy-two studies for not meeting the eligibility criteria, which gives us Sample 2.
The questions set out above make an implicit separation between the research funding activities of The Coca-Cola Company and those of the Coca-Cola brand, which includes all subsidiaries and bottlers around the world. With this distinction in mind, we constructed two analytical samples.
The first sample is less restrictive: it is composed of all studies retrieved from our search that were directly funded by any company or institute that is part of the Coca-Cola brand. This sample is used to answer question 2 (‘How many studies and authors are funded by the Coca-Cola brand?’).
The second sample is a sub-sample of the first, focusing on those studies funded by The Coca-Cola Company and its philanthropic arms in North America, and by the subsidiaries and bottlers that participated in the ‘Transparency Initiative’. This sample is used to answer the remaining questions (1, 3 and 4).
Below, we describe all steps in the selection of the studies and in the creation of the two analytical samples.
The 779 studies produced by our search were screened for the inclusion of the string ‘Coca-Cola’ as a funding agency (or any possible variant, including affiliates of the main company, such as the Beverage Institute for Health and Wellness; see Appendix 1 for a list of all variants). The goal of the initial screening was to parse through the search results and keep only studies that report direct receipt of funding from The Coca-Cola Company or any affiliates (see Fig. 1).
This criterion excluded 318 studies. These were studies where: (i) the authors only declared a competing interest due to previous relationships with The Coca-Cola Company unrelated to research funding (e.g. speaking engagements or consultancy work); (ii) Coca-Cola’s involvement in the publication was indirect (e.g. via student grants); (iii) the authors acknowledged funding from another ‘cola’, such as ‘Pepsi-Cola’; or (iv) the algorithm used by Web of Science mistakenly included Coca-Cola as a funding agency, when the funding acknowledgement section did not indicate direct funding by the company to that particular study (this was assessed by manually inspecting all funding statements).
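The two automated passes, the broad ‘cola’ search followed by the check for Coca-Cola variants, can be sketched in Python as follows. The variant patterns and example statements are hypothetical placeholders (the actual variants are listed in Appendix 1), and a match only flags a record as a candidate: deciding whether funding was direct still requires reading the statement, as described above.

```python
import re

# Hypothetical variant patterns for illustration; see Appendix 1 for the real list.
COCA_COLA_VARIANTS = [
    r"coca[\s-]?cola",
    r"beverage institute for health",
]
VARIANT_RE = re.compile("|".join(COCA_COLA_VARIANTS), re.IGNORECASE)


def broad_search(funding_texts):
    """Keep statements that contain the string 'cola' (case-insensitive)."""
    return [text for text in funding_texts if "cola" in text.lower()]


def mentions_coca_cola(text):
    """True if the statement names Coca-Cola or one of the listed variants."""
    return VARIANT_RE.search(text) is not None


# Invented funding statements (not taken from the data).
texts = [
    "Funded by The Coca-Cola Company.",
    "Supported by a grant from Pepsi-Cola.",
    "Funded by a national research council.",
]
hits = broad_search(texts)                                # broad 'cola' search
candidates = [t for t in hits if mentions_coca_cola(t)]   # kept for manual screening
```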
To be eligible for our first sample, studies had to acknowledge funding from The Coca-Cola Company or any of its affiliates, including The Coca-Cola Foundation (TCCF, the philanthropic arm of the company), Coca-Cola North America, The Beverage Institute for Health and Wellness (an organization set up by The Coca-Cola Company to support nutrition research) and Coca-Cola bottlers or subsidiaries outside the USA. This criterion covers the totality of the Coca-Cola brand in our data and did not lead to the removal of any further studies. Sample 1 therefore comprises 461 studies.
In the second sample, we imposed stricter eligibility criteria to isolate those studies funded by The Coca-Cola Company and its affiliates in the USA, France, Germany, Spain, New Zealand and Australia, the only countries to release records of their research funding efforts in the form of transparency lists of funded scientific experts, which were released in late 2015 and early 2016 (see full lists in online supplementary material 1).
This criterion excluded seventy-two studies that were funded by subsidiaries or bottlers other than the ones listed above. Sample 2 thus comprises 389 studies.
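The sample sizes follow directly from the counts reported above; the symbols $N_{\text{Sample 1}}$ and $N_{\text{Sample 2}}$ are introduced here only for this worked check.

```latex
\begin{align*}
N_{\text{Sample 1}} &= 779 - 318 = 461,\\
N_{\text{Sample 2}} &= 461 - 72 = 389.
\end{align*}
```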
To answer the first question, concerning how comprehensive Coca-Cola’s transparency initiative was following the revelation of its financial backing of the GEBN, we recreated Coca-Cola’s lists of ‘scientific experts’ and ‘research partnerships’ by carefully following the parameters laid out in Coca-Cola’s transparency disclosure, using our own data on funding statements retrieved from Web of Science (Sample 1). We then matched our recreated list to the original ones published on Coca-Cola’s websites. This was designed to identify any discrepancy that could potentially reflect selective disclosure on the Company’s part.
Coca-Cola included in its ‘Research and Partnerships’ lists the names of academics it funded or collaborated with according to the following criteria (these can be found on the company’s websites): (i) funding agreements sourced exclusively from The Coca-Cola Company, The Coca-Cola Foundation, Coca-Cola North America, Coca-Cola South Pacific, Coca-Cola Australia Foundation, Coca-Cola Oceania, Coca-Cola Germany and Coca-Cola Spain; and (ii) activities and studies conducted between January 2010 and December 2015.
To match these criteria, we started with the 461 studies in Sample 1 and excluded the following: (i) studies published before 2010 or after December 2015; (ii) studies funded by Coca-Cola subsidiaries and bottlers, with the exception of those listed above; (iii) studies written as part of research consortia that were themselves funded by The Coca-Cola Company, since the funding link between the company and the publication is indirect (see online supplementary material 1, Supplemental Table 1 for a complete listing of such consortia); and (iv) authors who were not listed as principal or co-investigators on the Coca-Cola grant in the original funding statement, where this information was made available (unfortunately, most funding statements did not identify the main investigator on the grant). We opted for a conservative method of removing studies to approximate, as closely as possible, the way Coca-Cola compiled its own lists of funded researchers.
One hundred and thirty-eight studies did not meet the eligibility criteria and were removed from the matching procedure.
It should be noted that there is a gap between the time funding is awarded and the publication date of a study, which suggests that we should restrict our parameters to publications from 2012 onwards. However, it is not clear from the information provided by Coca-Cola that authors of research published in 2010 would not be included in its transparency list. In fact, some studies yielding publications in 2010 were still ongoing in subsequent years. Furthermore, a large proportion of authors who published in 2010 also appeared in published research later on, which suggests that projects funded by Coca-Cola were likely to have yielded more than one publication over time. Therefore, the method we designed to match our data to Coca-Cola’s lists includes research published from 2010 onwards.
Nevertheless, to confirm the validity of our method, we used a sub-sample of studies published between 2012 and 2015 to compare with Coca-Cola’s transparency lists; the results lend further support to our findings using studies published from 2010 onwards (see ‘Limitations of the study’ section below).
After exclusion of ineligible studies, the procedure identified 907 authors, responsible for 331 studies that fit the criteria used by Coca-Cola to compile its lists of funded research partnerships. The combined transparency lists published by Coca-Cola in the USA, UK, Australia, France and Germany (Spain and New Zealand did not contain names of individual researchers) named 218 researchers. We then proceeded with matching the names of the 907 authors we identified in our data to the 218 names of researchers listed by Coca-Cola as recipients of its research funding, using whole and approximate string matching with manual verification of the results.
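Whole and approximate string matching of author names can be sketched in Python with the standard-library `difflib` module. This is an illustrative assumption about how such a step could be implemented, not the procedure actually used; the names, the `normalize` and `match_names` helpers and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib
import unicodedata


def normalize(name):
    """Lower-case and strip accents and punctuation so spelling variants align."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    kept = "".join(ch for ch in ascii_name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_names(authors, listed, cutoff=0.8):
    """Return (exact, approximate, unmatched) matches of authors against a list."""
    lookup = {normalize(name): name for name in listed}
    exact, approximate, unmatched = {}, {}, []
    for author in authors:
        key = normalize(author)
        if key in lookup:                      # whole-string match
            exact[author] = lookup[key]
            continue
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        if close:                              # approximate match, to verify by hand
            approximate[author] = lookup[close[0]]
        else:
            unmatched.append(author)
    return exact, approximate, unmatched


# Fictitious names, for illustration only.
authors = ["Maria Lopez", "Jon Smyth", "Alex Turner"]
listed = ["María López", "John Smith"]
exact, approximate, unmatched = match_names(authors, listed)
```

Approximate matches are returned separately so that each can be checked manually before being accepted, in line with the verification step described above.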
Figure 2 summarizes this iterative method in a PRISMA-type diagram.
Fig. 2 Flow diagram of the process to match Web of Science data to Coca-Cola’s transparency lists. This flow-type diagram describes: (i) the steps taken to recreate Coca-Cola’s transparency lists using our data; and (ii) the matching of our recreated list to Coca-Cola’s combined transparency lists. We start with all studies in Sample 1, evaluate them against the parameters that governed Coca-Cola’s lists of scientific experts and researchers it funded, and exclude those that fail to meet the eligibility criteria. In the matching stage, we combined the lists of researchers funded by Coca-Cola in North America, UK, Australia, Germany and France and matched these names to those on the list we created using data from Web of Science. The corresponding authors of studies in which all names were unmatched were surveyed via email and asked about Coca-Cola funding.
The second question raised above seeks to reveal the universe of scientific literature funded by Coca-Cola. For this question, we focused on the Coca-Cola brand as a whole, not making any distinction between the research funding activities carried out by the main company in the USA and those of its subsidiaries and bottlers around the world.
To address this question, we employed network analysis tools to visually portray the scope of Coca-Cola’s involvement in funding scientific research, and at the same time compare it with the company’s disclosure following its transparency initiative. We built co-authorship networks for all studies that were funded by the Coca-Cola brand between 2008 and 2016. The diagrams show nodes (authors) linked via edges, which represent the co-authorship of a study. A similar approach has been used in the literature combining a systematic review with co-citation networks, instead of co-authorship networks.
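As an illustration of how a co-authorship network is assembled from author lists, the following Python sketch links every pair of authors who share a study. The author labels and the `coauthorship_edges` helper are hypothetical and serve only to show the construction.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(studies):
    """Count, for every pair of authors, how many studies they co-authored."""
    edges = Counter()
    for authors in studies:
        # Sort so that each pair is stored once, regardless of byline order.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Placeholder author lists, one per study.
studies = [
    ["Author A", "Author B", "Author C"],
    ["Author B", "Author C"],
    ["Author D"],
]
edges = coauthorship_edges(studies)
nodes = {author for authors in studies for author in authors}
```

Single-author studies contribute a node but no edge, so such authors appear as isolated points in the diagram.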
Network analysis was paired with text analysis to assess the content of the scientific literature funded by Coca-Cola. In addressing question 3, we shift our focus to the funding endeavours of The Coca-Cola Company and those affiliates that participated in the transparency initiative, and discuss who and what fields of research they funded between 2008 and 2016. We added a new co-authorship network and ran a community search algorithm to uncover highly cohesive subgroups that may indicate the presence of different research hubs throughout the USA (and abroad).
The algorithm calculates a betweenness centrality score for each tie in the network, a metric that counts the number of shortest paths between all pairs of nodes that pass through that tie. In short, it counts how often a tie is used as a ‘bridge’ to connect, in the shortest way possible, any pair of nodes. It proceeds by removing the tie with the highest betweenness score, recalculating tie betweenness centrality and iteratively removing the tie with the highest score until the network becomes disconnected into several subgroups. Once it achieves an optimal number of subgroups, the partitioning of the network is complete and it assigns a different colour to each subgroup.
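The procedure described above corresponds to edge-betweenness community detection, often called the Girvan-Newman method. The Python sketch below is a simplified, standard-library illustration rather than the implementation used for the analysis: it assumes an unweighted, undirected network without self-loops, stops at the first split instead of searching for an optimal number of subgroups, and all function names and the example network are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Count the shortest paths passing through each tie (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search: distances and number of shortest paths.
        dist, paths, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    paths[nbr] = 0.0
                    preds[nbr] = []
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    paths[nbr] += paths[node]
                    preds[nbr].append(node)
        # Walk back from the farthest nodes, sharing credit along each path.
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for pred in preds[node]:
                share = paths[pred] / paths[node] * (1.0 + credit[node])
                score[frozenset((pred, node))] += share
                credit[pred] += share
    # Each pair of nodes is visited from both ends, so halve the totals.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Return the connected subgroups of the network as a list of sets."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def split_once(adj):
    """Remove highest-betweenness ties until the network gains a subgroup."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:          # no ties left to remove
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by a single bridging tie between "c" and "d".
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = edge_betweenness(network)
groups = split_once(network)
```

In this toy network the tie between "c" and "d" carries every shortest path between the two triangles, so it has the highest score and is removed first, leaving the two triangles as separate subgroups.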
This methodology offers valuable insights into the structure and organization of Coca-Cola’s research enterprise, as it furthers our understanding of its centralization, which actors are important and whether research themes or institutions may play a role in its organization. Furthermore, it puts Coca-Cola’s transparency initiative in perspective, both in terms of scope (how complete the disclosure is) and in terms of relevance (whether the authors the company acknowledges as recipients of funding are central or peripheral players in the network).
To better understand the research themes of Coca-Cola’s funded research (the second part of question 3), we examined the abstracts of all 389 articles that met the screening and eligibility criteria that underpinned Sample 2. We used structural topic modelling, one of a large toolbox of topic modelling estimation methods, which are generally described as unsupervised machine learning algorithms for probabilistic classification of large text corpora. With it we uncovered hidden semantic structures, or topics, that give us an insight into the different streams of research that Coca-Cola has funded since 2008.
In a nutshell, topic models estimate latent topics in a bundle of text documents and simultaneously assign the documents to the different topics, probabilistically. The algorithm works on the assumption that a document is composed of a different mixture of topics and estimates the probability distribution of documents to topics; it does this based on the semantic content of each document by leveraging information on the word frequency within and across documents. Thus, documents that share the same semantic structure (i.e. similar distributions of word frequencies) are likely to belong to the same topic.
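The mixture assumption can be written compactly. The notation below is introduced only for illustration and is not taken from the supplementary material: $K$ is the number of topics, $\theta_{d,k}$ the share of document $d$ devoted to topic $k$, and $\beta_{k,w}$ the probability of word $w$ under topic $k$. This is the generic mixed-membership form shared by topic models; the structural variant additionally allows document-level covariates to enter the model, which this sketch omits.

```latex
\[
p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k,w},
\qquad \sum_{k=1}^{K} \theta_{d,k} = 1,
\qquad \sum_{w} \beta_{k,w} = 1 .
\]
```

Estimation recovers both $\theta$ and $\beta$ from the observed word counts, so documents with similar word-frequency profiles end up with similar topic shares.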
In online supplementary material 3 we present in greater detail the estimation methods and robustness tests for the models presented here.
In the next section, the results are organized and discussed around each of the research questions set out above.