A data mining approach to investigate food groups related to incidence of bladder cancer in the BLadder cancer Epidemiology and Nutritional Determinants International Study

Published online by Cambridge University Press: 23 April 2020

Evan Y. W. Yu
Affiliation:
Department of Complex Genetics and Epidemiology, School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, The Netherlands
Anke Wesselius*
Affiliation:
Department of Complex Genetics and Epidemiology, School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, The Netherlands
Christoph Sinhart
Affiliation:
Department of Data Science & Knowledge Engineering, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands
Alicja Wolk
Affiliation:
Division of Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
Mariana Carla Stern
Affiliation:
Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
Xuejuan Jiang
Affiliation:
Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
Li Tang
Affiliation:
Department of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, NY, USA
James Marshall
Affiliation:
Department of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, NY, USA
Eliane Kellen
Affiliation:
Leuven University Centre for Cancer Prevention (LUCK), Leuven, Belgium
Piet van den Brandt
Affiliation:
Department of Epidemiology, Schools for Oncology and Developmental Biology and Public Health and Primary Care, Maastricht University Medical Centre, Maastricht, The Netherlands
Chih-Ming Lu
Affiliation:
Department of Urology, Buddhist Dalin Tzu Chi General Hospital, Dalin Township 62247, Chiayi County, Taiwan
Hermann Pohlabeln
Affiliation:
Leibniz Institute for Prevention Research and Epidemiology-BIPS, Bremen, Germany
Gunnar Steineck
Affiliation:
Department of Oncology and Pathology, Division of Clinical Cancer Epidemiology, Karolinska Hospital, Stockholm, Sweden
Mohamed Farouk Allam
Affiliation:
Department of Preventive Medicine and Public Health, Faculty of Medicine, University of Cordoba, Cordoba, Spain
Margaret R. Karagas
Affiliation:
Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
Carlo La Vecchia
Affiliation:
Department of Clinical Medicine and Community Health, University of Milan, Milan, Italy
Stefano Porru
Affiliation:
Department of Diagnostics and Public Health, Section of Occupational Health, University of Verona, Verona, Italy; University Research Center ‘Integrated Models for Prevention and Protection in Environmental and Occupational Health’ MISTRAL, University of Verona, Milano Bicocca and Brescia, Italy
Angela Carta
Affiliation:
University Research Center ‘Integrated Models for Prevention and Protection in Environmental and Occupational Health’ MISTRAL, University of Verona, Milano Bicocca and Brescia, Italy; Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy
Klaus Golka
Affiliation:
Leibniz Research Centre for Working Environment and Human Factors at TU Dortmund, Dortmund, Germany
Kenneth C. Johnson
Affiliation:
Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, ON, Canada
Simone Benhamou
Affiliation:
INSERM U946, Variabilite Genetique et Maladies Humaines, Fondation Jean Dausset/CEPH, Paris, France
Zuo-Feng Zhang
Affiliation:
Departments of Epidemiology, UCLA Center for Environmental Genomics, Fielding School of Public Health, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
Cristina Bosetti
Affiliation:
Department of Oncology, Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Milan, Italy
Jack A. Taylor
Affiliation:
Epidemiology Branch, and Epigenetic and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC, USA
Elisabete Weiderpass
Affiliation:
International Agency for Research on Cancer (IARC), World Health Organization, Lyon, France
Eric J. Grant
Affiliation:
Department of Epidemiology, Radiation Effects Research Foundation, Hiroshima, Japan
Emily White
Affiliation:
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Jerry Polesel
Affiliation:
Unit of Cancer Epidemiology, Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, Aviano, Italy
Maurice P. A. Zeegers
Affiliation:
CAPHRI School for Public Health and Primary Care, University of Maastricht, Maastricht, The Netherlands; School of Cancer Sciences, University of Birmingham, Birmingham, UK
*Corresponding author: Anke Wesselius, email anke.wesselius@maastrichtuniversity.nl

Abstract

At present, analysis of diet and bladder cancer (BC) is mostly based on the intake of individual foods. Examining food combinations offers a way to handle the complexity and unpredictability of the diet and aims to overcome the limitations of studying nutrients and foods in isolation. This article aims to demonstrate the usability of supervised data mining methods to extract the food groups related to BC. In order to derive key food groups associated with BC risk, we applied the data mining technique C5.0 with 10-fold cross-validation in the BLadder cancer Epidemiology and Nutritional Determinants study, including data from eighteen case–control and one nested case–cohort study, comprising 8320 BC cases out of 31 551 participants. Dietary data, on the eleven main food groups of the Eurocode 2 Core classification codebook, and relevant non-diet data (i.e. sex, age and smoking status) were available. Five key food groups were extracted as associated with BC risk; in order of importance, these were beverages (non-milk); grains and grain products; vegetables and vegetable products; fats, oils and their products; and meats and meat products. Since these food groups correspond to previously proposed BC-related dietary factors, data mining seems to be a promising technique in the field of nutritional epidemiology and deserves further examination.

Information

Type
Full Papers
Copyright
© The Author(s), 2020

Table 1. Baseline characteristics and food group information from the BLadder cancer Epidemiology and Nutritional Determinants (BLEND) data set* (Numbers and percentages; mean values and standard deviations)

Fig. 1. Example of a decision tree. There are three individual variables, A, B and C, on which the tree splits. Variable A has an average ranking of 1 because it is the root node and appears only once. Variable B has an average ranking of 2·5, since it appears twice, once on the second and once on the third rank. Variable C has an average ranking of 2, since it is present only once and the tree splits on it after it split on A.
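
The average-ranking calculation described in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the tuple-based tree encoding, the function names, and the exact tree shape (placing the third-rank split on B beneath C) are assumptions chosen to match the caption.

```python
from collections import defaultdict

# A split node is (variable_name, [child_split_nodes]); leaf nodes are omitted.
# Hypothetical tree shape consistent with the caption: A is the root,
# B and C are split on at rank 2, and B is split on again at rank 3.
tree = ("A", [
    ("B", []),
    ("C", [("B", [])]),
])


def collect_ranks(node, depth=1, ranks=None):
    """Record the rank (depth, root = 1) of every split on each variable."""
    if ranks is None:
        ranks = defaultdict(list)
    name, children = node
    ranks[name].append(depth)
    for child in children:
        collect_ranks(child, depth + 1, ranks)
    return ranks


def average_rankings(root):
    """Average the recorded ranks per variable."""
    ranks = collect_ranks(root)
    return {name: sum(r) / len(r) for name, r in ranks.items()}


print(average_rankings(tree))
```

With this tree, A is recorded at rank 1, B at ranks 2 and 3, and C at rank 2, which reproduces the averages of 1, 2·5 and 2 given in the caption.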

Fig. 2. Importance values of input variables after C5.0 in the BLEND data set. A: milk and dairy products; B: eggs and egg products; C: meats and meat products; D: fishes and fish products; E: fats, oils and their products; F: grains and grain products; G: pulses, seeds, kernels, nuts and their products; H: vegetables and vegetable products; I: fruits and fruit products; J: sugar and sugar products; K: beverages (non-milk). The importance values range from 0 to 100 %, where 0 % indicates ‘unimportant’ and 100 % indicates ‘extremely important’.

Table 2. Classification rules derived from C5.0 ‘Ruleset’ in the BLadder cancer Epidemiology and Nutritional Determinants (BLEND) data set* (Percentages)

Supplementary material: File

Yu et al. supplementary material
