
Eight practices for data management to enable team data science

Published online by Cambridge University Press: 23 June 2020

Andrew McDavid
Affiliation:
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA
Anthony M. Corbett
Affiliation:
Clinical and Translational Science Institute, University of Rochester, Rochester, NY, USA
Jennifer L. Dutra
Affiliation:
Clinical and Translational Science Institute, University of Rochester, Rochester, NY, USA
Andrew G. Straw
Affiliation:
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA
David J. Topham
Affiliation:
Department of Microbiology and Immunology, University of Rochester, Rochester, NY, USA
Gloria S. Pryhuber
Affiliation:
Department of Pediatrics, University of Rochester, Rochester, NY, USA
Mary T. Caserta
Affiliation:
Department of Pediatrics, University of Rochester, Rochester, NY, USA
Steven R. Gill
Affiliation:
Department of Microbiology and Immunology, University of Rochester, Rochester, NY, USA
Kristin M. Scheible
Affiliation:
Department of Pediatrics, University of Rochester, Rochester, NY, USA
Jeanne Holden-Wiltse*
Affiliation:
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA; Clinical and Translational Science Institute, University of Rochester, Rochester, NY, USA
* Address for correspondence: J. Holden-Wiltse, MPH, MBA, University of Rochester Medical Center, School of Medicine and Dentistry, 601 Elmwood Ave, Box 630, Rochester, NY, 14642, USA. Email: Jeanne_Wiltse@URMC.Rochester.edu

Abstract

Introduction:

In clinical and translational research, data science is often, and fortuitously, integrated with data collection. This contrasts with the typical position of data scientists in other settings, where they are isolated from data collectors. Because of this integration, using data science techniques effectively to resolve translational questions requires innovation in how these data are organized and managed.

Methods:

We propose an operational framework that respects this important difference in how research teams are organized. To maximize the accuracy and speed of the clinical and translational data science enterprise under this framework, we define a set of eight best practices for data management.

Results:

In our own work at the University of Rochester, we have worked to implement these practices in a customized version of the open-source LabKey platform for integrated data management and collaboration. We have applied this platform to cohorts that longitudinally track multidomain data from over 3000 subjects.

Conclusions:

We argue that this approach has made analytical datasets more readily available and lowered the barrier to interdisciplinary collaboration, enabling a team-based data science that is unique to the clinical and translational setting.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Association for Clinical and Translational Science 2020

Fig. 1. A data science workflow in clinical and translational teams. The lifecycle of a Team Data Science project begins with data collection and proceeds in a nonlinear and iterative fashion until conclusions are communicated and data and models are available for reuse (1a). Study personnel will interact in varying degrees with different aspects of the data science lifecycle (1b), while a data scientist visits all phases. Bolded interactions highlight a primary use of a role, while dashed lines indicate ancillary uses.

Table 1. Eight practices to implement team data science

Fig. 2. A high-level overview of how study personnel interact with the BLIS data management platform. Clinicians, technicians, and experimentalists generate data for different aspects of the study. Data engineers implement the centralized study portal using the BLIS data management platform, with responsibility to connect all elements of the workflow and interact continuously with all study team members.