
Integrating large language models in biostatistical workflows for clinical and translational research

Published online by Cambridge University Press: 30 May 2025

Steven C. Grambow*
Affiliation:
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
Manisha Desai
Affiliation:
Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Kevin P. Weinfurt
Affiliation:
Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
Christopher J. Lindsell
Affiliation:
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
Michael J. Pencina
Affiliation:
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
Lacey Rende
Affiliation:
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
Gina-Maria Pomann
Affiliation:
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
*Corresponding author: S.C. Grambow; Email: steven.grambow@duke.edu

Abstract

Introduction:

Biostatisticians increasingly use large language models (LLMs) to enhance efficiency, yet practical guidance on responsible integration is limited. This study explores current LLM usage, challenges, and training needs to support biostatisticians.

Methods:

A cross-sectional survey was conducted across three biostatistics units at two academic medical centers. The survey assessed LLM usage across three key professional activities: communication and leadership, clinical and domain knowledge, and quantitative expertise. Responses were analyzed using descriptive statistics, while free-text responses underwent thematic analysis.

Results:

Of 208 eligible biostatisticians (162 staff and 46 faculty), 69 (33.2%) responded. Among them, 44 (63.8%) reported using LLMs; of the 43 who answered the frequency question, 20 (46.5%) used them daily and 16 (37.2%) weekly. LLMs improved productivity in coding, writing, and literature review; however, 29 of 41 respondents (70.7%) reported significant errors, including incorrect code, statistical misinterpretations, and hallucinated functions. Key verification strategies included expertise, external validation, debugging, and manual inspection. Among 58 respondents providing training feedback, 44 (75.9%) requested case studies, 40 (69.0%) sought interactive tutorials, and 37 (63.8%) desired structured training.
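The percentages reported above follow directly from the stated counts and denominators. The short sketch below recomputes them as a consistency check; it is not the authors' analysis code. The dictionary labels and the helper name `pct` are illustrative assumptions, and only the counts, denominators, and stated percentages are taken from the Results text.

```python
# Recompute the percentages stated in the Results paragraph.
# Labels and the helper name are hypothetical; the numbers come from the text.

def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# label: (count, denominator, percentage as stated in the text)
reported = {
    "responded": (69, 208, 33.2),
    "use LLMs": (44, 69, 63.8),
    "daily use": (20, 43, 46.5),
    "weekly use": (16, 43, 37.2),
    "reported significant errors": (29, 41, 70.7),
    "requested case studies": (44, 58, 75.9),
    "sought interactive tutorials": (40, 58, 69.0),
    "desired structured training": (37, 58, 63.8),
}

# Staff plus faculty should equal the number of eligible biostatisticians.
eligible_total = 162 + 46

for label, (count, denom, stated) in reported.items():
    computed = pct(count, denom)
    status = "matches" if computed == stated else "differs"
    print(f"{label}: {count}/{denom} = {computed:.1f}% ({status} stated {stated:.1f}%)")
```

Note that the denominators differ by question (69, 43, 41, and 58), because not every respondent answered every item, so each percentage is only interpretable alongside its own denominator.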

Conclusions:

LLM usage is notable among respondents at two academic medical centers, though response patterns likely reflect early adopters. While LLMs enhance productivity, challenges like errors and reliability concerns highlight the need for verification strategies and systematic validation. The strong interest in training underscores the need for structured guidance. As an initial step, we propose eight core principles for responsible LLM integration, offering a preliminary framework for structured usage, validation, and ethical considerations.

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Association for Clinical and Translational Science

Table 1. Participant demographics and large language model (LLM) usage characteristics


Figure 1. Eight guiding principles for responsible large language model (LLM) use in biostatistical workflows. These principles were developed as a synthesis of findings from our survey – particularly reported usage barriers, verification strategies, and ethical concerns – alongside a review of responsible AI literature and our professional experience as early adopters in academic biostatistics. This framework is not a direct representation of survey frequencies but is instead intended to offer early guidance on best practices. Issues such as federal grant policy restrictions and risks of intellectual property exposure through public application programming interfaces (APIs) fall within the scope of principles such as ethical considerations, transparency, and multiple tool integration.

Supplementary material: File

Grambow et al. supplementary material

File 1.2 MB