An analysis of the Clinical and Translational Science Award pilot project portfolio using data from Research Performance Progress Reports

Published online by Cambridge University Press: 18 August 2022

Sean A. Klein*
Affiliation:
Office of Science and Data Policy, Office of the Assistant Secretary for Planning and Evaluation, US Department of Health and Human Services, Washington, DC, USA
Michael Baiocchi
Affiliation:
Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
Jordan Rodu
Affiliation:
Department of Statistics, University of Virginia, Charlottesville, VA, USA
Heather Baker
Affiliation:
Division of Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
Erica Rosemond
Affiliation:
Division of Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
Jamie Mihoko Doyle
Affiliation:
Division of Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
* Address for correspondence: S.A. Klein, PhD, Assistant Secretary for Planning and Evaluation, Room 440F1, U.S. Department of Health and Human Services, 200 Independence Avenue SW, Washington, DC 20201, USA. Email: sean.klein@hhs.gov

Abstract

Introduction:

Pilot projects (“pilots”) are important for testing hypotheses in advance of investing more funds for full research studies. For some programs, such as Clinical and Translational Science Awards (CTSAs) supported by the National Center for Advancing Translational Sciences, pilots also make up a significant proportion of the research projects conducted with direct CTSA support. Unfortunately, administrative data on pilots are not typically captured in accessible databases. Though data on pilots are included in Research Performance Progress Reports, they are often difficult to extract, especially for large programs like the CTSAs where more than 600 pilots may be reported across all awardees annually. Data extraction challenges preclude analyses that could provide valuable information about pilots to researchers and administrators.

Methods:

To address those challenges, we describe a script that partially automates extraction of pilot data from CTSA research progress reports. After extraction of the pilot data, we use an established machine learning (ML) model to determine the scientific content of pilots for subsequent analysis. Analysis of ML-assigned scientific categories reveals the scientific diversity of the CTSA pilot portfolio and relationships among individual pilots and institutions.

Results:

The CTSA pilots are widely distributed across a number of scientific areas. Content analysis identifies similar projects and the degree of overlap for scientific interests among hubs.

Conclusion:

Our results demonstrate that pilot data remain challenging to extract but can provide useful information for communicating with stakeholders, administering pilot portfolios, and facilitating collaboration among researchers and hubs.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of The Association for Clinical and Translational Science
Fig. 1. Ten most frequent root categories assigned to pilots supported by NCATS (TR Pilots) and grants supported by disease-focused ICs. Roots assigned to exploratory research grants (R01, R21) supported by National Institute of Allergy and Infectious Diseases (AI), National Cancer Institute (CA), and National Heart, Lung, and Blood Institute (HL) are compared against those assigned to National Center for Advancing Translational Sciences pilots (TR Pilots). Bars represent the percentage of grants in each IC’s selected portfolio that were assigned to each root. Only the ten most frequent roots (without ties) are shown for each panel; all other roots are left blank. Bars do not sum to 100 because pilots can be assigned multiple roots. Abbreviations: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health Institutes and Centers (ICs).

Fig. 2. Comparison of entropy and number of projects to describe distribution of hubs’ shares of pilots within a root. (A) A scatter plot of the number of projects assigned to each root versus the root’s entropy (N = 75). Filled and empty circles represent roots, with the filled circles identifying the four roots highlighted for additional analysis in (B), below. (B) Bar charts showing the distribution of hubs’ shares in the four roots highlighted in (A). Each bar represents a single hub’s share (as a percentage) of pilots assigned that root. Only the twenty largest shareholders (hubs) are shown for each root, and the hubs are not the same across plots (e.g., the bottom bar for 1 may not be the same hub as the bottom bar for 2, 3, or 4). Both Caregiving Research and Coronaviruses have fewer bars because fewer than twenty hubs had at least one project assigned to those roots. Plots are arranged in order of increasing numbers of projects assigned to a root.
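
The entropy plotted in Fig. 2 summarizes how evenly a root's pilots are spread across hubs. The following is a minimal sketch of that kind of calculation, not the authors' code: it assumes Shannon entropy in bits computed over each hub's share of the pilots assigned to one root. The function name, the choice of logarithm base, and the hub labels are illustrative assumptions.

```python
import math
from collections import Counter


def hub_share_entropy(hub_ids):
    """Shannon entropy (in bits) of hubs' shares of the pilots assigned to one root.

    `hub_ids` holds one entry per pilot: the hub that reported that pilot.
    Base-2 logarithms are an assumption; the caption does not state the base.
    """
    counts = Counter(hub_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        share = n / total  # this hub's share of the root's pilots
        entropy -= share * math.log2(share)
    return entropy


# Hypothetical roots: one dominated by a single hub, one spread evenly.
concentrated = ["hub_a"] * 9 + ["hub_b"]
even = ["hub_a", "hub_b", "hub_c", "hub_d"]

print(hub_share_entropy(concentrated))
print(hub_share_entropy(even))
```

Under these assumptions, a root whose pilots all come from one hub has entropy 0, and entropy grows as the pilots are divided more evenly among more hubs.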

Fig. 3. Conditional probability matrix for select root categories. A heatmap using a subset of ten roots was used to describe the NCATS pilot portfolio. Cells are shaded by the conditional probability of observing y-axis roots given assignment of x-axis roots in the pilot data (i.e., p(Y|X)) with the diagonal colored black. The matrix is asymmetric as the conditional probability of a y-axis root given an x-axis one is not necessarily equal to the reverse (p(Y|X) not necessarily equal to p(X|Y)).
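
The asymmetry described in the Fig. 3 caption can be illustrated with a small sketch. It assumes p(Y|X) is estimated as the number of pilots assigned both roots divided by the number of pilots assigned root X; the function name and the root labels in the example are hypothetical, and this is not presented as the authors' implementation.

```python
from itertools import permutations


def conditional_probabilities(pilot_roots):
    """Return {(y, x): p(y | x)} for every ordered pair of distinct roots.

    `pilot_roots` is an iterable with one collection of root labels per pilot.
    p(y | x) is estimated as count(pilots with x and y) / count(pilots with x).
    """
    root_counts = {}
    pair_counts = {}
    for roots in pilot_roots:
        unique = set(roots)
        for root in unique:
            root_counts[root] = root_counts.get(root, 0) + 1
        for y, x in permutations(unique, 2):
            pair_counts[(y, x)] = pair_counts.get((y, x), 0) + 1
    return {
        (y, x): pair_counts.get((y, x), 0) / root_counts[x]
        for x in root_counts
        for y in root_counts
        if x != y
    }


# Hypothetical pilots: "Cancer" appears in four pilots, "Genetics" in two of them.
pilots = [
    {"Cancer", "Genetics"},
    {"Cancer", "Genetics"},
    {"Cancer"},
    {"Cancer"},
]
probs = conditional_probabilities(pilots)
print(probs[("Genetics", "Cancer")])  # p(Genetics | Cancer) = 2/4
print(probs[("Cancer", "Genetics")])  # p(Cancer | Genetics) = 2/2
```

In this toy example every Genetics pilot is also a Cancer pilot, but only half of the Cancer pilots are Genetics pilots, which is why the matrix need not be symmetric.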

Fig. 4. Scientific composition of clusters generated from inter-pilot similarity values. The five most frequent roots in each of the 6 clusters identified by hierarchical clustering. Bars represent the percent of pilots within the cluster assigned that root. Plot titles indicate the cluster number. Abbreviations: Networking and Information Technology R&D (NITRD), Machine Learning and Artificial Intelligence (ML/AI).

Fig. 5. Pairwise Jaccard similarities (J_{X,Y}) from hubs’ CTSA pilot portfolios. (A) Histogram of pairwise Jaccard similarity (J_{X,Y}) values between all hubs with at least 10 unique roots (N = 54). Only one J_{X,Y} value per pair is included as J_{X,Y} = J_{Y,X}. (B) Heatmap of J_{X,Y} for a representative subset of eleven hubs. Hubs were selected to represent the full spectrum of similarity values observed. As the similarity matrix is symmetric, only the top half of the heatmap is shown with the diagonal and bottom half set to zero. (C) Comparison of the ten most frequent roots (excluding ties for concision) for the most similar hubs in (B), 2 and 7. The hub number is listed at the top of each panel with bars representing the percent of projects from that hub for the root listed on the y-axis. When no bar appears, root frequency is zero. (D) Same as (C) but for a highly dissimilar pair of hubs (3 and 5) from (B). The larger number of roots on the y-axis of (D) relative to (C) is due to hubs 3 and 5 sharing fewer roots than hubs 2 and 7. Abbreviations: Clinical and Translational Science Awards (CTSA).
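
For readers unfamiliar with the measure in Fig. 5, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below assumes each hub's portfolio is reduced to a set of root labels and that hubs below a minimum number of unique roots are dropped, as the caption describes; exactly which roots enter each hub's set is not specified here, so the function names, the `min_roots` parameter, and the example data are illustrative assumptions.

```python
from itertools import combinations


def jaccard(roots_x, roots_y):
    """Jaccard similarity |X ∩ Y| / |X ∪ Y| of two collections of root labels."""
    x, y = set(roots_x), set(roots_y)
    union = x | y
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(x & y) / len(union)


def pairwise_jaccard(hub_roots, min_roots=10):
    """Return {(hub_a, hub_b): similarity} with one entry per unordered pair.

    Hubs with fewer than `min_roots` unique roots are excluded. Because the
    measure is symmetric, each pair is stored once.
    """
    eligible = {
        hub: set(roots)
        for hub, roots in hub_roots.items()
        if len(set(roots)) >= min_roots
    }
    return {
        (a, b): jaccard(eligible[a], eligible[b])
        for a, b in combinations(sorted(eligible), 2)
    }


# Hypothetical hubs with placeholder root labels; min_roots lowered for the toy data.
hubs = {
    "hub_1": {"a", "b", "c"},
    "hub_2": {"b", "c", "d"},
    "hub_3": {"a"},
}
similarities = pairwise_jaccard(hubs, min_roots=2)
print(similarities)
```

Storing one value per unordered pair mirrors the caption's point that only one similarity per pair is needed.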

Fig. 6. Similarity distributions for NCATS pilots and disease-focused IC research grants. Distributions of Jaccard similarity (computed from the ten most frequent roots, accepting ties) are shown as box plots; outliers, defined as values more than 1.5 times the interquartile range from the first or third quartile, are drawn as points beyond the whiskers. Distributions contain similarities between only those institutions that are also hubs in the CTSA network (N = 54). Hubs with fewer than ten unique roots after aggregating their pilots are excluded. Abbreviations, as in Fig. 1: National Institute of Allergy and Infectious Diseases (AI), National Cancer Institute (CA), National Heart, Lung, and Blood Institute (HL), National Center for Advancing Translational Sciences (NCATS, TR), Clinical and Translational Science Awards (CTSA).

Supplementary material: File

Klein et al. supplementary material