
A Generalizable, Data-Driven Approach to Predict Daily Risk of Clostridium difficile Infection at Two Large Academic Health Centers

Published online by Cambridge University Press: 26 March 2018

Jeeheh Oh
Affiliation:
Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan
Maggie Makar
Affiliation:
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, Massachusetts
Christopher Fusco
Affiliation:
Information Systems, Partners HealthCare, Boston, Massachusetts
Robert McCaffrey
Affiliation:
Information Systems, Partners HealthCare, Boston, Massachusetts
Krishna Rao
Affiliation:
Infectious Diseases Division, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan
Erin E. Ryan
Affiliation:
Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts; Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts
Laraine Washer
Affiliation:
Infectious Diseases Division, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan; Department of Infection Prevention and Epidemiology, Michigan Medicine, Ann Arbor, Michigan
Lauren R. West
Affiliation:
Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts; Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts
Vincent B. Young
Affiliation:
Infectious Diseases Division, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan; Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan
John Guttag
Affiliation:
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, Massachusetts
David C. Hooper
Affiliation:
Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts; Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
Erica S. Shenoy*
Affiliation:
Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts; Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts; Medical Practice Evaluation Center, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
Jenna Wiens*
Affiliation:
Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan
*Address correspondence to Jenna Wiens, PhD, 2260 Hayward Street, Ann Arbor, MI 48109 (wiensj@umich.edu) or Erica S. Shenoy, MD, PhD, 55 Fruit Street, Bulfinch 334, Boston, MA (eshenoy@partners.org).

Abstract

OBJECTIVE

An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. To date, research has focused on developing risk prediction models for CDI that work well across institutions. However, this one-size-fits-all approach ignores important hospital-specific factors. We focus on a generalizable method for building facility-specific models. We demonstrate the applicability of the approach using electronic health records (EHR) from the University of Michigan Hospitals (UM) and the Massachusetts General Hospital (MGH).

METHODS

We utilized EHR data from 191,014 adult admissions to UM and 65,718 adult admissions to MGH. We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 4,836 features from patients at UM and 1,837 from patients at MGH. We used L2-regularized logistic regression to learn the models, and we measured the discriminative performance of the models on held-out data from each hospital.
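As an illustration of the model family named above, here is a minimal sketch of L2-regularized logistic regression fit by batch gradient descent in plain Python. The function names (`fit_l2_logistic`, `predict_proba`), the solver, the learning rate, the penalty strength `lam`, and the choice to leave the intercept unpenalized are all assumptions for illustration; the abstract does not state which solver or hyperparameters the authors used.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l2_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float = 0.01, lr: float = 0.1,
                    epochs: int = 2000) -> List[float]:
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    Returns d feature weights followed by the intercept (last element).
    The intercept is not penalized (an assumption of this sketch).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # zip stops after d terms, so w[d] (intercept) is added separately.
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[d]
            err = sigmoid(z) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + lam * w[j])  # L2 shrinkage term
        w[d] -= lr * grad[d] / n
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # Predicted probability of the positive class for one feature vector.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    weak = fit_l2_logistic(X, y, lam=0.01)
    strong = fit_l2_logistic(X, y, lam=1.0)
    print(weak[0], strong[0])  # stronger penalty gives a smaller weight
```

The L2 penalty shrinks weights toward zero, which matters when, as here, the feature count runs into the thousands; in practice a library solver and a tuned penalty would replace this toy loop.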

RESULTS

Using the UM and MGH test data, the models achieved area under the receiver operating characteristic curve (AUROC) values of 0.82 (95% confidence interval [CI], 0.80–0.84) and 0.75 (95% CI, 0.73–0.78), respectively. Some predictive factors were shared between the 2 models, but many of the top predictive factors differed between facilities.
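An AUROC can be read as the probability that a randomly chosen positive admission receives a higher risk score than a randomly chosen negative one. The sketch below computes it that way and attaches a percentile-bootstrap interval. The bootstrap is an assumption: the abstract does not say how the reported 95% CIs were obtained, and the function names and defaults here are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling admissions with replacement."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [y[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples missing a class
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, s))  # 0.75
```

The pairwise loop is quadratic and only suited to small examples; for hundreds of thousands of admission-days a rank-based implementation would be used instead.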

CONCLUSION

A data-driven approach to building models for estimating daily patient risk for CDI was used to build institution-specific models at 2 large hospitals with different patient populations and EHR systems. In contrast to traditional approaches that focus on developing models that apply across hospitals, our generalizable approach yields risk-stratification models tailored to an institution. These hospital-specific models allow for earlier and more accurate identification of high-risk patients and better targeting of infection prevention strategies.

Infect Control Hosp Epidemiol 2018;39:425–433

Type: Original Articles

Copyright © 2018 by The Society for Healthcare Epidemiology of America. All rights reserved.

FIGURE 1 Inclusion and exclusion criteria and demographics of study populations. The inclusion and exclusion criteria for the study population at each institution are shown, along with the demographics of the final study populations. The inclusion period, length-of-stay requirements, and visit type differed slightly between the 2 study populations. The same exclusion criteria were applied to both study populations with regard to a history of CDI within 14 days prior to admission and a positive CDI test within 2 calendar days of admission. The final study populations comprised 191,014 and 65,718 adult inpatient encounters at UM and MGH, respectively.
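The two shared exclusion rules can be expressed as a simple eligibility filter. This is a sketch under stated assumptions: the `Admission` record and its fields are hypothetical (real EHR extracts will differ), "within 14 days prior" is taken to mean 1 to 14 days before the admission date, and "within 2 calendar days of admission" is taken to mean the admission date itself (calendar day 1) or the following day. The caption does not spell out these boundary conventions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Admission:
    # Hypothetical schema for illustration only.
    admit_date: date
    prior_cdi_dates: List[date] = field(default_factory=list)
    first_positive_cdi: Optional[date] = None


def is_eligible(a: Admission, history_days: int = 14,
                early_days: int = 2) -> bool:
    """Apply the two shared exclusion rules from Figure 1 (boundary
    conventions are assumptions of this sketch)."""
    # Rule 1: exclude a CDI episode in the history_days before admission.
    for d in a.prior_cdi_dates:
        if 0 < (a.admit_date - d).days <= history_days:
            return False
    # Rule 2: exclude a positive test in the first early_days calendar days,
    # counting the admission date as calendar day 1.
    if a.first_positive_cdi is not None:
        offset = (a.first_positive_cdi - a.admit_date).days
        if 0 <= offset < early_days:
            return False
    return True


if __name__ == "__main__":
    a = Admission(date(2015, 1, 20), first_positive_cdi=date(2015, 1, 25))
    print(is_eligible(a))  # True: positive on calendar day 6
```

Keeping the windows as parameters makes it easy to match whichever boundary convention a given institution's protocol specifies.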


TABLE 1 Selected Characteristics of Study Cohorts


FIGURE 2 Discriminative performance of the institution-specific classifiers on their respective held-out test sets. The receiver operating characteristic (ROC) curves illustrate the tradeoff between the false-positive rate (1 − specificity) and the true-positive rate (sensitivity). Both classifiers achieve good discriminative performance as measured by the area under the ROC curve (AUROC): 0.82 at UM and 0.75 at MGH.
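The tradeoff plotted in an ROC curve can be traced by sweeping a decision threshold from the highest score downward and recording the false-positive and true-positive rates at each step. The sketch below does this in plain Python (function names are illustrative, not from the paper) and integrates the curve with the trapezoidal rule, which reproduces the rank-based AUROC, with tied scores handled as a diagonal segment.

```python
from typing import List, Sequence, Tuple


def roc_points(y: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs as the threshold drops through each distinct score."""
    P = sum(1 for t in y if t == 1)
    N = len(y) - P
    if P == 0 or N == 0:
        raise ValueError("need both classes")
    pairs = sorted(zip(scores, y), key=lambda p: p[0], reverse=True)
    pts = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        pts.append((fp / N, tp / P))  # (1 - specificity, sensitivity)
    return pts


def trapezoid_auc(pts: Sequence[Tuple[float, float]]) -> float:
    # Area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(pts)
    print(trapezoid_auc(pts))  # 0.75
```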


FIGURE 3 Measuring model calibration. Predictions are grouped into quintiles of predicted risk and plotted against the observed CDI incidence within each quintile. Points closer to the "y = x" line indicate better calibration. The classifiers for both institutions appear to be well calibrated, which is also reflected in their low Brier scores (0.01 for both). The Brier score measures the accuracy of probabilistic predictions and ranges from 0 to 1, where 0 represents perfect predictions. The calibration plot for UM is shown on the left, with MGH on the right.
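A minimal sketch of the two calibration summaries described in the caption: the Brier score (mean squared difference between predicted probability and the 0/1 outcome) and a quintile table pairing the mean predicted risk in each bin with the observed event rate. Using the mean predicted risk as each bin's x-coordinate and splitting into equal-count bins are assumptions; the caption does not specify them. One design note: at an incidence near 1%, a model that predicts the base rate for everyone already scores about 0.0099, so the Brier score is best read alongside the calibration plot rather than on its own.

```python
from typing import List, Sequence, Tuple


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    # Mean squared error between predicted probability and outcome.
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_by_quantile(y: Sequence[int], p: Sequence[float],
                            n_bins: int = 5) -> List[Tuple[float, float]]:
    """Sort by predicted risk, split into n_bins near-equal groups, and
    return (mean predicted risk, observed event rate) for each group."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    out = []
    for b in range(n_bins):
        grp = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not grp:
            continue
        mean_p = sum(p[i] for i in grp) / len(grp)
        rate = sum(y[i] for i in grp) / len(grp)
        out.append((mean_p, rate))
    return out


if __name__ == "__main__":
    # Base-rate predictor at 1% prevalence.
    y = [1] + [0] * 99
    print(brier_score(y, [0.01] * 100))  # about 0.0099
```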


FIGURE 4 Confusion matrices of the institution-specific classifiers on their respective held-out test sets. Selecting a decision threshold based on the 95th percentile yields classifiers that achieve very good specificity (95.2% at both institutions) and relatively good positive predictive values (PPVs) of 5.6% and 4.4% at UM and MGH, respectively. For perspective, the baseline PPVs (ie, the fraction of positive cases) at UM and MGH are 1.00% and 0.74%, respectively. Thus, both classifiers achieve approximately 6 times the baseline PPV.
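The caption's metrics follow directly from a 2 × 2 confusion matrix. The sketch below flags scores strictly above a nearest-rank 95th-percentile threshold and reports specificity, PPV, the baseline PPV (prevalence, ie, the PPV of flagging everyone), and their ratio. Which scores the percentile is taken over (training or test predictions) and the percentile convention are assumptions, as are the function names. For the reported figures, the ratio works out to 5.6 / 1.00 = 5.6 at UM and 4.4 / 0.74 ≈ 5.9 at MGH, consistent with "approximately 6 times."

```python
import math
from typing import Dict, Sequence


def percentile_threshold(scores: Sequence[float], q: float = 95.0) -> float:
    """Nearest-rank q-th percentile (one of several common conventions)."""
    s = sorted(scores)
    k = max(1, math.ceil(q * len(s) / 100))
    return s[k - 1]


def confusion_at(y: Sequence[int], scores: Sequence[float],
                 thr: float) -> Dict[str, int]:
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, s in zip(y, scores):
        flagged = s > thr  # flag only scores strictly above the threshold
        if flagged and t == 1:
            cm["tp"] += 1
        elif flagged:
            cm["fp"] += 1
        elif t == 1:
            cm["fn"] += 1
        else:
            cm["tn"] += 1
    return cm


def summarize(cm: Dict[str, int]) -> Dict[str, float]:
    n = sum(cm.values())
    ppv = cm["tp"] / (cm["tp"] + cm["fp"])
    baseline = (cm["tp"] + cm["fn"]) / n  # prevalence
    return {
        "specificity": cm["tn"] / (cm["tn"] + cm["fp"]),
        "sensitivity": cm["tp"] / (cm["tp"] + cm["fn"]),
        "ppv": ppv,
        "baseline_ppv": baseline,
        "lift": ppv / baseline,
    }


if __name__ == "__main__":
    scores = [i / 100 for i in range(100)]
    y = [1 if i >= 98 else 0 for i in range(100)]
    thr = percentile_threshold(scores)
    print(summarize(confusion_at(y, scores, thr)))
```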


FIGURE 5 Measuring how far in advance the model correctly identifies cases. Using a threshold based on the 95th percentile, we measure the time from when each positive patient first crosses that threshold to when they are clinically diagnosed with CDI. At both institutions, among patients correctly identified as positive (ie, the true positives), the model identifies half at least 5 days in advance (the black dashed line marks the median). The plot for UM is shown on the left, with MGH on the right.
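The lead-time measurement can be sketched as follows: for each CDI-positive admission, find the first hospital day whose daily risk exceeds the threshold, subtract it from the diagnosis day, and take the median over the patients who were flagged. The input layout (a list of daily risks indexed by hospital day, plus the diagnosis day) is hypothetical, and counting only scores from days strictly before diagnosis is an assumption of this sketch.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def first_crossing(daily_risk: Sequence[float], thr: float) -> Optional[int]:
    # Index of the first day whose risk exceeds thr, or None if never.
    for day, r in enumerate(daily_risk):
        if r > thr:
            return day
    return None


def lead_times(patients: Sequence[Tuple[Sequence[float], int]],
               thr: float) -> List[int]:
    """patients: (daily_risk, diagnosis_day) per CDI-positive admission.
    Only risk scores from days before diagnosis are considered."""
    leads = []
    for risk, dx_day in patients:
        c = first_crossing(risk[:dx_day], thr)
        if c is not None:  # flagged before diagnosis: a true positive
            leads.append(dx_day - c)
    return leads


if __name__ == "__main__":
    pts = [([0.1, 0.6, 0.7, 0.8, 0.9, 0.9], 5),
           ([0.2, 0.2, 0.7], 3),
           ([0.1, 0.1, 0.1], 3)]
    leads = lead_times(pts, thr=0.5)
    print(leads, median(leads))  # [4, 1] 2.5
```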


TABLE 2 Final Regression Coefficients With Positive Coefficients Conferring Risk and Negative Values Indicating Protection^a
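In a logistic regression, a coefficient β multiplies the odds of the outcome by exp(β) per unit increase in the feature, holding the other features fixed, so β > 0 (odds ratio above 1) confers risk and β < 0 (odds ratio below 1) indicates protection. The sketch below applies that reading and ranks features by coefficient. The feature names and values in the example are hypothetical placeholders, not entries from Table 2. With L2 regularization and many correlated features, coefficients are shrunk and should be read as associations, not causal effects.

```python
import math
from typing import Dict, List, Tuple


def odds_ratios(coefs: Dict[str, float]) -> Dict[str, Tuple[str, float]]:
    """Label each coefficient's direction and convert it to an odds ratio."""
    out = {}
    for name, beta in coefs.items():
        if beta > 0:
            direction = "risk"
        elif beta < 0:
            direction = "protective"
        else:
            direction = "neutral"
        out[name] = (direction, math.exp(beta))
    return out


def top_k(coefs: Dict[str, float], k: int = 3,
          protective: bool = False) -> List[str]:
    # Largest positive coefficients first, or most negative if protective.
    return sorted(coefs, key=lambda n: coefs[n], reverse=not protective)[:k]


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only.
    coefs = {"feature_a": math.log(2), "feature_b": -math.log(2),
             "feature_c": 0.0}
    print(odds_ratios(coefs))
    print(top_k(coefs, 1), top_k(coefs, 1, protective=True))
```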

Supplementary material: Oh et al. supplementary material (Table S1)
