
Whole-genome sequencing surveillance and machine learning for healthcare outbreak detection and investigation: A systematic review and summary

Published online by Cambridge University Press: 13 June 2022

Alexander J. Sundermann
Affiliation:
Microbial Genomic Epidemiology Laboratory, Center for Genomic Epidemiology, University of Pittsburgh, Pittsburgh, Pennsylvania; Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
Jieshi Chen
Affiliation:
Auton Lab, Carnegie Mellon University, Pittsburgh, Pennsylvania
James K. Miller
Affiliation:
Auton Lab, Carnegie Mellon University, Pittsburgh, Pennsylvania
Elise M. Martin
Affiliation:
Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Infection Prevention and Hospital Epidemiology, UPMC Presbyterian, Pittsburgh, Pennsylvania
Graham M. Snyder
Affiliation:
Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Infection Prevention and Hospital Epidemiology, UPMC Presbyterian, Pittsburgh, Pennsylvania
Daria Van Tyne
Affiliation:
Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
Jane W. Marsh
Affiliation:
Microbial Genomic Epidemiology Laboratory, Center for Genomic Epidemiology, University of Pittsburgh, Pittsburgh, Pennsylvania; Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
Artur Dubrawski
Affiliation:
Auton Lab, Carnegie Mellon University, Pittsburgh, Pennsylvania
Lee H. Harrison*
Affiliation:
Microbial Genomic Epidemiology Laboratory, Center for Genomic Epidemiology, University of Pittsburgh, Pittsburgh, Pennsylvania; Division of Infectious Diseases, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
*Author for correspondence: Lee H. Harrison, University of Pittsburgh, A530 Crabtree Hall, 130 Desoto Street, Pittsburgh, PA 15261. E-mail: lharriso@pitt.edu

Abstract

Background:

Whole-genome sequencing (WGS) has traditionally been used in infection prevention to confirm or refute the presence of an outbreak after it has occurred. As the cost of WGS has declined, an increasing number of institutions have adopted WGS-based surveillance. Machine learning and statistical modeling have also been used to supplement infection prevention practice. We systematically reviewed the use of WGS surveillance and machine learning to detect and investigate outbreaks in healthcare settings.

Methods:

We searched PubMed through March 15, 2021, using separate sets of search terms for WGS surveillance and for machine-learning technologies in infection prevention.

Results:

Of 767 studies returned by the WGS search terms, 42 articles were included for review. Only 2 studies (4.8%) were performed in real time, and 39 (92.9%) studied only 1 pathogen. Nearly all studies (n = 41, 97.6%) found genetic relatedness among some of the isolates collected. Across all studies, 525 outbreaks were detected among 2,837 related isolates (average, 5.4 isolates per outbreak). In addition, 35 studies (83.3%) used only geotemporal clustering to identify outbreak transmission routes. Of 21 studies returned by the machine-learning search terms, 4 were included for review. In each of these studies, machine learning aided outbreak investigations by complementing methods for gathering epidemiologic data and by automating the identification of transmission pathways.
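The percentages and the per-outbreak average in the Results follow directly from the reported counts. The short sketch below recomputes them from the numbers stated in the abstract only; the variable names and the one-decimal rounding are our own illustrative choices, not taken from the review.

```python
# Counts reported in the Results paragraph (denominator: 42 included WGS studies).
WGS_STUDIES = 42
counts = {
    "real time": 2,
    "single pathogen": 39,
    "genetic relatedness found": 41,
    "geotemporal clustering only": 35,
}


def pct(n: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place as in the abstract."""
    return round(100 * n / total, 1)


percentages = {label: pct(n, WGS_STUDIES) for label, n in counts.items()}

# Average outbreak size = related isolates / outbreaks detected.
RELATED_ISOLATES = 2837
OUTBREAKS = 525
avg_isolates_per_outbreak = round(RELATED_ISOLATES / OUTBREAKS, 1)

if __name__ == "__main__":
    for label, value in percentages.items():
        print(f"{label}: {value}%")
    print(f"average isolates per outbreak: {avg_isolates_per_outbreak}")
```

Running it reproduces the 4.8%, 92.9%, 97.6%, and 83.3% figures and the average of 5.4 isolates per outbreak.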

Conclusions:

WGS surveillance is an emerging method that can enhance outbreak detection. Machine learning may help identify novel routes of pathogen transmission. Broader incorporation of WGS surveillance into infection prevention practice has the potential to transform the detection and control of healthcare outbreaks.

Information

Type
Review
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction in any medium, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of The Society for Healthcare Epidemiology of America

Fig. 1. Search terms in PubMed for whole-genome sequencing surveillance.


Fig. 2. Search terms in PubMed for machine learning and modeling.


Fig. 3. Summary by year of 42 whole-genome sequencing (WGS) surveillance studies in PubMed through March 15, 2021. *2021 data cover only January 1 through March 15, 2021.


Table 1. Studies by Date, Organism, and Outbreaks Detected Utilizing WGS Surveillance


Fig. 4. Distribution of single-nucleotide polymorphism (SNP) thresholds used to define genetic relatedness in 42 studies.
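To illustrate how a SNP threshold turns pairwise genetic distances into candidate outbreaks, here is a minimal sketch assuming single-linkage clustering: two isolates are linked when their pairwise SNP distance is at or below the threshold, and any connected group of 2 or more isolates is flagged as a possible outbreak. Single linkage is only one plausible choice; the reviewed studies may use other clustering rules. The isolate IDs, distances, and thresholds below are hypothetical, and `snp_clusters` is an illustrative helper, not code from any included study.

```python
from itertools import combinations

# Hypothetical pairwise SNP distances between five isolates (illustration only).
ISOLATES = ["A", "B", "C", "D", "E"]
DISTANCES = {
    ("A", "B"): 3, ("A", "C"): 11, ("A", "D"): 210, ("A", "E"): 205,
    ("B", "C"): 8, ("B", "D"): 208, ("B", "E"): 202,
    ("C", "D"): 215, ("C", "E"): 199,
    ("D", "E"): 40,
}


def snp_clusters(distances: dict[tuple[str, str], int],
                 isolates: list[str], threshold: int) -> list[set[str]]:
    """Single-linkage clusters: isolates are linked when SNP distance <= threshold.

    Returns only clusters with at least 2 isolates (candidate outbreaks);
    isolates left on their own are treated as unrelated.
    """
    parent = {i: i for i in isolates}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in combinations(isolates, 2):
        d = distances.get((a, b), distances.get((b, a)))
        if d is not None and d <= threshold:  # missing pairs are never linked
            union(a, b)

    groups: dict[str, set[str]] = {}
    for i in isolates:
        groups.setdefault(find(i), set()).add(i)
    return [g for g in groups.values() if len(g) >= 2]


if __name__ == "__main__":
    print(snp_clusters(DISTANCES, ISOLATES, threshold=10))
    print(snp_clusters(DISTANCES, ISOLATES, threshold=50))
```

With a threshold of 10, isolates A, B, and C form one cluster even though A and C differ by 11 SNPs, because both are within 10 SNPs of B. This chaining is a known property of single linkage. Raising the threshold to 50 also groups D and E, which shows why the choice of threshold summarized in the figure directly affects how many outbreaks a surveillance program reports.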


Table 2. Studies Utilizing Machine Learning or Modeling to Detect Outbreaks or Transmission

Supplementary material: File

Sundermann et al. supplementary material
