
Automated data mining of the electronic health record for investigation of healthcare-associated outbreaks

  • Alexander J. Sundermann (a1) (a2), James K. Miller (a3), Jane W. Marsh (a1), Melissa I. Saul (a4), Kathleen A. Shutt (a1), Marissa Pacey (a1), Mustapha M. Mustapha (a1), Ashley Ayres (a2), A. William Pasculle (a5), Jieshi Chen (a3), Graham M. Snyder (a2), Artur W. Dubrawski (a3) and Lee H. Harrison (a1)...
  • Please note a correction has been issued for this article.



Identifying routes of transmission among hospitalized patients during a healthcare-associated outbreak can be tedious, particularly among patients with complex hospital stays and multiple exposures. Data mining of the electronic health record (EHR) has the potential to rapidly identify common exposures among patients suspected of being part of an outbreak.
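The idea of surfacing common exposures can be illustrated with a minimal sketch. It is not the authors' statistical method; it is a plain counting approach, and the patient identifiers, exposure names, and the function `rank_common_exposures` are all hypothetical. It ranks each exposure by how many suspected outbreak patients share it.

```python
from collections import Counter


def rank_common_exposures(exposures_by_patient):
    """Rank exposures by the number of distinct patients who share them.

    exposures_by_patient maps a patient ID to a list of exposure labels
    (units, procedures, devices) pulled from an EHR extract.
    Returns (exposure, patient_count, fraction_of_patients) tuples for
    exposures shared by at least two patients, most common first.
    """
    counts = Counter()
    for exposures in exposures_by_patient.values():
        counts.update(set(exposures))  # count each exposure once per patient
    total = len(exposures_by_patient)
    ranked = [(exp, n, n / total) for exp, n in counts.items() if n >= 2]
    # Sort by descending patient count, then alphabetically for stable output.
    return sorted(ranked, key=lambda row: (-row[1], row[0]))


# Hypothetical records for three suspected outbreak patients (assumed data).
records = {
    "patient_A": ["unit_4F", "endoscopy_suite", "ct_scanner"],
    "patient_B": ["unit_7G", "endoscopy_suite"],
    "patient_C": ["unit_4F", "endoscopy_suite", "wound_care"],
}

ranked = rank_common_exposures(records)
for exposure, n_patients, fraction in ranked:
    print(f"{exposure}: {n_patients} patients ({fraction:.0%})")
```

A real system would also need to account for how common each exposure is among patients outside the outbreak, since a shared exposure is only informative if it is unusual in the background population.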


We retrospectively analyzed 9 hospital outbreaks that occurred during 2011–2016 and that had previously been characterized both according to transmission route and by molecular characterization of the bacterial isolates. We determined (1) the ability of data mining of the EHR to identify the correct route of transmission, (2) how early the correct route was identified during the timeline of the outbreak, and (3) how many cases in the outbreaks could have been prevented had the system been running in real time.


Correct routes were identified at the second patient for all outbreaks except one, which involved >1 transmission route and for which the correct routes were identified at the eighth patient. Assuming an effective intervention had been initiated within 7 or 14 days of identifying the transmission route, up to 40 (78%) or 34 (66%) of the potentially preventable infections, respectively, could have been prevented had data mining been running in real time.
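The preventable-infection estimate depends on when the route is identified and how quickly an intervention follows. The sketch below shows one way such a count could be computed for a single outbreak. The rule used here (a case counts as preventable if it occurs after the identification date plus the intervention delay) is an assumption, not the authors' stated definition, and the dates and the function `count_preventable` are hypothetical.

```python
from datetime import date, timedelta


def count_preventable(case_dates, route_identified_on, delay_days):
    """Count cases that occur after an intervention would have taken effect.

    Assumption: the intervention is fully effective starting delay_days
    after the transmission route is identified, so any later case is
    treated as preventable.
    """
    cutoff = route_identified_on + timedelta(days=delay_days)
    return sum(1 for d in case_dates if d > cutoff)


# Hypothetical outbreak timeline (assumed data, not from the study).
case_dates = [
    date(2015, 1, 1),
    date(2015, 1, 10),
    date(2015, 1, 15),
    date(2015, 1, 20),
    date(2015, 2, 5),
    date(2015, 3, 1),
]
# Suppose the route is identified when the second patient is recognized.
route_identified_on = case_dates[1]

preventable_7 = count_preventable(case_dates, route_identified_on, 7)
preventable_14 = count_preventable(case_dates, route_identified_on, 14)
print(preventable_7, preventable_14)
```

Summing such counts across outbreaks, once for each delay, would give totals comparable in form to the 7-day and 14-day figures reported above.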


Data mining of the EHR was accurate for identifying routes of transmission among patients who were part of the outbreak. Prospective validation of this approach using routine whole-genome sequencing and data mining of the EHR for both outbreak detection and route attribution is ongoing.


Corresponding author

Author for correspondence: Lee H. Harrison




