
Evaluation of an ensemble-based distance statistic for clustering MLST datasets using epidemiologically defined clusters of cyclosporiasis

Published online by Cambridge University Press:  03 August 2020

Fernanda S. Nascimento
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA
Joel Barratt*
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA
Katelyn Houghton
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA
Mateusz Plucinski
Affiliation:
Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Julia Kelley
Affiliation:
Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Shannon Casillas
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA
Carolyne (Cody) Bennett
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA
Cathy Snider
Affiliation:
Texas Department of State Health Services, TX, USA
Rashmi Tuladhar
Affiliation:
Texas Department of State Health Services, TX, USA
Jenny Zhang
Affiliation:
Texas Department of State Health Services, TX, USA
Brooke Clemons
Affiliation:
New York State Department of Health, Wadsworth Center Parasitology Laboratory, NY, USA
Susan Madison-Antenucci
Affiliation:
New York State Department of Health, Wadsworth Center Parasitology Laboratory, NY, USA
Alexis Russell
Affiliation:
New York State Department of Health-Bureau of Communicable Disease Control, NY, USA
Elizabeth Cebelinski
Affiliation:
Minnesota Department of Health, MN, USA
Jisun Haan
Affiliation:
Minnesota Department of Health, MN, USA
Trisha Robinson
Affiliation:
Minnesota Department of Health, MN, USA
Michael J. Arrowood
Affiliation:
Waterborne Disease Prevention Branch, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA
Eldin Talundzic
Affiliation:
Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Richard S. Bradbury
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA
Yvonne Qvarnstrom
Affiliation:
Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA
*Author for correspondence: Joel Barratt, E-mail: jbarratt@cdc.gov, joelbarratt43@gmail.com

Abstract

Outbreaks of cyclosporiasis, a food-borne illness caused by the coccidian parasite Cyclospora cayetanensis, have increased in the USA in recent years, with approximately 2300 laboratory-confirmed cases reported in 2018. Genotyping tools are needed to inform epidemiological investigations, yet genotyping Cyclospora has proven challenging because its sexual reproductive cycle produces complex infections characterized by high genetic heterogeneity. We used targeted amplicon deep sequencing and a recently described ensemble-based distance statistic, which accommodates heterogeneous (mixed) genotypes and specimens with partial genotyping data, to genotype and cluster 648 C. cayetanensis samples submitted to CDC in 2018. The performance of the ensemble was assessed by comparing ensemble-identified genetic clusters to analogous clusters identified independently based on common food exposures. Using these epidemiologic clusters as a gold standard, the ensemble facilitated genetic clustering with 93.8% sensitivity and 99.7% specificity. Hence, we anticipate that this procedure will greatly complement epidemiologic investigations of cyclosporiasis.

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Author(s), 2020. Published by Cambridge University Press

Table 1. List of PCR primers used to amplify eight Cyclospora cayetanensis genotyping targets

Fig. 1. Cluster dendrogram generated from the ensemble matrix of pairwise distances. The ensemble matrix was clustered using Ward's clustering method to generate the dendrogram shown. A 10-cluster model was considered the most parsimonious, and branches are colour-coded according to the clusters identified using this model. Peripheral bars mark specimens from case-patients epidemiologically linked to outbreaks of cyclosporiasis identified in the USA in our study, where at least one specimen was genotyped; the colours of these bars indicate the identified epidemiologic linkages per the legend. To determine the specific location of a given specimen in this dendrogram, refer to Supplementary File S1, Appendix 2, which is a searchable pdf of the same dendrogram that includes all specimen names. The number of specimens assigned to each of the 10 genetic clusters is as follows: genetic cluster 1 (34 cases), cluster 2 (92 cases), cluster 3 (93 cases), cluster 4 (144 cases), cluster 5 (10 cases), cluster 6 (40 cases), cluster 7 (150 cases), cluster 8 (35 cases), cluster 9 (28 cases), cluster 10 (40 cases).
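
As an illustration of the clustering step named in the caption, the sketch below applies Ward's agglomerative method to a precomputed pairwise distance matrix and stops once a requested number of clusters remains. It is a minimal standard-library sketch and not the authors' pipeline: the function name `ward_clusters`, the Lance-Williams update on squared distances, and the toy matrix are all assumptions made for illustration, and Ward's criterion is strictly defined for Euclidean distances, which the ensemble matrix may only approximate.

```python
def ward_clusters(dist, k):
    """Agglomerate items with Ward's method until k clusters remain.

    dist: symmetric square matrix (list of lists) of pairwise distances.
    Returns a sorted list of clusters, each a sorted list of item indices.
    Hypothetical helper for illustration only.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")

    # Active clusters: cluster id -> member indices.
    members = {i: [i] for i in range(n)}
    # Squared distances between active clusters, keyed by (smaller id, larger id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(members) > k:
        # Pick the pair whose merge increases within-cluster variance least.
        (a, b), d_ab = min(d2.items(), key=lambda item: item[1])
        na, nb = len(members[a]), len(members[b])
        new = next_id
        next_id += 1

        # Lance-Williams update for Ward's method (on squared distances).
        for c in members:
            if c in (a, b):
                continue
            nc = len(members[c])
            d_ac = d2.pop((min(a, c), max(a, c)))
            d_bc = d2.pop((min(b, c), max(b, c)))
            d2[(c, new)] = (
                (na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab
            ) / (na + nb + nc)

        del d2[(a, b)]
        members[new] = members.pop(a) + members.pop(b)

    return sorted(sorted(group) for group in members.values())


# Toy data (not from the study): five items placed on a line.
positions = [0, 1, 10, 11, 12]
toy = [[abs(p - q) for q in positions] for p in positions]
print(ward_clusters(toy, 2))
```

In the study the analogous cut was made at 10 clusters; here `k=2` simply separates the two obvious groups in the toy matrix.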

Fig. 2. Ensemble pairwise distance matrix visualised using MicrobeTrace. To generate this network the same ensemble matrix used to construct Figure 1 (Supplementary File S2, Tab E) was filtered to a value of 0.15 using MicrobeTrace (https://github.com/CDCgov/MicrobeTrace/wiki). Nodes were colour-coded according to their epidemiological linkage, using the same colours used to denote epidemiologically-defined clusters in Figure 1.
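
The filtering described in this caption can be sketched as thresholding a distance matrix into a network. The example below is a standard-library sketch, not MicrobeTrace code: it assumes that "filtered to a value of 0.15" means a link is kept when the pairwise distance is at most 0.15, and the function names, specimen names, and distances are hypothetical.

```python
def threshold_network(names, dist, threshold=0.15):
    """Return the edges (pairs of names) whose distance is <= threshold.

    Assumption: links at or below the threshold are retained.
    """
    n = len(names)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] <= threshold
    ]


def connected_components(names, edges):
    """Group nodes into connected components of the thresholded network."""
    adjacency = {name: set() for name in names}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for name in names:
        if name in seen:
            continue
        # Depth-first traversal from each unvisited node.
        stack, group = [name], []
        seen.add(name)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(group))
    return components


# Toy data (not from the study): four hypothetical specimens.
names = ["A", "B", "C", "D"]
dist = [
    [0.00, 0.05, 0.40, 0.50],
    [0.05, 0.00, 0.45, 0.55],
    [0.40, 0.45, 0.00, 0.10],
    [0.50, 0.55, 0.10, 0.00],
]
edges = threshold_network(names, dist)
components = connected_components(names, edges)
print(edges)
print(components)
```

Each connected component corresponds to a group of nodes drawn as linked in a network view; colouring nodes by epidemiologic linkage then shows how well those groups agree with the epidemiologic clusters.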

Table 2. Concordance of epidemiologic information and genetic clustering.

Table 3. Assessment of the ensemble performance against an epidemiologic gold standard
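
For readers unfamiliar with the metrics used in this assessment, the sketch below shows how sensitivity and specificity are computed from paired gold-standard and test classifications. It is a generic illustration only: how the study mapped specimens to "expected" genetic clusters is not described on this page, so the function name, the boolean encoding, and the toy labels are assumptions, and the numbers are not the study's data.

```python
def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from paired boolean classifications.

    truth:     gold-standard labels (True = linked per the gold standard).
    predicted: test labels (True = linked per the method under evaluation).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (not from the study): 4 gold-standard positives, 6 negatives.
truth = [True] * 4 + [False] * 6
predicted = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```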

Fig. 3. Epidemiologic curve for cyclosporiasis cases (cases over time) plotted for each genetic cluster. Illness onset dates for cases of cyclosporiasis are plotted as a separate histogram for each genetic cluster. Temporal clustering of specimens from cluster 4 and cluster 7 is apparent. Some temporal clustering also seems apparent for cluster 2, which may have a bimodal distribution. The colours used to denote each genetic cluster here correspond to those used to denote genetic clusters in Figure 1.
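
A per-cluster epidemiologic curve of this kind amounts to binning onset dates separately for each genetic cluster. The sketch below shows one way to do that with the Python standard library; the weekly (ISO week) bin width, the function name, and the onset dates are assumptions for illustration and are not taken from the study.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_onset_counts(cases):
    """Count cases per ISO week, separately for each genetic cluster.

    cases: iterable of (cluster_label, onset_date) pairs.
    Returns {cluster_label: Counter({(iso_year, iso_week): n_cases})}.
    """
    counts = defaultdict(Counter)
    for cluster, onset in cases:
        iso = onset.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[cluster][(iso[0], iso[1])] += 1
    return counts


# Toy onset dates (not from the study).
cases = [
    (4, date(2018, 6, 4)),
    (4, date(2018, 6, 6)),
    (4, date(2018, 6, 12)),
    (7, date(2018, 7, 2)),
    (7, date(2018, 7, 3)),
]
counts = weekly_onset_counts(cases)

# Text rendering: one row per cluster and week, one '#' per case.
for cluster in sorted(counts):
    for (year, week), n in sorted(counts[cluster].items()):
        print(f"cluster {cluster}  {year}-W{week:02d}  {'#' * n}")
```

Plotting each cluster's counts as its own histogram makes temporal clustering, or a bimodal pattern, visible at a glance.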

Supplementary material

Nascimento et al. supplementary material 1 (File, 2.1 MB)

Nascimento et al. supplementary material 2 (File, 2.3 MB)