
To bin or not to bin: why parasite abundance data should not be lumped into categories for statistical analysis

Published online by Cambridge University Press:  24 March 2025

Robert Poulin*
Affiliation:
Department of Zoology, University of Otago, Dunedin, New Zealand

Abstract

The impact of macroparasites on their hosts is proportional to the number of parasites per host, or parasite abundance. Abundance values are count data, i.e. integers ranging from 0 to some maximum number, depending on the host–parasite system. When using parasite abundance as a predictor in statistical analysis, a common approach is to bin values, i.e. group hosts into infection categories based on abundance, and test for differences in some response variable (e.g. a host trait) among these categories. There are well-documented pitfalls associated with this approach. Here, I use a literature review to show that binning abundance values for analysis has been used in one-third of studies published in parasitological journals over the past 15 years, and in half of the studies in ecological and behavioural journals, often without any justification. Binning abundance data into arbitrary categories has been much more common among studies using experimental infections than among those using naturally infected hosts. I then use simulated data to demonstrate that true and significant relationships between parasite abundance and host traits can be missed when abundance values are binned for analysis and, conversely, that when there is no underlying relationship between abundance and host traits, analysis of binned data can create a spurious one. This holds regardless of the prevalence of infection or the level of parasite aggregation in a host sample. These findings argue strongly for the practice of binning abundance data as a predictor variable to be abandoned in favour of more appropriate analytical approaches.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.

Figure 1. Proportions of studies in which parasite abundance was used as a predictor variable, where abundance values were used as individual counts (no binning) or instead lumped into categories of infection (binning). In the latter case, the proportion of cases where the data were split into 2 bins, 3 bins or more than 3 bins is also shown. The data come from studies published in parasitology journals between 2010 and 2024, and are shown separately for studies based on natural parasite infections and those that used experimental infections.

Figure 2. Reasons given by authors of studies in which parasite abundance was used as a predictor variable and where abundance values were lumped into categories of infection (binning). The data come from studies published in parasitology journals between 2010 and 2024, and are shown separately for studies based on natural parasite infections and those that used experimental infections.

Figure 3. Proportions of studies in which parasite abundance was used as a predictor variable, where abundance values were used as individual counts (no binning) or instead lumped into categories of infection (binning). The data come from studies published in journals of ecology and animal behaviour between 2010 and 2024, and are shown separately for studies based on natural parasite infections and those that used experimental infections.

Figure 4. Three simulated distributions of parasite abundances among 60 host individuals. In each case, the total number of parasites in the sample is 240, with a mean abundance of 4.0 parasites per individual host. Note the different scales on the y-axes. The horizontal bars with alternating shading indicate how abundance values were lumped into 2 bins (infected and non-infected; distribution A) or 3 bins (distributions B and C).
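
To make the setup in Figure 4 concrete, the following is a minimal sketch of one way to generate a sample of 60 hosts sharing exactly 240 parasites (mean abundance 4.0) and then lump the counts into bins. It is not the author's simulation procedure: the gamma-distributed host susceptibilities, the `aggregation` parameter and the function names `simulate_abundances` and `bin_abundance` are all assumptions made for illustration.

```python
import random
from statistics import mean, variance


def simulate_abundances(n_hosts=60, n_parasites=240, aggregation=1.0, seed=1):
    """Distribute n_parasites among n_hosts (hypothetical procedure).

    Smaller `aggregation` values concentrate parasites in fewer hosts.
    """
    rng = random.Random(seed)
    # Assumed host susceptibilities, drawn from a gamma distribution.
    weights = [rng.gammavariate(aggregation, 1.0) for _ in range(n_hosts)]
    counts = [0] * n_hosts
    # Each parasite lands on a host with probability proportional to its weight,
    # so the total is fixed at n_parasites by construction.
    for host in rng.choices(range(n_hosts), weights=weights, k=n_parasites):
        counts[host] += 1
    return counts


def bin_abundance(count, upper_bounds):
    """Return the index of the first bin whose inclusive upper bound >= count."""
    for index, upper in enumerate(upper_bounds):
        if count <= upper:
            return index
    return len(upper_bounds)


if __name__ == "__main__":
    counts = simulate_abundances(aggregation=0.5)
    prevalence = sum(c > 0 for c in counts) / len(counts)
    print("total parasites:", sum(counts), "mean abundance:", mean(counts))
    print("prevalence:", round(prevalence, 2))
    print("variance-to-mean ratio:", round(variance(counts) / mean(counts), 2))
    # 2 bins: non-infected (0) versus infected (1 or more parasites).
    two_bins = [bin_abundance(c, [0]) for c in counts]
    # 3 bins with illustrative cut-offs at 2 and 5 parasites per host.
    three_bins = [bin_abundance(c, [2, 5]) for c in counts]
    print("hosts per 2-bin category:", [two_bins.count(b) for b in range(2)])
    print("hosts per 3-bin category:", [three_bins.count(b) for b in range(3)])
```

Fixing the total at 240 keeps mean abundance constant across runs, so only prevalence and the degree of aggregation change when `aggregation` is varied. Note that with cut-offs `[2, 5]` any uninfected host would fall into the lowest bin.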

Table 1. Results of 2 alternative ways of analysing the association between parasite abundance and a hypothetical host trait, for 3 different distributions of parasites among a sample of 60 hosts (see Figure 4)

Figure 5. Association between simulated parasite abundance data (numbers of parasites per host; from distribution C in Figure 4) and a hypothetical host trait, measured both using a Spearman correlation coefficient (top) and a 1-way ANOVA after binning hosts into 3 groups based on their parasite abundance (bottom): low (1–2 parasites/host), moderate (3–5 parasites/host) and high (6–25 parasites/host). Whereas the correlation indicates a non-significant association, the ANOVA suggests there is one.

Figure 6. Association between simulated parasite abundance data (numbers of parasites per host; from distribution C in Figure 4) and a hypothetical host trait, measured both using a Spearman correlation coefficient (top) and a 1-way ANOVA after binning hosts into 3 groups based on their parasite abundance (bottom): low (1–2 parasites/host), moderate (3–5 parasites/host) and high (6–25 parasites/host). Whereas the correlation indicates a significant positive association, the ANOVA finds no association.
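
The two analyses contrasted in Figures 5 and 6 can be sketched side by side as follows. This is an illustrative sketch only, not the author's code: the helper names and the toy data are invented, and only the test statistics (Spearman's rho and the ANOVA F ratio) are computed, without P-values. The bin boundaries follow the captions (low: 1–2, moderate: 3–5, high: 6 or more parasites per host).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx)
    sy = sum((b - my) ** 2 for b in ry)
    return cov / (sx * sy) ** 0.5


def bin_trait(abundance, trait, upper_bounds=(2, 5)):
    """Group trait values by the abundance bin of each host."""
    groups = [[] for _ in range(len(upper_bounds) + 1)]
    for a, t in zip(abundance, trait):
        groups[sum(a > u for u in upper_bounds)].append(t)
    return groups


def anova_f(groups):
    """One-way ANOVA F ratio (between-group over within-group mean square)."""
    groups = [g for g in groups if g]  # ignore empty bins
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))


if __name__ == "__main__":
    # Made-up values for 12 hosts, for illustration only.
    abundance = [1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 25]
    trait = [5.1, 4.8, 5.0, 5.3, 4.9, 5.2, 4.7, 5.0, 4.6, 4.9, 4.5, 4.4]
    print("Spearman rho (unbinned):", round(spearman_rho(abundance, trait), 3))
    print("ANOVA F (3 bins):", round(anova_f(bin_trait(abundance, trait)), 3))
```

The correlation uses every host's individual count, whereas the ANOVA sees only which of the three bins a host falls into, so a host with 6 parasites and one with 25 are treated as equivalent. For P-values, `scipy.stats.spearmanr` and `scipy.stats.f_oneway` provide the corresponding tests.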

Supplementary material: File

Poulin supplementary material (File, 7.7 KB)