
Assessing the impact of incomplete species sampling on estimates of speciation and extinction rates

Published online by Cambridge University Press: 16 March 2020

Rachel C. M. Warnock
Affiliation:
Department of Biosystems Science & Engineering, Eidgenössische Technische Hochschule Zürich, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland; and Department of Paleobiology, National Museum of Natural History, Smithsonian Institution, Washington, D.C. 20560, U.S.A. E-mail: rachel.warnock@bsse.ethz.ch
Tracy A. Heath
Affiliation:
Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, U.S.A. E-mail: phylo@iastate.edu
Tanja Stadler
Affiliation:
Department of Biosystems Science & Engineering, Eidgenössische Technische Hochschule Zürich, 4058 Basel, Switzerland; and Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland. E-mail: tanja.stadler@bsse.ethz.ch

Abstract

Estimating speciation and extinction rates is essential for understanding past and present biodiversity, but is challenging given the incompleteness of the rock and fossil records. Interest in this topic has led to a divergent suite of independent methods: paleontological estimates based on sampled stratigraphic ranges and phylogenetic estimates based on the observed branching times in a given phylogeny of living species. The fossilized birth–death (FBD) process is a model that explicitly recognizes that the branching events in a phylogenetic tree and sampled fossils were generated by the same underlying diversification process. A crucial advantage of this model is that it incorporates the possibility that some species may never be sampled. Here, we present an FBD model that estimates tree-wide diversification rates from stratigraphic range data when the underlying phylogeny of the fossil taxa may be unknown. The model can be applied when only occurrence data for taxonomically identified fossils are available, but still accounts for the incomplete phylogenetic structure of the data. We tested this new model using simulations and focused on how inferences are impacted by incomplete fossil recovery. We compared our approach with a phylogenetic model that does not incorporate incomplete species sampling and with three fossil-based alternatives for estimating diversification rates, including the widely implemented boundary-crosser and three-timer methods. The results of our simulations demonstrate that estimates under the FBD model are robust and more accurate than those of the alternative methods, particularly when fossil data are sparse, because the FBD model incorporates incomplete species sampling explicitly.

Type
Featured Article
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © 2020 The Paleontological Society. All rights reserved

Figure 1. The fossilized birth–death (FBD) range process. A, The complete FBD tree, including the observed data. Diamonds represent observed samples of fossils or extant species. Points labeled oi and yi are the first and last appearances of range i, respectively. B, The extended sampled stratigraphic ranges, illustrating the information used to calculate the likelihood. bi and di are the species attachment and extinction times, respectively. The attachment time bi is the time at which species i attaches to other lineages in an incompletely sampled tree, which is not necessarily equivalent to the speciation time of species i. For example, the attachment time b2 for species 2 is not equivalent to the true species origin time shown in A; instead, it is the origin time of its nonsampled ancestor, species 3. b5 = x0 is the origin time. In total we sample k = 5 occurrences and n = 4 ranges, with l = 2 extant and m = 2 extinct ranges. The dashed line highlights the number of possible attachment points (black circles) used to calculate γi, which accounts for each possible topology, given the value of bi. γi is equivalent to the number of lineages that coexist with species i at time bi. The exception is the oldest species, where bi represents the origin and γi is always one. The number of possible attachment points for each species is γ1 = 2, γ2 = 2, γ4 = 1, and γ5 = 1.

Figure 2. Boundary-crosser versus three-timer taxon categories illustrated for a given interval ti. Boundary-crosser categories: Ft taxa first appear and bL taxa last appear within the interval going forward in time, while the first and last appearance of FL taxa is confined to the same interval and bt taxa appear before and after the interval. The boundary-crosser method uses the bL, Ft, and bt taxa only in the estimation of λ and μ. Three-timer categories: two-timers are sampled immediately before and within bin ti (2ti) or within and immediately after ti (2ti+1). Three-timers are sampled immediately before, within, and after ti (3t), and part-timers are sampled immediately before and after but not within ti (pt) (note all three-timers are also two-timers). Singletons (i.e., taxa sampled from a single interval or FL taxa) are excluded before analysis. Image adapted from Alroy (2010, 2014).
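The categories in this caption can be sketched in code. The following is a minimal illustration, not the authors' implementation. It assumes bins are indexed forward in time, that each taxon is given either as a (first bin, last bin) pair or as a list of bins in which it was sampled, and that rates are computed with the per-capita boundary-crosser equations commonly attributed to Foote (2000), which the caption itself does not state. All function names are hypothetical.

```python
import math
from collections import Counter


def boundary_crosser_category(first_bin, last_bin, i):
    """Classify one taxon relative to interval i from its range end points.

    Bins are indexed forward in time (larger index = younger).
    Returns 'FL', 'Ft', 'bL', 'bt', or None if the range misses interval i.
    """
    if first_bin > i or last_bin < i:
        return None
    if first_bin == i and last_bin == i:
        return "FL"  # first and last appearance confined to interval i
    if first_bin == i:
        return "Ft"  # first appears in i and crosses the top boundary
    if last_bin == i:
        return "bL"  # crosses the bottom boundary and last appears in i
    return "bt"      # appears before and after interval i


def three_timer_categories(sampled_bins, i):
    """Return the set of three-timer labels for one taxon at interval i."""
    s = set(sampled_bins)
    before, within, after = (i - 1) in s, i in s, (i + 1) in s
    labels = set()
    if before and within:
        labels.add("2t_i")
    if within and after:
        labels.add("2t_i+1")
    if before and within and after:
        labels.add("3t")  # every three-timer is also counted as a two-timer
    if before and after and not within:
        labels.add("pt")
    return labels


def boundary_crosser_rates(ranges, i, dt=1.0):
    """Per-capita speciation and extinction rates for interval i.

    ASSUMPTION: uses lambda = -ln(N_bt / (N_bt + N_Ft)) / dt and
    mu = -ln(N_bt / (N_bt + N_bL)) / dt. FL taxa are ignored.
    Returns None when no taxon crosses both boundaries.
    """
    counts = Counter(boundary_crosser_category(f, l, i) for f, l in ranges)
    n_bt, n_ft, n_bl = counts["bt"], counts["Ft"], counts["bL"]
    if n_bt == 0:
        return None
    lam = -math.log(n_bt / (n_bt + n_ft)) / dt
    mu = -math.log(n_bt / (n_bt + n_bl)) / dt
    return lam, mu


# Hypothetical ranges: two bt taxa, one Ft, one bL, and one FL at interval 1.
example_ranges = [(0, 2), (0, 2), (1, 2), (0, 1), (1, 1)]
example_rates = boundary_crosser_rates(example_ranges, i=1)
```

A taxon sampled in a single interval receives no three-timer label, which matches the caption's statement that singletons are excluded before analysis.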

Table 1. Overview of data used by different methods for each simulation condition. When the total number of occurrences or fossil samples is not used, this refers to scenarios in which only sampled-in-bin data are available. BD, birth–death; FBD, fossilized birth–death.

Figure 3. Performance of the fossilized birth–death (FBD) model obtained for different turnover and sampling scenarios assuming the number of fossil samples is known. In this set of experiments, the model used for simulation and inference is the same. Results are shown for analyses excluding (ρ = 0) and including (ρ = 1) extant singletons. Rows show results obtained for speciation (λ), extinction (μ), and sampling (PS). Columns show results obtained at different sampling levels (per-interval sampling probability PS = 0.1, 0.5, or 0.99). In each box, the x-axis represents turnover and the y-axis represents coverage (the proportion of simulation replicates out of 100 that contain the true value). The dashed line highlights the 0.95 coverage level.
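The coverage statistic on the y-axis can be computed as in the sketch below. It assumes each replicate is summarized by a lower and an upper interval bound for the parameter (for example, a 95% credible interval, which the 0.95 reference line suggests but the caption does not state). The function name and the example values are hypothetical.

```python
def coverage(intervals, true_value):
    """Fraction of replicates whose interval [lower, upper] contains true_value.

    ASSUMPTION: each replicate contributes one (lower, upper) pair.
    """
    if not intervals:
        raise ValueError("need at least one replicate")
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Hypothetical example: four replicates estimating a true rate of 1.0.
example_intervals = [(0.8, 1.3), (0.9, 1.1), (1.05, 1.6), (0.4, 1.0)]
example_coverage = coverage(example_intervals, true_value=1.0)
```

With well-calibrated 95% intervals, coverage across many replicates should sit close to 0.95, which is what the dashed line marks.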

Figure 4. Performance of the birth–death (BD) model obtained for different turnover and sampling scenarios assuming the number of fossil samples is known. This model assumes that each species has been sampled at least once. Analysis using this model always excludes extant singletons (ρ = 0). Rows show results obtained for speciation (λ), extinction (μ), and sampling (PS). Columns show results obtained at different sampling levels (per-interval sampling probability PS = 0.1, 0.5, or 0.99). In each box, the x-axis represents turnover and the y-axis represents coverage (the proportion of simulation replicates out of 100 that contain the true value). The dashed line highlights the 0.95 coverage level.

Figure 5. Performance of the fossilized birth–death (FBD) model obtained for different turnover and sampling scenarios assuming only sampled-in-bin data are available. In this set of experiments, we use sampled-in-bin data, which is a violation of the model used during inference. Results are shown for analyses excluding (ρ = 0) and including (ρ = 1) extant singletons. Rows show results obtained for speciation (λ), extinction (μ), and sampling (PS). Columns show results obtained at different sampling levels (per-interval sampling probability PS = 0.1, 0.5, or 0.99). In each box, the x-axis represents turnover and the y-axis represents coverage (the proportion of simulation replicates out of 100 that contain the true value). The dashed line highlights the 0.95 coverage level.

Figure 6. Performance of the fossilized birth–death (FBD) model obtained under nonuniform fossil recovery assuming the number of fossil samples is known. In this set of experiments, the model used for simulation violates the model used for inference. Results are shown for analyses excluding (ρ = 0) and including (ρ = 1) extant singletons. Rows show results obtained for speciation (λ) and extinction (μ). Columns show results obtained under different nonuniform sampling scenarios, where sampling increases (0.01 → 0.99) or decreases (0.99 → 0.01) linearly toward the present. In each box, the x-axis represents turnover and the y-axis represents coverage (the proportion of simulation replicates out of 100 that contain the true value). The dashed line highlights the 0.95 coverage level.
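One way to picture the nonuniform scenarios is as a per-interval sampling probability that changes in equal steps from the oldest interval to the youngest. The sketch below assumes a discretized, per-interval schedule and an arbitrary number of intervals. Neither detail is given in the caption, and the function name is hypothetical.

```python
def linear_sampling_probabilities(n_intervals, start=0.01, end=0.99):
    """Sampling probability per interval, oldest interval first.

    ASSUMPTION: sampling changes in equal steps between consecutive intervals.
    Use start=0.01, end=0.99 for increasing sampling toward the present and
    swap the two values for decreasing sampling.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    if n_intervals == 1:
        return [start]
    step = (end - start) / (n_intervals - 1)
    return [start + k * step for k in range(n_intervals)]


# Hypothetical example with five intervals.
increasing = linear_sampling_probabilities(5, start=0.01, end=0.99)
decreasing = linear_sampling_probabilities(5, start=0.99, end=0.01)
```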

Figure 7. Comparison between different approaches to estimating speciation (λ) and extinction (μ) rates assuming the number of fossil samples is known. Results are shown for low (A, r = 0.1), medium (B, r = 0.5), and high (C, r = 0.9) turnover. Each column shows results obtained at different sampling probabilities (per-interval sampling probability PS = 0.1, 0.5, or 0.99). For each box, the y-axis represents the percentage error of the median estimate (birth–death [BD] and fossilized birth–death [FBD] approaches) or the point estimate (all other approaches) averaged across 100 simulation replicates. The best (lowest) and worst (highest) percentage error are highlighted for each turnover and sampling scenario. Analysis using the FBD model includes extant singletons (ρ = 1). Results obtained excluding extant singletons are shown in Supplementary Fig. S3.
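The error measure on the y-axis can be sketched as follows. This assumes the percentage error is the absolute difference between the estimate and the true rate, expressed as a percentage of the true rate and then averaged over replicates. The caption does not say whether the error is absolute or signed, so that choice, along with the function name and example values, is an assumption.

```python
from statistics import mean


def mean_percentage_error(estimates, true_value):
    """Average of 100 * |estimate - true| / true across replicates.

    ASSUMPTION: absolute (unsigned) error. `estimates` holds one median or
    point estimate per simulation replicate.
    """
    if true_value == 0:
        raise ValueError("true_value must be nonzero")
    errors = [100.0 * abs(e - true_value) / abs(true_value) for e in estimates]
    return mean(errors)


# Hypothetical example: three replicates estimating a true rate of 1.0.
example_error = mean_percentage_error([0.9, 1.2, 1.0], true_value=1.0)
```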

Figure 8. Comparison between different approaches to estimating speciation (λ) and extinction (μ) rates assuming only sampled-in-bin data are available. Results are shown for low (A, r = 0.1), medium (B, r = 0.5), and high (C, r = 0.9) turnover. Each column shows results obtained at different sampling probabilities (per-interval sampling probability PS = 0.1, 0.5, or 0.99). For each box, the y-axis represents the percentage error of the median estimate (birth–death [BD] and fossilized birth–death [FBD] approaches) or the point estimate (all other approaches) averaged across 100 simulation replicates. The best (lowest) and worst (highest) percentage error are highlighted for each turnover and sampling scenario. Analysis using the FBD model includes extant singletons (ρ = 1). Results obtained excluding extant singletons are shown in Supplementary Fig. S4.

Figure 9. Comparison between the proportion of sampled taxa incorporated into rate estimation using different methods. BD, birth–death; FBD, fossilized birth–death. Results are shown for low (A, r = 0.1), medium (B, r = 0.5), and high (C, r = 0.9) turnover. Each column shows results obtained at different sampling probabilities (per-interval sampling probability PS = 0.1, 0.5, or 0.99). For each box, the y-axis represents the proportion of the total number of species included in the analysis averaged across 100 replicates. For the FBD analysis, results are shown for analyses excluding (ρ = 0) and including (ρ = 1) extant singletons.