Improved estimation of macroevolutionary rates from fossil data using a Bayesian framework

Published online by Cambridge University Press:  12 September 2019

Daniele Silvestro
Affiliation:
Department of Biological and Environmental Sciences, University of Gothenburg, and Global Gothenburg Biodiversity Centre, 41319 Gothenburg, Sweden; Department of Computational Biology, University of Lausanne, and Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland. E-mail: daniele.silvestro@bioenv.gu.se
Nicolas Salamin
Affiliation:
Department of Computational Biology, University of Lausanne, and Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland
Alexandre Antonelli
Affiliation:
Department of Biological and Environmental Sciences, University of Gothenburg, and Global Gothenburg Biodiversity Centre, 41319 Gothenburg, Sweden; Royal Botanic Gardens, Kew, Richmond TW9 3AE, United Kingdom
Xavier Meyer
Affiliation:
Department of Computational Biology, University of Lausanne, and Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Integrative Biology, University of California, Berkeley, California 94720, U.S.A.

Abstract

The estimation of origination and extinction rates and their temporal variation is central to understanding diversity patterns and the evolutionary history of clades. The fossil record provides the only direct evidence of extinction and biodiversity changes through time and has long been used to infer the dynamics of diversity changes in deep time. The software PyRate implements a Bayesian framework to analyze fossil occurrence data to estimate the rates of preservation, origination, and extinction while incorporating several sources of uncertainty. Building upon this framework, we present a suite of methodological advances including more complex and realistic models of preservation and the first likelihood-based test to compare the fit across different models. Further, we develop a new reversible jump Markov chain Monte Carlo algorithm to estimate origination and extinction rates and their temporal variation, which provides more reliable results and includes an explicit estimation of the number and temporal placement of statistically significant rate changes. Finally, we implement a new C++ library that speeds up the analyses by orders of magnitude, therefore facilitating the application of the PyRate methods to large data sets. We demonstrate the new functionalities through extensive simulations and with the analysis of a large data set of Cenozoic marine mammals. We compare our analytical framework against two widely used alternative methods to infer origination and extinction rates, revealing that PyRate decisively outperforms them across a range of simulated data sets. Our analyses indicate that explicit statistical model testing, which is often neglected in fossil-based macroevolutionary analyses, is crucial to obtain accurate and robust results.

Information

Type
Articles
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Paleontological Society 2019. All rights reserved.
Figure 1. PyRate's main analytical structure. The input data consist of dated fossil occurrences assigned to lineages, e.g., species or genera (represented by circles in A), including singletons and extant taxa. The Bayesian framework jointly estimates the life spans of all lineages (dashed lines), preservation rates (B), and origination and extinction rates (C). All parameter estimates are inferred as posterior mean values (solid lines in B and C) and 95% credible intervals (shaded areas in B and C).
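
As an illustration of how such summaries can be derived from MCMC output, the following minimal sketch computes a posterior mean and a 95% credible interval from a list of sampled values. It assumes that the interval is the shortest one containing 95% of the samples (a highest posterior density interval); the function name `summarize_posterior` is hypothetical and is not part of PyRate.

```python
import math


def summarize_posterior(samples, mass=0.95):
    """Return (mean, lower, upper) for a list of posterior samples.

    Assumption: the credible interval is taken as the shortest interval
    that contains a fraction `mass` of the samples.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(xs) / n
    # Number of samples that the interval must contain.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return mean, xs[start], xs[start + k - 1]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    draws = [rng.gauss(0.3, 0.05) for _ in range(5000)]
    print(summarize_posterior(draws))
```

Applied to the samples of a rate within each time slice, a summary of this kind would yield a mean curve and a shaded band comparable to those in panels B and C.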

Table 1. Glossary defining the main terms, acronyms, and parameters used in this study.

Figure 2. Graphical representation of the preservation rate models implemented in PyRate. In the homogeneous Poisson process model (A), the preservation rate is constant through time, and the expected times of origination and extinction (s, e) are exponentially distributed. In the nonhomogeneous Poisson process model (B), preservation rates vary throughout the life span of a species, generating gamma-like expected s, e. The time-variable Poisson process model (C) assumes piecewise constant preservation rates (e.g., different rates for each epoch) and the resulting expected s, e values combine multiple exponential distributions. All models can incorporate rate heterogeneity across lineages (gamma models).
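
The difference between the homogeneous and time-variable models can be sketched with a small simulation of fossil occurrences for a single lineage. This is an illustrative sketch only, not PyRate code: the function names, the `(start, end, rate)` bin format, and the example rates are assumptions, and the NHPP and gamma models are not covered.

```python
import random


def simulate_hpp(s, e, q, rng):
    """Occurrence ages for a lineage living from s to e (Ma, s > e).

    Homogeneous Poisson process: constant preservation rate q, so the
    waiting times between successive occurrences are exponential.
    """
    ages = []
    t = s
    while True:
        t -= rng.expovariate(q)  # move toward the present
        if t <= e:
            return ages
        ages.append(t)


def simulate_tpp(s, e, bins, rng):
    """Time-variable Poisson process with piecewise constant rates.

    `bins` is a list of (start, end, rate) tuples with start > end,
    e.g. one tuple per epoch (hypothetical format).
    """
    ages = []
    for start, end, q in bins:
        older = min(s, start)    # overlap of the life span with this bin
        younger = max(e, end)
        if older > younger and q > 0:
            ages.extend(simulate_hpp(older, younger, q, rng))
    return ages


if __name__ == "__main__":
    rng = random.Random(1)
    print(simulate_hpp(10.0, 2.0, 1.5, rng))
    epochs = [(23.0, 5.3, 0.5), (5.3, 0.0, 3.0)]  # made-up rates
    print(simulate_tpp(10.0, 2.0, epochs, rng))
```

Because each bin is simulated independently at its own rate, a lineage spanning a poorly sampled interval will tend to leave few or no occurrences there, which is the behavior the piecewise model is meant to capture.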

Table 2. Thresholds for the difference in corrected Akaike information criterion scores (ΔAICc), estimated by simulations, to test between different preservation models. Depending on the selected best model (i.e., the one with the lowest AICc score), different thresholds are applied to determine whether the model is significantly better than the alternatives (p < 0.05). Values in parentheses show the thresholds estimated for p < 0.01. Cases in which ΔAICc values do not exceed the thresholds provided here indicate that the evidence in the data is not sufficient to confidently choose among preservation models. HPP, homogeneous Poisson process; NHPP, nonhomogeneous Poisson process; TPP, time-variable Poisson process.
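
The comparison described in this caption can be sketched as follows. The AICc formula is the standard small-sample correction of AIC; everything else is an assumption made for illustration: the function names, the log-likelihoods, the parameter counts, the sample size, and in particular the threshold value, which is a placeholder and not one of the simulation-based thresholds reported in the table.

```python
def aicc(log_lik, k, n):
    """Corrected AIC for a model with k parameters fitted to n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def select_model(scores, threshold):
    """Pick the lowest-AICc model and test it against the runner-up.

    `scores` maps model names to AICc values. Returns (best model,
    delta AICc to the runner-up, whether delta exceeds `threshold`).
    """
    ranked = sorted(scores, key=scores.get)
    best, runner_up = ranked[0], ranked[1]
    delta = scores[runner_up] - scores[best]
    return best, delta, delta > threshold


if __name__ == "__main__":
    n = 500  # hypothetical number of data points
    scores = {
        "HPP": aicc(-1250.0, 1, n),   # made-up log-likelihoods
        "NHPP": aicc(-1245.0, 1, n),
        "TPP": aicc(-1230.0, 5, n),
    }
    # 10.0 is a placeholder threshold, not a value from Table 2.
    print(select_model(scores, threshold=10.0))
```

When the returned flag is false, the outcome corresponds to the case described in the caption in which the data do not support a confident choice among preservation models.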

Table 3. Model testing using the reversible jump Markov chain Monte Carlo (RJMCMC) and birth–death Markov chain Monte Carlo (BDMCMC) algorithms. The simulations (replicated 100 times) are based on different numbers of origination rates (J) and extinction rates (K): (1) J = 1, K = 1; (2) J = 3, K = 3; and (3) J = 5, K = 5. For each value of J and K, we estimated how frequently it was selected as the best model by RJMCMC and BDMCMC across all replicates. Values in bold represent the frequencies at which the correct models were identified by the algorithms.

Figure 3. Marginal rates through time inferred for simulation scenario 2. The data sets were simulated under decreasing rates of origination (with shifts at 20 and 10 Ma) and extinction rates (with a peak at 15–10 Ma; true values are shown as dashed lines). Estimates are averaged across 100 simulations, with the shaded areas showing 95% credible intervals. The top row shows the origination and extinction rates inferred using the birth–death Markov chain Monte Carlo algorithm, whereas the bottom row shows the results of the reversible jump Markov chain Monte Carlo.

Table 4. Comparison of accuracy and precision of the marginal origination and extinction rates between the new reversible jump Markov chain Monte Carlo (RJMCMC) and birth–death Markov chain Monte Carlo (BDMCMC) algorithms. Accuracy (relative errors) and precision are averaged across analyses of 100 simulated data sets for each simulation scenario. While the precision of rate estimates (here quantified by the relative size of the 95% credible intervals) is similar between algorithms, the RJMCMC implementation yields substantially more accurate results, especially in the presence of rate heterogeneity through time.

Figure 4. Origination and extinction rates estimated using different methods. Relative errors (A) in the rate estimates as inferred using the boundary-crossing method (Foote 2000), the three-timer approach (Alroy 2014), and our new algorithm implemented in PyRate. Box plots summarize the results of 100 simulations under three diversification scenarios. The errors were computed based on the rate estimates within 2 Myr time bins in boundary-crossing and three-timer analyses and based on the posterior means of the marginal rates through time in PyRate analyses (B).
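
For readers unfamiliar with the boundary-crossing method, the per-capita rates of Foote (2000) for a single time bin can be sketched as below. The rate formulas follow that method; the function name, the input format (a list of first and last appearance ages), and the handling of ties at bin boundaries are assumptions of this sketch, and extant taxa at a bin ending at the present are not treated specially.

```python
import math


def boundary_crossing_rates(ranges, old, young):
    """Per-capita origination (p) and extinction (q) rates for one bin.

    `ranges` is a list of (first, last) appearance ages in Ma with
    first >= last; the bin spans from `old` to `young` (old > young).
    """
    n_bt = n_bl = n_ft = 0
    for first, last in ranges:
        if first > old and last < young:
            n_bt += 1  # crosses both the bottom and the top boundary
        elif first > old and young <= last < old:
            n_bl += 1  # crosses the bottom, last appears within the bin
        elif young < first <= old and last < young:
            n_ft += 1  # first appears within the bin, crosses the top
    if n_bt == 0:
        return float("nan"), float("nan")  # rates are undefined
    dt = old - young
    p = -math.log(n_bt / (n_bt + n_ft)) / dt
    q = -math.log(n_bt / (n_bt + n_bl)) / dt
    return p, q


if __name__ == "__main__":
    toy_ranges = [(12.0, 1.0), (12.0, 1.0), (12.0, 7.0), (8.0, 1.0)]
    print(boundary_crossing_rates(toy_ranges, old=10.0, young=6.0))
```

Taxa confined to a single bin do not enter these counts, which is one reason the choice of bin width (2 Myr in the comparison above) can affect the resulting estimates.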

Figure 5. Origination and extinction rates through time in marine mammals. Marginal posterior estimates of origination rates (A) and extinction rates (B) are shown together with the respective 95% credible intervals. These estimates incorporate not only parameter uncertainty, but also dating uncertainties (deriving from 10 replicated analyses obtained by resampling the ages of the fossil occurrences) and uncertainties around model selection, as the reversible jump Markov chain Monte Carlo algorithm samples the number of rate shifts from their joint posterior distribution. Plots on the right show the frequency of sampling a shift in origination (C) and extinction (D) rates within arbitrarily small time bins (here set to 0.5 Myr). Dashed lines show log Bayes factors of 2 and 6 (as inferred from Markov chain Monte Carlo simulation). Sampling frequencies exceeding these lines indicate positive and strong statistical evidence for a rate shift, respectively. For comparison, origination and extinction rate estimates were also inferred under the boundary-crossing method (E, F) (Foote 2000) and the three-timer approach (G, H) (Alroy 2014). The ages of fossil occurrences were randomized 100 times based on the respective stratigraphic intervals (as in the PyRate analyses), and the rates were estimated using equal time bins of 2 Myr.
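
The relationship between a sampling frequency and a Bayes factor threshold can be sketched as follows. The sketch assumes that "log Bayes factor" refers to the 2 ln BF scale, on which values of 2 and 6 are conventionally read as positive and strong evidence, and that the prior probability of sampling a shift within a time bin is already known (for example, estimated by simulating from the prior). The function names and the prior value used in the example are hypothetical.

```python
import math


def two_ln_bayes_factor(post_prob, prior_prob):
    """2 ln BF in favor of a shift, from posterior and prior probabilities."""
    post_odds = post_prob / (1.0 - post_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return 2.0 * math.log(post_odds / prior_odds)


def frequency_threshold(two_ln_bf, prior_prob):
    """Posterior sampling frequency at which 2 ln BF equals `two_ln_bf`."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = math.exp(two_ln_bf / 2.0) * prior_odds
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    prior = 0.01  # hypothetical prior probability of a shift in one bin
    for level in (2.0, 6.0):
        print(level, frequency_threshold(level, prior))
```

Under these assumptions, a bin whose sampling frequency exceeds the value returned for 6 would be read as showing strong evidence for a rate shift, and one that only exceeds the value for 2 as showing positive evidence.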

Supplementary material: Silvestro Dataset