
From fossils to phylogenies: exploring the integration of paleontological data into Bayesian phylogenetic inference

Published online by Cambridge University Press: 22 April 2025

Laura P. A. Mulvey
Affiliation:
GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
Mark C. Nikolic
Affiliation:
Division of Paleontology (Invertebrates), American Museum of Natural History, New York, New York, U.S.A.
Bethany J. Allen
Affiliation:
Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Computational Evolution Group, Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, Lausanne, Switzerland
Tracy A. Heath
Affiliation:
Department of Ecology, Evolution, & Organismal Biology, Iowa State University, Ames, Iowa, U.S.A.
Rachel C. M. Warnock*
Affiliation:
GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
*Corresponding author: Rachel C. M. Warnock; Email: rachel.warnock@fau.de

Abstract

Incorporating paleontological data into phylogenetic inference can greatly enrich our understanding of evolutionary relationships by providing insights into the diversity and morphological evolution of a clade over geological timescales. Phylogenetic analysis of fossil data has been significantly aided by the introduction of the fossilized birth–death (FBD) process, a model that accounts for fossil sampling through time. A decade on from the first implementation of the FBD model, we explore its use in more than 170 empirical studies, summarizing insights gained through its application. We identify a number of challenges in applying the model in practice: it requires a working knowledge of paleontological data and their complex properties, Bayesian phylogenetics, and the mechanics of evolutionary models. To address some of these difficulties, we provide an introduction to the Bayesian phylogenetic framework, discuss important aspects of paleontological data, and finally describe the assumptions of the models used in paleobiology. We also present a number of exemplar empirical studies that have used the FBD model in different ways. Through this review, we aim to provide clarity on how paleontological data can best be used in phylogenetic inference. We hope to encourage communication between model developers and empirical researchers, with the ultimate goal of developing models that better reflect the data we have and the processes that generated them.

Information

Type
Invited Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Paleontological Society

Table 1. Application of fossilized birth–death (FBD) models to different types of phylogenetic character data. To date, studies have included molecular data for extant samples or morphological data for extant and extinct (†) samples. (Molecular data could theoretically also be included for extinct samples, if ancient DNA is available.) In “total-evidence” analyses, character data are included for both extant (molecular and morphology) and extinct (morphology only) samples. “Extant only” refers to analyses in which molecular data are included for extant samples only and fossils are placed using constraints. “Morphology” refers to analyses in which morphology is included for both extant and extinct samples. “Extinct only” refers to analyses of fully extinct trees in which morphological data are available for extinct samples. “No phylogenetic data” refers to analyses in which no phylogenetic character data are included (see Boxes 2 and 5). All analyses using the FBD model must also include temporal data. See the section “The Data” for more information.

Table 2. Available fossilized birth–death (FBD) models and extensions

Figure 1. A, The number of extant vs. extinct samples. Each point represents an individual analysis. Points above the dashed line have more extant samples relative to extinct, whereas those under the line have more extinct samples. B, The number of samples (extant plus extinct samples) vs. the number of morphological characters used in an analysis. C, The number of analyses using plants, invertebrates, and vertebrates.

Figure 2. The temporal range, from the oldest fossil age to the tips of the tree, for each of the empirical studies in the literature survey. Timeline plotted using the R package divDyn (Kocsis 2019). CM, Cambrian; O, Ordovician; S, Silurian; D, Devonian; C, Carboniferous; P, Permian; Tr, Triassic; J, Jurassic; K, Cretaceous; Pg, Paleogene; Ng, Neogene.

Figure 3. Symbolic representation of Bayes' theorem (eq. 1) for a phylogenetic analysis of fossil ages and morphological characters. The data components include a morphological character matrix and fossil ages. The model parameters are a phylogeny with branch times, the diversification and sampling parameters of the fossilized birth–death (FBD) model, the lineage-specific branch rates of the clock model, and the parameters of the morphological substitution model (Mk model; Lewis 2001). The probabilities are delineated to highlight the joint posterior distribution, likelihood, and prior probability distributions. The FBD probability density includes some components for which we calculate prior probabilities (the tree topology, branch times, and diversification and sampling parameters) and some that are observations in the likelihood (fossil ages). Thus, these are separated to clarify the contributions to the posterior density coming from the prior and those coming from the data.
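
As a rough guide to the structure this caption describes, the posterior can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the article's eq. 1: M is the morphological character matrix, F the fossil ages, T the phylogeny with branch times, θ_FBD the diversification and sampling parameters, r the lineage-specific branch rates, and θ_Mk the parameters of the Mk model.

```latex
% Illustrative notation only; the symbols are assumptions, not the article's eq. 1.
\[
\underbrace{f(\mathcal{T}, \theta_{\mathrm{FBD}}, \mathbf{r}, \theta_{\mathrm{Mk}} \mid \mathbf{M}, \mathbf{F})}_{\text{joint posterior}}
\;\propto\;
\underbrace{P(\mathbf{M} \mid \mathcal{T}, \mathbf{r}, \theta_{\mathrm{Mk}})}_{\text{character likelihood}}
\;
\underbrace{f(\mathcal{T}, \mathbf{F} \mid \theta_{\mathrm{FBD}})}_{\text{FBD probability density}}
\;
\underbrace{f(\theta_{\mathrm{FBD}})\, f(\mathbf{r})\, f(\theta_{\mathrm{Mk}})}_{\text{priors on parameters}}
\]
```

In this sketch the FBD term is written jointly over the tree and the fossil ages, which reflects the caption's point that this single density acts partly as a prior (topology and branch times) and partly as a likelihood (fossil ages).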

Figure 4. The temporal information available from fossils and how it can be incorporated into fossilized birth–death (FBD) models. A, Section with four fossil beds, b1–b4. Within each bed, there are fossils that can be used to provide temporal information for an FBD analysis. In this section, there are two different fossil taxa depicted as purple and black ammonites. Fossil age information can be taken as either occurrence data or stratigraphic range data. Occurrence data describe the age uncertainty associated with an individual sample or a discrete interval (shown to the left of the section). Stratigraphic range data describe the interval spanned by multiple fossils of the same taxon. The lower and upper bounds of the range (i.e., the first and last appearances) will each have a degree of age uncertainty (shown to the right of the section). Different FBD models are available to incorporate these fundamentally different ways of using fossil age information. B, How these different models incorporate the temporal information. The FBD specimen model uses occurrence information. Note that multiple fossil specimens from the same bed that are associated with the same age uncertainty should only be incorporated into the analysis once. FBD models do not currently have a way to account for abundance information. The FBD range model uses stratigraphic range information. In this case, it uses the first and last appearance fossil ages. Note, for the taxon in purple, there is only one fossil (i.e., a singleton); therefore, the occurrence and range information are the same. The gray branches on the tree represent unsampled lineages or taxa.
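
The distinction between occurrence data and stratigraphic range data can be made concrete with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Fossil` record, the helper names `occurrences` and `stratigraphic_ranges`, the bed ages (in Ma), the assignment of taxa to beds, and the assumption that b1 is the oldest bed. It is not the input format of any particular FBD implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fossil:
    """Hypothetical fossil record; ages are in Ma (larger = older)."""
    taxon: str
    bed: str
    min_age: float  # younger bound of the age uncertainty
    max_age: float  # older bound of the age uncertainty


def occurrences(fossils):
    """Occurrence data: one entry per taxon, bed, and age interval.

    Specimens of the same taxon from the same bed with the same age
    uncertainty are collapsed into one entry (abundance is ignored).
    """
    seen = set()
    unique = []
    for f in fossils:
        if f not in seen:  # frozen dataclass instances are hashable
            seen.add(f)
            unique.append(f)
    return unique


def stratigraphic_ranges(fossils):
    """Range data: age intervals of the first and last appearance per taxon."""
    bounds = {}
    for f in fossils:
        first, last = bounds.get(f.taxon, (f, f))
        if f.max_age > first.max_age:  # older than the current first appearance
            first = f
        if f.min_age < last.min_age:   # younger than the current last appearance
            last = f
        bounds[f.taxon] = (first, last)
    return {
        taxon: {
            "first": (first.min_age, first.max_age),
            "last": (last.min_age, last.max_age),
        }
        for taxon, (first, last) in bounds.items()
    }


# Hypothetical section with four beds, assuming b1 is the oldest.
example = [
    Fossil("black", "b1", 170.0, 172.0),
    Fossil("black", "b2", 168.0, 170.0),
    Fossil("black", "b2", 168.0, 170.0),  # second specimen from the same bed
    Fossil("purple", "b3", 166.0, 168.0),  # singleton taxon
    Fossil("black", "b4", 164.0, 166.0),
]

if __name__ == "__main__":
    for f in occurrences(example):
        print("occurrence:", f)
    for taxon, rng in stratigraphic_ranges(example).items():
        print("range:", taxon, rng)
```

In this sketch the duplicate specimen from bed b2 contributes only one occurrence, and the singleton taxon has identical first and last appearance intervals, matching the two notes in the caption.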