
Drivers of fossil sampling heterogeneity: lessons from planktonic foraminifera

Published online by Cambridge University Press:  26 March 2026

Jérémy Andréoletti*
Affiliation:
Institut de Biologie, École Normale Supérieure, Université PSL, CNRS, INSERM, Paris, France
Isabel Fenton
Affiliation:
Department of Earth Sciences, University of Oxford, Oxford, U.K.; The Alan Turing Institute, London, U.K.
Hélène Morlon
Affiliation:
Institut de Biologie, École Normale Supérieure, Université PSL, CNRS, INSERM, Paris, France
Erin E. Saupe
Affiliation:
Department of Earth Sciences, University of Oxford, Oxford, U.K.
*Corresponding author: Jérémy Andréoletti; Email: jeremy.andreoletti@gmail.com

Abstract

The fossil record is subject to multiple biases that can distort macroevolutionary and paleoecological inferences. Although temporal and spatial sampling biases have received substantial attention, other sources of fossil sampling heterogeneity remain less well quantified. Using the Triton database of planktonic foraminifera, we assess the influence of geographic, ecological, morphological, and methodological factors on fossil recovery rates. We first apply a temporal subsampling method to standardize fossil occurrences over geologic time, validating this approach against an expert-curated lineage-through-time trajectory. After subsampling, occurrences remain unevenly distributed both within species’ lifetimes and across species, reflecting biological signal and/or persistent sampling biases.

We then investigate this residual sampling heterogeneity with a generalized additive model incorporating relevant predictors from Triton. Our results reveal that, after correcting for temporal biases, geographic predictors (paleolatitude, paleolongitude, longitudinal spread) explain nearly a third of sampling variation. Species-specific ecological and morphological attributes contribute an additional fraction, among which mean relative abundance emerges as the main factor. Additional predictors of fossil sampling rates include age-calculation methods and biostratigraphic sampling biases. Despite accounting for multiple sources of variation, 37% of the deviance remains unexplained, suggesting unmodeled biological, stratigraphic, diagenetic, or taxonomic drivers of sampling heterogeneity.

Overall, the observed recovery rates call into question the homogeneous-sampling assumption used in most diversification models, and this heterogeneity cannot be reduced to a single dominant factor. This conclusion reinforces the need for integrated subsampling approaches and process-based models that explicitly account for heterogeneous fossilization rates to improve the reliability of macroevolutionary analyses.

Information

Type
Rapid Communication
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Paleontological Society

Figure 1. Impact of temporal subsampling on fossil occurrence distributions in the Triton database. (A) Distribution through time of sediment samples and occurrences before (blue) and after (green) temporal subsampling. The scaled profile of the expert-curated lineage-through-time (LTT) trajectory (purple) aligns with the distribution of subsampled occurrences, validating our temporal standardization method. (B) Levels of fossil sampling relative to species’ life span. Observed fossil sampling in the Triton database (blue circles) and subsampled Triton database (green circles) shows greater heterogeneity for species of similar age range than expected under a homogeneous Poisson process (darker squares). (C) Distributions of occurrences over species’ expert-curated longevities. Top, raw Triton database, illustrating the increase in the number of occurrences toward the present. Middle, Triton database restricted to extinct species, showing an excess of recent occurrences likely due to sampling biases. Bottom, subsampled database restricted to extinct species, showing a symmetric bell-shaped distribution, the tails of which indicate occurrences recorded over and beyond expected species’ temporal ranges.
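The homogeneous Poisson expectation mentioned for panel B can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' analysis: the species count, the shared duration, the mean sampling rate, and the gamma-distributed rates used for the heterogeneous case are all hypothetical values chosen for illustration. Under a homogeneous Poisson process, occurrence counts for species of equal duration have a variance-to-mean ratio close to 1; rate variation among species pushes that ratio above 1.

```python
import random
import statistics


def poisson_count(rate, duration, rng):
    """Count events of a homogeneous Poisson process on [0, duration]
    by accumulating exponential waiting times."""
    count = 0
    t = rng.expovariate(rate)
    while t <= duration:
        count += 1
        t += rng.expovariate(rate)
    return count


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(0)

# Hypothetical settings (not taken from the Triton database).
n_species = 5000
duration = 5.0    # same life span for every species, arbitrary time units
mean_rate = 2.0   # mean sampling rate per unit time

# Homogeneous case: every species shares the same sampling rate.
homogeneous = [poisson_count(mean_rate, duration, rng) for _ in range(n_species)]

# Heterogeneous case (assumption): each species draws its own rate from a
# gamma distribution with shape 2 and the same mean as above.
heterogeneous = [
    poisson_count(rng.gammavariate(2.0, mean_rate / 2.0), duration, rng)
    for _ in range(n_species)
]

print("homogeneous dispersion:  ", round(dispersion_index(homogeneous), 2))
print("heterogeneous dispersion:", round(dispersion_index(heterogeneous), 2))
```

The sketch fixes one duration for all species to keep the comparison simple, whereas the figure compares species of similar age range across many durations.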

Figure 2. Heterogeneity in fossil sampling rates for planktonic foraminifera explained by geographic, ecological, stratigraphic, and other factors. (A) Spherical maps (left) show predicted sampling rates across paleolatitudes and paleolongitudes, with lighter colors representing higher sampling rates. The top sphere is centered on longitude 0, the bottom on longitude 180. The adjacent scatter plots and curves visualize partial effects of each predictor—relative abundance, paleolatitudinal spread, paleolongitudinal spread, and sample depth—on fossil sampling rates. The dashed lines represent ±2 SE. (B) A simplified table of the proportions of explained deviance and p-values for the predictors in the generalized additive model (GAM). For factors with multiple categories, the deviance is printed only on the first row; subsequent levels show a dash. For morphogroups and ecogroups, only a single row is shown for nonsignificant groups, with the corresponding range of p-values. Significance thresholds: *p < 0.05; ***p < 0.001. (C) A pie chart displaying the proportion of deviance explained by each covariate.