The fossil record is a fundamental natural historical archive for understanding evolution and Earth's history. Without information from fossils, we would have considerably less knowledge concerning the extinction and emergence of lineages (Sun et al. Reference Sun, Dilcher, Zheng and Zhou1998; Valentine et al. Reference Valentine, Jablonski and Erwin1999; Field et al. Reference Field, Benito, Chen, Jagt and Ksepka2020), the nature of biotic crises (Sun et al. Reference Sun, Joachimski, Wignall, Yan, Chen, Jiang, Wang and Lai2012; Lyson et al. Reference Lyson, Miller, Bercovici, Weissenburger, Fuentes, Clyde, Hagadorn, Butrim, Johnson and Fleming2019; Petsios et al. Reference Petsios, Thompson, Pietsch and Bottjer2019), and major phenotypic innovations in Earth's flora and fauna (Daeschler et al. Reference Daeschler, Shubin and Jenkins2006; Murdock and Donoghue Reference Murdock and Donoghue2011; Stein et al. Reference Stein, Berry, Hernick and Mannolini2012). But behind the illuminating biological information contained in the fossil record is the reality that fossil data are inherently incomplete (Darwin Reference Darwin1859; Foote and Sepkoski Reference Foote and Sepkoski1999; Smith Reference Smith2001; Kidwell and Holland Reference Kidwell and Holland2002), due to geological factors (Raup Reference Raup1976; Smith and McGowan Reference Smith and McGowan2005), factors related to fossil preservation (e.g., taphonomic bias; Sansom et al. Reference Sansom, Gabbott and Purnell2010), and asymmetrical research interest and sampling intensity among workers (e.g., sampling bias; Smith Reference Smith1994, Reference Smith2001). These biases have been characterized in the fossil record of a variety of organismal groups (Crane et al. Reference Crane, Herendeen and Friis2004; Brocklehurst et al. Reference Brocklehurst, Upchurch, Mannion and O'Connor2012; Dean et al. Reference Dean, Mannion and Butler2016; Driscoll et al. Reference Driscoll, Dunhill, Stubbs and Benton2019), and a growing number of sampling proxies can be used to account for the multitude of factors that distort paleobiological data. As a result, our ability to explore major questions related to biodiversity in the fossil record is enhanced, but methodological problems remain (Dean et al. Reference Dean, Mannion and Butler2016; Sakamoto et al. Reference Sakamoto, Venditti and Benton2017). Furthermore, the patterns of biases associated with a majority of fossil groups are uncharacterized, and the association between biased records and phylogenetic data content remains underexplored (Sansom et al. Reference Sansom, Gabbott and Purnell2010, Reference Sansom, Wills and Williams2017; Sansom Reference Sansom2015; Brocklehurst and Benevento Reference Brocklehurst and Benevento2020).
Taphonomic and sampling biases still represent consistent barriers to reconstructing the phylogenetic relationships of fossil organisms (Patterson Reference Patterson1981; Sansom et al. Reference Sansom, Gabbott and Purnell2010, Reference Sansom, Wills and Williams2017; Sansom Reference Sansom2015; Brocklehurst and Benevento Reference Brocklehurst and Benevento2020), which are essential to the evolutionary frameworks that synthetic studies of biodiversity (Raup Reference Raup1979), biostratigraphy (Norell and Novacek Reference Norell and Novacek1992), and paleobiogeography (Benson et al. Reference Benson, Mannion, Butler, Upchurch, Goswami and Evans2013) rely upon (Sakamoto et al. Reference Sakamoto, Venditti and Benton2017). Incomplete fossils with missing morphological data have been shown to decrease the accuracy of inferred phylogenies (Guillerme and Cooper Reference Guillerme and Cooper2016; Vernygora et al. Reference Vernygora, Simões and Campbell2020), but this effect is lessened when fossil taxa are scored alongside extant taxa and more morphological characters are coded (Guillerme and Cooper Reference Guillerme and Cooper2016) or when the number of taxa included in the analysis is increased (Vernygora et al. Reference Vernygora, Simões and Campbell2020).
The most straightforward way to minimize the negative effects of missing fossil morphological data on inferring evolutionary relationships is by utilizing the most complete and best-preserved fossil specimens. Most animal and plant fossil records, however, are made up of fragmentary remains and disarticulated parts (Crane et al. Reference Crane, Herendeen and Friis2004; Brocklehurst et al. Reference Brocklehurst, Upchurch, Mannion and O'Connor2012; Dean et al. Reference Dean, Mannion and Butler2016), and by focusing primarily on complete specimens for phylogenetic analyses, the majority of extinct biodiversity and available fossil data in natural history collections is often excluded. Consequently, important fossil taxa known from fragmentary material remain underutilized in phylogenetic analyses, and we are left with considerable uncertainty in the evolutionary relationships of most known fossil organisms, which obscures our understanding of major evolutionary patterns in Earth's history (Sansom et al. Reference Sansom, Gabbott and Purnell2010; Dornburg et al. Reference Dornburg, Friedman and Near2015; Sansom Reference Sansom2015).
Maximizing the vast amount of evolutionary information available in fossil collections necessitates a quantitative characterization of the biases present in this record. Furthermore, if we wish to accurately assess the evolutionary relationships of incompletely preserved fossil specimens, we need to know whether the phylogenetic data preserved in these incomplete samples are reliable and consistent. We compiled a large dataset from natural history collections (6585 specimens; 14,417 skeletal elements) to characterize skeletal representation bias in the extensive and diverse fossil record of squamates (lizards, snakes, amphisbaenians, and mosasaurs). We then applied parsimony- and model-based estimates of phylogenetic signal to test a major question fundamental to paleobiology: Does the observed fossil record contain reliable phylogenetic information?
Materials and Methods
Sampling the Fossil Squamate Skeleton in Natural History Collections
Despite their biases, natural history collections are the most data-rich record of the evolutionary history of squamates and are the basis for all morphological and most occurrence-based analyses of their fossil record. Although natural history collections have been used to assess the nature of bias and asymmetrical preservation in the squamate fossil record, these assessments are either limited to general observations (Nydam Reference Nydam2013; Rage Reference Rage2013) or restricted taxonomically to mosasaurs (Driscoll et al. Reference Driscoll, Dunhill, Stubbs and Benton2019). In this study, we quantify which regions of the squamate skeleton are more prevalent in natural history collections and, by extension, their observed fossil record.
To understand which regions of the fossil squamate skeleton appear more frequently in museum collections, we divided the skeleton into seven discrete regions: (1) jaws and palatal bones in the skull (all bones in the mandible, premaxilla, maxilla, vomer, palatine, pterygoid, and teeth); (2) posterior cranial bones (nasal, frontal, parietal, jugal, quadrate, braincase, etc.); (3) axial elements (i.e., vertebrae and ribs); (4) pectoral girdle; (5) pelvic girdle; (6) appendicular elements; and (7) dermal elements (Fig. 1, Supplementary Fig. S1). Using these criteria, we sampled for the number of occurrences of fossil squamate skeletal elements belonging to each region. We surveyed 6585 fossil squamate specimens and tallied 14,417 occurrences of individual skeletal elements across three major fossil squamate body plans: giant, fully aquatic/marine squamates (mosasaurs); small to medium-sized, four-limbed, terrestrial lizards; and small to medium-sized, limb-reduced/limbless squamates (e.g., snakes, amphisbaenians). Our surveyed specimens range in age from the Late Jurassic to the late Pleistocene (Fig. 2A).
This study used in-person observations of fossil squamate collections and surveyed readily available digital collections databases. In-person sampling (institutions: American Museum of Natural History [AMNH], New York, N.Y., U.S.A.; Denver Museum of Nature & Science [DMNH], Denver, Colo., U.S.A.; and Yale Peabody Museum of Natural History [YPM], New Haven, Conn., U.S.A.) was carried out by identifying squamate fossils to the lowest taxonomic level and with as specific anatomical terminology as possible. Taphonomic bias and preservation mode played key roles in the precision of taxon identification and anatomical assignment. Incomplete specimens for which not enough diagnostic information was preserved were often restricted to identification at order/suborder/family taxonomic levels, using generalized anatomical terminology (e.g., “partial jawbone,” “trunk vertebra,” “metapodial”). More complete specimens could be identified at genus/species taxonomic levels, with specific anatomical terminology (e.g., “anterior right maxilla,” “proximal left femur”).
To broadly characterize fossil squamate morphological data available in museum collections globally, we sampled online electronic databases from six institutions on three continents (Institute for Vertebrate Paleontology and Paleoanthropology [IVPP], Beijing, China; Natural History Museum London [NHMUK], London, U.K.; Natural History Museum of Los Angeles County [LACM], Los Angeles, Calif., U.S.A.; the Smithsonian National Museum of Natural History [USNM], Washington, D.C., U.S.A.; University of Florida Museum of Natural History [UFMNH], Gainesville, Fla., U.S.A.; and YPM). Our criteria for selecting these institutions was dependent on readily available digital vertebrate paleontology collections databases on the institutions’ websites, with collections data that could be downloaded as .csv files or efficiently converted to a dataframe. The compilation of each dataframe was achieved on each institution's website by a simple keyword search for “Squamata” in the taxonomy category. An example of this workflow is given in Supplementary Figure S2, using YPM's website.
Once the dataframe was downloaded, we utilized the catalogued specimen labels to discern which elements of the fossil squamate skeleton were included under a given catalogue number. This was accomplished using keyword searches (Command/Control+F) for each bone in the squamate skeleton (Supplementary Data S1–S14) in Microsoft Excel. Because multiple skeletal elements are frequently included under a single specimen catalogue number, our search criterion was “number of occurrences” of a skeletal element on the specimen labels, rather than the number of individual elements with a catalogue number. If specific counts for a given skeletal element were indicated on a specimen label (e.g., vertebra [×35]), that counted number was included in the total specimens from the collection. If no count was included on a specimen label, then we counted the occurrence of that skeletal element as a single occurrence. We designed the keyword searches to be as inclusive as possible regarding the anatomical language on each specimen label. Some specimens had thorough anatomical descriptions (e.g., “posterior portion of the left maxilla”), while others were more general (e.g., “jaw fragment”). Regardless of level of detail on the label, if an element could be assigned to one of the seven anatomical regions we designated, we included it in the survey. Because our anatomical binning of phylogenetic characters for the assessments of phylogenetic signal uses the same seven anatomical regions (see “Parsimony-based Phylogenetic Signal”), this allowed us to include the specimen data with less specific labels in our characterization of bias in museum collections.
Measurement of Phylogenetic Signal
In our survey of fossil squamate collections, we expect that taphonomic and anthropogenic sampling biases will lead to some regions of the skeleton being more frequently represented compared with others (Supplementary Fig. S3). In a phylogenetic context, this imbalanced representation of the skeleton may not align with the regions of the skeleton that are more heavily emphasized for morphological character selection in phylogenetic analyses. For example, the vertebrate skull is usually the most heavily emphasized and character-dense skeletal region in phylogenetic analyses, whereas other regions of the skeleton, such as the spinal column or appendicular regions, are comparatively less emphasized and less character dense. If a group of vertebrates has a fossil record in which few or no skull material is preserved/collected and instead is mostly represented by vertebrae, then the vertebrae are “overrepresented,” given the smaller number of phylogenetic characters they can be scored for. Conversely, the skull material, or lack thereof, is “underrepresented” in the fossil record, given the larger quantity of phylogenetic characters it can be scored for. In this scenario, the accuracy of phylogenetic analyses for this fossil group is dependent on the fidelity of the overrepresented vertebral character evolution to the hypothesized topology of that group. To make sure these characters are reliable for reconstructing phylogeny, it is necessary to use measurements of phylogenetic signal to assess the quality of morphological character data both in overrepresented and underrepresented parts of the skeleton made available by the biased fossil record.
The presence of competing morphology-based hypotheses for the higher-level evolutionary relationships of squamates (Gauthier et al. Reference Gauthier, Kearney, Maisano, Rieppel and Behlke2012; Simões et al. Reference Simões, Caldwell, Tałanda, Bernardi, Palci, Vernygora, Bernardini, Mancini and Nydam2018) represents an opportunity to explore patterns of phylogenetic signal in the extensive fossil record (>242 Myr; Simões et al. Reference Simões, Caldwell, Tałanda, Bernardi, Palci, Vernygora, Bernardini, Mancini and Nydam2018) of a major component of the modern vertebrate fauna (>11,182 extant species; Uetz et al. Reference Uetz, Freed, Aguilar and Hošek2021). Broadly, phylogenetic signal describes the tendency of closely related species to resemble each other in a given trait, as the result of shared evolutionary history (Pagel Reference Pagel1999; Blomberg et al. Reference Blomberg, Garland and Ives2003; Borges et al. Reference Borges, Machado, Gomes, Rocha and Antunes2019). Phylogenetic signal is measured by tracking how closely the evolution of a character trait aligns with a given evolutionary hypothesis. We used two topologically incongruent morphological phylogenetic datasets, one from Gauthier et al. (Reference Gauthier, Kearney, Maisano, Rieppel and Behlke2012) (GEA) and the other from Simões et al. (Reference Simões, Caldwell, Tałanda, Bernardi, Palci, Vernygora, Bernardini, Mancini and Nydam2018) (SEA), to assess the reliability of phylogenetic data present in the squamate fossil record.
The GEA and SEA character matrices and .tre files of the strict consensus trees were downloaded from a publicly available phylogenetic database on G. T. Lloyd's website (Lloyd Reference Lloyd2019). The time-calibrated Bayesian Majority-Rule Consensus tree from the SEA dataset used for calculation of the δ-statistic was obtained upon request from T. R. Simões. All phylogenetic datasets used in this study are available in Supplementary Data S15–S18 and S24–S29. To minimize the impact of highly incomplete fossils on estimates of phylogenetic signal, fossil taxa were removed from the GEA and SEA datasets using Mesquite (Maddison and Maddison Reference Maddison and Maddison2018) and the drop.tip function in the ape package (Paradis and Schliep Reference Paradis and Schliep2019) in R (R Core Team 2013), and only the extant taxa included in GEA and SEA datasets were considered herein. To effectively calculate measurements of character evolution and phylogenetic signal, morphological characters that remained in a constant state (i.e., zero changes across the entire evolutionary tree with dropped fossil taxa) were removed before carrying out analyses.
Parsimony-based Phylogenetic Signal
To assess phylogenetic signal of skeletal elements in a parsimony-based framework, we calculated character consistency index (CI; Kluge and Farris Reference Kluge and Farris1969) and retention index (RI; Farris Reference Farris1989) values corresponding to the seven anatomical bins utilized in our sampling of fossil squamate collections. We used these values to measure differences in homoplasy (CI) and retained synapomorphy (RI) for characters corresponding to the seven regions of the fossil squamate skeleton. Individual CI and RI values for the 572 pruned GEA characters and 222 pruned SEA characters were calculated using the ape, TreeSearch (Smith Reference Smith2018), and phangorn (Schliep Reference Schliep2011) packages in R (Supplementary Data S19–S22). The characters were binned according to anatomical region in our sampling of museum collections (Fig. 1, Supplementary Fig. S1), and the distributions (see Supplementary Figs. S4, S5) were then compared with one another using two nonparametric statistical tests: (1) the Mann-Whitney U-test; and (2) the Kolmogorov-Smirnov test (Supplementary Data S23). Because we performed multiple statistical comparisons of CI and RI values across all anatomical regions (GEA: n = 37; SEA: n = 29), statistical tests were run using a Bonferroni correction on the α value. Additionally, because the GEA and SEA datasets contain 30 overlapping operational taxonomic units (OTUs), we swapped the character matrices (Supplementary Data S76, S77) and their corresponding topologies (Supplementary Data S80, S81) to compare phylogenetic signal of the same character partitions using a fundamentally different topology (Supplementary Figs. S6–S8).
Model-based Phylogenetic Signal
Recently, the δ-statistic (Borges et al. Reference Borges, Machado, Gomes, Rocha and Antunes2019) was developed as a means of utilizing model-based phylogenetic comparative methods to assess phylogenetic signal within categorical character data and has been recently applied in addressing paleobiological questions (e.g., Deline et al. Reference Deline, Thompson, Smith, Zamora, Rahman, Sheffield, Ausich, Kammer and Sumrall2020). Calculating the δ-statistic for a given morphological character relies on ancestral character state probabilities at each node in a phylogeny. The model operates under the expectation that the better a phylogeny is associated with a given trait, the better it is able to infer ancestral states with minimal uncertainty. The ancestral state probabilities simulate “entropy” at a given node: the higher the uncertainty in a given character state (i.e., less phylogenetic signal), the higher the entropy value at that node (for further explanation, see Borges et al. Reference Borges, Machado, Gomes, Rocha and Antunes2019). δ is a measurement of the ratio of the distribution of higher entropy values to the distribution of lower entropy values for a given character across all nodes in a phylogeny. δ is higher when the distribution of entropies favors lower values (i.e., less ancestral state uncertainty) over higher values (i.e., more ancestral state uncertainty). Ancestral states for both the GEA and SEA characters were reconstructed using the R wrapper BTW (Griffin Reference Griffin2018) for the BayesTraits (Meade and Pagel Reference Meade and Pagel2016) program (Supplementary Fig. S9). Trees for the GEA dataset did not have branch lengths, so we time-calibrated the tree with a dataset of all fossil ages using the minimum branch length method (Laurin Reference Laurin2004). We dated 78 internal nodes dated using a previously published time-calibrated version of the GEA tree (Pyron Reference Pyron2017) and minimum branch lengths of 3, 5, and 7 Myr with the BinTimePaleoPhy function in the R package paleotree (Bapst Reference Bapst2012).
We initially ran these analyses using all non-constant characters in the GEA and SEA datasets, as we had done for our parsimony-based measurements of phylogenetic signal (Supplementary Figs. S10–S13). However, in contrast to our parsimony-based measurements, we found a strong negative relationship between the percentage of missing/nonapplicable scores and the δ-statistic value for a given character (higher percentage of missing/nonapplicable scores correlates with lower δ-statistic values) (Supplementary Figs. S14–S16, Supplementary Discussion, Supplementary Data S47). For the analyses presented herein, we only considered characters for which all taxa were scored. Additionally, the extreme morphological disparities between limbed and limbless squamate taxa contributed to a large amount of missing/nonapplicable scorings when considering the two body plans together. Because of this, we ran our analyses separately for limbed and limbless squamate taxa for the GEA dataset (Supplementary Figs. S17–S19, S21) and limbed squamates for the SEA dataset (Supplementary Fig. S20). Because there are only 8 extant legless taxa in the SEA dataset, we could not analyze δ-statistic values for this subset of data; δ-statistic model sensitivity drastically decreases using <20 taxa (Borges et al. Reference Borges, Machado, Gomes, Rocha and Antunes2019).
We used the drop.tip function in the ape package (Paradis and Schliep Reference Paradis and Schliep2019) in R (R Core Team 2013) to remove the appropriate taxa for each analysis. We then calculated node probabilities for each character state for the non-constant, completely scored GEA characters and SEA characters. The δ-statistic was then calculated for each character using the set of resulting node probabilities for the GEA and SEA datasets for each character using the R script from Borges et al. (Reference Borges, Machado, Gomes, Rocha and Antunes2019) (GitHub branch: mrborges23/delta_statistic) (Supplementary Data S30–S45, S52–S71). Bootstrap tests for differences between median δ-statistic values among anatomical distributions of phylogenetic characters were carried out using 10,000 samples with replacement in R (Supplementary Data S46). Because we performed multiple comparisons of δ-statistic values across all anatomical regions (GEA limbed: n = 37; SEA limbed: n = 29; GEA limbless: n = 11; GEA snakes: n = 16), bootstrap tests were run using a Bonferroni correction on the α value. Similarly to our parsimony-based assessments of phylogenetic signal, we swapped character matrices (Supplementary Data S78, S79) and topologies (Supplementary Data S80, S81) using 23 overlapping legged OTUs among the GEA and SEA datasets. Code to repeat parsimony- and model-based analyses of phylogenetic signal is available at the GitHub branch chwoolle/PhylogeneticSignal.
Characterizing Bias in the Squamate Fossil Record
Similar patterns of skeletal region representation are found in the fossil record of the three major squamate body plans (Fig. 2B–D). Among mosasaurs (Fig. 2B), 83.41% of the observed fossil elements came from the axial (65.47%, n = 2544) and jaws + palatal (17.94%, n = 697) regions of the skeleton. Among lizards (Fig. 2C), 77.8% of the observed fossil elements came from the axial (14.77%, n = 1050) and jaws + palatal (63.03%, n = 4308) regions of the skeleton. The overwhelming majority (96.95%, n = 2983) of fossil legless squamate skeletal elements (Fig. 2D) belong to the axial region of the skeleton, whereas jaws + palatal elements (2.67%, n = 82) make up almost all of the remaining occurrences. These results reveal that the majority of fossil squamate morphological data (84.28%; Supplementary Fig. S3A) in museum collections is from the axial and jaws + palatal regions of the skeleton, whereas other regions of the skeleton (posterior cranial, dermal, pectoral girdle, pelvic girdle, and appendicular elements) do not occur as frequently. Additionally, notable discrepancies in morphological data are present between specimens sampled in person and specimens sampled using electronic collections databases, with in-person sampling yielding a higher proportion of dermal and appendicular skeletal elements than sampled in electronic databases (Supplementary Fig. S3A). These differences emphasize the importance of increased physical access to natural history collections and caution against overreliance on databases as a panacea for paleobiological analyses (see Supplementary Text).
We interpret the distribution of fossil squamate skeletal data using the following terminology: the jaws + palatal and axial regions of the fossil squamate skeleton are comparatively overrepresented in natural history collections, whereas the posterior cranial region of the skull, pectoral and pelvic girdles, appendicular elements, and dermal elements are comparatively underrepresented. For example, the low occurrence of squamate posterior cranial skeletal elements in collections (4.74% of skeletal element occurrences) contrasts with an outsized proportion of morphological characters sourced from posterior cranial skeletal elements in the GEA and SEA datasets (43.61% and 34.29%, respectively; Supplementary Fig. S3B). Therefore, in both raw numbers and within the context of phylogenetic data, the posterior cranial region of the skull is underrepresented in the squamate fossil record. Similarly, the axial region of the skeleton, which makes up the largest portion of fossil skeletal data (46.79%) but a small portion of phylogenetic characters in the GEA and SEA datasets (4.10% and 14.99%, respectively), is overrepresented in the squamate fossil record. This decoupling of skeletal partitions represented in fossil data from the partitions emphasized in phylogenetic datasets necessitates a test of whether overrepresented regions in the fossil record contain as much, more, or less phylogenetically informative character data than underrepresented regions.
Analyses of Phylogenetic Signal
Parsimony-based Assessments of Phylogenetic Signal
Overall, we found no statistically significant difference between the distributions of CI and RI values (Fig. 3B,C,E,F) among characters corresponding to either overrepresented or underrepresented skeletal regions (Mann-Whitney U, Kolmogorov-Smirnov tests, all p-values > α; Supplementary Data S23). If we parse out the character CI and RI distribution per squamate anatomical bin (Supplementary Figs. S4, S5, Supplementary Data S15–S23), we observe that distribution shapes and median CI values (SEA dataset) and RI values (GEA dataset) of characters corresponding to the pectoral girdle are statistically significantly different from other regions of the skeleton. Among characters in the other anatomical bins (jaws + palate, axial, posterior cranial, pelvic, appendicular, dermal, other characters), the amount of homoplasy and retained synapomorphy of characters is not significantly different when compared with one another in both GEA and SEA datasets (Supplementary Data S23).
With character matrices and topologies swapped (Supplementary Fig. S6), we found no statistically significant differences in median values and distribution shapes of overrepresented and underrepresented skeletal regions at-large (Mann-Whitney U, Kolmogorov-Smirnov tests, all p-values > α; Supplementary Data S23). However, when we parse out the GEA matrix versus SEA topology results by anatomical region (Supplementary Fig. S7), we observe statistically significant differences in median value and distribution shape of the RI values of characters corresponding to the pectoral girdle and appendicular elements are statistically significantly different from other regions of the skeleton (Supplementary Data S23). The median and distribution shape of CI values of GEA matrix versus SEA topology characters corresponding to the pelvic girdle were statistically significantly different from values for other regions of the skeleton (Supplementary Data S23). For the SEA matrix versus GEA topology, there were no statistically significant differences among any of the anatomical regions (all p-values > α; Supplementary Data S23).
Model-based Assessments of Phylogenetic Signal
For both the GEA and SEA datasets, characters with 0% missing data that correspond to overrepresented regions of the squamate skeleton in the fossil record exhibit: (1) similar overall variation in δ-statistic values and (2) similar median δ-statistic values compared with characters with 0% missing data that correspond to underrepresented regions of the squamate skeleton in the fossil record (Figs. 4, 5). Two-sample bootstrap tests comparing 10,000 resamples with replacement of the median δ-statistic values of overrepresented and underrepresented characters, using a Bonferroni-corrected α (Supplementary Dataset S46), reveal that the differences are not statistically significant, regardless of evolutionary hypothesis (GEA limbed: p = 0.4275; SEA limbed: p = 0.6978; GEA limbless: p = 0.1997; GEA snakes: p = 0.475; Supplementary Dataset S46). Sensitivity analyses using each of the minimum branch lengths were run to account for branch length uncertainty in the GEA dataset, though results were broadly the same (Supplementary Figs. S17–S19, Supplementary Data S30–S45). The SEA topology was already time-calibrated (Simões et al. Reference Simões, Caldwell, Tałanda, Bernardi, Palci, Vernygora, Bernardini, Mancini and Nydam2018), therefore, sensitivity analyses concerning branch lengths were not run for that topology. Additionally, to assess the stability of our results under different model parameters, we ran a number of additional sensitivity analyses using different Uniform prior distributions (U) on character transition rates (U: 0–0.001; U: 0–0.1; U: 0–1.0). Results were largely insensitive to differences in model parameters (Supplementary Figs. S17–S20). When we swapped the GEA and SEA character matrices with their topologies (overlapping limbed squamate OTUs only), we found no statistically significant differences in the median δ-statistic values between any of the anatomical regions (all p-values > α; Supplementary Data S46, Supplementary Figs. S26–S28).
Biases in the Squamate Fossil Record
Considering that jaw + palatal bones in the skull and the axial region of the squamate skeleton contain dozens to hundreds of individual elements (e.g., vertebrae, ribs, isolated teeth), it is unsurprising that we see the highest raw numbers of skeletal elements from these regions among the fossil specimens surveyed (Fig. 2B–D, Supplementary Fig. S3A). In fact, the relative occurrences of all seven skeletal regions in the fossil records of mosasaurs (Fig. 2B) and legless squamates (Fig. 2D) appear to superficially coincide with the number of potentially “fossilizable” skeletal elements in these animals. The major anomaly, from a preservation/collecting perspective, comes from the fossil record of lizards, which disproportionately favors the jaw + palatal region of the skull (Fig. 2C). This may be due to the fact that tooth enamel is more resilient to taphonomic factors and that isolated lizard jaws can generally be easily distinguished from other small vertebrate teeth and tooth-bearing elements (combination of pleurodonty and heterodonty). As a result, based on raw numbers, the only clearly discernible taphonomic/collecting bias in the squamate fossil record comes from the portion pertaining to lizards. Future work utilizing established fossil completeness metrics (Brocklehurst et al. Reference Brocklehurst, Upchurch, Mannion and O'Connor2012; Dean et al. Reference Dean, Mannion and Butler2016; Driscoll et al. Reference Driscoll, Dunhill, Stubbs and Benton2019; Woolley et al. Reference Woolley, Bottjer and Smith2021) would allow us to further characterize the taphonomic and/or collecting biases that lead to the distribution of the fossil squamate skeleton observed herein.
If we are to consider squamate evolution as a whole, and our ability to place fossil squamates into broad, higher-level phylogenies alongside extant taxa, then the squamate fossil record is biased against the region of the skeleton containing the highest amount of phylogenetic data (the posterior cranial region of the skull, 43.61% of GEA characters and 34.29% of SEA characters; Supplementary Fig. S3B). Additionally, our survey shows that the majority of morphological phylogenetic characters available to score (GEA: 67.87%; SEA: 56.48%; Supplementary Fig. S3B) belong in squamate skeletal regions that are underrepresented in museum collections. This means that even though taphonomic and collecting biases largely align with the material in the skeleton available to fossilize (with the exception of fossil lizards), the observed record of fossil squamates is still biased against the regions of the skeleton that contain the majority of phylogenetic characters. Because the squamate fossil record is clearly missing a large quantity of useful phylogenetic data, it is imperative to assess the quality of the data that remain in characters associated with jaws + palatal elements and axial elements.
Homoplasy and Retained Synapomorphy in the Fossil Record
Results from our comparisons of CI and RI distributions among the GEA and SEA datasets demonstrate that overrepresented phylogenetic character data from the squamate fossil record are not any more likely to provide misleading evidence of phylogenetic relationships than character data from the rest of the skeleton. The lack of a significant difference in observed homoplasy between characters more prevalent in the squamate fossil record suggests that the phylogenetic placements of incomplete and fragmentary fossil taxa are neither more nor less reflective of evolutionary convergence than those of more complete, extant taxa. Similarly, the lack of significant difference in retained synapomorphy among characters in overrepresented versus underrepresented fossil squamate skeletal elements suggests that the phylogenetic placements of fragmentary fossil taxa are neither more nor less reflective of shared derived character states than more complete, extant taxa. Critically, this parsimony-based result is recovered regardless of hypothesis of squamate higher-level evolutionary relationships.
It is unclear what the exact cause is behind the lower CI and RI values for characters sourced from the pectoral girdle in the GEA phylogenetic dataset (Supplementary Fig. S4). Comparisons between CI/RI value and percentage of missing data per character (Supplementary Text, Supplementary Data S47) showed no meaningful relationship, unlike our model-based measurement of phylogenetic signal. This suggests that missing/non-applicable character data do not account for the patterns in CI/RI values among pectoral characters in the GEA dataset. It is possible that major ecomorphological transitions in squamate evolution, such as limb loss and/or specialized pectoral/limb morphologies may play a role in these differences. Alternatively, lower overall sample sizes of phylogenetic characters scored from this skeletal region (Supplementary Figs. S4, S5) could factor into these CI/RI distribution differences. Regardless of cause, the lack of significant differences in the distributions of overrepresented and underrepresented CI/RI values in the large samples sizes of phylogenetic characters in the skull (jaws + palatal and posterior cranial regions) appear to be the most influential on our results.
These results are generally consistent even when swapping character matrices and topologies (Supplementary Fig. S6), with the exception of the RI values of GEA appendicular characters mapped onto the SEA topology (Supplementary Fig. S7, Supplementary Data S23). The higher median RI value and higher variation in the range of RI values in the GEA appendicular region (Supplementary Fig. S4) are eradicated when mapped onto the SEA topology (Supplementary Fig. S7). This could be due to the possibility that GEA appendicular characters demonstrate high support for crownward nodes on the GEA topology, but because of the fundamental differences in evolutionary hypotheses, the characters do not support major nodes on the SEA tree. Further tests, including running separate phylogenetic analyses using each anatomical partition (e.g., Wencker et al. Reference Wencker, Tschopp, Villa, Augé and Delfino2021), could be used in the future to understand the causes behind these differences in more detail.
Model-based Analyses of Phylogenetic Signal
Among GEA and SEA characters with no missing data, regions of the squamate skeleton that are underrepresented in the fossil record exhibit levels of median phylogenetic signal similar to overrepresented regions as measured by the δ-statistic. This is true when we consider only legged squamate taxa (Figs. 4, 5), only legless squamate taxa (Fig. 6), and snake taxa in the GEA dataset (Supplementary Fig. S21). By running separate analyses of legged taxa and legless taxa, we were able to consider a portion of characters corresponding to the pectoral girdle, pelvic girdle, and appendicular elements in legged squamates. However, even though we were able to survey representative characters for each of the seven anatomical regions, the strong negative correlation of missing data to δ-statistic values meant that we could only include roughly one-third of the total amount of characters (195 GEA characters, or 31.97% of total; 126 SEA characters, or 36.31% of total). In the absence of an appropriate benchmark study stating the minimum number of characters that can be used for the δ-statistic, it is difficult to tell how much our estimates of phylogenetic signal using the δ-statistic are impacted by low character sample size. However, our δ-statistic results are consistent with our parsimony-based results, which: (1) are derived from a much larger sample of characters (572 GEA characters, or 93.77% of total; 222 SEA characters, or 63.97% of total) and (2) have no discernible relationship with the amount of missing data (Supplementary Data S47). This suggests that smaller character sample sizes may not affect the sensitivity of the δ-statistic analyses as much as, for example, the number of included taxa (Borges et al. Reference Borges, Machado, Gomes, Rocha and Antunes2019). Further work to more rigorously assess the effect of small sample sizes on the δ-statistic is needed, but at present, we conclude that our analyses in this study are still an appropriate use of the method.
For both the GEA and SEA datasets, appendicular characters showcased the highest median δ-statistic value, in addition to an interquartile range with the highest δ-statistic values, among all anatomical bins that contained >2 characters (Fig. 5, Supplementary Figs. S22, S23, S27). Possible explanations for this pattern among appendicular characters vary according to phylogenetic dataset. For the GEA dataset, a potential explanation for the high median δ-statistic values in appendicular characters has to do with the unique limb structure of the chamaeleonid taxa (Brookesia brygooi and Chamaeleo laevigatus) that were scored differently from all other limbed taxa. Because chamaeleonids are monophyletic in the GEA hypothesis, this results in higher δ-statistic values for the appendicular character bin, even though the characters do not illuminate relationships for any group of lizards beyond Chamaeleonidae. The character selection and taxa used in the SEA dataset are different, and the one sampled chamaeleonid taxon (Trioceros jacksonii) does not appear to be influencing the δ-statistic as strongly as the chamaeleonids in the GEA dataset. However, for both datasets, there are unique differences between the squamate limb and the limb of the out-group taxon, Sphenodon punctatus (e.g., the presence of epiphyses on long bones), that polarize the characters and result in high δ-statistic values but are not necessarily informative on the interrelationships among limbed squamate taxa. In sum, the higher phylogenetic signal for appendicular characters is probably driven by the monophyly of a small subset of unique taxa (Chamaeleonidae), as well as the in-group/out-group distinction among all squamates and S. punctatus.
Even with the character matrices and topologies swapped (Supplementary Figs. S27, S28), appendicular characters exhibit the highest median δ-statistic values and an interquartile range with the highest δ-statistic values among all anatomical regions. Why this differs from our appendicular RI values is unclear, especially given the median RI value of GEA appendicular characters was significantly lowered when mapped onto the SEA topology (Supplementary Fig. S7B). It is possible that the relatively smaller sample size of GEA characters in the appendicular region for the δ-statistic analyses excluded key characters that contributed to the lower median RI value and greater interquartile range. These conflicting results could also demonstrate greater plasticity in squamate limb morphology, where, apart from a few clades (e.g., Chamaeleonidae), limb morphological character evolution carries lower phylogenetic signal across the squamate tree of life.
δ-statistic values for characters associated with axial region of the squamate skeleton have variable distributions according to dataset and taxon subsets used. For both the GEA and SEA datasets in which only legged taxa were sampled, the axial characters have the lowest (SEA) or third-lowest (GEA) median δ-statistic value out of all sampled anatomical regions (Fig. 5, Supplementary Figs. S22, S23, S27, S28). The same results hold for the swapped character matrices and topologies (Supplementary Figs. S27, S28). For the GEA dataset in which only snakes were sampled, the axial characters had higher median δ-statistic values than other regions (Supplementary Fig. S25). This suggests that, at least for the GEA dataset, axial characters are comparably more phylogenetically informative for snake taxa than they are for legged taxa or all legless taxa within the dataset. Within the context of the fossil record of snakes and other legless squamates (Fig. 2D), in which the vast majority of identified elements are from the axial region of the skeleton, this adds confidence to our ability to include incomplete fossil snakes into broader phylogenetic analyses.
As was the case with our parsimony-based measurements of phylogenetic signal, the anatomical bins with the largest sampling of characters (jaws + palatal and posterior cranial elements of the skull) appear to be influencing the results of our statistical comparisons the most. Bootstrap tests (Supplementary Data S46) indicate that there is no statistically significant difference between the median δ-statistic value of jaws + palatal and posterior cranial elements of the skull (all p-values > α), which correspond to overrepresented and underrepresented regions of the fossil squamate skeleton, respectively. This means that jaws + palatal and posterior cranial elements of the skull, which happen to be the most character-rich regions of the squamate skeleton, showcase levels of phylogenetic signal that are consistent with one another (Fig. 5, Supplementary Figs. S22–S25, S27, S28). It is important to consider this result alongside the observed fossil record of lizards (Nydam Reference Nydam2013; Rage Reference Rage2013; Fig. 2C), which disproportionately preserves jaws + palatal elements of the skull (in particular, dentaries and maxillae). By demonstrating that these skeletal regions retain similar levels of phylogenetic signal as posterior cranial elements of the skull and other underrepresented regions of the fossil squamate skeleton, we can be more confident that including incomplete fossil lizard taxa in phylogenetic analyses will not provide misleading evidence of evolutionary relationships.
A Biased Fossil Record Contains Reliable Phylogenetic Data
The arrival at similar assessments of phylogenetic signal using two non-independent but distinct phylogenetic datasets (GEA and SEA) gives us confidence that we are describing an actual natural phenomenon in the preservation of fossil data. Our parsimony- and model-based analyses show that the parts of the squamate skeleton most ubiquitously preserved and collected in the fossil record (jaws + palatal bones, vertebrae, and ribs) retain the same level of phylogenetic signal as other parts of the skeleton. Critically, these results are recovered regardless of hypothesis of squamate higher-level evolutionary relationships. This joint assessment of bias and phylogenetic signal in our sample of morphological data adds confidence to our ability to accurately infer the evolutionary relationships of fossil organisms that preserve disarticulated parts. Incorporating the world's abundance of incomplete, but potentially trustworthy fossils of vertebrates (Norell and Novacek Reference Norell and Novacek1992), echinoderms (Thompson and Denayer Reference Thompson and Denayer2017), plants (Crane et al. Reference Crane, Herendeen and Friis2004), and more into an evolutionary framework can bolster our ability to explore patterns of diversification, paleobiogeography, and biostratigraphy in deep time and can provide critical historical data for understanding today's worsening biotic crises (Ceballos et al. Reference Ceballos, Ehrlich and Dirzo2017). Using a quantitative phylogenetic framework to establish the reliability of fragmentary fossil material harnesses the full power of the fossil record in our effort to address major outstanding questions in Earth's biological history.
We would like to thank J. J. W. Sertich and K. Mackenzie in the Earth Sciences Vertebrate Paleontology collections at the Denver Museum of Natures & Science for access to specimens, collections resources and research opportunities. We are also grateful to D. Brinkman and J. A. Gauthier in the Vertebrate Paleontology Collections at Yale Peabody Museum of Natural History for specimen access and thoughtful discussion. We would also like to thank C. Mehling in the Fossil Amphibians, Reptiles, and Birds Collections at the American Museum of Natural History for specimen access. Additionally, we would like to thank M. Walsh, S. McLeod, and B. Mertz at the Natural History Museum of Los Angeles County for digital specimen data access. We would also like to thank Paleobiology managing editor J. Kastigar for logistical support with revisions. Finally, we would like to thank Paleobiology editor C. K. Boyce, associate editor M. Friedman, an anonymous reviewer, and reviewer N. Brocklehurst for extremely helpful feedback and comments that greatly augmented the quality of this article. C.H.W. was supported by the Richard Estes Memorial Grant from the Society of Vertebrate Paleontology, the Theodore Roosevelt Memorial Grant from the American Museum of Natural History, as well as National Science Foundation EAR 1647841 and ANT 1341475 (to N.D.S.). J.R.T. was supported by a Royal Society Newton International Fellowship and a Leverhulme Trust Early Career Fellowship.
Data Availability Statement
Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.m63xsj43b. C. H. Woolley, J. R. Thompson, Y. H. Wu, D. J. Bottjer, and N. D. Smith. 2021. Supplementary information for “A Biased Fossil Record Can Preserve Reliable Phylogenetic Signal.”