Non-technical Summary
The mammalian skull plays important roles in sensing, communicating, and obtaining and processing food, making it a useful model system to understand how developmental processes and selection for different ecological functions influence patterns of morphological evolution in general. Allometry describes changes in shape related to changes in size. Previous work has documented a pattern among mammals wherein larger species tend to have relatively longer faces than smaller species (craniofacial evolutionary allometry, or CREA). CREA mirrors patterns of skull shape change over postnatal development (ontogeny). Using 3D quantification of shape and linear measurements, we describe the major patterns of shape variation in equid (fossil and modern horses) skulls, test whether equids follow or deviate from the CREA pattern over the clade’s evolutionary history, and identify whether the evolution of high-crowned teeth (hypsodonty) is related to relative facial proportions, all using statistics appropriate to evolutionary data. Those analyses demonstrate that fossil and modern horses do not conform to the CREA pattern and that the evolution of hypsodont dentition did not influence relative facial length in the group. Comparison with previously published data suggests patterns seen during growth do not necessarily scale up to explain evolutionary changes between species. Our results challenge the idea that a single developmental or ecological factor can fully explain skull evolution in horses.
Introduction
Understanding the processes that generate patterns of morphological diversity is a fundamental challenge in evolutionary biology (Foote Reference Foote1997; Jablonski Reference Jablonski2017a, Reference Jablonski2017b). Natural selection acts on standing phenotypic variation, and hypothetically, if no processes were to bias the production of phenotypic variation, morphological evolution could proceed in any number of directions. However, due to the nature of developmental systems, this is most often not the expectation (Jablonski Reference Jablonski2020; Salazar-Ciudad Reference Salazar-Ciudad2021), and morphological evolution proceeds along constrained trajectories of shape covariation (Schluter Reference Schluter1996; Marroig and Cheverud Reference Marroig and Cheverud2005). One such set of trajectories describes changes in shape related to size. These scaling power laws can be evaluated at different levels of biological organization, and when analyzed across different species, these trajectories are known as evolutionary allometry (Klingenberg Reference Klingenberg2014).
Evolutionary allometry has recently received attention as a constraint on mammalian skull shape evolution (e.g., Cardini Reference Cardini2019; Marcy et al. Reference Marcy, Guillerme, Sherratt, Rowe, Phillips and Weisbecker2020). Among closely related groups of mammals, facial length often increases proportionally more than braincase length as body size and total skull length increases (Radinsky Reference Radinsky1985; Slater and Van Valkenburgh Reference Slater and Van Valkenburgh2008; Cardini and Polly Reference Cardini and Polly2013; Cardini Reference Cardini2019; Rhoda et al. Reference Rhoda, Haber and Angielczyk2023; Tamagnini et al. Reference Tamagnini, Michaud, Meloro, Raia, Soibelzon, Tambusso, Varela and Maiorano2023; but see Krone et al. Reference Krone, Kammerer and Angielczyk2019). When this occurs, larger species have relatively longer faces than smaller species in a pattern that was originally termed “progressive preoptic preponderance” (PPP; Robb Reference Robb1935), and more recently has been called “craniofacial evolutionary allometry” (CREA; Cardini and Polly Reference Cardini and Polly2013; Cardini et al. Reference Cardini, Polly, Dawson and Milne2015; Zelditch and Swiderski Reference Zelditch and Swiderski2023). This evolutionary pattern seems to mirror the patterns seen in ontogeny (Cardini and Polly Reference Cardini and Polly2013); newborn mammals have relatively short or brachycephalic faces compared to adults, and over ontogeny, the face grows at a faster rate than the braincase, resulting in adults with longer or more dolichocephalic faces than juveniles (e.g., O’Higgins and Collard Reference O’Higgins and Collard2002; Gianni et al. Reference Giannini, Segura, Giannini and Flores2010; Le Verger et al. Reference Le Verger, Hautier, Bardin, Gerber, Delsuc and Billet2020). If ontogenetic allometric trajectories form the basis for the CREA pattern, this would suggest that changes in adult morphology between species are achieved through changes in the timing of development (Gould Reference Gould1985). However, the CREA pattern could also be due to the interaction between other allometric patterns and biomechanical constraints (Cardini and Polly Reference Cardini and Polly2013). For example, brain size, and by extension braincase size, scales with an exponent of 3/4 relative to body size (Martin et al. Reference Martin, Genoud and Hemelrijk2005). If jaw and facial lengths scale isometrically relative to body mass (an exponent of 1/3) (Emerson and Bramble Reference Emerson, Bramble, Hanken and Hall1993), then jaw and facial lengths would scale at a faster rate than braincase size, resulting in the CREA pattern (Cardini and Polly Reference Cardini and Polly2013).
Much of the early work on which PPP and CREA are based was focused on the rich fossil record of the horse family, Equidae (Robb Reference Robb1935; Radinsky Reference Radinsky1984). Over their ∼55 million year evolutionary history, equids display a textbook set of trends toward larger size, loss of lateral digits, and increasingly large and high-crowned teeth, all of which are thought to associate with global trends toward a cooler climate, expansion of grasslands, and a shift from browsing to grazing diets (Janis Reference Janis and Regal2007). Structural reorganization, due to the acquisition of high-crowned hypsodont teeth (Reeve and Murray Reference Reeve and Murray1942; Radinsky Reference Radinsky1984), likely influenced patterns of evolutionary shape change in the equid skull. However, the degree to which allometry and hypsodonty drove skull shape variation has not yet been explicitly tested, beyond evaluating whether brachydont and hypsodont taxa differ in scaling proportions (Radinsky Reference Radinsky1984). Furthermore, because all of these studies occurred before the comparative methods revolution of the mid-1980s, none considered the confounding effects of shared evolutionary history when analyzing evolutionary changes in equid skull morphology. More recently, facial length has been shown to scale isometrically (facial length scales at the same rate as braincase length) or with negative allometry (facial length increases at a slower rate than braincase length) in extant equids (Cardini Reference Cardini2019). Cardini (Reference Cardini2019) hypothesized that this may be due to the spatial requirements of hypsodont dentition; large high-crowned teeth require more space, resulting in a proportionally longer rostrum even in smaller animals. In support of this idea, a recent analysis of modern horse and pony breeds found no evidence for craniofacial elongation in larger breeds, suggesting that despite artificial selection for small body size in miniature breeds, there are limits in the degree to which facial length can decrease while still maintaining adequate space in the palate for their dentition (Heck et al. Reference Heck, Sanchez-Villagra and Stange2019). However, evaluating the role hypsodonty played in shaping patterns of equid skull shape evolution requires the temporal perspective of the fossil record.
Here, using linear measurements, phylogenetic comparative methods, and the rich equid fossil record, we explicitly test whether equids follow or deviate from the CREA pattern and whether relative facial proportions are driven by hypsodonty. We then compare evolutionary patterns in relative facial proportions to those seen within species and over ontogeny. Finally, we use 3D geometric morphometrics to describe the major axes of shape variation in the equid skull and evaluate support for size and hypsodonty as drivers of shape variation in the equid skull.
Materials and Methods
The 2D and 3D Data
We collected 3D surface scan data for 31 skulls from 26 species of equid, spanning the family’s entire evolutionary history (Table 1). We selected only adult specimens, with permanent adult dentition erupted. Our sample size is reflective of the availability of adequately preserved specimens for morphometric analysis. Skulls were scanned using a structured light scanner (Creaform Go!Scan 20) at 0.3 mm resolution. One scan (Neohipparion affine UNSM 101500), was downloaded from Morphosource (Morphosource Ark: ark:/87602/m4/475607; Goswami et al. Reference Goswami, Noirault, Coombs, Clavel, Fabre, Halliday and Churchill2022). To evaluate whether macroevolutionary patterns in equid skull evolution mirror those seen in ontogeny, we also collected a small ontogenetic sample of plains zebra (Equus quagga) totaling eight specimens ranging in age from late fetal/newborn to adult individuals (Supplementary Table S1). We quantified skull shape using a set of 18 landmarks, adapted from Heck et al. (Reference Heck, Wilson, Evin, Stange and Sánchez-Villagra2018, Reference Heck, Sanchez-Villagra and Stange2019; Fig. 1, Supplementary Table S2). For some specimens, where landmarks could not be placed due to lack of preservation or damage to the specimen, we estimated landmark position. To estimate the position of missing landmarks, we symmetrized, where possible, symmetric landmarks across a plane defined by midline (landmarks 15–18, Supplementary Table S2). For nonsymmetric landmarks, we estimated landmark position using the fixLMtps function in the Morpho R package (Schlager Reference Schlager, Zheng, Li and Szekely2017). Estimated landmark configurations were checked via visual inspection to ensure that the resulting configuration appeared sensible. For one specimen, Neohipparion affine UNSM 101500, all missing landmarks were estimated using the fixLMtps function, because this resulted in the most reasonable estimates for bilaterally symmetric landmarks. Where possible, we landmarked two specimens per species. In these cases, we computed a species average landmark configuration using the mshape function in the geomorph R package (Baken et al. Reference Baken, Collyer, Kaliontzopoulou and Adams2021; Adams et al. Reference Adams, Collyer, Kaliontzopoulou and Baken2023). In analyses using 3D geometric morphometric data, it may be a concern when degrees of freedom (3k + 7, for 3D landmark data, where k is number of landmarks) exceed the number of specimens (Webster and Sheets Reference Webster and Sheets2010). To explore whether reducing the number of landmarks (and, therefore, degrees of freedom) impacted our results, we created an additional dataset in which we included only landmarks from the left side. We extracted the symmetric component of shape variation from the Procrustes-transformed full landmark dataset using the bilat.symmetry function in geomorph R package. We then discarded paired landmarks from the right side, such that only one half of the symmetric shape dataset was present (hereafter referred to as the “symmetric” dataset).
Evolutionary sample of equids included in the study.

Median tree from BEAST2 v. 2.7.1 (Bouckaert et al. Reference Bouckaert, Heled, Kühnert, Vaughan, Wu, Xie, Suchard, Rambaut and Drummond2014) time-scaling analysis, with species represented in our dataset denoted by gray circular tips and skull of Scaphohippus intermontanus AMNH 87301 with landmark scheme; landmark descriptions can be found in Supplementary Table S2. Silhouettes are from phylopic.org donated under the CC0 1.0 Universal Public Domain Dedication license and Public Domain Mark 1.0 license.

Facial length and braincase length were measured in two ways, following previous studies. Following Cardini (Reference Cardini2019), we first calculated inter-landmark distances that defined the length of the face as the distance between first incisors from ventral side (landmark 15) to the midpoint of the palate (landmark 16) and braincase length as the midpoint of the palate to the ventral tip of the foramen magnum (landmark 17) (Fig. 2, Supplementary Table S2). These landmarks reflect different developmental origins of the cranial elements, with the face deriving from the splanchnocranium and the braincase deriving from the neurocranium, and present reproducible points for landmark placement (Salamanca-Carreño et al. Reference Salamanca-Carreño, Parés-Casanova, Crosby-Granados, Vélez-Terranova and Benítez-Molano2023; Cardini Reference Cardini2019; Hallgrímsson et al. Reference Hallgrímsson, Brown, Ford-Hutchinson, Sheets, Zelditch and Jirik2006).
Examining craniofacial allometry across different levels of biological organization: A, B, use Cardini measurements of facial length and braincase length (Cardini Reference Cardini2019); C, D, use Radinsky measurements of facial length and braincase length (Radinsky Reference Radinsky1984); E, F, use Cardini measurements. A, C, Regression of facial length onto braincase length shows that when phylogeny is taken into account, we cannot reject an isometric scaling relationship between facial length and braincase length, suggesting that equids deviate from the craniofacial evolutionary allometry (CREA) pattern. E, Regression of facial length onto braincase length shows that intraspecifically, among modern wild and domestic equids, facial length scales isometrically or with negative allometry relative to braincase length. F, Over ontogeny, equids follow CREA with facial length scaling with positive allometry relative to braincase length. Illustrations on the right-hand side denote the different measurement schemes for facial length and braincase length. OLS, ordinary least squares; PGLS, phylogenetic generalized least squares, PICs, phylogenetic independent contrasts.

To mirror Radinsky’s (Reference Radinsky1984) classic study of horse craniofacial allometry, we subsequently took linear measurements of toothrow length (= face length) from the posterior margin of M3 to the anterior margin of the incisors and braincase length from the paracondylar process to the back of the orbit from screenshots of the skulls in lateral view in Geomagic DesignX (3D Systems, North Carolina), which were exported as .png files. Measurements were taken in ImageJ. These measurements represent an alternative way of measuring the different facial components and, similarly to the Cardini measurements, reflect their different developmental origins (Radinsky Reference Radinsky1984).
Finally, to expand our intraspecific and ontogenetic datasets and make direct comparisons with previously published studies, we collected inter-landmark distances corresponding to facial length and braincase length (i.e., the “Cardini measurements”) from Heck et al. (Reference Heck, Wilson, Evin, Stange and Sánchez-Villagra2018, Reference Heck, Sanchez-Villagra and Stange2019). Our landmarks 15, 16, and 17, correspond to landmarks 37, 46, and 60, respectively, in the intraspecific dataset of Heck et al. (Reference Heck, Wilson, Evin, Stange and Sánchez-Villagra2018) and landmarks 37, 44, and 58, respectively, in the ontogenetic dataset of Heck et al. (Reference Heck, Sanchez-Villagra and Stange2019). For the intraspecific sample of adult specimens from Heck et al. (Reference Heck, Wilson, Evin, Stange and Sánchez-Villagra2018), we analyzed the data from domestic horses, zebras, and donkeys separately. Heck et al. (Reference Heck, Wilson, Evin, Stange and Sánchez-Villagra2018) pooled data for donkeys and for zebras because of issues related to inconsistent species identification. Although the zebra and donkey data are “pseudointraspecific,” we choose to include them here because they provide a useful point of comparison with the evolutionary, ontogenetic, and domestic breed data.
Retrodeformation of Hyracotherium vasacciense
In general, we excluded specimens that had been severely deformed due to taphonomic processes and were no longer symmetric. However, we were unable to obtain any suitably preserved hyracotheriine specimens. As we felt it was important to include at least one representative from the earliest equid subfamily, we retrodeformed one specimen of Hyracotherium vasacciense (USNM PAL 336125) in which the rostrum and basicranium are slightly bent, using the method of Schlager et al. (Reference Schlager, Profico, Di Vincenzo and Manzi2018).
This method works by placing landmarks and semilandmarks along bilaterally symmetric structures on the model. These landmarks are then symmetrized, and the deformed mesh is retrodeformed by thin plate spline deformation using the original and retrodeformed landmark coordinates (Schlager et al. Reference Schlager, Profico, Di Vincenzo and Manzi2018). We placed 104 landmarks (28 normal landmarks, 76 semilandmarks along curves) on the surface of the original model of USNM PAL 336125. We used the slider3d function in the R package Morpho (Schlager Reference Schlager, Zheng, Li and Szekely2017) to slide semilandmarks along the curves. We then used the symmetrize function in Morpho (Schlager Reference Schlager, Zheng, Li and Szekely2017) to create a symmetric landmark configuration; relaxed the original landmark configuration against the symmetric landmark configuration using the relaxLM function in Morpho (Schlager Reference Schlager, Zheng, Li and Szekely2017); and used the resulting relaxed configuration to retrodeform the model of USNM PAL 336125 with the retroDeformMesh function in the R package Morpho (Schlager Reference Schlager, Zheng, Li and Szekely2017). Success of the retrodeformation process was assessed via visual inspection (Supplementary Fig. S1). We then landmarked the retrodeformed model using the original landmark configuration (Supplementary Fig. S1, Supplementary Table S2).
Metatree
Comparative evolutionary analyses require a time-scaled phylogeny to account for covariation due to shared ancestry among traits (Felsenstein Reference Felsenstein1985; Grafen Reference Grafen1989). However, a time-scaled phylogeny that incorporates most species in family Equidae does not yet exist, beyond informally constructed supertrees (e.g., Mihlbachler et al. Reference Mihlbachler, Rivals, Solounias and Semprebon2011; Famoso et al. Reference Famoso, Davis, Feranec, Hopkins and Price2016; Cantalapiedra et al. Reference Cantalapiedra, Prado, Fernández and Alberdi2017; Parker et al. Reference Parker, McHorse and Pierce2018). These trees rely on qualitative assumptions about phylogenetic affinity of individual taxa and cannot explicitly incorporate uncertainty inherent in phylogenetic datasets.
Here, we employed the metatree approach (Lloyd et al. Reference Lloyd, Bapst, Friedman and Davis2016; Lloyd and Slater Reference Lloyd and Slater2021; Wisniewski et al. Reference Wisniewski, Lloyd and Slater2022) to generate a comprehensive, time-scaled, consensus evolutionary hypothesis for living and fossil equids. A metatree is a meta-analytic phylogeny that uses matrix representation with parsimony (MRP; Baum Reference Baum1992; Ragan Reference Ragan1992a, Reference Ragan1992b) to summarize current consensus regarding relationships within a focal clade. Unlike typical MRP supertrees that encode only a single figured topology and weight each MRP character equally, metatree uses reanalysis of published phylogenetic matrices to obtain and encode all bipartitions of taxa present in the complete set of most parsimonious trees (MPTs) and summarizes the resulting phylogenetic information using a formal set of weighting rules to account for synonymy, composite terminal taxa, matrix reuse, year of publication, and MRP dataset size. For a detailed description of the analytic pipeline, we refer readers to Lloyd and Slater (Reference Lloyd and Slater2021). For this study, we sourced 16 morphological character–taxon matrices from 15 previously published studies (1 study had two matrices; Table 2). Following the metatree pipeline, we reanalyzed each of the morphological character–taxon matrices in TNT v. 1.6 (Goloboff and Morales Reference Goloboff and Morales2023) under the original character ordering and weighting schemes (parameters for each search are provided in the analysis files, see Dryad Repository (https://doi.org/10.5061/dryad.crjdfn3g3) and summarized the resulting phylogenetic information from each analysis in MRP matrices. As molecular sequence information is also phylogenetically informative for extant and recently extinct species, we also generated an MRP representation of the maximum clade credibility (MCC) tree resulting from a BEAST v. 1.8.4 (Drummond et al. Reference Drummond, Suchard, Xie and Rambaut2012) analysis of mitochondrial dataset 5 (filename=filename=BEAST_mtDNADataset5_equidsOnly_allPartns_rootConstrained_inclEovodovi.mcc) from Heintzman et al. (Reference Heintzman, Zazula, MacPhee, Scott, Cahill, McHorse and Kapp2017). This molecular phylogeny includes multiple tips per species and so, before generating the MRP matrix, we pruned the topology so each species was represented by one specimen. We used the Metatree function in the metatree R package (Lloyd Reference Lloyd2015) to generate a global MRP matrix that summarized phylogenetic information across studies, while reconciling taxonomy using the Paleobiology Database (PBDB, https://paleobiodb.org). At this point, species with identical MRP coding were removed via safe taxonomic reduction to reduce noise and improve tree search efficiency. We analyzed the global MRP matrix in TNT v. 1.6 (Goloboff and Morales Reference Goloboff and Morales2023) using 10,000 replicates, which resulted in 3774 MPTs. Species removed via safe taxonomic reduction were reinserted and a majority rule consensus (MRC) tree generated.
Source studies used in metatree.

To time-scale the topology for downstream analyses, we inferred branch lengths under a relaxed molecular clock and fossilized birth–death tree prior using BEAST2 v. 2.7.1 (Bouckaert et al. Reference Bouckaert, Heled, Kühnert, Vaughan, Wu, Xie, Suchard, Rambaut and Drummond2014). We used the MRC tree as a topology constraint, stratigraphic data from the PBDB as priors on the occurrence times for fossil taxa, and consensus sequence data derived from the molecular alignment of Heintzman et al. (Reference Heintzman, Zazula, MacPhee, Scott, Cahill, McHorse and Kapp2017) for estimating molecular branch lengths under the original partitions and models of molecular evolution. Three species for which we collected surface scan data, but which were not represented in the phylogenetic data (Mesohippus exoletus, Mesohippus westoni, and Parahippus cognatus), as well as a handful of other species that were relevant to future projects using the phylogeny, were manually inserted into the phylogeny based on taxonomy and topologically constrained via a series of “MRCA with rogues” priors that allowed the phylogenetic placement of these species to be resolved based on stratigraphic data.
The fossilized birth–death process (Heath et al. Reference Heath, Huelsenbeck and Stadler2014; Gavryushkina et al. Reference Gavryushkina, Heath, Ksepka, Stadler, Welch and Drummond2017) was used as a tree prior (net diversification rate prior, r ∼exponential[1.0]; origin, ϵ ∼exponential[2.301761], offset 56; fossil sampling probability prior, s ∼β[2, 2]; turnover prior, ϵ ∼β[2,1]), with an optimized relaxed lognormal clock model (mean = −4.24, SD = 1.25). We ran four Markov chain Monte Carlo chains of 100 million generations, sampling every 500,000 replicates. We extracted the median tree using the medTree function in the treespace R package (Jombart et al. Reference Jombart, Kendall, Almagro-Garcia and Colijn2017) to use as a phylogenetic framework for downstream analyses. Before analyses, we added 0.01 Myr to all zero-length branches (i.e., sampled ancestors).
Tests of Allometry
We first reevaluated the CREA hypothesis, that is, that facial length scales with positive allometry relative to braincase length, by performing phylogenetic regressions of ln(facial length) on ln(braincase length) for 26 species of living and extinct equids, spanning ~55 Myr of evolutionary history, using the median tree from our BEAST2 v. 2.7.1 (Bouckaert et al. Reference Bouckaert, Heled, Kühnert, Vaughan, Wu, Xie, Suchard, Rambaut and Drummond2014) analysis as our main phylogenetic framework (Fig. 1). We fit phylogenetic generalized least squares (PGLS) regressions in R with lambda co-estimated.
We interpreted a 95% confidence interval for the slope that is greater than 1 to represent positive allometry, a 95% confidence interval that includes 1 as failure to reject isometry (the null model), and a 95% confidence interval that is less than 1 as evidence for negative allometry of facial length. For the purposes of comparison with previously published work (Radinsky Reference Radinsky1984) we also fit ordinary least squares (OLS) and reduced major axis (RMA) regressions and performed analyses for two different sets of facial and braincase length measurements (based on measurements from two previous studies by Radinsky [Reference Radinsky1984] and Cardini [Reference Cardini2019]; Fig. 2).
Statistical analyses were conducted using the lm, gls, and lmodel2 functions in the stats, nlme, and lmodel2 R packages, respectively. Regressions were performed with both Cardini and Radinsky measurements. To test whether variation in phylogenetic tree structure had a tangible influence on the results of the phylogenetic regressions, we conducted PGLS regressions using a random sample of 100 trees from the posterior distribution. To test allometric patterns of facial elongation at the intraspecific level and ontogenetic level, we fit OLS regressions of facial length onto braincase length.
Influence of Acquisition of Hypsodont Dentition
To test whether the acquisition of hypsodont dentition influenced patterns of relative facial elongation, we fit OLS and PGLS regressions natural-log (ln)-transformed relative facial length (facial length/braincase length) onto an ln-transformed hypsodonty index. Hypsodonty index data were taken from Cantalapiedra et al. (Reference Cantalapiedra, Prado, Fernández and Alberdi2017) and Parker et al. (Reference Parker, McHorse and Pierce2018).
Axes of Shape Variation in the Equid Skull
It is possible that the simple measures of face and braincase length that we used earlier, following previous workers, fail to accurately describe overall patterns of craniofacial shape variation and therefore miss CREA-like patterns. To investigate the evolution of equid skull shape in a more holistic fashion, after performing a generalized Procrustes analysis on the landmark data to remove variation not due to shape (i.e., translation, rotational, and scaling effects; Goswami and Polly Reference Goswami and Polly2010), we performed a principal components analysis (PCA) of the shape data to explore the major axes of shape variation in the equid skull. We then created deformation grids using the plotRefToTarget function in the geomorph R package (Adams and Otárola-Castillo Reference Adams and Otárola-Castillo2013). To investigate the relative contributions of hypsodonty, size, and first appearance data on skull shape variation we performed a series of multivariate phylogenetic least squares regressions using the mvgls function in the mvMORPH R package (Clavel et al. Reference Clavel, Escarguel and Merceron2015).
Results
CREA and Evolutionary Allometry
Although OLS and RMA regressions were consistent with the predictions of CREA (OLS: Cardini measurements 95% confidence interval [CI] β = 1.09−1.33; Radinsky measurements 95% CI, β = 1.02−1.27; Supplementary Tables S3, S4), appropriately accounting for phylogenetic autocorrelation in the residual error of the model yields point estimates for the slope consistent with negative allometry (Fig. 2A,C, Supplementary Tables S3, S4), although isometry cannot be rejected because the 95% confidence intervals encompass 1 (Cardini measurements, β = 0.83 ± 0.25, p < 0.0001; Radinsky measurements, β = 0.92 ± 0.22, p < 0.0001). This result is insensitive to the choice of measurement dataset (Cardini measurements, β = 0.83 ± 0.25, p < 0.0001; Radinsky measurements, β = 0.92 ± 0.22, p < 0.0001), variation in tree topology (Fig. 3), use of phylogenetic independent contrasts (Fig. 2B,D), or use of phylogenetic reduced major axis regression (Cardini measurements, p = 0.34; Radinsky measurements p = 0.62). Taken together, these results indicate that phylogenetic autocorrelation inflates the signal for the CREA pattern in comparative data and confirms that evolutionary isometry of facial length cannot be rejected for equids.
Distribution of point estimates for slope from phylogenetic regression of facial length onto braincase length across distribution of 100 randomly sampled trees. Black line denotes slope = 1. Pink and blue lines denote non-phylogenetic ordinary least squares (OLS) estimate for slope of regression of facial length onto braincase length using Cardini and Radinsky measurements respectively (Radinsky Reference Radinsky1984; Cardini Reference Cardini2019).

Static and Ontogenetic Allometry
As with the evolutionary sample, equids deviate from the predictions of CREA at the intraspecific/“pseudointraspecific” level (Fig. 2E, Supplementary Table S5). In Heck et al.’s (Reference Heck, Wilson, Evin, Stange and Sánchez-Villagra2018) intraspecific sample of domestic horse breeds, donkeys, zebras, and Przewalski’s horses (Equus przewalskii), facial length scales isometrically or with negative allometry with braincase length (βhorse = 1.01 ± 0.07, p < 0.001; βdonkey =1.02 ± 0.12, p < 0.001; βzebra = 0.79 ± 0.15, p < 0.001; βPrze.horse = 0.85 ± 0.32, p = 0.001; Supplementary Table S5).
In contrast, facial length scales with positive allometry relative to braincase length over ontogeny in modern equids (Supplementary Table S6). Regression of facial length onto braincase length yields estimates for the slope that are universally greater than 1 and, in all cases except a small zebra sample that we collected for this study, have confidence intervals that only encompass values greater than 1 (Supplementary Table S6). Shetland and Welsh breeds have slightly larger estimates for the slopes than the sample of all domestic breeds and the sample of plains zebra. A PCA of the ontogenetic shape data for plains zebra suggests that increases in facial length relative to braincase length are a major axis of shape variation (Supplementary Fig. S3). PC 1 explains 56.74% of shape variation in the sample and tends to separate older specimens (negative loadings) from younger specimens (positive loadings). Inspection of warped meshes representing shape changes associated with minimum and maximum scores on PC 1 shows that shape changes along this axis are driven in part by facial lengthening. This is most apparent in the diastema between the incisors and post-canine dentition where older specimens have a relatively longer diastema than younger specimens.
Axes of Shape Variation in the Equid Skull
PC 1, which explains 58.71% of the variance, defines changes in shape that qualitatively appear to be related to facial elongation—taxa that appear to have shorter rostra, such as Hyracotherium, have negative scores on PC 1 while taxa that appear to have elongate rostra, such as fossil and modern Equus, fall on the positive end (Fig. 4). Careful inspection of deformation grids that describe global and local patterns of shape change along PC 1 reveals that this apparent pattern is somewhat of an optical illusion (Fig. 4). Rather than reflecting elongation of the face, relative to the braincase, the first principal axis of shape variation in equids relates primarily to shearing of the skull along an antero-posterior axis, with more positive scores on PC 1 associated with retraction of the anterior tips of the dorsally placed nasal bones (landmarks 3 and 4; Supplementary Table S3, Fig. 4) and posterior displacement of the infraorbital foramen, combined with anterior displacement of the more ventrally placed facial crest (landmarks 5 and 6; Supplementary Table S3; Fig. 4). These shape changes mirror those observed in form space (shape + size; Supplementary Fig. S2).
Visual inspection of a principal components analysis (PCA) of the shape data indicates phylogenetic, temporal, and size trends in skull shape. Larger, more recent species load more positively on PC 1, and older smaller species load more negatively on PC 1. However, larger species do not have proportionally longer faces based on phylogenetic regressions and regression of PICs (phylogenetic independent contrasts) of facial length onto braincase length (Fig. 2A–D). Deformation grids representing shape change associated with negative and positive scores along PC 1. Landmarks at nasals (3, 4) and infraorbital foramen (5, 6) are denoted with red and purple arrows, respectively, to emphasize changes in the position of the landmarks across PC 1. Outlines represent Hyracotherium vasacciense, Equus przewalskii, Dinohippus leidyanus, Scaphohippus intermontanus, Megahippus matthewi, and Mesohippus exoletus.

Using phylogenetic multivariate regressions, we find that size is not a significant predictor of multivariate shape (Supplementary Tables S7, S8). These results are fairly consistent regardless of differences in tree topology and branch lengths, with 99–100% of trees yielding a nonsignificant relationship between full multivariate shape and size (Supplementary Table S9). We also find little evidence that the acquisition of hypsodont dentition influenced equid skull shape over the group’s evolutionary history. Phylogenetic regression of multivariate shape onto hypsodonty index (Supplementary Table S7, Supplementary Table S8, p > 0.05 for 96–100% of trees) and relative facial length onto hypsodonty index (Fig. 5, Supplementary Tables S10, S11) indicate a nonsignificant relationship between these variables.
Phylogenetic regression of relative facial length onto hypsodonty index (HI) reveals that hypsodonty index does not predict facial proportions in equids. OLS, ordinary least squares; PGLS, phylogenetic generalized least squares. Cardini and Radinsky measurements derived from Cardini (Reference Cardini2019) and Radinsky (Reference Radinsky1984), respectively.

Inspection of Figure 4 also suggests that position on PC 1 is related to species age, with older species loading more negatively and more recent species loading more positively. However, we find it is not a significant predictor of multivariate shape regardless of tree topology or underlying dataset (Supplementary Tables S7–S9).
Discussion
A dominant theme in evolutionary biology is uniting patterns of morphological evolution observed in nature and the fossil record with the processes that generated them. One approach to this synthesis is to investigate evolutionary trends, that is, a consistent directional trajectory in phenotypic evolution, and test the developmental, environmental, or phylogenetic factors underlying them (Tamagnini et al. Reference Tamagnini, Canestrelli, Meloro, Raia and Maiorano2021). One such evolutionary trend is craniofacial evolutionary allometry (CREA), or the tendency for larger species of mammals to have relatively longer faces than smaller species (Cardini and Polly Reference Cardini and Polly2013; Tamagnini et al. Reference Tamagnini, Meloro and Cardini2017; Cardini Reference Cardini2019), with few exceptions (Richardson et al. Reference Richardson, Morales-García, Damuth, Singh and Janis2024). CREA has also been tested in other non-mammalian groups such as synapsids (Krone et al. Reference Krone, Kammerer and Angielczyk2019) and birds (Bright et al. Reference Bright, Marugán-Lobón, Cobb and Rayfield2016; Linde-Medina Reference Linde-Medina2016) with some support, suggesting that the trend could apply to amniotes as a whole. Changes in the timing of development have been suggested as a driver of the CREA pattern based on the observation that newborn animals tend to have relatively shorter faces than adults, and over ontogeny, the face grows at a faster rate than the braincase, resulting in a positive ontogenetic allometric trajectory (Wayne Reference Wayne1986).
We find, however, that when appropriate statistical tests are applied, we cannot reject an isometric scaling relationship between facial length and braincase length over equid evolutionary history. Point estimates of the scaling relationship between facial length and braincase length, as well as comparison of rates of evolution of facial length and braincase length, instead indicate isometry or negative craniofacial allometry. Our results are insensitive to how facial length and braincase length are measured or phylogenetic uncertainty, and indicate that equids present an exception to the CREA “rule” (Cardini Reference Cardini2019).
On the Importance of Phylogenetic Comparative Methods for Studying Evolutionary Allometry
Although the need to account for phylogenetic autocorrelation in the residual error term of evolutionary regressions should, today, be uncontroversial, it remains a bone of contention among comparative paleobiologists. The classic studies of horse craniofacial evolution that gave rise to the CREA hypothesis (Robb Reference Robb1935; Reeve and Murray Reference Reeve and Murray1942; Radinsky Reference Radinsky1984) predated the origin of phylogenetic comparative methods (Felsenstein Reference Felsenstein1985), in some cases by decades, but more recent authors continue to debate the need for their use. Recently, it has been argued that the use of phylogenetic regression might “erase” the signal for allometric scaling relationships in evolutionary regressions if size changes in a concerted fashion over a group’s evolutionary history (Mitchell et al. Reference Mitchell, Sherratt and Weisbecker2024).
This logic was followed by Richardson et al. (Reference Richardson, Morales-García, Damuth, Singh and Janis2024), who used it as a reason to analyze the evolution of CREA in kangaroos using non-phylogenetic methods. However, this misunderstanding reflects common misconceptions about PGLS regression (Symonds and Blomberg Reference Symonds, Blomberg and Garamszegi2014). OLS regression assumes that residual errors are identical and independently distributed, an assumption that may be violated by evolutionary data, because closely related species are expected to be more similar to one another than either is to more distantly related species and, hence, are also expected to have more similar residual errors. PGLS accounts for this autocorrelation by scaling the residual error term in the model by the (transformed) phylogenetic covariance matrix. Importantly, phylogenetic signal in the traits themselves may or may not translate to phylogenetic signal in the residual error term (Revell Reference Revell2010). As a result, the only appropriate ways to test for CREA are either to first evaluate phylogenetic signal in the residual error term of the model and then to choose the appropriate phylogenetic or non-phylogenetic approach (Uyeda et al. Reference Uyeda, Zenil-Ferguson and Pennell2018) or to co-estimate phylogenetic signal with the regression parameters (Revell Reference Revell2010).
Concerns that PGLS may fail to detect a true allometric scaling relationship when the predictor covaries with phylogeny are easily ameliorated. To demonstrate this, we first simulated two traits evolving in a correlated fashion (e.g., an allometric scaling relationship between size and shape) on our equid tree using the mvSIM function in the mvMORPH R package (Clavel et al. Reference Clavel, Escarguel and Merceron2015; Supplementary Fig. S4). As expected, PGLS consistently recovers a significant positive relationship between values (Supplementary Fig. S4A,B; see also Revell Reference Revell2010). More significantly, though, PGLS can appropriately discriminate between two traits evolving under uncorrelated and correlated trends, while OLS struggles to discriminate between these scenarios (Supplementary Fig. S4C–F). These findings demonstrate that evolutionary patterns may be consistent with multiple generating processes (and vice versa; Raup and Gould Reference Raup and Gould1974) and can only be disentangled with knowledge of the underlying structure of the evolutionary tree (Foote et al. Reference Foote, Jablonski, Erwin and Lipps1996).
Patterns of Facial Allometry across Levels of Biological Organization
One putative explanation for the CREA pattern in mammals is that allometric variation at the intra- and interspecific levels (i.e., static and evolutionary allometry) mirrors patterns of size-related variation that emerge through development (ontogenetic allometry) (Robb Reference Robb1935). Although positive allometry of the facial skeleton, relative to the braincase, is apparent in ontogenetic samples of both domestic breeds and a small sample of wild-collected plains zebra, analyses of equid craniofacial allometry at the intraspecific level show that, among both domestic and wild extant taxa, facial length scales either isometrically or with negative allometry relative to braincase length. Artificial selection to produce different domestic breeds of horse serves as an analogue system to natural selection, although operating over much shorter timescales and with possibly different evolutionary potential in terms of standing genetic and phenotypic variation. Here, though, we find that patterns of craniofacial isometry mirror each other between the evolutionary level on the one hand and the intraspecific level within domestic horse breeds of drastically different body size (replicating the work of Heck et al. [Reference Heck, Sanchez-Villagra and Stange2019]) and wild equid populations on the other. This suggests that this isometric scaling relationship was established early and conserved within the evolutionary history of this group. Whether a deviation from CREA is unique to equids or has deeper or repeated evolutionary origins in eutherian mammals has yet to be answered. Rhoda et al. (Reference Rhoda, Haber and Angielczyk2023) recently found strong support for CREA acting as an evolutionary line of least resistance within ruminant artiodactyls, which would suggest that craniofacial adaptations to herbivory and ungulate ancestry alone are not drivers of this pattern. However, tests within other perissodactyl clades, both extant (rhinos and tapirs) and extinct (e.g., brontotheres) would also be informative.
Drivers or Correlates of Equid Craniofacial Shape Evolution
It has been suggested that the acquisition of specialized dentition may result in a deviation from established allometric patterns in mammals (Slater and Van Valkenburgh Reference Slater and Van Valkenburgh2008; Cardini Reference Cardini2019; Heck et al. Reference Heck, Sanchez-Villagra and Stange2019; Tamagnini et al. Reference Tamagnini, Michaud, Meloro, Raia, Soibelzon, Tambusso, Varela and Maiorano2023). For example, Meloro and Slater (Reference Meloro and Slater2012) found rostrum shape was largely determined by canine height in sabertooth cats, while rostrum shape in non-sabertooth cats was largely driven by allometry. However, evolutionary covariation between the rostrum and braincase remained strong in both groups due to the biomechanical need to maintain structural cohesion within the skull.
The prediction that hypsodont dentition influenced relative facial proportions had been suggested for fossil and modern equids, but never tested with modern phylogenetic statistics (Robb Reference Robb1935; Reeve and Murray Reference Reeve and Murray1942; Radinsky Reference Radinsky1984). Here, we find that size is not significantly associated with multivariate shape variation in the equid skull (Supplementary Tables S8, S9), and there is no significant association between relative facial length and hypsodonty index (Fig. 5, Supplementary Tables S10, S11). Together, these results suggest that the evolution of hypsodont dentition did not play a significant role in driving skull shape evolution in equids.
If not the acquisition of hypsodont dentition or evolution along the CREA line of least resistance, what factors did drive equid skull evolution? It is worth noting here that not every trait needs to have a single adaptive origin (Gould and Lewontin Reference Gould and Lewontin1979). The skull is involved in many competing functions, including acquisition and processing of food, housing of sensory organs, and communication. These competing demands all likely constrained skull shape evolution within the equids to a limited number of directions, even with the acquisition of novel morphologies like hypsodont dentition (Bibi and Tyler Reference Bibi and Tyler2022). Furthermore, because of the nature of geometric morphometric data, we are only able to analyze here the aspects of shape captured by homologous landmarks and thus miss, for example, the complex facial fossae that are found in many ancient equids, but which are absent in modern taxa (e.g., Fig. 1). These fossae are taxonomically diagnostic and may have been important for visual, auditory, and chemical communication and species recognition (Janis Reference Janis and Regal2007). For example, Gregory (Reference Gregory1920; cited in MacFadden and Skinner Reference MacFadden and Skinner1982) suggested that the dorsal preorbital fossa housed an expanded nasal diverticulum, a blind pocket located dorsally within each nostril (Budras et al. Reference Budras, Sack, Rock, Horowitz and Berg2012), which helps form vocal signals such as snorts (Janis Reference Janis and Regal2007).
Evidence from modern horses also indicates that the region of the dorsal preorbital fossa serves as the site of origination for the m. levator labii superioris muscle that inserts on the upper lip (Budras et al. Reference Budras, Sack, Rock, Horowitz and Berg2012) and therefore serves critical functions in both visual signaling via lip elevation, as well as the “Flehman” response that allows pheromones, hormones, and scent to be transmitted to the vomeronasal organ (Estes Reference Estes1972; Crowell-Davis and Houpt Reference Crowell-Davis and Houpt1985; Janis Reference Janis and Regal2007). Modern equids also use their lips extensively to acquire and manipulate food, and it is possible that a more robust m. levator labii superioris was selectively advantageous, particularly among browsing and mixed feeding equids.
Fossil and modern horses have long served as a hallmark example of evolution in the fossil record and were instrumental in developing macroevolutionary theory (Simpson Reference Simpson1944, Reference Simpson1953; Hansen Reference Hansen1997). One of the reasons the equids are such an appealing model system is that they display several obvious concerted changes in morphology, including increases in body size, acquisition of hypsodont dentition, and reduction of digits (MacFadden Reference MacFadden1994). It is tempting to assume these changes evolved in response to a single set of selective pressures, such as the transition from closed habitats to open habitats over the Cenozoic (Wing Reference Wing, Janis, Scott and Jacobs1998; Strömberg Reference Strömberg2006), and that, as a result, we would observe integrated changes in morphology as the result of shifts in ecology. However, our results show that changes in craniodental morphology that appear to happen in a concerted fashion are actually decoupled over equid evolutionary history—the trend toward increased size and the associated changes in equid skull shape are not related to hypsodonty, a fact that is only revealed through careful, phylogenetically informed analysis.
Conclusion
Over the last 100 years the story of equid evolution has been continually revised. A more complex branching phylogenetic understanding of equid evolutionary relationships has mostly replaced the early orthogenetic narrative (Marsh, Reference Marsh1879; Gidley and Osborn Reference Gidley and Osborn1907; Matthew Reference Matthew1926, Reference Matthew1930; Stirton Reference Stirton1940; Simpson Reference Simpson1951; MacFadden Reference MacFadden1994). Likewise, further explorations of equid evolution have revealed that presumed linear transitions in equid morphology and ecology are actually more complex than previously assumed. For example, digit reduction is actually best thought of as a continuous trait, rather than a discrete transitions between monodactyl and tridactyl conditions (McHorse et al. Reference McHorse, Biewener and Pierce2017). Similarly, other lines of evidence such as microwear (Semprebon et al. Reference Semprebon, Rivals, Solounias and Hulbert2016) and isotopic analysis (MacFadden et al. Reference MacFadden, Solounias and Cerling1999) have revealed that, in many cases, hypsodont equids were not exclusively grazers and that hypsodonty may have allowed for increased dietary flexibility, rather than specialization (Feranec Reference Feranec2003, Reference Feranec2007; Mihlbachler et al. Reference Mihlbachler, Rivals, Solounias and Semprebon2011). Undoubtedly, continued study and development of new methodologies will continue to elucidate evolutionary patterns within the group. Our analysis serves as another line in the ever-evolving story of equid evolution, the trend of which seems to be away from simple narrative and toward ever-increasing complexity.
Acknowledgments
We would like to thank A.L.W.’s graduate committee (D. Jablonski, M. Foote, M. Webster, and Z. X. Luo), L. Weaver, C. Badgley, M. Zelditch, and members of the Slater and Webster labs for providing thoughtful discussion and feedback. We would like to acknowledge the contributions of the collections staff and curators at the American Museum of Natural History (J. Meng, J. Galkin, and R. O’Leary), Natural History Museum of Los Angeles County Museum (S. A. McLeod and X. Wang), University of Nebraska State Museum (R. Secord and G. Corner), Smithsonian National Museum of Natural History (H. Little, N. Pyenson, and A. Millhouse), and Field Museum (K. Angielczyk, B. Simpson, A. Ferguson, and L. Heaney) in facilitating access to the specimens in their care. The authors would also like to extend a special thank you to K. Behrensmeyer for serving as A.L.W.’s predoctoral fellowship mentor at the Smithsonian. Financial support was provided by a Big Ten Academic Alliance Smithsonian Institution Predoctoral Fellowship, Natural History Museum of Los Angeles County Student Collections Study Award, Hinds Fund award, and support from the Gerhley Fund and the Broc Kokesh Memorial Fund for the De-extinction of the Stellar’s Seacow to A.L.W. Access to TNT was provided by the Willi Hennig Society. Declaration of use of Artificial Intelligence (AI): We used ChatGPT (Open AI) to assist with aspects of the project: to brainstorm potential titles for the paper, for researching the function of the nasal diverticulum and Flehmen response in equids (ChatGPT 4.5, accessed 5 March 2025), to generate code for some of the analyses presented here (e.g., form space analysis) (ChatGPT5, accessed 1 September 2025), and to format citations (ChatGPT5, accessed 23 October 2025). The authors critically reviewed all outputs before implementing them in the research.
Competing Interests
The authors have no competing interests to declare.
Data Availability Statement
Data and code are available via Dryad and Zenodo: https://doi.org/10.5061/dryad.crjdfn3g3, https://doi.org/10.5281/zenodo.18635760. 3D morphological data are available on Morphosource, Project ID 000715259: https://www.morphosource.org/projects/000715259?locale=en.

