The geography of phylogenetic paleoecology: integrating data and methods to better understand biotic response to climate change

Abstract. Deeper knowledge about how species and communities respond to climate change and environmental gradients should be supported by evidence from the past, especially as modern responses are influenced by anthropogenic pressures, including human population growth, habitat destruction and fragmentation, and intensifying land use. There have been great advances in modeling species' geographic distributions over shallow time, where consideration of evolutionary change is likely less important due to shorter time for evolution and speciation to occur. Over these shallow time periods, we have more resources for paleoclimate interpretation across large geographic landscapes. We can also gain insight into species and community changes by studying deep records of temporal changes. However, modeling species geographic distributions in deep time remains challenging, because for many species there is sparse coverage of spatial and temporal occurrences and there are fewer paleoclimate general circulation models (GCMs) to help interpret the geographic distribution of climate availability. In addition, at deeper time periods, it is essential to consider evolutionary change within lineages of species. I will discuss a framework that integrates evolutionary information in the form of phylogenetic relatedness from clades of extant closely related species, where and when there are associated fossil occurrences, and the geographic distribution of paleoclimate in deep time to infer species past geographic response to climate change and to estimate where and when there were hotspots of ancient diversification. More work is needed to better understand the evolution of physiological tolerances and how physiological tolerances relate to the climate space in which species occur.


Introduction
We are enduring a biodiversity crisis (Myers et al. 2000; Pimm and Raven 2000; Brook et al. 2003; Thomas et al. 2004; Barnosky et al. 2011), and harnessing all possible data to inform on biodiversity patterns through space and time is critical to better understand the history of life and to be able to set and accomplish conservation goals (Dietl and Flessa 2011; Rick and Lockwood 2013; Hunt and Slater 2016). We know that species and communities move and reorganize in response to climate change and habitat alterations (Walther et al. 2002; Parmesan 2006; Walther 2010). With increasing anthropogenic pressures, including human population growth, habitat destruction and fragmentation, and intensifying land use, there will be less habitat and climate connectivity for species movement and biological community reorganization in the future (Rosenzweig et al. 2007; Ryberg et al. 2013; McGuire et al. 2016). Habitat composition is also an important consideration for sustaining metacommunity dynamics (Ryberg and Fitzgerald 2016). Species have perished and will perish locally or entirely, and recovery from species loss due to extinctions will take millions of years. Because the ecological and evolutionary processes leading to adaptation, movement, and extinction occur over long time periods and because the Earth has experienced major alterations to geographic ranges and composition of flora and fauna in the past, it is critical to draw on a deep time perspective to investigate species and community response to climate and environmental change.
The current geographic arrangement of species distributions and community compositions has been significantly influenced by humans (Sinclair et al. 2002; Kampichler et al. 2012; Newbold et al. 2015; Pineda-Munoz et al. 2021). In fact, it has been shown that human pressures on a landscape predict species geographic ranges better than species' own biological traits (Di Marco and Santini 2015). Thus, incorporating multiple lines of evidence from field and laboratory experimentation, as well as observation and modeling studies, is particularly important to better understand the response of species and communities to climate and environmental change (Louys et al. 2012). Carefully designed ecological experiments over local to regional geographic extents reveal ecological processes important for determining ecological community composition, dominance, and abundance structures, such as stochastic ecological drift, priority effects, and filtering due to niche selection (Chase 2007; Ryberg et al. 2012; Fukami 2015). Coordinated distributed experiments over larger geographic extents are positioned to address global ecological and environmental problems and contribute to a better understanding of basic ecological theory (Fraser et al. 2013). Physiological experimentation reveals other global change drivers relevant to understanding species geographic range shifts, such as oxygen- and capacity-limited thermal tolerance as a way to link biological levels of organization from cells to ecosystems (Bozinovic and Pörtner 2015). Furthermore, advances in modeling species ecological niches and geographic distributions have made good use of observational data to evaluate past and potential species range shifts, demographic changes, lineage diversification, and extirpation of species (Maguire et al. 2015).
Taken together, the modeling advances allow us to better understand how and why species and communities move and reorganize in response to climate and environmental change, and we can begin to anticipate future responses due to impending climate, land use, and land cover change.
Another important line of evidence to better understand the response of species and communities to climate and environmental change comes from the fossil record (Pardi and Smith 2012). Fossils show where and when species occurred in the past, as well as aspects of past species' morphology and which species occurred together within a community or regional species pool. Ecological information from fossils has been derived from their morphology, chemical composition, and depositional setting of associated sedimentary deposits (Damuth et al. 1992; Croft et al. 2018). This information has allowed paleoecologists to answer many relevant questions about the response of species and communities to climate and environmental change, because they have been able to track species through space and time (Jablonski et al. 2003; Stigall 2008); evaluate the geographic shifts of species (Enquist et al. 1995; Rödder et al. 2013; Gavin et al. 2014), ancient invasion dynamics (Jackson 1997; Dudei and Stigall 2010), and change in ancient ecosystem functioning using functional traits (Polly and Head 2015; Polly et al. 2016; Lawing et al. 2017); and inform conservation decision making (Dietl and Flessa 2011; Barnosky et al. 2017).
It can be informative to incorporate information from studies on modern flora and fauna into paleontological studies (Fritz et al. 2013; Lawing and Matzke 2014). Taxonomic resolution, time-averaging, transport, and age uncertainty in data associated with fossils sometimes make it difficult to integrate modern and fossil occurrence data, but these useful pieces of information can be combined to make inferences beyond what each data type will allow on its own. For example, using a phylogenetic framework with randomization procedures would allow one to anchor and extract important clues from the fossil record that can be bolstered by more abundant and taxonomically resolved data from the modern record (Hunt and Slater 2016). This is not to say that fossil occurrence data are not useful on their own. There are hundreds of studies that have used fossil occurrence data to reveal important biological insight into ecological and evolutionary processes, biogeographic history, and community assembly. However, designing methods that integrate modern and fossil occurrence data bolsters our ability to make inferences using information from multiple taxonomic and phylogenetic scales (Hunt and Slater 2016), strengthens our ability to use findings from paleontological studies as past anchoring points to investigate ongoing ecological and evolutionary processes (Lawing and Matzke 2014), and helps us translate findings from paleontological studies to inform conservation practices (Dietl and Flessa 2011; Barnosky et al. 2017).
My intention for this paper is to provide an entry-level introduction to the various modern and paleontological data types and methodologies that can be integrated in analyses that span ecological, evolutionary, and geologic time. The discussion provided in this paper does not comprehensively review all studies that integrate modern and paleontological data and methods, but it covers several methods relevant to understanding how species and communities respond to climate and environmental change through time. I will frame the discussion focusing on PaleoPhyloGeographic species distribution Models (PPGMs) as an organizing theme that integrates multiple lines of evidence to infer species past geographic response to climate change and to estimate where and when there were hotspots of ancient diversification (Lawing and Polly 2011; Rödder et al. 2013; Lawing et al. 2016; Rivera et al. 2020). Using PPGMs as an organizing concept in this paper will allow me to home in on a few important methods that were integrated in this particular framework and is intended to help readers get basic information about how these methods work so they can think through how they might integrate multiple modeling techniques with heterogeneous data types. However, this paper is meant to be useful to readers beyond those only interested in implementing a PPGM analysis. In an effort to triangulate species distribution modeling, phylogenetic comparative methods, and paleontological observations, this paper provides entry-level remarks on each of these aspects and its required or associated data types and considerations.
I attempt to answer basic questions about each of the data types and methods, including (1) what are the data and methods, (2) how are they related to other frameworks and methods, (3) how have they been used in previous work, (4) what are the basic premises of the methods and how do they work, (5) why are they useful to further develop, (6) what are the pitfalls for new researchers to be aware of, and (7) how can we move forward in this field of integration?

Paleophylogeographic Species Distribution Models (PPGMs)
PPGMs are retrodictions of species' idealized geographic distributions based on phylogenetic comparative methods, modeled climate tolerances, and paleoclimate GCMs. This framework draws on evidence from evolutionary information in the form of phylogenetic relatedness from clades of extant closely related species, where and when there are associated fossil occurrences, and deep time paleoclimate. Thus far, PPGMs have been used to trace species range dynamics over shorter geologic time frames through glacial-interglacial cycles. These studies found that species ranges probably move more quickly than species adapt to new climate conditions (Lawing and Polly 2011; Rödder et al. 2013). Extending PPGMs over longer geologic time frames back to the Miocene shows that incorporating evolutionary history and phylogenetic comparative methods changes our understanding of deep time range shifts and helps pinpoint hotspots of ancient diversification. This framework has been supported by deep time projections of physiological models of climate tolerance. However, it is clear that more work is needed to better understand the evolution of physiological tolerances and how they relate to the climate space in which species occur. Rivera et al. (2020) honed this framework to investigate lineage-specific differences among congeners. They showed that large shifts in the climate system drove expansion and contraction of suitable habitat and that geologic events, such as orogeny, relate to diversification events.
Other frameworks have combined several of the same data sources and methodologies in different ways. One of the earlier studies to use phylogenetic comparative methods in combination with climate envelope modeling investigated factors that may have influenced speciation in a group of dendrobatid frogs (Graham et al. 2004). In that study, ancestral reconstructions of climate envelopes were calculated and compared with extant climate envelopes in a principal components space representing an ordination of all the environmental layers that were used to characterize species climate envelopes. Phyloclimatic modeling also combines climate envelopes and phylogenetic comparative methods to reconstruct the history of climate tolerances of species (Yesson and Culham 2006). It extended the previous framework to include projections of ancestral node estimates onto past paleoclimate scenarios.
Another implementation using fossils with phylogenetic comparative methods and climate envelope models revealed new information about the distribution of stem lineages that influence the interpretations of crown group diversification and ancient evolutionary history (Meseguer et al. 2015). This approach used climate envelope modeling and a scale-invariant Mahalanobis distance to represent a lineage's optimum climate envelope (Varela et al. 2011). The authors built paleoclimate envelope models from fossil occurrences and projected those models onto paleoclimate maps. The paleoclimate envelope models were not informed by extant species climate envelopes, but they did incorporate ancestral area reconstructions, combining multiple lines of evidence to better infer the biogeographic history of a genus.
The PPGM framework moves these methods forward in two ways. First, PPGMs incorporated a simple paleoclimate interpolation along with a phylogenetic climate envelope lineage interpolation to extract concerted reconstructions of paleoclimate and phylogenetically informed climate envelopes at multiple coincident time periods of the past (Lawing and Polly 2011). This allowed for more nuanced phylogenetic reconstruction of climate envelopes and more nuanced paleoclimate estimations between time periods where there are available global atmosphere and ocean circulation models reconstructing paleoclimate across geographic space. Second, PPGMs incorporated a method to include paleoclimate information associated with fossil localities into phylogenetic climate envelope reconstructions (Rivera et al. 2020). If the paleoclimate information shows that a fossil occurred in a climate that is outside the distribution of current climates for a group under evaluation, then that information can improve our understanding of the evolution of climate envelopes and the paleobiogeographic reconstruction of species.
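To make the idea of interpolating paleoclimate between modeled time slices concrete, the following is a minimal sketch that assumes a plain linear interpolation applied cell by cell. The text above says only that the interpolation is simple, so the linear form, the function name `interpolate_climate`, and the temperature values are illustrative assumptions rather than the published implementation.

```python
def interpolate_climate(t, t_young, grid_young, t_old, grid_old):
    """Linearly interpolate each grid cell between two dated climate layers.

    Ages are in thousands of years before present; each grid is a list of
    values for the same cells in the same order (an assumed data layout).
    """
    if not (t_young <= t <= t_old):
        raise ValueError("t must fall between the two time slices")
    # Weight is 0 at the younger slice and 1 at the older slice.
    w = (t - t_young) / (t_old - t_young)
    return [(1 - w) * y + w * o for y, o in zip(grid_young, grid_old)]


# Hypothetical mean annual temperature (deg C) at three grid cells.
present = [10.0, 15.0, 20.0]   # modeled layer at 0 ka
glacial = [4.0, 11.0, 18.0]    # modeled layer at 21 ka
halfway = interpolate_climate(10.5, 0.0, present, 21.0, glacial)
print(halfway)
```

The same weighting could in principle be applied along a branch of a time-calibrated phylogeny to estimate a lineage's climate envelope limits at the matching time, so that climate and envelope are evaluated at coincident ages.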

Data for Integration
Multiple data types are available for integration of paleontological and modern data and methods (Fig. 1). Data have been made more readily available through compilation of databases and accessible data portals (Uhen et al. 2013). Some of these data portals include paleontological and modern data, such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org). Others focus more closely on the compilation of specific modern or paleontological datasets. For example, iNaturalist is an online social network that compiles modern observations of biodiversity around the world but currently is heavily biased toward observations from Europe and North America (https://www.inaturalist.org). The Neotoma Paleoecology Database is a community database that compiles information about fossil data from the Pliocene to the Quaternary (www.neotomadb.org). The Paleobiology Database compiles data of fossil occurrences within collections that span all geologic ages (https://paleobiodb.org/#). GBIF compiles many of these more focused databases, yet not all information associated with fossil sites and occurrences is processed through to GBIF. This section explains data requirements for PPGM, where to find primary data, how some data are derived, and how other data are modeled. Each section addresses associated assumptions and uncertainties.
Modern Occurrence Data.-Modern occurrence data are recorded observations of individual organisms often taxonomically identified to the species or subspecies level at specific geographic places and times. Occurrence data are systematically collected through surveys or, more often, opportunistically collected through incidental observations. Data are housed in museum collections with vouchered specimens or in online databases. GBIF is one of the most comprehensive online databases that collates and stores locality data for Earth's biodiversity obtained from numerous museums and observation networks.
However, the often-incidental nature of many observations produces biases in these primary biodiversity data through space and time (Boakes et al. 2010; Beck et al. 2014), and there are notable gaps in distributions globally (Yesson et al. 2007; Collen et al. 2008).
Methods have been developed that attempt to account for bias in occurrence data. Those include subsampling the available occurrence data in geographic space (Hijmans 2012;Boria et al. 2014) or in environmental space (Varela et al. 2014) and weighting occurrences based on sampling effort (Stolar and Nielsen 2015). Environmental filtering, systematically subsampling occurrence data based on position in environmental space, is preferred to geographic filtering, systematically subsampling occurrence data based on position in geographic space, because environmental predictors are typically used to build climate envelope models, species distribution models (SDMs), or ecological niche models (ENMs) for species, and those are the relevant axes to deal with observation bias. In either case, bin or pixel sizes used to subsample observations influence the number of retained samples and influence model performance (Castellanos et al. 2019). Weighting subsamples of the occurrence data based on sampling effort is known to improve model predictions (Stolar and Nielsen 2015), so calibrating model evaluation statistics with a null model (Hijmans 2012), deriving a proxy variable for sampling effort (Fithian et al. 2015), or sample weighting as the inverse probability of sampling (Stolar and Nielsen 2015) are other useful ways forward.
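As an illustration of environmental filtering, the sketch below bins occurrences on a grid in environmental space and keeps one record per occupied bin. The record layout, the bin sizes, and the rule of keeping the first record encountered are assumptions made for the example; published implementations differ in how they choose the retained record.

```python
import math


def environmental_filter(occurrences, bin_sizes):
    """Keep the first occurrence that falls in each bin of environmental space.

    occurrences: list of dicts with an "env" tuple of predictor values.
    bin_sizes: bin width for each predictor, in that predictor's units.
    """
    kept = {}
    for occ in occurrences:
        # Index of the bin along each environmental axis.
        key = tuple(math.floor(v / s) for v, s in zip(occ["env"], bin_sizes))
        kept.setdefault(key, occ)
    return list(kept.values())


# Hypothetical records: (mean annual temperature in deg C, annual precipitation in mm).
records = [
    {"id": "a", "env": (12.1, 800.0)},
    {"id": "b", "env": (12.4, 820.0)},   # same bin as "a", so it is dropped
    {"id": "c", "env": (18.9, 300.0)},
    {"id": "d", "env": (12.3, 1500.0)},
]
thinned = environmental_filter(records, bin_sizes=(1.0, 100.0))
print([r["id"] for r in thinned])
```

Changing `bin_sizes` changes how many records survive, which is the sensitivity to bin size noted above.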
FIGURE 1. Example of some data types for integration in paleophylogeographic species distribution models (PPGMs). A, Example of modern species occurrence data as dark blue points. B, Example of climate envelope in light blue surrounding the dark blue occurrence points mapped into a 3D climate space. The gray swath of points represents all other climate combinations in North America. C, The geographic locations of the points that occur within the light blue climate envelope mapped onto a paleoclimate model of the last glacial maximum. Light blue points are the occurrences that are within the light blue climate envelope in B. D, Simple three-species phylogeny with a red and blue point indicating two extant tip taxa of interest; the purple node is a hypothetical ancestor. E, An example of mapping the climate envelope of the red species and the blue species in a 3D climate space. F, An example of a reconstructed climate envelope of a hypothetical ancestor using phylogenetic comparative methods modeling the limits of climate envelopes. (Color online.)
Sampling bias, among other factors such as biotic interaction and available climate in geographic space, contributes to the incomplete characterization of climate envelopes of species, and incomplete characterization has been shown to bias parameter estimates in evolutionary models (Saupe et al. 2018). This problem is exacerbated by anthropogenic influences on the ability of species to occupy their full range of climates (Pineda-Munoz et al. 2021). Correcting sampling bias in occurrence records has not yet been widely incorporated in climate envelope modeling, nor in PPGM-type models.
The typical reasoning for using these simple modeling schemes is to allow for flexibility in the covariation of climates within climate envelopes and to attempt to more completely characterize certain aspects of a species' climate niche, in terms of minimum and maximum tolerances of climate, rather than allowing incomplete characterization to drive the relationships established between occurrences and climates. Regardless, it will be a fruitful path forward to carefully consider sampling bias and its implications for climate envelope modeling and PPGM.
Fossil Occurrence Data and Age Ranges.-Fossil occurrence data and age ranges stem from recorded observations of remains of organisms, their excrement, or their tracks, documenting presence at a particular geographic location and within a particular time range. Fossils representing occurrences can be fragmentary, weathered, or morphologically distorted through death, transport, deposition, and the fossilization process. However, it has been shown that fossils are rarely transported out of their original life habitats. Many species with robust parts (e.g., bones or shells) are found in death assemblages with high fidelity to their rank abundance at which they are found in life assemblages, and time-averaging of fossil assemblages prevents short-term seasonality or yearly signals of variation (Kidwell and Flessa 1995). Thus, fossils provide meaningful information on ecological and evolutionary dynamics in shallow and deep time.
Taxonomic assignments of fossils are often easier to make at the genus level, rather than the species level, at least for many groups of vertebrate fossils, so many more fossils will be included in an analysis if genus-level identifications are allowed in a dataset. In fact, many paleontological studies use genera as a unit of study (Polly and Spang 2002), but it has been debated whether insights gained from analyses with genera "trickle down" to the species level and enhance our understanding of evolution (Hendricks et al. 2014). At the species or genus level, information about the paleoenvironment or paleoclimate associated with fossil occurrences can provide valuable information about where species lived in the past and can alter our understanding of the biogeographic history of a group (Meseguer et al. 2015;Lawing et al. 2016).
Another piece of critical information gained from fossil occurrence data is the estimated geologic time when the organism died or when the dead organism was deposited into a depositional environment. There are many strategies for numerical and relative dating of fossil deposits (Elias 2015), as well as age-depth models for inferring age in deposits that were not directly dated (Blaauw and Christen 2011). Estimates of geologic age are typically derived from fossilized organisms or from the sedimentary deposits where fossils were found. The sedimentary deposits are either dated or correlated into a time-calibrated stratigraphic column. For the purpose of integrating fossil occurrences with modern occurrences, it is useful to extract an age range from fossil occurrences, that is, the maximum and minimum possible geologic ages of a fossil.
There are multiple databases hosting information about the locality and deposits associated with occurrences of fossil specimens. A review of these sources for vertebrate fossils documents the history and development of multiple database efforts and how they interrelate and provides information on their nature and history (Uhen et al. 2013). Some of the databases discussed in that review include other types of data. For example, the Neotoma Paleoecology Database holds community-curated data in a data model framework that supports any type of paleoecological and paleoenvironmental data from sedimentary archives (Williams et al. 2018).
Modern Climate Data.-Modern climate data are derived from weather stations across the globe. Weather stations systematically record the minimum temperature, maximum temperature, and precipitation on a daily basis. The temperature values are averaged within each month for 12 monthly estimates of minimum and maximum temperature, and the precipitation values are summed within each month for 12 monthly estimates of total precipitation, resulting in 36 variables representing 1 year of temperature and precipitation measures. Often these 36 variables are averaged across multiple years (Hijmans et al. 2005). Because weather is variable from year to year, it is useful to derive variables from these 36 measures that summarize the general climate patterns and that may be biologically meaningful for species (Nix 1986;Booth et al. 2014).
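The monthly summary described above can be sketched as follows: daily minimum and maximum temperatures are averaged within each month, and daily precipitation is summed. The tuple layout of the daily records and the function name are assumptions for illustration; with a full year of records the result holds 12 months times 3 variables, that is, the 36 values referred to in the text.

```python
from collections import defaultdict


def monthly_summaries(daily):
    """Summarize daily weather records by calendar month.

    daily: iterable of (month, tmin, tmax, precip) tuples, month in 1..12.
    Returns {month: {"tmin": mean, "tmax": mean, "prec": total}}.
    """
    tmin, tmax = defaultdict(list), defaultdict(list)
    prec = defaultdict(float)
    for month, lo, hi, p in daily:
        tmin[month].append(lo)
        tmax[month].append(hi)
        prec[month] += p
    return {
        m: {
            "tmin": sum(tmin[m]) / len(tmin[m]),  # mean daily minimum
            "tmax": sum(tmax[m]) / len(tmax[m]),  # mean daily maximum
            "prec": prec[m],                      # total precipitation
        }
        for m in sorted(tmin)
    }


# Hypothetical daily records for parts of January and February.
daily = [(1, -5.0, 3.0, 2.0), (1, -7.0, 1.0, 0.0), (2, -2.0, 6.0, 4.5)]
monthly = monthly_summaries(daily)
print(monthly)
```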
Weather stations are not uniformly distributed across the globe, so high-resolution interpolation has been used to estimate climate data for points where no primary climate information is available (Hutchinson 1991). Biases are introduced into the dataset from the choice of interpolation method and from the geographic bias in placement of weather stations. Because I am concerned here with comparing modern climate data with paleontological climate data, the variation produced from the biases in the modern climate data is low to negligible when compared with the variation in modeled climate data from the paleontological record.
Although the calendar months are a useful standard to summarize and store climate data, calendar months are not consistently biologically meaningful to species. For example, minimum temperature in January does not mean the same thing in Canada as in Australia for species experiencing their climate environment (i.e., a minimum temperature value in a cold month compared with a minimum temperature value in a warm month). Nix (1986) developed a framework, termed BIOCLIM, to combine the 36 climate variables into 19 biologically meaningful variables. The 19 variables represent means and extremes of temperature and precipitation at monthly, quarterly, and annual temporal scales. They have been used extensively in studies of species distribution modeling and as predictor variables for other biodiversity assessments. See Booth et al. (2014) for further explanation of deriving BIOCLIM variables and Hutchinson et al. (2014) for climate interpolation.
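A few of the 19 variables can be computed directly from the monthly values, as in the sketch below. It covers only a subset that does not require quarterly windows, uses the variable numbering that is conventional in widely distributed BIOCLIM datasets, and runs on made-up monthly values for a single location; it is not a full implementation of the procedures in Nix (1986) or Booth et al. (2014).

```python
def bioclim_subset(tmin, tmax, prec):
    """Compute a subset of BIOCLIM variables from 12 monthly values each."""
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per variable")
    # Monthly mean temperature taken as the midpoint of tmin and tmax.
    tmean = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    bio5 = max(tmax)   # maximum temperature of the warmest month
    bio6 = min(tmin)   # minimum temperature of the coldest month
    return {
        "bio1": sum(tmean) / 12,   # annual mean temperature
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,       # temperature annual range
        "bio12": sum(prec),        # annual precipitation
        "bio13": max(prec),        # precipitation of the wettest month
        "bio14": min(prec),        # precipitation of the driest month
    }


# Hypothetical monthly values (deg C and mm) for one grid cell.
tmin = [-10, -8, -3, 2, 7, 12, 15, 14, 9, 3, -2, -8]
tmax = [0, 2, 7, 13, 19, 24, 27, 26, 21, 14, 6, 1]
prec = [40, 35, 45, 60, 80, 90, 85, 75, 65, 55, 50, 45]
bio = bioclim_subset(tmin, tmax, prec)
print(bio)
```

Because the extremes are selected by value rather than by calendar month, the same variable describes the cold season in either hemisphere, which is the point of the January example above.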
Paleoclimate Data and Models.-Climate information from the geologic record is usually documented from tree rings, corals, ice cores, and sediment deposits (Fritts 1991;Evans et al. 2002;Jones et al. 2009). Just as is the case for weather stations, many climate proxies from the geologic record are geographically unequally distributed. But there are far fewer primary data extracted from the geologic record than there are weather stations, so interpolation techniques for estimating the geographic distribution of climate in the past are not enough. GCMs of the ocean and atmosphere model the geographic distribution of modern, future, and past climates (Randall et al. 2007). These models use atmospheric and ocean circulation process modeling combined with knowledge of exogenous forcing and boundary conditions from the geologic record to anchor model behavior. Important forcings include orbital changes, solar irradiance, explosive volcanicity, land surface characteristics, and aerosols (Jones et al. 2009). GCMs are calibrated over many time steps, and they often record minimum temperature, maximum temperature, and precipitation at each temporal step in the model; and thus, those variables can be summarized as the BIOCLIM suite of 19 climate variables. See Nix (1986) and Booth et al. (2014) for an explanation of how to convert minimum temperature, maximum temperature, and precipitation variables to BIOCLIM variables.
Because GCMs are computationally intensive, we do not yet have comprehensive models of climate through all geologic time and space. At the global scale, GCMs are typically low resolution, and finer resolution GCMs have been developed by downscaling models using various techniques (Wilby and Wigley 1997). There are many GCM algorithms and boundary conditions, and each produces a different estimate of climate, so it is important to incorporate modeled climate data from multiple sources. There are several initiatives to calibrate GCMs and paleo-GCMs to make models more comparable, such as the Paleoclimate Modeling Intercomparison Project (Jungclaus et al. 2017; Kageyama et al. 2017, 2018; Otto-Bliesner et al. 2017).
Many of the modeling results describing modeled spatial and temporal variation in paleoclimate are provided as part of a publication. Links to available modeling results are also compiled on relevant websites, such as on the network of websites documenting the Paleoclimate Modeling Intercomparison Project. In addition to searching the web for modeling results, it is important to search through the literature for GCMs within relevant time intervals of interest. The results files for the GCMs may be made available from the corresponding authors. Some recent efforts have provided fine-resolution paleo-GCMs for time periods that have been less available to the research community. One example is the PaleoClim database, providing free, easily accessible, high-resolution paleoclimate surfaces of global terrestrial areas (Brown et al. 2018).
Phylogenetic Data.-Phylogenetic information provides the hierarchical structure to cross taxonomic scales and integrate paleontological and modern occurrence data (Felsenstein 2004). In phylogenies, tips and nodes are linked together by branches, depicting a hypothesis about the relationship between tips or their topology. The relationships are modeled based on molecular or morphological similarities between tips. Tips are the operational taxonomic units used in a study; for studies on modern species, these are typically species, subspecies, or populations, and for studies on ancient species, these are typically species, genera, or even families. Nodes represent hypothetical ancestral taxa. Ultimately, it is important to understand how closely related to each other species and genera are and who is most closely related to whom. That information can be extracted from phylogenies in the form of topology and branch lengths.
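To show how relatedness can be read from topology and branch lengths, the sketch below stores a small tree as a child-to-parent map and computes the patristic distance between two tips, that is, the summed branch length along the path through their most recent common ancestor. The tree, its branch lengths, and the function names are hypothetical, and the sketch assumes nonnegative branch lengths.

```python
# Hypothetical time-calibrated tree ((A:2,B:2)N1:3,C:5)root, stored as
# child -> (parent, branch length in millions of years).
PARENT = {
    "A": ("N1", 2.0),
    "B": ("N1", 2.0),
    "N1": ("root", 3.0),
    "C": ("root", 5.0),
}


def path_to_root(node):
    """Return {ancestor: distance from node}, including the node itself."""
    dist, out = 0.0, {node: 0.0}
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic_distance(a, b):
    """Summed branch length between a and b through their common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    # Among shared ancestors, the most recent one gives the shortest path.
    return min(pa[n] + pb[n] for n in pa if n in pb)


print(patristic_distance("A", "B"))
print(patristic_distance("A", "C"))
```

Here A and B are each other's closest relatives because their path is shorter than either path to C.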
To obtain phylogenetic information for the organisms of interest, one can build phylogenetic hypotheses or use phylogenetic hypotheses that have already been established. Baum and Smith (2013) and Lemey et al. (2009) provide an introduction to building phylogenies and phylogenetic analysis. Numerous phylogenetic studies have been published, and their resulting phylogenetic hypotheses are typically available as supplemental information. Treebase is an online database that hosts phylogenetic information and is a good resource for published phylogenies (Piel et al. 2000).
Often there are differing hypotheses from phylogenies built with different combinations of molecular and morphological data (Hillis 1987; Shaffer et al. 1997; Larson 1998; Swalla and Smith 2008), as well as differences in phylogenetic hypotheses when both modern and ancient operational taxonomic units are included in the analysis (Novacek 1992; Eklund et al. 2004; O'Leary and Gatesy 2008). Because there often is contention around which phylogenetic topology is best supported, it is important to collect multiple phylogenetic hypotheses and repeat analyses to gain an understanding of the range of potentially different results due to phylogenetic uncertainty. In addition, within the framework of PPGMs, for integration with fossil occurrence data and for projection onto relevant paleoclimate maps, time-calibrated phylogenies are required.

Under the Hood
Multiple methods are required for integration of paleontological and modern data within the context of PPGM. This section explains how six methods for integration work. Several of these methods, such as ecological niche modeling, species distribution modeling, and phylogenetic comparative methods, are massive fields and have had many articles and books written about them. Here, I intend to briefly introduce each method and highlight the relevant information required and considerations needed for integration in PPGM.
Modeling Ecological Niches and Species Distributions.-Ecological niche modeling and species distribution modeling typically begin with compiling information on species occurrences, associating climate or environmental data with those occurrences, and applying an algorithm to estimate some suitable climate or environmental space that is or probably could be occupied by a species (i.e., estimating the climate or environmental niche) (Peterson et al. 2011). One then uses the parameters from that algorithm to project a potential distribution of a species into geographic space. The majority of ENMs and SDMs are correlative in nature, as they are often based on incidental observation data and associate occurrences with predictor variables (Elith and Leathwick 2009). Many algorithms have been described for the association of occurrences to predictor variables (Elith et al. 2006), multiple algorithm projections have been combined to reduce uncertainty in projections (Hao et al. 2019), and different algorithms have been shown to be appropriate in different situations (Elith and Graham 2009).
There are many good review papers and books that provide an introduction and review of species distribution modeling and its associated concepts of ecological, environmental, and climate niches (Austin 2007;Elith and Leathwick 2009;Franklin 2010;Peterson et al. 2011;Maguire et al. 2015). These overviews and reviews detail the many considerations that are required when modeling a species' niche and its geographic distribution. More recently, guidelines have been developed to help researchers evaluate the quality of species distribution modeling studies and to help systematically account for all of the steps involved in building SDMs (Sofaer et al. 2019). I follow the recommendation of Peterson and Soberón (2012) and recognize that SDMs are inclusive of ENMs, but see Warren (2012) for further consideration of this topic. In this paper, when referring to a niche (ecological, climate, environmental, or otherwise), I am using the term consistent with a Hutchinsonian niche concept, which recognizes there is an n-dimensional hypervolume made up of biologically important axes that quantify where a species can live (Hutchinson 1957). I will use the term "climate" or "environmental niche" to explicitly refer to the type of predictor variables being used in conceptualizing the niche model. It is important to point out these practical aspects of terminology because of contention over the use and misuse of terminology and associated con-
To integrate paleontological and modern data in a phylogenetic framework, and specifically for use in PPGM, rectilinear climate envelope models have been used due to their simplicity and fidelity to the Hutchinsonian niche concept (Graham et al. 2004;Yesson and Culham 2006;Lawing and Polly 2011). The rectilinear climate envelope model is one way to characterize niche dimensions for use in projecting potential species distributions into geographic space.
This method extracts a range from each climate or environmental variable associated with occurrence data, either the minimum and maximum or some narrower interval, such as the 5th and 95th percentiles, and considers the climate within that envelope suitable for the species being modeled. In geographic space, any point that fits within the ranges of all the climate variables included in the climate envelope model is considered suitable for the species. One drawback to the climate envelope method is that it oversimplifies the ecological niche and potential geographic distribution of modern species. However, other algorithms that have been shown to perform well in characterizing the ecological niche and potential geographic distribution of a species, such as maximum entropy and boosted regression trees (Elith et al. 2006), have multiple parameter estimates and complicated associations or breakpoints between occurrences and predictor variables. So far, it has been unclear how to model their parameters along a phylogenetic tree in a phylogenetic comparative methods framework.
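The envelope calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited studies: the function names, the linear-interpolation percentile rule, and the example values (mean annual temperature in degrees C and annual precipitation in mm) are all assumptions made for the example.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in [0, 100]) of a list of numbers."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fit_envelope(climate_at_occurrences, lower_pct=0.0, upper_pct=100.0):
    """Return one (lower, upper) pair per climate variable.

    climate_at_occurrences: list of tuples, one tuple of climate values
    per occurrence. Percentiles 0 and 100 give the minimum and maximum;
    5 and 95 give a trimmed envelope.
    """
    columns = list(zip(*climate_at_occurrences))
    return [(percentile(c, lower_pct), percentile(c, upper_pct)) for c in columns]


def is_suitable(cell, envelope):
    """True only if every climate variable of the cell is inside its range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(cell, envelope))


# Hypothetical occurrences: (mean annual temperature, annual precipitation).
occurrences = [(10.0, 500.0), (14.0, 800.0), (12.0, 650.0)]
envelope = fit_envelope(occurrences)

# Hypothetical grid cells to classify.
grid = [(11.0, 600.0), (15.0, 600.0), (12.0, 900.0)]
suitable = [is_suitable(cell, envelope) for cell in grid]
```

Only the first grid cell falls inside both ranges; the second is too warm and the third too wet. Because the model reduces to a lower and an upper bound per variable, those bounds are simple quantities that can later be treated as traits on a phylogeny.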
Projecting ENMs forward and backward in time has now received considerable attention, as models typically do not perform well under new conditions, which are known as non-analogue climate scenarios (Fitzpatrick and Hargrove 2009;McGuire and Davis 2013;Davis et al. 2014;Moreno-Amat et al. 2017). This problem is particularly relevant when projecting models to the past, when there was quite a bit of non-analogous climate compared with modern climates (Fitzpatrick and Hargrove 2009). One method to improve model projections is to incorporate fossil occurrences into ENMs along with extant occurrences (Varela et al. 2009, 2011). This accounts for shifts in the realized niche of a species through time and is meant to more closely approximate its fundamental niche. In addition, directly projecting niche models built with modern data does not incorporate the potential for niche evolution. Thus, PPGMs and other methods have been developed to take into consideration the potential evolution of a niche and the vastly different climates in which the close relatives of modern species occur.
Phylogenetic Comparative Methods.-Phylogenetic comparative methods are typically used to correct for non-independence of samples in comparative studies with multiple species (Felsenstein 1985), to study the processes of evolution and speciation among multiple species (Harvey and Pagel 1991), or to infer character states of hypothetical ancestral species (Martins 1999;Omland 1999). Biologists have typically used these methods to learn about the history of organisms by using modern information stored in species' DNA, and paleontologists have compared model results with fossil data to demonstrate model reliability and uncertainty (Polly 2001).
Brownian motion has traditionally been used to model the amount of expected evolutionary change, or accumulated variation, over a specified number of time steps (generations) with either no selection or randomly varying selection acting on a phenotype (Harvey and Purvis 1991). This is a one-parameter model that estimates evolutionary rate. Other models of evolution have been described that might more accurately represent the evolutionary history of a phenotype (Butler and King 2004;Boucher et al. 2014). Notably, the Ornstein-Uhlenbeck model has been used to model selection of a trait toward an optimum and might be particularly important for climate studies, as Lawing et al. (2016) showed that much variation in climate variables among species is best modeled by an Ornstein-Uhlenbeck process. The Ornstein-Uhlenbeck model is typically a two- or three-parameter model that estimates the evolutionary rate and the strength of selection (also known as the selection coefficient or alpha) toward a fixed optimum. If the optimum is not in the same location as the mean of the population, then the third parameter of the Ornstein-Uhlenbeck model is the location of the optimum. There are multiple review papers that introduce phylogenetic comparative methods and explain their various categories and uses (Miles and Dunham 1993;Martins and Hansen 1996;O'Meara 2012;Pennell and Harmon 2013;Cooper et al. 2016).
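To make the contrast between the two models concrete, the following sketch computes the expected value and variance of a trait after an elapsed time under each process, using the standard closed-form results. The symbol names are assumptions introduced here for illustration: sigma2 is the evolutionary rate, alpha is the strength of selection, and theta is the optimum.

```python
import math


def bm_moments(x0, sigma2, t):
    """Mean and variance after time t under Brownian motion.

    The expected value stays at the starting value x0, while the
    variance grows without bound in proportion to elapsed time.
    """
    return x0, sigma2 * t


def ou_moments(x0, sigma2, alpha, theta, t):
    """Mean and variance after time t under an Ornstein-Uhlenbeck process.

    The expected value is pulled from x0 toward the optimum theta at a
    rate set by alpha, and the variance levels off at sigma2 / (2 * alpha).
    """
    mean = theta + (x0 - theta) * math.exp(-alpha * t)
    var = sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))
    return mean, var


# Hypothetical trait: upper limit of mean annual temperature (degrees C).
bm_mean, bm_var = bm_moments(x0=20.0, sigma2=0.5, t=4.0)
ou_mean, ou_var = ou_moments(x0=20.0, sigma2=0.5, alpha=1.0, theta=15.0, t=50.0)
```

After a long time the Ornstein-Uhlenbeck expectation sits at the optimum with bounded variance, whereas the Brownian motion variance keeps increasing. That difference is why the choice of model matters when reconstructing climate niche parameters far back in time.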
Phylogenetic comparative methods have been employed to study the evolution of a climate niche and physiological tolerances of organisms. To integrate these methods with ENMs, researchers considered parameters from ENMs (such as the maximum and minimum value of a climate envelope meant to represent a climate niche) as phenotypes for a species. These climate parameters are treated as species traits or phenotypes and regressed along phylogenies according to a specified model (or models) of evolution. Evolutionary parameters associated with the model, such as evolutionary rate, the selection coefficient, and the optimum, are estimated. These estimates are then used to reconstruct the histories of a climate niche.
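As a minimal sketch of treating envelope bounds as traits, the code below estimates the root value of one bound under Brownian motion with a pruning-style recursion (a branch-length-weighted average, in the spirit of Felsenstein 1985). The tree encoding, the function name, and the trait values are hypothetical, and Brownian motion is used only because it is the simplest case; it is not a claim about which model the cited studies selected.

```python
def root_estimate(node, trait):
    """Return (value, effective_branch_length) for a node under Brownian motion.

    A tip is encoded as (name, branch_length); an internal node is
    ((left, right), branch_length). The value is the weighted average of
    the two descendants, each weighted by the inverse of its effective
    branch length. The returned branch length is extended to carry the
    uncertainty of the estimate up the tree.
    """
    label, length = node
    if isinstance(label, str):  # tip: observed value, own branch length
        return trait[label], length
    (x1, v1), (x2, v2) = (root_estimate(child, trait) for child in label)
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    return value, length + v1 * v2 / (v1 + v2)


# Hypothetical three-species tree ((A:1, B:1):1, C:2), branch lengths in Myr.
ab = ((("A", 1.0), ("B", 1.0)), 1.0)
tree = ((ab, ("C", 2.0)), 0.0)

# Hypothetical envelope bounds for mean annual temperature (degrees C).
upper_bound = {"A": 20.0, "B": 24.0, "C": 16.0}
lower_bound = {"A": 8.0, "B": 12.0, "C": 4.0}

root_upper, _ = root_estimate(tree, upper_bound)
root_lower, _ = root_estimate(tree, lower_bound)
```

Each bound is reconstructed separately, so the reconstructed envelope at the root is the pair (root_lower, root_upper). Both estimates necessarily fall within the range of the tip values, which is the limitation that motivates anchoring reconstructions with fossil occurrences.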
Estimates of the history of a climate niche using only extant species information and their phylogenetic relationships will not allow for reconstructions outside the distribution of climate parameters among the tip taxa. This is a problem, because we know that even as recently as the last glacial maximum (26-19 ka) there was a reasonable amount of non-analogous climate, populations of species closely related to those that occur now also occurred during that time, and climate during that time does occur outside the climate envelopes of extant species. Thus, PPGMs and other methods have developed procedures to incorporate evidence of past climate that is geographically associated with fossil occurrence data by incorporating fossil occurrences into a phylogenetic reconstruction.
Anchoring Phylogenetic Comparative Methods with Fossil Occurrences.-There have been many efforts to incorporate fossil information to inform phylogenetic methods (Finarelli and Flynn 2006;Pyron and Burbrink 2012;Hunt 2013;Slater 2013;Slater and Harmon 2013). These typically focus on time-calibrating trees with fossil information (Felsenstein 2002;Pyron 2011;Ronquist et al. 2012;Bapst 2013) and tree building to incorporate total evidence from morphological and molecular data into character matrices to analyze and develop hypotheses about the relationships between species, extant and extinct (Williams 1994;Purvis 1995;Ronquist et al. 2012). Incorporating paleoecological or paleoclimate information associated with ancient species is an area that has been less explored, but it is important to consider, as the information associated with fossils allows us to anchor models in the past, as better proxies and GCMs provide more realistic reconstructions of the past climates species would have encountered.
Ideally, the species or genera associated with the modern and fossil occurrences being modeled would have one or more time-calibrated phylogenetic trees that incorporate all extant and extinct species in the study. In this case, regular phylogenetic comparative methods can handle incorporating modern and paleontological information about climate niches. There are occurrences in the fossil record that are assigned to extant species or genera. In the case of fossil occurrences assigned to extant species, one may incorporate the paleoclimate associated with the fossil occurrences directly into the ENM for the extant species. More often, at least for vertebrate species, fossil occurrences are assigned to a genus, but the species affinity is unknown.
One way to deal with the unknown placement of a fossil occurrence within a phylogeny is to repeat a randomization procedure for its placement, perform a phylogenetic comparative analysis, evaluate the model, and extract important parameter estimates (Rivera et al. 2020). After this procedure is repeated many times, a distribution of important parameter estimates is available for comparison to the original phylogenetic comparative method performed with no fossil occurrences included. This anchoring procedure can be used to evaluate the usefulness of anchoring a phylogenetic comparative reconstruction with fossil occurrences. The fossil occurrences will only introduce noise in the analysis if they occur within the range of extant variation. However, they will provide useful insight into ancestral reconstructions if they occur in places with paleoclimate estimates outside the range of extant climates associated with modern occurrences (Fig. 2).
Coherent Models for Projection from Lineage Interpolation.-Ancestral reconstructions for phylogenetic comparative methods produce estimates for hypothetical ancestral nodes. Those nodes are located within the phylogeny at a place and time that depends on the amount of similarity between taxa in the study and not based on particularly important points in the geologic past. Thus, the estimated times of the ancestral node reconstructions do not necessarily line up with the times of the available paleo-GCMs. Matching ancestral climate estimates through lineage interpolation with paleoclimate interpolations for projection was a novel implementation from a PPGM-type analysis (Lawing and Polly 2011). Lineage interpolation uses the evolutionary parameters from best-fit models from a phylogenetic comparative analysis to interpolate along a branch (or lineage) between tips and nodes or between nodes and nodes. Estimates of a climate niche can be extracted from the lineage interpolation for any specified time since the most recent common ancestor of a clade. These interpolation methods allow for the production of coherent time-calibrated models of a past climate niche to project onto an appropriate time-calibrated map of paleoclimate (Fig. 3).
Paleoclimate Interpolations.-Paleoclimate interpolations use linear interpolations weighted by stable oxygen isotope values between climate extremes from geologically interesting end points modeled with paleo-GCMs, GCMs, or modern climate data (Fig. 4). So far, these interpolations have used one global proxy of climate to proportionally adjust climate values between two or more extremes (Lawing and Polly 2011;Lawing et al. 2016;Gamisch 2019). The adjustment is applied uniformly across the globe.
Other proxies for deep-ocean and surface temperatures include alkenones (Bard 2001) and Mg/Ca from benthic foraminifera (Billups and Schrag 2002), which have been used to successfully reconstruct global temperatures and could be explored as other proxies for paleoclimate interpolations.
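One plausible reading of the paleoclimate interpolation is sketched below: a single global weight is derived from where the proxy value at the target time falls between the proxy values at the two end points, and that weight is applied to every grid cell. The exact weighting used in the cited studies is not specified in this section, so the function name, argument names, and isotope and temperature values here are assumptions for illustration only.

```python
def interpolate_paleoclimate(layer_a, layer_b, proxy_a, proxy_b, proxy_target):
    """Interpolate a climate layer between two end-point layers.

    layer_a, layer_b: climate values per grid cell at the two end points.
    proxy_a, proxy_b: global proxy values (for example, benthic oxygen
    isotope values) at those end points.
    proxy_target: proxy value at the time slice being estimated.
    The same weight is applied to every cell, so the adjustment is
    uniform across the globe.
    """
    weight = (proxy_target - proxy_a) / (proxy_b - proxy_a)
    return [a + weight * (b - a) for a, b in zip(layer_a, layer_b)]


# Hypothetical end points: a cold-extreme layer and a warm-extreme layer
# of mean annual temperature (degrees C) for three grid cells.
cold_layer = [2.0, 10.0, 18.0]
warm_layer = [6.0, 12.0, 26.0]

# Hypothetical proxy values (per mil) at the cold end, warm end, and target time.
target_layer = interpolate_paleoclimate(cold_layer, warm_layer, 5.0, 3.0, 4.0)
```

In this example the target proxy value lies halfway between the end points, so each cell receives the midpoint of its two end-point temperatures. Because the weight is global, spatial patterns that differ from both end points cannot be recovered, which is one reason GCMs remain preferable where they exist.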
Without a doubt, GCMs are preferable to paleoclimate interpolations, because they account for complex processes of ocean and atmospheric circulation. However, GCMs are computationally intensive and so have not been modeled for all time periods. Stable oxygen isotope ratios from benthic foraminifera record a global signal of changes in temperature and are useful proxies for changes in global climate (Zachos et al. 2001;Lisiecki and Raymo 2005;Cramer et al. 2009). There is a reasonable amount of variation between the climate estimates produced by some GCMs. The simple linear interpolation method, paleoclimate interpolations, shows less variation between an interpolated paleoclimate and two paleo-GCMs than between the two paleo-GCMs for a test period during the Holocene (Lawing and Polly 2011). A new suite of interpolated paleoclimate layers is available at 10 kyr time intervals back to 5.4 Ma at a spatial resolution of 2.5 arc-minutes (Gamisch 2019). However, the procedure could be improved by incorporating more GCM layers to anchor the interpolation to capture deeper time paleoclimate alterations. Gamisch (2019) also provides a detailed protocol for the paleoclimate interpolation procedure.
FIGURE 3. Example of a three-species phylogeny with simulated climate profiles shown as histograms of mean annual temperature (MAT) at the tips of the phylogeny. Black arrows indicate the minimum and the maximum of each of the climate profiles for each of the species. A reconstructed range is mapped over the hypothetical ancestral node. There are three climate envelope reconstructions shown at three time periods along one lineage to indicate that the climate envelope can be interpolated between the reconstructed node and any tip taxon.
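The lineage interpolation described earlier, in which a climate envelope bound is estimated at any time along a branch, can be sketched for the Brownian motion case. Conditional on the values at both ends of a branch, the expected value under Brownian motion changes linearly with time and the variance is largest mid-branch. The function and argument names are hypothetical, and under an Ornstein-Uhlenbeck model the expectation would instead be drawn toward the optimum.

```python
def interpolate_lineage(anc_value, desc_value, anc_age, desc_age, target_age, sigma2):
    """Expected value and variance at target_age along one branch.

    Ages are measured back from the present, so anc_age > desc_age.
    anc_value and desc_value are the trait values (for example, an
    envelope bound) at the ancestral node and at the descendant node or
    tip. sigma2 is the Brownian motion rate estimated for the trait.
    """
    total = anc_age - desc_age        # duration of the branch
    elapsed = anc_age - target_age    # time since the ancestral node
    if total <= 0.0 or not 0.0 <= elapsed <= total:
        raise ValueError("target_age must lie on a branch of positive length")
    mean = anc_value + (desc_value - anc_value) * (elapsed / total)
    var = sigma2 * elapsed * (total - elapsed) / total
    return mean, var


# Hypothetical branch: node at 4 Ma with an upper MAT bound of 22.0 degrees C,
# extant tip at 0 Ma with an upper bound of 20.0 degrees C.
mean_1ma, var_1ma = interpolate_lineage(22.0, 20.0, 4.0, 0.0, 1.0, sigma2=0.5)
```

Evaluating the function at the age of an available paleoclimate layer gives an envelope bound that is aligned in time with that layer, which is what allows a reconstructed envelope to be projected onto the matching paleoclimate map.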
Multivariate Environmental Similarity Surface through Time.-Rectilinear climate envelope models identify whether geographic places fall within or outside a defined climate niche. Some studies projecting models built with only modern occurrences onto climates of the past find no suitable area for species (Rödder et al. 2013;Franklin et al. 2015). Instead of showing that no climate is suitable, it is often more interesting to determine how close the climate is to a climate envelope. Elith et al. (2010) developed a method, multivariate environmental similarity surface (MESS), to calculate how similar a suite of climate variables is to a reference set of suitable conditions. To calculate similarity between a reference set (here the set of observations occurring within a climate envelope) and each sample point in geographic space, the Euclidean distance is measured from the edge of each variable in the climate envelope to the particular value of the climate variable at the sample point and summed. MESS maps highlight the geographic areas that are within a climate envelope and the level of similarity of areas that are outside a climate envelope. MESS is particularly useful in evaluating PPGM predictions, because of the non-analogous nature of modeled past climates (Rivera et al. 2020).
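The distance calculation as described in the paragraph above can be sketched as follows. This follows the wording here (distance from the nearest envelope edge, summed over variables) rather than reimplementing the published MESS formulation of Elith et al. (2010), which differs in detail. Dividing each distance by the envelope width so that variables in different units are comparable is an assumption added for this example, as are the function name and values.

```python
def envelope_distance(cell, envelope):
    """Summed distance from a cell's climate values to the envelope edges.

    cell: tuple of climate values at one geographic location.
    envelope: list of (lower, upper) bounds, one pair per variable.
    Returns 0.0 when the cell lies inside the envelope on every variable;
    larger values mean the cell is farther from suitable conditions.
    """
    total = 0.0
    for value, (lo, hi) in zip(cell, envelope):
        width = (hi - lo) or 1.0  # avoid dividing by zero for a degenerate range
        if value < lo:
            total += (lo - value) / width
        elif value > hi:
            total += (value - hi) / width
    return total


# Hypothetical envelope: mean annual temperature (degrees C) and
# annual precipitation (mm).
envelope = [(10.0, 14.0), (500.0, 800.0)]

inside = envelope_distance((11.0, 600.0), envelope)
too_warm = envelope_distance((15.0, 600.0), envelope)
cold_and_wet = envelope_distance((8.0, 950.0), envelope)
```

Mapping this value for every cell of a paleoclimate layer gives a continuous surface instead of a binary suitable or unsuitable map, so a time slice with no strictly suitable cells still shows where climate came closest to the envelope.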
Integration with PPGM.-Earlier, I described the various data types and methods that are required to build PPGMs for a group of species. Integrating this information into a framework to project species climate envelopes onto paleoclimate maps through time requires multiple steps.
(1) Obtain and clean species occurrence data for all extant species included in the analysis.
(2) Obtain and clean all fossil occurrence data for relevant species or genera included in the analysis.
(3) Obtain one or more time-calibrated phylogenetic trees.
(4) Determine the relevant descriptors of the climate niche for all species in the analysis.
(5) Calibrate SDMs for each species in the study using a rectilinear climatic envelope model to determine the maximum and minimum, or 5th and 95th percentiles, of relevant descriptors of the climate niche.
(6) Add fossils into the phylogenetic trees according to the described randomization procedure or constrained to more appropriate locations in the phylogenies.
(7) Obtain paleoclimate information from GCMs for relevant time periods.
(8) Extract relevant descriptors of the climate niche at fossil locations from paleoclimate maps.
(9) Use phylogenetic comparative methods to estimate climate envelopes at hypothetical ancestral nodes.
(10) Interpolate between node reconstructions and extant species at relevant time periods.
(11) Project climate envelope reconstructions onto paleoclimate maps that have been aligned for each relevant time period for each lineage of the phylogeny.
(12) Conduct post hoc comparisons of the projections to address biogeographic hypotheses, which might include the use of MESS to characterize the similarity of an entire paleoclimate surface to a specified climate envelope.
This method is probably most beneficial when there is a reasonable amount of observation data and phylogenetic information for an extant species group and at least some fossils identified as belonging within the crown group. In addition, groups that have good information on their physiological tolerances to climate will be particularly fitting. Over shallow time periods during the Quaternary, consideration of evolutionary change in physiological tolerances is likely less important due to shorter time for evolution and speciation to occur, so it would be less useful to go through the process of modeling phylogenetic changes when they might not influence projections of climate envelopes into paleoclimate space. This would be true for species that have time to speciation occurring over millions of years, but it would not be true for species that have shorter time to speciation. At deeper time periods, it is essential to consider species evolutionary change.
Caveats with this methodology include the assumption that the climate niche evolves, that we can capture the evolution of the climate niche using parameters associated with its distribution, and that those parameters are related to physiological requirements of a species (Meik et al. 2015). Climate data as a proxy for physiological tolerances are probably not adequate. Addo-Bediako et al. (2000) found that although species maintain little variation in upper thermal limits across their geographic ranges, they have more variable lower thermal limits that decline with increasing latitude in insects. Gouveia et al. (2014) show that upper thermal limits are related to the position of the climate niche in climate space but do not relate to the maximum temperature extracted from the geographic range of anurans.
One way forward is to use principles of biophysical or physiological ecology to model the climate niche of species (i.e., mechanistic models), instead of the climate envelope models described here, which are considered a correlative approach to species distribution modeling. Some researchers have advocated using mechanistic models derived from species physiology to build algorithms to estimate the climate or environmental niche in place of the first two steps of a correlative SDM of collecting species occurrence data and associated climate or environmental data (Kearney and Porter 2009). This is an interesting path forward, because the physiological parameters might be considered phenotypes on which natural selection could act, more directly linking phylogenetic models with models of a species' distribution. However, mechanistic models require very specific physiological data for organisms, with extensive validation from the field and lab, whereas correlative models based on observational data will be more readily populated with much already available data.
Another caveat tangentially related to the caveats already presented is the incomplete characterization of the climate niche. Due to expected biotic influences on species geographic distributions and the variation in available climate space through time, occurrences of species are not expected to capture the full range of climates in which a species may be able to survive and reproduce. Saupe et al. (2018) investigate the effects of incomplete characterization of climate niches by modeling the evolution of a couple of climate niche variables in virtual species. They find that the incomplete characterization of niches increases rates of niche evolution and biases in the comparisons of evolutionary patterns between clades. They caution researchers to beware of these effects and to correct for them by estimating niche truncation. One way to check for niche truncation is to test whether species distributions are in equilibrium with modern climate (Araújo et al. 2005;Munguía et al. 2012). However, even if the species distributions are in equilibrium with modern climate, there remain potential gaps in climate space not occupied by available modern climate. If those gaps occur on the edges of species climate niches, then niche truncation could occur. Including younger fossils in the characterization of the climate niche might offer a more complete characterization of a truncated niche (Varela et al. 2009, 2011).
Even with these caveats, this method remains interesting to investigate and improve upon, because it provides an avenue for developing models of species potential distributions through time, while accounting for evolutionary and climate change (Rivera et al. 2020). The results of these models can also be harnessed to provide various expectations of past community composition, which could be compared with observed past communities.
These investigations would improve our understanding of the effects of compositional changes and non-analogous compositions on our understanding of past ecosystem dynamics.

Conclusions
We can gain critical insight into biotic response to climate and environmental change by integrating modern and paleontological data, along with phylogenetic comparative methods, ecological niche and species distribution modeling, and paleoclimate interpolations. The approach described here could be broadly applied to integrative studies addressing questions about biota that cross spatial and temporal scales, including investigating biodiversity patterns, macroevolution, community assembly and disassembly, and ecological resilience. Study designs that iterate through divergent assumptions, such as parameters that emphasize niche evolution contrasted to niche conservatism, will result in a suite of possible outcomes that could be evaluated to gain insight into ecological and evolutionary processes governing the distribution of species and their responses to environmental change.
There are multiple study designs that will accommodate the integration of paleontological and neontological datasets. One approach to evaluate biotic response to environmental change is to use methods designed for paleontology and paleontological data to forecast biotic response and compare it with modern biodiversity data. Another approach is to use methods designed for modern observations and inference, project those back in time, and compare the projections with paleontological data. Although these are powerful approaches, especially for model validation, there are several considerations when making these comparisons; see Willig (2003) for a discussion on factors that limit our understanding of biodiversity in space and time. Importantly, the modeling procedure and validation dataset can be mismatched in spatial and temporal scale, so it may be unclear whether some validation procedure fails because the modeling does not accurately capture the important biological processes or due to the spatial and temporal mismatch of the datasets.
A third approach is focused on integrating modern and paleontological data into the same algorithmic procedures for making inferences through new methods development. This approach accommodates the inclusion of both paleontological and neontological data sources and specifically deals with aligning spatial and temporal scales for integration. PPGM-type modeling relies on this third approach, and the associated methods still require multiple aspects of development. The most pressing development needs include understanding how to better characterize niches of species, how those relate to genus occurrences in the fossil record, and how to best incorporate phylogenetic modeling for complicated niche characterization algorithms. It is also critical to better understand the link between physiological ecology and climate tolerances and to further investigate whether and how climate niches evolve.
Modeling species potential distributions in deep time also remains challenging due to the available occurrence data. For many species, there is sparse coverage of spatial and temporal occurrences in the fossil record. Kemp and Hadly (2016) highlight the taxonomic biases present in available data. Targeted sampling will be required to gain more comprehensive coverage for some species. In addition, we need more paleoclimate general circulation models to describe distributions of climate through time and to help interpret the geographic distribution of ancient climate availability. So far, PPGM-type modeling has been applied to only a couple of groups of squamate reptiles and to North American chelonians. It is important to extend the application of these methods to species groups with more numerous fossils and with more taxonomically resolved fossil identifications.
Many of the biological and paleontological data we rely on for these modeling efforts are supported by natural history collections (Cook and Light 2019). But natural history collections are struggling, as they are underfunded and undersupported, and many important collections have even been closed (Dalton 2003;Schilthuizen et al. 2015). In addition, there is a dearth of researchers depositing new specimens into collections (Turney et al. 2015;Salvador and Cunha 2020). We need more support for natural history collections in the twenty-first century and more support for new users and depositors of voucher specimens (Miller et al. 2020).
Despite the complexities and caveats, it is useful to continue to develop ways to further integrate data and methods across the biology-paleontology spectrum. These methods allow us to meaningfully incorporate paleoclimate data associated with fossils into phylogenetic comparative analyses to anchor reconstructions and better gauge the evolutionary tempo and mode of climate tolerances. They allow us to test current biogeographic hypotheses and develop new suites of hypotheses to better understand geographic shifts in species distributions in response to past global change events. In addition to providing insight into ecological and evolutionary processes that support biodiversity, these past modeled responses may serve as a comparison to recent, modern, and future projected responses to global change.
Acknowledgments
I would like to thank J. Lamsdell and C. Congreve for the invitation to speak at the GSA symposium titled "Phylogenetic Paleoecology: Macroecology within an Evolutionary Framework" and for editing this volume of papers presented at the symposium. I would also like to thank J. Lamsdell for his encouragement to write this article and two anonymous reviewers for providing insightful feedback that improved this article. This work was partly supported by the USDA NIFA Hatch TEX09600 project 1003462 and 1020451 and by the Integrative Climate Change Biology and Conservation Paleobiology in Africa programs of the International Union of Biological Sciences.