The Fractional MacroEvolution Model: a simple quantitative scaling macroevolution model

Non-technical Summary Is it environment or life that drives macroevolution? A recent analysis of the massive paleobiology database argues that the answer depends on the timescale. At short timescales, less than 40 million years, it is the environment, at longer timescales, life can effectively adapt. Both the environment and life are scaling—they fluctuate over the full range of scales from millions to hundreds of millions of years (the megaclimate regime). In this paper, we present a simple model of this scaling “crossover” phenomenon. The model has some unusual features: it is fully random and is based on fractional (rather than classical integer-ordered) differential equations. The model is driven by temperature (a proxy for the environment) and the turnover rate (a proxy for life); it has two exponents, a cross-over time and two correlations, yet it is able to reproduce not only the statistics of the temperature, diversity, extinction, origination, and turnover rates, but it also effectively reproduces the pairwise correlations between them, and this over the whole range of timescales. If forced deterministically, it gives the response to bolide impact or other sharp forcing events. Abstract Scaling fluctuation analyses of marine animal diversity and extinction and origination rates based on the Paleobiology Database occurrence data have opened new perspectives on macroevolution, supporting the hypothesis that the environment (climate proxies) and life (extinction and origination rates) are scaling over the “megaclimate” biogeological regime (from ≈1 Myr to at least 400 Myr). In the emerging picture, biodiversity is a scaling “crossover” phenomenon being dominated by the environment at short timescales and by life at long timescales with a crossover at ≈40 Myr. These findings provide the empirical basis for constructing the Fractional MacroEvolution Model (FMEM), a simple stochastic model combining destabilizing and stabilizing tendencies in macroevolutionary dynamics, driven by two scaling processes: temperature and turnover rates. Macroevolution models are typically deterministic (albeit sometimes perturbed by random noises) and are based on integer-ordered differential equations. In contrast, the FMEM is stochastic and based on fractional-ordered equations. Stochastic models are natural for systems with large numbers of degrees of freedom, and fractional equations naturally give rise to scaling processes. The basic FMEM drivers are fractional Brownian motions (temperature, T) and fractional Gaussian noises (turnover rates, E+) and the responses (solutions), are fractionally integrated fractional relaxation noises (diversity [D], extinction [E], origination [O], and E– = O– E). We discuss the impulse response (itself representing the model response to a bolide impact) and derive the model's full statistical properties. By numerically solving the model, we verified the mathematical analysis and compared both uniformly and irregularly sampled model outputs with paleobiology series.


Introduction
Several centuries of paleontological research revealed that the evolution of life on Earth is characterized by high temporal complexity characterized by periods of sluggish and predictable evolution (Jablonski 1986;Casey et al. 2021) with mass extinctions characterized by selectivity that is low or different in kind than in "background intervals" (Raup 1992a(Raup , 1994;;Payne and Finnegan 2007).There are also mass evolutionary radiations that are sometimes contemporaneous with mass extinctions (Cuthill et al. 2020).Moreover, the factors and modes of macroevolution apparently vary with time-for example, the Cambrian explosion or Ediacaran-Cambrian radiation and post-Cambrian evolution (Gould 1990;Erwin 2011;Mitchell et al. 2019); environment (Jablonski et al. 2006;Miller and Foote 2009;Kiessling et al. 2010;Boyle et al. 2013;Spiridonov et al. 2015;Tomašových et al. 2015); and timescales (Van Dam et al. 2006;Spiridonov et al. 2017b;Crampton et al. 2018;Beaufort et al. 2022).Furthermore, macroevolution is strongly influenced by Earth's systems: geological, climatic, and paleoceanographic factors (Marshall et al. 1982;Lieberman and Eldredge 1996;Lieberman 2003;Saupe et al. 2019;Carrillo et al. 2020;Halliday et al. 2020), but also by biotic interactions, which can translate into patterns that are apparent on extremely long timescales of tens to hundreds of millions of years (Vermeij 1977(Vermeij , 2019;;Jablonski 2008;Erwin 2012).There are also questions on the role of general stochasticity and path dependence/memory in evolutionary dynamics (Schopf 1979;Hoffman 1987;Gould 2001Gould , 2002;;Cornette and Lieberman 2004;Erwin 2011Erwin , 2016)).The question is: can we reconcile in a single simple model this multitude of hierarchically organized and causally heterogenous processes producing macroevolutionary dynamics, while maintaining simplicity and conceptual clarity?Here, we argue that we can.
The development of large, high temporal resolution databasesboth of past climate indicators (Veizer et al. 1999;O'Brian et al 2017, Song et al. 2019;Grossman and Joachimski 2022) and of paleobiological information, such as the Paleobiology Database (PBDB; Alroy et al. 2001Alroy et al. , 2008) ) or NOW (Jernvall and Fortelius 2002;Žliobaitė et al. 2017;Žliobaitė 2022)-is transforming our understanding of macroevolution.Time series are frequently long enough that they can be studied systematically, not just as chronologies to be compared with other chronologies, but as functions of temporal scale, that is, the behavior of their fluctuations as functions of duration (or equivalently, their behavior as functions of frequency).
Before attempting to understand processes at specific timescales, it is important to understand their context, that is, the dynamical regime in which they operate.Dynamical regimes are objectively defined by scaling; they are regimes over which fluctuations are scaling (see the review by Lovejoy [2023]).By definition, a scaling regime is one in which fluctuations ΔT (in some quantity such as temperature) are of the form ΔT(Δt) ∝ Δt H , where Δt is duration, or "lag," scale, and H is an exponent.If such power law relationships hold (typically for various statistics), then long-and shortduration events/fluctuations only differ quantitatively; they are qualitatively the same.This is because over such a regime, longduration fluctuations, ΔT(λΔt), at scale, λ Δt ( λ > 1), are related to the shorter-duration fluctuations, ΔT(Δt), by: ΔT(λΔt) = λ H ΔT(Δt), that is, the fluctuations at different timescales differ only in their amplitudes, λ H (with the equality understood in an appropriate stochastic sense).In addition, we can already distinguish the qualitatively different types of regime by the sign of the exponent H. H > 0 implies that fluctuations increase with scale and can appear nonstationary, whereas H < 0 implies that they decrease with scale, they appear to converge.
An important consequence for our understanding of deeptime biogeodynamics-here understood as the joint Earth-life systems-is the robustness of the "megaclimate" regime.The megaclimate regime was first discovered in benthic paleotemperatures (Lovejoy 2015), and it was characterized by "positive scaling" (a shorthand for H > 0) on the basis of long paleotemperature data from ocean core stacks (Veizer et al. 2000;Zachos et al. 2001), see also Lovejoy (2013) for the shorter timescale weather, macroweather, and climate regimes (up to Milankovitch scales).This implies that the difference between temperatures typically becomes larger and larger at epochs separated by longer and longer intervals of time.Theoretically, megaclimate is the hypothesis that there is a unique (presumably highly nonlinear) biogeological dynamical regime that operates over timescales spanning the range ≈1 Myr to (at least) several hundred millions of years.This would be the consequence of a unique (albeit complex, nonlinear) underlying dynamic that is valid over this wide range of scales; presumably it involves a scaling (hence hierarchical) mechanism that operates from long to short durations.A consequence is the existence of a statistical scaling regimes (notably of paleotemperatures), empirically verified throughout the Phanerozoic.While its inner scale appears to be fairly robust at around 1 Myr, its outer scale (the longest duration over which it is valid) is not known, although it appears to be at least 300 Myr.The megaclimate regime implies that the underlying biologyclimate dynamics are essentially the same over these timescales: that is, at long enough time scales the statistics are stationary (although up to the outer, longest, limiting scale of the regime, they may appear to be nonstationary).
The hypothesis that biology and the climate are linked and that climate is a crucial and defining variable in ecological and evolutionary turnovers (Vrba 1985(Vrba , 1993;;Eldredge 2003;Lieberman et al. 2007;Hannisdal and Peters 2011;Mayhew et al. 2012;Crampton et al. 2016;Spiridonov et al. 2016Spiridonov et al. , 2017aSpiridonov et al. , 2020a,b;,b;Mathes et al. 2021) is hardly controversial.However, the scope and utility of the megaclimate notion would increase if it could be backed up by direct analysis of paleobiological series, particularly of extinction and origination rates.This has now been done.A recent paper (Spiridonov and Lovejoy 2022), hereafter SL, found that genus-level extinction and origination rates exhibited scaling statistics over roughly the same range as the paleotemperatures confirming that the megaclimate includes these key macroevolutionary parameters (see Fig. 9, the left-hand side of the figure in the section "The Statistics of the Simulated Series Resampled at the Data Sampling Times," for a plot of the data used in SL and modeled in this paper).
The shortest scale of SL's paleobiological time series was closer to ≈ 3 Myr (the average age resolution was 5.9 Myr), which corresponds to the durations of the shortest PBDB stages-a standard shortest time resolution for Phanerozoic-scale global biodiversity analyses (e.g., Alroy et al. 2008;Alroy 2010b), although note that some sub-age data are available at resolutions closer to 1 Myr).Systematic reviews and multiple case studies revealed that even variously defined (molecular, morphological, phylogenetic, and taxic) evolutionary rates universally exhibit negative temporal scaling (H < 0) behavior (Gingerich 1993(Gingerich , 2001(Gingerich , 2009;;Roopnarine 2003;Harmon et al. 2021;Spiridonov and Lovejoy 2022), which suggests the universality of the temporal scaling-hence hierarchical-evolutionary dynamics.An inner megaclimate scale of ≈1 Myr was also proposed in Lovejoy (2015) and is discussed in the nonspecialist book Weather, Macroweather and Climate (Lovejoy 2019).The scaling, and thus by implication dominance of timescale symmetric hierarchical interactions, was also detected on multimillion-year timescales in sedimentation rates/stratigraphic architecture (Sadler 1981), sea level (Spiridonov and Lovejoy 2022), and dynamics of continental fragmentation (Spiridonov et al. 2022), which shows universality of the pattern in major Earth systems as well.Therefore, the timescaling patterns of evolution and megaclimate overlap at the very wide range of temporal scales (from ≈10 6 to > 4 × 10 8 yr), which motivates the development of quantitative models that explicitly tackle and integrate these timescale symmetries.
If macroevolution and climate respect wide range scaling, then it may be possible to resolve a long-standing debate in macroevolution.In terms first posed by Van Valen (1973), we may ask: are Fractional MacroEvolution Model evolutionary processes dominated by external factors-especially climate, the "Court Jester" (Barnosky 2001;Benton 2009)-or by life itself-the "Red Queen" (Van Valen 1973;Finnegan et al. 2008)?SL proposed a scaling resolution of the debate in which at scales below a critical transition time τ of ≈40 Myr, the climate process is dominant, but there is a crossover beyond which life (self-regulating by means of geodispersal and competition) is dominant.SL thus quantitatively concluded that at long enough timescales, the Red Queen ultimately overcomes the Court Jester.The scaling processes of the Earth system here are playing a double role (thus the Geo-Red Queen theory)-climate fluctuations growing with timescale cause perturbations in diversity to grow in size, but at the same time, at longer and longer timescales, fluctuating climates and plate tectonics cause the mixing and competitive matching of biota, thus effecting global synchronization that results in a crossover when an unstable and wandering diversity regime changes to a longer timescale fluctuation canceling or stabilizing regime (Spiridonov and Lovejoy 2022).
Physicists use the term "crossover" as shorthand to describe analogous phenomena involving processes that are subdominant over one scale range but eventually become dominant at longer scales.However, such transitions are typically modeled by Markov processes such that the autocorrelations are exponential, implying that at the critical timescale, the transition between two regimes is fairly sharp.In SL, on the contrary, in keeping with the basic megaclimate scaling dynamics, the crossover was postulated to be a consequence of the interaction of two scaling processes, that is, the transition is a slow, power law decay of one and the slow emergence of another.An analogous scaling crossover phenomenon was found in phytoplankton, in which the competing scaling processes were phytoplankton growth (with turbulence) and a predator-prey process of zooplankton grazing (Lovejoy et al. 2001).
SL argued that both macroevolution and climate respect wide-range statistical scaling, but that nevertheless, their quantitative and qualitative differences are significant, and these differences were the key to understanding the diversity (D) statistics that appeared to be involve a crossover between two different power laws.While temperature (T ) fluctuations vary with timescale, Δt as ΔT(Δt) ≈ Δt H T with H T ≈ 0.25, the corresponding laws for extinction (E) and origination (O) have H E ≈ H O ≈ −0.25.When H > 0, fluctuations grow with scale, such that the corresponding series tend to "wander" without any tendency to return to a well-defined value, and they appear "unstable."On the contrary, when H < 0, successive fluctuations tend to have opposite signs, such that they increasingly cancel over longer and longer timescales, and they fluctuate around a long-term value, thus appearing "stable." To deepen our understanding, it is necessary to build a quantitative model of the interaction of climate and life.In recognition of the strongly nonlinear nature of evolutionary dynamics, numerous deterministic chaos models such as predator−prey models (e.g., Huisman and Weissing 1999;Caraballoa et al. 2016) have been developed.Although extensions with some stochastic forcing exist, for example, in Vakulenko et al. (2018), the stochasticity is a perturbing noise on an otherwise deterministic system.In paleontology, the model of exponential (unconstrained) proportional growth of diversity was historically popular (Stanley 1979;Benton 1995) or expanded to include possible accelerations due to niche construction effects (a secondorder positive feedback)-a hyperbolic model (Markov and Korotayev 2007).These simple models of expansion were contrasted with single or coupled logistic models of resourceconstrained competitive macroevolutionary dynamics, sometimes also supplemented with random perturbations that account for effects of mass extinctions (Sepkoski 1984(Sepkoski , 1996)); or implicitly hierarchical, and also competition-constrained, Gompertz models (Brayard et al. 2009).However, such models assume that only a few degrees of freedom are important (typically fewer than 10), whereas the true number is likely to be astronomical.It is therefore logical to model the process in a stochastic framework (involving infinite dimensional probability spaces), where the primary dynamics are stochastic, using the scaling symmetry as a dynamical constraint.Therefore, there is growing recognition of stochastic models as essential tools for understanding macroevolutionary dynamics.Actually, some of the first models that tried to explain complexities of macroevolutionary dynamics were stochastic Markovian birth and death models (Gould et al. 1977;Raup and Valentine 1983;Raup 1985Raup , 1992b;;Nee 2006).Several recent applications of linear stochastic differential equations were used in causal inference of macroevolutionary drivers and competitive interactions between clades (Liow et al. 2015;Reitan and Liow 2017;Lidgard et al. 2021).
Beyond the realism of implicitly involving larger numbers of degrees of freedom, stochastic models have the advantage that they may be linear, even though the corresponding deterministic models may be highly nonlinear.Also, by the simple expedient of using fractional-ordered differential equations rather than the classical integer-ordered ones, stochastic models can readily handle scaling, which is rarely explicitly considered in macroevolutionary analyses.Fractional differential equations provide a natural way of defining processes that vary over a wide range of timescales.Although, we forced the model with a nonintermittent Gaussian white noise in this paper, in future, this can easily be replaced by strongly intermittent (multifractal) processes that are presumably necessary to realistically model the extremely intermittent behavior, implied, for example, by mass extinctions or thermal climatic events, which are observed in macroevolution and megaclimate respectively.Scaling processes are characterized by a slow (power law) decay of the memory, much slower than an exponential rate, such that values of a time series are nontrivially correlated, even if they are separated by long time periods.Therefore, the dynamics of the system are potentially conditioned not only on the current state of the system but also on its distant past.This is exactly the property that is exploited in constructing state-of-the-art descriptive and predictive models of long-memory phenomena at shorter timescales (weeks to years), namely macroweather forecasts, based on the Scaling LInear Macroweather Model (SLIMM; Lovejoy et al. 2015) and StocSIPS (Del Rio Amador and Lovejoy 2019Lovejoy , 2021;;Lovejoy and Del Rio Amador 2023).The same basic model using deterministic climate forcings produces climate projections to 2100, notably with much lower uncertainty than classical general circulation model (GCM) approaches (Hébert et al. 2022;Lovejoy 2022b;Procyk et al. 2022).
The useful scaling property of fractional equations arises because they have impulse response functions (Green's functions)-and hence solutions-that are based on scaling (power laws) rather than the exponential Green's functions associated with integerordered differential equations.In general, fractional derivatives are simply convolutions with power laws of different orders, and convolutions with different domains of integration define different types of fractional derivatives.In the fractional equations discussed in this paper, the particularly simple Weyl fractional derivative is used; in the frequency domain, it simply corresponds to a power law filter.Finally, it could be noted that although fractional derivatives were discovered several centuries ago, there has been an explosion of interest in them in the last decade or so, and many books on the topic are now available (e.g., Oldham and Spanier 1974;Miller and Ross 1993;Hilfer 2000;West et al. 2003;Petras 2011;Baleanu et al. 2012;Klafter et al. 2012;Owolabi and Atangana 2019).
In this paper, we therefore build a simple model for biodiversity (D) that can reproduce and explain SL's findings.The model is parsimonious: it has only two scaling drivers-the climate and life-and by construction it reproduces the observed scaling crossover at 40 Myr.Although the model has two basic exponents (climate and life) and two coupling coefficients between temperature and turnover and between turnover and diversity, a total of four basic parameters, it satisfactorily reproduces the fluctuation statistics of D, T, E, and O as well as the turnover (E + = O + E) and difference E − = O − E over the range of ≈3 Myr to several hundred millions of years (the longest scales available).The six variables imply 15 pairwise correlations, and the correlations that are implied by the model are not single values between pairs of variables at a unique timescale, but each correlation is a function of the timescale indicating an agreement between the model and data over a wide range of scales.
In the model, the driver of the life processes is turnover.As with many other properties of groups of biological individuals, turnover can be defined at many levels.Here we use the taxic approach to modeling and analysis, such that we consider macroevolutionary turnover at the level of genera.Similarly, as in the case of organisms from different species in populations, turnover of genera in the biota shows total perturbations to a given diversity state.Namely perturbation by subtraction (extinction) and addition (origination), which represent cases of creative destruction and the destructive creation respectively, which could work as stabilizing or destabilizing factors of the global diversity depending on the circumstances (Cuthill et al. 2020).Beyond this, it explains the 15 pairs of scale by scale fluctuation correlations over the same observed range.The data are from the SL paper; they represent stage-level time series of Phanerozoic marine animal genera O and E (second-for-third origination and extinction proportion [Alroy 2015;Kocsis et al. 2019] not standardized for the duration of stages), sample standardized using the shareholder quorum method (Alroy 2010a) genus diversity of Phanerozoic marine animals based on PBDB data (https:// paleobiodb.org).The paleotemperatures (T ) are also the same as in the SL paper, obtained from Song et al. (2019).The rates used in the analyses are proportions, per-lineage rates, which is a natural way of describing processes working on a per capita basis (such as evolution).
The inclusion of the maximal amount of data and the taxonomic precision are the essential trade-offs of any taxic macroevolutionary study.With sufficiently well-preserved fossils, it is often possible to make accurate genus-level identification (relatively high accuracy and robustness of identification).On the other hand, if the taxon can be identified only to the family level, as is often a case for multi-element taxa, probably very few remains were preserved, or in the case of single skeleton taxa, the preservation is not adequate.Therefore, the decision to use higher-rank taxonomic data far removed from the species level could increase the risk of including more noise to the data.We chose the genus level as a compromise, which is the closest taxonomic level to the species level, while on the other hand, covering more occurrences than the fossil record resolved to the species level.
Because rates of macroevolution originations and extinctions can be estimated in many ways, and their properties (accuracy, and precision) can vary depending on the properties of the sampling process and the evolutionary process itself, we performed a sensitivity analysis using Sepkoski's genus-level marine animal data and Foote's per-lineage rates (see Appendix 2).Analysis shows that despite the differences in datasets, their completeness, differences in standardization (Sepkoski's data are not sample standardized), and differences in extinction and origination rates used, the basic results pertaining to the scaling are nearly identical in both cases.Therefore, in this paper, we only discuss the more complete and sample-standardized paleobiological time series derived from PBDB data.Currently, the PBDB data are the best source for multi-lineage global-scale analysis of evolutionary patterns, despite the presence of possible distortions related to the uneven spatial sampling, due to objective geological heterogeneities of the fossil record and various historical and socioeconomic factors that significantly impacted the study of ancient life (Raja et al. 2022;Ye and Peters 2023).Future work toward a more even representation of the global-scale data will certainly improve the accuracy of diversity estimates; this in itself would be an interesting test of the model.
As a final comment, we should note that the basic-simpleststochastic crossover process is the fractionally integrated fractional relaxation noise (ffRn process), whose properties were only fully elucidated very recently (Lovejoy 2022a) in the context of longterm weather forecasts (Del Rio Amador and Lovejoy 2021) and climate projections (Procyk et al. 2022).The new model has conceptual commonalities with the environmental "stress model" of M. Newman that attempted to replicate the scaling statistics of extinction intensities of marine biota (Newman 1997;Newman and Palmer 2003).The model presented here is more sophisticated, as it ties the principal macroevolutionary variables-O and E-to a principal geophysical scaling process-the megaclimate-in producing realistic multi-timescale global dynamics of marine animal biodiversity, while keeping its conceptual simplicity in transparently using a few crucial parameters of time-scaling and correlations.The model is also implicitly hierarchical as implied by its scaling relations, and this is a desirable feature of a unified evolutionary theory (Eldredge 1985(Eldredge , 1989;;Gould 2002;Lieberman et al. 2007).

The Model
The Equations Wide Range Scaling.The SL picture is one where the extrabiological factors ("the climate") are scaling and drive biodiversity from ≈1 Myr to ≈40 Myr, where the crossover occurs, followed by the domination of biotic regulation at the longer timescales, which is also enabled by global homogenization of biota at long timescales caused by plate tectonics and changes in climate zones (Geo-Red Queen dynamics).The endemism and peculiarities or contingencies in evolutionary innovation occur at all times and places.Because evolutionary innovations are inherently unpredictable and can change carrying capacities of ecosystems and communities (Erwin 2012), their effects at the global scale of measurement of diversity are destabilizing: they act as random shocks.Because all innovations appear in certain times and places, it takes a certain time for the dispersion of innovations and the re-equilibration of the global-scale biosphere (adaptation of other taxa) following perturbations.The principal agent of mixing of biotas is plate tectonic motion (which also separates biota at shorter timescales).Therefore, the only way biota can readjust and equilibrate at the global scale in the light of multiscale local and regional evolutionary innovations is by means of geodispersal mediated by competition and co-adaptation.Endemicity exists at all times, but the duration of endemicity of concrete faunas is limited by unstable Earth dynamics that work toward homogenization (the same faunas or floras cannot be endemic forever).Larger taxa either disperse to their maximal capacity and persist (and become effectively cosmopolitan) or they go extinct.
The Basic Diversity Equation.Based on this picture, we propose the following Fractional MacroEvolution Model (FMEM).First we describe the model, then we comment on it.
The basic diversity equation is: where h is a (fractional) order of differentiation whose physical interpretation is given shortly, τ is the crossover timescale (≈40 Myr), and E + = E + O is the turnover anomaly, that is, the deviation of the turnover rate from its long-term average (in the model, E + can therefore be either positive or negative).For reference, it is defined on the right, where O and E are the anomalies of the origination and extinction rates with respect to their long-term averages (the diversity D is a similarly defined anomaly with respect to its long-term average).Whereas D and E + are already nondimensional, the temperature anomaly T must be nondimensionalized, for example, by the standard deviation of its increments at some convenient reference scale, say 1 Myr.s T and s E are constants that are determined by the coupling between T and D (s T ) and E + and D (s E ; see eq. 17).
The Drivers.The basic drivers are the climate (T ) and life (E + ), themselves driven by Gaussian white noises γ T , γ E : where α is the basic biology (extinction and origination rates) exponent (α ≈ 0.25 as deduced from SL's analysis), and h is the exponent difference (contrast) between the temperature and biology, from SL's analysis: h = 0.75 − α ≈ 0.5.As discussed later (eq.7 or eq.35), combined with the diversity equation (eq.1), these determine D. Equations ( 1) and ( 2) specify the model dynamics; see Figure 1 for an overall schematic.The derivatives are fractional; in this paper, we use the semiinfinite "Weyl" fractional derivatives.For the arbitrary function W(t), the ζ-ordered Weyl fractional derivative is defined as: where Γ is the usual gamma function, and s is an unimportant variable of integration.In this paper, the range of differentiation, These are shown at the right, because they have nontrivial properties, such that they are best modeled as the responses to more elementary causes-the temperature and turnover rate forcings ( f T , f E+ ).In the paper, we primarily discuss the simple case that reproduces the Paleobiology Database (PBDB) statistics, where these are Gaussian white noises ( f T = γ T , f E+ = γ E+ ), However, deterministic forcings such as bolide impacts are also discussed, shown here with both T and E + forced with a Dirac function of amplitude f 0,T , f 0,E+ , respectively.More general forcings can be used and their responses can be obtained using the impulse response functions.The middle line shows how the T, E + drivers determine the diversity (D).Finally, to complete (close) the model, we need a diagnostic equation that enables us to determine E − , E, O; this is shown in the bottom line.

380
Shaun Lovejoy and Andrej Spiridonov 0 < ζ < 1, is adequate, but more generally, if ζ is outside the range, equation (3) can simply be combined with ordinary integerordered differentiation to obtain the required result.Because fractional derivatives (and their inverse, fractional integrals) are, as in equation ( 3)-generally convolutions, different fractional operators are defined on different ranges of integration for the convolutions.The semi-infinite Weyl derivatives are particularly easy to deal with, because they are simply power law filters in Fourier space, see equation ( 12) (for more information on fractional equations, see, e.g., Miller and Ross 1993;Podlubny 1999).The γ's in equation ( 2) are Gaussian white noises; they are proportional to "unit" white noises, γ.Unit white noises have the properties: where the angle brackets indicate ensemble (statistical) averaging, and δ is the Dirac function.Equation ( 2) therefore implies that T, E + are fractional integrals of white noises.Depending on the values of the exponents, these are fractional Gaussian noises (fGns) and fractional Brownian motions (fBms) (Mandelbrot and Van Ness 1968; see the later discussion on the small-and large-scale limits).
Completing the Model, the Diagnostic Equation.Equations ( 1) and (2) determine D, E + , T. However, for the model to determine E and O, we need a final equation for E − : This is just the differential form of the usual discrete-time definition of diversity: D j+1 = D j (1 + O j − E j ), where j is a time index.τ D is the discretization time, the basic resolution of the series.Equation (5) plays no role in the dynamics, conventionally, it defines D (see eq. 34 for its Fourier expression).Mathematically, equation ( 5) is thus a "diagnostic equation," because it simply allows us to close (complete) the model by determining O, E: A schematic of the full model is given in Figure 1 showing its various parts, including the possibility of deterministic forcing discussed later (eq.12).

Discussion
Diversity as an ffRn.The diversity model was written in a nonstandard way (eq.1), because in this form, its basic behavior is transparent.When h > 0, the fractional term is the highestorder derivative; at high frequencies it therefore dominates the zero th -order (D) term, such that at short lags, Δt < τ, diversity fluctuations ΔD ∝ ΔT such that D follows the temperature.However, at low frequencies (Δt > τ), the zero th -order term dominates, and we have instead ΔD ∝ ΔE + .By inspection, the model therefore reproduces the crossover at lag τ, and the crossover will be scaling due to the scaling of T, E + (eq.2).The mathematical structure of the model is clearer if we substitute the drivers in terms of their own Gaussian forcings γ T , γ E (eq. 2), rewriting equation 1 as: The term on the right-hand side represents the total forcing of the diversity.(Note: d −α /dt −α is a fractional integral order α: for Weyl derivative and integrals it is the inverse of the α-order derivative d a /dt a ).
The linear combination of white noises s T γ T + s E γ E is also a white noise.The D equation is thus an order h-ordered fractional relaxation equation forced by an order α fractionally integrated white noise, that is, it is a "fractionally integrated fractional relaxation" process (ffRn; Lovejoy 2022a).The basic "unit" ffRn process U h,α (t) satisfies: where γ is the unit white noise defined earlier (eq.4), and we have used the fact that for Weyl fractional derivatives, fractional differentiation and integration commute.If time is rescaled (t→t/τ), we see (from eq.7) that D is proportional to U α,h .We note that if h = 1, the D equation (eq. 1) would be a classical relaxation equation, and if forced by a white noise (i.e., if α = 0), D would be a classical Ornstein-Uhlenbeck (OU) process.OU processes are thus the classical special cases of crossover phenomena involving high-frequency processes with exponential decorrelations (e.g., Markov processes) that lead to white noise behavior at low frequencies.They are currently the conventional approaches to the modeling and analysis of microevolutionary as well as macroevolutionary dynamics (Khabbazian et al. 2016;Bartoszek et al. 2017;Liow et al. 2022).
Deterministic Behavior: Impulse Response Functions.The D process-the solution to equation ( 7)-is the response of the operator to a white noise forcing.The general behavior of responses to linear operators is determined by their impulse response (Green's) functions G α,h that satisfy: (Lovejoy 2022a), where δ(t) is the Dirac ("delta") function.G α,h can be expressed in terms of "generalized exponentials" or Mittag-Leffler functions e h,h+α as: Fractional MacroEvolution Model At small t, the leading order term is therefore G a,h (t) ≈ t a−1+h G(a+h) .The large t (asymptotic) expansion is: 1999).Whereas the small t expansion is t α−1 times terms of positive powers of h, the large t expansion is in terms of t α−1 times terms in negative powers of h, with leading term G a,h (t) = t a−1 G(a) .Unless h = 0, G α,h (t) therefore transitions between two different power laws.The special case h = 0 that applies to the temperature and turnover forcings (eq.2), corresponds to the pure power law G a,0 (t) = t a−1 G(a) .G α,h has the property that if it is (fractionally) integrated ζ times, the result is just G α+ζ,h .As explained in Appendix 1, G α,h is useful for numerical simulations.
At a typical highest resolution of global datasets with timescales ≈ of 1 Myr, G α,h gives the deterministic response to forcings that are effectively impulses at this scale, for example, a bolide strike (Alvarez et al. 1980;During et al. 2022), supernova or gamma ray burst (Fields et al. 2020), or even much slower hyperthermal event such as the Paleocene-Eocene thermal maximum (Gingerich 2006;McInerney and Wing 2011) or the Cenomanian-Turonian event (Eaton et al. 1997;Meyers et al. 2012;Venckutė-Aleksienė et al. 2018), extensive volcanic eruption episodes, or other short-timescale (below ≈ 1 Myr) stressors.Figure 2 shows a plot of impulse responses for temperature and turnover demonstrating their singular nature for the empirical parameters estimated in SL (α ≈ 0.25, h ≈ 0.5).These singular responses combine apparently contradictory features: on the one hand, the falloff at short times following the impulse is very sharp, whereas on the other hand, the decay is very slow at long times, so its effects take a long time to disappear.The sharpness feature is desirable; because mass extinctions, and potentially other episodes of dramatic biotic change, effectively represent periods of almost infinite turnover rate, it is near instantaneous or singular-like (Foote 1994(Foote , 2005)).Indeed, the global stratigraphic stages and substages are based on the episodes of turnover.Figure 3 shows the impulse response for the diversity that has two different power law regimes: a high-frequency t α−1+h regime and a low-frequency t α−1 regime (the leading terms in eqs. 10, 11) with a transition at various scales τ indicated.As expected, the former (temperature dominance) corresponds to the singularity t −0.25 and the latter to the singularity t −0.75 (turnover dominance; both are shown in Fig. 2).Note that because the equations are linear, the impulse responses from the deterministic forcings shown in Figures 2 and 3 will be superposed onto the stochastic white noise-driven responses that we calculate in the section entitled "Numerical Simulations." The other aspect of the singular impulse responses is their long time decays that are much slower than conventional exponential decays and can be thought of as a kind of memory that characterizes the responses to both deterministic and stochastic forcing.Our model thus predicts that there will be long-term consequences of bolide impacts or other catastrophic events.This is in accord, for example, with the findings of Krug et al. (2009) and Krug and Jablonski (2012) that the Cretaceous-Paleogene (K-Pg) mass extinction caused by the effects of the Chicxulub asteroid impact changed long-term origination rates and their spatial distribution, a situation that persists today, 66 Myr after the event, in accord with this long-memory feature of the FMEM model.The slow decay in the response is also a desired property in modeling macroevolutionary dynamics, as it can replicate effects of phylogenetic inertia or "phylogenetic constraint" (Gould 2002) and also inertia of persistence of geological structures that can affect dynamics of biodiversity for periods, eras, or even eons.The genetic composition is basically the material trans-generational memory of biological individuals of all levels of the Linnaean hierarchy (Eldredge 1996).Therefore, extinction (or origination) of genera or taxa of higher ranks can change the functioning and the properties of the biosphere for hundreds of millions of years.The same goes for the formation of such structures as oceanic basins or continents with their myriads of possible configurations-their origins and disappearances impose new boundary conditions for geophysical and macroevolutionary dynamics for hundreds of millions of years (Nance 2022;Spiridonov et al. 2022).
We could further note that the long memory can be exploited to make future predictions.This is because for Gaussian forcing (eq.2), E + and the increments of T are longmemory fGn processes and-as discussed earlier-D is an ffRn.The predictability of the former has been mathematically studied in Gripenberg and Norros (1996) and has been exploited for monthly and seasonal forecasts in Del Rio Amador (2019,2021) and Lovejoy and Del Rio Amador (2023), and the predictability properties of ffRn processes have been studied in Lovejoy (2022).
Solving the Model.Readers only interested in results can skip ahead until the basic model equations (38,39) that are given at the end of the section "Full model statistics".
Fractional derivatives are generally convolutions (with power laws, eq.3), hence different ranges of integration in the convolution yield different fractional derivatives.Most often (e.g., the Riemann-Liouville and Caputo fractional derivatives), the latter are taken from time = 0 to t, in which case the initial conditions are important and dealing with them is technically somewhat complex.In these cases, the main tool is the Laplace transform.Here, however, we consider statistically stationary white noise forcing that starts at time = −∞.In this case, we can use the "Weyl" fractional derivative (a convolution from −∞ to t, eq. 3) whose Fourier transform ("F.T.") is particularly simple: where ω is the Fourier conjugate variable, that is, the frequency (when h is an integer, the formula may be familiar to the reader).
If we Fourier transform equations ( 1) and ( 2), we can write the model in matrix form as: (the single underlining indicates a vector, the double underlining, a matrix, the Fourier transform is denoted with a tilde).
As noted earlier, the D forcing is a linear combination of white noises (eq.7), such that the sum on the right-hand side of equations ( 7) and ( 13) is a correlated white noise.However, from the data (see fig. 5), we see that E + and T are themselves correlated.We therefore rewrite the model in terms of two statistically independent ( g 1 g 2 = 0) unit ( g 2 1 = g 2 2 = 1) white noise drivers γ 1 , γ 2 : So that: where σ T is the standard deviation of γ T , σ E of γ E , and ρ E is the T, E + correlation.Equation ( 14) is the standard Cholesky decomposition for two correlated random variables, noises.Fourier transforming equation ( 14) and using equation ( 13), we can write the model as: Where the parameters: depend on both the driver statistics (σ T , σ E , and ρ E ) and the model parameters s T , s E .While σ D does parametrize the amplitude of the diversity fluctuations, unlike σ T , σ E (which must be ≥ 0), it is not a true standard deviation: if s T < 0, it will be negative.Similarly, we will see that ρ D determines the D, E + , and T correlations but is not itself a correlation coefficient and depends on the sign of the ratio r.

Stochastic Response to White Noise Forcing
Scaling Processes: fGn and fBm.We are interested in the statistical properties of the solutions S(v).These can be expressed in terms of fGn, fBm, and ffRn processes.Before discussing the full statistics that include the cross-correlations, let us therefore discuss the statistics of each of the main variables individually.
Let us start with the scaling processes T, E + that are of the form: For the statistics, we can determine the power spectrum: where β X is the spectral exponent, and we have used the fact that the spectrum of a Gaussian white noise is flat: Fractional MacroEvolution Model E X (ω) is thus the basic form of the T, E + spectra (not to be confused with the extinction rate E).From the Wiener-Khintchin theorem, the (real-space) autocorrelation function R X (Δt) is the inverse transform: The technical difficulty is that due to a low-frequency divergence, the inverse transform of pure power spectra (eq.19) only converges for β X < 1 (i.e., α X < ½, H X < 0); this is the fGn regime appropriate for E + .Even here, R X (Δt) is infinite for Δt = 0.Because R X (0) is the variance, fGn processes are (like the white noise special case α X = 0) generalized functions that must be averaged (integrated) over finite intervals in order to represent physical processes.Averaging to yield a finite resolution process is adequate for After X is averaged over a finite resolution τ r , the result is Xτ r with root-mean-square (RMS) statistics X 2 t r 1/2 /t Hx r .Because H X < 0, the empirical statistics will be highly sensitive to the resolution τ r .
When α X ≥ ½, the low-frequency divergences imply that the X(t) process is nonstationary (the process generally "wanders off" to plus or minus infinity).However, for 1 < β X < 3 (i.e., ½ < α X < 3/2, 0 < H X < 1; this is the range appropriate for T: H T ≈ 0.25, β T ≈ 3/2), its increments are (stationary) fGn processes; this regime defines the fBm process.Finally, because all physical scaling processes exist over finite ranges of scale, there will be finite outer (longest) timescale (smallest frequency) such that truncating the spectrum at low frequencies (as for the ffRn processes, see the section "Two Scaling Regimes: fRn and ffRn") leads to an overall stationary process.
When analyzing paleo-series, it is convenient to analyze the statistics in real space, the main reason being that these are easier to interpret (the difficulty in interpretation is the cause of the recently discovered quadrillion error in climate temperature spectra [Lovejoy 2015]).An additional reason is that uniformly sampled paleo-series are typically not available: indeed, the geochronologies themselves are typically scaling (see Appendix 3 and Fig. A3.1 for more discussion).For the moment, the problem of spectral estimation from data with scaling measurement densities (i.e., scaling number of measurements per time interval) is an unsolved problem.
We have already noted that the autocorrelation functions are only adequate for H X < 0 (α X < ½, β X < 1), this is why, when 0 < H X < 1, it is conventional to define fluctuations using differences ΔX(Δt) = X(t − Δt) − X(t), which are stationary over this range.Differences avoid low-frequency divergences but will still have highfrequency divergences when H X < 0. To avoid problems at both the small scale (resolution dependencies) and large scales (nonstationarity), it is convenient to use Haar fluctuations.Over the interval Δt, the Haar fluctuation ΔX(Δt) is defined as the difference between the average of the first and second halves of the interval.
(valid for Haar fluctuations).Over the indicated range of parameters, the Haar fluctuations are stationary and are independent of the resolution.
Comparing equations ( 7) and (2), we find: Two Scaling Regimes: fRn and ffRn.From equations ( 8) and ( 9), the basic Fourier transforms of ffRn processes and their impulse responses are: The fractional relaxation noise (fRn) process is the special case where α = 0.The ffRn power spectrum is therefore: The full statistical properties of ffRn processes (including series expansions) are discussed in Lovejoy (2022); however, for our purposes, the low-and high-frequency scaling exponents are sufficient.For these, equation ( 25) yields: ("h" for high frequency, "l" for low frequency).To obtain the basic fluctuation statistics, it is sufficient to apply equation ( 22) to each regime separately.Indeed, more generally, "Tauberian theorems" (e.g., Feller 1971) imply that if the spectrum is a power law over a wide enough range, then the corresponding (second-order) real-space statistics will also be scaling.Therefore: Using the empirical values α ≈ 0.25, h ≈ 0.5, we see that E + is a fractional Gaussian noise and that T is an fBm process.Also, we find (cf.eqs.7, 27) that

The Full-Model Statistics: Spectra, Correlations
The Basic Model.The model is linear and has stationary Gaussian (white noise) forcing (T, E + ), therefore D, E − , E, O are also Gaussian, such that their statistics are determined by spectra and cross-spectra, or equivalently in real space (via the Wiener-Khintchin theorem), by the autocorrelations and cross-correlations: (the diagonal terms are the spectra of the components: R ii (v) = E i (v), the asterisk indicates the complex conjugate).In matrix notation: where the superscript T indicates the transpose, the asterisk indicates the complex conjugate, and we have used: The key correlation matrix (from eq.16) is: where and Completing the Model: The Diagnostic Equation for E − .Before writing down the final spectra, let us complete the system with the help of the diagnostic equation that allows us to determine E − from D (and hence E, O, eq. 6).
The Fourier transform of the diagnostic equation (eq.5) is: Therefore the full system is: From this, we can find E, O: The explicit formulae for E ± are: The overall final statistics are:

Fractional MacroEvolution Model
Using equations ( 36) and ( 37), the spectra of E, O can be determined: The far-right approximation can be seen from equation ( 37) using the fact that τ D is the resolution of the series, such that for the full range of empirically accessible frequencies, we have ωτ D < 1.In addition, because τ > τ D , the factor |ωτ D /(1 + (iωτ) h )| < <1.

Scaling Properties
High-and Low-Frequency Exponents.Given that both the model equations and the corresponding model statistics are both readily expressible in the spectral domain, it is tempting to empirically evaluate the model directly using spectra.Unfortunately, although spectra are commonly used, they suffer from various technical issues such as spectral leakage (the "smearing out" of spectral variance across a range of frequencies), effects that are important whenever strong spectral peaks are present.Problematic peaks occur not only in quasi-periodic signals (where most of the effort has been made [e.g., Springford et al. 2020]) but also scaling signals, especially when the spectral exponent β (eq.19) is significant (β = 0 for white noise [Hébert et al. 2021]).As we discuss in Appendix 3, the challenges for estimating spectra are much greater when the data are sampled at irregular intervals (as they typically are in paleobiology), and in particular, when the chronology itself (i.e., the temporal measurement density) is scaling (Appendix 3, Fig. A3.1).Indeed, the problem of how best to estimate the spectra of data with scaling chronologies (i.e., with holes over a wide range of scales) is still unsolved, and many nonstandard approaches give large biases.Even if one can carefully handle the technical aspects, spectra still have the difficulty that their interpretation remains problematic.This was dramatically illustrated when Lovejoy (2015) discovered that the iconic (and still frequently cited) spectrum of Mitchell (1976) was in error by a factor of 10 15 , an error that had not been noticed for four decades and was probably a consequence of spectra that are often not plotted with amplitude units at all or with undetected errors in the units.If Mitchell's spectrum had been accurate, two consecutive million year average Earth temperatures would only have differed by about 10 microKelvins (μK)-yet this patently false implication was not noticed because the units of the spectrum (K 2 yr) were not intuitive, whereas an RMS Haar fluctuation of 10 μK would have been obviously problematic.Even spectral updates as recent as 2020 are in error by a factor 10 11 (see the review by Lovejoy [2023]).The comparison of the fluctuation analyses in this paper with those of the spectra (Appendix 3) highlights the power of fluctuation analysis when applied to irregularly sampled data.
Fortunately, over scaling ranges, the fluctuation and spectral statistics are both scaling with exponents as indicated in the previous section.Therefore, to interpret the statistics (eqs. 38, 39) in real space, it suffices to use the fact that Fourier scaling implies real-space scaling and to use the above relations between real-space and Fourier scaling exponents (eq.22).In matrix form, the spectral exponents are therefore: (The elements correspond to T, E + , D, and E − , left to right, top to bottom).Using the relationship between H and β (eq.22), the high and low frequency (here small and large times, t) have exponents: (41) While at low frequencies, large Δt (i.e., large lags) we have: We should add here that because E, O are linear combinations of E + , E − , their exponents will the maximum of those of E + , E − , so that: We see that for the physically relevant parameters, H = α − ½ = −0.25 for both E, O, over the whole range (close to the data, see SL and Fig. 4).
To get a concrete idea of the implications of model, let us use the rough empirical estimates from SL of α = 0.25, h = 0.5.Plugging these values into equations ( 41) and ( 42), we obtain: (again, for T, E + , D, and E − , left to right, top to bottom).We can see that the Haar fluctuations will be useful for all the series over the whole range of frequencies/scales, the only exception being ΔD(Δt) at long lags (H l < −1, lower right corner of the H l matrix with H l < −1).In this case, the Haar fluctuations "saturate," and the spurious (limiting) value H l = −1 is obtained.
With these results, we can comment on the issue of the stationarity/nonstationarity of the statistics.In the real world, scaling regimes only exist over finite ranges of scale, they have finite inner and outer limits.However, the outer scaling limits in mathematical series may be infinite.In this case, scaling series with low frequency H l < 0 are stationary, whereas if H l > 0, they will be nonstationary (although even here, if H l < 1, the increments of the series will be stationary).From equation ( 44), we see that the only correlation with H l > 0 is for the temperature; hence all the evolutionary responses (including cross-correlations) are stationary.However, even the temperature will be stationary at timescales beyond the outer limit of the scaling regime (which we argue is at least of the order of the length of the Phanerozoic).That is why we prefer the term "wandering" when H l > 0: for such scaling processes, the nonstationarity may be spurious, the series only appears to be nonstationary over the observed (finite) range of scales.On a more fundamental note, no empirical series is stationary or nonstationary, the latter are properties of models, not data.
Normalized Correlations.The cross-spectra and crosscovariances (eq.38) can be used to determine the normalized correlations that were estimated in SL: (45) (using Haar fluctuations).However, from equations ( 41) and ( 42), we find that their exponents (whether at high or low frequencies) are 2H jk − (H jj + H kk ) = 0, that is, they are not power laws and only vary at sub-power law rates, and they are therefore nontrivial (i.e., they are significant) over the whole range of Δt.Because there are six series (T, E, D, O, E + , E − ) there are 15 pairs whose fluctuation correlations may be determined over the observed range of 3 ≈ < Δt ≈ < 400 Myr; see Figure 5.The key correlations are those that correspond to the model parameters: ρ E = ρ TE , ρ D = ρ TD , see equation ( 32).We can already see that the correlations are quite noisy, a consequence of the low resolution and variable sampling of the series.To make a proper model-data comparison, we therefore turn to numerical simulations.

Numerical Simulations
The Statistics of the Simulated Series The model has two fundamental exponents (α, h), two basic correlations (ρ E = ρ TE+ , ρ D = ρ TD ), and a crossover timescale τ. 388 Shaun Lovejoy and Andrej Spiridonov The third correlation (ρ DE ) is a derived parameter (eq.32).In addition, there are two amplitude factors, σ T , σ E , but these will depend on the nondimensionalization/normalization of the series; on log-log plots, they correspond to an up-down shift, and on (normalized) correlation plots, the normalization eliminates them; they will not be considered further.We used the results of SL to fix the values α = 0.25, h = 0.5, τ = 32 Myr.(This is the nearest power of 2 to the slightly larger-but only roughly estimated-value τ = 40 Myr in SL.Also, due to the scaling of the model, the more significant parameter is log τ and log 2 40 = 5.3, not far from log 2 32 = 5.)This leaves the only unknown parameters as the TE and TD correlations (ρ E = ρ TE+ , ρ D = ρ TD ; Fig. 5).
Before comparing the model directly to the (noisy) data, we checked that we were able to numerically reproduce the theoretically expected behavior.The basic modeling technique is to use convolutions with various (impulse response) Green's functions; this is detailed in Appendix 1, but follows the methods described in Lovejoy (2022).The main numerical problems are the small scales that have singular power law filters that are not trivial to discretize, and there are some (easier to handle) long-time (lowfrequency) issues.
Rather than attempting to rigorously determine optimum parameters, as indicated earlier, we fixed the exponents α = 0.25, h = 0.5, and the crossover scale τ = 32 Myr.With guidance of the Figure 5 correlations for ρ TE+ , ρ TD and some numerical experimentation, we took ρ E = 0.5, ρ D = −0.1 (hence ρ TE+ = 0.5, ρ TD = −0.1,ρ DE+ = −0.9;i.e., the sign of r was taken as negative, eq.32).We then performed simulations at a resolution of 250 kyr resolution, with simulation length of ≈4 Gyr (2 14 = 16,384 points), shown in Figure 6.This single very large realization has statistics that are close to those of an infinite ensemble.We postpone a discussion of the significance of the correlations to the section "The Statistics of the Simulated Series Resampled at the Data Sampling Times." According to the model (see the diagonal elements in eq.44), the only series with positive low-frequency scaling exponent (H l > 0) is the temperature (H l = 0.25), which indeed shows "wandering" behavior (second from the bottom in Fig. 6); from the figure, one can see its long-range correlations as low-frequency undulations.This is also true for D, but only up to the crossover scale (≈32 Myr), after which consecutive 32 Myr intervals tend to cancel (H l < 0, eq.44).The other series are, on the contrary, "canceling" (H l < 0, H h < 0) especially E − (eq.44).We can also visually make out some of the correlations, but this is clearer at lower resolution, as discussed later.
On these simulations, we can check that the theoretical scaling is obeyed; this was done using Haar fluctuations; see Figure 7 where the theory slopes (from eqs.43, 44) are shown as reference lines.Note that because the Haar analysis "saturates" at H = −1, the low-frequency H l = −1.25 value for E − (eq.44, lower right-hand diagonal element) yields a slope of −1 (not −1.25); however, the other slopes are accurately estimated.Note that the theory-simulation agreement is not perfect, mostly because the theory is for the average statistics over an infinite ensemble, whereas Figure 7 is from a single-albeit large-simulation.
We can also work out the 15 correlations as functions of lag (Fig. 8).The figure shows the model parameters ρ TE+ = 0.5 (= ρ E = 0.5), ρ TD = -0.1 (= ρ D = -0.1)as solid black reference lines, and the derived correlation ρ DE+ = -0.9(eq.32) as a dashed reference lines.Also shown are dashed theory lines for the TE, TO correlations (predicted to be equal to equal to TE + at long lags, eq.39) and the DE, DO correlations (predicted to be equal to DE + at long lags, see eq. 39).We can see that the correlations approach the theoretical correlations at large lags, although the results are somewhat noisy.

The Statistics of the Simulated Series Resampled at the Data Sampling Times
Before making more effort at parameter fitting and comparing the model to data, it is important to take into account the small number of empirical data points and their irregular sampling (the corresponding geochronology itself turns out to be scaling; see Appendix 3, Fig. A3.1). Figure 9 shows the result for a simulation with the same parameters, but with a 1 Myr temporal resolution (right-hand side), resampled at the same times as the data (lefthand side).Because the model and data are only expected to have similar statistics, the detailed "bumps" and "wiggles" are unimportant, but one can nevertheless make out realistic-looking variability, including correlations between the series.Note that the model respects causality, so when there is a large extinction event, the curve is asymmetric with a rapid upturn being followed by a slower downturn (however, we have followed convention such that the present is at the left and the past at the right).
We can now consider the fluctuation scaling and correlation statistics on the resampled series and compare them with both the data and the results from the same simulations but at a regular   , eq. 32) and then TE, TO (predicted to be equal to equal to TE + at long lags, eq.39) and DE, DO (predicted to be equal to DE + , at long lags, eq.39).Note that these are from a single realization of the process, not the ensemble average.In addition, the statistics of some are fairly sensitive to irregularly sampled (and small size) of the empirical data; compare with Fig. 11. Figure 10 also gives important information about the effect of the sampling: compare the resampled (red) and uniformly sampled analyses (brown).The resampling is particularly important for E + , E − , E, O, although the effects are mostly at small lags for E + , E, O but at large lags for E − .This information should prove useful in interpreting a variety of real-world extinction and origination data.
Finally, we can compare the 15 pairwise correlations (Fig. 11).Again, to judge the realism of the model, compare the red and black correlations.Although-as expected-these are fairly noisy, we see that the agreement is quite good, significantly, it is generally much better than the agreement between the uniformly sampled correlations (brown curves) and data (black).By comparing the red (resampled) and brown (uniformly sampled) correlations, we see that the resampling is especially important for the DE + , DO, DE, E + E − , E − O, OE, correlations and to a lesser extent, the OE, TE + comparisons; for the others, it is about the same.We could note the successful prediction that the E + E, E + O, OE correlations should be ≈1 and the E − E correlations should be ≈ −1.Interestingly, the prediction that the E − O correlations should be ≈ −1 (eq.39) is verified with the uniform sampling (i.e., it is indeed a property of the model), yet the resampling (red in the lower left graph in Fig. 11) makes it > 0 and aligns it closely with the observations.In other words, when the pure model predictions are poor (brown vs. black), there are many instances where the effects of nonuniform sampling are particularly strong, such that overall the model explains the data fairly well: overall 6 fluctuation plots (Fig. 10) and 15 correlations (Fig. 11) with 5 adjustable parameters (α, h, τ, ρ E , ρ T ).

Discussion of the Model and Physical Significance of the Correlations
The model was motivated by an attempt to model the diversity process as a scaling crossover phenomenon with wandering climate (paleotemperature) and stabilizing life (turnover) scaling drivers.In the course of the model development, it became clear both theoretically (due to the definition of the diversity, eq. 5) and empirically that rather than E, O being fundamental, it was rather the turnover E + that is fundamental (indeed, the E + and E − statistics are quite different (Figs. 4,10), and the E + E − correlations are nearly zero (Figs. 5,11).In any event, the model predicted that E, O would follow the E + statistics (eq.39; Figs. 4, 10, and the E + E and E + O correlations in Figs. 4,11).
A more counterintuitive finding concerns the correlations.To start with, the model specifies that the diversity is primarily driven by the temperature up until the crossover scale, yet the temperature and diversity are negatively correlated over the entire range!Although at any given time lag, the DT correlation is small (−0.1), it means that there is a (weak) tendency for the diversity fluctuations to decrease when temperature fluctuations increase and vice versa, but this is not enough to offset the overall temperature control of the diversity that implies that consecutive temperature fluctuations tend to add up (H T = 0.25 > 0), and this is a stronger overall effect.
There is an additional more subtle effect.Consider that at each scale, the imposed TE + correlation is moderate and positive (ρ TE+ = 0.5), and together, ρ TD (the temperature diversity correlation) and ρ TE+ (the temperature turnover correlation with r < 0, eq.32) imply that at each lag, DE + is negatively correlated (reaching the theory value ρ DE+ ≈ −0.9, at long lags; see the DE + correlation, the brown curve in Fig. 11).As the turnover E + also drives the diversity (eq.1), at each scale, we thus have a tendency for T and E + fluctuations to increase (or decrease) together but for D and E + (and hence T and D) to have opposite tendencies.The overall result is that the weak anticorrelation of D with T and D with E + at any fixed scale is still dominated by the stronger effect of T fluctuations growing with scale and dominating the E + driver at lags < τ.
We could remark that ρ TE+ = 0.5 > 0 indicates a tendency for temperature changes to "stimulate" the turnover: periods of increasing temperatures tending to be associated with increasing turnovers, and periods of decreasing temperatures with decreasing turnovers.Also there is a strong anticorrelation between D and E + (ρ DE+ ≈ −0.9) that indicates that turnover decreases with diversity (note that this anticorrelation seems to nearly disappear after the nonuniform sampling; see Fig. 11, second in the top row).However over the range of scales that E + dominates dynamics of D (i.e., Δt > τ, as H E+ ≈ −0.25 < 0), successive E + fluctuations tend to cancel, and on long timescales, the latter effect is dominant, such that H D = H E+ ≈ −0.25-this is a scaling region of biotic self-regulation.

Discussion and Conclusions
The driver of macroevolutionary biodiversity has famously been reduced to a dichotomy between life and the environment: the metaphor of Red Queen versus Court Jester (Van Valen 1973;Barnosky 2001).Using genus-level time series from the PBDB, Spiridonov and Lovejoy (2022; SL) systematically analyzed fluctuations in extinction (E) and origination (O) rates, biodiversity (D), and paleotemperatures (T ) over the Phanerozoic.They did this as a function of timescale from the shortest (≈3 Myr) to longest lags available (≈400 Myr), and their analysis included the correlations of the fluctuations at each scale.They concluded that T, E, O-the basic climate and life parameters-showed evidence of wide range scaling, supporting the hypothesis that over this range, there is a single biogeological "megaclimate" (Lovejoy 2015) regime with no fundamental timescale.However, they found that D followed the T fluctuations up until a critical time τ ≈ 40 Myr, whereas at longer timescales, it followed life (E, O): D was a scaling crossover phenomenon.At the shorter timescales Δt < τ, following the temperature, the D scaling exponent H D ≈ +0.25 (i.e., > 0), indicating Fractional MacroEvolution Model that fluctuations tended to grow with scale, leading to "wandering" behavior.In contrast, for time lags Δt > τ, following E, O, its scaling exponent was H D ≈ −0.25 (i.e., < 0), hence successive fluctuations tended to cancel, resulting in long-time stabilization of diversity by life.
The model assumes, first, that in the Phanerozoic, dynamical processes occur over a wide range of timescales, and second, that the basic driving processes (here T, E + ) are also scaling, such that over a wide range, they define no characteristic timescale.This is the simplest assumption, yet it is compatible with rich behavior.The critical scale for the diversity (estimated at ≈40 Myr) is simply the scale at which one process (life/turnover) starts to dominate the other (climate/temperature).
The Phanerozoic evolution of life on our planet is full of contrasting patterns, quantitative and qualitative dynamical transitions, yet it may be compatible with an underlying scale symmetry.For example, transitions related to era boundaries include changes in complexity of marine animal communities in post-Paleozoic times (Wagner et al. 2006), the increase in longevities of animal genera in the same post-Paleozoic period (Miller and Foote 2003), or changes in the structure of bivalve communities after the K-Pg mass extinction (Aberhan and Kiessling 2015), yet each of these processes occur over wide ranges of scale.
In addition, it could be argued that the dynamics of biodiversity should be subdivided into these smaller (substage) time intervals, potentially extending the scaling range to shorter timescales (≈1 Myr or less).Important boundaries that modified macroevolutionary dynamics at global scales can also be found at period and subperiod boundaries: apparently the appearance of calcifying plankton at the beginning of the Jurassic (or so called "post-Conodontozoic"; Ferretti et al. 2020) should have had a significant effect in stabilizing hyperthermal events and thus decreasing the volatility of biospheric evolution (Wignall 2015;Eichenseer et al. 2019).In addition, presumably the appearance, global spread, and deep impact on the climate of terrestrial plants in the Silurian is highly significant (Lenton et al. 2016).There are also shorter timescale transitions-such as transitions toward more volatile dynamics of zooplankton between the Early and Middle Ordovician toward the Late Ordovician (Crampton et al. 2016).
Such changes are thus numerous and add to the plausibility that such diverse and sometimes sharply varying processes may be scaling.Therefore, the subdivision of the Phanerozoic into shorter time intervals in search of local scaling laws-in our view at least at this stage of the understanding and data availability-is rather premature and arbitrary and could preclude the search of overarching commonalities that can hide in this apparent heterogeneity.Moreover, scaling multifractal processes are more general than the Gaussian ones considered here; they are intermittent, typically involving occasional large changes in intensity (e.g., apparent transitions), occurrences of extremes.Physically, this could correspond to local dominance of controlling mechanisms (which can be hierarchically modulated by feedbacks from biota or external forcing) at all timescales where scaling regimes apply.
In this study, as well as in the study which first described scaling properties of Phanerozoic marine genus-level evolution, SL analyzed the processes and time series as a homogenous stationary system.Although this is an assumption that the statistical properties are constant over the eon, it is nevertheless consistent with high variability and sudden transitions (i.e., volatile and intermittent behavior).Although the exact FMEM model presented here was not intermittent, intermittency and its associated sharp transitions could easily be introduced simply by replacing the (Gaussian) white noise forcings by multifractal ones.
To clarify our ideas, to better understand the geobiodynamics, and to better understand and quantify the limitations, biases, and other data issues, we proposed the simple FMEM to reproduce the observations.It is a model of macroevolutionary biodiversity driven by paleotemperature (the climate proxy) and the turnover rate (E + = O + E), the life proxy.To fit with basic empirical scaling statistics and theoretical ideas about the macroclimate regime (form timescales of roughly 1 Myr to at least 400 Myr), these drivers were taken to be scaling with climate dominating at short timescales and life at long timescales.Therefore, FMEM suggests a possible way to combine into a single stochastic framework both: (1) the destabilizing geophysical (and possibly astrophysical) processes (Raup 1991(Raup , 1992a,b;,b;Melott and Bambach 2014;Fields et al. 2020) with (2) the stabilizing, densitydependent, and self-regulating biotic processes.The model is specified by a simple parametrization based on two scaling exponents and two pairwise correlations (between T and E + and between T and D).
Because the FMEM model is linear, it can be used to model a superposition of stochastic and deterministic processes such as bolide impacts: the responses to the deterministic (external) and stochastic (internal) forcings simply add.As the impulse response function is a (singular) power law, a short impulse on the system will generate a sharp initial response followed by a long power law decay (i.e., much slower than the classical exponentials), decaying with somewhat different exponents depending on whether the impulse is on the temperature or turnover or both (see Fig. 2).
The model has two unusual characteristics: first, it is stochastic, such that the crossover from climate to life dominance is thus a scaling (power law) not standard exponential (i.e., Markov process-type) transition.Stochastic models involve infinite dimensional probability spaces, they are therefore natural model types in systems with huge numbers of degrees of freedom.We believe that they are intrinsically more realistic than strongly nonlinear but deterministic chaos-type models (including those that are deterministic but are perturbed by noises).When the intermittency is strong, scaling stochastic models must be nonlinear (e.g., multifractal cascade processes), and this can easily be included in further model improvements-the Gaussian forcing (γ 1 , γ 2 , eq. 14) need only be replaced by a multifractal one.Here, intermittency is neglected, and linear stochastic equations with Gaussian white noise forcings are used (linear stochastic models can often be used even when the underlying dynamics are strongly nonlinear).
The other unusual FMEM characteristic is that it is a system of fractional differential equations.Unlike the familiar integerordered differential equations that typically have exponential impulse response functions (Green's functions), fractional equations typically have power law response functions and are natural ways to model scaling processes.These impulse response functions are physical models of bolide impacts and similar nearly instantaneous processes, and we discussed some implications.
The model is also highly parsimonious with two scaling exponents and a crossover time τ determined by the PBDB data as analyzed SL.These determined the basic scaling characteristics of the six series: T, E + , D, E − (= O − E), O, E. In addition, the model had two correlations that were specified: those between T and E + and between T and D. From these, the other 13 pairwise correlations (out the 15 possible pairs of the six series) were implicitly determined and were compared with the data.
The fractional derivatives were of the Weyl type such that their Fourier transforms are simply power laws.Because the system is ultimately forced by two Gaussian white noises, only the secondorder statistics (i.e., the spectra and correlation functions) are needed, and these are easily obtained: the basic solutions are ffRns that were recently introduced (Lovejoy 2022a).In future, more realistic intermittent (multifractal) forcings could be used instead of the Gaussian white noise.Beyond exhibiting the full solution to the equations with a full statistical characterization, we then implemented the model numerically, first verifying the model against the theoretically predicted behavior.By producing simulations at 1 Myr resolution, we are able to resample the output at the same irregular sampling times as the PBDB.The statistical characteristics of the results (the six scaling curves showing the fluctuations as functions of timescale) plus the 15 pairwise correlations as functions of timescale are all quite close to the data, and in several cases, the agreement could be clearly attributed to the limitations, biases, and so on of the data.In particular, this was the case of the DE + , DO, DE, E + E − , E − O, OE correlations that are much closer to the data following the irregular sampling than with the original model outputs uniformly sampled at 1 Myr resolution.
Given the model's simplicity, it thus is remarkably realistic.This is fortunate, as until higher-resolution (global-scale) time series become available (e.g., Fan et al. 2020), more complex models may not be warranted.In any case, the model is able to help explain some subtle points about the interaction of different correlated series that are also strongly self-correlated over wide ranges of timescales, and this with quantitatively and qualitatively different scaling behaviors ("wandering" vs. "canceling"/self-stabilizing).

Figure 1 .
Figure1.A schematic showing the way the various parts of the Fractional MacroEvolution Model (FMEM) fit together.The basic drivers are shown at the top; physical drivers are the temperature (T ) and turnover rate (E + ).These are shown at the right, because they have nontrivial properties, such that they are best modeled as the responses to more elementary causes-the temperature and turnover rate forcings ( f T , f E+ ).In the paper, we primarily discuss the simple case that reproduces the Paleobiology Database (PBDB) statistics, where these are Gaussian white noises ( f T = γ T , f E+ = γ E+ ), However, deterministic forcings such as bolide impacts are also discussed, shown here with both T and E + forced with a Dirac function of amplitude f 0,T , f 0,E+ , respectively.More general forcings can be used and their responses can be obtained using the impulse response functions.The middle line shows how the T, E + drivers determine the diversity (D).Finally, to complete (close) the model, we need a diagnostic equation that enables us to determine E − , E, O; this is shown in the bottom line.

Figure 2 .
Figure 2. The impulse (delta function) response G a,0 (t) = t a−1 /G(a) for fractional integrals of order α normalized for the same response after 1 Myr.The bottom corresponds to the turnover (E + ) response α = ¼, and the top corresponds to the temperature (T ) response with α = ¾.Notice the long-term effects.Due to causality, the impulse response is 0 for t < 0.

Figure 4 .
Figure 4.This shows the Phanerozoic marine animal macroevolutionary analysis of the six series discussed in this paper; D, T, O, E are replotted from SL.The dashed lines show the theory slopes (eq.44) with transition at Δt ≈ 40 Myr, i.e., log 10 Δt ≈ 1.6.

Figure 5 .
Figure 5.The (normalized) pairwise correlations of the 15 pairs of the six series as functions of lag.Several of these are reproduced from SL.

Figure 6 .
Figure 6.The previous 2 14 simulation degraded from ¼ Myr resolution to 1 Myr.Curves normalized by their standard deviations and then offset by 5 units in the vertical for clarity.

Figure 7 .
Figure 7. Simulation 2 14 =16,384 points with theoretical slopes indicated.The transition scale τ is 2 7 = 128 units, indicated by dashed vertical lines.The could represent a series modeled at 250 kyr resolution with a total simulation length of 4 Gyr and with crossover at 32 Myr.Due to its length, this simulation has statistics that are close to the ensemble averaged statistics.The parameters are: α = 0.25, h = 0.5, ρ E = ρ TE = 0.5, ρ D = ρ TD = −0.1 (with derived DE correlation ρ DE = −0.9).

Figure 8 .
Figure8.The 15 pairwise correlations from the 2 14 realization in Fig.7.Only two of the correlations were prescribed, and this only at a single resolution; the rest are consequences of the model, the two exponents α, h, and the crossover time τ = 2 7 (shown as short dashed vertical lines).The two prescribed correlations (DT, TE + ) are shown as solid horizontal lines, and the derived correlations are shown as dashed lines (DE + from DT, TE + , eq. 32) and then TE, TO (predicted to be equal to equal to TE + at long lags, eq.39) and DE, DO (predicted to be equal to DE + , at long lags, eq.39).Note that these are from a single realization of the process, not the ensemble average.In addition, the statistics of some are fairly sensitive to irregularly sampled (and small size) of the empirical data; compare with Fig.11.
Fractional MacroEvolution Model the top D, T comparisons are such that the fluctuations vary only over a small range (for the data, factors of ≈1.7 for D and ≈2 for T ) for lags varying over range of about a factor 100.In comparison, the E + , E − , E, O ranges are closer to factors of 10.Aside from this, these basic fluctuation statistics are fairly close to the data.

Figure 9 .
Figure 9. Model-simulation comparison with all series normalized by their standard deviations.The simulation was at 1 Myr resolution, and it was sampled at the same (irregular) times as the data (84 points over the last 500 Myr).Each curve was displaced by 5 units in the vertical for clarity.Due to causality, the series are asymmetric, with time running from right to left.The simulation is on the right.

Figure 10 .
Figure10.A comparison of the RMS Haar fluctuations for the 1 Myr resolution simulations discussed earlier (brown), from the simulation resampled at the data times (red), and from the data (black), these two irregularly sampled series are shown in Fig.9.The relative vertical offsets of the curves are not significant; they correspond to specific normalizations/nondimensionalizations.We see that in general, the resampling at the data times (red) yields a closer fit to the data (black) than the analysis of the simulation at uniform (1 Myr) intervals; this is especially true for E − , O, E, E + .FMEM, Fractional MacroEvolution Model.

Figure 11 .
Figure 11.The pairwise correlations from the same three series as in fig. 10 with the same color codings: i.e., data (black), brown the simulation at a uniform 1 Myr resolution, and (red), the simulation resampled at the data times.The resampling notably improves the correlations for DE + , DO, DE, E + E − , E − O, OE, and to a lesser extent the OE, TE + comparisons; for the others, it is about the same.FMEM, Fractional MacroEvolution Model.The solid and dashed horizontal lines are the same as those in figure 8.