Preasymptotic Taylor dispersion: evolution from the initial condition

Although the process of hydrodynamic dispersion has been studied for many years, the description of solute spreading at early times has proved to be challenging. In particular, for some kinds of initial conditions, the solute evolution may exhibit a second moment that decreases (rather than increases, as is typically observed) in time. Most classical approaches would predict a negative effective hydrodynamic dispersion coefficient for such a situation. This creates some difficulties: not only does a negative dispersion coefficient lead to a violation of the second law of thermodynamics, but it also creates a mathematically ill-posed problem. We outline a set of four desirable qualities in a well-structured theory of unsteady dispersion as follows: (i) positivity of the dispersion coefficient, (ii) non-dependence upon initial conditions, (iii) superposability of solutions and (iv) convergence of solutions to classical asymptotic results. We use averaging to develop an upscaled result that adheres to these qualities. We find that the upscaled equation contains a source term that accounts for the relaxation of the initial configuration. This term decreases exponentially fast in time, leading to correct asymptotic behaviour while also accounting for the early-time solute dynamics. Analytical solutions are presented for both the effective dispersion coefficient and the source term, and we compare our upscaled results with averaged solutions obtained from numerical simulations; both averaged concentrations and spatial moments are compared. Error estimates are quantified, and we find good correspondence between the upscaled theory and the numerical results for all times.

Although the process of hydrodynamic dispersion has been studied for many years, the description of solute spreading at early times has proved to be challenging. In particular, for some kinds of initial conditions, the solute evolution may exhibit a second moment that decreases (rather than increases, as is typically observed) in time. Most classical approaches would predict a negative effective hydrodynamic dispersion coefficient for such a situation. This creates some difficulties: not only does a negative dispersion coefficient lead to a violation of the second law of thermodynamics, but it also creates a mathematically ill-posed problem. We outline a set of four desirable qualities in a well-structured theory of unsteady dispersion as follows: (i) positivity of the dispersion coefficient, (ii) non-dependence upon initial conditions, (iii) superposability of solutions and (iv) convergence of solutions to classical asymptotic results. We use averaging to develop an upscaled result that adheres to these qualities. We find that the upscaled equation contains a source term that accounts for the relaxation of the initial configuration. This term decreases exponentially fast in time, leading to correct asymptotic behaviour while also accounting for the early-time solute dynamics. Analytical solutions are presented for both the effective dispersion coefficient and the source term, and we compare our upscaled results with averaged solutions obtained from numerical simulations; both averaged concentrations and spatial moments are compared. Error estimates are quantified, and we find good correspondence between the upscaled theory and the numerical results for all times.

Introduction
Taylor dispersion, the process of solute spreading in a capillary tube, is the archetype for hydrodynamic dispersive processes. The primary reason for this is that it has all of the essential physical features important to hydrodynamic dispersion, and yet it has a geometry that is simple enough that an analytical approach is feasible. 889 A5-2 E. Taghizadeh, F. J. Valdés-Parada and B. D. Wood As such, there have been literally thousands of papers on the topic since the early 1950s.
The particular focus of this paper is to address the case of non-uniform initial distributions of a solute in Taylor dispersion under steady flow conditions. Nonuniform distributions of solute have a number of applications. A few concrete examples are (i) the design of exchangers for transient heat transfer (Heris, Esfahany & Etemad 2007); (ii) the diffusion of solute from an initially spherical input to model drug delivery in blood flow (Gekle 2017); (iii) pulsed radially non-uniform inputs to tubular reactors used to promote mixing (Stonestreet & Harvey 2002;Abbott et al. 2013); (iv) the use of phosphorescently tagged particles for fluid velocimetry (Mohand et al. 2017); and (v) non-uniform injections of chemical solutes in electrophoretic separations (Liu & Ivory 2013).
The primary issue in unsteady solute dispersion is the transition of the solute distribution from an initial disequilibrium state, to a state that represents the quasi-steady Gaussian distribution conditions that are necessary for the Taylor dispersion regime to be valid. Rather than setting up the problem for a variety of special cases, we develop a general method that applies to a wide variety of initial conditions. Our particular interest is to develop a model for unsteady solute dispersion where the contribution from the initial configuration can be clearly identified. This contribution can be expected to relax over time in order to recover the classical Taylor dispersion model.
There are a few useful concepts that can be proposed to help ensure that a theory for dispersion is well structured (although it is possible to develop theories that do not adhere to these guidelines). These are as follows: (i) The effective dispersion coefficient should be positive. Although negative dispersion coefficients have been proposed, they suffer from a few significant problems. These include the ill posedness of the resulting balance equation (i.e. the inverse heat equation (Weber 1981)) and the fundamental incompatibility with macroscale thermodynamics (Gray & Miller 2009;Gray & Miller 2014, chap. 10). Negative dispersion coefficients lead to an equation that is no longer even of the diffusion type because it no longer obeys a strong maximum principle (Olver 2014). (ii) The effective dispersion coefficient should be independent of the particular initial conditions imposed. In other words, the effective dispersion coefficient should be a function of only the molecular diffusion coefficient, and the structure of the fluid velocity field. This allows the dispersion coefficient to be defined strictly on a molecular and hydrodynamic basis, without having to resort to the incorporation of particular initial configurations of otherwise passive solutes. If the dispersion coefficient were to be dependent upon particular initial configurations, one would then lose the principle of linear superposition for an (otherwise) entirely linear problem. This has substantial ramifications for the interpretation of initial-configuration-dependent formulations. (iii) Solutions to the effective convection-dispersion equation should be superposable in the sense defined by Taylor (1959). This criterion is based on a physical desideratum outlined by Taylor (1959). Essentially, the argument was that if a new source was injected after some time had elapsed (the 'two release' problem), the effective dispersion coefficient should be single valued at points where the two sources overlap. The time-dependent dispersion theories would predict two different values for the dispersion coefficient at such overlapping points, a situation to which Taylor (1959) objected.
(iv) Whatever the theory for preasymptotic dispersion is, the dispersion coefficient should approach over time the classical asymptotic values that have been developed in the literature. These classical asymptotic values have been scrutinized rigorously for decades; there is little uncertainty in those results.
Our goal in this work is to develop a theory that addresses each of these issues; we seek to make the theory as concrete as possible by expressing results in analytical form when feasible. We neglect reaction, although the methods proposed are extendable to that case (cf. Wang et al. 2015). Ultimately, our approach leads to an upscaled equation for a single chemical species that contains a non-conventional source term as follows: (1.1) Here, c is the concentration of the solute, the angled brackets indicate cross-section average, D * is the effective dispersion coefficient, U is the cross-sectional-averaged longitudinal velocity and s * is a non-conventional source term that is exponentially decaying in time; z and t are the independent variables representing space and time, respectively. While this form for a macroscale balance equation may seem unusual, the source term s * is an essential component that arises directly from the upscaling analysis. This source term creates an overall balance that meets the set of criteria defined above for a well-structured dispersion process. The remainder of the manuscript is outlined as follows. In the next section, we give an overview of the literature on Taylor dispersion, with a specific focus on methods that have been developed to handle early time evolution from the initial conditions. This is followed by a presentation of the microscale physics in § 3. In § 4, we define the upscaling process. In § 5, the problem of closure is presented and integral closure solutions are described. The closed problem is described in § 6; in this section we also define the effective dispersion coefficient, D * , and source term, s * , and provide explicit analytical solutions for these quantities. In § 7, we compute solutions to the dispersion problem for three different initial configurations, and compare the results derived from the upscaled model with those derived from numerical simulations (NS) computed at the microscale. Both averaged concentrations and spatial moments are compared. In § 8 we provide some discussion of the results of the theory applied to several interesting initial conditions. In § 9, we specifically address the application of the upscaled balance equation to conditions where the problem superposition is of interest. Finally, in § 10 a list of conclusions is presented. Supplementary materials available at https://doi.org/10.1017/jfm.2020.56 have been generated to support this work; these materials include specific details regarding error analysis, correspondences between the theory developed here and infinite-order expansion methods and a list of notation.

Background
In this section, we review a subset of the work conducted on the Taylor dispersion problem. An exhaustive literature review on the subject is challenging because of the volume of literature represented by this topic; therefore, our review is focused specifically on studies that seek to provide solutions for early-time behaviour. In the material that follows, we define the Péclet number by where D is the molecular diffusion coefficient, and a is the radius of the tube. Most studies of Taylor dispersion have adopted this definition for the Péclet number; it can be thought of as the ratio of the convective to the (radial) diffusive fluxes. This is slightly different from the work of Taylor (1953), who used the ratio of the diffusive to convective time scales (this difference is discussed in subsequent sections).
2.1. Review of the literature For the purposes of the review below, early time is qualitatively defined as the transient time period during which the relaxation of the initial configuration of the solute is important to the dynamics of the Taylor dispersion process. It is possible to broadly characterize the different mathematical approaches for describing Taylor dispersion in the transient, early-time regime as outlined below.
(i) Kramers-Moyal-like expansions. By far the most popular approach has been the proposition that the macroscale solution can be expressed as an infinite series of spatial derivatives of the average concentration. Several variations of this approach have been attempted, and they are represented in the works of Gill et al. (Gill & Sankarasubramanian 1970, 1971, Chatwin (O'Hara 1969;Chatwin 1970Chatwin , 1972, Degance & Johns (1978a,b) and Mauri (1991). Balakotaiah, Chang & Smith (1995) used centre manifold theory to develop a macroscale formulation that is identical to that of Gill & Sankarasubramanian (1970). The resolution of the centre manifold model was improved by Watt & Roberts (1996) by constructing a multi-mode model for dispersion in channels. These authors considered a two-component model that follows the same trend of thought of the zonal models by Chikwendu & Ojiakor (1985), Chikwendu (1986a,b). The work by Yu (1976Yu ( , 1979 is interesting in that it began as a formal eigenfunction expansion for an unsimplified version of the convection-diffusion equation by assuming that expansions in Bessel functions were possible. After a formal integral solution was developed, the term arising from the initial conditions was discarded, and the remaining integrations of the Bessel functions were expanded in a power series. The result was a solution in an infinite series of derivatives of all orders, much like the Kramers-Moyal-type expansions discussed above. Although originally promoted as being more capable than the models of Gill & Sankarasubramanian (1970), a sequence of comments (Gill & Subramanian 1980;Yu 1980) established that the two methods are, at least to order 3, identical. The works by Westerterp, Dil'man & Kronberg (1995) and Kronberg, Benneker & Westerterp (1996) are of this type, but in that work the iterative process was truncated at order 2. One unique feature of those works is that mixed derivative terms were maintained, and this ultimately led to a macroscale form that had hyperbolic (rather than parabolic) features. We note that the work by Gill & Sankarasubramanian (1970) also derived hyperbolic terms in their approach, but ultimately eliminated those terms on the basis of their assumed macroscale form. More recently, Wu & Chen (2014) used an expansion involving spatial derivatives up to ninth order. Although their results compared well with previously reported numerical results, they did not recognize that their solutions included significant negative concentrations outside of a limited domain. This kind of error was corrected in a later paper (Wang & Chen 2016). Other works (Mercer & Roberts 1990, 1994Balakotaiah et al. 1995) also considered higher-order expansions and addressed the issue of negative concentrations.
(ii) Direct solutions. In the direct solutions, the focus has been to solve the convection-diffusion mass balance equation in the tube directly using an eigenfunction Preasymptotic Taylor dispersion 889 A5-5 expansion. This is a challenging task, because the conventional methods (e.g. separation of variables) used to determine closed-form expressions for the Green's function do not work with this problem, primarily because the convection term mixes both radial and longitudinal independent variables. The first direct solutions appear to be those of Philip (1963). For that work, however, the solution sought was initially of the form of a diffusion equation (similar to the more general approach of Gill & Sankarasubramanian (1970)), which automatically excluded certain forms of the initial condition (with roughly the same reasoning discussed by Degance & Johns (1980)). The paper proposed a solution as a series of the confluent hypergeometric functions 1 F 1 . The papers by Tseng & Besant (1970, 1972 assumed the same solution as did Philip (1963) (and were subject to the same restrictions), but managed to find simplifications that yielded solutions in terms of Bessel J 0 functions, the roots of the Bessel J 1 function and the error function. In the work by Stokes & Barton (1990), a Fourier transformation method was used, but the results had to be inverted numerically; as such, it is difficult to compare these results with other efforts.
(iii) Asymptotic (perturbation) solutions. A number of perturbation-type solutions have been proposed, and it is not possible to cleanly categorize solutions from other approaches as being free from perturbation-like arguments. However, more classical perturbation-type expansions have been investigated by Vrentas & Vrentas (1988, 2000. The paper by Lighthill (1966) is of this form, as was the extension proposed by Chatwin (1976Chatwin ( , 1977; both of these solutions required the time to be small compared to a characteristic diffusive time scale. Fife & Nicholes (1975) conducted a conventional perturbation analysis, but, importantly, recognized the potential influence of initial source terms, which is somewhat unique. Phillips and Kaye examined the asymptotic (but direct) solutions for Pe → ∞ for both large z (1996), and short times (1997); both solutions neglected longitudinal diffusion. Such solutions have generally been very successful when used in their range of validity (which generally involves either long or short times, and/or large or small Péclet numbers).
(iv) Moment methods. The early paper by Aris (1956) presented the first momentbased method for Taylor dispersion. Although the paper purported to eliminate the constraints proposed by Taylor (1954), in fact the method is technically only suitable for asymptotic estimates of the dispersion coefficient, although some transient results were presented for particular initial conditions. Moment methods focus on determining the spatial moments of the concentration field rather than its mean value (which, quite usefully, simplifies the analysis), often with the tacit assumption that the effective parameters of the averaged mass balance equation can be determined directly from the moments. Again, it is difficult to categorize approaches as uniquely moment based, since many approaches compute moments regardless of the underpinning mathematical methods used. A number of researchers have extended moment method into the preasymptotic regime, including the works of Horn (1971), Chatwin (1977), Degance & Johns (1978b), Latini & Bernoff (2001) and Dentz & Carrera (2007). The moment technique has become particularly popular in describing Taylor-like flows in stratified media (e.g. Fried & Combarnous 1971;Lake & Hirasaki 1981;Valocchi 1989), downstream contaminant release in rivers (Smith 1984), among many others. The method inherently assumes that a finite number of moments suitably defines the transport behaviour of the system, an assumption that is technically only true for specific initial conditions or in the near-asymptotic regime (this issue is discussed further in § 2.2).
(v) Non-local formulations. Few non-local formulations (containing terms with either time or space integrations in the transport equation) have been proposed for this problem. Although not specific to the Taylor problem, the work of Deng & Cushman (1995) is of this type, and somewhat set the standard for non-local equations of the convection-dispersion type (however, in a more general context, the work of Eringen (1978) was ground-breaking for non-local balance equations). The paper by Wood & Valdés-Parada (2013) also proposed non-local formulations from the volume averaging perspective, although they were also not specific to the Taylor dispersion problem. In the report of Smith (1981a), one of the first non-local formulations to the Taylor problem was outlined. The developments of that paper were also guided by a Kramers-Moyal-type expansion, and thus lead to hyperbolic expressions with mixed derivatives as well as more general convolution-integral expressions (depending on the order of the truncation, and where in the analysis the truncations are performed). The work by Jones & Young (1994) also used a variation of this approach, but in an asymptotic framework (also capitalizing on the centre manifold theory). In this theory, the results were able to capture some features of the early-time behaviour, but were not able to capture exponentially decaying-in-time modes; most likely, this was because the initial condition was not represented as a source term in the averaged equation.
(vi) Formulations that include an initial condition source. Most studies on Taylor dispersion have not been overly interested in the relaxation of the initial condition; often, the only initial conditions examined are either uniform pulses, or delta-like impulses (Vedel, Hovad & Bruus 2014). Few papers have been developed with the realization that the initial condition can manifest as a decaying source term in the averaged mass balance when more complex initial conditions are considered. Philip (1963) was aware that essentially forcing the upscaled mass balance to be of a convection-dispersion form would limit the choices available for the initial condition. The importance of the initial condition was also apparent to Lighthill (1966), although he was not able to complete a general analysis for his solution. In a series of papers, Gill and Sankarasubramanian derived macroscale models for unsteady solute dispersion considering uniform (Gill & Sankarasubramanian 1970) and non-uniform (Gill & Sankarasubramanian 1971;Sankarasubramanian & Gill 1973) slugs as initial conditions. Connections between these works and the developments presented here are derived in additional detail in the supplementary material. The work by Yu (1979) began by including the initial condition in the macroscale balance, but later this term was dropped resulting in a final theory that does not have a source arising from the initial configuration of the system. In the papers by Johns (1978b, 1980), the influence of the initial condition appeared as part of the theory. However, this work specifically considered a subset of possible initial conditions that work with the theory; in particular, separable product forms for the initial condition function were allowable, but sums of any two such functions were excluded (Degance & Johns 1978a, § 4). Smith (1981b) identified and described three stages of unsteady dispersion from a point source using a ray-tracing method. Haber & Mauri (1988) used a stochastic Markovian particle approach to examine the role of the initial condition; explicit asymptotic solutions were derived that illustrated that the observed dispersion coefficient depended upon the initial location of the particles. From a moment-based approach, the paper by Dentz & Carrera (2007) similarly considered delta-type impulses placed near the centre versus the wall initially; in that work, a distinction between local and global dispersion was defined to distinguish between the dispersion of a single delta impulse versus the dispersion observed for an extended source. Meng & Yang (2018) adopted a perspective similar to that of Dentz & Carrera (2007), but solved for the effective dispersion coefficient via eigenfunction expansions. They also found that the dispersion coefficient was a function of the initial configuration. In a sequence of two papers, Wood (2009, equations (17) and (29)) and Wood & Valdés-Parada (2013) developed theories that explicitly included the influence of the initial condition as a source term in the averaged mass balance equation; this process was then used by Ostvar & Wood (2016) to describe the average diffusion and reaction from an initial (unmixed) configuration. Independently, Balakotaiah & Ratnakar (2010) also developed balance expressions specifically for the Taylor dispersion problem in which source terms arising from the initial conditions were employed; a Kramers-Moyal-like expansion was used, but truncated at second order. This model had the ability to predict significant skewness in the distribution (Ratnakar & Balakotaiah 2011). While not unique to the models discussed, the prediction of skewness is an important check on the physics of the solution, since skewness should become non-zero at early times (even if the initial condition is symmetric), and it should asymptotically approach zero in the long-time regime.
In the remainder of the background section, we elaborate a little more on two specific issues that arise in the previous work summarized above: (i) The use of infinite-order (Kramers-Moyal-type) expansions, and (ii) the neglect of source terms arising from the initial condition.

Comments regarding Kramers-Moyal-type expansions
There are two significant problems with the infinite Kramers-Moyal-type expansions. First, because of the specific macroscale representation that is chosen, the resulting theory generates dispersion coefficients that may depend upon the initial condition (Degance & Johns 1980). Second, research on the positivity of such series was apparently not known to progenitors of this approach. It is now known that such series either (i) converge exactly at second order and are strictly positive, or (ii) require all terms in order to converge and remain strictly positive (Marcinkiewicz 1939;Pawula 1967). Although Kramers-Moyal expansions do (under appropriate conditions) converge, they do not converge in a highly physical way when strictly positive results are sought (as in the case of concentration). In other words, any finite truncation of a Kramers-Moyal expansion of order greater than two generates results that are negative in some parts of the domain. This problem has been discussed in the literature (e.g. Risken & Vollmer 1979), but it is not generally well recognized in many disciplines outside of physics. The fact that negative concentrations are essentially guaranteed to occur for truncated expansions does not negate the usefulness of such expansions (Risken & Vollmer 1979), but it does demand careful analysis and cautiousness when interpreting results. The problem of negative concentrations is most acute at early times (cf. Wood & Valdés-Parada (2013, § 7)). Some of the problems associated with the Kramers-Moyal expansion can be eliminated by the recognition that such expansions can often be represented by non-local convolution expressions (Mercer & Roberts 1990, 1994. It is easy to show that a general non-local transport formulation can be written as an infinite-order expansion of local derivatives (this is illustrated in the supplementary material), although the converse of this is not always true. However, given that the macroscale equation form adopted in most of these examples is posited as an ansatz (e.g. Gill & Sankarasubramanian (1970, equations (5) and (8))), then it might be reasonable to start directly with the convolution rather than the series form. For the specific problem of Taylor dispersion investigated in this paper, we are able to show that the solution is generally of a convolution form, giving further support that this kind of approach might have a stronger physical basis. Regardless, the non-local formulations generally avoid the problems associated with negative concentrations, and, in appropriate limits, reduce to more familiar local macroscale equations.

Influence of the initial conditions
As the review of the literature above suggests, the importance of the initial configuration at early times has been identified (although not resolved) previously. The problem can be most easily thought about in the context of layered media which, although not a tube, is still frequently represented as a Taylor dispersion process, (e.g. Fried & Combarnous 1971;Gelhar, Gutjahr & Naff 1979;Lake & Hirasaki 1981;Chikwendu & Ojiakor 1985;Chikwendu 1986a,b). Consider an initially uniform slug of solute in a system of horizontal layers of alternating high and low conductivity that is transported by a forward-flow/reverse-flow transport cycle. If a constant pressure gradient is applied for a short period of time such that the mean flow is right to left, the solute is displaced more in the high-velocity layers than in the low ones, leading to a spatially staggered final condition. Now, suppose the flow is stopped, and the pressure gradient reversed. The staggered configuration is the initial condition for a new transport process. Assuming that the total time interval is small enough so that transverse diffusion does not spread the solute very much between layers, the resulting motion is largely reversible. Upon reversing the flow field, a seemingly spread-out initial condition converges to become more organized (and less spread out). In the limit of zero diffusion, the final result of this first forward and then reversed transport would be exactly identical to the initial condition.
This kind of process creates some conceptual problems. Seemingly, the second moment of the process described above first increases and then, upon flow reversal, decreases in time. If one interprets the effective dispersion coefficient as being the classical one half of the time rate of change of the second spatial moment, then one predicts a negative dispersion coefficient for the second half of the cycle. This problem has been recognized in part as one that involves the distinction between spreading and mixing (e.g. Dentz & Carrera 2007). However, one is still left with a macroscale equation that contains an apparent dispersion coefficient that can be negative under some conditions. The problem is of the inverse heat equation type, which is known to be ill posed (Weber 1981). The solutions to such problems are lossy in general and can lead to significant problems in interpretation. Of more importance, however, is that such expressions are inconsistent with macroscale thermodynamics (Gray & Miller 2009;Gray & Miller 2014, chap. 10;Miller et al. 2018). Thus, if one hopes to solve problems in a manner consistent with the thermodynamics appropriate to the upscaled system, a negative dispersion coefficient is not a suitable choice. The use of higher-order diffusive-type equations (such as the method of Gill & Sankarasubramanian (1970)) does not entirely remediate this problem. Although there are more degrees of freedom in these kinds of expansions to accommodate the influence of initial conditions, the fundamental problem is that the initial conditions enter the problem as source terms that are independent of the spatial derivatives of the average concentration. Thus, some initial configurations, when forced to fit a transport equation that contains no term representing a source, may predict non-physical behaviour for the dispersion coefficient, depending upon the underlying theory. Each vector (e.g. r, z and y) can be expressed in a cylindrical coordinate system for analytical or numerical computations. Note the distinction between the coordinates r and z versus the vectors r and z. In this figure and the figures that follow, the aspect ratio of the radial, r, to longitudinal, z, coordinates has been increased to improve the presentation of results.

Problem formulation
One of the primary purposes of this work is to examine the early-time dispersion process in the context of an averaging theory, but maintaining the influence of the initial configuration throughout the averaging process. Our proposed model seeks to remedy the problems identified above by assuring that both the concentration and the dispersion coefficient remain positive over the entire (time and space) domain of the problem.
We consider passive transport of a chemical species in a cylindrical domain specified by r ∈ V, where r ≡ (r, θ, z) (figure 1). For the cases considered in this analysis, we assume that the solute does not interact significantly with the external boundaries perpendicular to the axis of the tube. Thus, for concreteness, we set the external planes perpendicular to the tube axis at locations z → ±∞, respectively. The initial solute distribution is assumed to be a known function of position denoted by ϕ(r, θ , z). The transport equations at the microscale for a solute in terms of molar concentrations can be written in cylindrical coordinates as

Microscale mass balance
I.C. c(r, θ, z, 0) = ϕ(r, θ , z). (3.1e) Here, and for the equations that follow, 'B.C.' indicates a boundary condition, and 'I.C.' an initial condition. For the formulation given by (3.1a)-(3.1e), it has been assumed that the mole fraction of the solute is small enough compared to unity so that Fick's law is applicable. In addition, the following reasonable assumptions have been made in order to simplify the volume averaging process: (i) the fluid flow is incompressible, Newtonian, and steady; and (ii) the fluid pressure at the external boundaries is uniform leading to a θ-symmetric velocity field given by the well-known expression depending only on the distance, r, from the centreline of the tube v z (r) = 2U 1 − r 2 a 2 . (3.2) With this information, the problem is fully specified.

Upscaling process
4.1. Preliminaries In this paper, we upscale (or coarse grain) the system concentration field dynamics by simple spatial averaging against a weighting function. The underlying assumption in this approach is that the system can be sensibly represented by governing equations at more than one scale of resolution due to the multiscale (in space, time or both) characteristics of the system. Practically, this means that there exist averaging operators that sufficiently smooth spatial fluctuations in concentration such that an averaged behaviour is useful for understanding the system evolution. In this work, the averaging process is a somewhat modified form of volume averaging theory (which is frequently applied to porous materials, (Whitaker 1999)), but we do not stress this point further.
In the remainder of this paper, we adopt cylindrical coordinates (figure 1); thus, for any function ψ, we have ψ(r) = ψ(r, θ, z). The symbol z (set in bold font) is used exclusively to indicate the centroid of the averaging domain; therefore, this vector always has the component form z = (0, 0, z). The averaging operator · for Taylor dispersion is often defined with respect to weighting function w by (Degance & Johns 1978a) ψ | (z,t) = r∈V(z) ψ(r, t)w(r − z) dV(r). (4.1) Or, in the cylindrical coordinate system For this work, we adopt the classical area average, so that w(r − z) = w(r, θ , ζ − z) = 1/(πa 2 )δ(z − ζ ), which yields the conventional area average (note, other weighting functions have been adopted for Taylor dispersion, cf. Degance & Johns (1978a)) We refer to the filtered functions ψ as macroscale quantities to distinguish them from their microscale counterparts, ψ.
In addition to the definition of the averaging operator, it is also useful to define the following decompositions: (4.4b) Here, U = v z | z whereasc andṽ z denote spatial deviations from their corresponding averages. Note that these deviations are defined in a manner that is different from the conventional definition used in the volume averaging theory. The above decompositions are the same as those proposed by Gray (1975); that is, the deviations in each averaging domain are defined relative to the value of the average at the longitudinal centroid of the tube. There are some advantages to this definition; in particular, we have the following identities: We will occasionally use the term cross-sectional average (CSA) to indicate averaged properties.

Averaging the microscale balance equation
To obtain the macroscale mass balance equation, we apply the averaging operator defined above to the microscale balance. Note that, although we have been careful to explicitly list the independent variables in the definitions above, we do so in the remainder of the paper only when it is necessary for emphasis or clarity. Upon applying the averaging operator (4.3) to the microscale mass balance equations, the upscaled problem takes the form Several steps have been taken into account in order to derive (4.6a). First, interchange of the time derivative and the averaging operator is allowed within the accumulation term due to the fact that the averaging domain does not change with time. Thus we have ∂c ∂t = ∂ c ∂t . (4.7) Directing our attention to the diffusion term, the microscale boundary conditions lead to the identity For the convection term, we have (on the basis of (4.4a)-(4.5b)) Finally, for the two macroscopic boundary conditions given in (4.6b) and (4.6c), we have imposed the condition thatc(r, θ, z) → 0 and ∂c/∂z(r, θ , z) → 0 as |z| → ∞. Up to this point, we have employed no approximations to the result; the averaged model represented by (4.6a)-(4.6d) is exact, at least to the extent that the original microscale balance equations are valid.

Deviation equations
Equation (4.6a) is unclosed because it is not expressed exclusively in terms of the average concentration. To close the problem, a set of ancillary balances are developed for the microscale deviations ofc. Motivated by the definition of the decompositions, we develop the deviation balance by subtracting the averaged equation (4.6a) from the microscale equation (3.1a). In addition, the boundary conditions can be developed by using the decomposition given by (4.4a), and the initial condition is determined by subtracting the averaged initial condition from its microscale counterpart. The resulting equations can be written as follows: For the developments that follow, it is convenient to adopt a Lagrangian perspective based upon an inertial frame of reference that moves with the centre of mass with the system. To this end, let us define z = z − Ut, t = t, so that We omit the explicit writing of the prime coordinates in the developments that follow to keep the notation simple. Making the translation to Lagrangian coordinates, the governing differential equation for the concentration deviations becomes In (5.3), the term ṽ z (∂c/∂z) is non-local in the variable r. In this context, we use the term non-local to denote quantities that involve time or space locations in addition to the local ones. Solutions to this balance equation that maintain the non-local term are mathematically tractable, but such solutions remain an active area of research 5.1. Elimination of the non-local term Rather than maintaining the non-local term, our approach is to develop reasonable constraints that allow us to neglect its influence. We do this by establishing characteristic time and length scales, and then developing constraints that indicate when the desired simplification is valid. Although these time and length scales can be given formal, computable definitions (e.g. such as defining them by the integral scale of the fields involved (Wood & Valdés-Parada 2013)), formal evaluation of scales is usually not necessary.
For the Taylor dispersion problem, there are several distinct length scales that can be defined. We have identified two important length scales in figure 2. The size of the macroscale length, L 0 , is described as being approximately the size of the initial condition; however, the validity of this estimate depends upon the particular structure of the initial condition (this is discussed further in § 8.2). The characteristic length of the radial diffusion process is of the order of the tube radius, a, for this particular application.
The non-local term and the convection term on the left-hand side of (5.3) are of the same order of magnitude. Thus, our desired restriction at this juncture would be to imposeṽ For complex problems where one wishes to find all possible asymptotically simplified models, there are specific methods and algorithms that can be followed to construct the simplified models (see Yip (1996) for a fuller discussion). Because our restriction is rather simple (and because a full analysis is tedious), we only sketch out the result here. To start, we note that the left-hand side of (5.4) is always less than or equal to v z ∂c/∂z. Then, we non-dimensionalize and rearrange the problem as follows (where Z = z/L 0 , R = r/a,Ṽ z =ṽ z /U,C =c/C 0 and C 0 = max (φ)): Now, we set ε = a 2 U/(L 0 D). By assumption,Ṽ z ∂C/∂Z = O(1) and the first term on the right-hand side of (5.5) is of O(1). Examining it is clear that the left-hand side of this expression (and, hence, the left-hand side of (5.4)) can be neglected under the conditions The symbols ' ' and 'O' have the conventional meanings adopted in perturbation theory (cf. Bender & Orszag (1978, chapters 3.4 and 7)).
A few additional comments are helpful here. First, we note that this approximation may fail at early times if the concentration gradients in the initial condition are large (e.g. step functions); for these early-time solutions, the longitudinal derivative term would need to be maintained. In this work, we consider initial conditions that are reasonably smooth at t = 0 (i.e. the first two derivatives are continuous, and correspond to a value of L 0 that is not much smaller than a). Second, this restriction is almost certainly overly severe once the relaxation of the initial condition has progressed for even a relatively small time. Often an approximation such as the inequality (5.4) is valid, even when the two sides are of the same order of magnitude. We describe the results of the error involved in this approximation in additional detail in § 8.2, where we compute it directly to show that the approximation is reasonable for the Péclet numbers investigated in this paper. It is worth noting that the constraint given in (5.7) is identical (except for a factor of 4) to the one imposed by Taylor (1954) Here, Pe T is the Taylor-type Péclet number, which arises naturally when we non-dimensionalize the transport equation. The diffusion time scale arises from considerations of the conventional free-field diffusion relationship with variance σ 2 = 4Dt (and, hence, L ∼ √ 4Dt). The convective time scale comes from a simple estimate of the z−derivative of the initial condition. With the exception of Taylor (1954) and Aris (1956), the literature has tended to use the definition that is based on a simple ratio of the convective to diffusive fluxes, of the form (assuming both convective and diffusive fluxes are characterized by a length scale equal to the tube radius, a) Note that for the simulations reported below with the parameters in With the approximation given by the inequality (5.4) and the use of Lagrangian coordinates, the resulting set of equations forc becomes Notice that there are only two source terms in the initial and boundary-value problem. Thus, it is the interplay of the initial configuration and the local convective source term appearing in the deviation balance equation that essentially drives the Taylor dispersion process. The above simplified closure problem is a local, linear, non-homogeneous parabolic equation which has well-known solutions. With the conventional assumptions (e.g. boundedness of the initial condition, integrability of the initial condition, smoothness of the boundary conditions, etc.), the solution to the problem for solute transport can be put in an integral formulation as (cf. Polyanin & Nazaikinskii 2015) c(r, θ ,z, t) where G(r, ρ, θ , ϑ,z, ζ , t − τ ) is the Green's function for this problem, which solves the initial and boundary-value problem given in appendix A by (A 2a)-(A 2e).
6. Closed problem 6.1. Non-local solution Using the unsimplified integral solution defined above by (5.11), we can develop a non-local but closed macroscale problem that takes the form (6.1) where, for simplicity in notation, we have introduced Non-local solutions have been proposed to describe a number of interesting transport phenomena (cf. Eringen 1978;Smith 1981a;Cushman & Ginn 1993;Deng, Cushman & Delleur 1993;Neuman 1993;Deng & Cushman 1995). While they have some advantages, they also come with significant costs. Among these are that (i) analytical solutions are virtually non-existent, and (ii) because the solution at every point depends upon every other point in the domain, numerical schemes often have to deal with full rather than sparse matrices.
6.2. Localized solution At this juncture in the analysis, we consider the effects of imposing two additional approximations: (i) the length scale constraint a/L 0 1, and (ii) a time scale constraint t * /T * 1 (where t * is a characteristic microscale process time, and T * is a characteristic macroscale process time). In appendix A we show that, for the Taylor problem in the range of Péclet numbers that we examine, the single constraint is sufficient to ensure that both the length and time scale constraints are met (a detailed derivation can be found in Wood & Valdés-Parada (2013)). Note that this constraint is generally less restrictive (at least for Pe > 1) than the constraint already imposed by the inequality (5.7). Technically, it is not necessary to impose any constraints on the problem. However, if we do not impose the inequalities (5.7) and (6.3), the result is a fully time-and space-non-local theory whose Green's functions would be determined by an integro-differential equation with no known analytical solutions. Such a result would be unlikely to be simpler than directly solving the microscale balance given by (3.1a)-(3.1e). With the constraints given by (5.7) and (6.3) in place, the localized solution for the set of balance equations associated with the solute ((5.10a)-(5.10e)) is given by where b and Φ are known as closure variables. These functions are defined in terms of the Green's function by (6.5b) We note that both b and Φ are functionals (ofṽ z andφ, respectively); ordinarily, the explicit dependence on the function will be supressed in the notation unless it is needed for clarity. In § A.4 we show that Φ is linear in the initial condition deviation functionφ. We also note that Φ is an exponentially decaying function of time. At early times, the magnitude of this function can be significant, and can create (apparent) deviations from Fickian behaviour (especially in the second moment), even though the spreading process is a Fickian one. The solution form given by (6.4) is similar in form to the solution proposed by Paine, Carbonell & Whitaker (1983, equation (30)) for Taylor dispersion. Paine et al. (1983) correctly predicted that the solution for the deviation concentration should involve an independent function of time, they did not understand that this function is related to the initial condition for the problem (see appendix A). A fully local averaged balance equation can be developed, taking the form: Here, the effective parameters are defined by (6.7c) The effective parameters D and D * are the hydrodynamic dispersion and the total dispersion coefficients, respectively. This latter quantity is also called the Taylor dispersion coefficient, as proposed by Aris (1956). Note that both our non-local and local averaged equations are unusual in that they contain a source term, s * . We also note that the effective dispersion coefficient, D * depends only upon the closure variable b (and, is therefore independent of the initial condition, as shown in § A.3), whereas the non-conventional source term, s * depends only upon the closure variable Φ. It is interesting to compare these effective parameters with the results of Mercer & Roberts (1994, equation (3.2)); the correction terms derived by Mercer & Roberts (1994) show similarities with the effective parameters defined in (6.7a)-(6.7c). The role of the source term is to account for the early-time memory of the initial distribution of the spatial deviations of the concentration. Although it is a source term in the balance equation, it is easy to show that the integral of this term over the domain −∞ < z < ∞ is identically zero ( § A.3). This indicates that the role of this term is only to redistribute mass within the domain; no mass is created or destroyed by this term.
Finally, a few comments regarding the effective parameters s * and D * are in order. First, we note that it can be shown that the effective dispersion coefficient, D * is a strictly positive function for all time, regardless of the initial conditions ( § A.5). As mentioned previously, this is a necessary condition for the problem if one wants the solution to be consistent with macroscopic scale thermodynamics (Miller et al. 2018). This has important repercussions for previous works that have suggested the use of negative dispersion coefficients; this is examined in detail in the Discussion section. Second, we note that the source term, s * , like the function Φ, is an exponentially decreasing function of time ( § A.5). Therefore, Φ decays toward zero as time grows large enough, recovering the conventional Taylor-Aris theory at asymptotic times. Although this term does decay to zero in time, the effect of this source term on the early stages of the diffusion-convection transport phenomenon is significant in some cases. This is discussed in additional detail in the following sections.
In the remainder of this paper, we adopt a strictly local representation; ultimately, we are able to show that this is an appropriate approximation for our system and range of parameters. Readers interested in examining how the non-local formulation can be solved numerically are directed to the details outlined by Deng et al. (1993).

Analytical solutions for the closure variables
Equations (6.5a)-(6.5b) provide the general integral form of the solutions; however, the Green's functions for particular cases must be determined and subsequently integrated to obtain explicit (series-form) solutions for b and Φ. Because the b-field depends only uponṽ z , the solution for this problem is relatively straightforward. For Φ, the solution depends explicitly upon the particular initial condition selected.

Analytical solution for the b-field
In appendix A, the Green's function for the general problem is identified, and integrated. The result is Note that b is not a function of θ or z. The eigenvalues λ n can be computed by solving J 1 (λ n ) = 0, n = 1, 2, 3, . . . . (7.2) Equation (7.1) is equivalent to equation (16) in the work of Gill & Sankarasubramanian (1970). We note additionally here that in this derivation, it has been explicitly assumed that the initial time is represented by t = 0. The effective parameters, D * and s * associated with the upscaled equation are given by (6.7a) and (6.7c). Substituting the two solutions above for b and Φ as required into the general expressions for the effective parameters gives the following results ( § A.5): where we note that 1 48 We can also express this result in a form that is independent of Pe Here, we have defined the dimensionless diffusion time variable where a 2 /D is the diffusive time scale as defined in (B 32). The dispersion coefficient predicted from this expression is plotted in figure 3. Note that (7.5) is an evolving function of time, reaching more than 99 % of the expected asymptotic value of 1/48 for τ * d 0.3.

7.2.
Analytical solution for the Φ-field For the Φ field, the final results depend on the particular initial condition that is investigated. However, we can provide the following general integral solution (the solutions for particular initial conditions are discussed in the sections following): where, for convenience, we introduced the functions B 0,0 (z, t) and B n,m (θ ,z, t), which are defined by Here, G 2 is the Green's function in the axial direction and it is defined in appendix A (A 4b). The constants A n are given by A 1 = 1 and A n = 2, n > 1. The eigenvalues λ nm are the positive roots of the transcendental equation J n (λ nm ) = 0.
The results for the s * field depend upon the particular initial configuration that is specified; therefore, we cannot provide explicit expressions until the initial condition is identified. In general, the expression is given by (6.7a), which, using (7.7), can be put in the form Because s * is a source term, the characteristics of the source can have a large impact on the solution. Explicit analytical expressions for a few specific initial conditions are derived in the next section. We note that for radially symmetric initial conditions, these general solutions are simplified somewhat due to symmetry (see § A.4). As a final note, the functions D * and s * were developed explicitly for the case where the initial time is defined by t = 0. If the initial time is some positive value, T 1 , then the functions D * and s * should be specified as functions of the time since injection of the initial condition, t , where t = t − T 1 . The important point is that D * and s * depend on time since injection of the initial condition, not on absolute time.

Analytical solution for specific initial conditions
For this work, we consider three different initial conditions (illustrated in figure 4, where flow in the tube is from left to right), and provide the analytical solutions for s * for these three cases. Each initial condition is radially symmetric; this is not necessary for the general solution developed to this point, but to date we have found explicit series solutions only for these symmetry conditions (cf. the radially symmetric solution form for Φ given in § A.4). Each case consists of initial conditions in which two concentration distributions are separated spatially (specifically, the centre of mass is separated by L 0 = 20 cm, see table 1). In order to avoid the singular points that arise when differentiating discontinuous initial distributions, these were weighted in the z-direction by a non-compact smoothing function. For each initial condition considered, the configuration could be decomposed into multiplicative radial and longitudinal components as follows: This multiplicative decomposition is not a necessary part of the solution, but it does aid in the subsequent determination of analytical solutions to the closure problem. In the longitudinal direction, the initial condition function was the same for each of the three cases, and was specified by (7.11b) The coefficients α 1 , α 2 , β 1 , β 2 , σ 1 and σ 2 are given in table 1. The functions R 1 and R 2 vary for each of the three cases considered, and are detailed below. We note that a more accurate value for L 0 might be of the order of σ 1 and σ 2 rather than the spacing between the pulses. Note that with these estimates we have a/σ 1 = a/σ 2 = 1/3. While this quantity is less than one, it is in a region of the constraint a/L 0 1 that is somewhat unclear in terms of validity. However, we explicitly compute errors associated with these approximations in § 8.2; those results suggest that the approximation of separation of length scales is reasonably well met for the three problems investigated here.

Case A: radially uniform slugs
For this case, the initial condition is defined by two radially uniform slugs whose centre of mass is separated by L 0 . The functions R 1 and R 2 are given by i.e. there is no variation in the radial direction. For initial conditions with no radial variation, we have shown in appendix A that the solution for the source term is s * (z, t) = 0. This can also be seen through the closure problem for Φ specified above. When initial conditions contain radially constant concentrations, the deviation concentrations are identically zero initially. This means that there are no source terms in the closure for Φ, and the problem generates only the zero solution.

Case B: linear radial distributions
For this case, each part of the initial condition was specified by a linear function. On the left (centred at z = 12.5 cm) the concentration was maximum in the centre, and decreased linearly toward the wall. On the right (centred at z = 32.5 cm) the concentration was greatest at the wall, and decreased linearly toward the centre. The functions R 1 and R 2 are given by For this case, the source term has been derived in appendix B. The result is where H 1 is the Struve H function (cf. equation (11.1.7) in Abramowitz & Stegun 1965). The quantities Ξ 1 and Ξ 2 are the functions defining the motion of the centre of mass of the two portions (left and right) of the initial condition. They are specified by 7.3.3. Case C: step radial distributions For the third case, each part of the initial condition is specified by radial step functions that are complements of one another. The result is an initial condition that resembles a cylinder (with radius a/2) concentrated on the axis centred at z = 12.5 cm, and a hollow cylinder (with radial thickness equal to a/2) centred at z = 32.5 cm. The functions R 1 and R 2 are given by 1, a 2 < r a. (7.16b) These initial conditions were investigated separately by Degance & Johns (1978b). The solution for the source term is derived in appendix B, and the result is Examples of this function are plotted in figure 5. Similar to the developments for the radially linear case, the centre of mass for the left and right components of the initial condition are given by (7.18b) Following the material in appendix B as a guide, it should be relatively easy to develop analytical solutions for s * for other initial conditions of multiplicative form, assuming that the integration defined by (7.8b) can be computed. . Surfaces representing s * (z, t) for Case C, Pe = 10 (a) and Pe = 100 (b). Note that the vertical scales of the left and right figures differ by a factor of 10. As time increases, the source term is exponentially damped. The magnitude of the source term is larger for increasing Pe.

Results and discussion
The finite element software COMSOL Multiphysics5.3 was used to solve the microscale partial differential equations (PDEs); we refer to these as NS. Post-processing of data was done primarily in MATLAB R2017 . The physical parameters for the simulations are reported in table 2. The spatial dimension configuration was the two-dimensional axisymmetric case, and the 'transport of diluted species' was selected as the physics package. The software was used to generate the domain and boundary conditions necessary to solve (3.1a)-(3.1e). An adaptive mesh refinement was performed in order to increase the mesh resolution near the area under higher convection. Backward differentiation formulas (BDF) of order 3 or 4 were used with linear multistep methods in the PARDISO solver. Stability of the solutions was found to be sensitive to the order of the BDF. Both streamline and cross-wind diffusion were employed as two consistent stabilization methods. The  streamline diffusion method recovers the streamline upwind Petrov-Galerkin method and the Galerkin least-squares method. Periodic boundary conditions were applied at the external boundaries perpendicular to the tube axis, although the domain was sufficiently long that no significant mass approached the domain ends over the entire simulation period. A no-flux condition was imposed at the cylinder walls. Computing averages and moments was done by extracting the results on a grid using MATLAB . A convergence analysis based on Richardson extrapolation was performed following Roache (1994Roache ( , 2003 in order to quantify the stability of numerical results. The details of the convergence study are summarized in the supplementary material. The grid convergence index, which provides a relative error estimate of the calculated solutions, was found to be of the order of 10 −5 . The estimated convergence errors were below 2% for all simulations. 8.1. Microscale concentration fields from numerical solution Visualizations of the microscale concentration field at early times provide some direct evidence as to why studying Taylor dispersion at early times has been a challenging process. In figures 6 and 7, visualizations of the concentration field as it evolves from each of the three initial conditions are shown. To compare the results at different Péclet numbers, we have chosen the dimensionless time variable This facilitates the direct comparison of plots that would, otherwise, have dramatically different time scales. The downside is that the actual time scale is somewhat obscured; we provide absolute times parenthetically to help maintain a feeling for the actual time scales involved. Some caution is necessary when interpreting these plots because the aspect ratio has been greatly increased for visualization purposes (i.e. the radial direction is magnified by a factor 10). In addition, to help with the visualization of the concentration at longer times (where the concentrations would be otherwise visually undetectable) the concentration field has been normalized and logarithmically transformed by c = log 10 (c/c A0 + ε), where ε = 0.1, and c A0 = max[c(r, z, t)]. For reference, the area-averaged concentrations for each case are presented to the right of the microscale concentration plots. FIGURE 6. Dynamics of the microscale and macroscale concentration profiles for the three cases of initial configuration, Pe = 10 (all plots are presented in terms of the dimensionless time variable τ * ). Note that the aspect ratio, a/L, has been increased by a factor of 10 to aid visualization.
From figures 6 and 7, it is clear that the concentration evolution at early times is dominated by the particular initial configuration. Depending upon how the configuration is distributed across the velocity space, dramatically different kinds of behaviours can occur, and this is influenced by the relative importance of diffusion and convection as well as the local shear rate. It is useful to briefly discuss how this is related to the concept of mixing. In short, mixing is stretching-enhanced diffusion (Villermaux 2019). The process of mixing in frequently divided into two conceptual components: (i) convective mixing (Lacey 1954 FIGURE 7. Dynamics of the microscale and macroscale concentration profiles for the three cases of initial configuration, Pe = 100 (all plots are presented in terms of the dimensionless time variable τ * ). Note that the aspect ratio, a/L, has been increased by a factor of 10 to aid visualization. stirring) (Villermaux 2019), and (ii) diffusive mixing (or micromixing) (Epstein 1990;Bourne 2003).
From Newton's law of viscosity the shear rate is given by (Bird, Stewart & Lightfoot 2007) whereγ is shear rate. Note that highest convective mixing occurs in places where shear rate is maximum (the region near the wall). Although the convective transport is highest at the centre of tube, it experiences the lowest shear rate and, consequently, lowest convection-mediated mixing. Radial diffusion conveys solute away from the centreline, creating diffusive mixing. More correctly, convection and diffusion create mixing in concert; shearing of the initial solute distribution ultimately creates a deformed configuration of the solute where diffusion is much more effective. Note that at small times for some of the initial configurations studied here (specifically Cases B and C), convection may actually impose conditions that transiently create less convective mixing before increasing again. Even for seemingly simple initial conditions (e.g. Case A), we find that the earlytime behaviour can be quite complex. For example, for Pe = 100 at least three peaks are observable at τ * = 15 (t = 250 min) (figure 7a,b the third peak occurs near z = 0.6 m), even though there were only two peaks initially. At the beginning of the process, the concentration was uniform across the cross-section where the initial pulses were placed. Thus as time progresses, the concentration near the centre of the tube experiences more convection, whereas the solute located near the walls is transported mainly by diffusion. For Cases B and C, we find that the two initial peaks converge more rapidly for both Péclet numbers than for Case A; this is because the initial distribution on the left-hand side for these cases is distributed in the highest-velocity portion of the flow field (and the converse is true for the right-hand component of the initial condition). Thus, the centre of mass of the left-hand component catches up with the right-hand component. The effect of this is even more evident when we examine the behaviour of the moments of the solute. For Pe = 100, by τ * = 60 (t = 1000 min) we have essentially a single mode concentration field. This is a result, primarily, of the separation distance of the two components of the initial condition. Were they to have been separated by a greater distance, the time required to achieve a single modal concentration distribution in space would be longer.

Error analysis
In addition to issues of convergence, the numerical results also allow making estimates for the amount of error induced by the approximation made for the deviation balance equation. Specifically, we have imposed the following approximations in the analysis to this point: Recall that the first of these two constraints was imposed in § 5 and led to the constraint Pe(a/L 0 ) 1. Here, we have used the raw restriction (before generating the much simpler constraint) so that we can assess numerically the magnitudes of the terms are being neglected (the left-hand side) versus those retained (the right-hand side). The second of these constraints was associated with approximating the non-local solution by a local one ( § 6). For the second of these two constraints, we have shown in appendix A that the time constraint is automatically as long as a/L 0 1. It is clear that the length scale constraint itself is reasonably met (a/L 0 ≈ 1/20), thus we do not pursue the validity of this constraint further. Therefore, the remaining questions regarding the error associated with the restriction given by the inequality (8.3a). This is best measured by the error observed between the upscaled model and the direct microscale numerical results. We can compute each of the terms in the inequalities in (8.3a)-(8.3b); however, the dimension of the result is as large as the numerical solution itself. To generate a reasonable summary metric, we propose to compute the following error (noting that ṽ zc =ṽ zc − ṽ zc is the spatial deviation of the quantityṽ zc ).
Here, 0 is a small number used only to assure that division by zero errors do not occur. A few plots of the metric 1 (z, t) appear in the supplementary materials. This metric is still of high dimension, so to further reduce dimension (and generate a summary metric) we compute the first moment of 1 (z, t) about the z-axis as follows: This result measures the average error arising from the approximations imposed by the constraints (8.3a) and (8.3b). For interested readers, the detailed results of this analysis appear in figure 2 of the supplementary materials. A few general conclusions can be made regarding the error arising from this approximation are summarized as follows: (i) The error of the approximation (8.3a) tends to increase as Pe increases.
(ii) The error of the approximation decreases (not necessarily monotonically) with increasing time. (iii) The observed error is influenced by the particular initial condition examined. (iv) The maximum value of closure for the Pe = 10 case was less than 10 %, and the error rapidly decreased (within τ * < 5) to less than 5 %. (v) The maximum value of closure was of the order of 10 %-12 %; this was observed for Pe = 100.
Overall, these results seem encouraging and reasonable. The comparison of the averaged result with the numerical results (described below) provide further evidence that the approximation is valid for the Péclet numbers considered here. For larger Péclet numbers, the error associated with the approximation given by (8.3a) tends to increase. For such cases, it may become necessary to solve the closure problems associated with b and Φ numerically without neglecting terms (an example of the numerical computation of s * is discussed in § 9 and in appendix A § A.6).
On the basis of the numerical results and the discussion above, it is possible to specify more useful constraints to increase the range of validity of the analysis. We suggest slightly more liberal constraints of the form where is a unitless parameter; for our analysis ≈ 1. These constraints should generate results that are at least as accurate as those described in the present paper. We note that (8.6a) is consistent with Mercer & Roberts (1994, equation (2.17)). Our constraint is identical to theirs when the left-hand side of (8.6a) is multiplied by a factor of = 1/2.2, lending further support of these slightly more liberal constraints in practice. We note that in the case of Taylor (1954), the value of = 1/4 was adopted; when fixing the value a/L 0 ≈ 1/10 as approximated by Taylor (1954), equations (8.6a)-(8.6b) are also consistent with that work.

Computation using the upscaled balance equation
The upscaled problem given by (6.6a)-(6.6d) was solved as a one-dimensional model for the three initial condition cases. Note that the dispersion coefficient is independent of the initial condition, thus it remains identical for each case. Simulations were conducted using COMSOL in a manner consistent with what has been described in § 8.1 for the microscale simulations. The analytical solutions for D * and s * were computed using a custom-built code constructed in MATLAB; results were then imported into COMSOL, and interpolated to the grid using built-in routines. Simulations were assured to be convergent (using the approach described for the microscale simulations), and the average grid size was 1 × 10 −4 m. Other solver parameters are identical to those reported in § 8.1 and table 2.
In figure 8, the averaged concentrations computed from (i) the numerical solution, and (ii) the upscaled, one-dimensional equation are compared. For Pe = 10, the average concentration computed from the numerical solution is nearly indistinguishable from those predicted by the upscaled one-dimensional theory. We defined the following error metric for the averaged spatial concentration at a specified time, t p by where the maximum in the denominator is taken as the maximum average concentration observed at time t p . We computed the error from pairs of points separated by the minimum distance between the curves (as is done in computing total least squares) so that the error was not skewed by small displacement discrepancies. For the Pe = 10 cases, the maximum error observed was less than 1.5 % for all times and all cases. For the Pe = 100 cases, the maximum error was less than 7 % for all times and cases. For reference, plots of the errors are available in the supplementary materials. The correspondence between the two methods for computing the averaged concentration provide validation that the approximations given by (8.3a) and (8.3b) are reasonable for these Péclet numbers.

Moment analysis
The spatial moments of the concentration field are frequently used to characterize the behaviour of the concentration field, both because of their appearance in the underlying theory, and because they provide information about the characteristics of the concentration distribution. The spatial moments in the z-direction of order n (n = 0, 1, 2, 3, . . .), M n , are defined by Often, the centred spatial moments of order n(n = 0, 1, 2, 3, . . .), µ n are adopted for characterization. These are given by For this work, we are primarily interested in the second and third moments; specifically, we track the centred second moment, which can be expressed by and the skewness, which is a normalized centred third moment, specified by . (8.11) In figures 9 and 10 we plot the second centred moment and the skewness as a function of time for each of the three initial conditions. One interesting feature that can be observed in these data is the distinct differences between the apparent time scales for the second centred moment and the skewness to attain their near-asymptotic behaviour. For the second moment, the asymptotic behaviour is represented by a linear increase in the moment with increasing τ * ; for the skewness, the asymptotic behaviour is the approach to a value of zero (cf. Chatwin 1970). Examining the Pe = 10 plots in figure 9, the slope of the second moment (dµ 2 /dt) reaches its asymptotic value for τ * near 2. The skewness, however, continues to show substantial evolution through the entire period plotted (up to τ * = 60). Similar kinds of behaviour can be observed for the Pe = 100 data; the second moment is near asymptotic after approximately τ * = 20, whereas the skewness continues to evolve for the entire time period plotted. We have plotted the skew for both Péclet numbers on a log-log scale for a time interval up to τ * = 300 in figure 11; the long-time dynamics of the skew is more easily seen in that figure.
For Pe = 100, each of the three initial condition cases more obviously shows substantial transience in the skew. For each of the cases, the skewness starts at a negative value, and then crosses the zero to become positive. The approach to the asymptotic value then occurs slowly from the condition of positive skewness. Overall, these results indicate that skewness generally increases at early times, and this increase may ultimately lead to non-monotonic behaviour as the skewness approaches the asymptotic value. This is consistent with Chatwin (1970), who also found non-monotonic behaviour in third moment terms. Similar results can be seen in the Pe = 10 cases (see the inset in figure 11a), although the time scales simulated do not allow a full analysis of the dynamics of the skewness at this value for Pe. Because our simulation times do not extend far enough, we cannot accurately assess the asymptotic behaviour of the skewness (such as those reported by Aris (1956) and Chatwin (1970)).
These results indicate that the approach to normality should not necessarily be measured by, for example, the approach of only the second moment to its asymptotic behaviour (i.e. achieving constant slope). While this is a useful metric, higher-order measures such as the skew may continue to show substantial evolution over a longer characteristic time scale indicating that the system as a whole is not necessarily near its asymptotic state of behaviour.

Derivative of the second moment
In many theories of dispersion, the time derivative of the second moment is used to generate a de facto definition of the dispersion coefficient; usually, this is expressed in one dimension by  Here, we have used the notation D µ 2 to distinguish this value from D * . While such definitions are true asymptotically, they are not necessarily true for times that are near to the initial configuration time. One of the desirable traits for the dispersion coefficient identified in the Introduction is that it should be positive. This prevents conflicts with macroscale thermodynamics, and also assures that the resulting balance equations are well posed. However, there are initial configurations for which the second moment decreases in time before increasing again. This can be seen in the curves for µ 2 (Cases B and C) for both Pe = 10 and Pe = 100 (figures 9 and 10). If one were to use the definition above then, for early times, a negative value for the dispersion coefficient would be produced. As an example, we have computed this quantity for Pe = 100 and plotted it in figure 12. Note that for Case A, the definition for D µ 2 is identical to that defined for D * ; this is because s * is identically zero for that  Adopting the conventional definition of dispersion as 1/2dµ 2 /dt is not valid at early times for some initial conditions. In general, the definition based on 1/2dµ 2 /dt is only valid at asymptotic times. In contrast, the definition proposed using the upscaled theory presented is valid at all times and for all initial conditions.
case. However, for the other two cases, we have an initial configuration that transiently reduces the second moment. This occurs because the initial configuration contains two parts: the first part (on the left) contains mass focused in the high-velocity regions, and is upstream from the second part. The second part (on the right) contains mass that is focused near the walls. The net result is that the centre of mass of the left part of the initial distribution moves faster than the right; thus, the left part eventually catches up to the right. This manifests physically as a decrease in the second moment. Thus the definition given by D µ 2 yields non-physical results for those cases because the effective dispersion coefficient is negative at some times; in contrast, D * (t) presented in (6.6a)-(6.7c) is strictly positive for all times. These results provide some additional insight into the physical role of the s * field. Essentially, this field is a source term that assures that the spreading that is encoded by the initial configuration is correctly represented at early times. Because this is independent from the effective dispersion coefficient, there is no need to posit dispersion coefficients that, for example, violate macroscale thermodynamics by attaining negative values.

Examples addressing the superposition problem
Taylor himself was not necessarily a proponent of dispersion coefficients that were expressed as functions of time. In particular, he was concerned about possible paradoxes that could occur when pulses were injected into a system at two different times (with the question being, 'which dispersion coefficient applies?'). In his 1959 paper (Taylor 1959) he stated the following about time-dependent dispersion coefficients (using the term diffusion instead of dispersion as is now more common).
It seems to me that this is an illogical conception. The one thing that seems to be agreed, whatever theory one may have about diffusion, is that diffusing distributions are superposable. If therefore you attempt to analyse the distribution of concentration from two sources which started at different times by this method, it would be necessary to assume, in places where the distributions overlapped, that the diffusion constant had two different values at the same time and at the same point in space.
Because there are many senses in which the concept of superposition may be applied, we offer the following definition for the superposition of non-homogeneous partial differential equations (after Olver (2014, appendix B)): THEOREM. If u 1 and u 2 are particular solutions to the non-homogenous linear equation L[u] = f , then u = u 1 + u 2 solves L[u] = f 1 + f 2 .
Implicit in this definition is the idea the linear operators involved must be the same over any time and space intervals considered. Because the problem we are considering involves a parameter, D * (t), that is a function of time, then the principle of superposition specified above requires that the two solutions, u 1 and u 2 be defined over the same time interval. This is one of the difficulties that has been encountered for the slightly more restrictive conditions that Taylor (1959) required. For the situation outlined by Taylor (1959) there was a stated desideratum (on the basis of physical reasoning) that two solutions defined over different time intervals would be superposable.
While formulations that do not account for a source term, s * , lose the ability to be sensibly superposed, this is not true for the theory that we have presented. In fact, this is one of the strengths of the proposed theory: superposition is maintained (cf. Mercer & Roberts 1994). In short, the source term, s * , and the effective dispersion coefficient, D * , work together in a way as to maintain the principle of superposition. The clearest way to see this is through concrete examples. We present two examples below (with Pe = 100) where we compute both the microscale and the corresponding macroscale solution. Although ordinarily one would not compute the microscale solution, doing so allows us to compare the average computed directly from the microscale solution with the average computed from the upscaled balance equations. It also provides an opportunity to better understand the complicated microscale dynamics involved in the transport process, which provides useful context for interpreting the upscaled results.
We illustrate superposition using two examples, as follows.
(i) For the first example, we provide a problem that makes the purpose of the source term, s * , more apparent. In this example, we consider a single injection, but break the total time (0 < t < T f ) up into two time intervals. During the first interval (S 1 : 0 < t < T 1 ), the initial condition evolves from a uniform slug input to a complex distribution of space (with non-zero radial gradients). The new configuration at time t = T 1 is then treated as the initial condition for the second time interval (S 2 : 0 < t < T ∆ ). For the second interval, the time variable is reset such that the initial time for the problem is equal to t = 0. The problem at this juncture is simply a new initial value problem with a complicated initial condition. Importantly, for this second interval, the dispersion coefficient D * (t ) evolves exactly as indicated by (7.3). In other words, the dispersion tensor starts at a value of D * (t = 0) = D, and grows in time (in variable t ) according to (7.3). Although for this second time interval the value of the dispersion coefficient is reset to its zero-time value, the second moment of the solute continues is shown to grow at the rate that was growing at the end of the first time interval. This illustrates how the memory of the system is encoded in our solution by the source function, s * , rather than in the dispersion coefficient.
(ii) For the second example, we build off of the discussion of superposability introduced in the first example. We consider specifically the 'two release' case identified by Taylor (1959), where solute is injected first at t = 0 (which we refer to as I 1 ), and the system is allowed to evolve up to the time t = T 1 . At time t = T 1 , a second solute injection occurs (I 2 ), and the system again is allowed to evolve. For this problem, we show that if separate transport equations are written for each release (adopting the notation c 1 and c 2 for the first and second injections, respectively), that these two equations can be superposed to derive a single transport equation for c = c 1 + c 2 . Notably, for the second time interval, the function D * is single valued everywhere in space, including locations where the two solute injections overlap. In other words, even though the residence times for the two solute injections are not equal, they are described by a single upscaled dispersion coefficient. As with the first example, this example shows in detail how the history of the solutes are encoded by s * rather than by the dispersion coefficient, D * . This example directly addresses the objection raised by Taylor (1959); in addressing this question, we show that our upscaled transport equation is superposable in the sense defined by Taylor (1959), and that the disturbing problem where the dispersion coefficient appears to have multiple values at a single point does not occur. 9.1. Macroscale dispersion: example 1 In order to isolate the effect of the s * field on the overall solution to the macroscale equations, we provide an example where its influence can be more readily understood. In this example, we illustrate that the conventional notion of superposition is still valid for the upscaled theory outlined previously.
For this example, a single solute injection is simulated as a sequence of two initial value problems. It is always possible to break initial value problems into parts like this when superposition is valid. The solution at the end of any one time interval is simply a concentration field; this concentration field can then be treated as the initial condition that is transported over a second time interval via the same balance equation.
For this example, we examine the transport of a single uniform pulse. We allow the initial condition to evolve until t = T 1 = 250 min. The solution at this time, c (t=T 1 ,z) is used to define the initial condition for the second time interval, S 2 , at starting at t = 0 min, and ending at t = T ∆ = 750 min. The problem can be stated by the following sequence of two problems. The two problems can be written as Note that the second interval is written in terms of a new time variable, t = t − T 1 such that 0 < t < T ∆ . After the solution c | (z,t ) is computed, it is straightforward to translate this solution to the original time variable via t = t + T 1 . The notable feature of this solution is that the dispersion coefficient has exactly the same dynamics for each of the two time intervals. To be explicit, recall T 1 = 250 min, and T f = 1000 min; the dispersion coefficients for the two intervals are given by Interval 2 (0 < t < 750 min) This means that D * (t = 0) = D at the start of the first time interval, and that D * (t = 0) = D at the start of the second time interval. For both intervals, the dispersion coefficient increases from its initial value with identical dependence upon time according to (9.2a) and (9.2b). In other words, there is no 'memory' for the dispersion coefficient.
We note that for the second time interval, the initial concentration field is not uniform in the radial direction; this means that the s * field is non-zero. This initial condition is also quite complex (i.e. it involves computing the s * -field for the initial condition c | (z,t =0) = c 0 Z 1 (z) + c 1 | (z,t=T 1 ) ), so there is no analytical solution for the s * field. Thus, s * is computed numerically, as described in § A.6. The s * field for the superposed initial condition at time t = 0 min is given in figure 13.
This example allows for a unique opportunity to directly assess the effect of the s * field. To do so, we consider the following three cases for obtaining a solution: (i) The solution to the problem is determined by a single computation, spanning the interval 0 t 1000 min. (ii) The solution is obtained for two time intervals, where the time intervals are given by S 1 : 0 t 750 min and S 2 : 0 t 750 min. The s * field is computed as specified previously. (iii) Strictly for comparison, the solution is obtained as the superposition of the two time intervals described above. The s * field, however, is set (incorrectly) to zero so that the solutions both with and without the s * field can be compared. This corresponds to the form of the time-dependent dispersion equation that is used frequently (cf. Sankarasubramanian & Gill (1973, equation (11))), and it exhibits the lack of time translation symmetry pointed out by Taylor (1959).
The second moment is an effective measure for examining the result of these three cases. In figure 14 we have plotted the results of the solutions as computed for each of the three cases listed. In these results, we can consider the results from the first case (the black line in figure 14) to be the ground truth for comparison.
We note that, for the solution computed in two steps with the correct value for s * , the second moment is continuous, and it matches the single-step solution almost identically. This occurs even though the dispersion coefficient returns to its zero-time value when the problem is restarted at time t = 250 min. This behaviour occurs because of the unique interplay between the effective dispersion coefficient, D * , and the source term given by s * . Thus, the concern posed by Taylor (1959) about the paradox of which dispersion coefficient to use (the one at the end of the first time interval, or the one corresponding to zero elapsed time for the second time interval) does not arise. The dispersion coefficient is unequivocally defined for both time periods by (9.2a)-(9.2b), and superposition of the solutions for the two time intervals leads to a solution that contains no discontinuities in the slope of the second moment.
For Case 3, we note that without the proper s * field correction, the second moment develops a cusp at t = 250 min. In other words, the rate of spreading for the second time interval is too small, and this occurs because the source term s * has been neglected. This creates a technical problem, because the derivative of the second Case 2 (second step) The solution, S 2 for 0 < t < 750 min (250 < t < 1000 min) with the source term computed numerically, and illustrated in figure 13. (iv) Case 3. The solution, S 2 for 0 < t < 750 min with s * = 0. moment does not exist at that point. Additionally, the second moment then grows too slowly for the remaining time, under-predicting the actual value significantly. This case not only allows a comparison to better illustrate how the s * field influences the solution to account for the initial distribution of matter, but it helps to resolve the apparent paradox that troubled Taylor (1959). Specifically, for dispersion in tubes, the dispersion process is not independent of the initial condition at early times; however, this dependence is best represented through s * rather than D * . For initial conditions that are uniform in the radial direction, no additional modification from conventional theory is needed. However, for initial conditions that are not uniform in the radial direction, the initial condition itself generates a behaviour that is accounted for by the appearance of the non-conventional source term s * in the macroscale transport equation. The inclusion of the additional source term is essential for predicting the correct macroscale dynamics of the system. 9.2. Macroscale dispersion: example 2 In this example, we illustrate through direct deviation and computation that our solution is superposable in the sense that was desired by Taylor (1959). For this problem, the system starts with a specified initial condition and progresses for some time (0 < t < T 1 ). Then, a second source is instantaneously injected into the system, and transport continues over the interval T 1 < t < T f . This situation, then, yields a condition where two solutes are in the system for varying amounts of time (and, hence, would experience different time-dependent effective diffusion coefficients). Taylor's 1959 concern was the logical incongruity associated with assigning two different values of the dispersion coefficient at the same point where the solute bodies overlap. In the following, we show that the framework developed above avoids this problem.

The superposition problem for a two release case
This example is much like example 1, except now we consider the case where there are two injections of solute at two different times. This represents exactly the conditions that which were addressed by Taylor (1959). In other words, the case of two releases leads to the problem where two solute distributions, each having been in the system for different amounts of time, might overlap. The conventional approach (e.g. Sankarasubramanian & Gill (1973, equation (11))) using a time-dependent dispersion coefficient (without a source term for correction) would lead to a problem where two different values of the dispersion coefficient would occur at points where the solute distributions overlapped. In this example, we illustrate that the balance equation for the average concentration that we derive can be sensibly superposed.
To start the discussion, we first need to determine mathematically how to handle a second pulse injected at some time t = T 1 after progression from a defined initial condition at t = 0. The sudden injection of solute into the system at t = T 1 introduces a discontinuity of the solute field in time. Although there are several ways that this might be represented, the simplest one is to use superposition in the following way. For the first time interval (0 < t < T 1 ), the system evolves from the specified initial condition (for these examples, we will use the function Z(z) defined previously in (7.11a), such that c | (z,t=0) = c 0 Z 1 (z)). At time t = T 1 , a new solute source is instantaneously injected into the system. The new initial condition for the second time interval is then the superposition of the second pulse configuration with the configuration of the existing solute distribution field, c | (z,T 1 ) . This new field forms an initial condition for the second time interval, beginning at time t = T 1 and ending at time t = T f . In the comments by Taylor (1959), this is ostensibly what is meant by the use of the word 'superposition'. Thus, we consider two problems that together provide the necessary solution.
To make this concrete, again let T 1 = 250 min and T f = 1000 min. The two problems can be written as follows. We define the associated independent transport equations for the solute injections (indicated by I 1 and I 2 ) for the entire time interval (0 < t < 1000 min) to be described by Problem c 1 I 1 : 0 < t < 1000 min For the second injection, the time coordinate for the second interval is started at the time t = t − T 1 , as described in example 1. Without this translation, the second injection would not start with the correct dispersion coefficient, which is D * (t = 0) = D, as indicated by (9.2b). The boundary conditions are identical to those posed in (3.1a)-(3.1e). For reference, the microscale solution to this problem is provided for a number of time points in figure 15. We note that problem c 2 must be put in the time coordinates for which the injection happens at time equal to zero (this was noted in § 7.2) to properly define D * and s * . We now explore how these two solutions can be superimposed. First, note that the classical principle of superposition (Polyanin 2000;Olver 2014) for linear partial differential equations with time-dependent coefficients requires that the superposition be defined over the same time interval. Thus, problems I 1 and I 2 can be equivalently written as (recalling that for the general problem, the initial condition is represented by ϕ(z); cf. (3.1e)) Problem c 1 I 1 : 0 < t < 1000 min Interval 2 0 < t < 750 min Problem c 2 I 2 : 0 < t < 1000 min We make the following important notes about the formulation above.
(i) We have divided Problem 1 into two intervals 0 < t < 250 min and 250 < t < 1000 min. This is not an arbitrary choice; it is necessary to obtain solutions that involve multiple initial conditions.
(ii) In our formulation, the dispersion coefficient D * does not have to 'remember' its prior history. Hence, exactly the same function D * is used in both intervals. Being explicit, for the problem for c 1 in the first time interval, the dispersion coefficient starts at D * (t = 0) = D, and grows as predicted by (7.3). When time is restarted in the second interval, exactly the same behaviour is repeated. The dispersion coefficient for the second time interval starts at D * (t = 0) = D, and again grows as predicted by (7.3). During the second time interval, the source function s * 1 automatically accounts for the spreading inherent in the second initial condition. This function is essentially the 'memory' in the system, but one that encodes that memory in a local rather than non-local in time representation.
(9.5) (iv) These features make the two problems superposable; the two problems can be added to generate one equation that suffices to describe the dispersion process over the two intervals.

Superposition of the two concentrations
We now show how superposition can be accomplished for the problem of two releases of solute at two different times. We begin by defining the superposed concentration c = c 1 + c 2 . For the first time interval, the superposition is trivial, noting that c 2 ≡ 0. Thus, c = c 1 . Superposition for the first interval yields 889 A5-44 E. Taghizadeh, F. J. Valdés-Parada and B. D. Wood Superposition for interval 1 0 < t < 250 min Note that the concentration at the final time t = T 1 = 250 min, forms the initial condition for c 1 during the second interval. For reference, we denote this concentration by c 1 | (z,t=T 1 ) . Because (9.4d) and (9.4g) are (i) linear, and (ii) defined over the same time interval 0 < t < (T f − T 1 ), the superposition of these equations is straightforward. Adding (9.4c) and (9.4f ) yields Superposition for interval 2 0 < t < 750 min Here, we have used the linearity of the source term: s * (z, t ; ϕ) = s * 1 (z, t ; ϕ 1 ) + s * 2 (z, t ; ϕ 2 ) to generate the single source term field, s * , for the superposed problem. The initial condition for the second interval is found by adding the final concentration from time interval 1, c 1 | (z,t=T 1 ) , to the initial condition for the second solute release. (9.6d) The Taylor dispersion problem is frequently represented by an equation of the form of (9.4d) and (9.4g), but with s * 1 = s * 2 = 0 (cf. Smith 1981a). Superposition for this case fails because even if one divides the problem into two time intervals. Upon restarting time (setting t = 0) for the second interval, the concentrations c 1 and c 2 are represented by two different linear operators (one for c 1 where D * (t) begins at t = T 1 , and a second for c 2 where D * (t ) begins at t = 0). In the formulation we provide, D * (t) is always given by a single function of time everywhere in the domain; any 'time history' associated with multiple injections at different times is accounted for strictly through the source term field, s * . This avoids the possibility of having D * being multi-valued at a single spatial location.

Proof of validity of superposition by direct computation
The superposition of c 1 and c 2 is given by (9.6a)-(9.6d). For clarity, we re-emphasize the following regarding the interpretation of these equations: (i) During the first time interval (0 < t < T 1 ) the dispersion coefficient for the time interval starts at D * (t = 0) = D, and again grows as predicted by (7.3). This is described by the single transport equation given by (9.6a,b). Because the radial derivative of the initial condition for this interval is zero, we have that s * 1 (z, t) is zero.
(ii) For the second time interval, the initial condition is the solution obtained from the end of the first time interval (at t = T 1 ) plus the initial condition representing the second solute release. This is described by the single transport equation given by (9.6c,b). . The macroscopic concentration field predicted by the upscaled problem statements given by I 1 and I 2 . Continuous lines represents the averaged microscale simulation results; points represent the solutions to the upscaled equations given by (9.3a)-(9.3d). The concentration curves are provided as two plots for ease in visualization. Note that the curve in dark blue at t = 250 min represents the macroscale initial condition for problem I 2 computed by superposition.
(iii) During the second time interval (0 < t < T 1 ) the dispersion coefficient for the time interval starts at D * (t = 0) = D, and again grows as predicted by (7.3). Note that the radial derivative of the initial condition for the second time interval is nonzero, thus there is a contribution from the s * -field for the second time interval.
The initial condition for this problem is similar to that for Problem 1. In fact, the s * field is identical to that for Problem 1, and is computed numerically as described for Problem 1.
In figure 16 we have illustrated the solutions for this problem computed by (i) averaging the microscale numerical solutions directly, and (ii) via the two-step (superposed) macroscale simulation (solving (9.6a)-(9.6d)) as described above. We note that the two solutions are in good agreement, and, in particular, the amount of dispersion predicted by the upscaled model (both before and after the second pulse) is consistent with the averaged microscale data (for which there are no approximations of any kind). There is a slight mismatch in between the two solutions near the peaks for times t = 100, 350 and 500 min. Most likely, these errors are associated with the approximations in the upscaled equation, as discussed previously (cf. the peaks for Pe = 100, Case A in figure 8).
In summary, we have provided two examples that meet the desideratum of Taylor (1959). For each solution, we find that (i) we can define a sensible notion of superposition for the upscaled equation of the form that we have derived, (ii) the associated dispersion coefficient is always a single-valued function of space and (iii) the source terms are superposable, and they generate a field correction that assures that the proper rate of spreading (or, equivalently, second moment) is achieved. Because the source terms are responsible for correcting the rate of spreading at early times, this function is not imposed on the dispersion coefficient. It is this component of our solution that generates the necessary conditions for the upscaled equation to be superposable.

Conclusions
In this study, we were able to address a long-standing problem of early-time behaviour from arbitrary initial conditions. The approach we adopted generates an effective macroscale balance equation that is consistent with the conventional one-dimensional convection-dispersion-type equation. One notable difference is that the formulation presented here also contains an exponentially decaying-in-time source term that represents memory of the initial condition. Because our results were compared with numerical simulations, we were able to compute the errors associated with the approximations made in our development. In short, these approximations are represented by In addition, the four guidelines that we proposed in the introduction for generating a well-structured dispersion theory were met. Specifically, the dispersion coefficient for Taylor dispersion was shown to be positive, independent of initial conditions, superposable and converges to the classical result at large times. Of these, the most important result was illustrating that the non-conventional source term is necessary to assure that solutions are superposable. This addresses a long-standing problem about maintaining the principle of superposition when a time-dependent dispersion coefficient has been defined.
where the sum of termsṽ z ∂c/∂z − ṽ z ∂c/∂z has been neglected with respect to radial diffusion and a Lagrangian frame of coordinates is being used. This problem has exactly two source terms: one due to macroscopic convection and the second one due to the initial condition. The solution of this problem can be carried out by means of integral equations formulations in terms of Green's functions (Wood & Valdés-Parada 2013;Polyanin & Nazaikinskii 2015) with the associated Green's function (G) satisfying the following initial and boundary-value problem: The solution to this problem can be written as G(r, ρ, θ, ϑ, z, ζ , t, τ ) = G 1 (r, ρ, θ − ϑ, t − τ )G 2 (z − ζ , t − τ ), (A 3) where, for convenience, the solution is expressed in terms of the Green's functions of unsteady polar (G 1 ) and axial coordinates (G 2 ). These Green's functions are given by A n λ 2 nm J n (λ nm r/a)J n (λ nm ρ/a) a 2 (λ 2 nm − n 2 )J 2 n (λ nm ) In (A 4a) A n is a constant whose value is 1 for n = 0 and 2 for n = 1, 2, . . . In addition, λ nm are the positive roots of the transcendental equation J n (λ nm ) = 0. The solution of the problem given in (A 1e) can be obtained by using the unsteady version of Green's formula and the resulting expression can be written in the form given in (5.11). Direct substitution of this result into the unclosed macroscale balance equation yields the closed but non-local model given in (6.1).

A.2. Localized solutions
The relative advantages and disadvantages of non-local formulations have been discussed in the body of the paper. Regardless, we are ultimately interested in finding analytical solutions for the effective parameters that arise from averaging, so a localized form is desirable. Recall, this is valid when there is separation between the characteristic space and time scales for the microscale and macroscale variables, i.e.
where L 0 is a measure of the size of the initial configuration in the longitudinal direction, t * is the characteristic time scale for the microscale processes and T * is the characteristic time scale for the macroscale processes. The length scale constraint is fairly clear in that it is expressed in terms of physical properties of the system. For the time scales, the restriction is less clear. We can attempt to (very roughly) characterize these time scales as follows. There are two reasonable time scales associated with the microscale balance given by (A 1a). First, the convection term involved in the material time derivative has a time scale that would be estimated by T * = O(L/U). The diffusive time scale in the radial direction is estimated by t * = O(a 2 /(4D)). If we assume that Pe T O(1), then the diffusive time scale can be used for estimates. For the macroscale equation, (6.6a), estimates are complicated somewhat by the fact that the length scale of interest, L, is the longitudinal size of the solution as it spreads. Noting that we are attempting to generate estimates for the length scale associated with ∂ c /∂z in the convolutions given by (5.11), we can neglect the convection term (since we are in a moving coordinate system). The time scale can be estimated from the dispersive time scale, which we can think of as containing an initial time scale plus a transient time scale. In other words, we are making the estimate Recalling t * = O(a 2 /(4D)), it is clear that the constraint t * T * is automatically met when a L 0 . Therefore, the single length scale constraint appears to be sufficient to justify the analysis, and this has been adopted in the body of the paper. We note that an alternate analysis of approximations of this sort is available in Mercer & Roberts (1990, appendix A), where the analysis is carried out in Fourier space in the context of a centre manifold theory approach to upscaling.
Additional details (and refinements about specific metrics for the time and length scales) for this approximation can be found in Wood & Valdés-Parada (2013). Regardless, exact conditions for when this approximation can be imposed are difficult to define; instead, we opt for the approach here of making the assumption that this approximation holds, and then examining the fidelity of the results. We do note that the removal of the axial derivative of the concentration is consistent with the second-order truncation presented by Gill & Sankarasubramanian (1970). The work of Gill & Sankarasubramanian (1970) suggests that the higher-order terms are much smaller than the second-order terms, which provides an independent validation of the approximation that we have employed.

A.3. Analytical solutions for the b-field
Assuming that the constraints allowing a local solution are valid, we can extract the axial derivative of the average concentration from the second integral in (5.11), we set b(r, θ , z, t) = − τ =t τ =0 ζ =∞ ζ =−∞ ϑ=2π ϑ=0 ρ=a ρ=0 G(r, ρ, θ , ϑ,z, ζ , t − τ )ṽ z (ρ)ρ dρ dϑ dζ dτ Note, we can make use of the identity It is clear from these results that the b-field is entirely independent of the particular initial condition adopted. Also, because of symmetry, the velocity deviations do not depend on the angular direction, θ or longitudinal coordinate, z. Direct integration leads to a result that is only a function of r and t b(r, t) = 1 4 A.4. Analytical solutions for the Φ-field Following an analysis analogous to that for the b-field, we define the following functional, Φ(φ): Φ(r, θ ,z, t;φ) = ζ =∞ ζ =−∞ ϑ=2π ϑ=0 ρ=a ρ=0 G(r, ρ, θ, ϑ, z, ζ , t)φ(ρ, ϑ,ζ )ρ dρ dϑ dζ . (A 13) Evidently, in order to draw a specific expression for this closure variable it is necessary to provide a particular initial condition,φ. Substituting the Green's function G into (A 13), and rearranging, we recover the result given in (7.7) of the main text, which is applicable to a general initial condition Φ(r, θ , z, t) = B 0,0 (z, t) + ∞ n=0 ∞ m=1 B n,m (θ, z, t)J n λ nm r a exp − λ 2 nm D a 2 t , (A 14) where the functions B 0,0 (z, t) and B n,m (θ,z, t), are defined by In principle, this expression (combined with (A 15a) and (A 15b)) provides the integral solution for any initial condition. Regardless of the particular initial condition chosen, note that Φ is linear in the initial condition function, i.e. ϕ = ϕ +φ, (A 16) hence, multiplying the initial condition by a constant K 1 yields The linearity in the initial condition is then established by as determined from (A 13). As mentioned in the main body of the paper, we have found (non-trivial) particular solutions in analytical form (i.e. no unresolved integrations remain, and the result is given as a series expression) only for the case of θ-symmetric initial conditions. For the case of θ-symmetric initial conditions, the resulting expression for the closure variable Φ simplifies to Φ(r, z, t) = where B n (z, t) is defined by In these expressions, the first index indicates the order of the moment, and subsequent indexes indicate for which coordinate the moments are taken. With these definitions in place, the mass-averaged solute velocity is defined by v z,M = ∂M 1 (t) ∂t .

(B 25)
Before moving on, it is worth recalling that, among the simplifications adopted in this work, it was assumed that the termṽ z ∂c/∂z and its spatial average are negligible in comparison to the radial diffusive term (valid for Pe(a/L 0 ) = O(1)). In the material that follows, we make corrections to the coordinate used to displace the solutions, but we do not alter the value U that appears in these solutions. The following constraint is adopted to indicate under what conditions this assumption is valid: B 26) or, making conventional order-of-magnitude estimates, a reasonable constraint is Here, it has been assumed that reasonable estimates of the characteristic lengths of variations forc and c are a and L, respectively. In the solutions below, it will be apparent that this constraint is easily met for the particular initial conditions studied. It is convenient to denote the portion of the total initial mass per unit length of the tube given by each of the two parts of the initial condition be specified by m 1 (z) = 2πZ 1 (z) r=a r=0 R 1 (r)r dr, (B 28a) m 2 (z) = 2πZ 2 (z) r=a r=0 R 2 (r)r dr. (B 28b) From the above definitions, the mass weightings for the two portions of the initial condition are defined as Note that these weightings are distinctly different from the spatial weighting functions discussed in the main body of the paper. Using these weighting functions, the initial mass-averaged velocities are given by The difficulty now is to determine a time scale for which the initial condition spreads sufficiently to sample the entire velocity distribution. To this end, consider