Turbulent shear layers in a uniformly stratified background: DNS at high Reynolds number

Abstract Direct numerical simulations are performed to investigate a stratified shear layer at high Reynolds number ($Re$) in a study where the Richardson number ($Ri$) is varied among cases. Unlike previous work on a two-layer configuration in which the shear layer resides between two layers with constant density, an unbounded fluid with uniform stratification is considered here. The evolution of the shear layer includes a primary Kelvin–Helmholtz shear instability followed by a wide range of secondary shear and convective instabilities, similar to the two-layer configuration. During transition to turbulence, the shear layers at low $Ri$ exhibit a period of thickness contraction (not observed at lower $Re$) when the momentum and buoyancy fluxes are counter-gradient. The behaviour in the turbulent regime is significantly different from the case with a two-layer density profile. The transition layers, which are zones with elevated shear and stratification that form at the shear-layer edges, are stronger and also able to support a significant internal wave flux. After the shear layer becomes turbulent, mixing in the transition layers is shown to be more efficient than that which develops in the centre of the shear layer. Overall, the cumulative mixing efficiency ($E^C$) is larger than the often assumed value of 1/6. Also, $E^C$ is found to be smaller than that in the two-layer configuration at moderate Ri. It is relatively less sensitive to background stratification, exhibiting little variation for $0.08 \leqslant Ri \leqslant 0.2$. The dependence of mixing efficiency on buoyancy Reynolds number during the turbulence phase is qualitatively similar to homogeneous sheared turbulence.

inhibit shear instabilities and shear-driven turbulence. Quantifying the rate of mixing has important implications in large-scale ocean and atmospheric models. Field observations and general circulation models rely heavily on the parametrization of mixing efficiency (E) to quantify or prescribe the effect of turbulence, where E is understood to be the ratio of the gain in the background potential energy over the sum of the gain plus the dissipation rate of turbulent kinetic energy. For example, some ocean observations use E to estimate turbulent diffusivity (K ρ ) and thereby vertical heat flux, while ocean models use E for subgrid turbulence closure (Jayne 2009;Whalen et al. 2015;Gregg et al. 2018). An accurate description of E is key in data interpretation and model prediction and the present study aims to improve our existing understanding of the mixing processes in stratified shear flows in nature and engineering.
One popular model is that suggested in Osborn (1980) in which the turbulent diffusivity is simply parameterized as K ρ = Γ κRe b , where κ is the molecular diffusivity, Γ = E/(1 − E) is the flux coefficient (similar to the flux Richardson number) and Re b = ε/νN 2 is the buoyancy Reynolds number defined using the turbulent dissipation rate (ε) and the squared buoyancy frequency (N 2 ). Here, Osborn's formula is interpreted in the context of irreversible mixing (Ivey & Imberger 1991;Venayagamoorthy & Koseff 2016). Osborn (1980) suggested an upper-bound value of 0.2 for Γ . The validity of Osborn's model and, more specifically, the question of whether Γ has a fixed value, have received much attention. It has been shown that the flux coefficient Γ varies with Prandtl number (Pr), buoyancy Reynolds number, turbulent Froude number (Fr = ε/NK where K denotes turbulent kinetic energy) and Richardson number (Ri) (Strang & Fernando 2001;Shih et al. 2005;Mater & Venayagamoorthy 2014;Maffioli, Brethouwer & Lindborg 2016;Venayagamoorthy & Koseff 2016).
For homogeneous stratified shear turbulence in which shear and stratification are uniform over a turbulent region, Shih et al. (2005) proposed three regimes of mixing: a diffusive regime in which K ρ = κ for Re b < 7, an intermediate regime in which K ρ = 0.2νRe b for 7 < Re b < 10 2 (similar to Osborn's model) and an energetic regime where K ρ = 2νRe 1/2 b for Re b > 10 2 . In contrast, Portwood, de Bruyn Kops & Caulfield (2019) argued that the flux coefficient Γ does not vary over a wide range of Re b and the regimes seen in Shih et al. (2005) are possibly due to transient effects. Salehipour et al. (2016) and Mashayek et al. (2017b) proposed two regimes for turbulence in a mixing layer with a two-layer density profile: a buoyancy-dominated regime in which K ρ ∝ Re 3/2 b for Re b < Re * b and a shear-dominated regime where K ρ ∝ Re where Re * b ≈ 100-300 is the buoyancy Reynolds number at which the mixing efficiency is at its maximum. Stratification in the natural environment is ubiquitous. It is presently unknown how mixing physics in such an environment, where the mixing layer is exposed to a uniform stratification outside the sheared zone, differs from the canonical problems of homogeneous stratified shear flow and two constant-density layers.
In the past two decades there has been a sustained effort to more accurately quantify mixing using three-dimensional, turbulence-resolving direct numerical simulations (DNS). Smyth & Moum (2000) and Caulfield & Peltier (2000) performed some of the first DNS of three-dimensional turbulence in a stratified shear layer to address the importance of shear-driven mixing. Since then, due to the benefits of increased computational power, DNS have been able to examine the rate of mixing at higher Reynolds numbers, most notably in the recent work by , Salehipour, Peltier & Mashayek (2015),  and Kaminski, Caulfield & Taylor (2017). Some significant results regarding Γ from these simulations are: (1) Γ has a significantly higher value than 0.2, (2) Γ peaks at an intermediate value of stratification where secondary shear instabilities are the richest, and (3) the value of Γ varies significantly with Reynolds number, Prandtl number, Richardson number and even the amount of pre-existing turbulence in the shear layer (Brucker & Sarkar 2007;Kaminski & Smyth 2019).
Nearly all DNS of stratified layers at high Reynolds numbers use a hyperbolic tangent profile for both velocity and density in order to represent a shear layer that develops between two layers having different but constant density. In the oceans and atmosphere, it is typical that the stratification extends beyond the region of shear such that a spatially extensive stratification is a more appropriate representation of the density gradient than the spatially compact stratification of the two-layer profile. We are therefore motivated to simulate the evolution of a shear layer in fluid with space-filling uniform stratification.
There are intrinsic differences between these two configurations. First, the profiles of shear and stratification evolve similarly during the evolution of shear instabilities in the two-layer configuration since both have similar initial hyperbolic tangent profiles. However, stratification can evolve differently from shear in a uniformly stratified fluid. For the same value of Ri at the centre of the shear layer, the density difference across the layer is larger in the case with uniform stratification. In other words, the average value of the stratification is larger, leading to differences in the evolution of turbulence. Furthermore, ambient stratification, when sufficiently strong, can support propagating internal waves that transport momentum and energy into the far field. For example, in the problem of a moderately stratified shear layer adjacent to a pycnocline where the pycnocline N was varied in a DNS study, Pham, Sarkar & Brucker (2009) show that far-field internal waves are supported for a sufficiently large value of pycnocline N. At low Re = 1280 the internal wave flux is shown to be as large as 17 % of the turbulent production generated in the shear layer. The wave flux reduces as Re increases to 5000 (Pham & Sarkar 2010). Recent DNS of shear layers with uniform stratification in Watanabe et al. (2018) at Re = 6000 reveal interesting turbulent dynamics at the turbulent/non-turbulent interface (TNTI). The authors also found that at low Ri, although internal waves do not propagate far from the shear layer, the wave flux measured at the TNTI can be comparable to the dissipation generated inside the shear layer. The DNS of Fritts et al. (2014) at a Reynolds number up to 10 000 indicate the development of secondary instabilities during the transition to turbulence, some of which are similar to those observed in the DNS with a two-layer density profile in . However, it is unclear whether the secondary instabilities in Fritts et al. (2014) would enhance mixing efficiency. The effect of ambient stratification (external to the sheared zone) on the mixing rate is yet to be studied at high Reynolds number.
The present study seeks to investigate stratified turbulence and mixing at high Reynolds number in a shear layer with spatially compact shear using a density profile of space-filling, constant stratification. In addition to studying the route to turbulence and diagnostics of mixing, the present results are compared with the mixing parameterizations proposed by Shih et al. (2005), Salehipour et al. (2016), Mashayek et al. (2017b) and Osborn (1980). We address the following questions. (1) How does the evolution of the turbulent shear layer with uniform stratification differ from the two-layer configuration at high Re?. (2) Does the effect of uniform stratification enhance or reduce the mixing efficiency with respect to that which has been observed in prior studies of homogeneous stratified shear turbulence or mixing layer simulations with a two-layer density profile?
Besides the mixing efficiency in stratified shear flows, the present study also focuses on the dynamics of the transition layer (TL) which develops at the edges of the shear layer during transition to fully three-dimensional turbulence. Previous simulations have shown TL with enhanced shear and stratification in the shear layer with uniform stratification (Pham et al. 2009;Watanabe et al. 2018). Simulations of the two-layer configuration also show formation of a TL although the enhancement of shear and stratification is significantly weaker . How the uniform stratification of the ambient can impact the development of the TL as well as the turbulence and mixing therein remains to be answered. Furthermore, the implications of the TL to mixing parameterizations requires thorough investigation. This work is organized as follows. In § 2 the initialization and numerical formulation of the stratified shear layer is introduced. Section 3 provides a discussion of the evolution of the shear layer with specific emphasis on instabilities and subsequent turbulence. The structure of the TL, its turbulence and the wave flux across it are examined in § 4. A discussion of the mixing efficiency and its parameterization follows in § 5. The findings are discussed in § 6 and conclusions are drawn.

Stratified shear layer
The problem of a temporally evolving stratified shear layer with uniform stratification is considered. A sketch of the shear layer with relevant initialization parameters is shown in figure 1. The flow is constructed with a streamwise velocity field given by where U * denotes the velocity difference across the shear layer and δ * ω,0 = U * /(d u * /dz * ) max is the initial vorticity thickness. Note that a superscript * denotes a dimensional quantity while the · operator indicates horizontal averaging.
Motivated by atmospheric and ocean observations in which stratification extends beyond regions of shear (Fritts 1982;Smyth, Moum & Caldwell 2001), this work utilizes a spatially extensive stratification. The density profile is initialized by a time-invariant uniformly stratified background density profile (ρ * b ). The background buoyancy frequency of the ambient fluid (N * 0 2 ) has a constant value given by N *  Table 1. Parameters used to construct the computational domain: domain length (L x , L y , L z ), region with uniform grid spacing in the vertical direction (L z,sl ), number of grid points (N x , N y , N z ), grid spacing in the shear layer (Δ), grid spacing normalized by the smallest Kolmogorov length scale and peak wavenumber k 0 of the energy spectrum used to generate the initial velocity perturbations. The thickness of the upper TL (δ TL ) at late time is given in the last column.
The governing equations for this problem are the three-dimensional Navier-Stokes equations for an unsteady, incompressible flow with the Boussinesq approximation. A Cartesian frame of reference is used to represent the streamwise, spanwise and vertical coordinates such that spatial orientation and velocities are given by x i = (x, y, z) and u i = (u, v, w), respectively. The density equation is solved for the non-dimensional density deviation (ρ) from the uniform background. Similarly, the pressure (p), which denotes the deviation from the pressure which is in hydrostatic balance with the background density (ρ * b ), is solved for. The governing equations are non-dimensionalized using the following reference quantities: velocity difference ( U * ), initial vorticity thickness (δ * ω,0 ) and a reference value for density deviation ( ρ * = δ * ω,0 dρ * b /dz * ). The resulting non-dimensional equations are given by Non-dimensional parameters, namely the initial Reynolds number (Re), initial gradient Richardson number at the centre of the shear layer (Ri) and Prandtl number (Pr) are as follows: Here, ν * and κ * are the kinematic viscosity and thermal diffusivity, respectively. Chongsiripinyo & Sarkar 2018). The Williamson low-storage, third-order Runge-Kutta method is employed for time advancement while the discretization of spatial derivatives is achieved using a second-order, central finite difference scheme. The Poisson equation for pressure is solved using a parallel multigrid solver. The streamwise and spanwise directions have periodic boundary conditions. In the vertical direction, vertical velocity has a homogeneous Dirichlet boundary condition while homogeneous Neumann boundary conditions are enforced for horizontal velocity components, density deviation and pressure. In a sponge region near the vertical boundaries in the regions z > 10 and z < 10 (sufficiently far from the shear layer), a Rayleigh damping function gradually relaxes the density and velocities to their corresponding boundary values in order to damp propagating fluctuations and prevent reflections of features such as internal waves which have propagated far from the shear layer.
For this work, an isotropic grid is used in the central region of the shear layer with a grid spacing of x = y = z = Δ as indicated in table 1. The streamwise and spanwise grids have uniform spacing while mild stretching is used in the vertical outside the shear layer. Throughout the entire grid, the grid spacing is less than 2.2η, where η = (ν 3 /ε) 1/4 (ε denotes turbulent kinetic energy dissipation rate) is the Kolmogorov length scale, thus indicating appropriate resolution for capturing small-scale fluctuations. The number of grid points in the streamwise, spanwise and vertical directions are given by (N x , N y , N z ) and the domain extent is given by (L x , L y , L z ) as denoted in table 1. The computational domain is large enough to accommodate two wavelengths of the most unstable Kelvin-Helmholtz (K-H) mode in the streamwise direction and one wavelength in the spanwise direction. The sufficiently large spanwise domain is required in order to accurately capture the transition from two-dimensional K-H shear instability to three-dimensional turbulence and also to ensure good convergence of statistics. The spanwise domain size is about two times and the number of grid points is about four times larger than in the cases of Mashayek, Caulfield & Peltier (2013) with the equivalent Re. Due to the different characteristic length and velocity scales used in the non-dimensionalization of the governing equations, the Reynolds number of 24 000 in the present study is equivalent to Re = 6000 in . We choose this value of Re since their results suggest the influence of secondary instabilities on the rate of mixing is most profound when Re is at or larger than this value.
The flow is initialized using velocity perturbations for which the broadband spectrum is given by E(k) ∝ k 4 exp[−2(k/k 0 ) 2 ], where the peak value of E(k) is found at k 0 (values given in table 1) corresponding to the fastest growing mode of the K-H instability. The root mean square (r.m.s.) of each velocity component has a peak of 0.1 % U at the centre of the shear layer. The following shape function, F(z) = exp[−(z/δ ω,0 ) 2 ], is used to confine the fluctuations to inside the shear layer.

Linear stability analysis of K-H shear instability
In order to construct the computational grid, linear stability analysis (LSA) is performed to search for the fastest growing normal mode of the K-H shear instability. Stability theory of a stratified shear layer with a two-layer hyperbolic density profile suggests a condition of Ri < 0.25 for the shear instability to develop. The growth rate (σ ) of the fastest growing mode (FGM) is significantly reduced as Ri increases toward 0.25 in an analysis (Hazel 1972) which assumes that the fluid is inviscid and that the shear layer is located inside an infinite domain.
Here, the LSA is performed, taking into account the effect of viscosity, diffusivity and a finite domain, to examine the FGM when the shear layer is uniformly stratified. The theory and numerical implementation of the LSA is given in Smyth, Moum & Nash (2011). In the LSA, the Reynolds number, Prandtl number and domain size (L z ) have the same values as in the DNS. The grid spacing is z = 0.025. Free-slip and fixed buoyancy conditions are used at the top and bottom boundaries. The LSA of the two-layer profile is also performed for comparison. Figure 2(a,b) contrasts the mean profiles of the squared buoyancy frequency (N 2 ), squared rate of shear (S 2 ) and gradient Richardson number (Ri g ) in the two-layer shear layer to those in the linearly stratified shear layer. The same level of stratification at the centre of the shear layer, N 2 = 0.12, is used. Away from the centre, N 2 decreases to zero in the two-layer case while it remains constant in the other case. As a result, Ri g in the two-layer case is smaller than that in the case with linear stratification throughout the shear layer except at the centre where Ri g = 0.12 in both cases. With the same amount of mean kinetic energy, e.g. the same velocity profile, the potential energy barrier inside the shear layer is significantly higher in the case with linear stratification, thereby reducing the growth rate of the K-H shear instability.
Results of the LSA are shown in figure 2(c,d). In the two-layer case, the growth rate (σ ) is similar to that of Hazel (1972) in the region with low k and Ri. However, as k and Ri increase, the growth rate becomes slightly smaller than the value from Hazel (1972) due to the effect of viscosity, diffusivity and a finite domain. The location of FGMs (marked by a white line) in kδ ω,0 − Ri space indicates a significant difference between the two configurations. While the wavenumber of the K-H modes varies little with Ri in the two-layer case, the K-H modes show a larger value of k and thus a shorter wavelength as Ri increases in the case with uniform stratification. Figure 2(e, f ) contrasts the growth rate between the two cases for the five values of Ri used in the DNS to be discussed. At the same Ri, the growth rate of the K-H mode is higher in the two-layer case which suggests the K-H shear instability in the case with uniform stratification is weaker.

Statistical analysis
The velocity, density and pressure fields are decomposed using Reynolds decomposition into mean (horizontal average) and fluctuating components denoted by and , respectively. In later discussions, the turbulent kinetic energy (TKE) budget will be used to examine the routes to turbulence and is given by with TKE (K), production (P), dissipation (ε), buoyancy flux (B) and the transport term (T 3 ) specified as  (Hazel 1972) and Ri = k 2 (1/4 − k 2 /16) (Drazin 1958), respectively. Solid white lines in (c,d) indicate the location of the FGMs of the K-H instability.
In order to calculate mixing efficiency, numerous studies have used a method which quantifies available and background potential energy by sorting the density field (Winters et al. 1995). This method produces a bulk value of the mixing efficiency which is representative of the entire shear layer. Instead, we use the dissipation of the density field as a surrogate to the change in background potential energy such that the mixing efficiency (E) is given by where ε ρ is the dissipation rate of turbulent available potential energy (TAPE = g 2 ρ 2 /2ρ 2 0 N 2 0 ). Scotti & White (2014) have demonstrated that this method of computing the mixing efficiency produces accurate results for turbulence in a continuously stratified fluid. Furthermore, this method is able to provide the spatial variability of the mixing efficiency throughout the shear layer as opposed to a single bulk value.

Flow evolution
3.1. Routes to turbulence: K-H shear instability and secondary instabilities In a shear layer with inflectional shear it is understood that there is a strong primary instability in the form of a K-H shear instability. This instability manifests as a series of vortices which roll up over time (termed billows) and are connected by vorticity filaments (termed braids). As the K-H billows evolve, secondary instabilities develop throughout the shear layer (Mashayek & Peltier 2012a,b;Thorpe 2012;Arratia, Caulfield & Chomaz 2013;. In the following discussion we use the visualization of density and vorticity fields to illustrate that, as in the shear layer between two layers of constant density, the continuously stratified shear layer exhibits rich dynamics of secondary instabilities. In the present study we do not perform LSA for each type of instability since they are already discussed in depth in previous studies. Instead, we focus on the effect of the continuous stratification and the resulting mixing driven by the secondary instabilities. Our identification of secondary instabilities is based on visual inspection and comparison with previous work in the two-layer problem. It is also noted that pairing of K-H billows is not found at the high Re of the present study. At sufficiently high Re, secondary instabilities break down the billows before their co-rotation and pairing as was found by Pham & Sarkar (2010) when Re was increased to 5000 relative to their earlier work at Re = 1280.
The Ri = 0.04 case is used to outline the general development of the stratified shear layer, and differences between the various Ri cases are discussed thereafter. Figure 3 shows cross-sectional snapshots of the density (ρ) and spanwise vorticity (ω 2 = ∂u/∂z − ∂w/∂x). Specific snapshots in time are shown where the time (t) represents the non-dimensional time, S * 0 t * , where S * 0 is the initial centreline shear. The primary shear instability is evident in figure 3(a) in the form of two K-H billows. As the billows grow vertically, they extract kinetic energy from the shear layer. Once they reach the maximum vertical extent, the K-H billows spread in the streamwise direction and deform into elliptical shape similar to the observation by Arratia (2011). A spanwise variability also develops early to further disrupt the flow as indicated in the spanwise snapshots of ρ and ω in figure 3(a). The billow continues to develop horizontal motions and r.m.s. spanwise velocity fluctuations (not shown) increase in this case. Note that there is no broadband turbulence as yet in the development. As small-scale fluctuations are allowed to develop due to low viscosity (high Re), the billows begin to break down by secondary instabilities. A counter-rotating vortex pair denoting secondary convective instability (SCI) is seen at (x, z) ≈ (7, 0) in the streamwise snapshot of ω 2 in figure 3(b). Multiple occurrences of SCI develop in the eyelids of the two billows. The spanwise snapshot of ω 2 suggests the SCI is highly three dimensional unlike the two-dimensional K-H billows with little spanwise variability seen at the earlier time. Strong vortices which develop in the centre of the billow cores at (x, z) ≈ (3, 0) and (8,0) also contribute towards transition to turbulence. In the DNS with uniform stratification at Re = 10 000 and Ri = 0.05, Fritts et al. (2014) observed that secondary instabilities arise initially in the eyelid and the resulting turbulence spreads inward into the core. Here, we observe growth of SCI in the eyelids and strong vortices in the billow cores at the same time. Eventually, the breaking billows become localized  . Cross-sectional snapshots of the density (ρ) and spanwise vorticity (ω 2 ) fields for the Ri = 0.04 case at (a) t = 59, (b) t = 83 and (c) t = 86. From left to right, the columns show a streamwise cross-section of ρ at y = L y /2, spanwise cross-section of ρ along the dotted line shown in the first column, streamwise cross-section of ω 2 at y = L y /2 and a spanwise cross-section of ω 2 along the dotted line shown in the third column. In this plot and henceforth, x, y and z are noted to be the streamwise, spanwise and vertical coordinates, respectively, made non-dimensional using δ ω,0 . patches of turbulence with the majority of overturning occurring at the core of each billow as shown in figure 3(c).
The flow evolves differently at moderate stratification (0.08 Ri 0.16) with the K-H billows exhibiting smaller vertical growth followed by faster spreading in the streamwise direction and deformation into elliptical billows as illustrated in figure 4. Unlike the Ri = 0.04 case in which the SCI initiates the transition to three-dimensional turbulence, the streamwise snapshots of ρ and ω 2 in figure 4(a) indicates the secondary shear instability (SSI) also contributes to the transition. The negative-ω 2 vortices which grow along the braids in the regions 1 < x < 2 and 9.5 < x < 10.5 are indicative of SSI. As Ri increases, SSI overtakes SCI as the dominant mechanism that breaks down the K-H billows when contrasting figures 4(a) through 4(c). In Ri = 0.08-0.16 cases, vortices that grow at the top of the left billow are suggestive of secondary core deformation instability (SCDI). Mashayek & Peltier (2012b) found SCDI is associated with an observable inflation of the core. They also suggested that the growth rate of SCDI decreases with increasing Ri. The vortices that break down the left billow in figure 4(a-c) are also less energetic as Ri increases.
Beside SCI, SSI and SCDI, we also observe the development of localized core vortex instability (LCVI). When the core vortex bands grow and achieve a sufficiently large magnitude of ω 2 , LCVIs form at the tips of the vorticity bands. The LCVI vortices and the bands from which they develop have spanwise vorticity with sign opposite to the background shear-layer vorticity (Mashayek & Peltier 2012b). The appearance of LCVI is particularly apparent at high Re because the K-H billows roll-up faster and there is less time for the vorticity bands in the core to diffuse. The streamwise snapshot of ω 2 shows  . Cross-sectional snapshots of the density (ρ) and spanwise vorticity (ω 2 ) fields for the four cases with Ri 0.08. The column order is similar to figure 3: streamwise and spanwise cross-sections of ρ followed by streamwise and spanwise cross-sections of ω 2 . Note that the panels have differing aspect ratios and cover different ranges of x and z coordinates.
positive (red) vortices at (x, z) ≈ (6, 0) and (7, 0) in figure 4(a). The two vortices develop from the tips of the negative (green) vortex bands in the eyelid indicating the growth of a LCVI.
At the high stratification of the Ri = 0.2 case, the transition to turbulence is induced by SCDI and SCI. The multiple coherent vortices which develop in the lower half of the billow at (x, z) ≈ (6, −0.5) in the streamwise snapshots of figure 4(d) are indicative of SCDI. The vertical strips of positive density at y = 2 and negative density at y = 4.2 in the spanwise snaphot are indicative of SCI. The strips resemble mushroom-like structures, a typical morphology of convective instabilities. We note the use of a large aspect ratio in figure 4(d) which distorts the mushroom-like structures into the vertical strips. While the mushroom-like structures in the Ri = 0.04 case develop on the thin eyelid region of the K-H billows, the structures in this case with Ri = 0.2 are significantly larger and they penetrate across the entire vertical extent of the billow. The DNS of Fritts et al. (2014) at the same Ri but with Re = 10 000 shows the prevalence of secondary instabilities in the billow cores. The absence of SCDI in their study is possibly due to a low Reynolds number effect. There are multiple modes that have similar vorticity manifestation such as the stagnation point instability (SPI) and secondary vortex band instability (SVBI). In fact,  found that SVBI is a combination of both SPI and LCVI. Therefore, it is difficult to differentiate LCVI from SVBI without performing the stability analysis. Furthermore, the growth of secondary instabilities is highly sensitive to the initial broadband velocity fluctuations (Dong et al. 2019). We have simulated the Ri = 0.16 case with two different choices for the initial velocity perturbations: one where the energy spectrum peaks at k 0 = 1.13 and another with a peak at k 0 = 1.24. The horizontal domain length is chosen to accommodate two wavelengths of the most energetic mode in the initial velocity perturbations. While the evolution of the shear layer is statistically similar between the two cases, we find SPI to develop in only the former simulation with k 0 = 1.13 and not the latter. The SPI manifests as a single vortex which develops at the stagnation point in the braid. Overall, with regard to the secondary instabilities, the transition to turbulence in the shear layer with uniform stratification is as dynamically rich as that observed in the case with a two-layer density profile. Readers should be aware that there are other secondary instabilities that have not been observed in either  or the present study such as knot and tube instabilities (Thorpe 2012).

Effect of stratification on the growth of shear-layer thickness
The visualization of the evolving shear layer suggests that the thickness of the shear layer varies significantly with the stratification. Here, we quantify the thickness of the shear layer through two quantities: (1) the momentum thickness (I u ) and (2) the layer bounded by the outer edges of the TLs denoted by L sl . The first quantity provides an integral length scale which is used to compute non-dimensional numbers such as the bulk Richardson number, an important parameter typically used in the parameterization of shear-driven turbulence (Smyth & Moum 2000;Mashayek, Caulfield & Peltier 2017a). In contrast, the second quantity includes the mixing region at the edges of the shear layer. In the present study we identify the TL as a region with enhanced stratification (i.e. N 2 /N 2 0 > 1) formed at the edge of the shear layer due to vertical turbulent transport from the core of the shear layer to its edge. Figure 5 indicates a significant difference between the two quantities with L sl up to 60 % larger than I u in the Ri = 0.16 case at late time. In this section we focus the discussion on I u and defer consideration of L sl to the next section.
The momentum thickness, which is defined as has the imprint of distinct flow regimes. At early time (approximately 30 < t < 60 in the Ri = 0.04 and Ri = 0.08 cases, and 50 < t < 90 in the Ri = 0.12 case), the shear layer thickens rapidly due to the enlargement of the K-H billows. In all cases, except for the Ri 0.16 cases, there is a period where the shear layer briefly shrinks or stops growing before resuming its growth (approximately 60 < t < 80 in the Ri = 0.04 and Ri = 0.08 cases, and 100 < t < 110 in the Ri = 0.12 case). At Ri = 0.12, the layer does not contract significantly as in the Ri = 0.04 and Ri = 0.08 cases but its growth pauses. The contraction of the shear layer persists longer at smaller Ri and is not seen at all in the Ri 0.16 cases indicating a key link of stratification to this flow feature. The period of contraction occurs during the transition from flow dominated by two-dimensional K-H rollers to fully three-dimensional turbulence. After its contraction or pause, I u resumes growing before it asymptotes to a constant value at late time. At this time, buoyancy effects become sufficiently strong to cause turbulence decay and the shear layer can no longer thicken. Overall, the rate of growth decreases with increasing Ri. The ultimate thickness of the shear layer is also much smaller at higher Ri, e.g. barely 1.3 times its initial value in the Ri = 0.2 case. The contraction of the shear layer is linked to the energetics of turbulence. The momentum thickness, defined by (3.1), involves the mean kinetic energy (MKE) defined asK = ( u 2 )/2. Therefore, the change in momentum thickness is given by From the Navier-Stokes equation, the evolution of MKE is governed by with viscous dissipation (ε) and the transport term (T 3 ) specified as The turbulent production (P), which is defined previously in (2.4), acts as a transfer term between the MKE and TKE budgets. Equations (3.2) and (3.3) are combined to yield where the small contribution of the viscous dissipation of MKE inside the shear layer as well as the small transport term at z = ±10 have been neglected. During the contraction of the shear layer when dI u /dt < 0, MKE increases due to negative production as per (3.5). Figure 6 illustrates the relationship between the MKE and P during the contraction period. As the K-H billows develop, they transport heavy fluid upward and light fluid downward which stirs the density gradient. Consequently, density increases in the upper half of the shear layer while it decreases in the lower half, as shown in figure 6(a). As the shear layer contracts, the denser fluid in the upper half of the shear layer is displaced downward, releasing the available potential energy that was previously gained. During this period, the MKE shown in figure 6(b) increases notably at the edges of the shear layer (z = ±1). At the time when the shear layer begins to contract marked by the first vertical dotted white line in figure 6(c), the momentum flux (− u w ) changes sign from negative to positive values signifying a counter-gradient momentum transport (CGMT). The CGMT occurs when the momentum flux does not follow the mean velocity gradient (Hussain 1986;Moser & Rogers 1993). Prior to the contraction, both the momentum flux and the mean velocity gradient have negative values, so the transport is down-gradient and the shear production is positive. In contrast, while the mean velocity gradient remains negative during the contraction, the momentum flux is counter-gradient and, thus, the negative production. After the contraction, the momentum transport reverts back to down-gradient and the production has positive values. Gerz & Schumann (1996) suggested that the energy of CGMT motions is provided by conversion of available potential energy to kinetic energy in homogenous stratified shear flows. Takamure et al. (2018) also found CGMT and negative production to occur in coherent vortices which develop during the transition from laminar to turbulent regimes in an unstratified mixing layer. It should be noted that contraction and CGMT were not observed in the DNS of a uniformly stratified shear layer at Re = 5000 and Ri = 0.05 (Pham & Sarkar 2010). It is interesting that the higher Re = 24 000 in the present study would enhance the CGMT.  Komori & Nagata (1996) showed that CGMT in stratified flows is often accompanied by counter-gradient buoyancy flux (CGBF). We also observe CGBF during the contraction period as shown in figure 6(d). Prior to the contraction, the buoyancy flux (B) is negative throughout the shear layer with a peak value locating at the centre of the shear layer. Once the shear layer contracts, the buoyancy flux switches sign from negative to positive values. The positive B concentrates near the edges of the shear layer with the centre of the layer having smaller positive values. The positive B supports the growth of SCI during the transition to turbulence. To better understand the roles of CGBF on the transition, figure 7 compares streamwise snapshots of the density and buoyancy flux before and during the contraction. As previously discussed, the primary K-H instability grows vertically until the potential energy barrier becomes too large, at which point, the billow contracts vertically and expands laterally in the streamwise direction. The deformation of the billows occurs coherently without inciting broadband turbulence. The change in vertical extent is clear between figures 7(a) and 7(d). Before the contraction at time t = 66, the instantaneous field of the buoyancy flux in figure 7(b) shows regions of both positive and negative values due to the stirring by the K-H billows. When averaged over the horizontal (x-y) plane, the net buoyancy flux (B) has negative values as shown in figure 7(c). As the K-H billows deform during the contraction at time t = 78, the spatial distribution of the buoyancy flux changes significantly. The regions with positive values in the billows extend wider than that with negative values. As a result, the horizontal averaged values become positive as shown in figure 7( f ). A wider area of positive buoyancy flux implies a higher tendency for SCI to grow. For example, the patch of positive buoyancy flux at (x, z) ≈ (8, 0.5) in figure 7(e) induces the growth of the counter-rotating vortex pair at (x, z) ≈ (7, 0) shown previously in figure 3(b). It is important to note that strong stratification inhibits CGMT and CGBF since the contraction does not occur in the Ri 0.12 cases. This is consistent with the finding of  who suggested the role of SCI becomes less important than SSI during the transition to turbulence at large Ri.

Effect of stratification on TKE budget and mixing efficiency
The evolution of depth-integrated terms in the TKE budget (figure 8) provides a comparison of turbulence energetics among the cases. The time for development of turbulence is substantially larger for the larger Ri, which is qualitatively consistent with the decrease of maximal growth rate with increasing Ri, shown previously in figure 2. The turbulent production P exhibits multiple peaks as time progresses, e.g. two distinct peaks in the cases with Ri 0.12. When the stratification is weak as in the Ri = 0.04 and Ri = 0.08 cases, the integrated production (figure 8a) has significant negative values and the buoyancy flux (figure 8b) has significant positive values due to the CGMT and CGBF during the time interval of shear-layer contraction discussed in the previous section. It is after the contraction period that the shear layer becomes fully turbulent and the integrated ε in figure 8(c) increases sharply. The largest peak value of integrated ε occurs in the Ri = 0.04 case, while the peak values are comparable in the three cases with 0.08 Ri 0.16. The peak value decreases significantly in the Ri = 0.2 case. Unlike the dissipation rate, the dissipation rate of the potential energy (ε ρ ) in figure 8(d) develops earlier and during the contraction period. Noting that ε is insignificant during the contraction period, the flux coefficient has a high value (up to 0.7) during this period due to fine-scale coherent density structures inside the K-H billows. As previously shown, during the deformation of the K-H billows, density filaments/wisps inside the billows become significantly thinner. The filaments sharpen the density gradient in the shear layer down to the diffusive scale where it is dissipated by molecular diffusion. Interestingly, turbulence does not have a role in the mixing during this period despite the high Reynolds number. Furthermore, the peak values of ε ρ are comparable among the Ri 0.16 cases unlike ε.
The cumulative (time-and space-integrated) TKE budget terms (P, B, and ε) and the dissipation rate (ε ρ ) of the TAPE are shown in table 2. There is a large decrease of cumulative P, by approximately a factor of eight, with increasing Ri. The other quantities also decrease, but not proportionally, for instance, P/ increases from the weakly stratified value of 1.5 to about 1.7 in the more stratified cases indicating that stratified shear-driven turbulence (under conditions which support its formation) is somewhat more dissipative than its neutral counterpart. The ratio of −B/ , known as the flux Richardson number (Ri f ), peaks in the Ri = 0.12 case similar to the flux coefficient (Γ C ). There is significant difference between reversible and irreversible mixing since Ri f is substantially larger than Γ C . The difference, which is approximately 27 % in the Ri = 0.04 case, decreases with increasing Ri.
We move to the mixing efficiency and examine its variation among the simulated cases. Figure 9(a) shows the cumulative mixing efficiency (E C ), obtained by integrating ε ρ and ε over the time duration of the simulations. The maximum value of E C is approximately equal to 0.33 and occurs in the Ri = 0.12 case. The Ri = 0.08, 0.16 and 0.2 cases have slightly smaller values. The smallest value of 0.23 occurs in the case with weakest stratification Ri = 0.04. Relative to the results of  (denoted in figure 9 as MCP13) for the two-layer case, the peak value of E C in the present study is smaller. A key result of  is that there exists a narrow range of Ri for which the mixing efficiency is optimal. They report optimal mixing to occur at Ri = 0.16. Unlike their study, we do not find a narrow range of optimal efficiency. The mixing efficiency is similar for a wide range of Ri between 0.08 to 0.2. Furthermore, the   values of E C are significantly larger in the two-layer problem, by almost 50 % at Ri = 0.16.
There are a few reasons for the smaller mixing efficiency in the present study. First, the K-H billows in the present study with stratification external to the sheared zone do not grow as large as with the two-layer density profile. Second, the use of the larger spanwise domain here allows secondary spanwise instabilities to break down the K-H billows so that the billows cannot grow as large. It is noted that the larger spanwise domain allows the seeding of broadband velocity fluctuations at the initial time to include longer-wavelength spanwise perturbations which also influence the breakdown (e.g. see  Osborn (1980). the spanwise snapshots in figure 3a). The study of Kaminski & Smyth (2019) indicates that strong turbulence in the shear layer at initial time can reduce the growth of K-H billows. Smaller K-H billows result in less available potential energy which is important for the subsequent turbulent mixing.
The mixing efficiency in the stage of three-dimensional turbulence (E C 3d ) is found by starting the integration from the time of peak integrated dissipation rate in figure 8(c). The maximum value of E C 3d of 0.29 occurs in the Ri = 0.12 and 0.16 cases and is slightly smaller than the value seen in the two-layer problem. In the simulations with Ri ≥ 0.08, when integrated across the shear layer and over time, the net dissipation rate decreases faster as Ri increases than the net scalar dissipation rate. The low mixing efficiency seen in the Ri = 0.04 case is due to the uniquely high TKE dissipation rate. It is noted that the cumulative mixing efficiency (E C ) and that due to fully developed turbulence (E C 3d ) are not dramatically different as reported in . In other words, the mixing efficiency induced by the rich dynamics of the secondary shear instabilities during the transition to turbulence is only as significant as the subsequent fully developed turbulence. Nonetheless, both measures of the mixing efficiency are significantly larger than the upper-bound value of 1/6 proposed by Osborn (1980). While the mixing efficiency E C is similar between the Ri = 0.08 and 0.2 cases, the integrated dissipation and scalar dissipation rates listed in table 2 are approximately seven times smaller in the case with stronger stratification. A large value of E C does not imply large net mixing by K-H billows or turbulence.

The transition layer
As illustrated in the previous section, shear instabilities and the resulting turbulence transport a significant amount of momentum and energy toward the edges of the shear layer. These turbulent fluxes induce the formation of a TL in which the local stratification N 2 (z) and shear S(z) peak. Figure 9 suggests turbulent mixing in the TL can influence the overall mixing efficiency across the entire shear layer. When the dissipation and scalar dissipation rates are integrated over the shear-layer momentum thickness, i.e. ±I u /2, both E C and E C 3d have smaller values than when they are integrated over the shear-layer thickness defined by the TL (L sl ). Including the TL physics into the computation of mixing efficiency can increase E C 3d by 17 % in the Ri = 0.04 case. Furthermore, internal waves are generated inside the TL and it is unclear how the wave excitation affects the mixing efficiency (Watanabe et al. 2018). Therefore, it is important to understand how turbulence and wave physics in the TL influence the mixing efficiency and its parameterization.
While the mixing efficiency is significant (E C = 0.28) in the Ri = 0.2 case, the net dissipation and scalar dissipation rate are considerably smaller than in the other cases as listed in table 2. Because of the lack of vigorous mixing in the Ri = 0.2 case and, for brevity, we exclude this case from the following discussion of the TL.

Development of the TL
Each of the two edges of the shear layer has a TL. For the purposes of this work, the boundaries of a TL are defined using the normalized squared buoyancy frequency, N 2 /N 2 0 , whose evolution is shown in figure 10. The inner and outer boundaries of the TL are demarcated by N 2 /N 2 0 = 1 such that the interior of the TL has N 2 /N 2 0 > 1. Since the layer of enhanced N 2 first develops near the centre of the shear layer during the early stage of growing K-H instability, we use I u /2 to mark the inner boundary of the TL. Note that the location of maximum N 2 /N 2 0 (magenta dashed line in figure 10) varies; it is closer to the inner boundary of the TL at early time and located more centrally between the two boundaries at late time. There is a sharp increase of N 2 /N 2 0 in the lower half of the TL before and during the growth stagnation/contraction regime for the Ri ≤ 0.12 cases. In all cases, as the flow transitions from being dominated by two-dimensional instabilities to fully three-dimensional turbulence, there is a time period when the peak N 2 /N 2 0 becomes smaller than before (approximately 80 < t < 110 in the Ri = 0.04 and 0.08 cases, 110 < t < 150 in the Ri = 0.12 case and 140 < t < 170 in the Ri = 0.16 case). After turbulence decays at late time, N 2 /N 2 0 increases and concentrates at the centre of the TL. The overall value of N 2 /N 2 0 decreases with strengthening background stratification (N 2 0 ) among cases due to the decreased turbulent mixing of momentum in the central shear layer when N 2 0 is increased. At late time, the flow has arranged itself into layers with varying N 2 /N 2 0 . Take the Ri = 0.16 case of figure 10 in which this is most evident. As seen in the vertical profile panel to the right of figure 10(d), at late time (t ≈ 250), there is a region at the centre of the shear layer where N 2 /N 2 0 ≈ 1 above which is a layer with N 2 /N 2 0 < 1. Moving outwards from this layer, there is a region of moderate N 2 /N 2 0 before reaching the maximum value in the centre of the TL of N 2 /N 2 0 ≈ 1.6. At the outer edge of the TL, N 2 /N 2 0 is reduced and values of O(1) are seen outside of the TL. In general, as background stratification increases, the TL becomes thinner as listed in table 1. The thickness is approximately half of the difference between the momentum thickness I u and the shear-layer thickness L sl shown previously in figure 5.
The evolution of the normalized squared rate of shear (S 2 /S 2 0 , where S = ∂ u /∂z) for all simulated cases is given in figure 11 where the lines bounding the TL are also shown. The TL develops as shear is reduced inside the shear layer by the influence of K-H instabilities extracting kinetic energy from the mean flow at early time. However, at the peripheries of the shear layer, shear becomes elevated as turbulence induces momentum transport away from the centre of the shear layer outwards. As such, the strongest S 2 /S 2 0 at late 916 A42-19 time is located in the TL, close to its inner boundary, with TL shear intensity among cases increasing with strengthening N 0 . At early time in all cases, a region of strong shear directly corresponds to the region of large N 2 /N 2 0 in the TL. The previously discussed reduction in N 2 /N 2 0 coincides with a brief reduction in shear in the Ri = 0.04 and Ri = 0.08 cases. In the more strongly stratified cases there is less significant reduction in shear. At late time in the highly stratified cases, S 2 has a layered structure similar to N 2 . The panel to the right of figure 11(d) shows the centre of the shear layer to have a region of moderate shear bounded by a layer of weaker shear. Farther from the centre, S 2 /S 2 0 increases to a peak value of approximately 0.23 before becoming negligible outside the TL. Figure 12 shows the gradient Richardson number (Ri g = N 2 /S 2 ) which is a measure of the balance between buoyancy and shear. In all cases, the inner portion of the TL has lower Ri g than the outer half indicating that the inner portion is more influenced by effects of shear. As the flow evolves, turbulence mixes the density and momentum fields and Ri g increases exceeding the critical value of 0.25 from linear stability theory (Hazel 1972). In all cases, this behaviour is observed within the TL with Ri g beginning small and eventually becoming much larger than Ri c . At late time, the interior of the shear layer is dominated by Ri g > 0.5 in all cases except for the Ri = 0.04 case in which Ri g takes values between Ri c = 0.25 and 0.5. In the Ri = 0.12 case intermittent layers of Ri g > 0.75 and Ri g > 1 are observed, in contrast to the two-layer simulations of  who noted that Ri g ≈ 0.5 across the entire shear layer at late time in their comparable simulation. The higher Ri g found here is further evidence of the difference in the distribution of S 2 and N 2 between the present case of uniformly stratified background and the case with two constant-density layers.
The development of small-scale fluctuations as the flow becomes fully turbulent leads to a rapid increase in the dissipation rate (ε in figure 13). The time of peak ε, seen when the K-H billows breakdown to three-dimensional turbulence, is delayed with increasing background stratification (N 2 0 ) as follows: t ≈ 100 in the Ri = 0.04 and Ri = 0.08 cases, t ≈ 126 in the Ri = 0.12 case and t ≈ 145 in the Ri = 0.16 case. Furthermore, as N 2 0 increases among cases, ε tends to be elevated in the TL at late time. Dissipation is strongest at the periphery near the inner boundary of the TL due to the evolving late-time instabilities.

Internal wave flux across TLs
Simulations by Watanabe et al. (2018) at Re = 6000 and Ri up to 0.08 suggest the internal wave flux, p w , at the TNTI to be strong. They report that the wave energy flux at the TNTI can be comparable to the dissipation in the shear layer. It is of interest to compare the role of p w at the shear-layer edge as the Reynolds number increases from 6000 to 24 000. Figure 14(a) illustrates the evolution of p w in the Ri = 0.08 case in which its magnitude is largest during the transition from two-dimensional K-H billows to three-dimensional turbulence. As the billows grow, they create perturbations in the pressure and velocity fields that extend beyond the boundaries of the shear layer (denoted by the dashed lines in figure 14a). The perturbations generate evanescent waves whose amplitude decays exponentially with the distance away from the shear layer. The wave flux is initially positive in the upper shear layer and negative in the lower shear layer, and as a result, TKE is transported away from inside the shear layer to the outside during the growth of the K-H billows. It is noted that, since the waves are evanescent, energy does not propagate into the far field. As the shear layer grows in size, the energy that was previously transported outside contributes to the turbulent mixing in the TL. The internal wave flux in the TL is significantly weaker when the shear layer is turbulent. The role of the wave flux is further examined by integrating the TKE budget from the outer edge of the lower TL to that of the upper TL. The result is shown in figure 14(b) where p w I = − p w up + p w low denotes the net wave energy that crosses the upper (up) and lower (low) TL interfaces. During the growth of the billows, waves transport energy outside the shear layer. As the shear layer becomes turbulent, the wave flux changes sign which deposits energy from the outside into the shear layer. During the period of turbulence, the peak inflow of the wave energy is approximately 33 % of the peak integrated dissipation rate, 22 % of the peak integrated production and 70 % of the peak integrated buoyancy flux. The wave flux seen in the present higher-Re study is smaller than the value reported by Watanabe et al. (2018).

Mixing efficiency and its parameterization
Since mixing efficiency (E) and flux coefficient (Γ ) have important applications in ocean measurements and modelling, there has been a sustained effort in parametrizing such problems. It has been shown that the variability of E can be described well by the buoyancy Reynolds number (Re b = ε/νN 2 ) or the turbulent Froude number (Fr = ε/NK) for homogenous stratified turbulence driven by uniform shear in an unbounded domain with ). The mixing efficiency E is found to also depend on Pr and Ri . Salehipour et al. (2016) proposed a mixing parameterization that is based on Re b and Ri while Mashayek et al. (2017b) suggested a parameterization that only relies on Re b . In the following discussion we examine the spatial variability of E in the present configuration of a localized shear layer in a uniformly stratified fluid and discuss the results in light of the parameterization schemes proposed in the aforementioned studies. Since Pr = 1 here, only the dependence on Re b , Fr and Ri are to be explored.
We find that the strong TL associated with the present configuration plays an important role through its mixing during the later-time period of decaying TKE. The significant turbulent activity in the TL is illustrated in figure 15 for the four cases with Ri 0.16. Small-scale shear instabilities can be clearly seen in the vorticity field at times when the integrated turbulent dissipation across the shear layer is larger than the integrated production. These later-time shear instabilities grow and persist in the TL at the upper and lower edges of the shear layer and they contribute significantly to the bulk mixing in the shear layer. Therefore, it is critical for parameterization schemes to capture their effect. Figure 16 shows the evolution of the mixing efficiency (E) given by (2.6a,b) with the boundaries of the TL also depicted. Overall, E is much higher throughout the shear layer as K-H billows are forming. As they break down into turbulence, strong E is seen at the outer boundary of the TL while the core of the shear layer becomes relatively quiet with low E. This large E occurs after the secondary peak in production in the low Ri cases and is associated with the concentration of TKE in lobed structures. The inner boundary of the TL has relatively low E. As stratification increases and buoyancy effects suppress vertical motions, E becomes smaller at early time as can be seen by comparing the Ri = 0.16 and Ri = 0.04 cases in figure 16. At late time, the majority of high-efficiency mixing occurs within or above the TL.
To parametrize mixing efficiency, it is necessary to relate a bulk mixing efficiency to the bulk values of Reynolds number, Froude number and Richardson number. The time-dependent bulk values are obtained by integration across the shear layer from the outer edge of the bottom TL to that of the top TL. This choice of integration domain (thickness denoted by L sl ) encompasses the spatial region of significant turbulent dissipation. We also focus the analysis on the temporal period with significant mixing, namely the regime of fully developed turbulence which commences after the time of the peak integrated ε, similar to Salehipour et al. (2016) and Mashayek et al. (2017b). In the discussion to follow, the bulk mixing efficiency (E 3d ), bulk buoyancy Reynolds number (Re b ), bulk turbulent Froude number (Fr) and bulk Richardson number (Ri b ) are computed as follows: Here ρ sl and U sl denote the density and velocity change across the spatial integration domain.
The dependence of mixing efficiency (E 3d ) on Re b is shown in figure 17(a). Shih et al. (2005) suggested three regimes of turbulent mixing: an energetic regime (Re b > 10 2 ), The flux coefficient (Γ 3d ) is also often used to quantify mixing. From mixing efficiency, the flux coefficient can be computed as Γ 3d = E 3d /(1 − E 3d ). Figure 17(b) shows the flux coefficient also varies with Re b in three distinctive regimes similar to the mixing efficiency. The peak value of Γ 3d is approximately 0.43 which is more than twice larger than the upper-bound value of 0.2 suggested by Osborn (1980). Furthermore, Γ 3d remains larger than 0.2 over the entire lifespan of turbulence in the cases with Ri 0.08. The flux coefficient Γ 3d peaks at a smaller value of 0.32 in the Ri = 0.04 case and decreases to below 0.2 in the diffusive regime.  for Re b > 10 2 in the Ri = 0.04 case. The solid black and red lines in panel (b) denote the upper and lower bounds, respectively, of the parameterization proposed by Mashayek et al. (2017b). Shih et al. (2005) indicated that the flux coefficient decreases as Γ 3d = 2Re −1/2 b in the energetic regime. We also find that Γ 3d ∝ Re −1/2 b in the Ri = 0.04 case although its value is substantially larger here leading to a proportionality coefficient of 4 as shown in figure 17(b). During the intermediate regime, the flux coefficient remains relatively constant with a value ranging from 0.33 in the Ri = 0.04 case to approximately 0.43 in the other three cases. These values are larger than that of 0.2 reported in Shih et al. (2005). While Shih et al. (2005) asserted that turbulent mixing in the diffusive regime is driven mainly by molecular diffusivity and independent of Re b , the turbulent mixing in the present study is significantly higher than the molecular counterpart and, indeed, varies with Re b when 7 ≤ Re b ≤ 40. Nonetheless, the mixing convention of Shih et al. (2005) is kept in the present study for ease of comparison.
The upper and lower bounds for Γ 3d suggested by Mashayek et al. (2017b) are also included in figure 17(b) for comparison. Inherently, the dependency of the flux coefficient on Re b in the present study agrees better with the suggested parameterization using homogenous stratified turbulence in Shih et al. (2005) than the one in Mashayek et al. (2017b). In the latter scheme, Γ 3d peaks at a transitional Re b with values ranging from 100 to 300. As Re b increases in the shear-dominated regime or decreases in the buoyancy-dominated regime, Γ 3d decreases similarly at the same rate. The Ri ≤ 0.16 cases in the present study indicate Γ 3d peaks at Re b ≈ 10 2 and, furthermore, it does not decrease similarly as Re b deviates from the transitional value. The Ri ≤ 0.16 cases shows the presence of an intermediate regime (40 Re b 10 2 ) in which Γ 3d remains relatively constant at values closer to the upper bound than the lower bound given by Mashayek et al. (2017b). We note that the parameterization suggested in Mashayek et al. (2017b) is based on both DNS of a shear layer with a two-layer density profile and observational data collected at sites where stratification and shear profiles are space-filling unlike their DNS set-up. It is unclear how the disparity between the DNS and the observational data influences the suggested parameterization. Nonetheless, the Re b dependence of Γ 3d seen in the present study suggests further evaluation of the parameterization.
Beside Re b , the turbulent Froude number (Fr(t)) can be a well-suited parameter for mixing parameterization of homogenous stratified turbulence (Ivey & Imberger 1991;Shih et al. 2000;Howland, Taylor & Caulfield 2020). The metric shows the competition between turbulent time scale (K/ε) and buoyancy time scale (N −1 ) so it can be used to describe the local state of stratified shear turbulence. For weakly stratified turbulence (Fr > 1), the flux coefficient decreases as Fr −2 while it remains relatively constant for strongly stratified turbulence (Fr < 1) (Garanaik & Venayagamoorthy 2019). In the present study we also observe two regimes of Fr as shown in figure 17(c). In the Ri = 0.04 case the two regimes are delineated by Fr ≈ 0.2, a value somewhat smaller than unity. As Fr increases or decreases from this value, Γ 3d decreases which suggests the optimal rate of mixing occurs at Fr ≈ 0.2 in this case. The other four cases with Ri 0.08 show Γ 3d also peaks at Fr less than 0.2 and it decreases as Fr decreases. Our results agree well with Maffioli et al. (2016) who, in homogenous stationary stratified turbulence DNS, found optimal mixing at ε/Nu 2 = 0.3, which converts to Fr = ε/NK = 0.2 upon taking u 2 = 2K/3. The bulk Richardson number (Ri b (t)) is also used as a descriptor of the state of stratified shear turbulence and a quantity to potentially correlate the flux coefficient. Figure 17 The dependence on Ri b shows a qualitative change among the different Ri cases and it is not possible to infer a common correlative trend. It is critical to emphasize that efficient turbulent mixing persists even at values of Ri b as large as 1.2, approximately two times larger than with the two-layer density profile. This is a direct consequence of strong turbulent activity in the sheared TLs at the edge of the shear layer which expands the vertical extent of the stratified region with active mixing. Salehipour et al. (2016) proposed a mixing parameterization in which the maximum mixing efficiency occurs at Ri b = 0.4 and turbulent mixing cuts off at Ri b = 1. The cases with Ri 0.08 in the present study shows that the mixing efficiency and flux coefficient peak as Ri b reaches values as large as 1.2. Clearly, the formation of TLs modifies the range of Ri b for which turbulent mixing is important. Therefore, it is important for any Ri b -based mixing parameterizations to include TL dynamics.
A choice for the turbulent diffusivity (K ρ ) provides a turbulence closure for the buoyancy flux since, by definition, K ρ = B/N 2 . Retaining only the irreversible part of the buoyancy flux leads to K ρ = ε ρ /N 2 = Γ 3d ε/N 2 . For Pr = 1, it follows that K ρ = Γ 3d κ Re b . Osborn (1980) chose Γ 3d = 0.2 leading to a linear dependence of K ρ = 0.2κ Re b on the buoyancy Reynolds number for all values of Re b . To assess the Osborn parameterization, a bulk value of turbulent diffusivity is computed from the simulation   Osborn (1980) with the assumption of Pr = Pr t = 1. The dashed magenta lines in panels (a,c) indicate the parameterization, K ρ /κ = K m /ν = 2Re 1/2 b for Re b > 10 2 , from Shih et al. (2005) with the same assumption of Pr t = 1. data as K ρ (t) = ε ρ dz/ N 2 dz where the z-integration is over L sl . Figure 18(a) shows the turbulent diffusivity to exhibit piecewise dependence on Re b similar to the piecewise dependence of Γ 3d on Re b . Disregarding the decrease of Γ 3d in the diffusive regime, K ρ exhibits a linear dependence on Re b for Re b 10 2 similar to Osborn's model. However, the model underestimates K ρ which suggest a higher constant value of the flux coefficient can improve said model. In the energetic regime (Re b > 10 2 ), K ρ is found to be proportional to Re 1/2 b similar to the results of Shih et al. (2005) although the proportionality coefficient is larger in the present study. It should be noted that  also found the Re b -dependence for K ρ in shear layers with a two-layer density profile to be similar to the result in Shih et al. (2005), e.g. see their figure 5.
Taking into account the effects of both Re b and Ri b , it is found that K ρ can be parametrized based on the product of Re b and Ri b as shown in figure 18(b). The choice of Re b Ri b ≈ ε/νS 2 is equivalent to using S as a characteristic inverse time scale rather than N in the mixing parameterization. This parameterization scheme is promising because it prescribes K ρ over the entire range of Re b , unlike the piecewise dependence observed when only Re b is used. It is noted that the bulk parameters in the present study do not extend sufficiently into the energetic regime (only the Ri = 0.04 case includes a large range of Re b ) and additional simulations at a higher Reynolds number are required to test whether this parameterization works well for the energetic regime.
We now move to the turbulent viscosity (K m ) and its parameterization. By definition of the turbulent viscosity, it follows that K m = P/S 2 , where P is the turbulent production. The equilibrium assumption for the TKE equation is used to write P = ε + B and the part of B responsible for irreversible mixing (ε ρ ) is retained so that the turbulent viscosity becomes K m = (ε + ε ρ )/S 2 . A bulk value of K m (t) is computed here using bulk (integrals over L sl ) values of ε, ε ρ and S 2 . Figure 18(c) shows that K m cannot be described by the Osborn model in this flow. Results from Shih et al. (2005) for homogeneous shear flow indicate that Pr t = 1 in the intermediate regime and decreases in the energetic regime (where Ri b is also low) to Pr t ≈ 0.6. In the present study Pr t exceeds 2.5 in all simulated cases for all mixing regimes. These higher values of Pr t are related to the higher Ri b in this problem. For example, Ri b increases from its initial value of 0.04 to 0.5 during 30 < t < 60 when the flow transitions to turbulence and into the energetic regime. Thus, unlike homogenous shear flow, the energetic regime with large Re b is accompanied by a significant increase of Ri b in this flow and, consequently, Pr t ∼ Ri b /E 3d is large even in the energetic regime. Furthermore, once buoyancy becomes sufficiently strong to bring Re b down to ≈100, Pr t (t) increases with decreasing Re b (t) because E 3d commences a decrease from its peak value.

Discussion and conclusions
In the present study DNS of a shear layer with uniform density stratification were performed to investigate turbulence and mixing at a Reynolds number (Re) of 24 000, a high value for DNS. The stratification in the chosen problem is extensive or domain filling in contrast to the well-studied case of continuous stratification between two layers, each with constant density, where the stratification is spatially compact and limited to the zone with shear. The background stratification (N 0 ) is varied over a wide range to examine the effects of buoyancy, Ri = 0.04-0.2, where Ri is the gradient Richardson number at the shear-layer centre. The simulation results are analysed to highlight the important effects of the uniform stratification outside the shear layer on the evolution of shear-layer turbulence. Furthermore, the results on mixing are contrasted with those of homogenous stratified shear turbulence (e.g. Shih et al. 2005) and of shear-layer turbulence between two layers of uniform density (a canonical two-layer density profile, e.g. Salehipour et al. 2016;Mashayek et al. 2017b) as well as the mixing parameterization scheme of Osborn (1980). Kelvin-Helmholtz shear instabilities develop in all cases. After the K-H instability develops, a myriad of secondary instabilities are seen to follow. The dynamics of secondary instabilities are as rich as those that arise in the stratified shear layer with a two-layer density profile. The secondary instabilities are driven by both enhanced shear in the eyelids and in the braids between K-H billows as well as by convection due to the unstable density gradient associated with the roll-up of the billows. Secondary convective instability is seen in the core as well as the eyelids of the K-H billows. As the secondary instabilities develop, the shear layer contracts, as indicated by the decrease of momentum thickness, for a short time in the Ri = 0.04 and Ri = 0.08 cases, a feature that has not been observed in similar simulations at lower Re (Pham et al. 2009;Pham & Sarkar 2010;Watanabe et al. 2018). Counter-gradient momentum transport (CGMT) and counter-gradient buoyancy flux (CGBF) occur during the contraction period leading to negative shear production and an increase in the MKE. Strong stratification tends to inhibit the growth of CGMT and CGBF since they do not occur in the Ri 0.12 cases.
The shear and stratification in the sheared zone evolve to profiles which are different from those found in the canonical two-layer problem.  and Salehipour et al. (2016) find that the largest value of the evolving bulk Richardson number Ri b (t) in the problem with a two-layer density profile is 0.5. Here, Ri b (t) can reach up to 1.2. Furthermore, multiple layers with differing Ri g including Ri g > 1 form.
While the rich dynamics of the secondary instabilities in the present case with domain-filling stratification are similar to those for a two-layer density profile, the mixing efficiency in the present study is significantly different.  who used a two-layer density profile found a narrow range of Ri with an optimal rate of turbulent mixing, and the cumulative mixing efficiency (E C based on the mixing over the entire flow evolution) peaks at a value of 0.45 when Ri = 0.16. For the present case of uniform stratification, we find E C to be considerably smaller (approximately 0.33) in the cases with Ri 0.08. Also, E C remains relatively constant among these cases suggesting a much wider range of Ri for optimal turbulent mixing.
A TL with elevated local stratification and shear forms at each edge of the shear layer owing to vertical turbulent fluxes which transport mass and momentum outward from the central region. The two TLs bound a central zone where the shear and stratification profiles are quite different from their initial shape. The central zone takes the form of a layer with some variability of shear and stratification around nominal constants whose value depends on the background stratification (Ri 0 ). As Ri 0 increases, the TL becomes thinner. The local N 2 (z) in the TL of the present configuration can be more than twice larger than N 2 0 , in contrast to the two-layer density profile where N 2 in the TL does not exceed its initial peak. Despite having the largest local N 2 , the TL sees significant turbulence at late time long after turbulence at the shear-layer centre has subsided and the local value of Ri g becomes larger than the critical value of 0.25. The TL exhibits high Re b = O(10 2 ) with mixing as efficient as at the shear-layer centre.
Background N 2 0 supports strong internal wave flux across the TL as previously reported by Watanabe et al. (2018). However, the magnitude of the wave flux (33 % of the peak spatially integrated dissipation rate) is smaller at the higher Re of the present DNS. Since the route to turbulence in the shear layer is different at high Re, so is the wave flux. It is worth noting that, unlike the far-field internal waves seen in Pham et al. (2009), the waves in the present study with weaker N 0 are evanescent and do not propagate far from the TL.
The dependence of mixing efficiency (E 3d ) and flux coefficient (Γ 3d = E 3d /(1 − E 3d )) on buoyancy Reynolds number (Re b ), turbulent Froude number (Fr) and bulk Richardson number (Ri b ) during the turbulent phase is found to be different from that seen in the shear layer with a two-layer density profile. The dependence is closer to statistically homogeneous turbulence forced by uniform shear and uniform stratification. A possible reason is that, during the regime of vigorous turbulence, the present high-Re shear layer in a uniformly stratified fluid evolves to shear and stratification profiles (e.g. the late-time profiles in figures 10 and 11) closer to the homogeneous shear problem over a large portion of the shear layer. For the same value of Ri, the uniformly stratified shear layer has a larger initial background potential energy. Using a two-layer density profile, Mashayek et al. (2017b) suggested that Γ 3d (Re b ) increases as Re  . When compared with the results of Shih et al. (2005), there are some quantitative differences, e.g. the cases with Ri 0.08 have higher values of Γ 3d . We note that, different from Shih et al. (2005), Portwood et al. (2019) reported that Γ ∝ Re −1/2 b dependence does not exist for 100 < Re b < 1000 in a recent DNS study with similar set-up. They argue that a transient effect in Shih et al. (2005) is the possible cause of the scaling. Our results support the validity of the Γ ∝ Re −1/2 b scaling. When Γ 3d is parametrized as a function of the Froude number (Fr), the flux coefficient in the present study also exhibits similar dependence as observed in the study of homogeneous stratified forced turbulence of Maffioli et al. (2016). The peak value of Γ 3d occurs at Fr = ε/NK ≈ 0.2 and it decreases as Fr deviates from this value.
The results for the mixing-efficiency parameterization depend on the choice employed for the vertical length scale of the shear layer. Figure 19(a) shows that the vertical profiles of turbulent production, TKE dissipation and TPE dissipation extend outside the boundaries of the shear layer marked by ±I u /2. Clearly, the shear-layer thickness (L sl ) is able to include the entire turbulent mixing zone better than I u . The relationship between the flux coefficient and the buoyancy Reynolds number (figure 19b) using ±I u /2 has some differences with that obtained using L sl (figure 17b). Comparison of figures 17(b) to 19(b) reveals the agreement with the scaling Γ 3d = 4Re The mixing efficiency in the cases with Ri 0.08 exceeds Osborn's model value of 1/6 over the entire turbulent state. Similar to the results of Salehipour et al. (2016), E 3d also exhibits a dependence on Ri b . While Salehipour et al. (2016) suggested that the mixing efficiency reaches its peak value of approximately 0.33 when Ri b = 0.4 and is saturated when Ri b = 1, E 3d in the Ri = 0.12 case reaches its maximum value of 0.31 when Ri b is as large as 1.2. The larger Ri b found in the present study is directly related to the stronger stratification in the TLs. The turbulent diffusivity (K ρ ) and turbulent viscosity (K m ) are larger than Osborn's model prediction as well as Shih et al. (2005). During entry into the initial energetic regime when Re b increases to > 100, Ri b also becomes large so that the turbulent Prandtl number (Pr t ∼ Ri b /E 3d ) is larger than in Shih et al. (2005) or Salehipour et al. (2016). In the other regimes, Pr t increases with decreasing Re b similar to other flows.
The results of the present study further confirm that turbulent mixing and its parameterization is sensitive to flow conditions including the shape of initial velocity and density profiles. The evolution of Ri b is significantly different between two-layer and constant stratification profiles since local N 2 (z) and S 2 (z) evolve differently. In order for a mixing parameterization to be generally applicable, future efforts would benefit by going beyond the use of bulk parameters to account for problem-dependent variability of local shear and stratification. It should be noted the parameterization of mixing efficiency may require multiple parameters; some of which are not yet considered in the present study. The value of the molecular Prandtl number (Pr = 1) in the present study is smaller than in oceanic flows where Pr varies from 7 to 700. The mixing efficiency E C has been shown to decrease at higher Pr (Smyth et al. 2001; for the two-layer density profile. In light of the significant reduction in E C between the two-layer density profile and uniform stratification (e.g. see figure 9), a further decrease due to higher-Pr effect would bring E C closer to the value of 1/6 used in Osborn's model.