Interpreted machine learning in fluid dynamics: Explaining relaminarisation events in wall-bounded shear flows

Machine Learning (ML) is becoming increasingly popular in fluid dynamics. Powerful ML algorithms such as neural networks or ensemble methods are notoriously difficult to interpret. Here, we introduce the novel Shapley Additive Explanations (SHAP) algorithm (Lundberg & Lee 2017), a game-theoretic approach that explains the output of a given ML model, to the fluid dynamics context. We give a proof of concept concerning SHAP as an explainable AI method providing useful and human-interpretable insight for fluid dynamics. To show that the feature importance ranking provided by SHAP can be interpreted physically, we first consider data from an established low-dimensional model based on the self-sustaining process (SSP) in wall-bounded shear flows, where each data feature has a clear physical and dynamical interpretation in terms of known representative features of the near-wall dynamics, i.e. streamwise vortices, streaks and linear streak instabilities. SHAP determines consistently that only the laminar profile, the streamwise vortex, and a specific streak instability play a major role in the prediction. We demonstrate that the method can be applied to larger fluid dynamics datasets by a SHAP evaluation on plane Couette flow in a minimal flow unit, focussing on the relevance of streaks and their instabilities for the prediction of relaminarisation events. Here, we find that the prediction is based on proxies for streak modulations corresponding to linear streak instabilities within the SSP. That is, the SHAP analysis suggests that the break-up of the self-sustaining cycle is connected with a suppression of streak instabilities.


Introduction
Recent successes in the application of artificial intelligence (AI) methods to fluid dynamics cover a wide range of topics. These include model building such as a data-driven identification of suitable Reynolds-averaged Navier-Stokes models (Duraisamy et al. 2019; Rosofsky & Huerta 2020), subgrid-scale parametrisations (Xie et al. 2020; Rosofsky & Huerta 2020), state estimation by neural networks based on reduced-order models (Nair & Goza 2020), data assimilation for rotating turbulence (Buzzicotti et al. 2021) through Generative Adversarial Networks (Goodfellow et al. 2014), dynamical and statistical prediction tasks (Srinivasan et al. 2019; Lellep et al. 2020; Boullé et al. 2020; Pandey & Schumacher 2020; Pandey et al. 2020), or pattern extraction in thermal convection (Schneide et al. 2018; Fonda et al. 2019). Open questions remain as to how AI can be used to increase our knowledge of the physics of a turbulent flow, which in turn requires knowledge as to what data features a given machine learning (ML) method bases its decisions upon. This is related to the question of representativeness versus significance introduced and discussed by Jiménez (2018) in the context of two-dimensional homogeneous turbulence, and it motivates the application of explainable AI.
Lately, advances in model-agnostic explanation techniques have been made by Lundberg & Lee (2017) with the introduction of SHapley Additive exPlanations (SHAP) values. These techniques have proven useful in a wide range of applications, such as decreasing the risk of hypoxaemia during surgery (Lundberg et al. 2018b) by indicating the risk factors on a per-case basis. Subsequently, these methods have been adapted and optimised for tree ensemble methods (Lundberg et al. 2018a). Here, we use boosted trees as well as deep neural networks in conjunction with SHAP values to provide a first conceptual step towards a machine-assisted understanding of relaminarisation events in wall-bounded shear flows.
Relaminarisation describes the collapse of turbulent transients onto a linearly stable laminar flow profile. It is intrinsically connected with the transition to sustained turbulence in wall-bounded shear flows. Localised turbulent patches such as puffs in pipe flow either relaminarise or split in two (Wygnanski & Champagne 1973; Nishi et al. 2008; Avila et al. 2011a). Transient turbulence is explained in dynamical systems terms through a boundary crisis between a turbulent attractor and a lower branch of certain exact solutions of the Navier-Stokes equations (Kawahara & Kida 2001; Kreilos & Eckhardt 2012; Lustro et al. 2019). In consequence, the boundary of the basin of attraction of the laminar fixed point becomes fractal, and the turbulent attractor transforms into a chaotic saddle. Relaminarisation events correspond to state-space trajectories originating within this complex basin of attraction of the laminar state, eventually leaving the chaotic saddle in favour of the laminar fixed point. For an ensemble of state-space trajectories, the hallmark of escape from a chaotic saddle, a memoryless process, is an exponential sojourn time distribution P(t) ∝ exp(−t/τ), with P(t) denoting the probability of residing within the strange saddle after time t and τ the characteristic time scale of the escape (Ott 2002). Exponentially distributed sojourn times, or turbulent lifetimes, are a salient feature of wall-bounded turbulence close to onset, for instance in pipe flow (Hof et al. 2006; Eckhardt et al. 2007; Hof et al. 2008; Avila et al. 2010, 2011b) or plane Couette flow (Schmiegel & Eckhardt 1997; Bottin et al. 1998; Eckhardt et al. 2007; Schneider et al. 2010; Shi et al. 2013), and they occur in box turbulence with periodic boundary conditions provided the forcing allows relaminarisation (Linkmann & Morozov 2015). The associated time scale usually increases super-exponentially with Reynolds number (Eckhardt & Schneider 2008; Hof et al. 2008; Avila et al. 2011b; Linkmann & Morozov 2015). The puff splitting process also has a characteristic Reynolds-number-dependent time scale, and the transition to sustained and eventually space-filling turbulence occurs when the puff splitting time scale exceeds the relaminarisation time scale (Avila et al. 2011b). In the language of critical phenomena, the subcritical transition to turbulence belongs to the Directed Percolation universality class (Pomeau 1986; Lemoult et al. 2016).
In order to facilitate the physical interpretation and to save computational effort, in this first step we consider a nine-dimensional shear flow model (Moehlis et al. 2004) that reproduces the aforementioned turbulent lifetime distribution (Moehlis et al. 2004) of a wall-bounded parallel shear flow. Subsequently, and in order to demonstrate that the method can be upscaled to larger datasets relevant to fluid dynamics applications, we provide an example where the same classification task is carried out on data obtained by Direct Numerical Simulation (DNS) of plane Couette flow in a minimal flow unit. Here, we focus on the structure of high- and low-speed streaks characteristic of near-wall turbulence.
The low-dimensional model is obtained from the Navier-Stokes equations by Galerkin truncation, and the basis functions are chosen to incorporate the self-sustaining process (SSP) (Waleffe 1997), which describes the basic nonlinear near-wall dynamics of wall-bounded parallel shear flows close to the onset of turbulence. According to the SSP, a streak is generated by advection of the laminar flow by a streamwise vortex; this streak is linearly unstable to spanwise and wall-normal perturbations, which couple to regenerate the streamwise vortex, and the process starts anew. The nine-dimensional model assigns suitably constructed basis functions to the laminar profile, the streamwise vortex, the streak and its instabilities, and includes a few more degrees of freedom to allow for mode couplings. Each basis function, that is, each feature for the subsequent ML steps, has a clear physical interpretation. Hence the model lends itself well to a first application of explainable AI methods to determine which flow features are significant for the prediction of relaminarisation events.
The nine-mode model by Moehlis et al. (2004) and similar low-dimensional models have been considered in a number of contributions addressing fundamental questions in the dynamics of parallel shear flows. Variants of the nine-mode model have been used, for instance, to introduce the concept of the edge of chaos to fluid dynamics and its connection with relaminarisation events (Skufca et al. 2006), to understand drag reduction in viscoelastic fluids (Roy et al. 2006), or to develop data-driven approaches to identify extreme fluctuations in turbulent flows (Schmid et al. 2018). In the context of AI, Srinivasan et al. (2019) used different types of neural networks (NNs) to predict the turbulent dynamics of the nine-dimensional model. There, the focus was on the ability of NNs to reproduce shear flow dynamics and statistics with a view towards the development of machine-assisted subgrid-scale models. Good predictions of the mean streamwise velocity and Reynolds stresses were also obtained with Echo State Networks (ESNs) (Pandey et al. 2020). Doan et al. (2019) used physics-informed ESNs, where the equations of motion are incorporated as an additional term in the loss function, for dynamical prediction of chaotic bursts related to relaminarisation attempts.
The key contribution of our work is to identify the significant features within a data-driven prediction of relaminarisation events, that is, the features a classifier needs to see in order to perform well. For the NMM, apart from the laminar profile, we find that SHAP identifies some of the main constituents of the self-sustaining process, the streamwise vortex and a single sinusoidal streak instability, as important for the prediction of relaminarisation events. Other features, such as the streak mode or certain streak instabilities, which are certainly of relevance for the dynamics, are not identified. These strongly correlate with the features that have been identified as important for the classification, hence they carry little additional information for the classifier. There is no a priori reason for choosing, say, the streamwise vortex instead of the streak as a feature relevant for the prediction. In fact, if predictions are run using only subsets consisting of features that have not been identified as important but correlate with important features, the prediction accuracy drops significantly. Finally, the information provided by SHAP is discussed in conjunction with the model equations to provide physical insights into the inner workings of the SSP within the remit of the nine-mode model. For the DNS data, SHAP values indicate that the classifier bases its decisions on regions in the flow that can be associated with streak instabilities. This suggests SHAP as a method to inform the practitioner as to which flow features carry information relevant to the prediction of relaminarisation events, information that cannot be extracted by established means.
The remainder of this article is organised as follows. We begin with an introduction of the nine-mode model, its mathematical structure and dynamical phenomenology in sec. 2. Subsequently, sec. 3 summarises the technical details of the machine-learning approach, that is, boosted trees for the classification and SHAP values for the interpretation. The results of the main investigation are presented in sec. 4. First, we summarise the prediction of relaminarisation events. Second, the most important features, here the physically interpretable basis functions of the aforementioned nine-mode model (NMM), are identified by ranking according to the mean absolute SHAP values for a number of prediction time horizons. Short prediction times, where the nonlinear dynamics is already substantially weakened, serve as validation cases. As expected, the laminar mode is the only relevant feature in the prediction in such cases. For longer prediction times the laminar mode remains important, and the modes corresponding to the streamwise vortex and the sinusoidal streak instability become relevant. Therein, sec. 4.3 contains a critical discussion and interpretation of the results described in the previous sections. Here, we connect the significant features identified by SHAP to important human-observed characteristics of wall-bounded shear flows such as streaks and streamwise vortices in the self-sustaining process. Section 5 provides an example SHAP calculation on DNS data of plane Couette flow in a minimal flow unit. We summarise our results and provide suggestions for further research in sec. 6 with a view towards the application and extension of the methods presented here to higher-dimensional data obtained from experiments or high-resolution numerical simulations.

The nine-mode model
We begin with a brief description of the nine-mode model (Moehlis et al. 2004) and its main features. The model is obtained by Galerkin truncation of a variation of plane Couette flow with free-slip boundary conditions at the confining walls, the sinusoidal shear flow. Sinusoidal shear flows show qualitatively similar behaviour compared with canonical shear flows such as pipe and plane Couette flow, in the sense that (i) the dynamics is governed by the self-sustaining process (Waleffe 1997), and (ii) the laminar profile is linearly stable for all Reynolds numbers (Drazin & Reid 2004). Most importantly, the sinusoidal shear flow we use transitions subcritically to turbulence and shows relaminarisation events; it is thus a prototypical example of a wall-bounded shear flow.
More precisely, we consider an incompressible flow of a Newtonian fluid between two, in principle, infinitely extended parallel plates a distance d apart, with free-slip boundary conditions in the wall-normal x_2-direction. Periodic boundary conditions in the homogeneous streamwise (x_1-) and spanwise (x_3-) directions model the infinite extent of the plates. The sinusoidal shear flow is thus described by the incompressible Navier-Stokes equations in a rectangular domain Ω = [0, L_1] × [−d/2, d/2] × [0, L_3]. In non-dimensionalised form these read

∂u/∂t + (u · ∇)u = −∇p + (1/Re) Δu + F(x_2) ê_1,   ∇ · u = 0,   (2.1)

where u is the fluid velocity, p is the pressure divided by the density and Re = U_0 d/(2ν) the Reynolds number based on the kinematic viscosity ν, the velocity U_0 of the laminar flow and the distance d between the confining plates; ê_1 denotes the unit vector in the streamwise direction. The last term on the right-hand side of eq. (2.1) corresponds to an external volume force, which is required to maintain the flow owing to the free-slip boundary conditions. It sustains the laminar profile U(x_2) = √2 sin(π x_2/2) ê_1 and thereby determines the velocity scale U_0, which is given by U(x_2) evaluated at a distance x_2 = d/4 from the top plate. The velocity field is expanded in nine orthonormal basis functions u_i(x) with nine corresponding time-dependent coefficients a_i(t),

u(x, t) = Σ_{i=1}^{9} a_i(t) u_i(x).   (2.2)

Equation (2.1) then gives rise to a system of nine ordinary differential equations for a_1(t), . . . , a_9(t), the NMM, eq. (2.3), given explicitly in Moehlis et al. (2004). The model reproduces exponentially distributed turbulent lifetimes (Moehlis et al. 2004, Fig. 7), that is, it shows relaminarisation events with qualitatively similar statistics as wall-bounded parallel shear flows. Hence, the model is suitable for a study concerned with the prediction of relaminarisation events of turbulent shear flows. The nine ordinary differential equations that comprise the NMM are solved with an explicit Runge-Kutta method of order 5 (Dormand & Prince 1980) with a fixed time step, using SciPy (Virtanen et al. 2020) with Python. The time step for the integrator is set to δt = 0.25 for all simulations and we use a simulation domain of size [0, 4π] × [−1, 1] × [0, 2π] in units of d/2.
Since we later train ML models to predict the relaminarisation events, a Reynolds number of Re = 250 is chosen in order to reduce waiting times for relaminarisation events, as the mean turbulent lifetime increases very rapidly with Reynolds number. Figure 1 presents a time series of a_1(t), . . . , a_9(t) representative of a relaminarisation event in the NMM. After irregular fluctuations, eventually the coefficients a_2(t), . . . , a_9(t), pertaining to all but the laminar mode, decay. In contrast, the coefficient a_1(t) of the laminar mode asymptotes to unity. The chaotic regions of the dynamics of the NMM are characterised by a Lyapunov time of t_L ≈ 60. The Lyapunov time is the inverse of the largest Lyapunov exponent (Ott 2002) and corresponds to the time after which the L_2-distance between initially infinitesimally close phase-space trajectories has grown by a factor of e, Euler's number.
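The integration setup described above can be sketched as follows. The function `nmm_rhs` is a placeholder: in the actual model it evaluates the nine coupled ODEs of Moehlis et al. (2004), which we do not reproduce here, and a simple linear decay stands in so the sketch is self-contained. Note also that SciPy's `RK45` solver (a Dormand-Prince scheme) is adaptive, so sampling the solution every δt = 0.25 via `t_eval` only approximates the fixed-step setup used in our simulations.

```python
import numpy as np
from scipy.integrate import solve_ivp

def nmm_rhs(t, a):
    """Placeholder right-hand side. In the actual model this evaluates the
    nine coupled ODEs of Moehlis et al. (2004); a linear decay stands in
    here so that the sketch runs as-is."""
    return -0.01 * a

def integrate_nmm(a0, t_end, dt=0.25):
    """Integrate the coefficients a_1..a_9 with a Dormand-Prince RK5(4)
    scheme, sampling the trajectory every dt time units."""
    t_eval = np.arange(0.0, t_end + dt, dt)
    sol = solve_ivp(nmm_rhs, (0.0, t_end), a0, method="RK45",
                    t_eval=t_eval, rtol=1e-8, atol=1e-10)
    return sol.t, sol.y.T  # coefficients, shape (n_steps, 9)

rng = np.random.default_rng(0)
a0 = rng.uniform(-0.25, 0.25, size=9)  # initial amplitudes ~ U(-0.25, 0.25)
t, a = integrate_nmm(a0, t_end=100.0)
```

In practice the trajectory `a` would then be scanned for relaminarisation events as described in Section 4.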

XGBoost
A gradient boosted tree model is used as ML model for making the relaminarisation predictions. Specifically, the XGBoost (Chen & Guestrin 2016) implementation of a boosted tree model in Python is utilised to benefit from its fast implementation for very large datasets. XGBoost is known for its high performance on ML tasks such as high energy physics event classification, massive online course dropout rate predictions and other dedicated real-life ML competition tasks (Chen & Guestrin 2016). Additionally, XGBoost-based classifiers benefit from fast implementations of SHAP value computations (Lundberg et al. 2018a) that will be used in Sec. 4.2 to explain the trained ML model.

Figure 1: Example time series of the NMM, eq. (2.3), with the laminar coefficient a_1 shown in black and modes a_2 to a_9 shown in red to yellow. The dashed green line represents the threshold between turbulent and laminar dynamics as defined by an energy threshold on the deviations from the laminar profile, E_thr = 5 × 10^−3, see eq. (4.1). The number of snapshots per training sample is set to N = 5, which are Δt apart. The temporal spacing is set to Δt = 100 in this example for visual purposes only; Δt = 3 is used in all calculations with N > 1. The short orange vertical lines mark prediction time horizons of t_p = {200, 300, 350} for visual guidance, see Section 4 for further details.
Boosting methods belong to the class of ensemble methods (Hastie et al. 2009). These methods use an ensemble of weak learners, i.e. models that by themselves are not very powerful, to make predictions. The mathematical details of boosted trees and XGBoost can be found in Appendix A.1.
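As an illustration of the boosting idea, the following is a minimal sketch of gradient boosting for squared loss with depth-one trees (stumps) as weak learners. It is a pedagogical toy, not the regularised algorithm XGBoost implements: each weak learner is fitted to the residuals, i.e. the negative gradients, of the current ensemble.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r by scanning
    thresholds and minimising the squared error."""
    best = None
    for thr in np.unique(x):
        left, right = r[x <= thr], r[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        pred_l, pred_r = left.mean(), right.mean()
        err = ((left - pred_l) ** 2).sum() + ((right - pred_r) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, thr, pred_l, pred_r)
    _, thr, pred_l, pred_r = best
    return lambda z: np.where(z <= thr, pred_l, pred_r)

def boost(x, y, n_rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals
    of the ensemble built so far; lr is the shrinkage (learning rate)."""
    ensemble, pred = [], np.full_like(y, y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + lr * stump(x)
        ensemble.append(stump)
    return lambda z: y.mean() + lr * sum(s(z) for s in ensemble)

x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)
model = boost(x, y)  # the ensemble approximates the sine wave
```

XGBoost adds, among other things, second-order gradient information, regularisation of leaf weights and highly optimised split finding on top of this basic scheme.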

SHAP values
While ML models might show good prediction performance on a given task, it is not per se clear which learned relations led to this good performance. Complex and well-performing ML models come at the cost of being difficult to interpret and inspect. Hence, less performant but more transparent methods, such as linear models, have traditionally been deployed for the sake of interpretability. Recent advances in explainable AI attempt to further the understanding of well-performing and complex ML models, including model-agnostic and model-specific explanation techniques, in order to benefit from high prediction performance as well as explainable models.
One recent method that enables complex models to be interpreted is that of SHapley Additive exPlanations (SHAP) values. SHAP values unify recently developed explainable AI methods such as the LIME (Ribeiro et al. 2016), DeepLIFT (Shrikumar et al. 2017) and layer-wise relevance propagation (Bach et al. 2015) algorithms, and it can be shown theoretically that SHAP values possess multiple desirable properties. Additionally, SHAP values can be evaluated efficiently when using model-specific implementations such as for XGBoost. We briefly introduce SHAP values in the following.
SHAP values belong to the class of additive feature explanation models that explain the ML model output f(x) at a sample x ∈ R^d through an additive decomposition

f(x) = Φ_0 + Σ_{i=1}^{d} Φ_i(f, x),   (3.1)

in terms of effects Φ_i assigned to each of the features, with d the number of features. Lundberg & Lee (2017) define a specific choice of Φ which they coined SHAP values. These are based on the game-theoretic Shapley values (Shapley 1953) and adhere to three desirable properties that make their explanations locally accurate and consistent. The SHAP value for feature i of sample x for model f is computed as

Φ_i(f, x) = Σ_{S ⊆ F \ {i}} [|S|! (|F| − |S| − 1)! / |F|!] [f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S)],   (3.2)

with S a subset of features that does not contain the feature i to be explained, F the set of all features and f_S(x_S) the model output for feature subset S. Φ_0 is determined separately as the average model output, Φ_0 = f_S(x_S) with S = ∅.
Intuitively, SHAP values thereby measure the difference between the trained model evaluated including a particular target feature and evaluated excluding it, averaged over all feature subsets that do not include the target feature. The prefactor |S|! (|F| − |S| − 1)! / |F|! is a symmetric weighting factor that puts emphasis on model output differences for feature subsets with either a small number of features or a number close to |F|. Hence, the model output difference that stems from removing the target feature is considered particularly relevant when the feature set under consideration contains either a small or a large number of features.
Evaluating the model on a feature subset S, f_S(x_S), is technically challenging, as a model is trained on a fixed number of features. f_S(x_S) is realised by a conditional expectation value that conditions on the feature values of x that are present in the feature subset S,

f_S(x_S) = E[f(x) | x_S].   (3.3)

This avoids the technical difficulty of evaluating a readily trained model on a subset of features. The SHAP value property of local accuracy ensures that the sum of the SHAP values for the explained sample corresponds to the difference between the model output for that sample, f(x), and the mean prediction of the model,

f(x) − E[f(x)] = Σ_{i=1}^{d} Φ_i(f, x).   (3.4)

Hence, the sum over all SHAP values is equal to the difference between model output and mean model prediction.
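The definitions above can be made concrete in a brute-force sketch. Here `f` is a toy model with an interaction term standing in for the trained classifier, and the conditional expectation of eq. (3.3) is realised in its interventional form, i.e. features outside S are replaced by values drawn from a background dataset. With this choice the local accuracy property of eq. (3.4) holds exactly.

```python
import numpy as np
from itertools import combinations
from math import factorial

def f(X):
    """Toy model with an interaction term, in place of a trained classifier."""
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]

def f_S(x, S, background):
    """Model output conditioned on feature subset S: features in S are fixed
    to their values in x, the remaining ones are averaged over the
    background data (interventional expectation)."""
    Z = background.copy()
    S = list(S)
    if S:
        Z[:, S] = x[S]
    return f(Z).mean()

def shap_values(x, background):
    """Brute-force evaluation of eq. (3.2) over all feature subsets."""
    d = x.shape[0]
    F = set(range(d))
    phi = np.zeros(d)
    for i in range(d):
        for k in range(d):
            for S in combinations(F - {i}, k):
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (f_S(x, set(S) | {i}, background)
                               - f_S(x, S, background))
    return phi

rng = np.random.default_rng(1)
background = rng.normal(size=(500, 3))
x = np.array([1.0, 0.5, -2.0])
phi = shap_values(x, background)
# Local accuracy, eq. (3.4): phi sums to f(x) minus the mean prediction.
```

The exponential cost in the number of features is what the tree-specific algorithm of Lundberg et al. (2018a) avoids.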
We use a fast implementation of SHAP values for tree ensemble models by Lundberg et al. (2018a). While eq. (3.3) is typically evaluated by an integration over a background dataset, the fast tree-specific algorithm incorporates the tree structure by omitting all paths that are not compatible with the conditioning values x_S.
While SHAP values provide per-sample contributions for each feature, a typical task is to assign each feature i = 1, . . . , d an importance for the model predictions. A common approach is to average the absolute SHAP values over all samples in the dataset (Molnar 2020). The average ensures a statistical statement about the SHAP values, and removing the sign from the SHAP values ensures that positive and negative contributions to the ML model output are accounted for equally.
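As a minimal illustration with a hypothetical matrix of per-sample SHAP values (in the paper such a matrix comes from explaining the XGBoost classifier on the test set):

```python
import numpy as np

# Hypothetical SHAP values, shape (n_samples, n_features).
shap_matrix = np.array([[ 0.8, -0.1,  0.3],
                        [-0.7,  0.2, -0.4],
                        [ 0.9,  0.0,  0.2]])

importance = np.abs(shap_matrix).mean(axis=0)  # mean |SHAP| per feature
ranking = np.argsort(importance)[::-1]         # most important feature first
```

The resulting ranking is what Figs. 6 and 7 report for the nine modes of the NMM.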
In addition to the classical SHAP values presented above, there exist SHAP interaction values (Lundberg et al. 2020) that capture the contributions of feature interactions to the ML model output by generalising the classical SHAP values to pairs of features. Consequently, each sample is assigned a matrix of SHAP interaction values that are computed for i ≠ j as

Φ_{i,j}(f, x) = Σ_{S ⊆ F \ {i,j}} [|S|! (|F| − |S| − 2)! / (2 (|F| − 1)!)] ∇_{ij}(S),   (3.6)

with ∇_{ij}(S) = f_{S∪{i,j}}(x_{S∪{i,j}}) − f_{S∪{i}}(x_{S∪{i}}) − f_{S∪{j}}(x_{S∪{j}}) + f_S(x_S). Setting Φ_{0,0}(f, x) to the average output of f, one obtains a similar property as for the classical SHAP values in eq. (3.4), namely the additivity property

f(x) = Σ_{i=0}^{d} Σ_{j=0}^{d} Φ_{i,j}(f, x).   (3.7)

Also for these SHAP interaction values we use a fast implementation for tree ensembles (Lundberg et al. 2020).

Results
Before studying the inner workings of the ML model, a well-performing model needs to be trained on relaminarisation events. This section defines the fluid dynamical classification task and presents the results achieved with an XGBoost tree, followed by their explanation with SHAP values.
The prediction of relaminarisation events a time t_p ahead is considered a supervised binary classification problem in ML (Bishop 2006). Supervised tasks require the training data to consist of pairs of inputs and target outputs, commonly called x and y, respectively. Here, the input data consists of a number of nine-dimensional vectors of spectral coefficients a = (a_1, . . . , a_9) from the flow model introduced in sec. 2. The output is a binary variable encoded as 1 and 0 that contains information on whether or not the flow corresponding to the input spectral coefficients relaminarised a time t_p ahead.
The training data is acquired by forward simulation of the flow model. A single fluid simulation is initialised with a random nine-dimensional initial condition, with initial amplitudes uniformly distributed according to U(−0.25, 0.25), and integrated for 4000 time units. After removing a transient period of 200 time units to ensure that the dynamics has reached the attracting phase-space region, training samples for each of the two classes are extracted from the trajectory. This process of starting forward simulations of the fluid model and the subsequent extraction of training data is repeated until enough training samples have been obtained.
The training data comprises 10^6 training samples, half of which belong to the class of samples that relaminarise, the other half to the class of those that do not. The balanced test dataset is separate from the training dataset and consists of 10^5 samples that have not been used for training purposes.
The extraction of training samples from a trajectory is based on the classification of the trajectory into turbulent and laminar regions. For that, the energy of the deviation from the laminar flow of each velocity field u(x, t) in the trajectory is computed as

E(t) = ||u(x, t) − U(x_2) ê_1||² = (a_1(t) − 1)² + Σ_{i=2}^{9} a_i(t)²,   (4.1)

using the spectral expansion coefficients a_i(t) at each time step and the orthonormality of the basis functions in the last equality. To classify the trajectory into turbulent and laminar sections, an energy threshold E_thr = 5 × 10^−3 is set. Hence, a velocity field is classified according to the binary variable

l(u(t)) = 1 if E(t) ≤ E_thr, and l(u(t)) = 0 otherwise,   (4.2)

with l = 0 denoting the class of samples that do not relaminarise t_p time steps ahead and class 1 denoting those that do relaminarise. The value for E_thr is chosen based on empirical tests that have shown no return to chaotic dynamics after a trajectory reached a velocity field with energy E < E_thr.
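The labelling step can be sketched as follows, under the assumption that eq. (4.1) reduces, by orthonormality, to the squared distance of the coefficient vector from the laminar fixed point (1, 0, . . . , 0):

```python
import numpy as np

E_THR = 5e-3  # energy threshold separating laminar from turbulent states

def deviation_energy(a):
    """Energy of the deviation from the laminar profile for coefficient
    snapshots a of shape (n_times, 9): the squared distance of the
    coefficients from the laminar fixed point (1, 0, ..., 0)."""
    a = np.atleast_2d(a)
    return (a[:, 0] - 1.0) ** 2 + np.sum(a[:, 1:] ** 2, axis=1)

def laminar_label(a):
    """Binary classification of each snapshot: 1 if laminar, 0 if turbulent."""
    return (deviation_energy(a) <= E_THR).astype(int)

laminar_state = np.array([1.0] + [0.0] * 8)  # exactly the laminar fixed point
```

For instance, `laminar_label(laminar_state)` yields class 1, while any coefficient vector far from the fixed point yields class 0.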
Using the classification l(u(t)) of a trajectory u(t), the training data acquisition is characterised by the prediction horizon t_p and the number N of flow fields that make up one sample. To construct a single training sample from a trajectory, a random point in the trajectory is chosen to serve as target point. Its classification label l(u(t)) is used as training data target output y. The input data x is obtained by using N equally spaced sets of spectral coefficients preceding the chosen time point by t_p, i.e. ending at t − t_p. Hence, a single training sample is extracted as

(x, y) = ({a(t − t_p − (N − 1)Δt), . . . , a(t − t_p − Δt), a(t − t_p)}, l(u(t))),   (4.3)

with Δt the temporal spacing between subsequent snapshots for one training sample. We gauged Δt = 3 to the dynamics of the flow model in order to capture sufficient dynamical detail. Finally, the temporal positions are spread randomly in turbulent regions to obtain samples for class 0 and specifically placed at the laminar transition to obtain samples for class 1. Figure 1 shows the training data acquisition process based on an example trajectory with one randomly chosen time t_{s,1} to obtain a sample for class 0, coloured in blue, and another time t_{s,2} set at the laminar transition to obtain a sample for class 1, coloured in green. The short orange vertical lines mark the prediction time horizons t_p = {200, 300, 350} for visual guidance. The large value of a_1 for t_p = 200 demonstrates why this prediction horizon serves as a validation case. After training, the ML classifier can be given a set of N points equally spaced by Δt and predict whether the flow described by this data will relaminarise after a time integration of t_p time units.
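The sample extraction can be sketched as follows; `extract_sample` and its arguments are illustrative names rather than code from our implementation, and the trajectory indices play the role of the times in eq. (4.3):

```python
import numpy as np

def extract_sample(a_traj, t_idx, label, tau, N=5, dt_spacing=3):
    """Build one training pair (x, y) from a coefficient trajectory a_traj
    of shape (n_times, 9): the input x consists of N snapshots ending tau
    steps before the target index t_idx, spaced dt_spacing apart; the
    target y is the laminar/turbulent label at t_idx."""
    end = t_idx - tau
    idx = [end - k * dt_spacing for k in reversed(range(N))]
    x = a_traj[idx].ravel()  # flattened (N * 9,) feature vector
    return x, label

rng = np.random.default_rng(2)
traj = rng.normal(size=(1000, 9))  # stand-in for an integrated trajectory
x, y = extract_sample(traj, t_idx=800, label=1, tau=300, N=5, dt_spacing=3)
```

With N = 1 the input reduces to the single snapshot a(t − t_p), the configuration used for most of the analysis below.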
It is good practice to analyse the training data prior to training classifiers on it. We pick N = 1 and visualise the training data distributions of the nine spectral expansion coefficients for t_p ∈ {200, 300}, see Fig. 2. The distributions for the two classes 0 and 1 become statistically less distinguishable with increasing t_p, requiring the ML model to learn per-sample correlations to perform well. It is observed that the classes for t_p = 200 can already be distinguished from a statistical point of view. This is because the prediction horizon is not large enough to move the samples off the slope of the laminar transition, as indicated by the rightmost orange bar in Fig. 1. The prediction horizon t_p = 300, on the other hand, is large enough to preclude sample classification through simple statistical properties, because the histograms of both classes mostly overlap. Hence, t_p = 200 is considered a benchmark case, as the prediction performance is expected to be high because the large laminar mode amplitude is sufficient for the classification.

Prediction of relaminarisation events
The hyperparameters of the gradient boosted tree are optimised using a randomised hyperparameter search strategy. The strategy chooses the hyperparameters from predefined continuous (discrete) uniform distributions U(a, b) (U_d(a, b)) between values a and b and samples a fixed number of draws. We draw 100 hyperparameter combinations according to the distributions given in eq. (4.4). We verified for t_p = 300 that 100 draws cover the hyperparameter space sufficiently well by drawing 200 hyperparameter combinations and showing that this leads to similar prediction performances as for 100. The hyperparameters found by the randomised hyperparameter search are listed in Appendix A.2.
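The random search loop can be sketched as follows; the hyperparameter names and ranges shown are illustrative placeholders rather than the distributions of eq. (4.4), and `validation_accuracy` stands in for training an XGBoost model with the drawn parameters and evaluating it on held-out data:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_hyperparameters():
    """Draw one hyperparameter combination from continuous and discrete
    uniform distributions (ranges here are illustrative placeholders)."""
    return {
        "max_depth": int(rng.integers(3, 11)),        # discrete uniform
        "learning_rate": rng.uniform(0.01, 0.3),      # continuous uniform
        "n_estimators": int(rng.integers(100, 1001)),
    }

def validation_accuracy(params):
    """Placeholder for training an XGBoost classifier with `params` and
    measuring its accuracy on a validation set."""
    return rng.uniform(0.8, 0.95)

draws = [sample_hyperparameters() for _ in range(100)]
scores = [validation_accuracy(p) for p in draws]
best = draws[int(np.argmax(scores))]  # combination with best validation score
```

In our setup the best-scoring combination is then retrained on the full training set before the SHAP analysis.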
The prediction performance, measured on a test dataset, for N = 1 decays with increasing t_p, as expected on account of the intrinsically chaotic dynamics of the flow model (Lellep et al. 2020; Moehlis et al. 2002, 2005), see Fig. 3. Nevertheless, the prediction performance is around 90% for 5 Lyapunov times (Bezruchko & Smirnov 2010) into the future and is thereby sufficiently good for the subsequent model explanations by SHAP values. Calculations for different values of N verify that the prediction performance only varies marginally for one exemplary set of hyperparameters. This is to be expected based on the deterministic nature of the dynamical system and its full observability. Hence, we here focus on N = 1, which means that the classifier does not receive dynamical information but only a single spectral snapshot of the flow field. This reduces the computational cost of the subsequent model explanation by SHAP values.
The prediction horizon t_p = 200, indeed, corresponds to the benchmark case where the laminar mode is supposed to be the only relevant indicator for relaminarisation, and t_p = 450 corresponds to the case beyond which the ML model cannot predict reliably due to the chaotic nature of the system (Lellep et al. 2020; Moehlis et al. 2002, 2005). Lastly, to demonstrate the performance of the machine learning model also for applied tasks, the model is applied in parallel to a running fluid simulation. Figure 4(a) shows the on-line prediction of one simulated trajectory. The horizontal bottom bar indicates whether the prediction of the classifier has been correct (green) or incorrect (red). We collected statistics over 1000 trajectories to quantify how well the model performs on an applied task instead of the test dataset. As shown in Fig. 4(b), the model performance for the on-line live prediction, with around 90% true positives and true negatives as well as around 10% false positives and false negatives, is comparable to the performance on the test dataset in terms of the normalised confusion matrices of the predictions. The normalisation of the confusion matrices is necessary to account for the substantial class imbalance in the data pertaining to the live prediction and thereby to make the performances on the two tasks comparable.
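The row-wise normalisation of the confusion matrices can be sketched as follows, with hypothetical counts chosen to mimic the class imbalance of the live prediction task:

```python
import numpy as np

# Hypothetical raw confusion matrix for the live prediction task:
# rows = true class (0: turbulent, 1: relaminarising), columns = prediction.
# The strong class imbalance motivates the row-wise normalisation.
confusion = np.array([[9000.,  950.],
                      [  12.,  110.]])

# Normalise each row by the number of true samples in that class, so that
# entries become per-class rates (e.g. true negative and false positive
# rate in row 0) and the two tasks become comparable.
normalised = confusion / confusion.sum(axis=1, keepdims=True)
```

After normalisation each row sums to one, so the diagonal entries can be read directly as per-class accuracies.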
The results demonstrate that the ML model performs sufficiently well despite the intrinsic difficulty of predicting chaotic dynamics. Next, we turn towards the main contribution of this work. There, we use the trained XGBoost ML model as high-performing state-of-the-art ML model together with SHAP values to identify the most relevant physical processes for the relaminarisation prediction in shear flows in a purely data-driven manner.

Explanation of relaminarisation predictions
Since SHAP values offer explanations per sample and there are many samples to explain using the test dataset, two approaches may be taken: First, a statistical statement can be obtained by evaluating the histograms of SHAP values of all explained samples. Second, live explanations of single samples can be calculated, similar to what we demonstrated previously in Sec. 4.1 with live predictions of relaminarisation events. This work focuses on the former of the two perspectives and notes the potential of the latter approach for future work in Sec. 6.
The statistical evaluation shows bi-modal SHAP value distributions, see Fig. 5. Each class corresponds to one of the modes, emphasising that the model learned to distinguish between the two classes internally as the two classes are explained differently.
From Eq. (3.4) it follows that the model output $f(x)$ is made up of the SHAP values $\Phi_i(f, x)$. The multi-modality of the SHAP values conditional on the class therefore means that the feature contributions to the final output differ between the two classes. Figure 6 shows the average absolute SHAP values per class over all explained samples for $t_p = 300$ and thereby quantifies the differences in mode importance for the prediction of the two classes (Molnar 2020). Hence, the figure demonstrates that modes 1, 3 and 5 are the three most important modes. Feature importances are evaluated by computing the mean absolute SHAP value for different prediction times. This is shown in Fig. 7 for a range of $t_p$.
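The additivity property invoked here can be illustrated with a brute-force Shapley computation on a toy model; the model `f`, the background data and the sample `x` below are hypothetical stand-ins, not the trained XGBoost classifier:

```python
import itertools
import math
import numpy as np

def shapley_values(f, x, background):
    """Brute-force Shapley values for a model f at sample x, using a
    background dataset to marginalise over absent features.  The values
    satisfy the additivity (efficiency) property underlying Eq. (3.4):
    f(x) = E[f] + sum_i Phi_i."""
    d = len(x)
    def v(S):
        # Value of coalition S: features in S are fixed to x, the rest
        # are replaced by background samples and averaged over.
        Z = background.copy()
        Z[:, list(S)] = x[list(S)]
        return f(Z).mean()
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                w = (math.factorial(len(S)) * math.factorial(d - len(S) - 1)
                     / math.factorial(d))
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

# Toy model and data (illustrative only).
rng = np.random.default_rng(0)
f = lambda X: X[:, 0] + 2 * X[:, 1] * X[:, 2]
background = rng.normal(size=(200, 3))
x = np.array([1.0, 0.5, -2.0])
phi = shapley_values(f, x, background)
# Additivity: base value plus SHAP values recovers the model output.
base = f(background).mean()
```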
The robustness of these results has been validated by two means: First, the SHAP values for $t_p = 300$, which neither constitutes a trivial task nor suffers from poor prediction performance (the accuracy is 91%), are recomputed for a second set of randomly acquired training data by using a different initial training data seed. Not only do the minute fluctuations in Fig. 3 indicate the close similarities between the results, but the SHAP value histograms are also similar (data not shown). Second, the XGBoost model with the optimal hyperparameters is retrained on subsets of features that are chosen according to the feature importances derived from the SHAP values. The computations confirm that the basis functions - which here have a clear correspondence to physical features and dynamical mechanisms - identified as most important by the SHAP values lead to the largest training performance of all subsets tested. Also, the least important modes lead to the lowest training performance. Lastly, the baseline of all modes consistently achieves the largest prediction performance. In addition to the few tested feature subset combinations, all $\binom{9}{3} = 84$ ways to pick 3 out of the 9 features have been evaluated for $t_p = 300$. For these subsets, the prediction accuracy varies between 65% and 80%, with the combination of the features with the largest SHAP values, (1, 3, 5), leading to the maximal prediction accuracy (not shown).
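The exhaustive subset evaluation can be sketched as follows; a simple nearest-centroid classifier and synthetic data stand in for the tuned XGBoost model and the nine mode amplitudes, so the numbers are illustrative only:

```python
import itertools
import numpy as np

def subset_accuracies(X_train, y_train, X_test, y_test, k=3):
    """Evaluate every k-feature subset by retraining a classifier on the
    restricted data, mirroring the cross-check of the SHAP ranking.  A
    nearest-centroid classifier stands in for the tuned XGBoost model."""
    n_features = X_train.shape[1]
    scores = {}
    for subset in itertools.combinations(range(n_features), k):
        cols = list(subset)
        c0 = X_train[y_train == 0][:, cols].mean(axis=0)
        c1 = X_train[y_train == 1][:, cols].mean(axis=0)
        d0 = np.linalg.norm(X_test[:, cols] - c0, axis=1)
        d1 = np.linalg.norm(X_test[:, cols] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        scores[subset] = (pred == y_test).mean()
    return scores

# Synthetic stand-in for the nine mode amplitudes: only features 0, 2
# and 4 carry class information, loosely mimicking modes 1, 3 and 5.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 9))
X[:, [0, 2, 4]] += 2.0 * y[:, None]
scores = subset_accuracies(X[:300], y[:300], X[300:], y[300:])
```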
To appreciate the concept of SHAP values, it is instructive to consider correlation matrices of the training data, as shown in Fig. 8(a,b) for classes 0 and 1, respectively. A few observations can be made from the data. First, the correlation matrices belonging to the two classes are remarkably similar, demonstrating that correlations alone are not sufficient to distinguish between the classes. Here, we note that correlations only capture linear relations between random variables. The only difference is that modes 4 and 5 correlate positively in class 0, while they correlate negatively in class 1, and similarly for modes 7 and 8. When comparing the correlation matrices with the mode coupling table or with the amplitude equations in Moehlis et al. (2004), we observe that strongly correlating modes couple via either the laminar profile (mode 1) or its deviation in the streamwise direction (mode 9). The strong negative correlations between modes 2 and 3, and the strong positive correlations between modes 6 and 8, which occur for both classes, can be made plausible by inspection of the evolution equation of the laminar profile. The nonlinear couplings only extract energy from the laminar flow if the amplitudes of modes 2 and 3 have opposite signs and those of modes 6 and 8 have the same sign, as products of these mode pairs occur in the evolution equation of the laminar profile. In other words, in order to obtain unsteady dynamics, modes 2 and 3 must mostly be of opposite sign while modes 6 and 8 must mostly have the same sign. In this context we note that the amplitude of the laminar mode is always positive, as can be seen from the top panel of Fig. 2.
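Class-conditional correlation matrices of the kind shown in Fig. 8(a,b) can be computed as sketched below; the synthetic amplitudes merely illustrate why a strong correlation present in both classes carries no class information:

```python
import numpy as np

def class_conditional_correlations(X, y):
    """Correlation matrices of the feature (mode amplitude) fluctuations,
    computed separately for each class.  Note that correlations only
    capture linear dependence between the amplitudes."""
    return {c: np.corrcoef(X[y == c], rowvar=False) for c in np.unique(y)}

# Synthetic amplitudes: feature 1 is an anti-correlated copy of feature 0
# in both classes, loosely mimicking the mode 2/3 anti-correlation.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=1000)
X = rng.normal(size=(1000, 3))
X[:, 1] = -X[:, 0] + 0.1 * rng.normal(size=1000)
corr = class_conditional_correlations(X, y)
# The strong negative correlation appears for both classes, so the
# correlation matrices alone cannot separate them.
```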
Secondly, we consistently find correlations between modes identified as significant for the prediction and irrelevant modes. For instance, modes 1 and 9 correlate, and so do modes 2 and 3, and modes 4 and 5. Thirdly, modes that are considered significant do not correlate. The latter two points highlight the game-theoretic structure of SHAP. For example, as the fluctuations of the coefficients pertaining to modes 2 and 3 are strongly correlated, it is sufficient for the classifier to know about one of them. We will return to this point in Sec. 4.3.
To elucidate second-order interaction effects further, the SHAP interaction values (Lundberg et al. 2020) are computed, see Fig. 8(c,d). The overall bar heights denote the mode importance across both classes; the coloured bars distinguish between the classes. Interactions between modes 3 and 5 are found to be strongest for samples from both prediction classes. In particular, modes 7 and 8 differ in their importances for the two classes: both are more important in cases where relaminarisation does not occur. Interaction effects between modes 4 and 5 are present for both classes, but more pronounced for samples of class 0. Generally, interaction effects for samples of class 1 are stronger than for those of class 0. The feature importances presented in Fig. 7 show that the laminar mode is consistently identified as a relevant feature. The shortest prediction time $t_p = 200$ not only comes with a prediction accuracy of $\approx 98\%$, but the feature importance of the laminar mode is also significantly stronger than for the other tested prediction horizons. This indicates that this prediction case can, indeed, be considered a validation case. Within that scope, the validation succeeded, as the statistically significant laminar mode is detected as the most relevant mode. Increasing the prediction horizon leads to a decrease in the importance metric for all features, as can be inferred from the observed decrease in the normalisation factors shown in Fig. 7(b). The normalisation constants shown in Fig. 7(b) are computed as $N_k = \sum_{i=1}^{9} \Phi_{i,k}$, where $\Phi_{i,k} \in \mathbb{R}$ denotes the (mean absolute) SHAP value of mode $i \in \{1, \ldots, 9\}$ for a prediction time $t_p^{(k)}$, with the superscript enumerating the sampled prediction times $t_p \in \{200, 225, \ldots, 450\}$. Figure 7(a) thus presents $\Phi_{i,k}/N_k$. The observed decrease in normalisation factors with increasing $t_p$ indicates, together with the declining prediction performance, that sufficiently well-performing classifiers are required to enable the subsequent explanation step.
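The normalisation across prediction horizons can be sketched as follows, assuming per-horizon arrays of SHAP values; the arrays below are random placeholders, not the paper's data:

```python
import numpy as np

def normalised_importances(shap_values_per_horizon):
    """Given, for each prediction horizon, an (n_samples, n_modes) array
    of SHAP values, return the mean absolute SHAP value per mode divided
    by the per-horizon normalisation constant N_k (the sum over modes).
    Tracking N_k itself reveals the overall decline in attribution
    magnitude as the horizon grows."""
    out = {}
    for tp, phi in shap_values_per_horizon.items():
        mean_abs = np.abs(phi).mean(axis=0)   # per-mode importance
        N_k = mean_abs.sum()                  # normalisation constant
        out[tp] = mean_abs / N_k
    return out

# Toy SHAP arrays for two horizons (illustrative scales only).
rng = np.random.default_rng(3)
shap = {200: rng.normal(scale=1.0, size=(500, 9)),
        300: rng.normal(scale=0.5, size=(500, 9))}
imp = normalised_importances(shap)
# Normalised importances sum to one for every horizon.
```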

Interpretation
Throughout the prediction horizons, modes 1, 3 and 5 are consistently considered important. These modes represent the laminar profile, the streamwise vortex and a spanwise sinusoidal linear instability of the streak mode 2, respectively. Streamwise vortices and streaks are a characteristic feature of wall-bounded shear flows (Holmes et al. 2012; Schmid et al. 2002; Bottin et al. 1998; Hamilton et al. 1995). Alongside the laminar profile and the linear streak instabilities, modes 4-7, they play a central role in the self-sustaining process, the basic mechanism sustaining turbulence in wall-bounded shear flows. The importance of the streamwise vortex, mode 3, increases with prediction horizon and decreases from $t_p \approx 300$ onwards, where the prediction accuracy begins to fall below 90%.
The streak mode itself appears to be irrelevant for any of the predictions, which is remarkable as it is, like the streamwise vortex, a representative feature of near-wall turbulence. Similarly, its instabilities, except mode 5, are not of importance for the classification. For the shortest prediction time, that is, for the validation case $t_p = 200$, mode 5 does not play a decisive role either, which is plausibly related to the SSP being significantly weakened close to a relaminarisation event. This rationale, of course, only applies to data samples in class 1, where relaminarisation occurs. Like the vortex mode 3, the spanwise instability mode 5 increases in importance with prediction horizon except for the longest prediction horizon, which again can be plausibly explained by a more vigorous SSP further away from a relaminarisation event. Since modes 1 and 3 are translation-invariant in the streamwise direction, a mode with $x_1$-dependence should always be recognised, as the SSP cannot be maintained in two dimensions. The dominance of mode 5 over any of the other instabilities may be related to its geometry resulting in a stronger shearing and thus a faster instability of the streak.
Apart from modes directly connected with the SSP, the deviation of the mean profile from the laminar flow, mode 9, is also recognised as important for $t_p = 200$ and $t_p = 250$. Turbulent velocity field fluctuations are known to alter the mean profile. In extended domains, where turbulence close to its onset occurs in a localised manner, localisation occurs through turbulence interacting with and changing the mean profile. The mean profile of a puff in pipe flow, for instance, is flatter than the Hagen-Poiseuille profile towards the middle of the domain, which decreases turbulence production (van Doorne & Westerweel 2009; Hof et al. 2010; Barkley 2016). Now the question arises as to if and how the information SHAP provides concerning the mode importance ranking can be connected to the equations of motion, and what can be learned from this. In particular, concerning strongly correlated modes, it is instructive to understand why a particular mode is favoured. For modes 2 and 3, the mode coupling table of Moehlis et al. (2004) again gives some indications. Mode 3 (the streamwise vortex) generates mode 2 (the streak) by advection of mode 1 (the laminar profile) - this is the first step of the SSP - or of mode 9 (the deviation of the laminar profile). However, the coupling table is not symmetric, that is, mode 2 cannot generate mode 3, and mode 3 can only be re-generated through nonlinear interactions involving either modes 5 and 6 or modes 4 and 7 or 8 - this is known as the third and last step of the SSP, where instabilities couple nonlinearly to re-generate the streamwise vortex. Hence, out of the strongly correlated mode pair 2 and 3, the latter should be physically more significant in the SSP than the former, in the sense that mode 2 will become active if modes 1 and 3 are, but not vice versa. SHAP indeed identifies mode 3 as significant while mode 2 plays no decisive role in the prediction.
A similar conclusion has recently been obtained for the transition to turbulence in flow through a vertically heated pipe (Marensi et al. 2021), where relaminarisation due to buoyancy forces has been connected with a suppression of streamwise vortices rather than streaks.
For modes 4 and 5 the situation is more subtle, as both modes can be converted into each other through advection by the laminar profile. Again considering the mode coupling table (or the amplitude equations), two points distinguish mode 5 from mode 4: (a) mode 5 is odd in $x_2$ while mode 4 is even in $x_2$; (b) in interactions with mode 2, mode 5 couples to the only fully 3d mode, mode 8 (which is also odd in $x_2$), while mode 4 does not. A fully fledged SSP should involve 3d dynamics, and the data distribution of mode 8 shows this clearly for the validation case ($t_p = 200$), as mode 8 is significantly weakened in class 1 compared to class 0. Considering the training data distributions of modes 4, 5 and 8, we observe that the pdfs of mode 5 differ considerably between class 0 and class 1, and again mode 5 is suppressed in class 1. In contrast, mode 4 is active in both classes. Mode 5 thus provides a more direct route to three-dimensional dynamics from streak instabilities than mode 4 does.
In summary, the picture that emerges is as follows. For a sustained SSP, the streamwise vortex must remain active, as only it can generate the streak. Further to this, supplying spanwise flow perturbations of odd parity in the wall-normal direction should help to prevent relaminarisation events, while spanwise flow fluctuations connected with streak instabilities of even parity in the wall-normal direction play a minor role in sustaining the SSP.

Example -SHAP on data obtained by Direct Numerical Simulation
In order to demonstrate that the method can be leveraged for larger fluid dynamics datasets, we now discuss an example where SHAP values corresponding to the prediction of relaminarisation events are calculated on a dataset obtained by DNS of minimal plane Couette flow at transitional Reynolds number. For this particular example, features are not tied to physical processes or any modal representation of the flow. Instead, the analysis is carried out on flow-field samples, and SHAP is used in conjunction with a neural-network-based classifier to provide (a) an indication as to which flow features need to be observed to allow an accurate prediction of relaminarisation events, and (b) an interpretation thereof in terms of the SSP. Specifically, to address the latter and to connect to the results obtained for the SSP within the NMM, we ask the classifier to predict relaminarisation events based on the structure of the streaks characteristic of the SSP, and we use SHAP to demonstrate that the classifier bases its decisions on data features indicating the presence of streak instabilities. To do so, we focus on the streamwise component of the velocity field evaluated at a particular point in the streamwise direction, that is, the classifier works with 2D data slices showing cross-sections of high- and low-speed streaks.

Numerical Experiments
In order to keep the computational effort at a manageable level commensurate with an example calculation, we consider simulations of plane Couette flow at a Reynolds number of 400 in the minimal flow unit, a domain of size $L_1 \times L_2 \times L_3 = 1.755\pi \times 2 \times 1.2\pi$ with periodic boundary conditions in the stream- and spanwise directions and no-slip boundary conditions in the wall-normal direction, the smallest domain that sustains turbulence (Jiménez & Moin 1991; Hamilton et al. 1995; Kawahara & Kida 2001). The calculations have been carried out with a pseudospectral solver provided by channelflow2.0 (Gibson 2014; Gibson et al. 2022) using $N_1 \times N_2 \times N_3 = 16 \times 33 \times 16$ points in the streamwise, wall-normal and spanwise directions, respectively, with full dealiasing in the stream- and spanwise directions, a resolution similar to other, including recent, studies of minimal plane Couette flow (Kawahara & Kida 2001; van Veen & Kawahara 2011; Lustro et al. 2019). We generate velocity field data for 5000 trajectories running for 5000 advective time units and take data samples at an interval of 10 advective time units. The simulations are initialised with randomly perturbed velocity-field samples taken at intervals of one advective time unit from a turbulent master trajectory, and the training data acquisition process is started after a transient of 100 advective time units. The criterion for the observation of a relaminarisation event is a cross-flow energy threshold of $10^{-5}$, and we verify that the turbulent lifetime distribution is still exponential for this grid size (not shown).
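The relaminarisation criterion can be sketched as a threshold scan over the cross-flow energy of a stored trajectory; the grid shapes and the definition of the cross-flow energy as the volume average of $(u_2^2 + u_3^2)/2$ are illustrative assumptions:

```python
import numpy as np

def first_relaminarisation_time(u2, u3, dt=10.0, threshold=1e-5):
    """Scan a trajectory of cross-flow velocity snapshots (wall-normal
    u2 and spanwise u3 on the simulation grid, leading axis = time) and
    return the first time at which the volume-averaged cross-flow
    energy drops below the threshold, or None if the flow stays
    turbulent for the whole recorded trajectory."""
    for n in range(u2.shape[0]):
        e_cf = 0.5 * np.mean(u2[n] ** 2 + u3[n] ** 2)
        if e_cf < threshold:
            return n * dt
    return None

# Toy decaying trajectory on a coarse 4x4x4 grid (illustrative only).
amps = np.array([1e-1, 1e-2, 1e-3])
u2 = amps[:, None, None, None] * np.ones((3, 4, 4, 4))
u3 = np.zeros_like(u2)
t_relam = first_relaminarisation_time(u2, u3)
```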
As indicated earlier, a convolutional neural network is trained on the streamwise slice at $x_1 = 0$ of the streamwise velocity component $u_1$ with around 5000 samples, yielding a spatial sample size of $10 \times 33$, taking into account truncation to remove aliasing effects. Two 2D convolutional layers with subsequent 2D max-pooling layers are followed by a flattening layer with a dropout layer and a fully connected softmax layer with two neurons, one for each output class (Chollet 2020), to establish the NN graph. The size of the snapshots is well within the capabilities of NNs, that is, the method can certainly be applied to higher-resolved data. The main reason for the choice of resolution here is that the exponential lifetime distribution results in having to discard a significant number of trajectories, essentially all those where relaminarisation occurred very quickly, in order to ensure that the transient from the initial data has passed. After training the NN, the SHAP values for samples from the test dataset are calculated. We can focus on the SHAP values for class 1 only, as the SHAP values for the two classes differ only by a minus sign.
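A minimal sketch of how such 2D training samples might be assembled from a stored field is given below; the use of a 2/3-rule spectral truncation to arrive at 10 dealiased spanwise points, as well as the array layout, are our assumptions and not spelled out in the text:

```python
import numpy as np

def training_sample(u1, n_dealiased=10):
    """Extract the streamwise velocity on the plane x1 = 0 from a field
    u1 of shape (N1, N2, N3) = (16, 33, 16) and truncate the spanwise
    direction to the dealiased resolution, giving a 10 x 33 sample.
    The 2/3-rule truncation used here is an assumption about how the
    dealiased sample size arises from the 16 spanwise points."""
    slice_x1_0 = u1[0]                                 # (33, 16): (x2, x3) plane
    spectrum = np.fft.rfft(slice_x1_0, axis=1)         # spanwise Fourier modes
    spectrum = spectrum[:, : n_dealiased // 2 + 1]     # drop aliased modes
    # Resample on the dealiased grid (note: irfft rescales amplitudes).
    return np.fft.irfft(spectrum, n=n_dealiased, axis=1).T   # (10, 33)
```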

Results
First, the prediction time $t_p$ is varied between 10 and 200 advective time units to obtain the performance of the convolutional network for tasks of different difficulty (Lellep et al. 2020). The performance decreases from around 99% prediction accuracy for $t_p < 60$ to around 60% at $t_p = 200$, with a performance above 90% for $t_p \lesssim 130$ (not shown). In what follows, we discuss results obtained for $t_p = 90$; however, the results are consistent across the larger prediction horizons for which the prediction accuracy exceeds 90%. In Fig. 9 we present one representative sample for each of the two classes together with the spatial distribution of SHAP values to illustrate general observations that can be obtained from the data.
The streamwise component $u_1$ of velocity-field samples evaluated at $x_1 = 0$ always consists of localised regions in the velocity field of alternating small and large magnitudes, corresponding to cross-sections of low- and high-speed streaks. Samples corresponding to class 0 feature less uniform streak cross-sections than those of class 1. More precisely, the spatial decay of a streak in the wall-normal and spanwise directions is less regular for samples of class 0 than for those of class 1. This can be seen by comparison of the representative visualisations shown in the top panels of Fig. 9(a) for class 0 and Fig. 9(b) for class 1. In what follows, we refer to regions of spatial decay as streak tails.
For samples of class 0, i.e. those that do not relaminarise, the SHAP values are mostly negative, while the SHAP values for samples of class 1 are mostly positive. Furthermore, for samples of class 0 the SHAP values detect the streak cores, where $u_1$ varies little, more so for the low-speed than for the high-speed streak. For class 1, however, the SHAP values point towards the tails of the correspondingly more uniform streak cross-sections. The tails of the high-speed streaks are hereby more pronounced. Interestingly, the tails of the less regular class-0 streak cross-sections contribute slightly towards a classification of those samples to class 1, and the cores of the class-1 low-speed regions contribute to a classification of those samples to class 0. That is because the tails and the cores, respectively, resemble those of the other class. We can therefore conclude that the NN uses the streak cores for the classification towards class 0 and the streak tails, where velocity-field gradients are large, for the classification towards class 1.

Discussion
The 2D slices used in the training result in the classifier having to predict relaminarisation events based on the structure of the streaks, but without directly seeing the streak modulations characteristic of linear streak instabilities, as these would only be visible in either the full volume or in a wall-normal cross section of the flow. Comparing wall-normal cross-sections obtained from samples of classes 0 and 1 a-posteriori, we find consistently that streak modulations are much weaker or absent in class 1 compared with class 0. In conjunction with the results obtained by SHAP, we conclude that the classifier bases its decision on proxies for streak modulations corresponding to linear streak instabilities within the SSP.
For a relaminarisation event to occur, the SSP must break at some point in its evolution. The SSP consists of three consecutive stages: (i) the laminar profile is advected by the streamwise vortex, creating streaks; (ii) the streaks become linearly unstable; (iii) the linear instabilities couple nonlinearly and re-generate the streamwise vortex. Relaminarisation could in principle be related to any of these stages, for instance to a weakening of streaks or of streamwise vortices, or to a suppression of streak instabilities. Strong streaks are present in both class 0 and class 1 DNS data samples, as can be seen from the visualisations of representative data samples shown in Fig. 9. This is commensurate with the results obtained for the NMM, where we found that the streak mode itself is not relevant for the prediction of relaminarisation events. That is, a scenario whereby relaminarisation is connected with a suppression of streaks is unlikely. A similar observation has been made for buoyancy-induced relaminarisation in a vertically heated pipe (Marensi et al. 2021). In contrast to this, and as discussed above, accurate predictions of relaminarisation events can be made based on the presence or absence of proxies for linear streak instabilities. If they are present, the streaks have a less regular profile and the flow remains turbulent; if not, it relaminarises. We note that the classification task did not involve observables connected with streamwise vortices. However, as streamwise vortices are a consequence of streak instabilities, not precursors, a suppression of streak instabilities would necessarily result in a suppression of streamwise vortices. In summary, the SHAP analysis suggests that the break-up point in the self-sustaining cycle is connected with a suppression of streak instabilities.
Using 2D slices of a low-resolution DNS near the laminar-turbulent transition as training data for relaminarisation predictions shows that SHAP values yield insight into what is important for the prediction in the demonstration carried out here. Specifically, localised SHAP values can be linked to localised coherent regions of the flow, thereby providing a way to identify which flow features the convolutional NN classifier uses for the prediction. Interestingly, regions where velocity-field gradients are large, that is, the tail ends of the streak cross-sections, play a major role in the prediction of relaminarisation events.
The ability of SHAP to identify localised regions in the flow and isolate important sub-regions therein suggests that SHAP values can also identify physical processes that are used by classifiers provided a data representation capable of encoding physical, i.e. dynamical, processes is used. This would involve passing temporally resolved data, for instance by providing single training samples consisting of time-ordered data samples in sub-intervals of a given time series. Additionally, the correlation between input data and relevance for classifiers as determined by SHAP values could be used by experimentalists to, for instance, increase measurement resolution where necessary or optimise the location of probes.

Conclusions
The purpose of this article is to introduce SHAP as an explainable AI method capable of identifying flow features relevant for the prediction of relaminarisation events in wall-bounded parallel shear flows. As a first step, and in order to facilitate a physical interpretation of the SHAP output, we used a dataset consisting of snapshots generated through forward integrations of the nine-mode model of Moehlis et al. (2004), as it is based on the self-sustaining process and each feature, here a basis function, has a clear physical meaning. Subsequently, the same classification task is carried out on data obtained from DNSs of minimal plane Couette flow, where we specifically focus on the prediction of relaminarisation events based on the structure of high- and low-speed streaks. The feature ranking furnished by SHAP was interpreted in the context of the SSP, resulting in a clear distinction between those near-wall features phenomenologically representative of the flow and those significant for the dynamics of a wall-bounded turbulent flow close to the onset of turbulence. More specifically, we demonstrated that relaminarisation events are preceded by a weakening of streak instabilities, and thus necessarily of streamwise vortices, rather than being connected with the streaks themselves. Relaminarisation can only occur when the self-sustaining cycle breaks up, and our analysis suggests that this happens at the streak instability stage.
Concerning the nine-mode model, each data feature has a clear physical interpretation. This allows us to address the issue of representativeness versus significance (Jiménez 2018). To do so, we suggest classifying the information obtained into two categories: one comprises features of known phenomenological importance - the representative features - which can be identified in a shear flow by the naked eye, such as streamwise vortices or streaks, and the other comprises features of potential dynamical significance that are more difficult to observe directly, i.e. features that are not, or at least much less, representative. In the present context, the second class contains, for instance, modes representing linear streak instabilities. For the first class, SHAP was used to uncover which of the known representative features were significant for the prediction of relaminarisation events. First, we see that a known representative feature of near-wall dynamics, the streamwise vortex, is identified. Second, and more interestingly, the dynamics of the streak mode, also a representative feature, is not relevant for the prediction of relaminarisation events. In the second class, SHAP identifies a dynamically significant feature among the streak instabilities, the fundamental spanwise mode that is odd in $x_2$. This suggests that even though the streak has several types of instabilities within the SSP, the main effect on the dynamics with respect to relaminarisation events stems from spanwise instabilities of odd parity with respect to reflection about the midplane, at least in the nine-mode model.
For the DNS data, we find that SHAP identifies spatially localised regions in the flow that are relevant for the prediction of relaminarisation events. Taking guidance from the results obtained for the NMM, a classification task to probe the relevance of streak instabilities for the prediction of relaminarisation events was constructed by showing 2D data planes orthogonal to the spanwise direction to a classifier, that is, planes including cross-sections of high- and low-speed streaks. We find that SHAP values cluster in certain regions on the plane connected with variations in streak structure, which indicates that the classifier bases its decision on proxies for streak modulations corresponding to linear streak instabilities within the SSP. Since streamwise vortices are generated by nonlinear interactions of streak instabilities, SHAP thus identifies the suppression of streak instabilities as the point of breakdown within the self-sustaining cycle leading to relaminarisation. SHAP thus identifies not only which of the characteristic phenomenological features of the self-sustaining process in a wall-bounded shear flow are significant for the prediction of relaminarisation events, it also recognises patterns in the data corresponding to its fundamental dynamical mechanism. That is, it serves as a means to distinguish representativeness from significance of features for a given ML task. Furthermore, variations in the feature importance ranking across prediction horizons are commensurate with differences in the dynamics one would expect closer to or further away from a relaminarisation event.
Finally, we conclude with a few suggestions for further work. As SHAP is model-agnostic and can be used in conjunction with deep learning algorithms, this method can be upscaled and applied to high-dimensional experimental and numerical data. Essentially, we can envisage two main application categories, one aimed at obtaining further physics insight from high-dimensional numerical or experimental data, and one at purely technical improvements of analysis routines. The former will in most instances require a pre-processing step to decompose the data into physically interpretable features, while no such decomposition would be required for the latter to yield useful results. The results for the low-dimensional model presented here serve as a proof of concept for the former. The aforementioned example calculation using DNS data of transitional minimal plane Couette flow, where SHAP was used to identify regions in the flow that the classifier needs to see in order to render accurate predictions, demonstrates that the latter is in principle possible. An in-depth analysis of DNS data at higher Reynolds number and resolution is beyond the scope of the present paper; however, it would be a very interesting follow-up study. For the former, examples of useful data decompositions are proper orthogonal decomposition (POD) (Berkooz et al. 1993) or dynamic mode decomposition (DMD) (Schmid 2010), both by now widely used techniques for data analysis and model reduction in fluid dynamics. While POD returns modes ranked by energy content, DMD decomposes experimental or numerical data into spatio-temporally coherent structures labelled by frequency. The suitability of any data decomposition and reduction technique would depend on the planned task.
Identifying important modes and their interactions for a data representation without a straightforward physical interpretation could be useful to construct, for instance, a lower-dimensional description of the dynamics by only retaining the modes important for any given ML task at hand. In complex geometries, SHAP analyses could provide guidance as to which part of a simulation domain requires high resolution and where compromises regarding resolution and simulation cost will be less detrimental to the overall accuracy of the simulation. This can become particularly helpful outwith pure turbulence research, such as for active fluids, geo- and astrophysical flows, viscoelastic flows or complex flow geometries, as these applications are data-intensive and the flow complexity often does not allow straightforward feature interpretation.
Lastly, the results reported here are based on a statistical analysis of the SHAP values. Since SHAP values can also be calculated alongside real-time predictions, per-sample SHAP values may prove useful in on-the-fly tasks such as machine-assisted nonlinear flow control or for optimisation problems.
Declaration of Interests. The authors report no conflict of interest.

as iterative objective for iteration step $t$ with regularisation $\Omega(f_t) = \gamma T_t + \frac{1}{2}\lambda |(w_{1,t}, \ldots, w_{T_t,t})|^2$ on tree $f_t$. Here, $T_t$ denotes the number of terminal nodes of tree $t$, and $w_{j,t}$ with $j = 1, \ldots, T_t$ is the terminal node weight of index $j$ and tree $t$. Using $I_j$ as the terminal node set of index $j$ and tree $t$, the objective is used to quantify the quality of a tree structure $q$ after minimising the quadratic iterative objective,
$$\mathcal{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T_t} \frac{\left(\sum_{i \in I_j} h_i^{(1)}\right)^2}{\sum_{i \in I_j} h_i^{(2)} + \lambda} + \gamma T_t \qquad (A 5)$$
with $I_j = \{i \,|\, q(x_i) = j\}$ as the terminal node index set, $q(x_i)$ mapping a sample to the index of the terminal node it is assigned to, and $h_i^{(1)}$ and $h_i^{(2)}$ as the first- and second-order derivatives of the loss from Eq. (A 4), respectively. Using $\mathcal{L}^{(t)}(\{I_j\}_{j=1,\ldots,T_t})$, the quality of a node $I$ to be split into left and right parts, $I = I_L \cup I_R$, can be measured quantitatively.
The remarkable novelty of XGBoost, aside from the regularisation $\Omega(f_t)$, is a split-finding technique that uses weighted data quantiles to find appropriate split candidates, which are themselves evaluated with an approximate split-finding algorithm that uses the split loss $\mathcal{L}^{(t)}(\{I_j\}_{j=1,\ldots,T_t})$ as a metric. Furthermore, XGBoost scales to very large datasets as all components are properly parallelisable. This is due to additional technical innovations, such as data sparsity awareness, cache awareness and out-of-core computation for very large datasets.
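The structure score of Eq. (A 5) and the resulting split gain can be sketched directly from per-sample first and second derivatives of the loss; the function and variable names below are ours, and the toy squared-loss example is illustrative:

```python
import numpy as np

def structure_score(g, h, node_index_sets, lam=1.0, gamma=0.0):
    """Quality of a fixed tree structure as in Eq. (A5): for each
    terminal node with index set I_j, accumulate the first derivatives g
    and second derivatives h of the loss and evaluate
    -1/2 * sum_j (sum_{i in I_j} g_i)^2 / (sum_{i in I_j} h_i + lambda)
    + gamma * T."""
    score = 0.0
    for I_j in node_index_sets:
        G, H = g[I_j].sum(), h[I_j].sum()
        score += -0.5 * G ** 2 / (H + lam) + gamma
    return score

def split_gain(g, h, I, I_left, I_right, lam=1.0, gamma=0.0):
    """Reduction in the objective obtained by splitting node I into
    I_left and I_right; a split is worthwhile when the gain is positive."""
    before = structure_score(g, h, [I], lam, gamma)
    after = structure_score(g, h, [I_left, I_right], lam, gamma)
    return before - after

# Toy example with squared loss: g = prediction - target, h = 1.
y = np.array([0.0, 0.0, 1.0, 1.0])
pred = np.full(4, 0.5)
g, h = pred - y, np.ones(4)
gain = split_gain(g, h, np.arange(4), np.array([0, 1]), np.array([2, 3]))
```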
XGBoost classifiers come with a number of tunable hyperparameters that need to be specified by the user. The following parameters are tuned in this work and are therefore explained here using the XGBoost notation: n_estimators - the number of decision trees to fit to the task.
This number corresponds to the number of boosting iterations in the notation introduced above. Table 1 lists the optimal hyperparameters as identified by the randomised hyperparameter optimisation with 100 drawn hyperparameter samples. Since all prediction times use the same 100 hyperparameter samples, the same set of hyperparameters might be found to be optimal for more than one prediction horizon, e.g. $t_p = 200$ and 250.