Prediction of equivalent sand-grain size and identification of drag-relevant scales of roughness – a data-driven approach

Abstract Despite decades of research, a universal method for prediction of roughness-induced skin friction in a turbulent flow over an arbitrary rough surface is still elusive. The purpose of the present work is to examine two possibilities: first, predicting the equivalent sand-grain roughness size $k_s$ based on the roughness height probability density function and power spectrum (PS), leveraging machine learning as a regression tool; and second, extracting information about the relevance of different roughness scales to skin-friction drag by interpreting the output of the trained data-driven model. The model is an ensemble neural network (ENN) consisting of 50 deep neural networks. The data for the training of the model are obtained from direct numerical simulations (DNS) of turbulent flow in plane channels over 85 irregular multi-scale roughness samples at friction Reynolds number $Re_\tau = 800$. The 85 roughness samples are selected from a repository of 4200 samples, covering a wide parameter space, through an active learning (AL) framework. The selection is made in several iterations, based on the informativeness of the samples in the repository, quantified by the variance of the ENN predictions. This AL framework aims to maximize the generalizability of the predictions with a certain amount of data. This is examined using three different testing data sets with different types of roughness, including 21 surfaces from the literature. The model yields an overall mean error of 5%–10% on the different testing data sets. Subsequently, a data interpretation technique, known as layer-wise relevance propagation, is applied to measure the contributions of different roughness wavelengths to the predicted $k_s$. High-pass filtering is then applied to the roughness PS to exclude the wavenumbers identified as drag-irrelevant. The filtered rough surfaces are investigated using DNS, and it is demonstrated that despite the significant impact of filtering on the roughness topographical appearance and statistics, the skin-friction coefficient of the original roughness is successfully preserved.


Introduction
Surface degradation in flow-related engineering applications can take various forms, such as wearing or fouling, resulting in roughness on the solid surfaces. The most significant effect of surface roughness, in a practical sense, is an increase in the skin-friction drag under turbulent flow conditions. As an example, the uncertainties in the prediction of roughness-induced skin friction on ship hulls subjected to bio-fouling can cause multiple billion dollars of energy waste every year (Schultz et al. 2011; Chung et al. 2021). Understandably, the study of turbulent flow over rough surfaces has been an active area of research for nearly a century (Nikuradse 1933; Schlichting 1936; Perry et al. 1969; Raupach 1992; Krogstad et al. 1992; Bhaganagar et al. 2004; Busse et al. 2017; Jouybari et al. 2022).
The seminal work by Nikuradse (1933) has provided subsequent researchers with a common 'currency' to measure roughness-induced drag, namely the equivalent sand-grain roughness size $k_s$, defined as the sand-grain size in Nikuradse's experiments producing the same skin-friction coefficient as the rough surface of interest in the fully-rough regime. The equivalent sand-grain size is related to the downward shift in the logarithmic region of the inner-scaled mean velocity profile widely observed on rough walls, which is referred to as the roughness function $\Delta U^+$ (Hama 1954). In the fully-rough regime, where the skin-friction coefficient is independent of the Reynolds number, the two quantities are related through

$$\Delta U^+ = \frac{1}{\kappa} \ln k_s^+ + B - 8.5, \tag{1.1}$$

where $\kappa$ is the von Kármán constant and $B$ is the smooth-wall log-law intercept (Jiménez 2004).
One must note that $k_s$ is, by definition, a flow variable and not a geometric one. As a result, for any 'new' rough surface, it needs to be determined through a (physical or high-fidelity numerical) experiment in which the skin-friction drag is measured. Obviously, such an exercise is not practical in many applications; therefore, a great amount of effort in the past few decades has been devoted to determining $k_s$ of an arbitrary roughness a priori, i.e. merely based on its geometry (see e.g. van Rij et al. (2002); Flack & Schultz (2010); Chan et al. (2015); Forooghi et al. (2017); Thakkar et al. (2017); Flack et al. (2020)). A comprehensive description of these efforts can be found in the reviews by Chung et al. (2021) and Flack & Chung (2022). Essentially, they can be summarized as attempts to regress correlations between $k_s$ (or $\Delta U^+$) and a few statistical parameters of the roughness geometry based on available data. Some widely used parameters in this context are the skewness of the roughness height probability density function (p.d.f.) (Flack & Schultz 2010), the effective (or mean absolute) slope (Napoli et al. 2008), and the correlation length of the rough surface geometry (Thakkar et al. 2017).
As a result of increased computational capacities in recent years, DNS has become a source of data for the development of accurate roughness correlations, as pointed out by Flack (2018). In this regard, the idea of DNS in minimal channels, proposed by Chung et al. (2015), has enabled characterizing larger numbers of roughness samples with a certain computational resource. The availability of more data, on the one hand, has opened the door to the utilization of machine learning (ML)-based regression tools, and on the other hand, enables the inclusion of more roughness information (beyond only a few parameters) as the input to such tools. The latter point is particularly important since there is increasing evidence that both statistical and spectral information on the roughness geometry are required for prediction of the flow response to a multi-scale roughness (Alves Portela et al. 2021). In this regard, it has been shown that $k_s$ for multi-scale random roughness can be determined nearly uniquely with a combined knowledge of the roughness height p.d.f. and its power spectrum (PS) (Yang et al. 2022, 2023).
The first ML-based 'data-driven' tool for prediction of $k_s$ has been recently reported by Jouybari et al. (2021). These authors used a deep neural network and Gaussian process regression to train models with 17 inputs including widely used roughness parameters and their products. The training data for their model are obtained from DNS of flow over certain types of artificially generated roughness, which were also used to evaluate the model. Lee et al. (2022) used a neural network similar to that of Jouybari et al. (2021) and showed that improvements in predictive performance can be achieved if the network is 'pre-trained' on existing empirical correlations. While these pioneering works deliver promising results, the data-driven approach arguably has the potential to realize truly universal models, which can generalize beyond a certain class of roughness. The present work is an attempt to explore this potential. To this end, a model is trained on a wide variety of multi-scale irregular roughness samples, selected based on an adaptive approach (explained shortly), which is aimed at enhancing the universality of the predictions. This is evaluated using 'unseen' roughness from different testing data sets with different natures. Moreover, unlike previous efforts, the present model incorporates the complete p.d.f. and PS of the roughness as inputs rather than a finite set of pre-determined parameters.
Considerable attention has been paid in the recent literature to the multi-scale nature of realistic roughness and the significance of its 'spectral content'. It has been suggested that, beyond a certain threshold, large roughness wavelengths may impact the roughness-induced drag less significantly (Barros et al. 2018; Yang et al. 2022). While parametric studies of the roughness PS (Anderson & Meneveau 2011; Barros et al. 2018) or Fourier filtering (Busse et al. 2015; Alves Portela et al. 2021) can shed light on this matter, in the present work we explore the possibility to directly evaluate the contributions of different roughness scales utilizing the information embedded in the data-driven model developed in this study. This is motivated by the fact that the (discretized) PS is a direct input to the model, which hints at the potential to extract information about the role of different wavelengths through interpretation of the model.
In order to train the data-driven roughness model, $k_s$ for several roughness samples should be determined. This is referred to as 'labeling' of those samples, borrowing the term from ML terminology. Moreover, each roughness sample along with its $k_s$ value is called a training 'data point'. One should note that labeling is a computationally expensive process due to the need to perform DNS. In dealing with such scenarios, ML methods classified under active learning (AL), also known as query-based learning (Naoki & Hiroshi 1998) or optimal experimental design (Fedorov 1972) in different contexts, have proven particularly advantageous (Zhu et al. 2005; Settles & Craven 2008; Bangert et al. 2021). In AL, the selection of the training data is navigated in a way that the information gain from a certain amount of available data is maximized (Settles 2009). The 'informativeness' of a potential data point is commonly measured by the uncertainty in its prediction, which needs to be determined without labeling, e.g. through the standard deviation of the predictive distribution of a Bayesian model (Gal & Ghahramani 2015) or the variation of the predictions among a number of individual models (Raychaudhuri & Hamey 1995).
Two major AL categories can be identified in the literature (Angluin 2004; Lang & Baum 1992; Lewis & Gale 1994). The methods based on membership query synthesis expand an existing data set by creating and labeling new samples that the model is most curious about. In contrast, the methods based on pool-based sampling utilize a 'bounded' unlabeled data set (also called a repository) U, select and label the most informative samples from U, and include them in the labeled training data set L. In the present work, pool-based sampling is deemed more suitable as it can prevent creating unrealistic samples (Lang & Baum 1992). Moreover, the identification of the most informative samples follows a query-by-committee (QBC) strategy (Seung et al. 1992), in which the variance in the outputs of an ensemble of individual models (the committee) is the basis for the next query. A detailed description of the implemented QBC is provided in section 2.
In summary, the present work aims to answer two questions: first, whether 'universal' data-driven predictions of $k_s$ can be approached using a complete statistical-spectral representation of roughness (i.e., with the p.d.f. and PS as inputs). We leverage AL to facilitate achieving this goal. The second question is whether and how the information embedded in a data-driven model can provide insight into the contributions of different roughness scales to the added drag. Following this introduction, the roughness generation approach, the DNSs, and the ML methodology are described in section 2. In section 3, first the results and performance of the model are discussed, then the analysis of drag-relevant scales is presented. Section 4 summarizes the main conclusions.

Roughness repository
The (unlabeled) roughness 'repository' U is constructed as a collection of 4200 artificial irregular rough surfaces. These surfaces are generated through a mathematical roughness generation method in which the PS and p.d.f. of each roughness can be prescribed (Pérez-Ràfols & Almqvist 2019). For the creation of the present repository, the p.d.f. and PS are parameterized, as described shortly, and their parameters are randomly varied within a realistic range to generate a variety of roughness samples while imitating the random nature of roughness formation in practical applications.
In total, three types of p.d.f. are used, namely Gaussian, Weibull, and bimodal, and for each new roughness added to the repository, one type is randomly selected. The Weibull distribution of the random variable $z$ (here the roughness height) follows

$$f_W(z; \lambda, k) = \frac{k}{\lambda} \left(\frac{z}{\lambda}\right)^{k-1} \exp\left[-\left(\frac{z}{\lambda}\right)^{k}\right], \qquad z \geq 0,$$

where the shape parameter $0.7 < k < 1.7$ is randomly selected and the scale parameter is $\lambda = 1.0$. In the present notation, $z$ denotes the local roughness height as a function of the wall-parallel coordinates $(x, y)$.
The bimodal distribution is obtained by combining two Gaussian distributions (Peng & Bhushan 2000):

$$f_{bm}(z) \propto f_G(z\,|\,\mu_1, \sigma_1) + f_G(z\,|\,\mu_2, \sigma_2),$$

where $f_G(z\,|\,\mu, \sigma)$ is the p.d.f. of the Gaussian distribution with randomized mean $0 < \mu < 0.5$ and randomized standard deviation $0 < \sigma < 0.5$. The p.d.f. variable $z$ is then scaled from 0 to the roughness peak-to-trough height $k_t = \max(k) - \min(k)$, whose value is randomly determined in the range $0.06 < k_t/H < 0.18$, where $H$ is the channel half-height.
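As an illustration, the three height distributions and the final rescaling to the peak-to-trough height could be sampled as sketched below. This is not the authors' generator: the equal treatment of the two Gaussian components in the bimodal case and all numerical choices beyond the quoted parameter ranges are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_heights(n, kind="bimodal"):
    """Draw n roughness heights from one of the three p.d.f. types
    described in the text, then rescale them to span 0..k_t."""
    if kind == "gaussian":
        z = rng.normal(0.0, 1.0, n)
    elif kind == "weibull":
        k_shape = rng.uniform(0.7, 1.7)       # randomized shape parameter
        z = rng.weibull(k_shape, n)           # scale parameter lambda = 1.0
    elif kind == "bimodal":
        mu = rng.uniform(0.0, 0.5, 2)         # randomized means
        sig = rng.uniform(0.0, 0.5, 2)        # randomized standard deviations
        comp = rng.integers(0, 2, n)          # pick a component per sample (assumed equal weights)
        z = rng.normal(mu[comp], sig[comp])
    else:
        raise ValueError(kind)
    # Rescale so the heights span 0 .. k_t, with k_t/H drawn from the quoted range
    kt = rng.uniform(0.06, 0.18)
    return (z - z.min()) / (z.max() - z.min()) * kt
```

The rescaling step mirrors the text: whatever the drawn distribution, the heights are mapped onto the randomly selected peak-to-trough range.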
The PS of the roughness samples in the repository is controlled by two randomized parameters, namely the roll-off length $\lambda_r$ (Jacobs et al. 2017) and the power-law decline rate $p_{PS}$ (Lyashenko et al. 2013), whose values are selected in the ranges $0.1 < \lambda_r/(\log(\lambda_0/\lambda_1)) < 0.6$ and $-3 < p_{PS} < -0.1$. Here, $\lambda_0$ and $\lambda_1$ represent the upper and lower bounds of the PS, or the largest and smallest wavelengths forming the roughness topography. Random perturbations are added to the PS to achieve higher randomness. The lower bound of the roughness wavelength is set to $\lambda_1 = 0.04H$ to ensure that the finest structures can be discretized by an adequate number of grid points. The upper bound of the roughness wavelength $\lambda_0$ is randomly selected in the range $0.5H < \lambda_0 < 2H$. As will be discussed later, the roughness sample size as well as the simulation domain size should both be adjusted to accommodate this wavelength.
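A minimal sketch of such a parameterized spectrum is given below: flat up to a roll-off wavenumber, power-law decay beyond it, with multiplicative random perturbations. The placement of the roll-off within the log-wavelength span, the perturbation amplitude, and the normalization are all assumptions not fixed by the text.

```python
import numpy as np

def roughness_ps(n_modes=64, H=1.0, seed=0):
    """Sketch of a randomized roughness power spectrum between the
    largest wavelength lam0 and the smallest wavelength lam1 = 0.04 H."""
    rng = np.random.default_rng(seed)
    lam1 = 0.04 * H                           # smallest wavelength (finest structures)
    lam0 = rng.uniform(0.5 * H, 2.0 * H)      # largest wavelength
    # Roll-off wavelength placed at an assumed fraction of the log-wavelength span
    frac = rng.uniform(0.1, 0.6)
    lam_r = lam0 * (lam1 / lam0) ** frac
    p_ps = rng.uniform(-3.0, -0.1)            # power-law decline rate
    lam = np.geomspace(lam0, lam1, n_modes)   # wavelengths, large to small
    q = 2 * np.pi / lam                       # wavenumbers, ascending
    qr = 2 * np.pi / lam_r                    # roll-off wavenumber
    ps = np.where(q <= qr, 1.0, (q / qr) ** p_ps)
    ps = ps * rng.uniform(0.8, 1.2, n_modes)  # random perturbation (assumed amplitude)
    return q, ps
```

Each call with a different seed yields one candidate spectrum for the repository.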
Eventually, 4200 separate pairs of p.d.f. and PS are generated using the described random process, each leading to one rough surface added to the repository U. A representation of the parameter space covered by these samples is illustrated in section 3.1. Moreover, examples of the generated samples can be seen in Appendix A.
Direct numerical simulation

DNS is employed to solve the turbulent flow over selected rough surfaces from the repository in a plane channel driven by a constant pressure gradient (CPG). Each DNS leads to determination of the $k_s$ value for the respective roughness sample, a practice referred to as 'labeling' in this paper. The DNS is performed with the pseudo-spectral Navier-Stokes solver SIMSON (Chevalier et al. 2007). Fourier and Chebyshev series are employed for the discretization in the wall-parallel and wall-normal directions, respectively. Time integration is carried out using a third-order Runge-Kutta method for the advective and forcing terms and a second-order Crank-Nicolson method for the viscous terms. The roughness representation in the fluid domain is based on the immersed boundary method (IBM) of Goldstein et al. (1993). The code and the IBM have been validated and used in several previous publications (Forooghi et al. 2018a; Vanderwel et al. 2019; Yang et al. 2022). The solved Navier-Stokes equations write

$$\frac{\partial \boldsymbol{u}}{\partial t} + (\boldsymbol{u} \cdot \nabla)\boldsymbol{u} = -\frac{1}{\rho}\nabla p - \frac{1}{\rho} P_x \boldsymbol{e}_x + \nu \nabla^2 \boldsymbol{u} + \boldsymbol{f}_{\mathrm{IBM}}, \qquad \nabla \cdot \boldsymbol{u} = 0,$$

where $\boldsymbol{u} = (u, v, w)^{\intercal}$ is the velocity vector and $P_x$ is the mean pressure gradient in the flow direction, added as a constant and uniform source term to the momentum equation to drive the flow. Moreover, $p$, $\boldsymbol{e}_x$, $\rho$, $\nu$ and $\boldsymbol{f}_{\mathrm{IBM}}$ denote the pressure fluctuation, streamwise unit vector, density, kinematic viscosity and external body-force term due to the IBM, respectively. Periodic boundary conditions are applied in the streamwise and spanwise directions. The friction Reynolds number is defined as $Re_\tau = u_\tau (H - h_{md})/\nu$, where $u_\tau = \sqrt{\tau_w/\rho}$ and $\tau_w = -P_x (H - h_{md})$ are the friction velocity and the wall shear stress, respectively. Here $H$ and $H - h_{md}$ are the channel half-heights without and with roughness, $h_{md}$ being the mean (melt-down) roughness height. In the present work all simulations are performed at $Re_\tau = 800$.
Due to the high computational demand of many DNSs, the concept of DNS in minimal channels (Chung et al. 2015; MacDonald et al. 2016) is adopted for the considered simulations. Recently, Yang et al. (2022) showed the applicability of this concept for flow over irregular roughness subject to certain criteria. Accordingly, the roughness function over a rough surface can be accurately predicted by a comparison of mean velocity profiles in smooth and rough minimal channels if the size of the channels satisfies the following conditions:

$$L_z \geq \max\left(100\,\nu/u_\tau,\ k/0.4,\ \lambda_0\right), \qquad L_x \geq \max\left(3 L_z,\ 1000\,\nu/u_\tau,\ \lambda_0\right). \tag{2.5}$$

Here, $L_z$ and $L_x$ are the spanwise and streamwise extents of the minimal channel, respectively, $\lambda_0$ is the largest wavelength in the roughness spectrum, and $k$ is the characteristic physical roughness height. The plus superscript indicates viscous scaling hereafter. The above conditions suggest that the minimal channel size of each roughness should be determined based on $\lambda_0$ (which in practice defines the strictest constraint). As described before, $\lambda_0$ is known for each generated roughness sample. Table 1 summarizes the simulation set-up for all DNSs based on the respective $\lambda_0$ value. Due to the different sizes of the simulation domains, the chosen numbers of grid points differ according to the mesh size, but in all cases $\Delta^+_{x,z} \leq 4$.
In the wall-normal direction, a cosine-stretched mesh is adopted for the Chebyshev discretization. Mesh independence is confirmed in a set of additional tests. For each investigated roughness, $\Delta U^+$ is determined from the offset in the logarithmic velocity profile comparing the corresponding rough and smooth DNSs. Notably, when plotting mean velocity profiles, a zero-plane displacement $d_0$ is applied in order to achieve parallel velocity profiles in the logarithmic layer, where $d_0$ is determined as the moment centroid of the drag profile on the rough surface following Jackson's method (Jackson 1981). It is worth noting that in the extensive literature on rough wall-bounded turbulent flows, various definitions of $d_0$ have been proposed, and furthermore, the choice of the virtual wall position can affect the predicted rough-wall shear stress $\tau_w$ and thus the resulting $k_s$ value (Chan-Braun et al. 2011). Therefore, it is important to recognize this as a possible source of uncertainty and take into account the definitions of $k_s$ and $d_0$ when comparing data from different sources.
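Jackson's centroid definition can be sketched in a few lines; the discrete summation on a uniform grid is an assumption here, as is the `drag` input representing the wall-normal drag-force profile extracted from the IBM.

```python
import numpy as np

def zero_plane_displacement(z, drag):
    """Jackson's (1981) method: the zero-plane displacement d0 is the
    moment centroid of the drag profile f(z) on the rough wall,
    d0 = integral(z * f dz) / integral(f dz), evaluated here as a
    simple discrete sum on a uniform wall-normal grid."""
    z, drag = np.asarray(z, float), np.asarray(drag, float)
    return float(np.sum(z * drag) / np.sum(drag))
```

For a drag profile that is uniform over the roughness height, the centroid naturally falls at mid-height.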
It is also important to determine whether the flow has reached the fully-rough regime in each simulation. To this end, $\Delta U^+$ is combined with Eqn. 1.1 to yield a testing value of $k_s^+$. Then, following the threshold adopted by Jouybari et al. (2021), a roughness with $k_s^+ \geq 50$ is deemed to be in the fully-rough regime, and all samples not matching this criterion are excluded from the training or testing process. The selected threshold of $k_s^+ \geq 50$ is somewhat lower than the common threshold of $k_s^+ \geq 70$ (Flack & Schultz 2010) and thus may introduce some data points with limited transitionally-rough behavior into the database. This threshold is, however, deliberately chosen as a trade-off to maximize the number of training data given the limited computational resources. One should note that an increase in the threshold value of $k_s^+$ while maintaining the same parameter space would be possible by increasing $Re_\tau$. This would, however, lead to an obvious compromise in the final performance of the model by reducing the number of training data points at a given computational cost.
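This screening step amounts to inverting the fully-rough log-law relation $\Delta U^+ = \kappa^{-1} \ln k_s^+ + B - 8.5$. A sketch is given below; the constants $\kappa = 0.4$ and $B = 5.0$ are assumed representative values, not those quoted by the paper.

```python
import numpy as np

KAPPA, B = 0.4, 5.0  # assumed von Kármán constant and smooth-wall intercept

def ks_plus_from_dU(delta_u_plus):
    """Invert dU+ = (1/kappa) ln(ks+) + B - 8.5 for a testing value of ks+."""
    return np.exp(KAPPA * (delta_u_plus - B + 8.5))

def is_fully_rough(delta_u_plus, threshold=50.0):
    """Keep a sample only if its testing ks+ exceeds the adopted threshold."""
    return ks_plus_from_dU(delta_u_plus) >= threshold
```

With these constants, the $k_s^+ = 50$ threshold corresponds to a roughness function of roughly $\Delta U^+ \approx 6.3$.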
Overall, 85 roughness samples are DNS-labeled and eventually included in the labeled data set L to train the final AL-based model. The procedure for the selection of these training samples is explained in detail in the following. Eight out of the 85 labeled samples are located in the range $50 \leq k_s^+ \leq 70$. We observe that incorporating these samples into the training process improves the model performance. This improvement can be attributed both to the incorporation of more informative samples according to AL and to the regularization effect of the data diversity introduced by including transitionally rough training samples, which makes the model more robust and mitigates over-fitting (Reed & Marks 1999; Bishop et al. 1995).

Machine learning
The machine learning model in the present work is constructed in a QBC fashion by building an ensemble neural network (ENN) model consisting of 50 independent neural networks (NNs) with identical architecture as the 'committee' members. Similar to the methods proposed by Raychaudhuri & Hamey (1995) and Burbidge et al. (2007), the prediction uncertainty of the ENN model is defined as the variance of the predictions among the members, $\sigma^2_{ENN}$.
The workflow of the AL framework is sketched in figure 1. Two collections of roughness samples are included in the framework: the (unlabeled) repository U and the (labeled) training data set L. As a starting point in the AL framework, 30 samples are randomly selected from the repository, labeled (i.e. their $k_s$ is calculated) through DNS, and used to train a first ENN model, which is referred to as the 'base model'. This preliminary base model is subsequently improved throughout multiple AL iterations. In each AL iteration, approximately 20 new roughness samples from U are DNS-labeled and added to L for training of the ENN. These are the samples in U with the highest prediction variances according to the most recent ENN. This QBC strategy leads to an effective exploration of the repository, adding new data at the most uncertain regions of the parameter space.
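The query step of one AL iteration can be sketched as follows. The `predict` interface and array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_queries(committee, pool, n_queries=20):
    """One AL iteration's query step: rank the unlabeled pool by the
    variance of the committee predictions and return the indices of
    the most informative (highest-variance) samples.

    committee : list of trained models, each with a .predict(X) method
    pool      : (n_samples, n_features) array of unlabeled inputs
    """
    # preds has shape (n_members, n_pool_samples)
    preds = np.stack([member.predict(pool) for member in committee])
    variance = preds.var(axis=0)        # committee disagreement per sample
    # Highest-variance samples first
    return np.argsort(variance)[::-1][:n_queries]
```

The returned indices identify the pool samples to be DNS-labeled and moved from U into L before the committee is retrained.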
The function of the ENN model is to regress the (dimensionless) equivalent sand-grain roughness $k_s^* = k_s/k_{99}$ and to calculate the variance of the predictions $\sigma^2_{ENN}$ as a basis for QBC ($k_{99}$ is the 99% confidence interval of the roughness p.d.f., which is used as the representative physical scale of roughness height in this paper). The ENN is composed of multiple NNs with the identical structure shown in figure 2. The input vector $x$ of each NN contains the discretized roughness p.d.f. and PS along with three additional characteristic features of the rough surface, i.e. $k_t/k_{99}$ and the normalized largest and smallest roughness wavelengths $\lambda_0^* = \lambda_0/k_{99}$ and $\lambda_1^* = \lambda_1/k_{99}$, respectively. The input elements in $x$ that represent the roughness p.d.f. and PS are obtained by equidistantly discretizing the roughness p.d.f. and PS each into 30 values within the height range $0 < z < k_t$ and the wavenumber range $2\pi/\lambda_1 > 2\pi/\lambda > 2\pi/\lambda_0$. Each NN in the ensemble is constructed with one input layer with 63 (3+30+30) input elements, three hidden layers with 64, 128 and 32 non-linear neurons with rectified linear unit (ReLU) activation ($\max\{0, x\}$), and one linear neuron in the output layer.
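The assembly of the 63-element input vector can be sketched as below. Only the counts (3 scalar features, 30 p.d.f. values, 30 PS values) are fixed by the text; the quantile-based definition of $k_{99}$, the histogram binning, and the interpolation of the PS onto equidistant wavenumbers are assumptions.

```python
import numpy as np

def build_input_vector(z, ps_wavelengths, ps_values, n_bins=30):
    """Assemble the NN input: 3 scalar features plus the roughness
    p.d.f. and power spectrum, each discretized into n_bins values."""
    # k99: 99% height interval of the p.d.f. (assumed quantile definition)
    k99 = np.quantile(z, 0.995) - np.quantile(z, 0.005)
    kt = z.max() - z.min()                   # peak-to-trough height
    lam0, lam1 = ps_wavelengths.max(), ps_wavelengths.min()
    features = [kt / k99, lam0 / k99, lam1 / k99]
    # p.d.f. discretized equidistantly over 0 < z < kt
    pdf, _ = np.histogram(z - z.min(), bins=n_bins, range=(0.0, kt), density=True)
    # PS interpolated onto n_bins equidistant wavenumbers in [2*pi/lam0, 2*pi/lam1]
    q = 2 * np.pi / ps_wavelengths
    order = np.argsort(q)
    q_bins = np.linspace(q.min(), q.max(), n_bins)
    ps = np.interp(q_bins, q[order], np.asarray(ps_values)[order])
    return np.concatenate([features, pdf, ps])   # length 3 + 30 + 30 = 63
```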

The optimal number of neurons in each layer is determined through a grid search over a range of values, selecting the configuration that achieves the lowest model prediction error on T inter . L2-regularization is applied to the loss function. Adaptive moment estimation (Adam) is employed to train the model. The final prediction of the ENN is defined as the mean prediction over the 50 NNs, namely $\bar{k} = \sum_{i=1}^{50} \hat{k}_i / 50$, where $\hat{k}_i$ represents the prediction of the $i$-th NN. The prediction variance is calculated as

$$\sigma^2_{ENN} = \frac{1}{50} \sum_{i=1}^{50} \left(\hat{k}_i - \bar{k}\right)^2.$$

It is worth noting that each NN in the ENN model is individually trained on a randomly selected 90% of the samples in the labeled data set L, while the remaining samples are used for validation. The initial weights of the neurons in each NN are randomly assigned at the beginning of the training process. In this way, diversity among the QBC members is ensured, which is an important factor in the generalization of the ENN model (Melville & Mooney 2003). It is important to note that the ensemble members used in the current model are deterministic neural networks, and the uncertainty of the training data from DNS is assumed to be minimal. However, when considering experimental training data, where (aleatoric) uncertainties arise from possible measurement errors, the performance of the current ENN approach may be compromised due to its limited capability in handling such uncertainties. In these scenarios, the utilization of probabilistic models, such as Bayesian neural networks, may be more suitable as they allow for the explicit incorporation of measurement uncertainties.
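The per-member data split can be sketched as follows; together with random weight initialization, drawing a fresh permutation per member is what keeps the committee diverse. The function name and seed handling are illustrative.

```python
import numpy as np

def make_member_split(n_labeled, seed=0):
    """Each committee member trains on its own random 90% of the
    labeled set L; the remaining 10% serves as that member's
    validation set. A different seed per member ensures diversity."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_labeled)
    n_train = int(0.9 * n_labeled)
    return idx[:n_train], idx[n_train:]
```

With the 85 labeled samples of the present study, each member would thus see 76 training and 9 validation samples.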

Testing data sets
In the present work, three distinct testing data sets are introduced to evaluate the model performance and its universality. The difference among the data sets lies in the nature and origin of the samples they contain. The first data set T inter is composed of 20 samples randomly chosen from U that have never been seen by the model during the training process. Although the employed roughness generation method can generate irregular, multi-scale surfaces resembling realistic roughness, we separately test the model on additional rough surfaces extracted from scans of naturally occurring roughness, which form the second testing data set T ext,1 . There are five samples in this 'external' data set. These include roughness generated by ice accretion (Velandia & Bansmer 2019), deposits in an internal combustion engine (Forooghi et al. 2018b), and a grit-blasted surface (Thakkar et al. 2017). In addition, we test the model against a second external data set T ext,2 , which contains irregular roughness samples from the database provided by Jouybari et al. (2021). In this data set, many roughness samples are generated by placing ellipsoidal elements of different sizes and orientations on a smooth wall, making them rather distinct from the type of roughness used to train the model. We separate this testing data set from the other two as it contains a specific type of artificial roughness.

Assessment of the active learning framework
In this section, we explore whether the AL framework enhances the training behavior of the model. To do so, we compare a model trained with AL-selected data points to one trained with an arbitrary selection of data points (hereafter referred to as the EQ model). To avoid the computational cost of running many eventually unused DNSs, the comparison is made for only one AL iteration. Figure 3 shows all p.d.f. and PS pairs contained in the repository U (gray) and those randomly selected for the initial 'base model' (green), as well as those selected for further training (other colors). The wide range of available roughness can be understood from the area covered by the gray curves. As explained before, once the base model is trained using the initial randomly selected data set, it is used to determine which samples from the repository U should be selected for the next round of labeling. It can be seen in figure 3 that the AL model explores surfaces that are least similar to those in the initial data set (green) and tends to cover the entire repository with a higher weight given to the marginal cases. Furthermore, the parameter distributions as well as the corresponding $k_s$ values of the roughness selected by means of AL and EQ are compared in the insets of figure 4. It can be seen that both the AL and EQ models generally prioritize selecting samples within the waviness regime, i.e. ES < 0.35 (Napoli et al. 2008). This preference may arise from the fact that the resulting drag in the waviness regime (ES < 0.35) is sensitive to changes in ES (Schultz & Flack 2009). Conversely, beyond this regime (ES > 0.35), the resulting $\Delta U^+$ saturates in relation to increasing ES, making these samples less interesting for both labeling strategies. On the other hand, the AL model particularly tends to sample roughness with positive skewness and low correlation length. This can similarly be a result of the roughness effect being highly sensitive to variations in the roughness statistics within this range of parameters, which is in line with previous findings (Busse & Jelly 2023; Schultz & Flack 2009). Subsequently, two separate models are trained based on the AL and EQ strategies. These models are applied separately to determine the variance of prediction for roughness in the repository, and the results are depicted in figure 4(a) using red and blue lines. It is evident from the results that both the AL and EQ models generally reduce the prediction variance. However, a more substantial decline in the values of $\sigma^2_{ENN}$ is achieved by the AL model. This is the expected behavior, as AL is designed to reduce the prediction uncertainty by targeting the regions of the parameter space where the uncertainty is the largest. Interestingly, some increase in $\sigma^2_{ENN}$ of the EQ model can be observed for a number of samples with very high $\sigma^2_{ENN}$, which can be a sign that the performance of the EQ model in the 'difficult' tasks deteriorates, as it is not trained well for those tasks due to ineffective selection of its training data. Moreover, the prediction errors (calculated based on the correct $k_s$ values of the testing data set T inter obtained by DNS) are illustrated in figure 4(b). The averaged prediction errors achieved by the base model, the AL model, and the EQ model for the entire T inter are 19.1%, 16.0%, and 22.0%, respectively. While the AL model yields a meaningful reduction in the averaged error, the overall performance of the EQ model deteriorates, possibly due to over-fitting, which in our case refers to the condition where the model is trained to fit a limited number of relatively similar data points so precisely that its ability to extrapolate to dissimilar testing data is degraded (Hastie et al. 2009). To better analyse this observation, the testing data set T inter is evenly split into two subsets according to their $\sigma^2_{ENN}$, namely the high- and low-variance subsets. The averaged errors for both the high- and low-variance subsets are illustrated in the figure. It is clear that, while the EQ strategy improves the model performance for the already low-variance test data, its error increases for the high-variance test data, which can be taken as an indication of over-fitting as described above. The AL sampling strategy, in contrast, seems to protect the model from over-fitting, especially in the circumstance of a small training data set; hence the error is reduced for both the high- and low-variance test data as a result of effective selection of training data.

Performance of the final model
Having demonstrated the advantage of AL over random sampling, three additional AL iterations are carried out. The distributions of the PS and p.d.f. of the roughness selected from the second to the fourth AL iterations are displayed in figure 3 with black lines. A number of roughness maps from each AL round are also displayed in Appendix A.
The total number of data points for training of the model after four iterations adds up to 85; these are the data that form L. The scatter plots of some widely investigated roughness parameters in L as well as in the unlabeled repository U are displayed in the lower left part of figure 5(a). In the figure, $k(x, y)$ is the elevation map of the roughness, $Sk = \frac{1}{A k_{rms}^3} \int_A (k - k_{md})^3 \,\mathrm{d}A$ represents the skewness, where $A$ is the wall-projected surface area, and $k_{md} = \frac{1}{A} \int_A k \,\mathrm{d}A$ is the melt-down height of the roughness. The effective slope is defined as $ES = \frac{1}{A} \int_A |\partial k(x, y)/\partial x| \,\mathrm{d}A$. $L_{Corr}$ is the correlation length, representing the horizontal separation at which the roughness height auto-correlation function drops below 0.2. An inverse correlation can be observed between $L_{Corr}$ and ES, which is expected, as roughness with a larger dominant wavelength tends to have a lower mean slope. The distribution of the other statistics in U appears to be reasonably random.
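For illustration, these three statistics can be computed for a one-dimensional roughness profile as sketched below; the paper evaluates surface integrals over the full two-dimensional map, so this is a simplified sketch, and the finite-sample autocorrelation estimate is an assumed implementation.

```python
import numpy as np

def roughness_stats(k, dx):
    """Skewness, effective slope, and correlation length of a 1-D
    roughness profile k(x) sampled with uniform spacing dx."""
    k_md = k.mean()                               # melt-down height
    k_rms = np.sqrt(((k - k_md) ** 2).mean())
    sk = ((k - k_md) ** 3).mean() / k_rms ** 3    # skewness
    es = np.abs(np.gradient(k, dx)).mean()        # effective (mean absolute) slope
    # Correlation length: first lag at which the autocorrelation drops below 0.2
    dk = k - k_md
    acf = np.correlate(dk, dk, mode="full")[len(k) - 1:]
    acf = acf / acf[0]
    below = np.nonzero(acf < 0.2)[0]
    l_corr = below[0] * dx if below.size else len(k) * dx
    return sk, es, l_corr
```

For a pure sine wave of unit amplitude and wavelength, for instance, the skewness vanishes by symmetry and the effective slope approaches $2\pi \cdot 2/\pi = 4$.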
For the sake of comparison, the test data are additionally represented in the upper right part of the plots with orange (for T inter ) and purple (for T ext,1&2 ) symbols. It is worth noting that only the roughness samples that lie in the fully rough regime at the presently investigated $Re_\tau$ are included in L and shown in the figure. Figure 5(b) shows the values of $k_s$ (from DNS) against the three roughness statistics for all labeled data in the training and testing data sets. As can be clearly observed in the figure, while the equivalent sand-grain roughness shows some general correlation with each of these statistics (increasing with $Sk$ and ES, decreasing with $L_{Corr}$), the collapse of the data is far from perfect. Clearly, no single roughness statistic can entirely capture the effect of an irregular multi-scale roughness topography on drag, which is essentially the motivation behind seeking a NN-based model to find the functional relation between $k_s$ and a higher-order representation of roughness (here p.d.f. and PS).
Eventually, the final model is trained on the entire labeled data set L. The mean and maximum error values achieved by this model on all three testing data sets, as well as those errors after each training round, are displayed separately in figure 6. The figure shows a generally decreasing trend in both mean and maximum error as the model is progressively trained for more AL rounds, despite some exceptions to the general trend in the first two rounds, when the number of data points is low. It is notable that the AL model is particularly successful in bringing down the maximum error and, hence, can be considered reliable over a wide range of scenarios.
One should mention that the model performs consistently well for three different testing data sets of different natures. While the data set $T_{inter}$ covers an extensive parameter space (hence containing more extreme cases), it is generated employing the same method as the training data. Therefore, to avoid a biased evaluation of the model, two 'external' testing data sets from the literature are also included. The data set $T_{ext,2}$ is believed to be particularly challenging for the model, since it is formed by roughness generated artificially using discrete elements (Jouybari et al. 2021), which is fundamentally different from the target roughness of this study. Nevertheless, the final model yields very similar errors for all data sets, which can be taken as an indication of its generalizability. The averaged errors of the final model within the data sets $T_{inter}$, $T_{ext,1}$, and $T_{ext,2}$ are approximately 9.3%, 5.2%, and 10.2%, respectively.
It is crucial to acknowledge that the present model is developed under the assumption of statistical surface homogeneity. Surface heterogeneity introduces additional complexity that is not represented by the current training samples; as a consequence, the effect of heterogeneous roughness structures (Hinze 1967; Stroh et al. 2020) cannot be adequately accounted for by the current model.

Data-driven exploration of drag-relevant roughness scales
The fact that naturally occurring roughness usually has a multi-scale nature with a continuous spectrum is well established (Sayles & Thomas 1978). How the spectral content of roughness affects skin-friction drag, and whether a certain range of length scales dominates it, are, however, questions that have received attention more recently (Anderson & Meneveau 2011; Mejia-Alvarez & Christensen 2010; Barros et al. 2018; Medjnoun et al. 2021). In this sense, Busse et al. (2015) applied low-pass Fourier filtering to a realistic roughness and observed no significant effect on skin-friction drag when the filtered wavelengths were below a certain threshold. On the other hand, Barros et al. (2018) used high-pass filtering and suggested that very large length scales may not significantly contribute to drag. Alves Portela et al. (2021) examined three filtered surfaces, each maintaining one-third of the original spectral content associated with large, intermediate, or small scales. In all cases, the filtered scales were shown to include 'drag-relevant' information. While both the lower and upper limits of drag-relevant scales (if they exist) can be a matter of discussion, the present study mainly focuses on the latter. Possibly related to that question, Schultz & Flack (2009) documented that the equivalent sand-grain size of pyramid-like roughness with wavelengths larger (hence effective slopes lower) than a certain value does not scale in the same way as for smaller wavelengths. These authors coined the term 'wavy' for the high-wavelength roughness behavior. Later on, Yuan & Piomelli (2014a) revealed that the wavy regime may emerge at a different threshold (in terms of effective slope) in a multi-scale roughness compared to the single-scale pyramid-like roughness. Recently, Yang et al. (2022) showed that the spectral coherence of roughness topography and the time-averaged drag force on a rough wall drops at large streamwise wavelengths, which, in line with the finding of Barros et al. (2018), suggests decreasing drag-relevance of large scales.
In the present work, we are particularly interested in exploring the possibility of extracting the drag-relevant scales from the knowledge embedded in the data-driven model. In doing so, we employ the layer-wise relevance propagation (LRP) technique (Bach et al. 2015), which has previously proven successful in other contexts as a way to interpret decisions of NN models (Arras et al. 2016; Samek et al. 2015). LRP is an instance-based technique, which can be used to quantify the contribution of each input feature (here points in the discretized p.d.f. and PS) to the output of the model (here $k_s/k_{99}$) for a single test case (here a roughness sample). According to this technique, the contribution score (or relevance) of neuron $j$ at each layer of the deep neural network can be expressed as
$$R_j = \sum_k \frac{a_j w_{jk}}{\sum_{j'} a_{j'} w_{j'k}} R_k, \qquad (3.1)$$
where $R_k$ is the contribution score of neuron $k$ in the subsequent layer. In equation (3.1), $w_{jk}$ and $a_j$ are the weight and activation of the neuron that are obtained when the model is used to predict one instance (here the $k_s$ for the roughness sample of interest). Note that in our NN, the last layer corresponds to the predicted output and the first layer to the input roughness information. For better interpretability, we assign the value of one to the contribution score (or relevance) of the output neuron. As a result, the sum of the contribution scores of all inputs must be one. Note that the contribution scores shown in this section are averaged over the 50 NN members.
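The relevance-propagation rule above corresponds to the basic LRP-0 redistribution for fully connected layers. A minimal numpy sketch (our own illustration for a bias-free linear toy network, not the actual ENN) shows how relevance is propagated backwards and conserved layer to layer:

```python
import numpy as np

def lrp_dense(weights, activations, eps=1e-9):
    """Propagate relevance from the output back to the input features:
        R_j = sum_k a_j w_jk / (sum_j' a_j' w_j'k) * R_k
    weights: list of (n_in, n_out) matrices, one per layer.
    activations: list of inputs a feeding each weight matrix.
    The output relevance is set to one, so input scores sum to one.
    """
    R = np.ones(weights[-1].shape[1])           # relevance of the output neuron
    for W, a in zip(reversed(weights), reversed(activations)):
        z = a @ W                               # pre-activations of the layer
        s = R / (z + eps * np.sign(z))          # eps stabilizes small denominators
        R = a * (W @ s)                         # redistribute relevance to inputs
    return R

# Toy two-layer linear network with four input features.
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 1))
x = rng.normal(size=4)
acts = [x, x @ W1]                              # layer inputs for one instance
R_in = lrp_dense([W1, W2], acts)
```

For a network with nonlinear activations, `activations` would hold the post-activation values of each layer computed during the forward pass for the instance of interest.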
In order to extract drag-relevant scales, we consider the following idea. A wavelength that does not affect $k_s$ (which is a measure of added drag) still contributes to an increasing variance of the roughness height, and hence to $k_{99}$. Therefore, the related output of the NN, which is the ratio $k_s/k_{99}$, is decreased. An input that decreases the output shows a negative LRP contribution score. With that in mind, figure 7 shows three exemplary roughness samples (named A, B and C) and their discretized PS. Each discrete wavenumber in a PS is an input to the model and thus has a contribution score, which is indicated using the specified colour code. The spectra are shown in pre-multiplied form, and the p.d.f. of each roughness is also displayed. Samples with both Gaussian and non-Gaussian p.d.f.s are included. It is observed in figure 7 that the small wavenumbers (i.e. large wavelengths) generally have more negative contribution scores, which is in accordance with the suggestion of Barros et al. (2018). Indeed, the most negative contributions consistently belong to the largest wavelengths for all samples. On the other hand, smaller wavelengths generally show larger contribution scores, but the trend is not monotonic. This might indicate that drag-relevant scales reside within a certain range of the spectral content.
In order to examine whether or not a negative LRP contribution score indeed indicates drag-irrelevance, we apply high-pass filtering to the samples in figure 7 and examine the resulting roughness using DNS under the same conditions as for the original roughness. The position of the filter is chosen to be the largest wavelength with a non-positive contribution score (a 3-point moving average is applied to smooth the LRP scores beforehand). Figure 8 shows the height maps of the original versus filtered samples, the spectra with the filter positions, and the inner-scaled mean velocity profiles before and after filtering for samples A, B, and C. Some statistical properties of all original and filtered samples are also displayed in table 2. It is clear from figure 8 that the velocity profiles of the original and filtered samples collapse very well in the logarithmic region and beyond, which obviously leads to similar values of the roughness function and drag coefficient. This observation lends support to the hypothesis that the large roughness scales beyond a threshold do not have a meaningful contribution to the added drag, and that LRP analysis can be a data-driven route to identifying those scales a priori. One obvious application of this finding can be in the selection of the sampling size for investigations of roughness effects. In practice, it is not always possible to obtain roughness samples that are large enough to encompass the full spectrum of scales. However, once the range of drag-relevant scales is completely covered by a roughness sample, a miscalculation due to a limited sample size can be avoided.
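High-pass filtering of a doubly periodic height map can be sketched with a Fourier cutoff as follows. This is an illustrative implementation; the sharp isotropic cutoff is an assumption, and the surface mean is removed together with the long wavelengths.

```python
import numpy as np

def highpass_filter(k, dx, lambda_cut):
    """Remove roughness wavelengths larger than lambda_cut from a doubly
    periodic height map k(x, y) by zeroing the low-wavenumber modes."""
    nx, ny = k.shape
    kx = np.fft.fftfreq(nx, dx) * 2 * np.pi        # streamwise wavenumbers
    ky = np.fft.fftfreq(ny, dx) * 2 * np.pi        # spanwise wavenumbers
    kmag = np.sqrt(kx[:, None] ** 2 + ky[None, :] ** 2)
    K = np.fft.fft2(k)
    K[kmag < 2 * np.pi / lambda_cut] = 0.0         # zero modes with lambda > lambda_cut
    return np.fft.ifft2(K).real

# Two-wavelength surface: the long wave is removed, the short one kept.
x = np.linspace(0, 1, 256, endpoint=False)
k = (np.sin(2 * np.pi * x) + 0.3 * np.sin(2 * np.pi * 16 * x))[:, None]
k = k * np.ones((1, 256))
kf = highpass_filter(k, x[1] - x[0], lambda_cut=0.5)
```

In the procedure described above, `lambda_cut` would be set to the largest wavelength with a non-positive (smoothed) LRP contribution score.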
Interestingly, in all samples shown in figure 8, the filtered scales have a significant contribution to the roughness height variance based on the pre-multiplied roughness spectra. This is also reflected in the significant decrease in the roughness height $k_{99}$ and in $\lambda_{Corr}$ in table 2, as anticipated. Additionally, based on the three observed cases, the reductions in the ES values are found to be proportional to the filtered fraction of the PS. Other statistical parameters also undergo changes due to filtering, while obviously none of these changes are relevant in determining the drag. It is worth noting that, in addition to the roughness height $k_{99}$ and ES, other drag-determining quantities, such as $Sk$, undergo a general reduction for roughness C. According to some existing empirical correlations (e.g. those proposed by Chan et al. (2020)), the simultaneous reduction in these statistics should lead to a lower $k_s$. This is, however, not the case in reality based on the DNS results, which is reminiscent of the suggestion by Barros et al. (2018) that high-pass filtering is necessary if predictive correlations are to capture the correct trend between $k_s$ and the roughness statistics. This also provides an indication for the hypothesis that, while statistical parameters can correlate the equivalent sand-grain size of irregular roughness to some degree, only a combined statistical-spectral approach can fully capture the physics of roughness-induced drag. Furthermore, it is observed in figure 8 that the mean velocity profiles of the original and filtered roughness can exhibit some deviation very close to the wall. These deviations can be attributed to the altered volume occupied by roughness close to the wall, as reflected by the $k_{md}$ and other statistical values in table 2. However, these do not seem to have a significant influence beyond the region occupied by the roughness.
To shed further light on why the filtered large scales do not contribute to added drag, exemplary wall-normal planes of the time-averaged streamwise velocity field are examined in figure 9. The overlaid white contour lines are iso-contours of streamwise mean velocity $\bar{u} = 0$, which mark the regions of reversed flow. Expectedly, roughness A exhibits relatively frequent flow recirculation due to its larger local surface slope. In contrast, the occurrence of flow separation over roughness B and C seems to be less frequent, which could be linked to the waviness characteristics (Schultz & Flack 2009) and a less dominant form drag as a result of the low surface slope. When comparing the flow fields over the filtered and original roughness, it is evident that the locations of flow recirculation are the same, and filtering has a minimal impact on the extent of the reversed flow regions. Moreover, in figure 9, red contours are used to show the blanketing layer, which, following Busse et al. (2017), is defined as the flow region confined by iso-surfaces of $\bar{u}^+ = 5$. On a smooth wall, the blanketing layer would be identical to the viscous sub-layer, while on a rough wall it can be an indication of how the near-wall flow adapts to the roughness topography. Similarly to the observations of Busse et al. (2017), one can observe in figure 9 that the blanketing layers in the shown cases do not follow the small roughness scales and steep roughness patterns. This behaviour can be better recognised if the 'depth' of the blanketing layer, i.e. $\Delta_{u^+=5}(x, y) = h_{u^+=5}(x, y) - k(x, y)$, is considered. The maps of $\Delta_{u^+=5}(x, y)$ are shown in figure 10, where a visual inspection reveals the relative insensitivity of the blanketing layers to the smaller scales of the roughness topography (which appear when the roughness height is subtracted from the $\bar{u}^+ = 5$ iso-contour height). Interestingly, in the same figure, a fair level of similarity is observed between the $\Delta_{u^+=5}(x, y)$ maps of the corresponding original and filtered cases. This can be a hint that the blanketing layer has adapted to the filtered scales. This idea is examined in appendix B through a spectral analysis of $\Delta_{u^+=5}(x, y)$ for cases A-C. Based on this analysis, one might hypothesize that the drag-irrelevant roughness scales are indeed those to which the blanketing layer can adapt. As a final remark, a relation between the drag and the blanketing layer depth is physically plausible, as a change in this depth is generally accompanied by modifications in local flow phenomena (flow separation, strong changes in the local velocity gradient on the wall, etc.) that can be linked to added drag.
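The blanketing-layer depth can be computed from a time-averaged velocity field by locating the $\bar{u}^+ = 5$ iso-surface and subtracting the local roughness height. The following numpy sketch is our own illustration of this post-processing step, not the authors' code; it operates on a single wall-normal slice for simplicity.

```python
import numpy as np

def blanketing_depth(u_plus, k, y, threshold=5.0):
    """Blanketing-layer depth for one slice: for each streamwise location,
    find the lowest wall-normal position where the time-averaged u+ first
    reaches the threshold, then subtract the local roughness height k(x).
    u_plus: (nx, ny) mean velocity in wall units; y: wall-normal grid."""
    nx, ny = u_plus.shape
    h = np.empty(nx)
    for i in range(nx):
        j = int(np.argmax(u_plus[i] >= threshold))   # first index at/above u+ = 5
        if j == 0:
            h[i] = y[0]
        else:                                        # linear interpolation
            f = (threshold - u_plus[i, j - 1]) / (u_plus[i, j] - u_plus[i, j - 1])
            h[i] = y[j - 1] + f * (y[j] - y[j - 1])
    return h - k

# Synthetic slice: a linear profile starting at the local crest, so the
# u+ = 5 height sits exactly 0.5 above the roughness everywhere.
y = np.linspace(0.0, 2.0, 201)
k = np.array([0.0, 0.2, 0.4])
u_plus = np.maximum(0.0, 10.0 * (y[None, :] - k[:, None]))
depth = blanketing_depth(u_plus, k, y)
```

Applying this over all wall-parallel locations yields the $\Delta_{u^+=5}(x, y)$ maps of figure 10, whose spectra are analysed in appendix B.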

Turbulent statistics over original and filtered roughness
In the previous section, we used an LRP analysis of the trained model to identify which roughness scales contribute to the added skin-friction drag. While $\Delta U^+$ is arguably the most important flow statistic in a practical sense, due to its relation to drag, roughness also affects higher-order flow statistics, particularly in the so-called 'roughness sublayer' (Chung et al. 2021). In the present study, we specifically focus on comparing the turbulent and dispersive stresses, as the main means of momentum transfer away from the wall, over the pairs of unfiltered and filtered samples A, B, and C from section 3.3.
The velocity fluctuations in rough channels can be decomposed into turbulent and time-averaged spatial fluctuations following the triple decomposition of the velocity field proposed by Raupach (1992), $u_i = \langle \bar{u}_i \rangle + \tilde{u}_i + u_i'$, where the overbar denotes time averaging, $\langle \cdot \rangle$ wall-parallel averaging, $\tilde{u}_i$ the dispersive fluctuation and $u_i'$ the turbulent fluctuation. The resulting stress profiles are shown in figure 11 (red colour). Only a minor difference between the original and filtered roughness can be observed for the turbulent Reynolds stresses. For the $\langle \overline{u'u'} \rangle^+$ component, the peak values are comparable, although, for samples B and C, filtering slightly increases the peak value. Roughness has previously been shown to damp the inner-scaled streamwise turbulent stress, which can be related to the suppression of elongated near-wall turbulent structures (Yuan & Piomelli 2014b; Forooghi et al. 2018a). This effect seems not to be affected significantly by the elimination of drag-irrelevant large roughness scales. Moreover, agreement can be observed for the other two normal turbulent stresses as well as the shear stress (one component is not shown for the sake of brevity). The excellent agreement of the wall-normal turbulent stress is reminiscent of the suggestion by Orlandi & Leonardi (2008) that the roughness function is related to this component of the turbulent stress. Furthermore, the collapse of the shear stress profiles is an indication of similarity in the vertical mean momentum transport due to turbulence. The agreement of these components thus contributes to the concordance of the mean velocity profiles in the log layer.
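The triple decomposition and the resulting stress profiles can be sketched as follows. This is a minimal numpy illustration under our own conventions (field sampled as `u[t, x, z, y]` with $y$ wall-normal, superficial plane averaging), not the authors' post-processing code.

```python
import numpy as np

def triple_decompose(u):
    """Triple decomposition u = <ubar>(y) + utilde(x, z, y) + u'(t, x, z, y).
    Returns the mean profile, the dispersive normal stress <utilde utilde>(y)
    and the turbulent normal stress <u'u'>(y)."""
    ubar = u.mean(axis=0)                 # time average
    profile = ubar.mean(axis=(0, 1))      # wall-parallel average <ubar>(y)
    utilde = ubar - profile               # dispersive (form-induced) fluctuation
    uprime = u - ubar                     # turbulent fluctuation
    disp = (utilde ** 2).mean(axis=(0, 1))
    turb = (uprime ** 2).mean(axis=(0, 1, 2))
    return profile, disp, turb

# Synthetic field: a known mean profile plus a spatial wave (dispersive
# part) and a temporal oscillation (turbulent part).
t = np.arange(8)
x = np.arange(16)
base = np.array([1.0, 2.0, 3.0])                         # <ubar>(y)
tilde = 0.5 * np.sin(2 * np.pi * x / 16)[:, None, None]  # dispersive wave
prime = 0.2 * np.cos(2 * np.pi * t / 8)[:, None, None, None]
u = base[None, None, None, :] + tilde[None, ...] + prime
profile, disp, turb = triple_decompose(u)
```

Cross components such as $-\langle \tilde{u}\tilde{v} \rangle$ follow analogously by multiplying the corresponding fluctuation fields before averaging.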
For the dispersive stresses, it is apparent that the only component affected by the filtering of roughness is the streamwise normal component $\langle \tilde{u}\tilde{u} \rangle^+$, for which the peak values are reduced by filtering. It is worth mentioning that the same trend (reduction of the $\langle \tilde{u}\tilde{u} \rangle^+$ peak values and agreement of the other dispersive stresses) can be observed if an intrinsic averaging approach is used (not shown for brevity). Despite the possible shift of the zero-plane $y_0$ after filtering, the peak of $\langle \tilde{u}\tilde{u} \rangle^+$ is consistently observed in the vicinity of the respective zero-planes, i.e. at $(y - y_0) \approx 0$. The discernible reduction in this peak value suggests a less pronounced inhomogeneity of the mean streamwise velocity when larger wavelengths are filtered. Arguably, the large-scale undulations present in the original roughness lead to large-scale variations in the mean velocity, resulting in greater flow inhomogeneity, as also pointed out by Yuan & Jouybari (2018). Despite the fact that the values of the dispersive shear stress are small in all cases, a comparison among the three shown cases can provide certain insight into the roughness-flow interactions. As depicted in figure 11(c, f, i), roughness A exhibits a positive $-\langle \tilde{u}\tilde{v} \rangle^+$ peak, whereas roughness B and C display negative peaks. Such a negative sign can be attributed to the 'waviness effect', since a wavy structure (one with a relatively low slope) causes an acceleration of the mean flow on the windward side and a deceleration on the leeward side (Alves Portela et al. 2021). A positive $-\langle \tilde{u}\tilde{v} \rangle^+$, on the other hand, can be linked to recirculation behind steep roughness elements (Yuan & Jouybari 2018). This is in line with the fact that roughness A has a much larger ES compared to the other two. The collapse of the dispersive shear stress profiles in figure 11 shows that none of these behaviours are affected by the applied filtering.
The results shown so far indicate that the streamwise dispersive stress is the only second-order one-point velocity statistic affected by the filtering of drag-irrelevant scales. This, however, does not modify the shear stress profile, as discussed above. To further elaborate on this finding, joint p.d.f.s of local dispersive motions in the wall-parallel plane $y = y_0$ are calculated for all three samples along with their filtered counterparts and shown in figure 12. Here, intrinsic averaging is used, meaning that the areas inside the roughness are excluded from the calculation of the dispersive velocities shown in the joint p.d.f. The subscript $i$ denotes intrinsic averaging. Following the idea of quadrant analysis (Wallace et al. 1972), the $\tilde{u}^+_i$-$\tilde{v}^+_i$ plane is divided into four quadrants, Q1-Q4, based on the signs of $\tilde{u}^+_i$ and $\tilde{v}^+_i$. While the joint p.d.f.s look relatively similar before and after high-pass filtering, it is observed that filtering results in the contours shrinking along the $\tilde{u}^+_i$-axis. This is in line with the reduction of the peak values of the streamwise dispersive component discussed before. Notably, the joint p.d.f. retains its near-symmetry with respect to the $\tilde{v}^+_i$-axis, which means that the reduction of extreme $\tilde{u}^+_i$ fluctuations shows no preference in the direction of momentum transfer. This results in a similar shape of the contours apart from the horizontal stretching. An obvious outcome is that the shear stress profiles are unaffected by the modifications in $\tilde{u}^+_i$.
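The quadrant splitting used for the joint p.d.f. can be sketched as follows; this is an illustrative helper of our own, not the authors' implementation.

```python
import numpy as np

def quadrant_fractions(u_tilde, v_tilde):
    """Fraction of wall-parallel sample points falling in each quadrant
    of the (u_tilde, v_tilde) plane: Q1 (+,+), Q2 (-,+), Q3 (-,-), Q4 (+,-)."""
    u, v = u_tilde.ravel(), v_tilde.ravel()
    q1 = np.mean((u > 0) & (v > 0))
    q2 = np.mean((u < 0) & (v > 0))
    q3 = np.mean((u < 0) & (v < 0))
    q4 = np.mean((u > 0) & (v < 0))
    return q1, q2, q3, q4

# Symmetric toy field on a circle (offset to avoid points on the axes):
# all four quadrants are occupied equally.
th = np.linspace(0, 2 * np.pi, 400, endpoint=False) + np.pi / 400
q = quadrant_fractions(np.cos(th), np.sin(th))
```

A near-symmetric distribution of the fractions with respect to the $\tilde{v}^+_i$-axis (Q1 close to Q2, Q4 close to Q3) would reflect the lack of directional preference in the momentum transfer noted above.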

Conclusions
In this study, we present a new approach for predicting the normalized equivalent sand-grain height $k_s/k_{99}$ of homogeneous irregular roughness based on the roughness p.d.f. and PS, utilizing a machine learning ENN model. The model is developed within the AL framework to effectively reduce the required amount of training data. This framework searches for the roughness samples with the highest prediction variances in an unlabeled repository U of 4200 samples. Eventually, a labeled data set L comprising 85 AL-selected samples is constructed and utilized to derive the ENN model. The significant improvement of the learning efficiency of the model through AL is demonstrated by comparing it with a non-AL approach. Furthermore, it is observed that the employment of AL serves to effectively mitigate the deleterious effects of over-fitting, as evidenced by the observed general drop in prediction error. The mean prediction errors of the final AL-ENN model for an internal testing data set $T_{inter}$, as well as two external data sets containing both realistic and artificially generated roughness, $T_{ext,1}$ and $T_{ext,2}$, are 9.3%, 5.2%, and 10.2%, respectively. The consistently good predictions for testing data of different natures can be taken as a sign that a universal model is approached.

For more clarity, the two-dimensional spectra are azimuthally averaged and plotted in green colour. Moreover, the locations of the LRP-identified filters and the pre-multiplied roughness PSs are also added to the plots. Interestingly, all spectra show a significant decrease in the contribution of wavelengths larger than (wavenumbers smaller than) the filter. Note that all plots in figure 14 belong to the original cases, with no influence from the filtering. A wavelength that is present in the roughness topography but absent in the blanketing layer depth is one to which the layer has adapted. Therefore, the fact that the spectrum drops for the drag-irrelevant scales might suggest that those scales are the ones to which the blanketing layer can adapt. Despite the above discussion, further systematic investigations are required to establish conclusive evidence, as the present study covers limited ranges of roughness scales and Reynolds numbers. Ideally, data on surfaces with more 'drag-irrelevant' large scales and at a much wider range of Reynolds numbers are required to establish a solid hypothesis. Additionally, one should bear in mind that the current results are obtained in the fully rough regime, and, as discussed by Busse et al. (2017), the blanketing layer can behave differently in different regimes.

Figure 1: Schematic of the AL framework.

Figure 2: Schematic of a single neural network in the ENN.
Figure 3: PS (a) and p.d.f. (b) of the 4200 roughness samples in the roughness repository (gray). The samples selected for training are distinguished with different colors. While the AL model tends to explore the PS and p.d.f. domain, the EQ model contains samples that are placed close to the known initial database.
Figure 4: (a) Prediction variance obtained by three different models for all the samples in repository U. (b) The average error obtained by the three models for 10 high-variance samples and 10 low-variance samples in $T_{inter}$ (sorted based on the variance of the base model). The total averaged errors are displayed in the legend. Insets: the distribution of the statistical parameters, as well as the corresponding $k_s$ of the new samples, for the AL and EQ sampling strategies with identical color code.

Figure 5: (a) Pair plots of roughness statistics. Lower left: the distribution of the samples in U (gray) and L (green). Diagonal: histograms of single roughness statistics in U. Upper right: joint probability distribution of statistics overlaid by the test data in $T_{inter}$ (orange) and $T_{ext,1\&2}$ (purple). (b) Values of $k_s/k_{99}$ obtained from DNS (ground truth) as a function of the selected statistics. The color code is the same as in (a).
Figure 6: Mean and maximum prediction errors of the model on the three testing data sets after each AL training round.

Figure 7: Height maps, p.d.f.s and discretized color-coded pre-multiplied roughness height PS of three exemplary samples A (a), B (b) and C (c). The PS are colored by the LRP contribution scores.

Figure 8: The original and high-pass filtered roughness (a, d, g); the pre-multiplied roughness height PS with the filtered scales indicated by gray shading (b, e, h); and the inner-scaled mean velocity profiles from DNS over the original and filtered roughness (c, f, i). Note that the DNS are carried out in minimal channels.

Figure 9: Time-averaged streamwise velocity distribution $\bar{u}^+$ in a selected wall-normal plane for the original and filtered cases A-C. The overlaid white contour lines mark the regions of reversed flow ($\bar{u} < 0$). The blanketing layer (iso-contours of $\bar{u}^+ = 5$) is displayed with red contour lines. The gray color represents the rough structures. The calculation of the blanketing layer depth $\Delta_{u^+=5}$ is schematically illustrated in (a).

Figure 11: Double-averaged turbulent and dispersive stresses for roughness A, B and C.

Figure 12: Joint p.d.f. of $\tilde{u}^+_i$ and $\tilde{v}^+_i$ at the plane $y = y_0$, with values inside the roughness excluded. Contour lines range from 0.05 to 1.55 with a step of 0.1. The subscript $i$ indicates a result of intrinsic averaging.

Figure 13: Examples of roughness samples included in L. Patches of the same size extracted from different samples; (a)-(e) correspond to the initial round and AL rounds 1-4, respectively.
Figure 14: Azimuthally averaged spectra of the blanketing layer depth $\Delta_{u^+=5}$ for the original cases A-C, with the pre-multiplied roughness PS and the locations of the LRP-identified filters added.

Table 2: Statistical properties ($k_{md}/H$, $k_{99}/H$, $Sk$, ES, $\lambda_{Corr}/H$) of the selected surfaces A, B and C.