Multi-scale reconstruction of turbulent rotating flows with proper orthogonal decomposition and generative adversarial networks

Abstract Data reconstruction of rotating turbulent snapshots is investigated utilizing data-driven tools. This problem is crucial for numerous geophysical applications and fundamental aspects, given the concurrent effects of direct and inverse energy cascades. Additionally, benchmarking of various reconstruction techniques is essential to assess the trade-off between quantitative supremacy, implementation complexity and explicability. In this study, we use linear and nonlinear tools based on the proper orthogonal decomposition (POD) and generative adversarial network (GAN) for reconstructing rotating turbulence snapshots with spatial damages (inpainting). We focus on accurately reproducing both statistical properties and instantaneous velocity fields. Different gap sizes and gap geometries are investigated in order to assess the importance of coherency and multi-scale properties of the missing information. Surprisingly enough, concerning point-wise reconstruction, the nonlinear GAN does not outperform one of the linear POD techniques. On the other hand, the supremacy of the GAN approach is shown when the statistical multi-scale properties are compared. Similarly, extreme events in the gap region are better predicted when using GAN. The balance between point-wise error and statistical properties is controlled by the adversarial ratio, which determines the relative importance of the generator and the discriminator in the GAN training.


Introduction
The problem of reconstructing missing information, due to measurement constraints and lack of spatial/temporal resolution, is ubiquitous in almost all important applications of turbulence to laboratory experiments, geophysics, meteorology and oceanography (Asch et al. 2016; Le Dimet & Talagrand 1986; Torn & Hakim 2009; Bell et al. 2009; Krysta et al. 2011). For example, satellite imagery often suffers from missing data due to dead pixels and thick cloud cover (Shen et al. 2015; Zhang et al. 2018; Militino et al. 2019; Storer et al. 2022). In Particle Tracking Velocimetry (PTV) experiments (Dabiri & Pecora 2020), spatial gaps naturally occur due to the use of a small number of seeded particles. Additionally, in Particle Image Velocimetry (PIV) experiments, missing information can arise due to out-of-pair particles, object shadows, or light reflection issues (Garcia 2011; Wang et al. 2016; Wen et al. 2019). Similarly, in many instances, the experimental probes can access only a subset of the relevant fields, which calls for careful a priori engineering of the most relevant features to be tracked. Recently, many data-driven machine learning tools have been proposed to fulfil some of these tasks. Research using these black-box tools is in its infancy and we lack systematic quantitative benchmarks for paradigmatic high-quality and high-quantity multi-scale complex datasets, a mandatory step to make them useful for the fluid-dynamics community. In this paper, we perform a systematic quantitative comparison among three data-driven methods (no information on the underlying equations) to reconstruct highly complex two-dimensional (2D) fields from a typical geophysical set-up, such as that of rotating turbulence. The first two methods are connected with linear model reduction, the so-called Proper Orthogonal Decomposition (POD), and the third is based on a fully non-linear Convolutional Neural Network (CNN) embedded in the framework of a Generative Adversarial Network (GAN) (Goodfellow et al. 2014; Deng et al. 2019; Subramaniam et al. 2020; Buzzicotti et al. 2021; Kim et al. 2021; Guastoni et al. 2021; Yousif et al. 2022; Buzzicotti & Bonaccorso 2022).
POD is widely used for pattern recognition (Sirovich & Kirby 1987; Fukunaga 2013), optimization (Singh et al. 2001) and data assimilation (Romain et al. 2014; Suzuki 2014). To repair the missing data in a gappy field, Everson & Sirovich (1995) proposed Gappy POD (GPOD), where coefficients are optimized according to the measured data outside the gap. By introducing some modifications to GPOD, Venturi & Karniadakis (2004) improved its robustness and made it reach the maximum possible resolution at a given level of spatio-temporal gappiness. Gunes et al. (2006) showed that GPOD reconstruction outperforms Kriging interpolation (Oliver & Webster 1990; Myers 2002; Gunes & Rist 2008). However, GPOD is essentially a linear interpolation and thus struggles when dealing with complex multi-scale and non-Gaussian flows, such as those typical of fully developed turbulence (Alexakis & Biferale 2018), and/or large missing areas (Li et al. 2021).
Extended POD (EPOD) was first used in Maurel et al. (2001) on the PIV data of a turbulent internal engine flow, where the POD analysis is conducted in a sub-domain spanning only the central rotating region but the preferred directions of the jet-vortex interaction can be clearly identified. Borée (2003) generalized the EPOD and reported that it can be applied to study the correlation of any physical quantity in any domain with the projection of any measured quantity on its POD modes in the measurement domain. EPOD has many applications in flow sensing, where flow predictions are made based on remote probes (Tinney et al. 2008; Hosseini et al. 2016; Discetti et al. 2019). For example, using EPOD as a reference for their CNN models, Guastoni et al. (2021) predicted the 2D velocity-fluctuation fields at different wall-normal locations from the wall-shear-stress components and the wall pressure in a turbulent open-channel flow. Like GPOD, EPOD provides a linear relation between the input and output fields.
In recent years, CNNs have achieved great success in computer vision tasks (Niu & Suen 2012; Russakovsky et al. 2015; He et al. 2016) because of their powerful ability to handle nonlinearities (Hornik 1991; Kreinovich 1991; Baral et al. 2018). In fluid mechanics, CNNs have also been shown to be an encouraging technique for data prediction/reconstruction (Fukami et al. 2019; Güemes et al. 2019; Kim & Lee 2020; Li et al. 2023). Many studies are devoted to the super-resolution task, where CNNs are used to reconstruct high-resolution data from low-resolution data of laminar and turbulent flows (Liu et al. 2020; Subramaniam et al. 2020; Fukami et al. 2021; Kim et al. 2021). In the scenario where a large gap exists, missing both large- and small-scale features, Buzzicotti et al. (2021) reconstructed for the first time a set of 2D damaged snapshots of three-dimensional (3D) rotating turbulence with a GAN. Recent works show that a CNN or GAN can also reconstruct 3D velocity fields from 2D observations (Matsuo et al. 2021; Yousif et al. 2022). A GAN consists of two CNNs, a generator and a discriminator. Previous preliminary studies indicate that the introduction of the discriminator significantly improves the high-order statistics of the prediction (Deng et al. 2019; Subramaniam et al. 2020; Buzzicotti et al. 2021; Kim et al. 2021; Güemes et al. 2021). Unlike the previous work (Buzzicotti et al. 2021), here we attack the problem of data reconstruction with a GAN while varying the ratio between the input measurements and the missing information. Furthermore, we present a first systematic assessment of non-linear vs. linear reconstruction methods, also showing results from two different POD-based methods. We discuss and present novel results concerning both point-based and statistical metrics. Moreover, the dependency of GAN properties on the adversarial ratio is also systematically studied. The adversarial ratio gauges the relative importance of the discriminator in comparison to the generator throughout the training process.
Two factors make the reconstruction difficult. First, turbulent flows have a large number of active degrees of freedom, which grows with the turbulent intensity, typically parameterized by the Reynolds number. The second factor is the spatio-temporal gappiness, which depends on the area and geometry of the missing region. In the current work we conduct a first systematic comparative study between GPOD, EPOD and GAN on the reconstruction of turbulence in the presence of rotation, which is a paradigmatic system with both coherent vortices at large scales and strong non-Gaussian and intermittent fluctuations at small scales (Alexakis & Biferale 2018; Buzzicotti et al. 2018; Di Leoni et al. 2020). Figure 1 displays some examples of the reconstruction task in this work. The aim is to fill the gap region with data close to the ground truth (figure 1(c) and (f)). A second, long-term goal would be to systematically perform feature ranking: understanding the quality of the supplied information on the basis of its performance in the reconstruction goal. The latter is connected to the holy grail of turbulence: identifying the master degrees of freedom driving turbulent flow, which is also connected to control problems (Choi et al. 1994; Lee et al. 1997; Gad-el Hak & Tsai 2006; Brunton & Noack 2015; Fahland et al. 2021). The study presented in this work is a first step towards a quantitative assessment of the tools that can be employed to ask and answer this kind of question. In order to focus on two paradigmatic realistic set-ups, we study two gap geometries, a central square gap (figure 1(a,d)) and random gappiness (figure 1(b,e)). The latter is related to practical applications such as PTV and PIV. To evaluate the three methods in different situations, the gap area is also varied from a small to an extremely large proportion, up to the limit where only one thin layer is supplied at the border, a seemingly impossible reconstruction task. In a recent work, Clark Di Leoni et al. (2022) used Physics-Informed Neural Networks (PINNs) for reconstruction with sparse and noisy particle tracks obtained experimentally. As in practice the measurements are always noisy or filtered, we also investigate the robustness of the EPOD and GAN reconstruction methods.
The paper is organized as follows. Section 2.1 describes the dataset and the reconstruction problem set-up. The GPOD, EPOD and GAN-based reconstruction methods are introduced in §2.2, §2.3 and §2.4, respectively. In §3, the performances of POD- and GAN-based methods on turbulence reconstruction are systematically compared when there is one central square gap of different sizes. We address the dependency on the adversarial ratio for the GAN-based reconstruction in §4 and show results for random gappiness from GPOD, EPOD and GAN in §5. The robustness of EPOD and GAN to measurement noise and the computational cost of all methods are discussed in §6. Finally, conclusions of the work are presented in §7.

Dataset and reconstruction problem set-up
For the evaluation of different reconstruction tools, we use a dataset from the TURB-Rot (Biferale et al. 2020) open database. The dataset used in this study is generated from a direct numerical simulation (DNS) of the Navier-Stokes equations for the homogeneous incompressible flow in the presence of rotation with periodic boundary conditions (Godeferd & Moisy 2015; Seshasayanan & Alexakis 2018; Pouquet et al. 2018; van Kan & Alexakis 2020; Yokoyama & Takaoka 2021). In a rotating frame of reference, both Coriolis and centripetal accelerations must be taken into account. However, the centrifugal force can be expressed as the gradient of the centrifugal potential and included in the pressure gradient term. In this way, the resulting equations explicitly show only the presence of the Coriolis force, while the centripetal term is absorbed into a modified pressure (Cohen & Kundu 2004). The simulation is performed using a fully dealiased parallel pseudo-spectral code in a 3D ($x_1$-$x_2$-$x_3$) periodic domain of size $[0, 2\pi)^3$ with 256 grid points in each direction, as shown in the inset of figure 2(b). Denoting $l_0 = 2\pi$ as the domain size, the Fourier spectral wave number is $\boldsymbol{k} = (k_1, k_2, k_3)$, where $k_1 = 2\pi n_1/l_0$ ($n_1 \in \mathbb{Z}$), and $k_2$ and $k_3$ are obtained similarly. The governing equations can be written as
$$\frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u}\cdot\nabla\boldsymbol{u} + 2\boldsymbol{\Omega}\times\boldsymbol{u} = -\nabla\tilde{p} + \nu\nabla^2\boldsymbol{u} + \boldsymbol{f}, \tag{2.1}$$
where $\boldsymbol{u}$ is the incompressible velocity, $\boldsymbol{\Omega} = \Omega\hat{\boldsymbol{x}}_3$ is the system rotation vector, $\tilde{p} = p + \frac{1}{2}\|\boldsymbol{\Omega}\times\boldsymbol{x}\|^2$ is the pressure in an inertial frame modified by a centrifugal term, $\nu$ is the kinematic viscosity and $\boldsymbol{f}$ is an external forcing mechanism at scales around $k_f = 4$ via a second-order Ornstein-Uhlenbeck process (Sawford 1991; Buzzicotti et al. 2016). Figure 2(a) plots the time evolution of the energy over the whole simulation. The energy spectrum $E(k) = \frac{1}{2}\sum_{k \leqslant \|\boldsymbol{k}\| < k+1}\|\hat{\boldsymbol{u}}(\boldsymbol{k})\|^2$ averaged over time is shown in figure 2(b), where the gray area indicates the forcing wave numbers. To enlarge the inertial range between $k_f$ and the Kolmogorov dissipative wave number, $k_\eta = 32$, which is picked as the scale where $E(k)$ starts to decay exponentially, the viscous term $\nu\nabla^2\boldsymbol{u}$ in equation (2.1) is replaced with a hyperviscous term $\nu_h\nabla^4\boldsymbol{u}$ (Haugen & Brandenburg 2004; Frisch et al. 2008). We define an effective Reynolds number as $Re_{\mathrm{eff}} = (k_0/k_\eta)^{-3/4} \approx 13.45$, with the smallest wave number $k_0 = 1$. A linear friction term $-\beta\boldsymbol{u}$ acting only on wave numbers $\|\boldsymbol{k}\| \leqslant 2$ is also used on the r.h.s. of (2.1) to prevent a large-scale condensation (Alexakis & Biferale 2018). As shown in figure 2(a), the flow reaches a stationary state with a Rossby number $Ro = \mathcal{E}^{1/2}/(\Omega/k_f) \approx 0.1$, where $\mathcal{E}$ is the kinetic energy. The integral length scale is $L = \mathcal{E}/\int kE(k)\,\mathrm{d}k \sim 0.15\,l_0$ and the integral time scale is $T = L/\mathcal{E}^{1/2} \approx 0.185$. Readers can refer to Biferale et al. (2020) for more details on the simulation.
The dataset is extracted from the above simulation as follows. First, we sampled 600 snapshots of the whole 3D velocity field from time $t = 276$ up to $t = 876$ for training and validation, and we sampled 160 3D snapshots from $t = 1516$ to $t = 1676$ for testing, as shown in figure 2(a). A sampling interval $\Delta t_s \approx 5.41\,T$ is used to decrease correlations in time between two successive snapshots.
To reduce the amount of data to be analyzed, the resolution of the sampled fields is downsized from $256^3$ to $64^3$ by a spectral low-pass filter whose cut-off is the Kolmogorov dissipative wave number $k_\eta$, so as to eliminate only the fully dissipative degrees of freedom where the flow becomes smooth (Frisch 1995). Therefore, this procedure does not reduce the complexity of the data, and it also indicates that the measurement resolution corresponds to $k_\eta$.
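The downsizing step can be illustrated with a minimal NumPy sketch. It is written in 2D for brevity (the paper filters 3D fields), and the function name `spectral_downsample`, the isotropic cut-off test and the requirement that the cut-off lie strictly below the Nyquist wave number of the coarse grid are assumptions made for this illustration, not details taken from the TURB-Rot pipeline.

```python
import numpy as np

def spectral_downsample(u, n_out, k_cut):
    """Low-pass filter a periodic 2D field at |k| <= k_cut, then subsample.

    Hypothetical helper: u is (n_in, n_in) on a 2*pi-periodic grid and
    n_out must divide n_in.
    """
    n_in = u.shape[0]
    assert u.shape == (n_in, n_in) and n_in % n_out == 0
    # Assumption of this sketch: keep the cut-off below the coarse-grid Nyquist
    # wave number so that subsampling does not alias the retained modes.
    assert k_cut < n_out / 2
    k = np.fft.fftfreq(n_in, d=1.0 / n_in)        # integer wave numbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    keep = np.sqrt(k1**2 + k2**2) <= k_cut        # spectral low-pass mask
    u_filtered = np.real(np.fft.ifft2(np.fft.fft2(u) * keep))
    step = n_in // n_out
    return u_filtered[::step, ::step]             # coarse-grid samples
```

Because the filtered field is band-limited, sampling it on the coarser grid loses no further information, which is the sense in which the downsizing preserves the complexity of the resolved scales.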
For each downsized 3D field, we selected 16 horizontal ($x_1$-$x_2$) planes at different $x_3$, each of which can be augmented to 11 (for training and validation) or 8 (for testing) different ones by randomly shifting it along both the $x_1$ and $x_2$ directions using periodic boundary conditions (Biferale et al. 2020). Therefore, a total of 176 or 128 planes can be obtained at each instant of time.
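A minimal sketch of this augmentation, assuming NumPy and uniformly distributed integer shifts; the function name `augment_by_periodic_shifts` is hypothetical, and whether the unshifted plane counts among the 11 (or 8) copies is not specified in the text, so here every copy is shifted.

```python
import numpy as np

def augment_by_periodic_shifts(plane, n_copies, rng):
    """Return n_copies periodically shifted versions of a 2D plane."""
    n1, n2 = plane.shape
    copies = []
    for _ in range(n_copies):
        s1 = int(rng.integers(0, n1))   # random shift along x1
        s2 = int(rng.integers(0, n2))   # random shift along x2
        # np.roll wraps values around, i.e. applies periodic boundary conditions
        copies.append(np.roll(plane, shift=(s1, s2), axis=(0, 1)))
    return np.stack(copies)
```

With 16 planes per snapshot, 11 copies give 16 × 11 = 176 planes per training instant (600 × 176 = 105600 in total) and 8 copies give 16 × 8 = 128 planes per test instant (160 × 128 = 20480 in total).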
Finally, the 105600 planes sampled at early times are randomly shuffled and used to constitute the Train/Validation split: 84480 (80%)/10560 (10%), which is used for the training process, while the other 20480 planes sampled at later times are used for the testing process.
The parameters of the dataset are summarized in table 1. In the present study, we only reconstruct the velocity module, $u = \|\boldsymbol{u}\|$, which is always positive. Note that we restrict our study to 2D horizontal slices in order to make contact with geophysical observations, although GPOD, EPOD and GAN are also applicable to 3D data.
We next describe the reconstruction problem set-up. Figure 3 presents an example of a gappy field, where $I$, $G$ and $S$ represent the whole region, the gap region and the known region, respectively. Given the damaged area $A_G$, we can define the gap size as $l = \sqrt{A_G}$. As shown in figure 1, two gap geometries are considered: i) a square gap located at the center and ii) random gappiness which spreads over the whole region. Once the positions in $G$ are determined, $G$ is fixed for all planes over the training and the testing processes. Note that the GAN-based reconstruction can also handle the case where $G$ is randomly changed for different planes (not shown). For a field $u(\boldsymbol{x})$ defined on $I$, we define the supplied measurements in $S$ as $u_S(\boldsymbol{x}) = u(\boldsymbol{x})$ (with $\boldsymbol{x} \in S$), and the ground truth or the predicted field in $G$ as $u_G^{(t)}(\boldsymbol{x}) = u(\boldsymbol{x})$ or $u_G^{(p)}(\boldsymbol{x})$ (with $\boldsymbol{x} \in G$). The reconstruction models are 'learned' with the training data defined on the whole region $I$. Once the training process is completed, one can evaluate the models by comparing the prediction and the ground truth in $G$ over the test dataset.
Table 2: Summary of the optimal $N'$ for the square gap (s.g.) and random gappiness (r.g.) with different sizes.
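The two gap geometries can be encoded as boolean masks on the 64 × 64 grid. The sketch below assumes NumPy; the helper names are hypothetical, and drawing the random gappiness as independently chosen grid points is an assumption of this illustration rather than a statement of how the paper builds its masks.

```python
import numpy as np

def square_gap_mask(n, l):
    """Boolean (n, n) mask: True inside a centred l x l gap G, False in S."""
    mask = np.zeros((n, n), dtype=bool)
    start = (n - l) // 2
    mask[start:start + l, start:start + l] = True
    return mask

def random_gap_mask(n, area, rng):
    """Boolean (n, n) mask with `area` gap points scattered at random."""
    mask = np.zeros(n * n, dtype=bool)
    mask[rng.choice(n * n, size=area, replace=False)] = True
    return mask.reshape(n, n)
```

For example, `square_gap_mask(64, 62)` leaves a one-point-wide border of 252 known points, which is the extreme case in which only a thin layer of measurements is supplied.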

GPOD reconstruction
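This section briefly presents the procedure of GPOD. The first step is to conduct POD analysis with the training data on the whole region $I$, namely solving the eigenvalue problem
$$\int_I R_I(\boldsymbol{x}, \boldsymbol{y})\,\psi_n(\boldsymbol{y})\,\mathrm{d}\boldsymbol{y} = \lambda_n\psi_n(\boldsymbol{x}),$$
where $R_I(\boldsymbol{x}, \boldsymbol{y}) = \langle u(\boldsymbol{x})u(\boldsymbol{y})\rangle$ is the correlation matrix, given $\langle\cdot\rangle$ as the average over the training dataset. We denote $\lambda_n$ as the eigenvalues and $\psi_n(\boldsymbol{x})$ as the POD eigenmodes, where $n = 1, \ldots, N_I$ and $N_I = N_{x_1} \times N_{x_2}$ is the number of points in $I$. For the homogeneous periodic flow considered in this study, it can be demonstrated that the POD modes correspond to Fourier modes, and their spectra are identical (Holmes et al. 2012). In all POD analyses of the present study, the mean of $u(\boldsymbol{x})$ is not removed. Any realization of the field can be decomposed as
$$u(\boldsymbol{x}) = \sum_{n=1}^{N_I} a_n\psi_n(\boldsymbol{x}),$$
with the POD coefficients
$$a_n = \int_I u(\boldsymbol{x})\,\psi_n(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}. \tag{2.6}$$
When data are available only in $S$, the relation (2.6) cannot be used. One can instead reduce the dimension by keeping only the first $N'$ POD modes and minimize the distance between the measurements and the linear POD decomposition (Everson & Sirovich 1995),
$$\tilde{E} = \int_S \Big| u_S(\boldsymbol{x}) - \sum_{n=1}^{N'} a_n^{(p)}\psi_n(\boldsymbol{x}) \Big|^2\,\mathrm{d}\boldsymbol{x}, \tag{2.7}$$
to obtain the predicted coefficients $\{a_n^{(p)}\}_{n=1}^{N'}$. Then the GPOD prediction can be given as
$$u_G^{(p)}(\boldsymbol{x}) = \sum_{n=1}^{N'} a_n^{(p)}\psi_n(\boldsymbol{x}), \quad \boldsymbol{x} \in G. \tag{2.8}$$
We optimize the value of $N'$ during the training phase by requiring a minimum mean $L_2$ distance with the ground truth in the gap:
$$N' = \arg\min_{N'} \Big\langle \int_G \big| u_G^{(p)}(\boldsymbol{x}) - u_G^{(t)}(\boldsymbol{x}) \big|^2\,\mathrm{d}\boldsymbol{x} \Big\rangle. \tag{2.9}$$
Table 2 summarizes the optimal $N'$ used in this study. An analysis of the reconstruction error for different $N'$ is conducted in Appendix A. Note that there is also a different approach that selects, frame by frame, a subset of POD modes to be used in GPOD, based on Lasso, a regression analysis that performs mode selection with regularization (Tibshirani 1996). Results using this second approach do not show any significant improvement in a typical case of our study (see Appendix B).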
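The GPOD procedure can be sketched on discretized fields, where integrals become sums over grid points and the correlation matrix is an $N_I \times N_I$ array. This is a minimal illustration assuming NumPy and flattened fields; the function names `pod_modes` and `gpod_reconstruct` are hypothetical, and the least-squares solve stands in for the minimization of the distance in the known region.

```python
import numpy as np

def pod_modes(train):
    """POD of training fields, shape (n_snapshots, n_points), mean not removed.

    Returns eigenvalues and eigenmodes (as columns), sorted by decreasing
    eigenvalue.
    """
    corr = train.T @ train / train.shape[0]     # discrete correlation matrix
    eigvals, modes = np.linalg.eigh(corr)       # symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], modes[:, order]

def gpod_reconstruct(u_known, gap, modes, n_modes):
    """Fit the first n_modes coefficients on S and evaluate the modes in G.

    gap is a boolean vector (True in G); u_known holds the values in S.
    """
    basis = modes[:, :n_modes]
    # Least-squares coefficients from the measured points only
    coeffs, *_ = np.linalg.lstsq(basis[~gap], u_known, rcond=None)
    return basis[gap] @ coeffs                  # prediction restricted to G
```

In this discrete setting the optimal number of modes would be chosen by scanning `n_modes` and keeping the value with the smallest mean squared error in the gap over the training data.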

EPOD reconstruction
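To use EPOD for flow reconstruction, we first compute the correlation matrix in the known region,
$$R_S(\boldsymbol{x}, \boldsymbol{y}) = \langle u_S(\boldsymbol{x})\,u_S(\boldsymbol{y})\rangle, \quad \boldsymbol{x}, \boldsymbol{y} \in S,$$
and solve the eigenvalue problem
$$\int_S R_S(\boldsymbol{x}, \boldsymbol{y})\,\phi_n(\boldsymbol{y})\,\mathrm{d}\boldsymbol{y} = \sigma_n\phi_n(\boldsymbol{x}),$$
to obtain the eigenvalues $\sigma_n$ and the POD eigenmodes $\phi_n(\boldsymbol{x})$, where $n = 1, \ldots, N_S$ and $N_S$ equals the number of points in $S$. We remark that $\phi_n(\boldsymbol{x})$ are not Fourier modes, as the presence of the internal gap breaks the homogeneity. Any realization of the measured field in $S$ can be decomposed as
$$u_S(\boldsymbol{x}) = \sum_{n=1}^{N_S} b_n\phi_n(\boldsymbol{x}), \tag{2.12}$$
where the $n$-th POD coefficient is obtained from
$$b_n = \int_S u_S(\boldsymbol{x})\,\phi_n(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}. \tag{2.13}$$
Furthermore, with (2.12) and an important property (Borée 2003), $\langle b_n b_p\rangle = \sigma_n\delta_{np}$, one can derive the following identity:
$$\phi_n(\boldsymbol{x}) = \frac{\langle b_n u_S(\boldsymbol{x})\rangle}{\sigma_n}. \tag{2.14}$$
Here, we reiterate that $\langle\cdot\rangle$ denotes the average over the training dataset. Specifically, $\langle b_n u_S\rangle$ can be interpreted as $(1/N_{\mathrm{train}})\sum_{c=1}^{N_{\mathrm{train}}} b_n^{(c)}u_S^{(c)}$, where the superscript $c$ represents the index of a particular snapshot. The Extended POD mode is defined by replacing $u_S(\boldsymbol{x})$ with the field to be predicted, $u_G^{(t)}(\boldsymbol{x})$, in (2.14):
$$\phi_n^{E}(\boldsymbol{x}) = \frac{\langle b_n u_G^{(t)}(\boldsymbol{x})\rangle}{\sigma_n}, \quad \boldsymbol{x} \in G. \tag{2.15}$$
Once the set of EPOD modes (2.15) has been obtained in the training process, a test sample with measurement $u_S(\boldsymbol{x})$ is reconstructed by first calculating the POD coefficients (2.13); the prediction in $G$ is then obtained as the part correlated with $u_S(\boldsymbol{x})$ (Borée 2003):
$$u_G^{(p)}(\boldsymbol{x}) = \sum_{n=1}^{N_S} b_n\phi_n^{E}(\boldsymbol{x}). \tag{2.16}$$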
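A discrete sketch of the EPOD steps, assuming NumPy and fields flattened into vectors over $S$ and $G$. The function names are hypothetical, and discarding modes with negligible eigenvalues (`rel_tol`) is a numerical safeguard added for this illustration; it is not part of the procedure described in the text, where all modes in the known region are used.

```python
import numpy as np

def epod_fit(train_known, train_gap, rel_tol=1e-10):
    """Compute POD modes on S and the extended modes on G.

    train_known: (n_train, n_S) measurements; train_gap: (n_train, n_G) truth.
    """
    n_train = train_known.shape[0]
    corr = train_known.T @ train_known / n_train        # correlation matrix on S
    eigvals, modes = np.linalg.eigh(corr)
    keep = eigvals > rel_tol * eigvals.max()            # drop near-null modes
    eigvals, modes = eigvals[keep], modes[:, keep]
    coeffs = train_known @ modes                        # POD coefficients per snapshot
    # Extended modes: training average of coefficient times gap field,
    # divided by the eigenvalue of the corresponding mode
    ext_modes = (coeffs.T @ train_gap / n_train) / eigvals[:, None]
    return modes, ext_modes

def epod_reconstruct(u_known, modes, ext_modes):
    """Project the measurement on the POD modes, then sum the extended modes."""
    return (u_known @ modes) @ ext_modes
```

Written this way, the prediction is a fixed linear map applied to the measurement, which makes explicit that EPOD, like GPOD, is a linear reconstruction.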

GAN-based reconstruction with Context Encoders
In a previous work, Buzzicotti et al. (2021) used a context encoder (Pathak et al. 2016) embedded in a GAN to generate missing data for the case where the total gap size is fixed, but with different spatial distributions. To generalize the previous approach to study gaps of different geometries and sizes, here we extend previous GAN architecture by adding one . The latent feature represents the output vector of the encoder with 4096 neurons in figure 4, extracted from the input with the convolutions and nonlinear activations. To constrain the predicted velocity module to be positive, a Rectified Linear Unit (ReLU) activation function is adopted at the last layer of the generator. The discriminator acts as a 'referee' functional $D(\cdot)$, which takes either $u_G^{(t)}(\boldsymbol{x})$ or $u_G^{(p)}(\boldsymbol{x})$ and outputs the probability that the provided input ($u_G^{(t)}$ or $u_G^{(p)}$) belongs to the real turbulent ensemble. The generator is trained to minimize the following loss function:
$$\mathcal{L}_{GEN} = (1 - \lambda_{adv})\,\mathcal{L}_{MSE} + \lambda_{adv}\,\mathcal{L}_{ADV}, \tag{2.17}$$
where the $L_2$ loss,
$$\mathcal{L}_{MSE} = \mathbb{E}\left[\frac{1}{A_G}\int_G \big| u_G^{(p)}(\boldsymbol{x}) - u_G^{(t)}(\boldsymbol{x}) \big|^2\,\mathrm{d}\boldsymbol{x}\right],$$
is the mean squared error (MSE) between the prediction and the ground truth. It is important to stress that, contrary to the GPOD case, here the supervised $L_2$ loss is calculated only inside the gap region $G$. The hyper-parameter $\lambda_{adv}$ is called the adversarial ratio and the adversarial loss is
$$\mathcal{L}_{ADV} = \int p(u_S)\log\big[1 - D(u_G^{(p)})\big]\,\mathrm{d}u_S = \int p_p(u_G)\log\big[1 - D(u_G)\big]\,\mathrm{d}u_G,$$
where $u_G^{(p)}$ is the generator output for the input $u_S$, $p(u_S)$ is the probability distribution of the field in $S$ over the training dataset and $p_p(u_G)$ is the probability distribution of the predicted field in $G$ given by the generator. At the same time, the discriminator is trained to maximize the cross entropy based on its classification prediction for both real and predicted samples,
$$\mathcal{L}_{DIS} = \int \Big\{ p_t(u_G)\log D(u_G) + p_p(u_G)\log\big[1 - D(u_G)\big] \Big\}\,\mathrm{d}u_G, \tag{2.20}$$
where $p_t(u_G)$ is the probability distribution of the ground truth, $u_G^{(t)}(\boldsymbol{x})$. Goodfellow et al. (2014) further showed that the adversarial training between generator and discriminator with $\lambda_{adv} = 1$ in (2.17) minimizes the Jensen-Shannon (JS) divergence between the real and the predicted distributions, $\mathrm{JSD}(p_t \,\|\, p_p)$. Refer to (3.6) for the definition of the JS divergence. Therefore, the adversarial loss helps the generator to produce predictions that are statistically similar to real turbulent configurations. It is important to stress that the adversarial ratio $\lambda_{adv}$, which controls the weighted summation of $\mathcal{L}_{MSE}$ and $\mathcal{L}_{ADV}$, is tuned to reach a balance between the MSE and the turbulent statistics of the reconstruction (see §4). More details about the GAN are discussed in Appendix C, including the architecture, hyper-parameters and the training schedule.
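To make the role of the adversarial ratio concrete, the two objectives can be sketched as plain functions of the discriminator outputs. This assumes NumPy and assumes that the generator loss is the convex combination of the MSE and the adversarial term weighted by the adversarial ratio, which is consistent with the statement that a ratio of 1 corresponds to purely adversarial training; the function names are hypothetical and no network or optimizer is included.

```python
import numpy as np

def generator_loss(pred_gap, true_gap, d_on_pred, adv_ratio):
    """Assumed form: (1 - adv_ratio) * MSE in the gap + adv_ratio * adversarial term.

    d_on_pred holds discriminator outputs in (0, 1) for the predicted fields.
    """
    mse = np.mean((pred_gap - true_gap) ** 2)
    adv = np.mean(np.log(1.0 - d_on_pred))     # small when D is fooled (D -> 1)
    return (1.0 - adv_ratio) * mse + adv_ratio * adv

def discriminator_objective(d_on_true, d_on_pred):
    """Cross entropy maximized by the discriminator over real and predicted samples."""
    return np.mean(np.log(d_on_true)) + np.mean(np.log(1.0 - d_on_pred))
```

When the discriminator cannot tell real from predicted samples it outputs 0.5 for both, and `discriminator_objective` takes the value −log 4, the equilibrium value of the original GAN game (Goodfellow et al. 2014).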

Comparison between POD- and GAN-based reconstructions
To conduct a systematic comparison between POD- and GAN-based reconstructions, we start by studying the case with a central square gap of various sizes (see figure 1). All reconstruction methods are first evaluated with the predicted velocity module itself, which is dominated by the large-scale coherent structures. The predictions are further assessed from a multi-scale perspective, with the help of the gradient of the predicted velocity module, spectral properties and multi-scale flatness. Finally, the performance on predicting extreme events is studied for all methods.

Large-scale information
In this section, the predicted velocity module in the missing region is quantitatively evaluated. First we consider the reconstruction error and define the normalized MSE in the gap as
$$\mathrm{MSE}(u_G) = \frac{1}{E_{u_G}}\left\langle \frac{1}{A_G}\int_G \big| u_G^{(p)}(\boldsymbol{x}) - u_G^{(t)}(\boldsymbol{x}) \big|^2\,\mathrm{d}\boldsymbol{x} \right\rangle, \tag{3.1}$$
where $\langle\cdot\rangle$ represents hereafter the average over the test data. The normalization factor is defined as
$$E_{u_G} = \sigma_G^{(p)}\sigma_G^{(t)}, \qquad \sigma_G^{(p)} = \left\langle \frac{1}{A_G}\int_G \big| u_G^{(p)}(\boldsymbol{x}) \big|^2\,\mathrm{d}\boldsymbol{x} \right\rangle^{1/2},$$
and $\sigma_G^{(t)}$ is defined similarly. With this specific form of $E_{u_G}$, predictions with too small or too large energy will give a large MSE. To provide a baseline for the MSE, a set of predictions can be made by randomly sampling the missing field from the true turbulent data. In other words, the baseline comes from uncorrelated predictions that are statistically consistent with the ground truth. The baseline value is around 0.54, see Appendix D. Figure 5 shows MSE($u_G$) for the three reconstruction methods as a function of the gap size. The MSE is first calculated over a batch of 128 test planes, then the same calculation is repeated over 160 different batches, from which we calculate the MSE mean and its range of variation. EPOD and GAN reconstructions provide similar MSEs except at the largest gap size, where GAN has a slightly larger MSE than EPOD. Moreover, both EPOD and GAN have smaller MSEs than GPOD for all gap sizes. Figure 6 shows the probability density function (PDF) of the spatially averaged $L_2$ error in the missing region for one flow configuration,
$$\bar{\Delta}_{u_G} = \frac{1}{A_G}\int_G \Delta_{u_G}(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x},$$
where
$$\Delta_{u_G}(\boldsymbol{x}) = \frac{1}{E_{u_G}}\big| u_G^{(p)}(\boldsymbol{x}) - u_G^{(t)}(\boldsymbol{x}) \big|^2$$
is the normalized point-wise $L_2$ error. The PDFs are shown for three different gap sizes $l/l_0 = 24/64$, $40/64$ and $62/64$. Clearly, the PDFs concentrated on regions of smaller $\bar{\Delta}_{u_G}$ correspond to the cases with smaller MSEs in figure 5. To further study the performance of the three tools, we plot the averaged point-wise $L_2$ error, $\langle\Delta_{u_G}(\boldsymbol{x})\rangle$, for a square gap of size $l/l_0 = 40/64$ in figure 7. It shows that GPOD produces large $\langle\Delta_{u_G}\rangle$ all over the gap, while EPOD and GAN perform considerably better, especially in the edge region. Moreover, GAN generates smaller $\langle\Delta_{u_G}\rangle$ than EPOD in the inner area (figure 7(b) and (c)). However, the $L_2$ error is naturally dominated by the more energetic structures (the ones found at large scales in our turbulent flows) and does not provide an informative evaluation of the predicted fields at multiple scales, which is also important for assessing the reconstruction tools for turbulent data. Indeed, figure 8 shows at a glance that the POD- and GAN-based reconstructions have completely different multi-scale statistics, which is not captured by the MSE. Figure 8 shows predictions of an instantaneous velocity module field based on the GPOD, EPOD and GAN methods compared with the ground truth solution. For all three gap sizes $l/l_0 = 24/64$, $40/64$ and $62/64$, GAN produces realistic reconstructions while GPOD and EPOD only generate blurry predictions. In addition, there are obvious discontinuities between the supplied measurements and the GPOD predictions of the missing part. This is clearly due to the fact that the number of POD modes $N'$ used for prediction in (2.8) is limited, as there are only $N_S$ measured points available in (2.7) for each damaged field (thus $N' < N_S$). Moreover, minimizing the $L_2$ distance from the ground truth in (2.9) results in solutions with almost the correct energy content but without the complex multi-scale properties. Unlike GPOD, which uses a global basis defined on the whole region $I$, EPOD gives better results by considering the correlation between fields defined on two smaller regions, $S$ and $G$. In this way the prediction (2.16) has a number of degrees of freedom equal to $N_S$, which is larger than that of GPOD. Therefore, EPOD can predict the large-scale coherent structures but is still limited in generating correct multi-scale properties. Specifically, when the gap size is extremely large, $N_S$ is very small, so both GPOD and EPOD have too few degrees of freedom to make realistic predictions.
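The effect of the normalization in the MSE can be checked with a short sketch. It assumes NumPy, fields flattened over the gap points, and a normalization given by the product of the root-mean-square amplitudes of the predicted and true fields in the gap, which is the form assumed here for the normalization factor; the function name `normalized_mse` is hypothetical.

```python
import numpy as np

def normalized_mse(pred_gap, true_gap):
    """Normalized MSE in the gap for arrays of shape (n_samples, n_gap_points)."""
    # Average first over the gap points of each sample, then over samples
    err = np.mean(np.mean((pred_gap - true_gap) ** 2, axis=1))
    sigma_p = np.sqrt(np.mean(np.mean(pred_gap ** 2, axis=1)))
    sigma_t = np.sqrt(np.mean(np.mean(true_gap ** 2, axis=1)))
    return err / (sigma_p * sigma_t)
```

With this normalization, a prediction ten times too weak and one ten times too strong are penalized equally: both give a value of 8.1, whereas normalizing by the energy of the ground truth alone would give 0.81 and 81.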
To quantify the statistical similarity between the predictions and the ground truth, we can study the JS divergence, $\mathrm{JSD}(u_G) = \mathrm{JSD}(\mathrm{PDF}(u_G^{(t)}) \,\|\, \mathrm{PDF}(u_G^{(p)}))$, defined on the distribution of the velocity amplitude at one point, which is a marginal distribution of the whole PDF of the real or predicted fields inside the gap, $p_t$ or $p_p$. For distributions $P$ and $Q$ of a continuous random variable $x$, the JS divergence is a measure of their similarity,
$$\mathrm{JSD}(P \,\|\, Q) = \frac{1}{2}\mathrm{KL}(P \,\|\, M) + \frac{1}{2}\mathrm{KL}(Q \,\|\, M), \tag{3.6}$$
where $M = \frac{1}{2}(P + Q)$ and
$$\mathrm{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} P(x)\log\left(\frac{P(x)}{Q(x)}\right)\mathrm{d}x$$
is the Kullback-Leibler divergence. A small JS divergence indicates that the two probability distributions are close and vice versa. We use the base 2 logarithm and thus $0 \leqslant \mathrm{JSD}(P \,\|\, Q) \leqslant 1$, with $\mathrm{JSD}(P \,\|\, Q) = 0$ if and only if $P = Q$. Similar to the MSE, the JS divergence is calculated using batches of data, and 10 different batches are used to obtain its mean and range of variation. The batch size used to evaluate the JS divergence is now set at 2048, which is larger than that used for the MSE, in order to improve the estimation of the probability distributions. Figure 9 shows $\mathrm{JSD}(u_G)$ for the three reconstruction tools. We find that GAN gives smaller $\mathrm{JSD}(u_G)$ than GPOD and EPOD by an order of magnitude over almost the full range of gap sizes, indicating that the PDF of the GAN prediction corresponds better to the ground truth. This is further shown in figure 10, where we present the PDFs of the predicted velocity module for different gap sizes compared with that of the original data. Besides the imprecise PDF shapes of GPOD and EPOD, we note that they also predict some negative values, which is unphysical for a velocity module. This problem is avoided in the GAN reconstruction, as a ReLU activation function is used in the last layer of the generator.
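A histogram-based estimate of the JS divergence in base 2 can be sketched as follows, assuming NumPy. The function name `js_divergence`, the number of bins and the choice of a common binning range spanning both samples are assumptions of this illustration; the paper does not state how its PDFs are binned.

```python
import numpy as np

def js_divergence(samples_p, samples_q, bins=100):
    """Estimate the base-2 JS divergence between two 1D samples."""
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    # Same bin edges for both histograms so that probabilities are comparable
    p, _ = np.histogram(samples_p, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0                      # terms with a = 0 contribute nothing
        return np.sum(a[nz] * np.log2(a[nz] / b[nz]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The estimate is 0 for identical samples and 1 for samples with disjoint supports, matching the bounds quoted for the base 2 logarithm.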

Multi-scale information
This section reports a quantitative analysis of the multi-scale information reconstructed by the three methods. We first study the gradient of the predicted velocity module in the missing region, $\partial u_G/\partial x_1$. Figure 11 plots $\mathrm{MSE}(\partial u_G/\partial x_1)$, which is defined analogously to (3.1), and we can see that all methods produce $\mathrm{MSE}(\partial u_G/\partial x_1)$ with values much larger than those of $\mathrm{MSE}(u_G)$. Moreover, GAN shows errors similar to GPOD at the largest gap size and to EPOD at small gap sizes. However, the MSE itself is not enough for a comprehensive evaluation of the reconstruction. This can be easily understood again by looking at the gradient of different reconstructions shown in figure 12. It is obvious that GAN predictions are much more 'realistic' than those of GPOD and EPOD, although their values of $\mathrm{MSE}(\partial u_G/\partial x_1)$ are close. Indeed, if both fields are highly fluctuating, even a small spatial shift between the reconstruction and the true solution results in a significantly larger MSE. This is exactly the case for the GAN predictions, which have obvious correlations with the ground truth but a large MSE because of its sensitivity to small spatial shifts. On the other hand, the GPOD or EPOD solutions are inaccurate, having too small spatial fluctuations even with an MSE similar to that of the GAN. As done above for the velocity amplitude, here we further quantify the quality of the reconstruction by looking at the JS divergence between the two PDFs in figure 13. For other metrics to assess the quality of the predictions see, e.g., Wang et al. (2004), Wang & Simoncelli (2005) and Li et al. (2023).
Figure 13 confirms that GAN is able to predict the PDF of $\partial u_G/\partial x_1$ well, while GPOD and EPOD do not have this ability. Moreover, GPOD produces $\mathrm{JSD}(\partial u_G/\partial x_1)$ comparable to that of EPOD. The above conclusions are further supported by figure 14, which shows PDFs of $\partial u_G/\partial x_1$ from the predictions and the ground truth. We next compare the scale-by-scale energy budget of the original and reconstructed solutions in figure 15, with the help of the energy spectrum defined over the whole region,
$$E(k) = \sum_{k \leqslant \|\boldsymbol{k}\| < k+1} \frac{1}{2}\big\langle \hat{u}(\boldsymbol{k})\,\hat{u}^*(\boldsymbol{k}) \big\rangle,$$
where $\boldsymbol{k} = (k_1, k_2)$ is the wave number, $\hat{u}(\boldsymbol{k})$ is the Fourier transform of the velocity module and $\hat{u}^*(\boldsymbol{k})$ is its complex conjugate. To highlight the reconstruction performance as a function of the wave number, we also show the ratio between the reconstructed and the original spectra, $E(k)/E^{(t)}(k)$, for the three different gap sizes on the second row of figure 15. The flatness of the ground truth calculated over the whole region is also shown for reference. We remark that the flatness is used to characterize intermittency in the turbulence community (see Frisch 1995). It is determined by the two-point PDFs, $\mathrm{PDF}(\delta_r u)$, of the increments $\delta_r u$ of the velocity module over a separation $r$, connected to the distribution of the whole real or generated fields inside the gap, $p_t$ or $p_p$. Figures 15 and 16 show that GAN reproduces the multi-scale statistical properties well, except at small scales for large gap sizes. However, GPOD and EPOD can only predict a good energy spectrum for the small gap size $l/l_0 = 24/64$ but fail at all scales for both the energy spectrum and the flatness at gap sizes $l/l_0 = 40/64$ and $62/64$.
Figure 14: PDFs of the gradient of the reconstructed velocity module in the missing region obtained from GPOD, EPOD and GAN for a square gap with different sizes. The PDF of the original data over the whole region is plotted for reference and $\sigma(\partial u/\partial x_1)$ is the standard deviation of the original data.
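The two multi-scale diagnostics can be sketched for a single periodic 2D field, assuming NumPy. The shell-summed spectrum follows the definition given in the text for one realization (no ensemble average); the flatness uses the standard definition, the fourth-order moment of the increments divided by the square of the second-order moment, with increments taken along $x_1$ and periodic wrap-around. That definition, the wrap-around (which applies to the whole region, not to a gap) and the function names are assumptions of this illustration.

```python
import numpy as np

def energy_spectrum(u):
    """Shell-summed spectrum: entry k collects modes with k <= |k| < k + 1."""
    n = u.shape[0]
    u_hat = np.fft.fft2(u) / n**2                    # normalized Fourier coefficients
    k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wave numbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    shell = np.floor(np.sqrt(k1**2 + k2**2)).astype(int)
    energy = 0.5 * np.abs(u_hat) ** 2
    return np.bincount(shell.ravel(), weights=energy.ravel())

def flatness(u, r):
    """Flatness of increments u(x1 + r, x2) - u(x1, x2), r in grid points."""
    du = np.roll(u, -r, axis=0) - u
    return np.mean(du**4) / np.mean(du**2) ** 2
```

For a Gaussian field the flatness equals 3 at every separation, so values growing above 3 as the separation decreases are the usual signature of intermittency.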

Extreme events
In this section, we focus on the ability of the different methods to reconstruct extreme events inside the gap for each frame. In figure 17 we present the scatter plots of the largest values of the velocity module or its gradient measured in the gap region from the original data and the predicted fields generated by GPOD, EPOD or GAN. On top of each panel we report the scatter plot correlation index, defined as
$$c = \langle 1 - |\sin\theta| \rangle,$$
where $|\sin\theta| = \|\boldsymbol{U}\times\boldsymbol{V}\|/\|\boldsymbol{V}\|$, with $\theta$ the angle between the unit vector $\boldsymbol{U} = (1/\sqrt{2}, 1/\sqrt{2})$ and $\boldsymbol{V} = (\max(u_G^{(t)}), \max(u_G^{(p)}))$. The index $c$ for $\partial u_G/\partial x_1$ can be defined similarly. It is obvious that $c \in [0, 1]$ and $c = 1$ corresponds to a perfect prediction in terms of the extreme events. Figure 17 shows that, for the extreme values of both the velocity module and its gradient, GAN is the least biased while the other two methods tend to underestimate them.
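The correlation index can be sketched as follows, assuming NumPy and assuming that the index is the test-set average of one minus the absolute sine of the angle between the diagonal direction and the point formed by the true and predicted maxima; the function name `correlation_index` is hypothetical.

```python
import numpy as np

def correlation_index(max_true, max_pred):
    """Average of 1 - |sin(theta)| over frames.

    max_true, max_pred: 1D arrays with the largest value in the gap per frame,
    from the ground truth and from the prediction.
    """
    v = np.stack([max_true, max_pred], axis=1)        # one 2D point per frame
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)           # unit vector along the diagonal
    cross = u[0] * v[:, 1] - u[1] * v[:, 0]           # 2D cross product
    sin_theta = np.abs(cross) / np.linalg.norm(v, axis=1)
    return np.mean(1.0 - sin_theta)
```

Points lying on the diagonal of the scatter plot contribute 1, and a prediction that always returns zero for positive true maxima gives $1 - 1/\sqrt{2} \approx 0.29$.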

Dependency of GAN-based reconstruction on the adversarial ratio
As shown by the previous results, GAN is certainly superior regarding the metrics evaluated in this study. This supremacy arises because, with the non-linear CNN structure of the generator, GAN optimizes the point-wise $L_2$ loss and minimizes the JS divergence between the probability distributions of the real and generated fields with the help of the adversarial discriminator (see §2.4). To study how the balance between these two objectives affects the reconstruction quality, we have systematically scanned the GAN performance while changing the adversarial ratio $\alpha_{adv}$, the hyper-parameter controlling the relative importance of the $L_2$ loss and the adversarial loss of the generator, as shown in equation (2.17). We consider a central square gap of size $l/l_0 = 40/64$ and train the GAN with different adversarial ratios, $\alpha_{adv} = 10^{-4}$, $10^{-3}$, $10^{-2}$ and $10^{-1}$. Table 3 shows the values of MSE($u_G$) and JSD($u_G$) obtained at different adversarial ratios. The adversarial ratio controls the balance between the point-wise reconstruction error and the predicted turbulent statistics: as it increases, the MSE increases while the JS divergence decreases. PDFs of the predicted velocity module from GANs with different adversarial ratios are compared with that of the original data in figure 18, which shows that the predicted PDF gets closer to the original one with a larger adversarial ratio. The above results clearly show that there exists an optimal adversarial ratio to satisfy the multi-objective requirements of having a small $L_2$ distance and a realistic PDF. In the limit of vanishing $\alpha_{adv}$, the GAN outperforms GPOD and EPOD in terms of MSE, but falls behind them concerning JS divergence (table 3).

Table 3: The MSE and the JS divergence between PDFs for the original and generated velocity module inside the missing region, obtained from GAN with different adversarial ratios for a square gap of size $l/l_0 = 40/64$. The results for GPOD and EPOD are provided as well for comparison. The MSE and JS divergence are computed over different test batches, of sizes 128 and 2048, respectively, from which we obtain both the mean values and the error bounds.
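To make the two competing objectives concrete, the sketch below estimates the JS divergence between two sets of samples from their histograms and combines a point-wise loss with an adversarial loss through a single ratio. Both functions are illustrative assumptions: the number of bins, the use of the natural logarithm and, in particular, the convex combination in `generator_loss` are choices made here and are not taken from equation (2.17), which is not reproduced in this section.

```python
import numpy as np


def js_divergence(x, y, bins=100):
    """Histogram estimate of the Jensen-Shannon divergence (natural log).

    x and y are 1D arrays of samples, e.g. velocity-module values inside
    the gap from the original and the reconstructed fields.
    """
    lo = min(np.min(x), np.min(y))
    hi = max(np.max(x), np.max(y))
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # bins with a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def generator_loss(l2_loss, adv_loss, alpha_adv):
    """Assumed weighting: a convex combination set by the adversarial ratio."""
    return (1.0 - alpha_adv) * l2_loss + alpha_adv * adv_loss
```

With this estimator the divergence is zero for identical samples and reaches $\ln 2$ when the two histograms do not overlap. Under the assumed weighting, a small `alpha_adv` makes the generator behave almost like a pure $L_2$ regressor, while a larger value gives more weight to the statistical realism enforced by the discriminator.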

Dependency on gap geometry: random gappiness
Things change when looking at a completely different topology of the damages. Here we study case (ii) in §2.1, where points are removed randomly in the original domain $I$, without any spatial correlations. Because random gappiness is easier to interpolate than a square gap of the same size, all reconstruction methods show good and comparable results in terms of the MSE, the JS divergence and the PDFs of the velocity module (figures 19 and 20). For almost all damaged densities, POD- and GAN-based methods give small values

The training process of EPOD is computationally cheaper than that of GPOD. The cost can be estimated as O ( 2 train   ) considering an SVD to solve (2.11) and the linear algebra operations in (2.13) and (2.15). The testing process of EPOD consists of carrying out (2.13) and (2.16), with a computational cost of O ( test     ). GAN is the most computationally expensive method. It has about $5 \times 10^6$ ($\gg N_{train}$) trainable parameters, which are involved in the forward and the backward propagation for all the training data in one epoch. Moreover, hundreds of epochs are required for the convergence of GAN. However, benefiting from the GPU hardware, GAN training requires only 4 hours on an Nvidia A100 GPU. Once trained, all methods are highly efficient in performing reconstruction. It is important to emphasize that any improvement over existing methods is valuable, regardless of the computational cost involved. Even when computational resources are not a constraint, GPOD and EPOD cannot further improve the accuracy of the reconstruction. This limitation is attributed to the linear estimation of the flow state inherent in these methods. Nevertheless, there is still potential for further improvement of the GAN results, as numerous hyper-parameters remain to be fine-tuned. These hyper-parameters include the depth of the networks, the dimension of the latent feature, etc.

Conclusions
In this work, two linear POD-based approaches, GPOD and EPOD, are compared against GAN, consisting of two adversarial non-linear CNNs, to reconstruct 2D damaged fields taken from a database of 3D rotating turbulent flows. Performance has been quantitatively judged on the basis of (i) the $L_2$ distance between the ground truth and each reconstructed field, (ii) statistical validations based on the JS divergence between the one-point PDFs, (iii) spectral properties and multi-scale flatness, and (iv) extreme events for a single frame. For one central square gap, the GAN approach proves superior to GPOD and EPOD when both MSE and JS divergence are simultaneously considered, in particular for large gap sizes, where the missing multi-scale information makes the task extremely difficult. Moreover, GAN predictions are also better in terms of the energy spectra and flatness, as well as for the predicted extreme events. In the presence of random damages, the three approaches give similar results, except for the case of extreme gappiness, where GAN is leading again.
GPOD always generates 'discontinuous' predictions with respect to the supplied measurements. This is because GPOD only minimizes the $L_2$ distance and the optimal number of POD modes used is usually much smaller than the number of measured points. On the other hand, EPOD considers the correlation between the fields inside and outside the gap and its predictions have a number of degrees of freedom equal to the number of measured points. Compared with GPOD, EPOD is less computationally demanding and generates better predictions. When the gap is extremely large, neither GPOD nor EPOD gives satisfactory predictions, as they have too few degrees of freedom.
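The origin of these discontinuities can be seen in a minimal gappy-POD sketch: the POD coefficients are obtained from a least-squares fit restricted to the measured points, and the fitted field is then used only inside the gap. The code below is an illustration under stated assumptions and not the implementation used in this study: the function names are hypothetical, the POD basis is computed with a plain SVD of the mean-subtracted training fields, and fields are flattened to 1D vectors.

```python
import numpy as np


def pod_basis(train_fields, n_modes):
    """POD basis from training snapshots of shape (n_train, n_points).

    Returns the mean field and the leading n_modes spatial modes
    as columns of an (n_points, n_modes) array.
    """
    mean = train_fields.mean(axis=0)
    _, _, vt = np.linalg.svd(train_fields - mean, full_matrices=False)
    return mean, vt[:n_modes].T


def gappy_pod_reconstruct(field, known, mean, modes):
    """Fill the gap of a flattened field using a fit on the known points.

    known is a boolean mask of measured points. The coefficients minimize
    the L2 distance on those points only.
    """
    coeffs, *_ = np.linalg.lstsq(
        modes[known], field[known] - mean[known], rcond=None
    )
    fitted = mean + modes @ coeffs
    out = field.copy()
    out[~known] = fitted[~known]  # measurements are left untouched
    return out
```

Since the fit uses far fewer modes than measured points, the fitted field generally does not match the measurements exactly at the border of the gap, which is where the jump appears.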
With the help of adversarial training, GAN can optimize a multi-objective problem, minimizing simultaneously the $L_2$ distance frame by frame and the JS divergence between the real and generated distributions of the whole fields in the missing region. Furthermore, we show that for GAN reconstructions, large adversarial ratios undermine the MSE but improve the generated statistical properties, and vice versa.
In terms of the potential for practical applications of the three tools analyzed in this study, we have demonstrated that both EPOD and GAN exhibit robust properties when faced with noisy multi-scale measurements. It is also worth noting that in many applications, gaps can also arise in the Fourier space. This typically occurs when we encounter measurement noise or modeling limitations at high wave numbers. In such situations, we face a super-resolution problem where we need to reconstruct the missing small-scale information.
Our work is a first step toward the set-up of benchmarks and grand challenges for realistic turbulent problems of interest in geophysical and laboratory applications, where the lack of measurements obstructs the capability to fully control the system. Many questions remain open, connected to the performance of different GAN architectures and to the difficulty of having a priori estimates of the depth and complexity of the GAN architecture as a function of the complexity of the physics, in particular concerning the quantity and the geometry (2D or 3D) of the missing information. Furthermore, little is known about the performance of the data-driven models as a function of the Reynolds or Rossby numbers, and about the possibility of supplying physics information to further improve the network's performance.

With $N'$ as the number of POD modes kept for dimension reduction, the POD decomposition can be written in vector form.

Lasso penalizes the $L_1$ norm of the coefficients and tends to produce some coefficients that are exactly zero, which is similar to finding a best subset of POD modes that does not necessarily consist of the leading ones. The hyper-parameter controlling the regularization strength is estimated by five-fold cross-validation (Efron & Tibshirani 1994) with the data in the known region during the reconstruction process.
With this approach, we conducted a reconstruction experiment for a square gap of size $l/l_0 = 40/64$ for illustration; it is not our intention to perform a systematic investigation at changing the geometry and area of the gap. Figure 26 (left) shows the PDF of the estimated value of the regularization hyper-parameter over the test data for Lasso regression. Table 4 shows that the GPOD reconstructions with DR and Lasso give similar values of MSE($u_G$) and JSD($u_G$). DR gives a nonzero spectrum up to $N' = 12$, while Lasso selects both large- and small-scale modes with a wide range of indices. This can be further shown with the reconstruction in figure 27 (right), where DR only predicts 'smooth' structures given by the leading POD modes and Lasso generates predictions with multiple scales.

Figure 1: Examples of the reconstruction task on 2D slices of 3D turbulent rotating flows, where the color code is proportional to the velocity module. Two gap geometries are considered: (a)(d) a central square gap and (b)(e) random gappiness. Gaps in each row have the same area and the corresponding ground truth is shown in (c) and (f). We denote $G$ as the gap region and $S = I \setminus G$ as the known region, where $I$ is the entire 2D image.

Figure 2: (a) Energy evolution of the turbulent flow, where we also show the sampling time periods for the training/validation and testing data. (b) The averaged energy spectrum. The gray area indicates the forcing wave numbers and $k_\eta$ is the Kolmogorov dissipative wave number where $E(k)$ starts to decay exponentially. The inset shows an instantaneous visualization of the velocity module with the frame of reference for the simulation.

Figure 4: Architecture of the generator and discriminator networks for flow reconstruction with a square gap. The kernel size and the corresponding stride are determined based on the gap size $l$. A similar architecture holds for random gappiness as well.

Figure 5: The MSE of the reconstructed velocity module from GPOD, EPOD and GAN in a square gap with different sizes. The abscissa is normalized by the domain size $l_0$. The red horizontal line represents the uncorrelated baseline.

Figure 8: Reconstruction of an instantaneous field (velocity module) by the different tools for a square gap of sizes $l/l_0 = 24/64$ (1st row), $40/64$ (2nd row) and $62/64$ (3rd row). The damaged fields are shown in the 1st column, while the 2nd to 4th columns show the reconstructed fields obtained from GPOD, EPOD and GAN. The ground truth is shown in the 5th column.

Figure 9: The JS divergence between PDFs of the velocity module inside the missing region from the original data and the predictions obtained from GPOD, EPOD and GAN for a square gap with different sizes.

Figure 11: The MSE of the gradient of the reconstructed velocity module from GPOD, EPOD and GAN in a square gap with different sizes. The red horizontal line represents the uncorrelated baseline.

Figure 15: Energy spectra of the original velocity module and the reconstructions obtained from GPOD, EPOD and GAN for a square gap of different sizes (1st row). The corresponding $E(k)/E^{(t)}(k)$ is shown on the 2nd row, where $E(k)$ and $E^{(t)}(k)$ are the spectra of the reconstructions and the ground truth, respectively.

Figure 17: Scatter plots of the maximum values of the velocity module (1st row) or its gradient (2nd row) in the missing region obtained from the original data and the one produced by GPOD, EPOD or GAN for a square gap of size $l/l_0 = 40/64$. Colors are proportional to the density of points in the scatter plot. The correlation indices are shown on top of each panel.

Figure 18: PDFs of the reconstructed velocity module inside the gap region, obtained from GAN with different adversarial ratios, for a square gap of size $l/l_0 = 40/64$.

Figure 22: Reconstruction of an instantaneous field (velocity module) by the different tools for random gappiness of sizes $l/l_0 = 60/64$ (1st row) and $l/l_0 = 62/64$ (2nd row). The corresponding gradient fields are shown in the 3rd and 4th rows. The damaged fields are shown in the 1st column, while the 2nd to 4th columns show the fields obtained from GPOD, EPOD and GAN. The ground truth is shown in the 5th column.
Figure 26 (right) also shows that the PDFs of their predicted velocity module are comparable. The difference between DR and Lasso can be illustrated by the spectra of the predicted POD coefficients of an instantaneous field with a square gap of size $l/l_0 = 40/64$, as shown in figure 27 (left).

Figure 26: PDF of the estimated value of the regularization hyper-parameter over the test data for Lasso regression (left) and PDFs of the velocity module from the ground truth and from the missing region obtained from GPOD with DR and Lasso (right) for a square gap of size $l/l_0 = 40/64$.

Figure 27: The spectra of the predicted POD coefficients obtained from GPOD with DR and Lasso for an instantaneous field with a square gap of size $l/l_0 = 40/64$ (left). The corresponding damaged, reconstructed and original velocity module fields with their gradient fields are shown on the right.

Table 1: Description of the dataset used for the evaluation of reconstruction methods. Here, $N_{x_1}$ and $N_{x_2}$ indicate the resolution of the horizontal plane. The number of fields for training/validation/testing is denoted as $N_{train}/N_{valid}/N_{test}$. The sampling time periods for training/validation and testing are $T_{train}$ and $T_{test}$, respectively.