Prediction and control of two-dimensional decaying turbulence using generative adversarial networks

Abstract An accurate prediction of turbulence is very costly because it requires an extremely small time step for advancing the governing equations to resolve the fast-evolving small-scale motions. With the recent development of various machine learning (ML) algorithms, the finite-time prediction of turbulence has become a promising option for relieving the computational burden. Yet, a reliable prediction of the small-scale motions is challenging. In this study, PredictionNet, a data-driven ML framework based on generative adversarial networks (GANs), was developed for fast prediction of turbulence with high accuracy down to the smallest scale using a relatively small number of parameters. In particular, we conducted learning of two-dimensional (2-D) decaying turbulence at finite lead times using direct numerical simulation data. The developed prediction model accurately predicted turbulent fields at a finite lead time of up to half the Eulerian integral time scale, over which the large-scale motions remain fairly correlated. Scale decomposition was used to interpret the predictability depending on the spatial scale, and the role of latent variables in the discriminator network was investigated. The good performance of the GAN in predicting small-scale turbulence is attributed to the scale-selection and scale-interaction capability of the latent variable. Furthermore, by utilising PredictionNet as a surrogate model, a control model named ControlNet was developed to identify disturbance fields that drive the time evolution of the flow field in the direction that optimises the specified objective function.


Introduction
Turbulence is a multiscale nonlinear dynamic phenomenon frequently observed in various flows in nature and industry. Certain deterministic dynamic features such as coherent structures have been found in turbulence (Hussain 1986); however, the behaviour of turbulence in most flows is chaotic. These characteristics make the accurate prediction of turbulence challenging, even though governing equations for turbulence, the Navier-Stokes equations, exist. With the qualitative and quantitative expansion of computing resources over the past decades, various numerical approaches have been proposed. Direct numerical simulation (DNS), a full-resolution approach that can provide the most detailed description, is restricted to low Reynolds numbers. Moreover, the spatial filtering strategy of large eddy simulation (LES) and the temporal averaging approach of the Reynolds-averaged Navier-Stokes (RANS) model, which provide relatively fast solutions, lack reliable general closure models. Most importantly, these traditional numerical approaches are based on the temporal advancement of the governing partial differential equations, which remains too costly for practical use, even with ever-increasing computing power.
Recently, ML and other data-driven approaches have become popular in many areas of science, engineering, and technology owing to their effectiveness and efficiency in dealing with complex systems. Certainly, efforts have been made to apply ML to turbulence problems, particularly in fluid dynamics (Duraisamy et al. 2019; Brenner et al. 2019; Brunton et al. 2020). Other studies have attempted to develop new closure models for turbulence using ML, such as subgrid-scale models (Gamahara & Hattori 2017; Beck et al. 2018; Maulik et al. 2018; Guan et al. 2021; Kim et al. 2022). Moreover, wall models for various flows in LES have been proposed (Yang et al. 2019; Zhou et al. 2021; Bae & Koumoutsakos 2022; Dupuy et al. 2023; Vadrot et al. 2023; Lozano-Durán & Bae 2023), and improvements in closure models for RANS have been attempted (Duraisamy et al. 2015; Ling et al. 2016; Parish & Duraisamy 2016; Singh et al. 2017; Wu et al. 2018; Duraisamy et al. 2019; Zhu et al. 2019). Some attempts have yielded promising results; however, more exploration is required to secure reliable general closure models.
Another approach using ML to predict turbulence dynamics based on reduced-order modelling (ROM) has been proposed. With low-dimensional representations obtained through mathematical decomposition, such as proper orthogonal decomposition (Sirovich 1987), the Koopman operator (Mezić 2005, 2013) and dynamic mode decomposition (Schmid 2010), or using recent network architectures such as autoencoders (AEs) and convolutional neural networks (CNNs), the governing dynamics of the latent space are trained using dynamic models, such as recurrent neural networks (RNNs). For example, Hennigh (2017) developed a model called Lat-Net based on lattice Boltzmann method data by applying an AE structure. King et al. (2018) developed a compressed convolutional long short-term memory (LSTM) combining an AE and LSTM and showed that dynamic approaches such as Lat-Net and their method are more effective in reflecting turbulence physics than static approaches. Wang et al. (2018) and Mohan & Gaitonde (2018) efficiently predicted the coefficients of basis functions using LSTM after applying proper orthogonal decomposition to various flows. Mohan et al. (2019) extended the range of prediction to forced homogeneous isotropic turbulence with two passive scalars. Srinivasan et al. (2019) confirmed the applicability of LSTM to wall-bounded near-wall turbulence using the nine-equation shear flow model. A recent study by Nakamura et al. (2021) successfully applied nonlinear mode decomposition to predict the minimal turbulent channel flow using a CNN-based AE. Although ROM-based methods are efficient and easy to analyse, system characteristics such as nonlinear transients and multiscale phenomena can be easily lost during information compression when only the dominant modes are used. However, as the complexity of ROM-based methods increases, models tend to capture most physical phenomena of turbulence. For example, as reported by Nakamura et al. (2021), approximately 1500 modes were found to be sufficient to fully reconstruct turbulence characteristics. A question then arises as to how many modes need to be considered; the number of modes required to properly represent turbulence might not be as small as intended.
The most basic and popular ML models are artificial neural networks (ANNs), also known as multilayer perceptrons, which determine nonlinear functional relationships between the input and output data and are relatively easy to train (Beck & Kurz 2021). However, when the input is in the form of a field, CNNs effectively capture embedded spatial patterns or correlations. In our previous work on the prediction of turbulent heat transfer (Kim & Lee 2020b), a CNN successfully reconstructed the wall heat-flux distribution based on wall shear stresses. However, CNN-based models whose objective function is to minimise the pointwise mean-squared difference between the predicted and target fields sometimes produce blurry outputs (Kim & Lee 2020a; Kim et al. 2021). Conversely, GANs (Goodfellow et al. 2014), in which a generator (G) and a discriminator (D) network are trained simultaneously in an adversarial manner such that G is trained to generate high-quality data while D is trained to distinguish generated data from target data, can produce better output than CNNs (Deng et al. 2019; Kim & Lee 2020a; Kim et al. 2021, 2023). Lee & You (2019) also showed that GANs are better in long-term prediction of unsteady flow over a circular cylinder. Ravuri et al. (2021) applied a deep generative model to precipitation nowcasting and reported much higher accuracy for small-scale features than other ML models and numerical weather-prediction systems. GANs appear to capture the statistical characteristics of fields better than CNNs; we focus on this capability of GANs in this study. First, we selected 2D decaying homogeneous isotropic turbulence (DHIT), which is essential in fields such as weather forecasting (Shi et al. 2015; Rüttgers et al. 2019; Liu & Lee 2020; Ravuri et al. 2021); it is relatively simple, and thus its prediction can be performed at a reasonable cost. Also, the study of 2D turbulence was initially considered a simplified version of 3D turbulence; however, it was studied extensively (Sommeria 1986; Brachet et al. 1988; Mcwilliams 1990; McWilliams et al. 1994; Jiménez et al. 1996) after its unique characteristics related to geophysical and astrophysical problems, such as strongly rotating stratified flow, were revealed (Alexakis & Doering 2006). The primary goal of this study was to develop a high-accuracy prediction model for 2D DHIT, called PredictionNet, based on GANs, that produces the evolution of turbulence by reflecting spatiotemporal statistical characteristics. The successfully trained PredictionNet could predict 2D turbulence more accurately in various aspects than a baseline CNN. Although proving why GANs are better than CNNs in the prediction of turbulence statistics is prohibitively hard, we performed various quantitative statistical analyses of the predictive accuracy depending on time and spatial scales to provide some clues on the working principle of the GAN model. By considering scale decomposition in the analysis of the behaviour of the latent variable, we discovered that the discriminator network of a GAN possesses a scale-selection capability, leading to the successful prediction of small-scale turbulence.
Second, flow control becomes feasible if accurate flow prediction is possible. The application of ML to turbulence control dates back to Lee et al. (1997), who used a neural network to reduce drag in a turbulent channel flow. Recently, various studies applying ML to flow control have been conducted with successful results, including drag reduction in turbulent channel flow (Park & Choi 2020; Han & Huang 2020; Lee et al. 2023), drag reduction of flow around a cylinder (Rabault et al. 2019; Rabault & Kuhnle 2019; Tang et al. 2020), and object control (Colabrese et al. 2017; Verma et al. 2018). However, in this study, for a fundamental understanding of the control mechanism in 2D turbulence, we considered determining the optimum disturbance field, which can modify the flow in the direction of optimising the specified objective function. Thus, we combined PredictionNet and ControlNet for specific purposes. A target for which flow control can be used meaningfully, such as maximising the propagation of the control effect in the time-evolved flow field, was set, and the results were analysed and compared with the results of similar studies (Jiménez 2018; Yeh et al. 2021).
Following this introduction, § 2 describes the process of collecting datasets to be used for training and testing.In § 3, ML methodologies such as objective functions and network architectures are explained.The prediction and control results are subdivided and analysed qualitatively and quantitatively in § 4, and a conclusion is drawn in § 5.

Data collection
For decaying 2D turbulence, which is our target for prediction and control, DNS was performed by solving the incompressible Navier-Stokes equations in the form of the vorticity transport equation without external forcing,

∂ω/∂t + u · ∇ω = ν∇²ω,   (2.1)

with

∇²ψ = −ω,   (2.2)

where ω(x₁, x₂, t) is the vorticity field with x₁ = x and x₂ = y, and ν is the kinematic viscosity. ψ denotes the stream function, which satisfies u₁ = u = ∂ψ/∂y and u₂ = v = −∂ψ/∂x. A pseudo-spectral method with 3/2 zero padding was adopted for spatial discretisation. The computational domain, to which the biperiodic boundary condition was applied, was a square box of [0, 2π)², and the number of spatial grid points, N_x × N_y, was 128 × 128. The Crank-Nicolson method for the viscous term and the second-order Adams-Bashforth method for the convective term were used for temporal advancement. In Appendix § A, it is proven that the pseudo-spectral approximation to Equations (2.1, 2.2) in a biperiodic domain is equivalent to the pseudo-spectral approximation to the rotational form of the two-dimensional Navier-Stokes equations. For the training, validation, and testing of the developed prediction network, 500, 100, and 50 independent simulations with random initialisations were performed, respectively.
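As an illustration of the stream-function relations above, the following minimal NumPy sketch (not the solver used in this work) recovers the velocity components from a vorticity field on a biperiodic [0, 2π)² grid by solving ∇²ψ = −ω in Fourier space; the function name and grid size are illustrative.

```python
import numpy as np

# Minimal sketch (not the paper's solver): recover the stream function and
# velocity from a vorticity field on a biperiodic [0, 2*pi)^2 grid by solving
# nabla^2 psi = -omega in Fourier space, consistent with the relations
# u1 = d(psi)/dy and u2 = -d(psi)/dx stated above.
def velocity_from_vorticity(omega):
    n = omega.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                              # avoid division by zero (mean mode)
    omega_hat = np.fft.fft2(omega)
    psi_hat = omega_hat / k2                    # psi_hat = omega_hat / |k|^2
    psi_hat[0, 0] = 0.0                         # zero-mean stream function
    u1 = np.real(np.fft.ifft2(1j * ky * psi_hat))    # u =  d(psi)/dy
    u2 = np.real(np.fft.ifft2(-1j * kx * psi_hat))   # v = -d(psi)/dx
    return u1, u2
```

For a single-mode field ω = sin(x₁), this returns ψ = sin(x₁), u₁ = 0, and u₂ = −cos(x₁), as expected from the definitions.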
Training data were collected at discrete times, t_n (= t_0 + nτ, n = 0, 1, 2, ..., 100) and t_n + T (T is the target lead time for prediction), where τ (= 20Δt) is the data time step and Δt denotes the simulation time step. t_0 was selected such that the initial transient behaviour due to the prescribed initial condition in the simulation had sufficiently disappeared; thus, the power-law spectrum of enstrophy (Ω(k) ∝ k⁻¹, where k is the wavenumber; Brachet et al. 1988) was maintained. During this period, the root-mean-square (RMS) vorticity magnitude ω′ and the average dissipation rate ε decay as ∼ t⁻⁰·⁵⁶ and ∼ t⁻¹·¹², respectively, as shown in Figure 1(a), whereas the Taylor length scale (λ = u′/ω′, where u′ is the RMS velocity magnitude; Jiménez 2018) and the Reynolds number based on λ (Re_λ = u′λ/ν), which are ensemble-averaged over 500 independent simulations, increase linearly, as shown in Figure 1(b). Therefore, 50,000 pairs of flow field snapshots were used in training, and for sufficient iterations of training without overfitting, data augmentation using a random phase shift at each iteration was adopted. Hereafter, the reference time and length scales in all nondimensionalisations are t* = 1/ω′₀ and l* = u′₀/ω′₀, respectively. The nondimensionalised simulation and data time steps are Δt/t* = 0.00614 and τ/t* = 0.123, respectively. t₁₀₀ − t₀ is approximately 2.5 times the Eulerian integral timescale of the vorticity field, as discussed below. Therefore, the vorticity fields at t₀ and t₁₀₀ are decorrelated, as shown in Figures 1(c) and 1(d). In our training, all of these data spanning from t₀ to t₁₀₀ were used without distinction so that the trained network covers diverse characteristics of decaying turbulence.
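The random-phase-shift augmentation mentioned above can be realised as a random circular shift of the fields, which yields an equally valid realisation on a biperiodic domain. A minimal sketch (function and variable names are illustrative, not the training code used here):

```python
import numpy as np

# Minimal sketch of random-phase-shift data augmentation on a biperiodic
# domain: a random circular shift of a snapshot is an equally valid flow
# realisation. The SAME shift must be applied to the input and its target
# so their spatial correspondence is preserved.
def random_shift_pair(x, y, rng):
    # x: input vorticity field at t, y: target field at t + T; both (N, N).
    sx = rng.integers(0, x.shape[0])
    sy = rng.integers(0, x.shape[1])
    xs = np.roll(np.roll(x, sx, axis=0), sy, axis=1)
    ys = np.roll(np.roll(y, sx, axis=0), sy, axis=1)
    return xs, ys
```

Because the shift is drawn afresh at each iteration, the 50,000 snapshot pairs can be reused over many epochs without the network memorising absolute positions.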
To select the target lead time T, we investigated the temporal autocorrelation function of the vorticity field, ρ(s) (= ⟨ω(t)ω(t + s)⟩ / (⟨ω(t)²⟩^{1/2} ⟨ω(t + s)²⟩^{1/2})), for time lag s, as shown in Figure 2. The spatial two-point correlation function of vorticity, R_ωω(r, t), with the corresponding integral length scale L_ω, at three different times t = t₀, t₀ + T_L, and t₀ + 2T_L, is shown in Figure 3. For the investigated time range, R_ωω(r, t) decays sufficiently close to zero at r = π (half the domain length), even though L_ω tends to increase over time, from 0.876 l* at t₀ to 1.03 l* at t₁₀₀, because of the suppression of the small-scale motions of 2D turbulence. This marginally decaying nature in the periodic domain was considered in the design of the training network, such that data at all grid points were used in the prediction of vorticity at one point, as discussed in the following section.

ML models and objective functions
Training ANNs is the process of updating weight parameters to satisfy the nonlinear relationships between the inputs and outputs as closely as possible. The weight parameters are optimised to minimise a prescribed loss function (by stepping in the direction opposite to its gradient), thereby learning the nonlinear mappings. Loss functions are mainly determined by objective functions, and other loss terms are often added to improve training efficiency and model performance. In GANs, a generator (G) and a discriminator (D) are trained simultaneously in an adversarial manner; the parameters of G and D are iteratively and alternately updated to minimise log(1 − D(G(z))) for G and maximise log(D(x)) + log(1 − D(G(z))) for D, respectively. This corresponds to a two-player min-max game with a value function V(D, G) given by

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))],   (3.1)

where x and z are real data and random noise vectors, respectively. The operator E denotes the expectation over sampled data, and the expressions x ∼ p_data(x) and z ∼ p_z(z) indicate that x is sampled from the distribution of the real dataset, p_data(x), and z from some simple noise distribution, p_z(z), such as a Gaussian distribution. Thus, we can obtain a generator that produces more realistic images. Various GANs have been developed rapidly since their introduction (Mirza & Osindero 2014; Arjovsky et al. 2017; Gulrajani et al. 2017; Karras et al. 2017; Mescheder et al. 2018; Park et al. 2019; Zhu et al.
2020). Among these, a conditional GAN (cGAN) (Mirza & Osindero 2014) provides additional information y, which can be any type of auxiliary information, as a condition for the input of the generator and discriminator to improve the output quality of the generator:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)|y))].   (3.2)

Furthermore, we employ two adaptive methods that can stabilise the training process, solve the vanishing-gradient problem in which the discriminator saturates, and prevent mode collapse, a phenomenon in which the distribution of generated samples is restricted to a specific small domain even though the generator does not diverge. First, Equation (3.2) is modified using the Earth-Mover (EM) (Wasserstein-1) distance combined with the Kantorovich-Rubinstein (KR) duality (Villani 2009), which is called Wasserstein-GAN (WGAN) (Arjovsky et al. 2017):

min_G max_{D∈𝒟} E_{x∼p_data(x)}[D(x|y)] − E_{z∼p_z(z)}[D(G(z|y)|y)],   (3.3)

where 𝒟 is the set of 1-Lipschitz functions. This modification is based on thorough examinations of various ways of measuring the distance between the real (data) distribution P_r and the model distribution P_g, including the total variation distance, Kullback-Leibler divergence, and Jensen-Shannon divergence. The EM distance can be expressed in the final form of Equation (3.4) by the KR duality:

W(P_r, P_g) = sup_{∥f∥_L ≤ 1} (E_{x∼P_r}[f(x)] − E_{x∼P_g}[f(x)]),   (3.4)

where the supremum is taken over all 1-Lipschitz functions f : X → R on the set of real data X. Simply put, the model that tries to bring P_g close to P_r represents the generator, and f corresponds to the discriminator, which is optimised to make the estimated distance between P_r and P_g as large as possible. Thus, the objective reduces to the form of Equation (3.3). Then, a gradient penalty (GP) loss term is added to obtain the final form, WGAN-GP (Gulrajani et al. 2017). We intend to develop a model capable of predicting the dynamic behaviour of turbulence with high predictive accuracy by reflecting statistical aspects, employing a cGAN with WGAN-GP for PredictionNet and comparing the results with a baseline CNN.
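As a numerical aside on the two-player value function above: for discrete distributions, the optimal discriminator is D*(x) = p_data(x)/(p_data(x) + p_g(x)), and the value at this optimum equals −log 4 + 2 JSD(p_data ∥ p_g) (Goodfellow et al. 2014). The following minimal sketch (plain probabilities, not the networks used here) evaluates that value:

```python
import math

# Minimal numerical sketch of the GAN value function for DISCRETE
# distributions p_data and p_g (not the paper's networks): plug in the
# optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) and sum
# the two expectation terms of the value function.
def value_at_optimal_D(p_data, p_g):
    v = 0.0
    for pd, pg in zip(p_data, p_g):
        d = pd / (pd + pg)                       # optimal discriminator output
        v += pd * math.log(d) + pg * math.log(1.0 - d)
    return v
```

When p_data = p_g (the generator perfectly matches the data), D* = 1/2 everywhere and the value is exactly −log 4, the global optimum of the min-max game.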
PredictionNet is a network that predicts the vorticity field after a specific lead time T from the input field at t:

Y*(t + T) = G(X(t)),   (3.5)

where X, Y*, and Y represent the real data, the prediction result, and the prediction target, respectively (here, the prediction target Y is independent of the additional input y in Equations (3.2) and (3.3), which are general descriptions of the value function of each GAN model).
The prediction network is trained using DNS data to play a functional role in predicting the vorticity field after the lead time from each time in our dataset. Therefore, the following objective function becomes the optimisation target, regardless of the applied model:

argmin_{θ_P} ∥Y*(t + T) − Y(t + T)∥,   (3.6)

with the trainable parameters of PredictionNet, θ_P; ∥·∥ represents a distance norm of any type. PredictionNet is the generator when the GAN algorithm is applied, and the adversarial loss term is added to the objective function above. In our cGAN application, the generator input (X), which was used to generate the output (Y*), is also used as a condition in the training of the discriminator. This allows the statistical characteristics of the input (X) as well as the target (Y) to be reflected in the training of the discriminator, and eventually the generator, through competitive training. Conditioning is implemented through concatenation, so that both the generator output Y* and X, or the target Y and X, are used as input to the discriminator, as illustrated in Figure 4(b). The final form of the loss function of the cGAN, including the GP term, is

L_G = −E[D(Y*, X)] + λ_data ∥Y* − Y∥²₂   (3.7)

for the generator and

L_D = E[D(Y*, X)] − E[D(Y, X)] + λ_GP L_GP + λ_drift L_drift   (3.8)

for the discriminator, where

L_GP = E[(∥∇_{Y′} D(Y′, X)∥₂ − 1)²],   (3.9)

with Y′ = Y + α(Y* − Y) and α between zero and one. The simplest L2 norm distance was used for the data loss. The role of L_drift was to restrict the order of the discriminator outputs (keeping them from drifting too far from zero) with a small weight λ_drift. Its original form in Karras et al. (2017) is similar to L2 regularisation on D(Y, X), i.e. L_drift = E_{Y∼p(Y)}[(D(Y, X))²], but we modified it so that the regularisation is applied average-wise during backpropagation, which allows the hyperparameters λ_GP and λ_data to be used robustly regardless of the lead time.
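The gradient-penalty term above can be sketched numerically. For a toy linear critic D(y′) = w·y′ (an illustrative stand-in, not the paper's discriminator), the gradient with respect to y′ is simply w at every interpolated point, so the penalty reduces to (|w| − 1)²:

```python
import numpy as np

# Minimal sketch of the WGAN-GP gradient penalty for a toy scalar critic
# D(y') = w * y' (illustrative, not the paper's discriminator). The penalty
# drives ||grad_{y'} D(y')||_2 toward 1 at points interpolated between real
# and generated samples, y' = y + alpha * (y* - y) with alpha in [0, 1].
def gradient_penalty(w, y_real, y_fake, rng):
    alpha = rng.uniform(0.0, 1.0, size=y_real.shape)
    y_prime = y_real + alpha * (y_fake - y_real)   # random interpolates
    grad = np.full_like(y_prime, w)                # d(w * y')/dy' = w everywhere
    return np.mean((np.abs(grad) - 1.0) ** 2)
```

A critic with unit slope incurs no penalty, while steeper critics are penalised quadratically, which is what enforces the 1-Lipschitz constraint of Equation (3.3) in practice.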
Accordingly, λ_drift = 0.001 was used equally for all lead times, and λ_GP and λ_data were fixed at 10 and 100/(N_x × N_y), respectively, by fine tuning. There exists a separate optimum hyperparameter setting for each lead time; however, we verified that our hyperparameter setting showed no significant difference in performance from the optimum settings. In addition, we verified that it worked properly for lead times ranging from 1τ to 100τ. For the loss function of the baseline CNN, an L2 regularisation loss was added to Equation (3.6), using the L2 norm distance, to improve the performance and make the optimisation process efficient:

L_CNN = λ₁ ∥Y* − Y∥²₂ + λ₂ ∥w∥²₂,   (3.10)

where λ₁ is set to 1/(N_x × N_y), and λ₂ modulates the strength of the regularisation and is fixed at 0.0001 based on case studies. Simplified configurations of the CNN and cGAN are shown in Figures 4(a) and 4(b). ControlNet uses a pre-trained PredictionNet, which contains the dynamics of our data, as a surrogate model to generate disturbance fields that change the evolution of the flow field in a direction suitable for a target:

Ỹ(t + T) = G(X(t) + ΔX(t)).   (3.11)

ΔX represents the disturbance field added to the input field X(t), and Ỹ is the disturbance-added prediction at time t + T. An important implication of ControlNet in this study is that it is a model-free method without restrictions, except for the strength of ΔX. The objective function to be maximised includes the change in the vorticity field at a later time:

argmax_{θ_C} ∥Ỹ(t + T) − Y*(t + T)∥,

where θ_C are the weight parameters of ControlNet. In the process of training ControlNet, a smoothing loss term based on the spatial gradient of the disturbance field, ∇_x ΔX, is also included, where ∇_x denotes the gradient with respect to the coordinate directions. The coefficient of this term controls the strength of the smoothing effect and is adjusted depending on the order of the data loss (see § 4.4). Figure 4(c) shows a simplified configuration of ControlNet.
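The baseline-CNN loss described above can be written out directly; a minimal sketch with λ₁ = 1/(N_x N_y) and λ₂ = 10⁻⁴ as in the text (`weights` is an illustrative stand-in for the network parameters):

```python
import numpy as np

# Minimal sketch of the baseline-CNN loss: a pointwise L2 data loss scaled
# by lambda1 = 1/(Nx * Ny) plus an L2 weight-regularisation term with
# lambda2 = 1e-4. `weights` stands in for the network parameter tensors.
def cnn_loss(y_pred, y_true, weights, lam2=1e-4):
    nx, ny = y_true.shape
    data_loss = np.sum((y_pred - y_true) ** 2) / (nx * ny)   # lambda1 * ||Y* - Y||^2
    reg_loss = lam2 * sum(np.sum(w ** 2) for w in weights)   # lambda2 * ||w||^2
    return data_loss + reg_loss
```

The 1/(N_x N_y) scaling makes the data term a per-grid-point mean-squared error, so the balance between the data and regularisation terms does not depend on the grid resolution.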

Network architecture
A typical multiscale architecture of a CNN was used for both PredictionNet and ControlNet, as shown in Figure 5(a). The architecture consists of six convolutional blocks, called Convblk-m, each composed of three convolutional layers with 3 × 3 filter kernels (Conv.3 × 3) and named after their feature maps with different resolutions (128/2^m × 128/2^m, m = 0, 1, 2, 3, 4, 5). Here, the average-pooled inputs X^(m) (m = 1, 2, 3, 4) and the original input X^(0) are concatenated to the feature-map tensor at the beginning of each Convblk-m to be used as feature maps. One node point of a pooled input X^(m) contains the compressed information of 2^m × 2^m points of X^(0). Using such an architecture enables all the spatial grid information of an input field, X, to be used to calculate a specific output point, even for the lowest-resolution feature maps. For example, the receptive field of Convblk-5 is (2⁵ × 7) × (2⁵ × 7), which means that all the spatial grid points of X^(0) are used to calculate a point in the output. In addition, by sequentially using low- to high-resolution feature maps, the network can learn large-scale motion in the early stages and then fill in the detailed features of small-scale characteristics in the deepest layers. As mentioned in § 2, the purpose of this network design is to increase the prediction accuracy and determine the global optimum by considering the spectral properties of the flow. However, depending on the flow features or the characteristics of the flow variables, we can expect that accurate results can be obtained even using only a small input domain with a size similar to L_ω, for cost-effectiveness. Furthermore, periodic padding was used to maintain the size of the feature maps after convolutional operations, because the biperiodic boundary condition was applied spatially. The feature maps generated from Convblk-(m+1) must be extended through upscaling to be concatenated and used with the following pooled input X^(m) (i.e. doubling the size via upsampling). In this
process, a transposed convolution is used to minimise nonphysical phenomena. After upscaling, X^(m) is concatenated to the feature-map tensor, and then the Convblk-m operations are performed. Finally, after the original input field X^(0) is concatenated, the operations of Convblk-0 are performed, and an output of resolution 128 × 128 is generated depending on the network: the prediction result Y^(0) = Y* or the disturbance field Y^(0) = ΔX. Detailed information on the architectures of PredictionNet and ControlNet, including the number of feature maps used in each Convblk, is presented in Table 1. A leaky rectified linear unit (LReLU) is used as an activation function to add nonlinearity after each convolutional layer. This was not applied to the last layer of Convblk-0, to prevent nonlinearity in the final output. The discriminator of PredictionNet, to which the cGAN is applied, has an architecture symmetric to that of the generator (PredictionNet), as shown in Figure 5(b). X^(0) is concatenated with the target or the prediction result as a condition for the input of the discriminator. In contrast to the generator, convolutional operations are performed from high to low resolutions, with each convolutional block named Disblk-m. Average pooling was used for downsampling to halve the size of the feature maps. After the operation of Disblk-5, its output feature maps pass through two fully connected layers (with output dimensions of 1 × 1 × 256 and 1 × 1 × 1) to return a scalar value. The numbers of feature maps used for each Disblk are presented in Table 1. The baseline CNN model has the same architecture as PredictionNet but is trained without adversarial training through the discriminator.
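The pooled-input pyramid described above can be sketched in a few lines: the 128 × 128 input is average-pooled by factors of 2^m to obtain X^(m), so that each Convblk-m can be fed the input at its own resolution. A minimal NumPy stand-in for the framework's pooling layers (illustrative, not the training code):

```python
import numpy as np

# Minimal sketch of the multiscale input pyramid: repeatedly apply 2x2
# average pooling to the input field to obtain X^(m) at resolution
# (N / 2^m) x (N / 2^m) for m = 0 .. levels-1, so each node of X^(m)
# summarises a 2^m x 2^m patch of the original input X^(0).
def input_pyramid(x, levels=6):
    pyramid = [x]
    for _ in range(1, levels):
        p = pyramid[-1]
        # 2x2 average pooling without padding
        p = 0.25 * (p[0::2, 0::2] + p[1::2, 0::2] + p[0::2, 1::2] + p[1::2, 1::2])
        pyramid.append(p)
    return pyramid  # [X^(0), X^(1), ..., X^(levels-1)]
```

Average pooling preserves the mean of the field at every level, so even the coarsest map X^(5) retains the large-scale content of the input while discarding the small scales.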

PredictionNet -prediction of the vorticity field at a finite lead time
In this section, we discuss the performance of PredictionNet in predicting the target vorticity field at t + T using the field at t as the input. The considered lead times T are 0.25T_L, 0.5T_L, T_L, and 2T_L, based on the observation that the autocorrelation of the vorticity at each lead time dropped to 0.71, 0.49, 0.26, and 0.10, respectively (Figure 2(a)). The convergence behaviour of the L2-norm-based data loss of the baseline CNN and cGAN models for various lead times during training is shown for 100,000 iterations with a batch size of 32 in Figure 6. Training took around 2 and 5.4 hours for the CNN and cGAN, respectively, on an NVIDIA GeForce RTX 3090 GPU. Although the baseline CNN and PredictionNet (the generator of the cGAN) have exactly the same architecture (around 520k trainable parameters based on Table 1), the cGAN includes a discriminator, comparable in size to the generator, that must be trained with several complex loss terms; thus, the training time almost tripled. Both models yielded good convergence for lead times of up to 0.5T_L, whereas overfitting was observed for T = T_L and 2T_L in both models. Since the target field is hardly correlated with the input field at these lead times (correlation coefficients of 0.26 and 0.10), it is naturally more challenging to train a model that can reduce an MSE-based data loss, a pointwise metric that is independent of the spatial distribution. One notable observation in Figure 6 is that for lead times beyond T_L, the CNN appears to exhibit better performance than the cGAN, as evidenced by its smaller converged data loss. However, as will be discussed later, pointwise accuracy alone cannot cover all aspects of turbulence prediction, owing to its various statistical properties. Additionally, although the only loss term of the CNN is the pointwise MSE apart from the weight regularisation, the magnitude of its converged data loss also remains significant.
As an example of a test of the trained network on unseen input data, the fields predicted by the cGAN and CNN were compared against the target data for various lead times, as shown in Figure 7. In the test of the cGAN, only PredictionNet was used to generate or predict the flow field at the target lead time. Hereinafter, when presenting performance results for the instantaneous field, including Figure 7 and all subsequent statistics, we present only the results for input at t₀. This choice was made because the prediction task is hardest at earlier times, since the flow contains more small-scale structures than at later times owing to its decaying nature. Therefore, a performance test for input data at t₀ is sufficient for testing the trained network. For short lead-time predictions, such as 0.25T_L, both the cGAN and CNN showed good performance. However, for a lead time of 0.5T_L, the prediction by the CNN started displaying a blurry image, whereas the cGAN's prediction maintained small-scale features relatively better than the CNN's. This blurry nature of the CNN worsened in the predictions at T_L and 2T_L. Although the cGAN's prediction also deteriorated as the lead time increased, the overall performance of the cGAN, particularly in capturing small-scale variations in the field, was much better than that of the CNN, even for T_L and 2T_L.
A quantitative comparison of the performances of the cGAN and CNN against the target in the prediction of various statistics is presented in Table 2. The correlation coefficient (CC) between the prediction and target fields or, equivalently, the mean-squared error shows a better performance by the cGAN than by the CNN for lead times of up to 0.5T_L, even though both the cGAN and CNN predicted the target field well. For T_L, the CC of the cGAN and CNN was 0.855 and 0.887, respectively, indicating that the predictions by both models were not poor, whereas the predictions by both models were quite poor for 2T_L. Once again, for lead times larger than T_L, the CNN shows better performance on the pointwise metrics, in line with the trend of the data loss. Conversely, the statistics related to the distribution of vorticity, such as the RMS value or the standardised moments (μ̃_n = μ_n/σ^n, where μ_n = ⟨(ω − ⟨ω⟩)^n⟩ is the n-th central moment and σ = √μ₂ is the standard deviation), clearly confirm that the cGAN outperforms the CNN for all lead times, and the prediction by the cGAN was much closer to the target than that of the CNN, even for T_L and 2T_L. Furthermore, the prediction of the enstrophy and the dissipation (ε = 2ν s_ij s_ij, where s_ij is the strain-rate tensor) by the cGAN was more accurate than that by the CNN, as shown in Table 2. All these qualitative and quantitative comparisons clearly suggest that the adversarial learning of the cGAN tends to capture the characteristics of turbulence data better than the CNN, which minimises only the pointwise MSE. The superiority of the cGAN is more pronounced for large lead times, for which small-scale turbulence features are difficult to predict.
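The standardised moments quoted in Table 2 follow directly from their definition; a minimal NumPy sketch (function name illustrative):

```python
import numpy as np

# Minimal sketch of the standardised moments used for the vorticity
# distribution: mu_tilde_n = mu_n / sigma^n, where mu_n is the n-th
# central moment and sigma = sqrt(mu_2) is the standard deviation.
def standardised_moment(omega, n):
    c = omega - omega.mean()                 # centred field
    sigma = np.sqrt(np.mean(c ** 2))         # sqrt of second central moment
    return np.mean(c ** n) / sigma ** n
```

By construction the second standardised moment is 1, the third is the skewness, and the fourth is the flatness; a symmetric distribution has zero skewness.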
Detailed distributions of the probability density functions (PDFs) of vorticity, two-point correlation functions, and enstrophy spectra (Ω(k) = Σ_{|k|=k} |ω̂(k)|², where ω̂(k, t) = F{ω(x, t)}) of the target and prediction fields by the cGAN and CNN are compared for all lead times in Figure 8. Both the cGAN and CNN effectively predicted the target PDF distribution for lead times of up to T_L, whereas the tail part of the PDF for 2T_L was not effectively captured by either model. The difference in performance between the cGAN and CNN is more striking in the prediction of R_ωω(r) and Ω(k). R_ωω(r) and Ω(k) obtained by the cGAN almost perfectly match those of the target for all lead times, whereas those predicted by the CNN deviate progressively from the target distribution as the lead time increases. In the prediction of the correlation function, the CNN tends to produce a relatively slowly decaying distribution compared with the cGAN and the target, resulting in a much larger integral length scale for all lead times. In particular, it is noticeable that even for a short lead time of 0.25T_L, the CNN significantly underpredicts Ω(k) in the high-wavenumber range, and this poor predictability propagates toward the small-wavenumber range as the lead time increases, whereas the cGAN produces excellent predictions over all scale ranges, even for 2T_L. This evidence supports the idea that adversarial learning accurately captures the small-scale statistical features of turbulence.
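The enstrophy spectrum defined above can be computed from a vorticity snapshot by shell-summing the squared Fourier coefficients; a minimal sketch (the shell binning and normalisation are illustrative, not necessarily those of the paper):

```python
import numpy as np

# Minimal sketch of the enstrophy spectrum Omega(k): take the 2-D FFT of
# the vorticity field and sum |omega_hat|^2 over integer wavenumber shells
# |k| = k. With the 1/N^2 normalisation, the spectrum sums to the mean
# enstrophy of the resolved shells (Parseval).
def enstrophy_spectrum(omega):
    n = omega.shape[0]
    omega_hat = np.fft.fft2(omega) / (n * n)
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    kmag = np.rint(np.sqrt(kx**2 + ky**2)).astype(int)   # shell index |k|
    spectrum = np.zeros(n // 2 + 1)
    for shell in range(n // 2 + 1):
        spectrum[shell] = np.sum(np.abs(omega_hat[kmag == shell]) ** 2)
    return spectrum
```

For the single-mode field ω = sin(x₁), all the enstrophy lands in the k = 1 shell with value ⟨ω²⟩ = 1/2, which is a convenient sanity check for the binning.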
To quantify in more detail the differences between the predictions of the cGAN and CNN models, scale decomposition was performed by splitting the fields in wavenumber space into three regimes (large-, intermediate-, and small-scale fields), as in the investigation of the temporal correlation in Figure 2; the results are presented in Appendix § B. For small lead times, such as 0.25τ and 0.5τ, the cGAN predicted the small-scale fields accurately, whereas those produced by the CNN contained non-negligible errors. For 2τ, prediction is difficult for both models, even for the large-scale field. On the other hand, the CC between the decomposed target and prediction fields, provided in Table 4 of Appendix § B, did not demonstrate the superiority of the cGAN over the CNN. This is attributed to the fact that pointwise errors such as the MSE or CC are predominantly determined by the behaviour of the large-scale motions. The pointwise MSE is the only error used in the loss function of the CNN, whereas the loss function of the cGAN additionally contains the discriminator loss based on the latent variable. This indicates that the latent variable plays an important role in predicting turbulent fields with multiscale characteristics.
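The three-band decomposition described above can be sketched with sharp spectral filters. The cutoff wavenumbers k1 and k2 below are placeholders, not the values used in the study.

```python
import numpy as np

def scale_decompose(w, k1=4, k2=16):
    # Sharp spectral band-pass decomposition of a 2-D field into
    # large- (|k| <= k1), intermediate- (k1 < |k| <= k2) and
    # small-scale (|k| > k2) parts.  k1 and k2 are illustrative cutoffs.
    n = w.shape[0]
    w_hat = np.fft.fft2(w)
    k = np.fft.fftfreq(n, d=1.0 / n)
    kmag = np.sqrt(k[:, None]**2 + k[None, :]**2)
    masks = [kmag <= k1, (kmag > k1) & (kmag <= k2), kmag > k2]
    return [np.fft.ifft2(w_hat * m).real for m in masks]
```

Because the three masks partition wavenumber space, the decomposed fields sum back to the original field.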
In the prediction of the enstrophy spectrum, the cGAN showed excellent performance compared with the CNN, accurately reproducing the decaying behaviour of the spectrum in the small-scale range, as shown in Figure 8. The average phase error between the target and the predictions by the cGAN and CNN is shown in Figure 10. For all lead times, the phase error of the small-scale motions approaches π/2, the value corresponding to a random distribution. For the short lead times 0.25τ and 0.5τ, the cGAN clearly suppressed the phase error in the small-scale range compared with the CNN, whereas for τ and 2τ, both the cGAN and CNN poorly predicted the phase of the intermediate- and small-scale motions, even though the cGAN outperformed the CNN in predicting the spectrum.
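A shell-averaged phase error of the kind plotted in Figure 10 can be estimated from the cross spectrum of the target and predicted fields. This is a minimal sketch under assumed conventions (nearest-integer shell binning); for two mutually uncorrelated fields the shell average tends to π/2, the random-distribution value quoted in the text.

```python
import numpy as np

def mean_phase_error(w_true, w_pred):
    # Shell-averaged absolute phase difference between the Fourier
    # coefficients of the target and predicted fields, in [0, pi].
    n = w_true.shape[0]
    cross = np.fft.fft2(w_true) * np.conj(np.fft.fft2(w_pred))
    dphi = np.abs(np.angle(cross))            # per-mode phase difference
    k = np.fft.fftfreq(n, d=1.0 / n)
    kmag = np.rint(np.sqrt(k[:, None]**2 + k[None, :]**2)).astype(int)
    return np.array([dphi[kmag == s].mean() for s in range(1, n // 2 + 1)])
```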
The performance of PredictionNet in the prediction of velocity fields is presented in Appendix § C, where several statistics, including the PDF of velocity, the two-point correlation, and the PDF of the velocity gradient, are shown to be accurately predicted for all lead times. The superiority of the cGAN over the CNN was also confirmed there.

Role of the discriminator in turbulence prediction
In the previous section, we demonstrated that adversarial learning using a discriminator network effectively captured the small-scale behaviour of turbulence, even when the small-scale field at a large lead time was hardly correlated with that of the initial field. In this section, we investigate the role of the discriminator in predicting turbulence through the behaviour of the latent variable, which is the output of the discriminator. Although several attempts have been made to disclose the manner in which adversarial training affects the performance of the generator and the meaning of the discriminator output (Creswell et al. 2018), here we focus on the implicit role of the latent variable in recovering the statistical nature of turbulence in the process of a prediction. In the current cGAN framework, the discriminator network is trained to maximise the difference between the expected value of the latent variable of the target field, D(Y, X), and that of the prediction, D(Y*, X), whereas the generator network is trained to maximise the expected value of D(Y*, X) and to minimise the pointwise MSE between Y and Y*. Therefore, through the adversarial learning process, the discriminator is optimised to distinguish Y* from Y, whereas the generator evolves to produce Y* by reflecting the optimised latent variable of the discriminator and minimising the MSE between Y and Y*. The input field X is used as a condition in the construction of the optimal latent variable in the discriminator network. The behaviour of the expected values of the latent variable of the target and generated fields, ⟨D(Y, X)⟩ and ⟨D(Y*, X)⟩, and of their difference as learning proceeds is illustrated for all lead times in Figure 11. Because of the regularisation of the latent variable in the loss function of the discriminator, both ⟨D(Y, X)⟩ and ⟨D(Y*, X)⟩ remain around zero, although they sometimes oscillate significantly. As learning proceeds, the difference ⟨D(Y, X)⟩ − ⟨D(Y*, X)⟩ quickly decays initially, owing to the suppression of the MSE, and continues to decrease to zero for 0.25τ and 0.5τ but to a finite value for τ and 2τ. The intermittent transition of the difference between a finite value and zero for 0.25τ and 0.5τ clearly suggests that the generator and discriminator operate competitively in an adversarial manner. As the generator improves, the difference monotonically decreases and occasionally exhibits sudden suppression. When sudden suppression occurs, the discriminator evolves to distinguish Y* from Y by finding a better criterion latent variable, resulting in a sudden increase in the difference. Ultimately, when the generator can produce Y* that is indistinguishable from Y by any criterion proposed by the discriminator, the difference converges to zero, as in the cases of 0.25τ and 0.5τ. However, for τ and 2τ, such an event never occurs; the difference monotonically decreases toward a finite value, implying that the discriminator wins and can distinguish Y* from Y. Although the generator cannot produce Y* that beats the discriminator, Y* remains the best possible prediction.
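The competing objectives described above can be summarised in two scalar losses. The sketch below is schematic rather than the paper's implementation; the regularisation weight lam (keeping the latent variable near zero) and the MSE weight beta are assumed hyperparameters.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, lam=1e-3):
    # Loss (to minimise) for the discriminator: maximise
    # E[D(Y,X)] - E[D(Y*,X)], with a regularisation term that keeps the
    # latent variable near zero.  lam is an assumed weight.
    return -(d_real.mean() - d_fake.mean()) + lam * (d_real**2).mean()

def generator_loss(d_fake, y_pred, y_true, beta=1.0):
    # Loss (to minimise) for the generator: maximise E[D(Y*,X)] and
    # minimise the pointwise MSE between prediction and target.
    # beta, the relative weight of the MSE term, is assumed.
    return -d_fake.mean() + beta * ((y_pred - y_true)**2).mean()
```

When the generated latent values match the target ones and the prediction equals the target, the generator loss reduces to the (negated) shared mean latent value, consistent with the converged difference described in the text.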
To understand in more detail the mechanism by which the discriminator distinguishes a generated field from a target field, the behaviour of the distribution of the latent variable during the learning process was investigated, because the latent variable plays a key role in the discriminator network. The distribution of the discriminator output of the generated field, D(Y*, X), at four iterations during the learning process (marked with green circles in Figure 11(b) for all lead times) is compared with that of the target field, D(Y, X), in Figure 12. The distributions of the latent variable of the scale-decomposed target fields, D(Y_L, X), D(Y_I, X), and D(Y_S, X), are also compared for the analysis, where Y_L, Y_I, and Y_S are the large-, intermediate-, and small-scale fields decomposed from Y in the same manner as in Figure 2. Here, an additional 500 fields, in addition to the 50 test fields, were used to extract smooth distributions of the latent variable of the target field and the scale-decomposed target fields. For easy comparison, the mean value of the latent variable of the target field was shifted to zero.
For all lead times, as learning proceeded, the distributions of D(Y, X) and D(Y*, X) became narrower, with their mean values approaching each other (the gap between the two vertical lines decreases). This indicates that, as the generator improves in producing Y* closer to Y, the discriminator becomes more efficient in distinguishing Y* from Y, because only the mean values are compared in the discrimination process, and narrower distributions yield sharper mean values. Even when the mean values of D(Y, X) and D(Y*, X) are almost the same (18,000 and 124,000 iterations for 0.25τ and 84,000 and 190,000 iterations for 0.5τ; Figures 12(a) and 12(b)), a more optimal discriminator at later iterations yields a narrower distribution. The collapse of the distributions of D(Y, X) and D(Y*, X) in the second column for 0.25τ and 0.5τ occurs when the discriminator falls into a local optimum during training, implying that the predicted and target fields are indistinguishable by the discrimination criterion at that particular iteration. However, as the criterion is jittered in the following iterations, the two fields become distinct again, as shown in the third column. Eventually, when the two fields become indistinguishable by any criterion after sufficient iterations, an almost perfect collapse of very narrow distributions is achieved, as shown in the fourth column for 0.25τ (the collapse shown in the fourth column for 0.5τ occurs at another local optimum). For τ and 2τ, for which perfect training of the discriminator was not achieved, the distributions of D(Y, X) and D(Y*, X) hardly change with iteration, although the mean values approach each other very slowly.
In the comparison with the scale-decomposed target fields, we observe that for 0.25τ and 0.5τ, the distribution of the latent variable of the small-scale target field is closest to those of the target and generated fields, whereas for τ and 2τ it is the distribution for the intermediate-scale target field that is closest. This suggests that the small-scale (or intermediate-scale) content of the target field plays the most critical role in discriminating the generated field from the target field for 0.25τ and 0.5τ (or τ and 2τ). Because the generator successfully produces Y* similar to Y for 0.25τ and 0.5τ, while the predictions for τ and 2τ are incomplete, the small-scale fields for τ and 2τ are useless for discriminating the generated field from the target field, and only the intermediate-scale field remains partially useful. Considering the possible functional form of the latent variable provides a clue to this conjecture. The latent variable, which is the output of the discriminator network shown in Figure 5, can be written as a linear combination of the Fourier coefficients of its inputs, d = ΔxΔy Σ_k [W_Y(k) ŷ(k) + W_X(k) x̂(k)] (4.1), where ŷ and x̂ are the Fourier coefficients of Y and X, respectively, Δx and Δy are the grid sizes in each direction, W_Y and W_X are effective weights determined by the discriminator network, and α is the slope of the LReLU function for negative inputs. This linear relationship between the input and output is a consequence of the linear structure of the discriminator network: the convolution, average-pooling, and full-connection operations are linear, and the leaky ReLU is a piecewise linear function that multiplies its argument by either one or α, depending on the sign of the argument. Although W_Y is an undetermined function, a possible shape can be conjectured. For 0.25τ and 0.5τ, W_Y of the optimised discriminator network has more weight in the small-scale range than in the other scale ranges, so that the latent variable is more sensitive to the small-scale field and thus better discriminates the generated field from the target field. Similarly, for τ and 2τ, W_Y has more weight in the intermediate-scale range than in the other ranges, and discrimination is limited to the intermediate-scale field because the small-scale field is fully decorrelated and thus no longer useful for discrimination. Although the manner in which the condition X influences learning is unclear, the scale-dependent correlation of the target field with the initial input field appears to be captured by W_Y. This scale-selective feature of the discriminator appears to be the key element behind the successful prediction of the small-scale characteristics of turbulence. Here, the generator implicitly learns the system characteristics embedded in the data in order to deceive the discriminator; thus, it is extremely challenging to give an exact mechanism by which adversarial training comprehensively enhances the prediction performance, particularly in terms of physical and statistical attributes. Nevertheless, we clearly showed that such a successful prediction of turbulence, even down to the small scales, is achieved through adversarial learning and not by a CNN, which enforces suppression of the pointwise MSE only.
One last question that may arise in this regard is whether introducing physical constraints explicitly into the model, rather than relying on implicit competition with an additional network, could be equally effective. To explore this, we incorporated the enstrophy spectrum as an explicit loss term in the objective function of the CNN model to address the performance issues associated with small-scale features. As shown in the results presented in Appendix § D, adding the enstrophy loss alone did not lead to better overall performance, although the enstrophy spectrum itself was better predicted. This confirms that the adoption of an implicit loss based on the latent variable in the cGAN is effective in reflecting the statistical nature of turbulence, particularly the small-scale characteristics.

PredictionNet -single vs recursive prediction
In the previous section, we observed that the prediction performance for large lead times deteriorated naturally because of the poor correlation between the target field and the initial input field. An improved prediction for large lead times may be possible if the prediction model trained for a short lead time is applied recursively. Conversely, recursive application may accumulate prediction errors. Therefore, an optimal base model producing the best prediction is expected to exist. Because the performance of the CNN models falls short of PredictionNet at all lead times, a more significant error accumulation is expected for the CNN; we therefore focus on analysing the recursive predictions by PredictionNet. Nevertheless, we also present the results of the CNN models, not only to highlight the improvement that recursive prediction brings for large lead times but also to quantify the further enhancement achieved by PredictionNet. A similar concept was tested for weather prediction (Daoud et al. 2003).
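The recursive procedure itself is simply repeated composition of a base model, feeding each prediction back as the next input. A minimal sketch, with a toy map standing in for the trained base model:

```python
def recursive_predict(model, x, n):
    # Apply a base model trained for a short lead time n times in
    # sequence to reach a lead time n times longer.
    for _ in range(n):
        x = model(x)
    return x
```

For example, a base model for 0.5τ applied four times targets the 2τ prediction, at the cost of compounding any per-step error.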
For example, for the prediction at the large lead time 2τ, four base models trained for the lead times 0.125τ, 0.25τ, 0.5τ, and τ were recursively applied 16, 8, 4, and 2 times, respectively, and compared against the single prediction, as shown in Figure 13. All recursive applications produced better predictions than the corresponding single prediction, and among them, four recursive applications of the cGAN model for 0.5τ yielded the best results. For the CNN, however, eight recursive applications of the model trained for 0.25τ produced the best prediction. The best prediction was sought for all lead times; the performances of all recursive predictions by the cGAN and CNN are compared in Table 3 in terms of performance indices, such as the CC and MSE, and statistical quantities, including the RMS, μ4, and dissipation rate. Here, we observed that the performance difference depending on the input time point varies significantly with the base model; thus, unlike § 4.1, the CC and MSE averaged over input times from t_0 to t_100 are presented to ensure a fair comparison. The CC and MSE values are plotted in Figure 14. As expected, there exists an optimum base model producing the best performance for each lead time. For the cGAN model, recursive prediction was more effective for large lead times (τ and 2τ) than for short lead times (0.25τ and 0.5τ). For 2τ, the CC improved from 0.4530 to 0.7394 and the MSE decreased from 0.506 to 0.250 through the best recursive application of the cGAN model, and the predicted statistics exhibited a consistent improvement. For τ, the recursive applications show similar improvements. For 0.25τ and 0.5τ, however, even though the recursive applications produced slightly better CC and MSE, the statistics predicted by the single prediction are more accurate than those of the recursive prediction. Recursive applications of the CNN model do not show a monotonic behaviour in the CC or MSE; for τ, eight applications of the CNN model trained for 0.125τ yield the best prediction, whereas two applications of the CNN model for τ produce the best prediction for 2τ. Finally, recursive prediction also improved the enstrophy spectrum over the single prediction, as shown in Figure 15. In particular, it is noticeable that eight recursive applications of the CNN model trained for 0.25τ yielded a considerably better spectrum than the single prediction, although the prediction itself was still not good enough, as shown in Figure 13(c).

ControlNet -maximisation of the propagation of the control effect
In this section, we present an example of flow control that uses the high-accuracy prediction model developed in the previous section as a surrogate model. By taking advantage of the prediction capability of the model, we can determine the optimum control input that yields the maximum modification of the flow in a specified direction with a fixed control-input strength. In particular, we trained ControlNet to find a disturbance field that produces the maximum modification of the vorticity field over a finite time period under the constraint of a fixed RMS of the disturbance field. Similar attempts to control 2-D decaying turbulence were made by Jiménez (2018) and Yeh et al. (2021). Jiménez (2018) modified parts of a vorticity field to identify dynamically significant sub-volumes in 2-D decaying turbulence. The influence on the future evolution of the flow, measured by the L2 norm of the velocity field, was defined as the significance. The significance of properly labelled regions was then compared by turning the vorticity of specific regions on and off. They showed that vortices and vortex filaments are dynamically significant structures, whereas interstitial strain-dominated regions are less significant. In contrast, Yeh et al.
(2021) developed a method called network-broadcast analysis, based on network theory, applying Katz centrality (Katz 1953) to identify the key dynamical paths along which perturbations amplify over time in 2-D decaying turbulence. Two networks (composed of nodes and edges, not neural networks) with different adjacency matrices were investigated: the Biot-Savart (BS) network and the Navier-Stokes (NS) network. In the former case, vortex structures similar to those in Jiménez (2018) were found to be the most effective structures for perturbation propagation, and in the latter case, opposite-sign vortex pairs (vortex dipoles). Both studies, however, confined the control disturbance field, either by localising the modification in cell units of the domain (Jiménez 2018) or by considering perturbations given by the leading singular vector of an adjacency matrix calculated from a pulse of predefined shape (Yeh et al. 2021). In contrast, the disturbance field considered in ControlNet is free of shape; the only constraint is the strength of the disturbance field in terms of its fixed RMS.
As a surrogate model for the time evolution of the disturbance-added initial field, PredictionNet trained for a lead time of 0.5τ, which showed excellent prediction, was selected for the control test. ControlNet is trained to generate an optimum disturbance field Δ that maximises the difference between the disturbance-added prediction Ỹ (= G(X + Δ)) and the original (undisturbed) prediction Y*. If the strength of Δ is too large, causing X + Δ to deviate from the range of dynamics seen by the pre-trained PredictionNet, the surrogate model will not function properly, and Ỹ will differ from the ground-truth solution N(X + Δ). Here, N(X) denotes the result of the Navier-Stokes simulation with X as the initial condition. Predictions Ỹ using disturbances of various strengths relative to the RMS of the input field (Δ_rms = 0.1, 0.5, 1, 5, 10, and 20 % of ω_rms) were compared with the corresponding N(X + Δ). We confirmed that PredictionNet functions properly within the tested range (Appendix § E). Therefore, the effect of the disturbance strength was not considered further, and we present only the 10 % case. The coefficient related to the smoothing effect was fixed at 0.5 for Δ_rms = 0.1 ω_rms through parameter calibration. Figure 16 shows the data-loss trend of ControlNet for Δ_rms = 0.1 ω_rms, which is maximised as training progresses, confirming successful training.
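The optimisation that ControlNet performs can be illustrated on a toy surrogate: maximise the mean-squared difference between the disturbed and undisturbed predictions subject to a fixed disturbance RMS. The finite-difference gradient ascent below is only a sketch standing in for the trained control network (the paper trains a network instead); `predict` is any stand-in surrogate for the time evolution.

```python
import numpy as np

def optimise_disturbance(predict, x, rms, steps=200, lr=0.1, eps=1e-6, seed=0):
    # Find a disturbance d of fixed RMS maximising the MSE between the
    # disturbed prediction predict(x + d) and the undisturbed predict(x),
    # by finite-difference gradient ascent with projection onto the RMS
    # sphere.  A sketch of the objective only, not the paper's method.
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(x.shape)
    d *= rms / np.sqrt((d**2).mean())
    y0 = predict(x)
    obj = lambda dd: ((predict(x + dd) - y0)**2).mean()
    for _ in range(steps):
        f0, g = obj(d), np.zeros_like(d)
        for i in np.ndindex(d.shape):        # finite-difference gradient
            dp = d.copy()
            dp[i] += eps
            g[i] = (obj(dp) - f0) / eps
        d = d + lr * g                       # ascent step
        d *= rms / np.sqrt((d**2).mean())    # re-impose the RMS constraint
    return d
```

For a linear surrogate the optimum aligns with the most amplified direction of the map, mirroring the idea that the control seeks the disturbance whose effect propagates most strongly.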
Figure 17 presents a visualisation of the effect of the control input on one of the test data. Figure 17(a) shows the input vorticity field and the corresponding prediction at t = 0.5τ. Figure 17(b) shows the disturbance field generated by ControlNet, Δ_C, the disturbance-added prediction, and the difference between the undisturbed and disturbance-added predictions, demonstrating that the control effect is effectively propagated and amplified at t = 0.5τ. The MSE between Ỹ and Y* is 0.757, indicating a substantial change. A prominent feature of Δ_C is its large-scale structure. To verify whether the propagation of the control effect in Ỹ was maximised under the given conditions, we considered three additional disturbance fields. Inspired by the results of Jiménez (2018) and the claim of Yeh et al. (2021) regarding their BS network, in which vortices have the greatest effect on the future evolution of the base flow, we considered a 10 % scaled input disturbance Δ_I (= 0.1X). Another disturbance field, a single Taylor vortex Δ_T extracted from the NS-network analysis of Yeh et al. (2021), which is best for amplifying the control effect when applied to a vortex dipole rather than to the main vortex structures, is considered in the following form: Δ_T = A (1 − r²/2σ²) exp(−r²/2σ²) (4.2), where r = √((x − x_c)² + (y − y_c)²) with the vortex centre (x_c, y_c). A represents the amplitude of the Taylor vortex and is adjusted to maintain the RMS of the disturbance at 0.1 ω_rms. The radius σ was set to σ = π/k_p, where k_p = 4 denotes the wavenumber at which the enstrophy spectrum is maximum. Finally, a disturbance field in the form of a Taylor-Green vortex, Δ_TG, approximating the disturbance field of ControlNet, is considered as follows: Δ_TG = A sin(x − x_c) sin(y − y_c) (4.3), indicating the largest vortex in the [0, 2π)² domain. For Δ_T and Δ_TG, the optimum location of (x_c, y_c) that yielded the greatest propagation was determined through tests using PredictionNet.
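A Taylor-Green-vortex-type disturbance with a prescribed RMS, as used for Δ_TG, can be generated as follows. The unit-amplitude shape sin(x − x_c) sin(y − y_c) is an assumption matching the "largest vortex in the domain" description, and the target RMS plays the role of 0.1 ω_rms.

```python
import numpy as np

def taylor_green_disturbance(n, target_rms, xc=0.0, yc=0.0):
    # Largest Taylor-Green-vortex-type vorticity disturbance on the
    # [0, 2*pi)^2 domain, centred at (xc, yc), rescaled so that its RMS
    # equals target_rms (e.g. 10% of the input-field RMS).
    x = 2 * np.pi * np.arange(n) / n
    X, Y = np.meshgrid(x, x, indexing="ij")
    dw = np.sin(X - xc) * np.sin(Y - yc)
    return dw * target_rms / np.sqrt((dw**2).mean())
```

The amplitude rescaling enforces the same RMS constraint applied to all disturbance fields in the comparison.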
The disturbance-added predictions Ỹ_I and Ỹ_T and their differences from the undisturbed prediction Y*, obtained using Δ_I and Δ_T (the latter located at the optimum position) as disturbances, are shown in Figures 17(c) and 17(d), respectively. The difference fields show that these disturbances do not change the field significantly. The MSE values of Ỹ_I and Ỹ_T against the undisturbed prediction are 0.039 and 0.402, respectively, much smaller than the corresponding value of 0.757 for Ỹ, as shown in Figure 18(a). From this comparison, it can be conjectured that, because the goal of the control is to maximise the pointwise MSE of the vorticity against the undisturbed prediction, a large-scale disturbance of the vorticity is more effective in displacing and deforming the vorticity field. The enstrophy spectra of the disturbance fields shown in Figure 18(b) confirm that Δ_C has its peak at k = 1, the largest permissible scale in the given domain, whereas Δ_T and Δ_I have peaks at k = 2 and k = 4, respectively, under the constraint that the RMS of all disturbances is 0.1 ω_rms. This observation leads to the consideration of the largest Taylor-Green-vortex-type disturbance Δ_TG in the domain, given by Equation (4.3). The distribution of the optimum location of Δ_TG shown in Figure 17(e), yielding the greatest propagation, coincides with that of Δ_C shown in Figure 17(b) (except for the sign, owing to the symmetry of the propagation effect, as shown in the third panels of Figures 17(b) and 17(e)). The MSE of Ỹ_TG against Y* was 0.726, slightly smaller than the 0.757 of Ỹ (Figure 18(a)). All these comparisons suggest that the largest-scale disturbance is the most effective in modifying the vorticity field through displacement or distortion of the given vorticity field. To verify whether the location of the largest-scale disturbance Δ_TG was indeed globally optimal, tests were conducted with a phase-shifted Δ_TG. We considered two types of phase shifting of the wavenumber components, Δ̂_TG(k) exp(i(k_x φ_x + k_y φ_y)): one with randomly selected φ_x and φ_y applied uniformly to all wavenumbers, and the other with randomly selected phases applied differently to each wavenumber. The test results confirm that the maximum MSE was obtained for Δ_TG without a phase shift, with the minimum MSE of 0.2 found among the 1,000 trial disturbances with randomly selected phases, as shown in Figure 18(a). Δ_TG,min corresponds to one of the Δ_TG tests, yielding the minimum modification with an MSE of approximately 0.45. The wide range of MSE for the largest-scale disturbance indicates that the location of the largest-scale disturbance is important for modifying the flow field.
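A spectrum-preserving random phase shift of the kind used in these tests can be generated as follows. Drawing the phases from the transform of an auxiliary random real field is an implementation choice assumed here; it guarantees the Hermitian symmetry θ(−k) = −θ(k), so the shifted field remains real while every mode receives an independent random phase.

```python
import numpy as np

def random_phase(w, seed=0):
    # Apply an independent random phase to every Fourier mode of a real
    # 2-D field while leaving the enstrophy spectrum unchanged.
    rng = np.random.default_rng(seed)
    g_hat = np.fft.fft2(rng.standard_normal(w.shape))
    shift = np.exp(1j * np.angle(g_hat))     # Hermitian-symmetric phases
    return np.fft.ifft2(np.fft.fft2(w) * shift).real
```

The uniform-shift variant (a single φ_x, φ_y for all modes) is simply a rigid translation of the field and can be realised the same way with shift = exp(i(k_x φ_x + k_y φ_y)).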
To confirm the optimality of Δ_C in the real flow, full Navier-Stokes simulations were performed beyond 0.5τ using the disturbance-added initial conditions for Δ_C, Δ_I, Δ_T, and Δ_TG. The visualisation of the propagation over the time horizon is shown in Figure 19, and the behaviour of the normalised RMSE over time is shown in Figure 19(e). The propagation by large-scale disturbances such as Δ_C and Δ_TG is much more effective than that by Δ_I and Δ_T, even beyond τ. Moreover, the surrogate model functions properly for Δ_rms = 0.1 ω_rms, based on the comparison between the results at 0.5τ in Figure 19 and the difference fields in the third column of Figure 17 for each disturbance. The RMSEs of Δ_C and Δ_TG behave almost indistinguishably over the whole test period, as shown in Figure 19(e).
As shown in Figure 18(a), the location of a large-scale disturbance affects how strongly the flow field is modified. To investigate this in more detail, the vorticity contours of the flow field and the velocity vectors of the disturbances yielding the greatest change, Δ_C and Δ_TG, and of the disturbance producing the least change, Δ_TG,min, are plotted together in Figure 20. Careful observation shows that the maximum modification occurs when the velocity of the disturbance is applied in the direction normal to the elongation direction of a strong vortex region, whereas the flow field is changed minimally when the velocity of the disturbance is aligned with the elongation direction of the strong vortex patch. The conditional PDF of the angle between the velocity vectors of the input field X and the disturbance fields is obtained under the condition that the velocities of the input and the disturbance are both greater than their own RMS values. Figure 20(d) shows that the peak is found at approximately 0.6 for Δ_C and Δ_TG and at zero for Δ_TG,min, confirming this trend, given that the velocity vectors circumvent the vortex patches.
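The conditional angle statistic of Figure 20(d) can be sketched as follows; the normalisation of the angle by π is an assumption about the abscissa of the PDF, and the RMS threshold matches the condition stated in the text.

```python
import numpy as np

def conditional_angles(u1, v1, u2, v2):
    # Angle between two velocity-vector fields at the points where both
    # vectors exceed their own RMS magnitude, returned in units of pi.
    m1 = np.sqrt(u1**2 + v1**2)
    m2 = np.sqrt(u2**2 + v2**2)
    mask = (m1 > np.sqrt((m1**2).mean())) & (m2 > np.sqrt((m2**2).mean()))
    cosang = (u1 * u2 + v1 * v2)[mask] / (m1 * m2)[mask]
    return np.arccos(np.clip(cosang, -1.0, 1.0)) / np.pi
```

A histogram of these values over the retained points gives the conditional PDF.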

Conclusion
In this study, the prediction of the dynamics of 2-D decaying homogeneous isotropic turbulence (DHIT) was performed using a cGAN-based deep-learning model. The developed prediction network, called PredictionNet, produced highly accurate predictions up to a lead time of half the Eulerian integral time scale. A quantitative comparison of the prediction performance with that of a baseline CNN model proved the superiority of the GAN-based model. In particular, the small-scale turbulence characteristics, which were not properly predicted by the CNN model, were captured well by the cGAN-based model. A detailed investigation of the behaviour of the latent variable, the output of the discriminator network, suggests that the discriminator network evolves through training to possess a scale-selection capability; thus, it can effectively distinguish the generated turbulence from the true turbulence. Minimisation of the pointwise mean-squared error alone tended to capture only the large-scale motions, whereas competitive optimisation of the discriminator network through the latent variable led to the recovery of the small-scale motions.
In addition, recursive predictions were tested using high-accuracy base models trained for short lead times to improve the predictive accuracy for long lead times. As shown in Figure 13, four recursive applications of the prediction model trained for 0.5τ yielded significantly better predictions than the single prediction for 2τ. However, more recursive applications of a prediction model for a shorter lead time did not always guarantee improvement, indicating that an optimum recursive model exists for each lead-time prediction.
Flow control was conducted as an example of application of the developed high-accuracy prediction model. Using the developed prediction model for 0.5τ as a surrogate model, a control network was trained to provide the optimum disturbance field that maximises the modification at a given lead time. When the pointwise mean-squared difference between the disturbance-added and undisturbed predictions at 0.5τ was maximised, the optimum disturbance turned out to be of the largest scale fitting the domain. This maximum propagation of the control effect appeared to be achieved through translation of the elongated vortices in the direction orthogonal to their elongation. Although the optimum disturbance was found in a model-free manner (the only constraint being the RMS condition), our models converged well to the optimum solution despite the high probability of falling into local optima owing to the large number of degrees of freedom. Although the control of 2-D turbulence using distributed disturbances may seem impractical, we provided an example of a deep-learning framework in which, using a well-trained prediction model, an optimal control input that changes the flow as desired can be found very efficiently. It can easily be extended to more practical applications.
Our investigation was restricted to 2-D decaying homogeneous isotropic turbulence. Therefore, a natural extension would be toward more general turbulent flows, such as inhomogeneous 2-D flows or even 3-D turbulence. Application to 3-D homogeneous isotropic turbulence would be straightforward, although there is a cost issue because training would require much more time than in 2-D flows. It is nevertheless worthwhile to investigate whether the scale-selection capability of the discriminator network carries over to 3-D isotropic turbulence. We have demonstrated that a generative model such as a GAN is more effective in learning turbulence characteristics than a CNN. However, the GAN is not the only generative model. Recently, diffusion models (Sohl-Dickstein et al. 2015; Ho et al. 2020; Shu et al. 2023) have been proposed as generative models whose training is much more stable than that of GAN-based models. A comparative study of GANs and diffusion models in the prediction of turbulence would be an interesting topic for future research.
predictions were not sufficiently good. However, in a statistical sense, the cGAN appeared to generate reasonable predictions.
Two weights for the explicit spectrum loss, 0.01/k_max (Spect 0.01) and 0.1/k_max (Spect 0.1), were tested, where the maximum valid wavenumber is k_max = 64.
In training the models for the lead time 0.5τ, both cases exhibited convergence of the data loss. However, it is worth noting that during training these cases displayed more fluctuations in the data loss than observed in Figure 6. Figure 23 compares the predicted fields and the corresponding enstrophy spectra, which were explicitly included as a loss term in the objective function. In Figure 23(a), for a direct comparison, the results of the cGAN and the original CNN for 0.5τ are reproduced from Figures 7 and 8 of the main text. Figure 23(b) displays the outcomes for the additional cases. Two noteworthy observations can be made from these figures. First, both Spect 0.01 and Spect 0.1 significantly improved the spectrum compared with Spect 0. Second, despite the enhancement of the spectrum, the visualisations show poorer performance than Spect 0. Comparing the other statistical metrics with those presented in Table 2, Spect 0.01 showed slightly inferior performance to Spect 0, with a CC of 0.9644, an MSE of 0.0477, and an RMS of 0.9384. Spect 0.1 performed even worse, with a CC of 0.9441, an MSE of 0.0736, and an RMS of 0.9179. It should be highlighted that although the RMS of Spect 0.01 seems closer to the target than even that of the cGAN, this was caused by underpredictions of the spectrum at intermediate and small scales compensated by slight overpredictions at large scales.
We observed that the inclusion of an explicit spectrum loss in the training of the CNN did not lead to better overall performance. We next investigate the phase information, which is essential along with the spectrum. Figure 24 shows the phase error: Spect 0.01 exhibits a slightly increased error compared with Spect 0, and Spect 0.1 reveals a significant degradation. It can be inferred that explicitly introducing specific statistical characteristics of the system as loss terms might degrade other statistics. One could potentially enhance performance by adding more explicit losses and conducting finer optimisation; however, this approach requires extensive parameter tuning. GANs, on the other hand, by implicitly learning the system characteristics through competition with the discriminator, appear to learn the statistical properties of turbulence effectively.

Appendix E. Effect of the disturbance strength on the pre-trained surrogate model
For the training of the control network, PredictionNet (cGAN-0.5 ) was used as a surrogate model. As mentioned in § 4.4, if the strength of the disturbance Δ is significantly large, then  + Δ will deviate from the dynamics of the pre-trained PredictionNet, and the surrogate model will not work properly. In that case the control effect cannot be properly evaluated, as the disturbance-added prediction Ỹ becomes vastly different from the result of the actual simulation, N, with  + Δ as the initial condition. Based on the RMS of the input field, Ỹ was compared with N( + Δ) using disturbances of various strengths (0.1, 0.5, 1, 5, 10, and 20% of the input RMS). We confirmed that the surrogate model works properly for all test cases. Therefore, an example using one of the test data is presented in Figure 25 for the 20% case, which is the one most likely to deviate from the dynamics of PredictionNet. The two fields in Figures 25(a) and 25(b) are qualitatively similar, and most of the area in Figure 25(c), which shows the square of the difference between the two, is close to zero. Moreover, the values are not large even at positions where a relatively large error is observed. The MSE in Figure 25(c) is 0.0582, which is reasonably small. In addition, there was little change in the structure of the disturbance field as the disturbance strength changed; only the degree of change of the field over time differed. Therefore, the disturbance-added solution for a specific target does not respond sensitively to the disturbance strength within the operating range of the surrogate model.
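As a sketch of the disturbance-strength sweep, the disturbance field can be rescaled to a prescribed percentage of the input-field RMS while keeping its spatial structure; the helper below is hypothetical, not the authors' code:

```python
import numpy as np

def scaled_disturbance(delta, x, percent):
    """Rescale a disturbance field to a given percentage of the input-field RMS.

    Hypothetical helper mirroring the sweep of disturbance strengths
    (0.1-20% of the input RMS): the spatial structure of `delta` is kept
    and only its amplitude is adjusted.
    """
    x_rms = np.sqrt(np.mean(x ** 2))          # RMS of the input field
    d_rms = np.sqrt(np.mean(delta ** 2))      # RMS of the raw disturbance
    return delta * (percent / 100.0) * (x_rms / d_rms)
```

In this setting, the 0.1-20% cases differ only by a scalar amplitude, consistent with the observation above that the structure of the disturbance field changes little with strength.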

Figure 1. Range of selected data for training with a time interval of 100/ *, shown in the yellow region in (a) the vorticity RMS and dissipation rate and in (b) the Reynolds number and the Taylor length scale. Example vorticity fields at (c)  0 and (d)  100 from normalised test data.

Figure 2. Distribution of the temporal autocorrelation function of (a) the whole vorticity field and (b) the scale-decomposed vorticity fields.

Figure 3. Ensemble-averaged two-point correlation functions of vorticity extracted from 500 training data.

Figure 5. Network architecture of (a) PredictionNet (the generator of the cGAN) and ControlNet and (b) the discriminator of the cGAN.

Figure 6. Convergence of the data loss depending on the lead time for (a) the baseline CNN and (b) the cGAN. The magnitude of the converged value increases with the lead time. Compared with the range (−3, 3) of the normalised vorticity at  0, the 2  case has a relatively large converged data loss (approximately 0.25 and 0.4 for the CNN and cGAN, respectively, i.e. approximately 8.3% and 13.3% of the maximum value). In addition, relatively large overfitting was observed in the 2  case.

Figure 7. Visualised examples of the performance of the cGAN and CNN for one test dataset. (a) Input field at  0; prediction results at the lead times (b) 0.25 , (c) 0.5 , (d)  , and (e) 2 . The first, second, and third columns show the target DNS, cGAN predictions, and CNN predictions, respectively.

Figure 8. Comparison of statistics such as the PDF, two-point correlation, and enstrophy spectrum between the target and prediction results at different lead times (a) 0.25 , (b) 0.5 , (c)  , and (d) 2  for the same input time point  0.
The decomposed fields predicted for the lead time of   by the cGAN and CNN, for example, were compared with those of the target field, as shown in Figure 9. As shown in Figure 2(b), the large-scale field ( ⩽ 4) persists

Figure 11. Evolution over training iterations of (a)   and   , and (b) their difference, evaluated every 2,000 iterations.

Figure 12. Distributions of the discriminator output (latent variable) for various fields at several iterations: (a) 0.25 , (b) 0.5 , (c)  , and (d) 2 . All distributions are shifted by the target mean to fix the target distribution at zero mean. The vertical black solid line indicates the target mean (zero) and the red line the prediction mean. When the mean difference between the target and prediction is smaller than 0.5, the vertical lines are omitted for visibility.

Figure 13. Visualised prediction results for the 2  case: (a) target DNS field and single predictions using the cGAN and CNN, (b) cGAN recursive predictions, and (c) CNN recursive predictions.
Statistical results of recursive predictions for various lead times, including the single prediction in the last entries. The best prediction for each lead time by the cGAN is bold-faced.

Figure 14. Plots of the CC and MSE of the recursive predictions in terms of the lead time of the base recursive model. Only the   and 2  cases are included for the CNN for visibility.

Figure 15. Enstrophy spectra of the best case of recursive prediction compared with the target and the single prediction for the 2  case using (a) the cGAN and (b) the CNN.

Figure 18. (a) Comparison of MSEs between various Ỹ and  * using different disturbance fields. Black dots denote the sorted MSEs of the phase-shifted Δ . The first node of the black dots (red dashed line) shows the result of the optimum Δ . (b) Enstrophy spectra of the input , Δ , Δ , and Δ .

Figure 19. Simulated propagation of the control effect along the time horizon, presented as differences between the disturbance-added simulations and the original simulation: (a) Δ , (b) Δ , (c) Δ , and (d) Δ . The RMSE of the propagation of each disturbance, normalised by the input RMS, is shown in (e).

Figure 20. Distribution of the disturbance vector field with the input vorticity contours for (a) Δ , (b) Δ , and (c) Δ . (d) Conditional PDF of the angle between the velocity vectors of the input and the disturbance.

Figure 21. Scale decompositions for other values of  (continued on the next page).

Figure 22. Comparison between the velocity statistics of the target and prediction results depending on the lead time: (a) 0.25 , (b) 0.5 , (c)  , and (d) 2  for the same input time point  0. The first, second, and third columns display the velocity PDFs, the longitudinal and transverse correlation functions, and the velocity-gradient PDFs, respectively.

Figure 23. Visualised examples of prediction results and enstrophy spectra for the 0.5  prediction using (a) the cGAN and Spect 0 (baseline CNN) and (b) the spectrum-loss-added CNNs.

Figure 24. Phase errors of the 0.5  predictions, comparing the cGAN, Spect 0, and the spectrum-loss-added CNNs.

Table 1. Number of feature maps used at each convolutional layer of the Convblks and Disblks.

Table 2. Quantitative comparison of the performance of the cGAN and CNN against the target in terms of the CC, MSE, RMS, the nth standardized moments, the Kolmogorov length scale (), and the dissipation rate ( ′), depending on the lead time for the input at  0. All dimensional data such as the MSE, RMS, and  ′ are provided only for relative comparison.

Table 4. Correlation coefficient between the target and prediction of the scale-decomposed fields depending on the lead time.