Machine learning based spatio-temporal super resolution reconstruction of turbulent flows

We present a new turbulent data reconstruction method with supervised machine learning techniques inspired by super resolution and inbetweening, which can recover high-resolution turbulent flows from grossly coarse flow data in space and time. For the present machine learning based data reconstruction, we use the downsampled skip-connection/multi-scale model based on a convolutional neural network to incorporate the multi-scale nature of fluid flows into its network structure. As an initial example, the model is applied to a two-dimensional cylinder wake at $Re_D$ = 100. The reconstructed flow fields by the proposed method show great agreement with the reference data obtained by direct numerical simulation. Next, we examine the capability of the proposed model for a two-dimensional decaying homogeneous isotropic turbulence. The machine-learned models can follow the decaying evolution from coarse input data in space and time, according to the assessment with the turbulence statistics. The proposed concept is further investigated for a complex turbulent channel flow over a three-dimensional domain at $Re_{\tau}$ =180. The present model can reconstruct high-resolved turbulent flows from very coarse input data in space, and it can also reproduce the temporal evolution when the time interval is appropriately chosen. The dependence on the amount of training snapshots and duration between the first and last frames based on a temporal two-point correlation coefficient are also assessed to reveal the capability and robustness of spatio-temporal super resolution reconstruction. These results suggest that the present method can meet a range of flow reconstructions for supporting computational and experimental efforts.


Introduction
In recent years, machine learning methods have been utilized to tackle various problems in fluid dynamics (Brunton et al. 2020; Fukami et al. 2020; Brenner et al. 2019). Applications of machine learning to turbulence modeling have been particularly active (Duraisamy et al. 2019; Kutz 2017). Ling et al. (2016) proposed a tensor-basis neural network based on the multi-layer perceptron (MLP) (Rumelhart et al. 1986) for Reynolds-averaged Navier-Stokes simulation. Embedding the Galilean invariance into the machine learning structure was found to be important, which was verified by applying their model to flows in a duct and over a wavy wall. For large eddy simulation (LES), subgrid modeling assisted by machine learning was proposed by Maulik et al. (2019b). They showed the capability of machine learning assisted subgrid modeling in both a priori and a posteriori tests for the Kraichnan turbulence. Furthermore, machine learning is proving itself as a promising tool for developing reduced order models (ROMs). For instance, Murata et al. (2020) proposed a nonlinear mode decomposition using an autoencoder (Hinton & Salakhutdinov 2006) based on convolutional neural networks (CNNs) (LeCun et al. 1998) and demonstrated its use on transient and asymptotic laminar cylinder wakes at $Re_D = 100$. Their method shows the great potential of autoencoders for extracting features of flow fields in a lower dimension. More recently, Hasegawa et al. (2020) combined a CNN and the long short-term memory (LSTM) (Hochreiter & Schmidhuber 1997) to develop an ROM for a two-dimensional unsteady wake behind various bluff bodies. Although the aforementioned examples deal only with laminar flows, the strengths of machine learning have also been capitalized on for reduced order modeling of turbulent flows. San & Maulik (2018) utilized an extreme learning machine (Huang et al. 2004) based on MLP for developing an ROM of geophysical turbulence. Srinivasan et al. (2019) used LSTM to predict the temporal evolution of the coefficients of a nine-equation turbulent shear flow model. They demonstrated that the chaotic behavior of those coefficients can be reproduced well. They also confirmed that the statistics obtained from machine learning agreed well with the reference data. In addition to the above references, combinations of machine learning and fluid dynamics have attracted increasing attention in the community (Maulik & San 2017; Maulik et al. 2019a; Lui & Wolf 2019; Salehipour & Peltier 2019; Wu et al. 2018).
Of particular interest here for fluid dynamics is the use of machine learning as a powerful approximator (Kreinovich 1991; Hornik 1991; Cybenko 1989; Baral et al. 2018), which can incorporate nonlinearities. We recently proposed a super resolution reconstruction method for fluid flows, which was tested with a two-dimensional laminar cylinder wake and two-dimensional decaying homogeneous isotropic turbulence (Fukami et al. 2019a). We demonstrated that a high-resolution two-dimensional turbulent flow field on a 128 × 128 grid can be reconstructed from input data on a coarse 4 × 4 grid via a machine learning method. Applications and extensions of super resolution reconstruction can be considered not only for computational (Onishi et al. 2019; Liu et al. 2020) but also for experimental fluid dynamics (Deng et al. 2019). Although these attempts showed the great potential of machine learning based super resolution methods to handle highly resolved, large fluid flow data efficiently, their applicability has so far been limited to two-dimensional spatial reconstruction.
Recently, a machine learning based temporal super resolution technique called inbetweening was demonstrated by Li et al. (2019) to estimate the snapshot sequences between the first and last frames for image and video processing. They reconstructed 14 frames between two frames of videos using machine learning to save on storage. In the fluid dynamics community, a similar concept has recently been considered by Krishna et al. (2020) for temporal data interpolation of PIV measurements. They developed a model based on the rapid distortion theory and Taylor's hypothesis.
In the present study, we perform a machine learning based spatio-temporal super resolution analysis inspired by the aforementioned spatial super resolution and temporal inbetweening techniques to reconstruct high-resolution turbulent flow data from extremely low resolution flow data both in space and time. The present paper is organized as follows. We first introduce our machine learning based spatio-temporal super resolution approach in section 2 with a simple demonstration for a two-dimensional laminar cylinder wake at Re D = 100. We then apply the present method to two-dimensional decaying isotropic turbulence and turbulent channel flow over three-dimensional domain in section 3. The capability of machine learning based spatio-temporal super resolution method is assessed through various turbulence statistics. Finally, concluding remarks are provided in section 4.

Spatio-temporal super resolution flow reconstruction with machine learning
The objective of this work is to reconstruct high-resolution flow field data $q(x_{HR}, t_{HR})$ from low-resolution data in space and time $q(x_{LR}, t_{LR})$. To achieve this goal, we combine spatial super resolution analysis with temporal inbetweening. Super resolution analysis can reconstruct spatially high-dimensional data from spatially low-dimensional input data, as illustrated in figure 1(a). Temporal inbetweening is able to find the temporal sequences between the first and the last frames in the time-series data, as shown in figure 1(b). We describe how one can combine these two reconstruction methods in section 2.2.
In the present study, we use a supervised machine learning model to reconstruct fluid flow data in space and time. For supervised machine learning, we prepare a set of input $x$ and output (answer) $y$ as the training data. We then train the supervised machine learning model with these training data such that a nonlinear mapping function $y \approx F(x; w)$ can be built, where $w$ holds the weights within the machine learning model. The training process here can be mathematically regarded as an optimization problem to determine the weights $w$ such that $w = {\rm argmin}_w [E(y, F(x; w))]$, where $E$ is the loss (cost) function.
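As a minimal sketch of this optimization view of training, the snippet below fits a purely linear map $F(x; w) = xw$ by gradient descent on a mean squared error loss. The linear model, the learning rate, the iteration count and the synthetic data are all illustrative assumptions; they are not the model or settings used in this study.

```python
import numpy as np

def train_linear_map(x, y, lr=0.1, n_iter=500):
    """Fit y ~ F(x; w) = x @ w by gradient descent on the mean squared error E."""
    w = np.zeros((x.shape[1], y.shape[1]))
    for _ in range(n_iter):
        residual = x @ w - y                    # F(x; w) - y
        grad = 2.0 * x.T @ residual / len(x)    # gradient of E with respect to w
        w -= lr * grad                          # iterative weight update
    return w

# Synthetic training pairs (x, y) generated from known weights (assumed data).
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
w_true = np.array([[1.0], [-2.0], [0.5]])
y = x @ w_true

w = train_linear_map(x, y)
```

The same loop structure carries over to nonlinear models; only the evaluation of $F$ and of the gradient changes.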
For the machine learning models for super resolution in space $F_x$ and time $F_t$, we use a hybrid downsampled skip-connection and multi-scale (DSC/MS) model (Fukami et al. 2019a) presented in figure 2. The DSC/MS model is based on a convolutional neural network (CNN) (LeCun et al. 1998), which is one of the widely used supervised machine learning methods for image processing. Here, let us briefly introduce the mathematical framework for the CNN. The CNN is trained with a filter operation such that
$$q^{(l)}_{ijm} = \phi\left(\sum_{k=0}^{K-1}\sum_{p=0}^{H-1}\sum_{s=0}^{H-1} h^{(l)}_{pskm}\, q^{(l-1)}_{i+p-C,\,j+s-C,\,k}\right),$$
where $q^{(l)}$ is the output at layer $l$, $h$ is the filter of size $H \times H$ with $C = \lfloor H/2 \rfloor$, $K$ is the number of variables per each position of data, and $\phi$ is an activation function, which is generally chosen to be a monotonically increasing nonlinear function. In the present paper, we use the rectified linear unit (ReLU), $\phi(s) = \max(0, s)$, as the activation function. It is widely known that the use of ReLU enables machine learning models to be stable during the weight update process (Nair & Hinton 2010). As shown in figure 2, the present machine learning model is comprised of two models: the downsampled skip-connection (DSC) model shown in blue and the multi-scale (MS) model shown in green. The DSC model is robust against rotation/translation of the objects within the input images by combining compression procedures and skip-connection structures (Le et al. 2010; He et al. 2016). On the other hand, the MS model (Du et al. 2018) is able to take the multi-scale property of the flow field into account in its model structure. Readers are referred to Fukami et al. (2019a) for additional details on the hybrid machine learning model and its capability for spatial super resolution reconstruction of two-dimensional turbulent flows. The DSC/MS model is utilized for both spatial and temporal data reconstruction in the present study.
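To make the filter operation concrete, the following sketch evaluates a single two-dimensional convolutional layer with a ReLU activation in NumPy. The array layout (height, width, channels), the zero padding at the edges and the omission of a bias term are assumptions made for illustration; this is not the DSC/MS architecture itself.

```python
import numpy as np

def relu(s):
    """Rectified linear unit, phi(s) = max(0, s)."""
    return np.maximum(0.0, s)

def conv_layer(q_prev, h):
    """Apply one convolutional layer.

    q_prev: input of shape (ny, nx, K), K variables per grid point.
    h:      filter of shape (H, H, K, M), M output channels.
    Returns an array of shape (ny, nx, M).
    """
    H = h.shape[0]
    C = H // 2
    ny, nx, _ = q_prev.shape
    padded = np.pad(q_prev, ((C, C), (C, C), (0, 0)))   # zero padding (assumed)
    out = np.zeros((ny, nx, h.shape[3]))
    for p in range(H):
        for s in range(H):
            window = padded[p:p + ny, s:s + nx, :]       # q at (i + p - C, j + s - C)
            out += np.tensordot(window, h[p, s], axes=([2], [0]))
    return relu(out)

# A 3 x 3 identity filter leaves the field unchanged apart from the ReLU.
q_in = np.array([[1.0, -1.0], [2.0, -3.0]])[:, :, None]
h_identity = np.zeros((3, 3, 1, 1))
h_identity[1, 1, 0, 0] = 1.0
q_out = conv_layer(q_in, h_identity)
```

With the identity filter, negative inputs are clipped to zero and positive inputs pass through, which isolates the effect of the activation function.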

Order of spatio-temporal super resolution reconstruction
For the reconstruction of the flow field, we can consider the following two approaches:
(i) Apply the spatial super resolution model $F^*_x : R^{n_{LR} \times m_{LR}} \to R^{n_{HR} \times m_{LR}}$, then the inbetweening model $F_t : R^{n_{HR} \times m_{LR}} \to R^{n_{HR} \times m_{HR}}$ such that
$$q(x_{HR}, t_{HR}) = F_t(F^*_x(q(x_{LR}, t_{LR}))) + \epsilon_{tx},$$
(ii) Apply the inbetweening model $F^*_t : R^{n_{LR} \times m_{LR}} \to R^{n_{LR} \times m_{HR}}$, then the spatial super resolution model $F_x : R^{n_{LR} \times m_{HR}} \to R^{n_{HR} \times m_{HR}}$ such that
$$q(x_{HR}, t_{HR}) = F_x(F^*_t(q(x_{LR}, t_{LR}))) + \epsilon_{xt},$$
where $n$ is a spatial dimension of data, $m$ is a temporal dimension of data, $\epsilon_{tx}$ is the error for the first case, and $\epsilon_{xt}$ is the error for the second case. The subscripts LR and HR represent low-resolution and high-resolution variables, respectively. We seek the approach that achieves the lower error between the above two formulations. The $L_p$ norms of these errors are assessed as
$$\|\epsilon_{tx}\|_p = \|q(x_{HR}, t_{HR}) - F_t(F^*_x(q(x_{LR}, t_{LR})))\|_p = \|q(x_{HR}, t_{HR}) - F_t(q(x_{HR}, t_{LR}) + \epsilon_x)\|_p,$$
$$\|\epsilon_{xt}\|_p = \|q(x_{HR}, t_{HR}) - F_x(F^*_t(q(x_{LR}, t_{LR})))\|_p = \|q(x_{HR}, t_{HR}) - F_x(q(x_{LR}, t_{HR}) + \epsilon_t)\|_p,$$
where $\epsilon_x$ is the error from the spatial super resolution algorithm for the first case and $\epsilon_t$ is the error from the inbetweening process for the second case. The difference in the magnitudes of the above errors depends on $\epsilon_x$ and $\epsilon_t$. Since the spatial super resolution algorithm is not a function of the temporal resolution in our problem setting, $\epsilon_x$ is not affected by data coarseness in time. On the other hand, $\epsilon_t$ is the error resulting from inbetweening with spatially low-resolution data, which lacks more of the phase information than spatially high-resolution data. For this reason, the error $\epsilon_t$ is likely to be large due to the spatial coarseness. This leads us to adopt approach (i), i.e., to first apply a machine learning model for spatial super resolution and then the inbetweening model, as illustrated in figure 3 for the example of a cylinder wake. The cylinder flow example confirms the above trend for the error. Each of these machine learning models is trained individually for the spatial and temporal super resolution reconstructions. In the supervised machine learning process for regression tasks, the training process is formulated as an optimization problem to minimize a loss function in an iterative manner. The objectives of the two machine learning models can be expressed as
$$w_x = {\rm argmin}_{w_x} \|q(x_{HR}, t_{LR}) - F_x(q(x_{LR}, t_{LR}); w_x)\|_2,$$
$$w_t = {\rm argmin}_{w_t} \|q(x_{HR}, t_{HR}) - F_t(q(x_{HR}, t_{LR}); w_t)\|_2,$$
where $w_x$ and $w_t$ are the weights of the spatial and temporal super resolution models, respectively. In the present study, we use the $L_2$ norm to determine the optimized weights $w$ for each of the machine learning models. Moreover, we use $p = 2$ for assessing the errors in what follows. We emphasize that the present approach goes far beyond traditional interpolation schemes, which are not suitable for the reconstruction of physical phenomena of a transport nature.
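The data flow of applying the spatial model first and the inbetweening model second can be sketched as follows. Both operators below are deliberately simple stand-ins (nearest-neighbour upsampling and a linear blend in time) that only illustrate the array shapes and the order of composition; they are hypothetical placeholders and not the trained machine learning models, which are intended to outperform such interpolation.

```python
import numpy as np

def spatial_sr_placeholder(q_lr, factor):
    """Stand-in for F_x: (m, ny, nx) -> (m, ny * factor, nx * factor)."""
    return np.repeat(np.repeat(q_lr, factor, axis=1), factor, axis=2)

def inbetween_placeholder(first, last, n_between):
    """Stand-in for F_t: linear blend between the first and last frames."""
    weights = np.linspace(0.0, 1.0, n_between + 2)[1:-1]
    return np.stack([(1.0 - a) * first + a * last for a in weights])

def reconstruct(q_lr_pair, factor, n_between):
    """Spatial super resolution first, then inbetweening."""
    first_hr, last_hr = spatial_sr_placeholder(q_lr_pair, factor)
    middle = inbetween_placeholder(first_hr, last_hr, n_between)
    return np.concatenate([first_hr[None], middle, last_hr[None]])

# Two coarse frames on a 7 x 12 grid (rows x columns), upsampled by 16
# and filled with 7 intermediate frames (sizes chosen for illustration).
rng = np.random.default_rng(0)
q_lr_pair = rng.standard_normal((2, 7, 12))
q_rec = reconstruct(q_lr_pair, factor=16, n_between=7)
```

Swapping the two calls inside `reconstruct` would give the alternative ordering, in which the temporal step only ever sees spatially coarse frames.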

Demonstration: two-dimensional laminar cylinder wake
As a demonstration, let us apply the proposed formulation to the two-dimensional cylinder wake at $Re_D = 100$. The snapshots for this wake are generated by two-dimensional direct numerical simulation (DNS), which numerically solves the incompressible Navier-Stokes equations,
$$\nabla \cdot \boldsymbol{u} = 0, \qquad \frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u} \cdot \nabla \boldsymbol{u} = -\nabla p + \frac{1}{Re_D} \nabla^2 \boldsymbol{u}.$$
Here $\boldsymbol{u}$, $p$ and $Re_D$ are the non-dimensionalized velocity vector, pressure and Reynolds number, respectively. For this example, we use five nested levels of multi-domains, and the high-resolution flow fields considered here are represented on a 192 × 112 grid. For the present study, we use 70% of the snapshots for training and the remaining 30% for validation. An early stopping criterion (Prechelt 1998) with 20 iterations of the learning process is also utilized to avoid overfitting, so that the model retains generality for data unseen in the training process.
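The early stopping rule can be illustrated with a short sketch. Here the "20 iterations" are interpreted as a patience of 20 epochs without improvement of the validation loss; this interpretation, the function name and the synthetic loss history are assumptions for illustration only.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs (assumed interpretation).
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Synthetic history: the loss improves for 10 epochs and then stagnates.
history = [1.0 / (epoch + 1) for epoch in range(10)] + [1.0] * 50
stop_epoch, best_epoch = early_stopping_epoch(history)
```

In practice the weights saved at `best_epoch` would be the ones retained for evaluation.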
The results of the preliminary examination with the undersampled cylinder wake data are summarized in figure 4. The machine learning models are trained by using $n_{{\rm snapshot},x} = 1000$ for spatial super resolution and $n_{{\rm snapshot},t} = 100$ for inbetweening. Here, the spatial super resolution model $F_x$ has the role of a mapping function from the low spatial resolution data $q(x_{LR}) \in R^{12 \times 7}$ to the high spatial resolution data $q(x_{HR}) \in R^{192 \times 112}$. Next, two spatially high-resolution flow fields illustrated at $t = 1\Delta t$ and $9\Delta t$ in figure 4 are used as the input for the temporal super resolution model $F_t$, so that the in-between snapshots from $t = 2\Delta t$ to $8\Delta t$ corresponding to a period in time can be reconstructed. As shown in figure 4, the spatio-temporal super resolution analysis achieves excellent reconstruction of the flow field that is practically indistinguishable from the reference DNS data. The $L_2$ error norm $\epsilon = \|q_{DNS} - q_{ML}\|_2 / \|q_{DNS}\|_2$ is shown in the middle of figure 4. The $L_2$ error level is approximately 5% of the reference DNS data. As the machine learning model is provided with the information at $t = 1\Delta t$ and $9\Delta t$, the error level shows a slight increase between those two instances.
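The normalized $L_2$ error norm used throughout can be computed in a few lines. The sketch below assumes the fields are stored as NumPy arrays of identical shape; the function name and the test fields are illustrative.

```python
import numpy as np

def l2_error(q_ref, q_rec):
    """Relative L2 error ||q_ref - q_rec||_2 / ||q_ref||_2 over all grid points."""
    return np.linalg.norm(q_ref - q_rec) / np.linalg.norm(q_ref)

# A reconstruction that underestimates every value by 5% gives an error of 0.05.
q_ref = np.ones((112, 192))
q_rec = 0.95 * q_ref
error = l2_error(q_ref, q_rec)
```

Because the norm is taken over the flattened array, the same function applies unchanged to three-dimensional fields or to stacks of snapshots.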

Results
Example 1: two-dimensional decaying homogeneous isotropic turbulence
As the first example, let us consider two-dimensional decaying homogeneous isotropic turbulence. The training data set is obtained by numerically solving the two-dimensional vorticity transport equation,
$$\frac{\partial \omega}{\partial t} + \boldsymbol{u} \cdot \nabla \omega = \frac{1}{Re_0} \nabla^2 \omega,$$
where $\boldsymbol{u} = (u, v)$ and $\omega$ are the velocity and vorticity, respectively (Taira et al. 2016). The size of the bi-periodic computational domain and the numbers of grid points here are $L_x = L_y = 1$ and $N_x = N_y = 128$, respectively. The Reynolds number is defined as $Re_0 \equiv u^* l^*_0 / \nu$, where $u^*$ is the characteristic velocity obtained by the square root of the spatially averaged initial kinetic energy, $l^*_0$ is the initial integral length, and $\nu$ is the kinematic viscosity. The initial Reynolds numbers are $Re_0 = u^*(t_0) l^*(t_0) / \nu = 81.2$ for training/validation data and 85.4 for test data. For the input and output attributes to the machine learning model, we use the vorticity field $\omega$.
For spatio-temporal super resolution analysis of two-dimensional turbulence, we consider four cases comprised of two spatial and two temporal coarseness levels as shown in figure 5. For spatial super resolution analysis, we prepare two levels of spatial coarseness: medium (16 × 16 grid) and low resolution (8 × 8 grid) data, analogous to our previous work (Fukami et al. 2019a). These spatially low-resolution data are obtained by an average downsampling of the reference DNS data set. For the temporal resolution setup, we define medium ($\Delta T = 1.0$) and wide ($\Delta T = 4.0$) time steps, where $\Delta T$ is the time step between the first and last frames of the inbetweening analysis. Note here that the training data includes the low Taylor Reynolds number portion (Regime II in figure 5) so as to assess the influence on the decaying physics compared to Regime I. For the training process, we considered a fixed number of snapshots $(n_{{\rm snapshot},x}, n_{{\rm snapshot},t}) = (10000, 10000)$ for this two-dimensional example.
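Average downsampling of a reference field can be sketched with a reshape followed by a block mean. The function below assumes a two-dimensional array whose side lengths are divisible by the downsampling factor; the random field merely stands in for a 128 × 128 DNS snapshot.

```python
import numpy as np

def average_downsample(field, factor):
    """Average non-overlapping factor x factor blocks of a 2D field."""
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

rng = np.random.default_rng(0)
omega_hr = rng.standard_normal((128, 128))        # stand-in for a DNS vorticity field
omega_medium = average_downsample(omega_hr, 8)    # 16 x 16 input
omega_low = average_downsample(omega_hr, 16)      # 8 x 8 input

small = average_downsample(np.arange(16.0).reshape(4, 4), 2)
```

Block averaging preserves the spatial mean of the field, so only the fluctuations below the block size are discarded.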
In the example of two-dimensional turbulence, the machine learning model for inbetweening analysis plays the role of a mapping function to reconstruct 8 snapshots between the first and last frames (given by the spatial reconstruction model). The flow fields reconstructed from spatio-temporal super resolution analysis of two-dimensional turbulence with various coarse input data are summarized in figure 6. On the left side, the reconstructed fields from Regime I with coarse data in space and time are shown. As shown, the temporal evolution of the complex vortex dynamics can be accurately reconstructed by the machine-learned models. For almost all cases, the $L_2$ error norms $\epsilon = \|q_{DNS} - q_{ML}\|_2 / \|q_{DNS}\|_2$ listed below the contour plots are larger for the spatially low-resolution input than for the medium-resolution input due to the effect of input coarseness in space. On the right side of the figure, we show the results from Regime II. Analogous to the results for Regime I, the reconstructed flow fields are in agreement with the reference DNS data. Noteworthy here is the peak $L_2$ error norm of 0.198 appearing at $t = (n + 7)\Delta t$ for Regime II using low-resolution input and a wide time step. This is in contrast with the other cases, which give peak errors at $t = (n + 4)\Delta t$. This is caused by the present machine learning model having to handle the temporal evolution of a relatively large structure (i.e., bottom left of the color map) across the periodic boundary of the bi-periodic domain. Furthermore, the model is also affected by the error from the spatial super resolution reconstruction. For this reason, the peak in error here is shifted in time relative to the other cases in figure 6. To examine the dependence on the regime of test data with a simple assessment, the time-ensemble $L_2$ error norms of the medium and low spatial input cases are shown in figure 7. For all cases, the errors for Regime I are larger than those for Regime II. One plausible reason is that the relative change in vortex structures for Regime II is smaller than that for Regime I, which we can see in figure 6. We also find that the reconstructions are affected by the input coarseness in space, as evident from comparing figures 7(a) and (b).
Next, let us present the kinetic energy spectrum and the probability density function of the vorticity field $\omega$ for all coarse input cases with spatial and temporal reconstructions in figure 8. For comparison, we compute these statistics for Regimes I (purple) and II (yellow). The statistics with all coarse input data show distributions similar to the reference DNS trends. The high wavenumber region of the kinetic energy spectrum obtained from the reconstructed fields does not match the reference curve due to the lack of correlation between the low and high wavenumber components. In addition, the effect of temporal coarseness on data reconstruction is larger than that of spatial coarseness in our problem setting, as can be seen in figures 7 and 8. However, we should note that these observations do not imply that a temporal reconstruction is more challenging than a spatial reconstruction, since that depends highly on the problem settings.
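The two statistics above can be estimated from a single vorticity snapshot as sketched below. The shell-summed spectrum convention (rounding the mode magnitude to the nearest integer) and the number of histogram bins are assumptions; the convention used for figure 8 may differ. The relation $|\hat{\boldsymbol{u}}|^2 = |\hat{\omega}|^2 / k^2$ for two-dimensional incompressible flow is used to obtain the kinetic energy from the vorticity.

```python
import numpy as np

def energy_spectrum_2d(omega, domain_length=1.0):
    """Shell-summed kinetic energy spectrum of a square bi-periodic vorticity field."""
    n = omega.shape[0]
    omega_hat = np.fft.fft2(omega) / n**2              # normalized Fourier coefficients
    modes = np.fft.fftfreq(n, d=1.0 / n)               # integer mode numbers
    kx, ky = np.meshgrid(modes, modes)
    mode_mag = np.sqrt(kx**2 + ky**2)
    k_sq = (2.0 * np.pi / domain_length) ** 2 * mode_mag**2
    k_sq[0, 0] = 1.0                                   # avoid division by zero at the mean mode
    density = 0.5 * np.abs(omega_hat) ** 2 / k_sq      # |u_hat|^2 / 2
    density[0, 0] = 0.0                                # the mean mode carries no energy here
    shell = np.rint(mode_mag).astype(int)
    return np.bincount(shell.ravel(), weights=density.ravel())

def vorticity_pdf(omega, bins=50):
    """Histogram-based probability density function of the vorticity."""
    pdf, edges = np.histogram(omega.ravel(), bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, pdf

# Single Fourier mode omega = cos(2 pi 4 x): all energy sits in shell 4.
n = 64
x = np.arange(n) / n
omega = np.cos(2.0 * np.pi * 4.0 * x)[None, :] * np.ones((n, 1))
spectrum = energy_spectrum_2d(omega)
centers, pdf = vorticity_pdf(omega)
```

For the single-mode test field the kinetic energy is $A^2 / (4 k^2)$ with $A = 1$ and $k = 8\pi$, which gives a simple analytical check of the normalization.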

Example 2: turbulent channel flow over three-dimensional domain
To investigate the applicability of machine learning based spatio-temporal super resolution reconstruction to three-dimensional turbulence, let us consider a turbulent channel flow at $Re_\tau = 180$ (Fukagata et al. 2006). The governing equations are the incompressible Navier-Stokes equations,
$$\nabla \cdot \boldsymbol{u} = 0, \qquad \frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u} \cdot \nabla \boldsymbol{u} = -\nabla p + \frac{1}{Re_\tau} \nabla^2 \boldsymbol{u},$$
where $\boldsymbol{u} = [u\ v\ w]^T$ represents the velocity vector with components $u$, $v$ and $w$ in the streamwise ($x$), wall-normal ($y$) and spanwise ($z$) directions. Here, $t$ is time, $p$ is pressure, and $Re_\tau = u_\tau \delta / \nu$ is the friction Reynolds number. The variables are normalized by the channel half-width $\delta$ and the friction velocity $u_\tau$. The size of the computational domain and the number of grid points here are $(L_x, L_y, L_z) = (4\pi\delta, 2\delta, 2\pi\delta)$ and $(N_x, N_y, N_z) = (256, 96, 256)$, respectively. The grids in the $x$ and $z$ directions are taken to be uniform. A non-uniform grid is utilized in the $y$ direction. As the baseline data, we prepare the data snapshots on a uniform grid interpolated from the non-uniform grid data of DNS. We also examine the influence of grid type in the Appendix for completeness. A no-slip boundary condition is imposed on the walls and a periodic boundary condition is prescribed in the $x$ and $z$ directions. The flow is driven by a constant pressure gradient at $Re_\tau = 180$.
For the present study, a subspace of the whole computational domain is extracted and used for machine learning, i.e., $(L^*_x, L^*_y, L^*_z) = (2\pi\delta, \delta, \pi\delta)$ with $(N^*_x, N^*_y, N^*_z) = (128, 48, 128)$ grid points. Due to the symmetry of turbulence statistics in the $y$ direction and homogeneity in the $x$ and $z$ directions, the extracted subdomain maintains the turbulent characteristics of the channel flow over the original domain size. We generally use 100 training snapshots for both the spatial and temporal super resolution analyses in this case. The dependence of the reconstruction on the number of snapshots is investigated later. For the input and output attributes to the machine learning model, we use the velocity fields $\boldsymbol{u} = [u\ v\ w]^T$. We illustrate in figure 9 the problem setting of the spatio-temporal super resolution analysis for three-dimensional turbulence. Regarding the spatial resolution, medium and low resolutions are defined as 16 × 6 × 16 and 8 × 3 × 8 grids in the $x$, $y$, and $z$ directions, respectively. These coarse input data sets are generated by the average downsampling operation from the reference DNS data on the 128 × 48 × 128 grid. Note that we are unable to detect the vortex core structures at $Q^+ = 0.005$ with low-resolution input in figure 9 due to the gross coarseness. As shown in figure 10, the vortex structure cannot be seen with a contour level of $Q^+ = 0.07$ with either medium or low spatial input data. For the inbetweening reconstruction, three time steps are considered: $\Delta T^+ = 12.6$ (medium), 25.2 (wide), and 126 (super-wide time step) in viscous time units, where $\Delta T^+$ is the time step between the first and last snapshots. These $\Delta T^+$ correspond to temporal two-point correlation coefficients at $y^+ = 11.8$ of $R = R^+_{uu}(t^+) / R^+_{uu}(0) \approx 0.50$ (medium), 0.25 (wide), and 0.05 (super-wide time step), respectively (Fukami et al. 2019b).
Figure 9. The problem setup for the present study. We consider two spatial coarseness levels with three temporal resolutions. Note that $Q^+ = 0.005$ and 0.07 are used for visualization of spatial and temporal resolutions, respectively. The plot on the upper right shows the temporal two-point correlation coefficients $R$ at $y^+ = 11.8$ for the present turbulent channel flow.
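A temporal two-point correlation coefficient of the form $R(\tau) = R_{uu}(\tau) / R_{uu}(0)$ can be estimated from a time series of velocity samples as sketched below. The estimator (time-mean removal and a simple average over the available overlapping samples) and the synthetic sinusoidal signal are assumptions for illustration; they are not the procedure or data behind figure 9.

```python
import numpy as np

def temporal_correlation(u, lag):
    """Estimate R(lag) = <u'(t) u'(t + lag)> / <u'(t)^2> along the first axis."""
    fluct = u - u.mean(axis=0)                 # fluctuation about the time mean
    n_t = fluct.shape[0]
    numerator = np.mean(fluct[:n_t - lag] * fluct[lag:])
    denominator = np.mean(fluct * fluct)
    return numerator / denominator

# Synthetic periodic signal with a period of 100 samples (assumed data).
t = np.arange(1000)
u = np.sin(2.0 * np.pi * t / 100.0)

r_zero = temporal_correlation(u, 0)       # perfectly correlated
r_quarter = temporal_correlation(u, 25)   # close to zero
r_half = temporal_correlation(u, 50)      # anti-correlated
```

For turbulence data, `u` would hold the velocity at a fixed wall-normal location over time, and the lag at which `R` drops to a chosen level indicates how wide the inbetweening interval can be.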
Let us summarize the reconstructed flow fields visualized by isosurfaces of the Q-criterion based on $n_{{\rm snapshot},x} = 100$ in figure 10. The machine learning models are able to reconstruct the flow field from extremely coarse input data, despite the input data showing almost no vortex-core structures in the streamwise direction, as shown in figure 10(a). We also present the velocity contours at a $y$-$z$ section ($x^+ = 1127$) in figure 11. We hereafter report the $L_2$ error norms normalized by the fluctuation component, $\epsilon' = \|u_{i,DNS} - u_{i,ML}\|_2 / \|u'_{i,DNS}\|_2$, in this example to remove the influence of the magnitude of each velocity attribute in the present turbulent channel flow. With both coarse input data, the reconstructed flow fields are in reasonable agreement with the reference DNS data in terms of the contour plots and the $L_2$ error norms listed below the reconstructed flow fields. We also assess the turbulence statistics as summarized in figure 12. Noteworthy here is that the trends in the wall-normal direction can be captured by the machine learning model from as few as 6 (medium) or 3 (low resolution) grid points, as shown in figures 12(a) and (b). Regarding the kinetic energy spectrum at $y^+ = 11.8$, the maximum wavenumber $k_{\max}$ in the streamwise and spanwise directions can also be recovered from the extremely coarse input data, as presented in figures 12(c) and (d). The high wavenumber components for both cases are underestimated due to the fact that the dissipation range has no strong correlation with the energy-containing region.
Next, let us combine the spatial super resolution reconstruction with inbetweening to obtain the spatio-temporal high-resolution data $q(x_{HR}, t_{HR})$, as summarized in figure 13. We show in figure 13(a) only the results for the medium spatial resolution input. With the medium time step, the reconstructed flow fields show reasonable agreement with the reference DNS data in terms of both the Q isosurface and the $L_2$ error norm listed below the isosurface plots. In contrast, the flow fields cannot be reconstructed with the wide and super-wide time steps due to the lack of temporal correlation, as summarized in figure 9. Although the vortex core can be somewhat captured with the wide time step at $n + 2$ and $n + 7$, the reconstructed flow fields are essentially attenuated since the machine-learned models for inbetweening are given only the information with low correlation at the first and last frames obtained from spatial super resolution reconstruction, as shown in figure 13(b). The time-ensemble $L_2$ error norms with all combinations of coarse input data in space and time are summarized in figure 13(c). It can be seen that the results with the machine-learned models are more sensitive to the temporal resolution than to the spatial resolution level. This observation coincides with the previous example in section 3.1.
Let us demonstrate the robustness of the composite model against noisy input data for spatio-temporal super resolution analysis in figure 14. For this example, we use the medium spatially coarse input data with the medium time step. Here, the $L_2$ error norm for noisy input is defined as $\epsilon_{\rm noise} = \|q_{HR} - F(q_{LR} + \kappa n)\|_2 / \|q_{HR}\|_2$, where $n$ is the Gaussian noise, $\kappa$ is the magnitude of noisy input, and $q_{HR}$ is the fluctuation component of the reference velocity. The reported values on the right side of figure 14 are the ensemble-averaged $L_2$ error ratio against the original error without noisy input, $\epsilon / \epsilon_{\kappa=0}$. As can be observed, the error increases with the magnitude of noise $\kappa$ for both coarse input levels. The $x$-$z$ sectional streamwise velocity contours from the intermediate output of inbetweening at $t = (n + 5)\Delta t$ are also shown in figure 14.
In the above discussions, we used 100 snapshots for both spatial and temporal super resolution analyses with three-dimensional turbulent flow. Here, let us discuss the dependence of the results on the number of snapshots for the spatial super resolution analysis $n_{{\rm snapshot},x}$ and inbetweening $n_{{\rm snapshot},t}$. The ensemble $L_2$ error norm with various numbers of snapshots is presented in figure 15. In the figure, we summarize the effect of the number of training snapshots on the spatial and temporal reconstructions. With up to $n_{{\rm snapshot},t} = 50$, the $L_2$ errors are approximately 0.4 and the reconstructed fields cannot detect the vortex core structures from the Q-value visualizations, even if $n_{{\rm snapshot},x}$ is increased. For cases with $n_{{\rm snapshot},t} \geq 100$ and $n_{{\rm snapshot},x} \geq 100$, the errors decrease drastically. For this particular example with the three-dimensional turbulent channel flow, one hundred training snapshots is the minimum requirement for recovering the flow field in both space and time. These findings suggest that data sets consisting of as few as 100 snapshots with the appropriate spatial and temporal resolutions hold sufficient physical characteristics for reconstructing the turbulent channel flow. We also assess the computational costs with increasing number of training data on the NVIDIA Tesla V100 graphics processing unit (GPU), as shown in figure 16. The computational time per iteration (epoch) increases linearly with the number of snapshots for both spatial and temporal super resolution analyses. Plainly speaking, the complete training process takes approximately 3 days with $\{n_{{\rm snapshot},x}, n_{{\rm snapshot},t}\} = \{100, 100\}$ and 15 days with $\{n_{{\rm snapshot},x}, n_{{\rm snapshot},t}\} = \{1000, 1000\}$. The computational costs for the full iterations can deviate slightly from the linear trend since the number of iterations until error convergence differs among the machine learning models due to early stopping.
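The noise-robustness metric discussed earlier in this section, $\|q_{HR} - F(q_{LR} + \kappa n)\|_2 / \|q_{HR}\|_2$, can be evaluated for any reconstruction operator $F$. In the sketch below, $F$ is a hypothetical nearest-neighbour upsampling placeholder rather than the trained model, the noise is assumed to have unit variance, and the random field stands in for a reference fluctuation field.

```python
import numpy as np

def upsample_placeholder(q_lr, factor):
    """Hypothetical stand-in for the reconstruction operator F."""
    return np.repeat(np.repeat(q_lr, factor, axis=0), factor, axis=1)

def noise_error(q_hr, q_lr, model, kappa, rng):
    """Relative L2 error of the reconstruction from a noisy coarse input."""
    noise = rng.standard_normal(q_lr.shape)        # Gaussian noise n (unit variance assumed)
    q_rec = model(q_lr + kappa * noise)
    return np.linalg.norm(q_hr - q_rec) / np.linalg.norm(q_hr)

rng = np.random.default_rng(0)
q_hr = rng.standard_normal((16, 16))                # stand-in for the reference field
q_lr = q_hr.reshape(4, 4, 4, 4).mean(axis=(1, 3))   # average-downsampled coarse input

def model(q):
    return upsample_placeholder(q, 4)

baseline = noise_error(q_hr, q_lr, model, 0.0, rng)          # error without noise
ratio = noise_error(q_hr, q_lr, model, 10.0, rng) / baseline  # error ratio for large noise
```

Averaging the ratio over many noise realizations and test snapshots would give an ensemble-averaged value comparable in spirit to the ratios reported for figure 14.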
Let us also discuss the challenges associated with the machine learning based spatio-temporal super resolution reconstruction. As discussed above, a supervised machine learning model is trained to minimize a chosen loss function through an iterative training process. In other words, the machine learning models aim solely to minimize the given loss function, which can be a different objective from actually learning physics. We here discuss the dependence of the error in the real space and the wave space.
The $L_2$ error distribution over each direction of the turbulent channel flow is summarized in figure 17. The case with the combination of medium spatial input (16 × 6 × 16 grid) and medium time step ($\Delta T^+ = 12.6$) is presented. As shown in figure 17, the errors at the edges of the domain in all directions are large. This may be due to the difficulty in predicting a temporal evolution over boundaries and to the padding operation of convolutional neural networks. Noteworthy here is the error trend in the wall-normal direction in figure 17(c). The errors for all attributes are high near the wall. One possible reason is that the velocity values found in the near-wall region occur with low probability in the data. Since the present machine learning models are trained with $L_2$ minimization as mentioned above, it is relatively harder to predict such regions than those whose fluctuations occur with high probability.
We further examine how well the machine learning model performs over the wavenumber space. The kinetic energy spectrum at $y^+ = 11.8$ in the streamwise and spanwise directions of the spatio-temporal super resolution reconstruction is shown in figure 18. For the input data, the spatial medium resolution (16 × 6 × 16 grid) with medium ($\Delta T^+ = 12.6$) and wide ($\Delta T^+ = 25.2$) time steps is considered. The $L_2$ error of the spectrum is evaluated here as a function of wavenumber. With the medium time step, the error over the high-wavenumber space is higher than that over the low-wavenumber space. This observation agrees with the machine learning models being able to recover the low-wavenumber space from grossly coarse data, as seen in figure 12. With the wide time step, we can infer the influence of temporal coarseness, as discussed above. The machine learning models capture low-wavenumber components preferentially to minimize the reconstruction error.

Conclusions
We developed supervised machine learning methods for spatio-temporal super resolution analysis to reconstruct high-resolution flow data from grossly under-resolved input data both in space and time. First, a two-dimensional cylinder wake was considered as the preliminary demonstration. The machine-learned model was able to recover the data in space and reconstruct the temporal evolution from only the first and last frames.
As the first turbulent flow example, a two-dimensional decaying homogeneous isotropic turbulence was considered. In this example, we considered two spatial resolutions based on our previous work (Fukami et al. 2019a) and two different time steps to examine the capability of the proposed model. The reconstructed flow fields were in reasonable agreement with the reference data in terms of the L 2 error norm, the kinetic energy spectrum, and the probability density function of vorticity field. We also found that the machine-learned models were affected substantially by the temporal range of training data, i.e., between Regimes I and II.
We further examined the capability of the proposed method using a turbulent channel flow over a three-dimensional domain at $Re_\tau = 180$. The machine learning based spatio-temporal super resolution analysis showed its great capability to reconstruct the flow field from grossly coarse input data in space and time when an appropriate time step size between the first and the last frames is used. The proposed method, however, was unable to recover the turbulent flow fields in time when the temporal two-point correlation coefficient was $R \leq 0.25$, because of the nature of the supervised machine learning model, whose weights are updated in an iterative manner based on the correlation between the input and the output. It was also seen that the machine learning models tend to preferentially extract the features in the low-wavenumber space so as to minimize a loss function efficiently. For improving the accuracy of the spatio-temporal super resolution analysis, we likely need to prepare a well-designed architecture which can take physics into account in its structure, e.g., through the loss function (Lee & You 2019; Raissi et al. 2020) and the choice of input and output attributes, i.e., feature engineering. Such efforts will be undertaken in future work.
The robustness of the present model for noisy input and the dependence on the number of training snapshots were also investigated. The proposed model showed reasonable capability for up to 10% noisy input in terms of both qualitative and quantitative assessments. We found that the flow field can be reconstructed by the machine learning based methods with as few as 100 training snapshots for both the spatial and temporal models.
We foresee a range of applications for the spatio-temporal super resolution analysis in fluid dynamics. For example, we may be able to leverage the current technique for large-eddy simulations as an augmentation tool. We may also be able to consider super resolution as a compression tool to store big fluid data. In fact, in the present paper, we can recover the three-dimensional turbulent channel flow, which has 7864320 grid points in space and time, from low-resolution data which has 3072 grid points in space and time: approximately 0.04% in terms of data compression. Although these applications are just a few possible examples, we hope that the present paper provides a hint for fluid dynamicists who try to analyze and handle big fluid flow data efficiently with data-oriented methods.
Regarding the influence of grid type, the filter operation of the convolutional neural network is not sensitive to the choice of the spatial discretization, at least for this problem.
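The compression figure quoted in the concluding remarks can be checked with simple arithmetic. The grid sizes below are taken from the text; the frame counts (10 high-resolution snapshots and 2 low-resolution snapshots) are an assumption inferred from the quoted totals of 7864320 and 3072 points.

```python
# Spatial grid sizes stated in the text.
hr_grid = 128 * 48 * 128     # reference subdomain grid
lr_grid = 16 * 6 * 16        # medium-resolution input grid

# Frame counts inferred from the quoted totals (assumption).
hr_frames = 10
lr_frames = 2

hr_points = hr_grid * hr_frames
lr_points = lr_grid * lr_frames
compression_percent = 100.0 * lr_points / hr_points
```

The ratio evaluates to about 0.039%, consistent with the approximately 0.04% stated above.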