Sparse identification of nonlinear dynamics with low-dimensionalized flow representations

We perform a sparse identification of nonlinear dynamics (SINDy) for low-dimensionalized complex flow phenomena. We first apply SINDy with two regression methods, the thresholded least square algorithm (TLSA) and the adaptive Lasso (Alasso), which showed reasonable performance over a wide range of sparsity constants in our preliminary tests, to a two-dimensional single cylinder wake at $Re_D=100$, its transient process, and a wake of two-parallel cylinders, as examples of high-dimensional fluid data. To handle these high-dimensional data with SINDy, whose library matrix is suitable for low-dimensional variable combinations, a convolutional neural network-based autoencoder (CNN-AE) is utilized. The CNN-AE is employed to map high-dimensional dynamics into a low-dimensional latent space. SINDy then seeks a governing equation of the mapped low-dimensional latent vector. The temporal evolution of the high-dimensional dynamics can be provided by combining the latent vector predicted by SINDy with the CNN decoder, which remaps the low-dimensional latent vector to the original dimension. SINDy can provide a stable solution as the governing equation of the latent dynamics, and the CNN-SINDy based modeling can reproduce high-dimensional flow fields successfully, although more terms are required to represent the transient flow and the two-parallel cylinder wake than the periodic shedding. A nine-equation turbulent shear flow model is finally considered to examine the applicability of SINDy to turbulence, although without using the CNN-AE. The present results suggest that the proposed scheme with an appropriate parameter choice enables us to analyze high-dimensional nonlinear dynamics with interpretable low-dimensional manifolds.


Introduction
Sparse identification of nonlinear dynamics (SINDy) (Brunton et al. 2016a) is one of the prominent data-driven tools for obtaining governing equations of nonlinear dynamics in a form that we can understand. The SINDy algorithm enables us to discover a governing equation from time-discretized data and to identify dominant terms from a large set of potential terms that are likely to be involved in the model. Recently, the usefulness of SINDy has been demonstrated in various fields (Champion et al. 2019b; Hoffmann et al. 2019; Zhang & Schaeffer 2019; Deng et al. 2020). Here, let us introduce some efforts, especially in the fluid dynamics community. Loiseau et al. (2018b) utilized SINDy to present a general reduced-order modeling (ROM) framework for experimental data: sensor data and particle image velocimetry data. The model was investigated using a transient and post-transient laminar cylinder wake. They reported that the nonlinear full-state dynamics can be modeled with sensor-based dynamics and SINDy-based estimation of the ROM coefficients. SINDy with control inputs taken into account (called SINDYc) was also investigated by Brunton et al. (2016b) using the Lorenz equations. Loiseau & Brunton (2018) combined SINDy and POD to enforce energy-preserving physical constraints in the regression procedure toward the development of a new data-driven Galerkin regression framework. For time-varying aerodynamics, Li et al. (2019) identified vortex-induced vibrations on a long-span suspension bridge utilizing the SINDy algorithm extended to parametric partial differential equations (Rudy et al. 2017). As a novel method to perform the order reduction of data and SINDy simultaneously, there is a customized autoencoder (Champion et al. 2019a) introducing the SINDy loss into the loss function of deep autoencoder networks. The sparse regression idea has also recently been propagated to turbulence closure modeling purposes (Schmelzer et al. 2020; Beetham & Capecelatro 2020; Beetham et al. 2021; Duraisamy 2021). As reported in these studies, by employing SINDy to predict the temporal evolution of a system, we can obtain ordinary differential equations, which should be helpful in many applications, e.g., control of a system. In this way, the propagation of the use of SINDy can be seen in the fluid dynamics community.
We here examine the possibility of SINDy-based modeling of low-dimensionalized complex fluid flows presented in figure 1. Following our preliminary tests with the van der Pol oscillator and the Lorenz attractor presented in the Appendix, we apply SINDy with two regression methods, 1. TLSA and 2. Alasso, to a two-dimensional cylinder wake at Re_D = 100, its transient process, and a wake of two-parallel cylinders as examples of high-dimensional dynamics. In this study, a convolutional neural network-based autoencoder (CNN-AE) is utilized to handle these high-dimensional data with SINDy, whose library matrix is suitable for low-dimensional variable combinations (Kaiser et al. 2018). The CNN-AE here is employed to map high-dimensional dynamics into a low-dimensional latent space. SINDy is then utilized to obtain a governing equation of the mapped low-dimensional latent vector. Unifying the dynamics of the latent vector obtained via SINDy with the CNN decoder, which can remap the low-dimensional latent vector to the original dimension, we can present the temporal evolution of high-dimensional dynamics and avoid the issue with high-dimensional variables as discussed above. The scheme of the present ROM is inspired by Hasegawa et al. (2019, 2020a,b), who utilized a long short-term memory instead of SINDy in the present ROM framework. The paper is organized as follows: we provide details of SINDy and the CNN-AE in section 2. We present the results for high-dimensional flow examples with the CNN-AE/SINDy reduced-order modeling in sections 3.1, 3.2 and 3.3. In section 3.4, we also discuss its applicability to turbulence using the nine-equation turbulent shear flow model, although without considering the CNN-AE. At last, concluding remarks with an outlook are offered in section 4.

Sparse identification of nonlinear dynamics (SINDy)
Sparse identification of nonlinear dynamics (SINDy) (Brunton et al. 2016a) is performed to identify nonlinear governing equations from time-series data in the present study. The temporal evolution of the state x(t) in a typical dynamical system can often be represented in the form of an ordinary differential equation,

ẋ(t) = f(x(t)). (2.1)

To explain the SINDy algorithm, let x(t) be (x(t), y(t)) hereinafter, although SINDy can also handle dynamics with higher dimensions, as will be considered later. First, the temporally discretized data of x are collected to arrange a data matrix X,

X = [x(t_1) y(t_1); x(t_2) y(t_2); … ; x(t_m) y(t_m)]. (2.2)

We then collect the time-series data of the time-differentiated values ẋ(t) to construct a time-differentiated data matrix Ẋ,

Ẋ = [ẋ(t_1) ẏ(t_1); ẋ(t_2) ẏ(t_2); … ; ẋ(t_m) ẏ(t_m)]. (2.3)

In the present study, the time-differentiated values are obtained with the second-order central-difference scheme. Note in passing that the results in the present study are not sensitive to the differentiation scheme, provided appropriate time steps are used for the construction of Ẋ.
Then, we prepare a library matrix Θ(X) including nonlinear terms of X,

Θ(X) = [1 X X_{P2} X_{P3} X_{P4} X_{P5}], (2.4)

where X_{Pi} denotes the ith-order polynomial terms constructed from x and y. The set of nonlinear potential terms here includes up to 5th-order terms, although which terms are included can be set arbitrarily. We finally determine a coefficient matrix Ξ in the state equation,

Ẋ = Θ(X)Ξ, (2.5)

using an arbitrary regression method, such as TLSA, Lasso, and so on, i.e.,

ξ_l = argmin_{ξ_l} ‖Ẋ_l − Θ(X)ξ_l‖_2, (2.6)

where the subscript l denotes the lth column of the corresponding matrix. Once the coefficient matrix Ξ is obtained, the governing equation is identified as

ẋ(t) = f(x(t)) = Ξᵀ(Θ(xᵀ(t)))ᵀ. (2.7)

In this study, we use the thresholded least square algorithm (TLSA) used in Brunton et al. (2016a) and the adaptive Lasso (Alasso) (Zou 2006) to obtain the coefficient matrix Ξ, following our preliminary tests. Details can be found in the Appendix.
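The regression step above can be illustrated with a short, self-contained sketch: a known two-dimensional linear system is identified with a sequentially thresholded least-squares loop. The toy system, the 2nd-order library, and the threshold value are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def tlsa(theta, dxdt, alpha, n_iter=10):
    """Sequentially thresholded least squares: zero small coefficients, refit."""
    xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    for _ in range(n_iter):
        xi[np.abs(xi) < alpha] = 0.0
        for k in range(dxdt.shape[1]):               # refit each state's equation
            big = np.abs(xi[:, k]) >= alpha
            if big.any():
                xi[big, k], *_ = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)
    return xi

# Training data from a known linear system: ẋ = -0.1x + 2y, ẏ = -2x - 0.1y.
dt = 0.001
t = np.arange(0.0, 10.0, dt)
x = np.exp(-0.1 * t) * np.cos(2.0 * t)
y = -np.exp(-0.1 * t) * np.sin(2.0 * t)
X = np.stack([x, y], axis=1)
dX = np.gradient(X, dt, axis=0)                      # second-order central differences

# Library with polynomials up to 2nd order (the paper uses up to 5th).
Theta = np.column_stack([np.ones_like(t), x, y, x * x, x * y, y * y])
Xi = tlsa(Theta, dX, alpha=0.05)
```

With a threshold below the smallest true coefficient magnitude, the loop recovers the two linear terms per equation and zeroes the spurious quadratic entries exactly.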

Convolutional neural network-based autoencoder
We use a convolutional neural network (LeCun et al. 1998)-based autoencoder (Hinton & Salakhutdinov 2006) (CNN-AE) for order reduction of high-dimensional data, as shown in figure 2. As an example, we show the CNN-AE with three hidden layers. In the training process, the autoencoder F is trained to output the same data as the input data q such that q ≈ F(q; w), where w denotes the weights of the machine learning model. The process to optimize the weights w can be formulated as an iterative minimization of an error function E,

w = argmin_w [E(q, F(q; w))]. (2.8)

For the use of the autoencoder as a dimension compressor, the dimension of the intermediate space, called the latent space, is smaller than that of the input or output data q, as illustrated in figure 2. When we are able to obtain an output F(q) similar to the input q such that q ≈ F(q), it can be guaranteed that the latent vector r is a low-dimensional representation of its input or output q. In an autoencoder, the dimension compressor is called the encoder F_e (the left part in figure 2) and the counterpart is referred to as the decoder F_d (the right part in figure 2). Using them, the internal procedure of the autoencoder can be expressed as

r = F_e(q), q = F_d(r). (2.9)

In the present study, the dimension of the latent space for the problems of a cylinder wake and its transient process is set to 2, following our previous work (Murata et al. 2020). Murata et al. (2020) reported that the flow fields can be mapped into a low-dimensional latent space successfully while keeping the information of the high-dimensional flow fields. For the example of a two-parallel cylinder wake in section 3.3, the dimension of the latent space is set to 4 to handle its quasi-periodicity accordingly. For construction of the autoencoder, the L_2 norm error is applied as the error function E, and the Adam optimizer (Kingma & Ba 2014) is utilized for updating the weights in the iterative training process.

Next, let us briefly explain the convolutional neural network. In the scheme of the convolutional neural network, a filter h, whose size is H × H, is utilized as the weights w, as shown in figure 3(a). The mathematical expression here is

q^{(l)}_{ijm} = ψ(b_m + Σ_{k=1}^{K} Σ_{p=0}^{H−1} Σ_{q=0}^{H−1} h_{pqkm} q^{(l−1)}_{i+p−C, j+q−C, k}), (2.10)

where C = floor(H/2), q^{(l)} is the output at layer l, and K is the number of variables in the data (e.g., K = 1 for a grayscale image and K = 3 for RGB images). Although not shown in figure 3, b is a bias added to the results of the filter operation. The activation function ψ is usually a monotonically increasing nonlinear function. The autoencoder can achieve more effective dimension reduction than the linear theory-based method, i.e., proper orthogonal decomposition (POD) (Lumley 1967), thanks to the nonlinear activation function (Milano & Koumoutsakos 2002; Murata et al. 2020; Fukami et al. 2020b). In the present study, a hyperbolic tangent function ψ(a) = (e^a − e^{−a})/(e^a + e^{−a}) is adopted as the activation function, following Murata et al. (2020). Moreover, we use the convolutional neural network to construct the autoencoder because the CNN is good at handling high-dimensional data with lower computational costs than fully connected models, i.e., the multi-layer perceptron, thanks to the filter concept, which assumes that pixels of images have no strong relationship with those of far areas. Recently, the use of CNNs has emerged for dealing with high-dimensional problems including fluid dynamics (Fukami et al. 2019a,b,c, 2020a, 2021a,b; Omata & Shirayama 2019; Morimoto et al. 2020a, 2021; Matsuo et al. 2021), although this concept was originally developed in the field of computer science.
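A direct, loop-based reading of the filter operation in equation (2.10) can be sketched as follows, written for clarity rather than speed; the zero padding and the array shapes are assumptions for illustration.

```python
import numpy as np

def conv_layer(q, h, b, psi=np.tanh):
    """Filter operation of equation (2.10).
    q: input of shape (ny, nx, K); h: filters of shape (H, H, K, M); b: biases (M,)."""
    H = h.shape[0]
    C = H // 2                                   # C = floor(H/2)
    ny, nx, K = q.shape
    M = h.shape[3]                               # number of output channels
    qp = np.pad(q, ((C, C), (C, C), (0, 0)))     # zero padding keeps the field size
    out = np.empty((ny, nx, M))
    for i in range(ny):
        for j in range(nx):
            for m in range(M):
                patch = qp[i:i + H, j:j + H, :]
                out[i, j, m] = psi(b[m] + np.sum(h[:, :, :, m] * patch))
    return out

# Sanity check: an identity filter with a linear activation reproduces the input.
q0 = np.arange(20.0).reshape(4, 5, 1)
h0 = np.zeros((3, 3, 1, 1))
h0[1, 1, 0, 0] = 1.0
out0 = conv_layer(q0, h0, np.zeros(1), psi=lambda a: a)
```

The identity-filter check makes the index convention of equation (2.10) explicit: the filter is centered on each pixel via the offset C.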

CNN-SINDy based reduced-order modeling
By combining the CNN-AE and SINDy, we present a machine-learning-based reduced-order model (ML-ROM), as shown in figure 4. The CNN-AE first works to map a high-dimensional flow field into a low-dimensional latent space. SINDy is then performed to obtain a governing equation of the mapped low-dimensional latent vector. Unifying the latent vector predicted by SINDy with the CNN decoder, we can obtain the temporal evolution of the high-dimensional flow field, as presented in figure 4. Moreover, the issue with high-dimensional variables and SINDy can also be avoided. Note again that the present ROM is inspired by Hasegawa et al. (2019, 2020a,b), who capitalized on a long short-term memory instead of SINDy. We apply this framework to a two-dimensional cylinder wake in section 3.1, its transient process in section 3.2, and a wake of two-parallel cylinders in section 3.3. In this study, we perform a five-fold cross-validation (Brunton & Kutz 2019) to create all machine learning models, although only the results of a single case will be presented for brevity. The sample code for the present reduced-order model is available from https://github.com/kfukami/CNN-SINDy-MLROM.
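The four-step pipeline (encode, differentiate, regress, integrate and decode) can be sketched end to end with a toy linear "encoder"/"decoder" standing in for the trained CNN-AE; the synthetic limit-cycle data and all names below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 2001)
r = np.stack([np.cos(t), np.sin(t)], axis=1)       # latent dynamics (limit cycle)
D = rng.standard_normal((2, 50))                   # toy "decoder": latent -> field
Q = r @ D                                          # high-dimensional snapshots
E = np.linalg.pinv(D)                              # toy "encoder" (pseudo-inverse)

r_enc = Q @ E                                      # step 1: encode snapshots
dt = t[1] - t[0]
dr = np.gradient(r_enc, dt, axis=0)                # step 2: time derivatives
Theta = np.column_stack([np.ones_like(t), r_enc])  # a linear library suffices here
Xi, *_ = np.linalg.lstsq(Theta, dr, rcond=None)    # step 3: regression (no sparsity needed)

# Step 4: integrate the identified latent ODE (explicit Euler) and decode.
r_pred = [r_enc[0]]
for _ in range(len(t) - 1):
    rp = r_pred[-1]
    r_pred.append(rp + dt * (np.array([1.0, *rp]) @ Xi))
Q_pred = np.array(r_pred) @ D
```

Replacing E and D with a trained CNN encoder and decoder (and the linear library with a sparse polynomial one) recovers the scheme of figure 4.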

Example 1: periodic cylinder wake
Here, let us consider a two-dimensional cylinder wake computed by a two-dimensional direct numerical simulation (DNS). The governing equations are the incompressible continuity and Navier-Stokes equations,

∇ · u = 0, ∂u/∂t + (u · ∇)u = −∇p + (1/Re_D)∇²u, (3.1)

where u and p represent the velocity vector and pressure, respectively. All quantities are made dimensionless by the fluid density, the free-stream velocity, and the cylinder diameter. The Reynolds number based on the cylinder diameter is Re_D = 100. The size of the computational domain is L_x = 25.6 and L_y = 20.0 in the streamwise (x) and the transverse (y) directions, respectively. The origin of the coordinates is defined at the center of the inflow boundary. A Cartesian grid with a uniform grid spacing of ∆x = ∆y = 0.025 is used; thus, the number of grid points is (N_x, N_y) = (1024, 800). We use the ghost cell method (Kor et al. 2017) to impose the no-slip boundary condition on the surface of the cylinder, whose centre is located at (x, y) = (9, 0). In the present study, we utilize the flow field around the cylinder as the training data set, i.e., 8.2 ≤ x ≤ 17.8 and −2.4 ≤ y ≤ 2.4 with (N_x*, N_y*) = (384, 192). The fluctuation components of the streamwise velocity u and the transverse velocity v are used as the input and output attributes. As mentioned above, SINDy is employed to identify the ordinary differential equations that govern the temporal evolution of the mapped vector obtained by the CNN encoder. The time history and the trajectory of the mapped latent vector r = (r_1, r_2) are shown in figure 5. For the assessment of the candidate models, we integrate the differential equations to reproduce the time history. Note in passing that indices based on the L_2 error are not suitable, since a slight difference between the period of the reproduced oscillation and that of the original one results in large L_2 errors for oscillating systems. We here use the amplitude and the frequency of the oscillation to evaluate the similarity between the reproduced waveform and the original one. Hereafter, the error rates of the amplitude and the frequency are shown in the figures below for simplicity. The number of terms is also considered for the parsimonious model selection.
We consider two regression methods for SINDy, i.e., TLSA and Alasso, following our preliminary tests. Let us present the results of the parameter search utilizing TLSA in figure 6. Note here that we use 10000 mapped vectors for the construction of a SINDy model. Using α = 1, the reproduced trajectory is in excellent agreement with the original one. However, there are many terms in the ordinary differential equations, and some coefficients are excessively large even though the oscillation is contained within a narrow range (−1, 1). This result with the TLSA likely arises because we do not apply data preprocessing, which leads to the large regression coefficients and non-sparse results. On the other hand, the reproduced trajectory with α = 10 converges to a point around (r_1, r_2) = (0.6, 0), although the model becomes parsimonious. Since the predicted latent vector converges after t ≈ 4, as shown in the time history of figure 6, the temporal evolution of the flow field by the CNN decoder is also frozen, as shown in figure 7. Another observation is that no terms remain in the ordinary differential equations with a high threshold, i.e., α ≥ 50.
With Alasso as the regression method, we can find candidate models with low errors at α = 2 × 10^−5 and 1 × 10^−4, as shown in figure 8. At α = 2 × 10^−5, the ordinary differential equations consist of four coefficients in total, and the reproduced trajectory gradually converges as shown. On the other hand, by choosing the appropriate sparsity constant, i.e., α = 1 × 10^−4, the circle-like oscillating trajectory, which is similar to the reference, can be represented with only two terms. We then check the flow field reproduced by the combination of the latent vector predicted by SINDy and the CNN decoder, as shown in figure 9. The temporal evolution of the high-dimensional dynamics can be reproduced well using the proposed model, although the periods of the two fields are slightly different.
The quantitative analysis with the probability density function (PDF), the mean streamwise velocity at y = 0, and the time history of the energy reconstruction rate is summarized in figure 10. The energy reconstruction rate R is defined from the fluctuation components (u_Rep, v_Rep) of the reproduced velocity and (u_DNS, v_DNS) of the original velocity. For the PDF, the distribution of the CNN-SINDy model is in excellent agreement with the reference DNS data. Since the SINDy model of the present case integrates the latent vector obtained via the CNN-AE, which can map a high-dimensional flow field into a low-dimensional space efficiently thanks to the nonlinear activation function (Murata et al. 2020), the SINDy-based model outperforms POD with 2 modes as long as appropriate waveforms can be obtained. In other words, the obtained equation can be integrated stably. The success of the CNN-SINDy based modeling can also be seen in the time-averaged velocity and the energy containing rate in figure 10.
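A minimal sketch of computing such an energy-based measure is given below; since the exact definition of R appears only in the original figure, the ratio of reproduced to reference fluctuation energy used here is an assumed, plausible form, not necessarily the paper's.

```python
import numpy as np

def energy_rate(u_rep, v_rep, u_dns, v_dns):
    """Ratio of reproduced to reference fluctuation energy (assumed form of R).
    All inputs are fluctuation components on the same grid."""
    return np.sum(u_rep**2 + v_rep**2) / np.sum(u_dns**2 + v_dns**2)
```

By construction, a perfect reproduction gives R = 1, and a field with half the fluctuation amplitude gives R = 0.25.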

Example 2: transient wake of cylinder flow
As the second example for the combination of CNN and SINDy, we consider the transient process of a flow around a circular cylinder at Re_D = 100. To take the transient process into account, the streamwise length of the computational domain and that of the flow field data are extended to L_x = 51.2 and 8.2 ≤ x ≤ 37, i.e., N_x* = 1152, the same setup as in Murata et al. (2020). The time history and the trajectory in the latent space are presented in figure 11. For SINDy, we use the data over t = [60, 160]. The library matrix contains up to 5th-order terms. Although equations that can provide the correct temporal evolution of the latent vector can be obtained with Alasso by including higher-order terms, the model here is, of course, not sparse and is difficult to interpret.
The parameter search results with TLSA for the transient wake are summarized in figure 12. Since the trajectory here is simple, as shown, we can obtain governing equations that provide reasonable agreement with the reference at α = 7 × 10^−2. With a higher sparsity constant, i.e., α = 0.7, the equation provides a temporal evolution of the latent vector with almost no oscillation.
Alasso is also considered, as shown in figure 13. Although the number of terms is larger than that in the periodic shedding example, the trajectory can be reproduced successfully at α = 1.2 × 10^−8. With a higher sparsity constant, e.g., α = 1.2 × 10^−2, the developing oscillation is not reproduced due to the lack of terms. Noteworthy here is that the error for the frequency is almost negligible in the range of 5 × 10^−6 ≤ α ≤ 10^−3 in spite of the high error rate regarding x² + y². This is because a slight oscillation occurs with a frequency similar to that of the solution. Although the identified equation here is more complex than the well-known transient system with a two-dimensional paraboloid manifold (Loiseau et al. 2018a; Loiseau & Brunton 2018; Sipp & Lebedev 2007), this is likely caused by the complexity of the information captured by the AE modes compared to the POD modes (Murata et al. 2020). This observation enables us to notice the trade-off between the information associated with the number of modes and the sparsity of the equation. Hence, imposing constraints on the nonlinear AE modes to keep them tractable may also be worth considering.

Example 3: two-parallel cylinders wake
To demonstrate the applicability of the present CNN-SINDy based reduced-order modeling to more complex wake flows, let us here consider a flow around two-parallel cylinders whose radii differ from each other (Morimoto et al. 2020b, 2021), as presented in figure 1. Unlike the wake dynamics of two identical circular cylinders, as described in Kang (2003), the coupled wakes behind two side-by-side uneven cylinders exhibit more complex vortex interactions due to the mismatch in their individual shedding frequencies. Training flow snapshots are prepared with a direct numerical simulation with the open-source CFD toolbox OpenFOAM (Weller et al. 1998), using second-order discretization schemes in both time and space. We arrange the two circular cylinders with a size ratio of r and a gap of gD, where g is the gap ratio and D is the diameter of the lower cylinder. The Reynolds number is fixed at Re_D = 100. The two cylinders are placed 20D downstream of the inlet, where a uniform flow with velocity U_∞ is prescribed, and 40D upstream of the outlet with zero pressure. The side boundaries are specified as slip and are 40D apart. The time steps for the DNS and the snapshot sampling are 0.01 and 0.1, respectively. A notable feature of the wakes associated with the present two-cylinder arrangement is the complex behavior caused by varying the size ratio r and the gap ratio g (Morimoto et al. 2020b, 2021). In the present study, the wake with the combination of {r, g} = {1.15, 2.0}, which is quasi-periodic in time, is considered. Prior to the use of SINDy, let us determine the number of latent variables n_r for this example, as summarized in figure 15. We here construct AEs with two cases, n_r = 3 and 4.
As presented in figure 15(a), the flow fields recovered through the AE-based low-dimensionalization are in reasonable agreement with the reference DNS snapshots. Quantitatively speaking, the L_2 error norms between the reference and the decoded fields are approximately 14% and 5% for n_r = 3 and 4, respectively. Although both models achieve smooth spatial reconstruction, we can find a significant difference by focusing on their temporal behaviors in figure 15(b). While the time trace with n_r = 4 shows smooth variations for all four variables, the curves with n_r = 3 exhibit a shaky behavior despite the quasi-periodic nature of the flow. This is caused by the higher error level of the n_r = 3 case. Furthermore, it is also known that a high error level due to overcompression via AE strongly affects the temporal prediction, since exposure bias (Endo et al. 2018) arises through temporal integration (Hasegawa et al. 2020a,b; Nakamura et al. 2021). In what follows, we use the AE with n_r = 4 for the SINDy-based reduced-order modeling.
Let us perform SINDy with TLSA and Alasso for the AE latent variables with n_r = 4. For the construction of the library matrix of this example, we include up to third-order terms of the four latent variables. The relationship between the sparsity constant α and the number of terms in the identified equations is examined in figure 16. Similarly to the previous two examples with a single cylinder, the trends for TLSA and Alasso differ from each other. Note that this map can only be used to check the sparsity trend of the identified equations, since there is no model equation. To validate the fidelity of the equations, we need to integrate the identified equations and decode a flow field using the decoder part of the AE. The equations identified via the TLSA-based SINDy are investigated in figure 17. As examples, the cases with α = 0.1 and 0.5 are shown. The case with α = 0.1, which contains nonlinear terms, shows a diverging behavior at t ≈ 90. The sparse equation obtained with α = 0.5 can provide stable solutions for r_1 and r_2; however, its sparseness causes unstable integration for r_3 and r_4. Note in passing that the TLSA-based modeling cannot identify a stable solution even though we have carefully examined the influence of the sparsity constant in other cases not shown here.
We then use Alasso as the regression function of SINDy, as summarized in figure 18. The cases with α = 5 × 10^−7 and 10^−6 are shown. The equation identified with α = 5 × 10^−7 can successfully be integrated, and its temporal behavior is in good agreement with the reference curve provided by the AE-based latent variables. However, what is notable here is that the equation with α = 10^−6 shows a decaying behavior as the integration window increases, even though its form is very similar to that of the case with α = 5 × 10^−7. Some nonlinear terms are eliminated due to the slight increase in the sparsity constant. These results indicate the importance of a careful choice of both the regression function and the associated sparsity constant.
Based on the results above, the variables integrated with Alasso at α = 5 × 10^−7 are fed into the CNN decoder, as presented in figure 19. The reproduced flow fields are in reasonable agreement with the reference DNS data, although there is a slight offset due to the numerical integration. Hence, the present reduced-order model is able to achieve a reasonable wake reconstruction by following only the temporal evolution of the low-dimensionalized vector while carefully selecting the parameters, which is akin to the observations in the single-cylinder cases.

Outlook: nine-equation shear flow model
One of the remaining issues of the present CNN-SINDy model is its applicability to flows where a large number of spatial modes are required to reconstruct the flow, e.g., turbulence. With the current scheme for the CNN-AE, it is tough to compress turbulent flow data while keeping the information of the high-dimensional dynamics (Murata et al. 2020). We have also recently reported that the difficulty of low-dimensionalizing turbulence remains even with a customized autoencoder, although it achieves a better mapping than conventional AE and POD (Fukami et al. 2020c). In addition, the number of terms in the coefficient matrix drastically increases for the turbulent case unless the flow can be expressed with a small number of modes. Hence, our next question here is: "Can we also use SINDy if a turbulent flow can be mapped into a small number of modes?" In this section, let us consider a nine-equation turbulent shear flow model between infinite parallel free-slip walls under a sinusoidal body force (Moehlis et al. 2004) as a preliminary example of the application to turbulence with a small number of modes.
In the nine-equation model, various statistics, including the mean velocity profile, streaks, and vortex structures, can be represented with only nine Fourier modes u_j(x). Analogous to POD, the flow field can be mathematically expressed as a superposition of temporal coefficients and modes such that

u(x, t) = Σ_{j=1}^{9} a_j(t) u_j(x). (3.4)

The nine mode coefficients a_j(t) obey nine coupled ordinary differential equations composed of linear and quadratic terms, whose coefficients depend on the constants ζ, µ, and γ. The temporal coefficients a_j are multiplied by the corresponding Fourier modes, each of which plays an individual role in reconstructing the flow, e.g., the basic profile, streaks, and spanwise flows. We refer enthusiastic readers to Moehlis et al. (2004) for details.
In this section, we aim to obtain the coefficient matrix for the simultaneous differential equations of the nine mode coefficients using SINDy. In the following, the Reynolds number based on the channel half height δ and the laminar velocity U_0 at a distance of δ/2 from the top wall is set to Re = 400. The initial condition for the numerical integration of the equations above is (a_1^0, a_2^0, a_3^0, a_4^0, a_5^0, a_6^0, a_7^0, a_8^0, a_9^0) = (1, 0.07066, −0.07076, 0, 0, 0, 0, 0, 0), with a small random perturbation added to a_4. The SINDy in this section is also performed with TLSA and Alasso, following the discussions above. Since the equations for the temporal coefficients are constructed with terms up to 2nd order, the coefficient matrix also includes up to 2nd-order terms, as shown in figure 21(a). The total number of terms considered here is 55. The results with TLSA and Alasso are summarized in figure 21(b). The matrices located in the right area correspond to the coefficient matrix in figure 21(a). Similar to the results above, the model using TLSA has some huge values due to the lack of a penalty term. In particular, the effect of overfitting can be seen in the low-order portion. On the other hand, the remarkable ability of SINDy can be seen with the Alasso. By giving an appropriate sparsity constant α, the governing equations can be represented successfully. The details of the magnitude of each coefficient are shown in figure 22. It is striking that the dominant terms are perfectly captured by using SINDy, although the magnitudes are slightly different. These noteworthy results indicate that a governing equation of low-dimensionalized turbulent flows can be obtained from only time-series data by using SINDy with appropriate parameter selections. In other words, we may also be able to construct a machine-learning-based reduced-order model for turbulent flows that is interpretable as a form of equation, if a well-designed model for mapping into low-dimensional manifolds can be constructed.
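The library size quoted above (55 terms) follows directly from counting polynomial terms up to second order in the nine mode coefficients: one constant, nine linear terms, and C(10, 2) = 45 quadratic terms (squares included). A quick check:

```python
from itertools import combinations_with_replacement

# Count the SINDy library columns for terms up to 2nd order in a_1..a_9.
n_modes = 9
n_quadratic = len(list(combinations_with_replacement(range(n_modes), 2)))
n_terms = 1 + n_modes + n_quadratic   # constant + linear + quadratic
```

The same counting generalizes to other library orders, which is why the library grows quickly when more modes are needed.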
The robustness of SINDy against noisy measurements in the training pipeline for this example is finally assessed in figure 23. We here add a Gaussian perturbation to the training data as

f̃ = f + γn,

where f is a target vector, γ is the magnitude of the noise, and n denotes the Gaussian noise. As the magnitude of the noise, four cases γ = {0.05, 0.10, 0.20, 0.40} are considered. With γ = 0.05, SINDy is still able to identify the equation accordingly. However, the first-order terms are eliminated and unnecessary high-order terms are added as the noise magnitude of the training data increases, although the overall trend of the coefficient matrix can still be captured. These results imply that care should be taken with noisy observations in the training data depending on the problem setting users handle, although SINDy retains its robustness against slight perturbations.
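The noise-injection protocol can be sketched on a toy oscillator: Gaussian noise scaled by γ and the per-component standard deviation is added before derivatives and regression are computed. The scaling by the standard deviation, the toy system, and the single thresholding pass are assumptions for illustration; the paper's exact protocol may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0.0, 100.0, dt)
X = np.stack([np.cos(t), -np.sin(t)], axis=1)      # true system: ẋ = y, ẏ = -x

gamma = 0.05                                        # mildest noise level in the text
Xn = X + gamma * X.std(axis=0) * rng.standard_normal(X.shape)

dX = np.gradient(Xn, dt, axis=0)                    # derivatives of the noisy data
Theta = np.column_stack([np.ones(len(t)), Xn])      # constant + linear library
Xi, *_ = np.linalg.lstsq(Theta, dX, rcond=None)
Xi[np.abs(Xi) < 0.2] = 0.0                          # one thresholding pass
```

At this noise level the two true linear terms survive thresholding and the spurious entries are zeroed; raising gamma degrades the identified coefficients, mirroring the trend described above.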

Conclusion
We performed a sparse identification of nonlinear dynamics (SINDy) for low-dimensionalized fluid flows and investigated the influence of the regression methods and parameters considered for the construction of SINDy-based models. Following our preliminary tests, SINDy with two regression methods, the TLSA and the Alasso, was applied to the examples of a wake around a cylinder, its transient process, and a wake of two-parallel cylinders, with a convolutional neural network-based autoencoder (CNN-AE). The CNN-AE was employed to map high-dimensional flow data into a low-dimensional latent space using nonlinear functions. The temporal evolution of the latent dynamics could be followed well by using SINDy with an appropriate parameter selection for all examples, although the required number of terms in the coefficient matrix varied between cases due to the differences in complexity.
At last, we also investigated the applicability of SINDy to turbulence with a low-dimensional representation using the nine-equation shear flow model. The governing equation could be obtained successfully by utilizing Alasso with an appropriate sparsity constant. The results indicated that a machine-learning-based ROM for turbulent flows can perhaps be presented in an interpretable form as an equation, if a well-designed model for mapping high-dimensional complex flows into low-dimensional manifolds can be constructed. With regard to the low-dimensional manifolds of fluid flows, the novel work by Loiseau et al. (2018a) discussed the nonlinear relationship between lower-order POD modes and higher-order counterparts of the cylinder wake dynamics by relating it to a triadic interaction among POD modes. We may be able to borrow that concept to investigate possible nonlinear correlations in the data, although this is left for the future since it is tough to apply their idea directly to the nine-equation shear flow model, which is composed of more complex relationships among modes. Otherwise, combinations with uncertainty quantification (Maulik et al. 2020), transfer learning (Guastoni et al. 2020a; Morimoto et al. 2020b), and Koopman-based frameworks (Eivazi et al. 2021) may also be possible extensions toward more practical applications.
The current results for CNN-SINDy modeling make us skeptical about applications of CNN-AE to flows where a large number of spatial modes are required to represent the energetic field, e.g., turbulence, even though several recent studies have shown its potential for unsteady laminar flows (Taira et al. 2020; Carlberg et al. 2019; Bukka et al. 2021; Xu & Duraisamy 2020; Agostini 2020). Since a few studies have also recently recognized the challenges of the current form of CNN-AE for turbulence (Fukami et al. 2020c; Nakamura et al. 2021; Glaws et al. 2020; Brunton et al. 2020), care should be taken in the choice of low-dimensionalization tools depending on the flows users handle. Hence, the utilization of other well-sophisticated low-dimensionalization tools such as spectral POD (Towne et al. 2018; Abreu et al. 2020), t-distributed stochastic neighbor embedding (Wu et al. 2017), hierarchical AE (Fukami et al. 2020c), and locally linear embedding (Roweis & Lawrence 2000; Ehlert et al. 2019) can be considered to achieve better mappings of flow fields. Imposing constraints on the coordinates extracted via AE is also a candidate to promote the combination of AE modes and conventional flow control tools (Champion et al. 2019a; Jayaraman & Mamun 2020; Ladjal et al. 2019; Gelß et al. 2019). We finally believe that the results of the examples above provide strong motivation for future work and demonstrate the remarkable potential of SINDy for fluid dynamics.

Appendix A. Regression methods for SINDy
As introduced in section 1, the regression method used to obtain the coefficient matrix Ξ is important since the resultant coefficient matrix Ξ may vary with the choice of regression method. Here, let us consider a typical regression problem, where P and Q denote the response variable and the explanatory variable, respectively. In this problem, we aim to obtain the coefficient matrix β representing the relationship between P and Q. Using the linear regression method, an optimized coefficient matrix β can be found by minimizing the error between the left- and right-hand sides such that
$\beta = {\rm argmin}_{\beta}\,||P - Q\beta||_2^2$.
The thresholded least square algorithm (TLSA), originally used in Brunton et al. (2016a) for performing SINDy, is based on this least-squares formulation and obtains a sparse coefficient matrix by substituting zero for the coefficients below the threshold α.
Its algorithm is summarized in algorithm 1: the least-squares problem is solved, the components of β below the threshold α are set to zero, P and Q are changed so that the components which were set to zero in the previous step will be output as zero, and the procedure is repeated until the estimate β no longer changes.

Although the linear regression method has widely been used due to its simplicity, Hoerl & Kennard (1970) cautioned that the estimated coefficients are sometimes large in absolute value for non-orthogonal data. This is known as the overfitting problem. As a considerable surrogate method, they proposed the Ridge regression (Hoerl & Kennard 1970). In this method, the squared value of the components in the coefficient matrix is considered as the constraint of the minimization process such that
$\beta = {\rm argmin}_{\beta}\,||P - Q\beta||_2^2 + \alpha||\beta||_2^2$,
where α denotes the weight of the constraint term. For uses of SINDy, the coefficient matrix β is desired to be sparse, i.e., some components are zero, so as to avoid the construction of a complex model and overfitting. As a sparse regression method, the least absolute shrinkage and selection operator (Lasso) (Tibshirani 1996) was proposed. With Lasso, the sum of the absolute values of the components in the coefficient matrix is incorporated to determine the coefficient matrix:
$\beta = {\rm argmin}_{\beta}\,||P - Q\beta||_2^2 + \alpha||\beta||_1$.
If the sparsity constant α is set to a high value, the estimation error becomes relatively large while the coefficient matrix becomes sparse.
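As a minimal illustration of the thresholding-and-refit loop of the TLSA, the following NumPy sketch applies it to synthetic, hypothetical data (the function name `tlsa` and the data are our own choices for this example, not from the text):

```python
import numpy as np

def tlsa(Q, P, alpha, max_iter=20):
    """Thresholded least squares: zero out small coefficients,
    then refit the surviving terms until the estimate settles."""
    beta = np.linalg.lstsq(Q, P, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(beta) < alpha
        beta[small] = 0.0          # discard terms below the threshold alpha
        keep = ~small
        if not keep.any():
            break
        # Least-squares refit restricted to the surviving terms,
        # so that the zeroed components stay zero.
        beta[keep] = np.linalg.lstsq(Q[:, keep], P, rcond=None)[0]
    return beta

# Synthetic example: P depends on only two of six library columns.
rng = np.random.default_rng(0)
Q = rng.normal(size=(200, 6))
beta_true = np.array([0.0, 2.0, 0.0, -1.5, 0.0, 0.0])
P = Q @ beta_true + 0.01 * rng.normal(size=200)
beta = tlsa(Q, P, alpha=0.5)
```

With a threshold between the noise level and the smallest true coefficient, the sparse support is recovered exactly.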
The Lasso algorithm introduced above has, however, two drawbacks, and there are solutions for each: the elastic net (Enet) and the adaptive Lasso (Alasso), as mentioned in the introduction. The Enet (Zou & Hastie 2005) combines the L1 and L2 errors, i.e., those of Lasso and Ridge. The optimal coefficient β is obtained as
$\beta = {\rm argmin}_{\beta}\,||P - Q\beta||_2^2 + \alpha_1||\beta||_1 + \alpha_2||\beta||_2^2$.
The following two parameters are set for the Enet: the sparsity constant α = α1 + α2 and the sparse ratio α1/(α1 + α2). The property of the sparsity constant α is the same as that of Lasso: a higher α results in a larger error and a sparser estimation. The sparse ratio is the ratio of the L1 and L2 errors, as seen from its definition. A sparse model can be obtained with a high L1 ratio, although the collinearity problem may not be solved.
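For reference, this two-parameter decomposition maps onto scikit-learn's `ElasticNet`, whose `alpha` and `l1_ratio` arguments play the roles of the sparsity constant and the sparse ratio, up to scikit-learn's own scaling conventions for the two penalty terms; the data below are synthetic and illustrative only:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
Q = rng.normal(size=(200, 5))                      # library matrix
beta_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0])   # sparse ground truth
P = Q @ beta_true + 0.01 * rng.normal(size=200)

# alpha ~ sparsity constant alpha1 + alpha2,
# l1_ratio ~ sparse ratio alpha1 / (alpha1 + alpha2).
enet = ElasticNet(alpha=0.05, l1_ratio=0.9, fit_intercept=False, max_iter=10000)
enet.fit(Q, P)
beta = enet.coef_
```

A high `l1_ratio` yields exact zeros in the coefficient vector while the small L2 part tempers collinearity.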
Another solution, the adaptive Lasso (Zou 2006), uses adaptive weights for penalizing the coefficients in the L1 penalty term. In the adaptive Lasso, β is given by
$\beta = {\rm argmin}_{\beta}\,||P - Q\beta||_2^2 + \alpha\sum_j w_j|\beta_j|$,
where $w_j = (|\beta_j|)^{-\delta}$ denotes the adaptive weight with δ > 0. The use of $w_j$ enables us to avoid the issue of multiple local minima. As summarized in algorithm 2, the weights and coefficients are updated repeatedly: the sparsity constant α is set to a certain value, the weighted Lasso problem is solved, the weights are recomputed as $w_j \leftarrow (|\beta_j|)^{-\delta}$, and the iteration continues until the estimate β no longer changes.

In general, it is necessary to choose an appropriate value for the threshold or the sparsity constant α in order to obtain a desired equation which is sparse and in an interpretable form. Hence, we focus on seeking the appropriate α of each regression method in this study. The algorithm for the parameter search applied to the current study is described in algorithm 3, in which SINDy is performed to obtain the ordinary differential equation at each candidate α.
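Returning to the adaptive Lasso itself, the reweighted iteration of algorithm 2 is commonly realized by rescaling the library columns and reusing an ordinary Lasso solver; the sketch below does so with scikit-learn's `Lasso` on synthetic data (the helper name `adaptive_lasso` and the small `eps` safeguard against division by zero are our own choices, not from the text):

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(Q, P, alpha, delta=1.0, n_iter=10, eps=1e-8):
    # Initialize the weights from an ordinary least-squares estimate.
    beta = np.linalg.lstsq(Q, P, rcond=None)[0]
    for _ in range(n_iter):
        w = (np.abs(beta) + eps) ** (-delta)   # adaptive weights w_j = |beta_j|^(-delta)
        # Weighted L1 problem solved by rescaling the columns of Q;
        # solving for beta' = w * beta recovers the adaptive penalty.
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        lasso.fit(Q / w, P)
        beta = lasso.coef_ / w
    return beta

rng = np.random.default_rng(2)
Q = rng.normal(size=(300, 6))
beta_true = np.array([0.0, 1.0, 0.0, 0.0, -0.8, 0.0])
P = Q @ beta_true + 0.01 * rng.normal(size=300)
beta = adaptive_lasso(Q, P, alpha=0.01)
```

Terms that start small receive large weights and are driven exactly to zero, which is why a comparatively low α can still identify the dominant terms.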

Each candidate model is then assessed with an evaluation index, the best model is chosen by the assessment, and the model is thereby identified (procedures 5-8 of algorithm 3). Regarding the assessments used in procedures 5 and 7 of algorithm 3, the total number of terms remaining in the obtained equation is utilized as the evaluation index for the preliminary tests. For the cylinder examples, the total number of terms, the error in the amplitudes of the obtained wave forms, and the error ratio in the period between the obtained wave and the solution are assessed.
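The α sweep of algorithm 3 can be sketched as follows, again on synthetic data, using the number of remaining terms and the fitting error as evaluation indices (the 5% error tolerance and the Lasso solver are arbitrary choices for this illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
Q = rng.normal(size=(300, 8))                      # library matrix Theta(X)
beta_true = np.zeros(8)
beta_true[[1, 4]] = [1.0, -2.0]
P = Q @ beta_true + 0.01 * rng.normal(size=300)    # time-derivative data

candidates = []
for alpha in np.logspace(-4, 0, 20):               # "for all alpha do"
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(Q, P)
    n_terms = int(np.count_nonzero(model.coef_))   # evaluation index: remaining terms
    rel_err = np.linalg.norm(P - Q @ model.coef_) / np.linalg.norm(P)
    candidates.append((alpha, n_terms, rel_err))

# Choose the sparsest candidate whose relative error stays below 5%.
best_alpha, best_terms, _ = min(
    (c for c in candidates if c[2] < 0.05), key=lambda c: c[1])
```

Too small an α keeps every library term, too large an α degrades the fit; the sweep exposes the plateau in between where the sparse true model sits.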
Pre-test 1: van der Pol oscillator
As the first preliminary test, we consider the van der Pol oscillator, written in the form of nonlinear ordinary differential equations:
$\dot{x} = y, \quad \dot{y} = \kappa(1 - x^2)y - x$,
where κ > 0 is a constant parameter and we set κ = 2 in this preliminary test. The trajectory converges to a stable limit cycle determined by κ under any initial state except for the unstable point (x = 0, y = 0), as shown in figure 24.
Let us perform SINDy with each of the four regression methods, i.e., TLSA, Lasso, Enet and Alasso, with various sparsity constants α. Here, we use 20000 discretized points as the training data, sampled within t = [0, 200] with the time step ∆t = 1 × 10^−2, although we will investigate the dependence of the SINDy performance on both the time step and the length of the training data range later. The library matrix is arranged to include terms up to 5th order. In this example, we assess the candidate models by using the total number of terms contained in the obtained ordinary differential equation. As denoted in the equations above, the true model has four non-zero terms in total in the two governing equations. The relationship between the total number of terms in the candidate models and α is shown in figure 25. As shown, the uses of Lasso and Enet cannot reach the true model, whose total number of terms is four. On the other hand, using TLSA or Alasso, the correct governing equation can be identified. It is striking that we should choose the appropriate parameter α for the correct model identification, as shown in figure 25. In this sense, the Alasso is suitable for the problem here since a wide range of α, i.e., 10^−9 ≲ α ≲ 10^−2, can provide the correct equation. The difference here is likely due to the scheme of each regression. The TLSA is unable to discard the components of β with low α values: in other words, many terms remain through the iterative process. On the other hand, the Alasso can identify the dominant terms clearly thanks to the adaptive weights even if α is relatively low.
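The setting of this pre-test can be reproduced compactly; the sketch below integrates the van der Pol system, builds the polynomial library up to 5th order, and applies the thresholded least squares (for simplicity the derivatives are evaluated exactly from the right-hand side rather than by finite differences, and the time range is shortened to t ∈ [0, 50] to keep the example light — both are our own simplifications):

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.integrate import solve_ivp

kappa = 2.0

def vdp(t, s):
    x, y = s
    return [y, kappa * (1.0 - x**2) * y - x]

# Trajectory from (x, y) = (-2, 5) with dt = 1e-2 (time range shortened here).
t = np.arange(0.0, 50.0, 1e-2)
sol = solve_ivp(vdp, (0.0, 50.0), [-2.0, 5.0], t_eval=t, rtol=1e-10, atol=1e-10)
X = sol.y.T
Xdot = np.array([vdp(0.0, s) for s in X])   # exact derivatives at the samples

def library(X, order=5):
    """Polynomial library up to the given order: 1, x, y, x^2, xy, y^2, ..."""
    cols, names = [np.ones(len(X))], ["1"]
    for k in range(1, order + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), k):
            cols.append(np.prod(X[:, idx], axis=1))
            names.append("".join("xy"[i] for i in idx))
    return np.stack(cols, axis=1), names

Theta, names = library(X)

def tlsa(Q, P, alpha=0.5, n_iter=10):
    beta = np.linalg.lstsq(Q, P, rcond=None)[0]
    for _ in range(n_iter):
        beta[np.abs(beta) < alpha] = 0.0
        keep = np.abs(beta) >= alpha
        beta[keep] = np.linalg.lstsq(Q[:, keep], P, rcond=None)[0]
    return beta

# One regression per state variable: the columns of Xi are the two ODEs.
Xi = np.column_stack([tlsa(Theta, Xdot[:, j]) for j in range(2)])
```

With this threshold, the identified coefficient matrix retains exactly the four true terms: y in the first equation, and −x + 2y − 2x²y in the second.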
To investigate the dependence on the time step of the input data, SINDy is applied to the data with different time steps, ∆t = 1 × 10^−2, 0.1 and 0.2. Here, we consider only TLSA and Alasso following the aforementioned results. Note that the number of input data differs with the time step since the integration time length is fixed to t_max = 200 in all cases. We have checked in our preliminary tests that this difference has no significant effect on the results. The trajectory, the results of the parameter search, and examples of the obtained equations are summarized in figure 26. With the fine data, i.e., ∆t = 1 × 10^−2, the governing equations are correctly identified with some α for both methods. The true model can be found even if we use a very small α, i.e., α = 1 × 10^−8 with the Alasso, as mentioned before. With an increased ∆t, i.e., ∆t = 0.1, the coefficients of the terms are slightly underestimated. Additionally, the range of α where the number of terms is correctly identified as four becomes narrower for both methods. With a wider time step, i.e., ∆t = 0.2, the dominant terms can be found only with the Alasso. On the other hand, the true model cannot be obtained with an even wider time step, i.e., ∆t = 0.5, even with the Alasso, due to the temporal coarseness, as shown in figure 26(d). This suggests that data with an appropriate time step are necessary for the construction of the model. In sum, the Alasso is superior in terms of the ability to correctly identify the dominant terms over a wider range of the sparsity constant. Let us then investigate the dependence of the performance on the length of the training data range, as demonstrated in figure 27. We here consider three lengths of the training data range, (a) t = [0, 8], (b) t = [0, 12], and (c) t = [0, 200] (baseline), while keeping the time step ∆t = 1 × 10^−2. As shown, the trends of the relationship between the sparsity constant and the number of terms in the identified equation are similar over the covered time lengths. In addition, the
SINDys with the TLSA (α = 1 × 10^−2) and the Alasso (α = 1 × 10^−3) are able to identify the equations successfully for all time-length cases. Only in the case of t = [0, 8], the model identification is not performed well with the Alasso around α = 1 × 10^−8. The time length thus has little effect on the model identification as long as it covers more than one period.
When considering models in which higher-order terms may be dominant due to higher nonlinearity, it should be valuable to find a way to select the appropriate potential terms that should be included in the library matrix. From this view, the dependence on the number of potential terms is investigated using the data with ∆t = 1 × 10^−2. In the case where the maximum order of the potential terms is 5th, the true model can be found with the two regression methods, i.e., TLSA and Alasso, as shown in figure 26(a). Here, the cases with higher maximum orders, i.e., 15th and 20th, are also investigated, as shown in figure 28. Note that there are 136 and 231 potential terms for each differential equation in these cases. For both cases with different numbers of potential terms, we cannot find the true model with TLSA, while the true model is identified in a wide range of α with the Alasso. Furthermore, in the case including terms up to 15th order, some coefficients in the equation obtained by TLSA become extremely large. These large coefficients arise because different functions included in the library are almost collinear.

Pre-test 2: Lorenz attractor
As the second preliminary test, we consider the problem of identifying a chaotic trajectory called the Lorenz attractor (Lorenz 1963). It is written in the form of nonlinear ordinary differential equations:
$\dot{x} = \sigma(y - x), \quad \dot{y} = x(\rho - z) - y, \quad \dot{z} = xy - \iota z$.
Lorenz (1963) showed that the dynamics governed by the equations above shows chaotic behavior with certain coefficients, i.e., (σ, ρ, ι) = (10, 28, 8/3), and the initial values (x, y, z) = (−8, 8, 27), as shown in figure 29.
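The chaotic training data for this pre-test can be generated directly from the equations above; a short sketch with SciPy (10000 samples at ∆t = 1 × 10^−2 over t ∈ [0, 100], as in the text):

```python
import numpy as np
from scipy.integrate import solve_ivp

sigma, rho, iota = 10.0, 28.0, 8.0 / 3.0      # coefficients from the text

def lorenz(t, s):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - iota * z]

# 10000 discretized points with dt = 1e-2 from t in [0, 100].
t = np.arange(0.0, 100.0, 1e-2)
sol = solve_ivp(lorenz, (0.0, 100.0), [-8.0, 8.0, 27.0],
                t_eval=t, rtol=1e-9, atol=1e-9)
X = sol.y.T                                    # training snapshots, shape (10000, 3)
Xdot = np.array([lorenz(0.0, s) for s in X])   # derivatives for the regression
```

The pair (X, Xdot) then plays the role of the data matrices fed to the library construction and the sparse regression, exactly as in the van der Pol example.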
As with the preliminary test of the van der Pol oscillator, we consider the four regression methods, TLSA, Lasso, Enet and Alasso, as shown in figure 30. Note that 10000 discretized points with ∆t = 1 × 10^−2 from the time range of t = [0, 100] are used as the training data, and the potential terms up to 5th order are considered here. Similar to the van der Pol oscillator, we can obtain the true model, whose total number of terms is seven, with TLSA or Alasso, although Lasso and Enet cannot identify the equation correctly.
Comparing the two methods which can obtain the correct model, the range of α where the governing equations are identified is wider using the Alasso. We also perform SINDy with different time steps, as shown in figure 31. Note that the considered time range is not changed with the time step: in other words, the number of training data changes with the considered time step, which is the same setting as that for the van der Pol example. Similarly to the case of the van der Pol oscillator, we cannot reach the correct model using TLSA unless α is fine-tuned, and the coefficients are underestimated using the Alasso with the wider time step, i.e., ∆t = 2 × 10^−2. Moreover, the dependence on how many orders are considered in the library matrix is investigated in figure 32. There are 286 or 816 potential terms for each differential equation in the cases including terms up to 10th or 15th order, respectively. Analogous to the van der Pol oscillator example, the true model cannot be identified with TLSA; however, we can identify the true equation using the Alasso even if there are as many as 816 potential terms, within an appropriate range of α. Based on these results, TLSA and Alasso are selected as the candidate regressions for the high-dimensional complex flow problems.
In addition, we note that the use of simple data processing (e.g., standardization and normalization) for both the AE and SINDy pipelines may help Lasso and Enet to improve their regression ability, although this highly depends on the problem setting. Such data processing usually limits the model for practical uses because it can only be successfully employed when we can guarantee ergodicity, i.e., that the statistical features of the training data and test data are common with each other (Guastoni et al. 2020b; Morimoto et al. 2020b). When we perform the data processing, an inverse-transformation error through the normalization or standardization is usually introduced since the parameters for the processing, e.g., the minimum value, maximum value, average, and variance, may change between the training and test phases. Hence, toward practical applications, it is better to prepare a regression method which is robust to the amplitudes of data oscillations.

Figure 1. Covered examples of fluid flows in the present study.

Figure 3. Internal procedure of the convolutional neural network. (a) Filter operation with activation function. (b) Max pooling and upsampling.

Figure 5. Latent vector dynamics of a periodic shedding case: (a) time history and (b) trajectory.

Figure 6. Results with TLSA for the periodic cylinder example. Parameter search results, examples of obtained equations, and reproduced trajectories with α = 1 and α = 10 are shown. Color of amplitude plot indicates the total number of terms. In the ordinary differential equations, the latent vector components (r1, r2) are represented by (x, y) for distinctness.

Figure 7. Temporally evolved flow fields of DNS and the reproduced flow field with α = 1 and α = 10. With α = 10, the flow field is frozen after around t = 4 (enhanced using blue portion).

Figure 8. Results with Alasso of a periodic cylinder example. Parameter search results, examples of obtained equations, and reproduced trajectories with α = 2 × 10^−5 and α = 1 × 10^−4 are shown. Color of amplitude plot indicates the total number of terms. In the ordinary differential equations, the latent vector components (r1, r2) are represented by (x, y) for distinctness.

Figure 9. Time history of the latent vector and temporal evolution of the wake of DNS and the reproduced field at (a) t = 1025, (b) t = 1026, (c) t = 1027 and (d) t = 1028.

Figure 10. Evaluation of the reproduced flow field with a periodic shedding. (a) Probability density function, (b) mean streamwise velocity at y = 0, and (c) the energy reconstruction rate (ERR). A simple moving average (SMA) of the ERR is shown for clarity.

Figure 11. Latent vector dynamics of the transient process: (a) time history and (b) trajectory.

Figure 12. Results with TLSA of the transient example. Parameter search results, examples of obtained equations, and reproduced trajectories with α = 7 × 10^−2 and α = 0.7 are shown. Color of amplitude plot indicates the total number of terms. In the ordinary differential equations, the latent vector components (r1, r2) are represented by (x, y) for distinctness.

Figure 13. Results with Alasso of the transient example. Parameter search results, examples of obtained equations, and reproduced trajectories with α = 1.2 × 10^−8 and α = 1.2 × 10^−2 are shown. Color of amplitude plot indicates the total number of terms. In the ordinary differential equations, the latent vector components (r1, r2) are represented by (x, y) for distinctness.

Figure 16. Relationship between the sparsity constant α and the number of terms in identified equations via SINDys with TLSA and Alasso for the two-parallel cylinder wake example.

Figure 19. Reproduced fields with the CNN-SINDy based reduced-order modeling of the two-parallel cylinders example. The case of Alasso with α = 5 × 10^−7 is used for SINDy. DNS flow fields are also shown for comparison.

Figure 20. (a) Pairwise correlations of nine coefficients. (b) Example contours of velocity ux and velocity uy at the midplane. (c) Temporal evolution of amplitudes ai.

Figure 21. SINDy for the nine-equation shear flow model. (a) Schematic of the coefficient matrix β. (b) Relationship between the sparsity constant α and the number of terms with the obtained coefficient matrices.

Figure 22. Comparison of the governing equation for temporal coefficients.
Figure 23. Noise robustness of SINDy for the nine-equation turbulent shear flow example.

Algorithm 3 Parameter search
1: Prepare data X and Ẋ.
2: Prepare the set of sparse parameters α.
3: for all α do
4: Perform SINDy to obtain the ordinary differential equation at a certain α.
5: Assess the candidate model with an evaluation index.
6: end for
7: Choose the best model by the assessment.
8: Model is identified.

Figure 24. Dynamics of the van der Pol oscillator with κ = 2: (a) time history and (b) trajectory on the x−y plane. The initial point is set to (x, y) = (−2, 5) in this example.

Figure 25. Relationship between α and the number of terms, and examples of obtained equations for the four regression methods.

Figure 28. Relationship between α and the number of terms, and examples of obtained equations for different library matrices: (a) including up to 15th-order potential terms and (b) including up to 20th-order potential terms.

Figure 29. Dynamics of the Lorenz attractor: (a) time history and (b) trajectory.

Figure 30. Relationship between α and the total number of terms, with examples of obtained equations.

Figure 31. Trajectory on the x−y plane, the relationship between α and the number of terms, and examples of obtained equations for data with different time steps: (a) ∆t = 1 × 10^−2 and (b) ∆t = 2 × 10^−2.

Figure 32. Relationship between α and the number of terms, and examples of obtained equations for different library matrices: (a) including up to 10th-order potential terms and (b) including up to 15th-order potential terms.