1. Introduction
Harnessing the energy released in nuclear fusion reactions to create a practical energy source has been a long-standing goal of energy research. The prospect of clean energy production with virtually inexhaustible fuel reserves has motivated the efforts to overcome the many engineering challenges and to fill the gaps in our understanding of relevant fundamental processes.
For practical fusion applications hydrogen isotopes, deuterium and tritium, need to be heated to extreme temperatures at which the fuel becomes a high-temperature plasma. The plasma can be thermally insulated and confined by a strong magnetic field, which is why the most advanced fusion concepts rely on magnetic confinement. The most promising is the tokamak: a device with a toroidal geometry featuring closed magnetic flux surfaces traced out by the field lines. The tokamak field geometry reduces the loss of charged particles in the direction perpendicular to the confining field. Experiments have shown that the quality of plasma confinement scales with the confining field strength and that it is determined by mechanisms which drive cross-field transport from the core to the edge and, in this way, drive the energy loss from the system. Thus our capability of putting fusion energy on the grid has been limited both by our ability to build large-scale high-field fusion magnets and by our understanding of cross-field transport mechanisms, in particular of turbulence. The Massachusetts Institute of Technology's SPARC tokamak (Creely et al. 2020), currently under development, will incorporate high-temperature superconductors (HTS) (Hartwig et al. 2020) into large-scale high-field fusion magnets and will realize a tokamak magnetic field strength over a factor of $2$ larger than presently possible. The higher magnetic field will improve confinement and overall performance, but it will also put the reactor into an unexplored parameter range when it comes to turbulence in the plasma edge. For this reason, edge turbulence modelling and its reliable extrapolation to SPARC conditions is critical.
Nonlinear gyrokinetic models (Wang et al. 2006; Brizard & Hahm 2007; Garbet et al. 2010; Cheng et al. 2020) capture the microscopic plasma behaviour and can simulate the plasma dynamics, but at a high computational cost. Computationally less expensive fluid models can, in certain parameter ranges of interest, be applied to model the large-scale plasma dynamics, but at the cost of excluding potentially important microscopic processes. The development of new reduced plasma models that optimally balance physical accuracy and computational complexity is therefore essential for modelling these systems. However, their theoretical development is hindered by the complexity and the nonlinearity of plasma turbulence, where first-principles derivations are difficult and often elusive.
Machine learning and data science tools offer a new approach to developing these models based on data. Valuable results for fusion plasmas have been achieved by using neural networks trained on extensive databases generated either by kinetic codes (van de Plassche et al. 2020) or from experimental data (Churchill et al. 2020). Such neural networks are fast and for this reason useful in practical applications such as routine simulation of tokamak discharge evolution and control. Limitations of the neural-network approach are the need for an extensive training database and the lack of interpretability of the learning algorithms. The extrapolation of the results to scenarios not originally present in the training database presents a challenge, although significant efforts are being made to improve the extrapolation capabilities of neural networks (Raissi, Perdikaris & Karniadakis 2019; Moseley, Markham & Nissen-Meyer 2020; Moseley, Nissen-Meyer & Markham 2020). For these reasons the application of neural networks to complex systems modelling leads to reduced models of limited validity. Other data-driven methods exist which could potentially be applied to turbulence in fusion plasmas. Among others these include: nonlinear Laplacian spectral analysis (Giannakis & Majda 2012), Koopman analysis (Mezic 2005, 2013), dynamic mode decomposition (Rowley et al. 2009; Schmid 2010; Kutz et al. 2016), symbolic regression (Bongard & Lipson 2007; Schmidt & Lipson 2009; Schmidt et al. 2011; Daniels & Nemenman 2015a,b) and sparse regression (Brunton, Proctor & Kutz 2016; Rudy et al. 2017).
In this paper we explore a novel data-driven approach for the discovery of reduced plasma models based on sparse regression (Brunton et al. 2016; Rudy et al. 2017). Our approach also stems from machine learning methods and attempts to infer nonlinear differential equations from synthetic time-series data. Sparse regression in theory selects parsimonious models by balancing model accuracy and complexity, enabling the identification of governing partial differential equations directly from the data. The identified equation terms are the dominant features selected from a matrix which spans the feature space containing all candidate equation terms. Unlike artificial neural networks and some other data-driven model discovery (DDMD) methods, sparse regression does not require large amounts of data. Moreover, the output of this algorithm is an interpretable partial differential equation, which offers insight into the underlying physics and has a greater capability to generalize beyond the training data. In addition, an integral formulation has more recently been proposed that offers robustness to noise (Schaeffer & McCalla 2017; Alves & Fiuza 2020; Reinbold, Gurevich & Grigoriev 2020; Messenger & Bortz 2021), which is crucial for applications to experimental data.
Sparse regression has been shown to successfully recover the fundamental hierarchy of plasma physics models, from the kinetic Vlasov equation to single-fluid magnetohydrodynamics, from first principles (Alves & Fiuza 2020; Kaptanoglu et al. 2021). Here, we extend that work by applying it to resistive drift-wave turbulence simulations based on the Hasegawa–Wakatani (HW) and modified Hasegawa–Wakatani (mHW) models (Hasegawa & Mima 1978; Wakatani & Hasegawa 1984; Hasegawa & Wakatani 1987). These models couple the continuity and vorticity equations in reduced magnetohydrodynamics to describe drift-wave turbulence near the edge of a tokamak plasma. They are important yet simplified models and are therefore often used for benchmarking fluid-based turbulence simulations. The successful application of DDMD to HW and mHW shows the potential of this approach for inferring reduced but accurate descriptions of plasma turbulence which are directly relatable to analytic theory. We use the data generated by numerically solving the HW and mHW equations as input for our DDMD procedure and demonstrate that we can recover the correct set of coupled partial differential equations. To further probe the potential for applications to experimental data, we repeat the procedure at increasing noise levels and lay the groundwork for the development of feature space completeness analysis. A detailed description of our method, as well as of the HW turbulence model, is given in § 2. In § 3 we present our main findings and explore the issue of feature space completeness. The discussion and conclusions are given in §§ 4 and 5.
2. Method
2.1. DDMD and sparse regression
In general, sparse (symbolic) regression is a type of machine learning algorithm which searches a predefined feature space in order to output mathematical expressions which best describe the input data set. The objective is to use sparse regression to infer the underlying partial differential equations (PDEs) from the data. The inference of a PDE is done by regression on a predefined line-up of candidate PDE terms: the feature space. The method used for this work is suited to time-series data with spatio-temporal variation, in particular data sets containing the evolution of the system dynamics in the nonlinear regime. The feature space $\varTheta$ is defined as a matrix in which each column represents a scaled feature: a numerical estimate of a variable, its derivatives, linear combinations of variables and derivatives, or polynomial combinations of terms up to a specified order, evaluated on the data sampled over time. The algorithm discovers both the model structure and the model parameters. In practice, prior knowledge and physical intuition about the system dynamics inform the design of the candidate terms (features) included in the feature space. However, in theory, prior knowledge about the system dynamics is not required. This makes the approach potentially suitable both for model validation and for probing for new underlying physics.
The balance between the discovered model accuracy and complexity is achieved via a Pareto analysis that prevents overfitting. In practice this leads to the optimization favouring models with fewer selected features (equation terms) over ones with more features at the same accuracy. The cost function being minimized is of the following general form:
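(written here in the generic sparsity-promoting form of Brunton et al. 2016, in the integral formulation of Alves & Fiuza 2020, with the $\ell_0$ penalty enforced approximately by sequential thresholding)
$$\xi = \underset{\xi'}{\operatorname{arg\,min}}\; \big\| \langle \partial_t F \rangle - \langle \varTheta \rangle\,\xi' \big\|_2^2 + \lambda\,\|\xi'\|_0, \tag{2.1}$$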
where the variable $F$ is a quantity whose dynamics is governed by a PDE that we seek to discover. The terms of this equation are the features in $\varTheta$ selected by multiplication with the sparsest vector $\xi$, which contains the model parameters (the coefficients in front of the corresponding terms in the PDE). The vector elements of $\xi$ are computed using a thresholded least-squares algorithm (Brunton et al. 2016), while its level of sparsity is obtained by means of a 30-fold cross-validation procedure. The vector $\lambda$ contains $100$ regularization parameters used in the sequentially thresholded least-squares optimization of $\xi$. The values of the regularization parameters are logarithmically increased from $10^{-6}$ up to $1$ to obtain increasingly sparser solutions for $\xi$. Each candidate feature in $\varTheta$ and the time derivative $\partial _t F$ are approximated by central finite differencing and integrated over volumes of the sampled data. The angle brackets denote averaging over the sampled volumes. Therefore the governing PDE is discovered in its integral form; this methodology is described in detail in Alves & Fiuza (2020), where it is applied to a hierarchy of basic plasma physics models.
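To make the optimization step concrete, a minimal sketch of one sequentially thresholded least-squares pass is given below. The function and variable names are illustrative rather than taken from the analysis code used for this work, and the 30-fold cross-validation as well as the volume averaging that produces $\langle \varTheta \rangle$ and $\langle \partial_t F \rangle$ are assumed to be performed outside this routine.

```python
import numpy as np

def stlsq(Theta, dF_dt, threshold, max_iter=10):
    """Sequentially thresholded least squares (illustrative sketch).

    Theta     : (n_samples, n_features) matrix of volume-averaged candidate terms
    dF_dt     : (n_samples,) volume-averaged time derivative of the field F
    threshold : one entry of the regularization vector lambda
    """
    # Initial least-squares fit over the full feature space.
    xi, *_ = np.linalg.lstsq(Theta, dF_dt, rcond=None)
    for _ in range(max_iter):
        small = np.abs(xi) < threshold   # features to prune at this threshold
        xi[small] = 0.0
        retained = ~small
        if not retained.any():
            break
        # Refit the coefficients of the retained features only.
        xi[retained], *_ = np.linalg.lstsq(Theta[:, retained], dF_dt, rcond=None)
    return xi

# Logarithmically spaced thresholds, as described in the text.
lambdas = np.logspace(-6, 0, 100)
# xi_candidates = [stlsq(Theta_avg, dFdt_avg, lam) for lam in lambdas]
```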
2.2. The HW model
In our particular case the input data are generated through direct numerical simulation of resistive drift-wave turbulence according to the two-dimensional (2-D) HW and mHW models (Hasegawa & Mima 1978; Wakatani & Hasegawa 1984; Hasegawa & Wakatani 1987) in the $(x,y)$ plane perpendicular to the magnetic field along the $z$ axis. These models are a good test bed for our model discovery method for two main reasons. Firstly, they are relevant models providing a description of drift-wave turbulence near the tokamak plasma edge. Secondly, they are simple, consisting of two coupled PDEs with a relatively small number of terms, and therefore allow straightforward interpretation and checking of the obtained results. The HW model couples the continuity and vorticity equations in reduced MHD (Wakatani & Hasegawa 1984; Hasegawa & Wakatani 1987; Biskamp & Zeiler 1995)
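In the drift-wave normalization given at the end of this subsection, and with $\omega = \nabla^{2}\phi$, the system may be written (retaining hyperdiffusion terms with coefficients $D_n$ and $D_\omega$ for numerical stability) as
$$\frac{\partial n}{\partial t} + [\phi, n] = \alpha(\phi - n) - \kappa\frac{\partial \phi}{\partial y} - D_n \nabla^{4} n, \tag{2.2}$$
$$\frac{\partial \omega}{\partial t} + [\phi, \omega] = \alpha(\phi - n) - D_\omega \nabla^{4} \omega, \tag{2.3}$$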
where $n$ and $\omega$ represent the electron density and vorticity, respectively, $\phi$ stands for the electrostatic potential and the constants $\alpha$ and $\kappa$ are the electron adiabaticity and the background density gradient driver. The Poisson bracket for two functions of phase space and time, $f$ and $g$, is defined in the usual manner, $[f,g]=(\hat {z} \times \boldsymbol {\nabla } f) \boldsymbol {\cdot } \boldsymbol {\nabla } g$. Similarly, mHW couples the continuity and vorticity equations but captures the electron adiabatic response with the substitutions $n \rightarrow \tilde {n}$ and $\omega \rightarrow \tilde {\omega }$, where the new variables are defined by
and the equations being numerically solved become
Inclusion of the electron adiabatic response enables (for certain parameter values) the simulation of elongated asymmetric vortex modes, so-called zonal flows. The substitutions defined by (2.4) relax the coupling between the electric potential $\phi$ and the density $n$ for these modes, as they are not subject to the coupling arising from the parallel momentum equation. Zonal flows are driven by nonlinear energy transfer from drift waves and represent a self-regulation agent for drift-wave transport and turbulence (Diamond et al. 2005). Zonal flows play an important role in fusion plasmas because they suppress plasma turbulence and thus reduce the transport driven by it, in turn improving plasma confinement.
It should be noted here that the standard drift-wave normalizations (Wakatani & Hasegawa 1984; Biskamp & Zeiler 1995) were used: $\phi \longrightarrow (e\phi /T_e)L_n/\rho _s$, $n \longrightarrow (n/n_0)L_n/\rho _s$, $t \longrightarrow tc_s/L_n$, $\boldsymbol {\nabla }_{\perp } \longrightarrow \rho _s \boldsymbol {\nabla }_{\perp }$ and $\boldsymbol {\nabla }_{\parallel }\longrightarrow L_{\parallel }\boldsymbol {\nabla }_{\parallel }$, where $\rho _s$ is the ion Larmor radius, $n_0$ the background electron density, $e$ the electron charge, $c_s$ the ion acoustic velocity, $\nu _{ei}$ the electron-ion collision rate and $L_n$ the scale length of the density background. The parallel scale length is defined by $L_{\parallel }=(L_n T_e/m_e c_s \nu _{ei})^{1/2}$.
2.3. Simulation set-up
The simulation outputs the changes in plasma density, vorticity and potential in the ($x,y$) plane. The size of the simulated domain was $16{\rm \pi} \times 16{\rm \pi}$. The temporal resolution was varied while keeping the total simulated time $t$ constant and sufficient to allow nonlinear effects to develop and evolve. Once the input data sets are available, the next step is to set up the data sampling routine. In our particular case the data were sampled with varying sampling density both spatially (distribution and number of sampling volumes) and temporally (time-series length). Uniform sampling was applied only in space, while the time sampling was non-uniform/random within the specified time interval. This time sampling enables capturing both fast and long-time-scale features. The sampling is a routine which may require adjustment tailored to the input data structure and the expected underlying physics. For some problems uniform sampling might not be appropriate and one might choose to use threshold-based sampling or targeted sampling restricted to predefined regions of phase space.
The features (the potential terms in the sought-after PDE) need to be chosen and evaluated on the sampled data points. The choice of features to include in the $\varTheta$ matrix relies on physical intuition about the problem at hand. Once the feature matrix is constructed, sequentially thresholded least-squares regression is applied to the sampled data in order to obtain the sparsity vector $\xi$, which reveals the retained features and the numerical values of their corresponding coefficients (for a code structure diagram see figure 1). Plotting the fraction of variance unexplained (FVU) as a function of the number of retained features reveals the discovered PDE. The FVU is in this work defined as the ratio of the model's mean squared error and the variance of the partial time derivative of the dependent variable (density $n$ or vorticity $\omega$). Therefore, for the density equation we have the following definition of FVU:
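(writing the discovered density model as $\partial_t n \approx \varTheta \xi_n$, with $\xi_n$ the corresponding sparsity vector, and with the overline denoting the mean over all sampled points)
$$\mathrm{FVU}_n = \frac{\overline{(\partial_t n - \varTheta\,\xi_n)^2}}{\operatorname{Var}(\partial_t n)}, \tag{2.7}$$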
and similarly for the vorticity we have
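(with $\xi_\omega$ the sparsity vector selected for the vorticity equation)
$$\mathrm{FVU}_\omega = \frac{\overline{(\partial_t \omega - \varTheta\,\xi_\omega)^2}}{\operatorname{Var}(\partial_t \omega)}, \tag{2.8}$$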
A significant reduction in FVU is observed for the optimal number of features necessary for describing the dynamics underlying the input data.
2.4. Structural similarity index
Successful regression requires that the sampled data set contains sufficient change in the system dynamics in the nonlinear regime. To quantify this change in our sample we use the so-called structural similarity index (SSI). The index was originally defined for the purpose of image quality assessment based on the degradation of structural information (Wang et al. 2004). It is calculated for two given images and takes values in the range $[-1,1]$, where the extremes of the range indicate that the two images are identical (${\rm SSI}=1$) or completely different (opposite) (${\rm SSI}=-1$). The SSI for two images is defined by the following expression:
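(reproduced here in the standard form of Wang et al. 2004)
$$\mathrm{SSI}(\boldsymbol{x},\boldsymbol{y}) = [l(\boldsymbol{x},\boldsymbol{y})]^{\alpha}\,[c(\boldsymbol{x},\boldsymbol{y})]^{\beta}\,[s(\boldsymbol{x},\boldsymbol{y})]^{\gamma}, \tag{2.9}$$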
where $\boldsymbol{x}$ and $\boldsymbol{y}$ are two images of size $N \times N$ which are being compared based on the luminance $l$, contrast $c$ and structure $s$ functions weighted by the exponents $\alpha$, $\beta$ and $\gamma$. The luminance function $l(\boldsymbol {x},\boldsymbol {y})$ is given by
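(likewise following Wang et al. 2004)
$$l(\boldsymbol{x},\boldsymbol{y}) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \tag{2.10}$$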
where $\mu$ denotes the average of all pixel values (mean luminance intensity) and the constant $C_1$ prevents instability for denominators approaching $0$. Similarly, the contrast function $c(\boldsymbol {x},\boldsymbol {y})$ is given by
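(with $C_2$ the stabilizing constant mentioned below)
$$c(\boldsymbol{x},\boldsymbol{y}) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \tag{2.11}$$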
where $\sigma$ denotes the standard deviation of all pixel values and $C_2$ is a constant. Finally, the structure function $s(\boldsymbol {x},\boldsymbol {y})$ is given by
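(where $C_3$ is a further small stabilizing constant, as in Wang et al. 2004)
$$s(\boldsymbol{x},\boldsymbol{y}) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \tag{2.12}$$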
where the correlation coefficient $\sigma _{xy}$ is equal to
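(treating each image as a vector of its $N^2$ pixel values)
$$\sigma_{xy} = \frac{1}{N^2-1}\sum_{i=1}^{N^2}(x_i-\mu_x)(y_i-\mu_y), \tag{2.13}$$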
Equations (2.9)–(2.13) are given here for the sake of completeness; for more details on their derivation and the SSI metric we refer the reader to the original reference (Wang et al. 2004). Our data are a series of snapshots of the configuration space which can be converted to a series of images (see figure 2) to which a sequence of SSI values can be attributed.
3. Results
3.1. Data generation and sampling
Simulation data were obtained by running the phw.f90 code on the MIT Engaging cluster. The code solves (2.2) and (2.3) in two dimensions for the density and vorticity. The simulated domain was temporally resolved with $\Delta t = 0.01$ and spatially resolved with $\Delta x = 0.19$ and $\Delta y = 0.19$ on a grid of size $256 \times 256$. The total simulated time allowing for the nonlinear behaviour to develop was $t = 1000$. Snapshots (time frames) of the density, potential and vorticity were collected and stacked together, forming a long time series containing the evolution of the system over the total simulated time. Values of other input parameters used for the simulation data in this work are given in table 1. Prior to building the feature matrix $\varTheta$, the dynamics captured by the data has to be sampled adequately in phase space and in time. Designing the sampling routine is done with the input data structure in mind. In this work the sampling is volumetric and uniform: volumes sampled in phase space are uniformly distributed over the simulation grid. Both the sample grid size and the length of the time series from which the data points are sampled were varied (see figure 4). The results of this sensitivity study are given in § 3.4.
3.2. Constructing the feature space $\varTheta$
Once the data sampling is complete, we proceed to construct the matrix $\varTheta$ containing the possible terms of the sought-after PDE. Each column of the matrix represents a feature evaluated at each sampled data point. For our HW data, the primary features are the variables density $n$, electrostatic potential $\phi$ and vorticity $\omega$. The secondary features we considered are the spatial gradients of the primary features up to fourth order and the polynomial combinations of the primary and secondary features up to $n$th order. In total we considered $120$ features (columns of $\varTheta$) in our analysis. Second-order centred finite differencing on neighbouring sampled data points is used to evaluate the derivative terms. A 30-fold sequentially thresholded least-squares regression is applied to the sampled data, divided into training, test and validation sets, in order to obtain the sparsity vector.
3.3. Results of sparse regression
The 30-fold sequentially thresholded least-squares regression applied to our $\varTheta$ feature matrix yields the accuracy/complexity curves and the sparsest vector $\xi$. The accuracy/complexity curves show the dependence of the FVU (accuracy) on the number of features retained (complexity), while the multiplication of $\varTheta$ and $\xi$, for a given complexity or number of features retained, yields the features with their corresponding coefficients. The last step enables the reconstruction of the PDE system of (2.2) and (2.3). The $\varTheta$ matrix contained all features that were necessary to completely describe the dynamics in the input data. The success or failure of the procedure is indicated by the inflection in the accuracy/complexity curves, presented for the HW data in figure 4. The FVU suddenly drops due to the thresholding of a dominant term; this marks the trade-off between model accuracy and complexity. The pronounced inflection in the accuracy/complexity curves is observed at a model complexity of $5$ terms for the density equation and at a model complexity of $4$ terms for the vorticity equation (see figure 3 and tables 2 and 3). This corresponds to the complete system of HW equations (2.2) and (2.3), given here in their explicit form for completeness
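(written with the Poisson brackets expanded, as the terms appear in $\varTheta$; the hyperdiffusion terms of (2.2) and (2.3) are not among the recovered features, cf. § 4)
$$\frac{\partial n}{\partial t} = \alpha\phi - \alpha n - \frac{\partial \phi}{\partial x}\frac{\partial n}{\partial y} + \frac{\partial \phi}{\partial y}\frac{\partial n}{\partial x} - \kappa\frac{\partial \phi}{\partial y},$$
$$\frac{\partial \omega}{\partial t} = \alpha\phi - \alpha n - \frac{\partial \phi}{\partial x}\frac{\partial \omega}{\partial y} + \frac{\partial \phi}{\partial y}\frac{\partial \omega}{\partial x}.$$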
3.4. Sensitivity study
Once the regression has been shown to successfully recover the complete system of HW equations, a sensitivity study has been designed to test the method's robustness to three main aspects: spatial sampling domain, length of time series considered in the sampling and the noise level. Each of the aspects has been studied independently from the others; for each a reference regression set-up has been fixed and only the parameter directly related to the aspect studied has been varied.
3.4.1. Spatial domain size
Assessing the sensitivity to the spatial sampling domain size entailed reducing the domain size from the optimal, which covered most of the grid, to a size comparable to the characteristic lengths in the data (such as the eddy dimensions) (see figure 4). An increase in the error of the model coefficients as well as a change in model-form variance (a change in the sparsity vector $\xi$) was observed when reducing the spatial sampling domain size (see table 4). Here, the number of sampled points per sampling volume has not been varied, in order to probe solely the effect of the sampling domain size reduction. However, the results can be improved by increasing the number of sampled points per sampling volume until the mean coefficient error saturates (provided that the sampling domain size remains comparable to the characteristic lengths in the data). This would be analogous to the increase in the number of points per volume for the case of clean data (see table 6). More details on this technique are given in the supplementary material of Alves & Fiuza (2020).
3.4.2. Time-series length
Assessing the sensitivity to the sampled time-series length entailed gradually increasing the number of frames to sample from (see table 5) and establishing the minimal number necessary for recovery of the HW system of equations. This minimal length of the sampled time series is determined by the change, or evolution, of the system dynamics contained in it. The quantification of this change is not trivial and, to our knowledge, has not been examined in the available literature on the application of sparse regression to the identification of PDEs. However, it is important for establishing whether a sample is suitable for sparse regression and it is useful in reducing the sample size. The metric proposed and used in this work is the rate of change in the slope of the previously introduced SSI, calculated over samples of differing length. Namely, we set the reference image for the SSI calculation to be the first frame in our data set. We then choose the sampling time-series length (the number of frames) and compare our reference frame with consecutive frames from the series by calculating the SSI. The sequence of SSI values obtained characterizes the system evolution from the reference frame to the last frame in the sample. As expected, the SSI decreases and levels off as the similarity between the reference frame and the consecutive frames diminishes, the system evolving further from its initial state. For the sparse regression to successfully recover the HW system of equations, it proves optimal to set the length of the sample to the length at which the SSI value levels off. To find this value we look at the extremum in the rate of change of the SSI, i.e. the first derivative of the SSI, for samples of different length. The SSI levels off when its first derivative becomes ${\approx }0$ (see figure 5).
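As an illustration of this procedure, a minimal sketch is given below; it uses the scikit-image implementation of the structural similarity index, which may differ in detail from the implementation used for this work, and the array name, tolerance and helper functions are illustrative assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssi_series(frames):
    """SSI of every frame with respect to the first (reference) frame.

    frames : array of shape (n_frames, ny, nx), e.g. density snapshots.
    """
    ref = frames[0]
    data_range = float(frames.max() - frames.min())
    return np.array([ssim(ref, frame, data_range=data_range) for frame in frames])

def truncation_index(ssi_values, tol=1e-3):
    """First frame index at which the SSI curve has levelled off, i.e. where
    the magnitude of its discrete first derivative drops below tol."""
    slope = np.abs(np.gradient(ssi_values))
    flat = np.where(slope < tol)[0]
    return int(flat[0]) if flat.size else len(ssi_values) - 1
```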
3.4.3. Noise sensitivity
A particularly interesting application of sparse regression would be its direct application to experimental data: on the one hand, because of the possibility of discovering reduced turbulence models useful in practical applications such as control, and on the other, because of the possibility of discovering new physics. However, experimental data always contain noise and it is therefore necessary to investigate to what extent the noise corrupts the recovery of the underlying PDEs. To do this, prior to applying the sparse regression, we added Gaussian noise to our simulation data. The noise had a mean $\mu =0$ and a standard deviation defined as a function of the input data: $\sigma = \sigma ({\rm input} \ {\rm data})\cdot p$, where $p$ denotes the percentage noise level. The noise level was increased from $0\,\%$ up to $18\,\%$ while the sampling settings were kept fixed. As the noise level is increased, the model error increases and the inflection in the accuracy/complexity curve is reduced until it vanishes completely, as shown in figure 6. With noise, the model-form variance and the error of the model coefficients increase while the number of recovered features progressively decreases. The model-form variance is observed not to be affected by the size of the integration volumes. However, the model coefficient errors are affected by changes in the integration volumes and can be reduced to a certain extent, but not indefinitely, by increasing their size.
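A minimal sketch of the noise model described above is given below; the function name and seed are illustrative assumptions, not part of the actual analysis script.

```python
import numpy as np

def add_noise(data, p, seed=0):
    """Add zero-mean Gaussian noise whose standard deviation is a fraction p
    (e.g. p = 0.06 for 6 %) of the standard deviation of the input data."""
    rng = np.random.default_rng(seed)
    sigma = p * np.std(data)
    return data + rng.normal(0.0, sigma, size=data.shape)
```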
It is important to note that, in the presence of noise, fewer terms from the governing equations may be recovered. This is illustrated in table 7 for $6\,\%$ noise added to the input data set. As the size of the sampling volumes is increased the algorithm is able to recover progressively more terms. For this example of a relatively low noise level, this eventually leads to the recovery of all features and accompanying coefficients. The dominant features recovered in the example for the smallest sampling volume considered ($5 \times 5 \times 5$) are: $({\partial \phi }/{\partial y})({\partial n}/{\partial x})$, $({\partial \phi }/{\partial x})({\partial n}/{\partial y})$, ${\partial \phi }/{\partial y}$, $({\partial \omega }/{\partial y})({\partial n}/{\partial x})$ and $({\partial \omega }/{\partial x})({\partial n}/{\partial y})$.
3.5. Feature space completeness
This section comments, for the sake of completeness, on the model bias error associated with missing features in the $\varTheta$ matrix (Alves & Fiuza 2020). The spatio-temporal distribution of the model bias error is defined as
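(in the notation of § 2.1, with $\xi_n$ and $\xi_\omega$ the sparsity vectors of the density and vorticity equations)
$$\epsilon_n(x,y,t) = \langle \partial_t n \rangle - \langle \varTheta \rangle\,\xi_n, \qquad \epsilon_\omega(x,y,t) = \langle \partial_t \omega \rangle - \langle \varTheta \rangle\,\xi_\omega,$$
where the angle brackets denote averaging over the sampling volume centred on $(x,y,t)$.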
If $\varTheta$ is complete, i.e. contains all features needed to describe the input data set, the model bias error vanishes for the correct model
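(up to the intrinsic noise in the data)
$$\epsilon_n = \eta_n, \qquad \epsilon_\omega = \eta_\omega,$$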
where $\eta _n$ and $\eta _{\omega }$ represent intrinsic noise.
4. Discussion and outlook
We have shown that the sparse regression (SR) algorithm described in Alves & Fiuza (2020) can be applied to synthetic plasma drift-wave turbulence data to extract the governing PDEs. Two parts of the algorithm are particularly important for successful PDE identification: the sampling and the construction of the $\varTheta$ feature matrix. The sampling used in this work is uniform across the simulation grid, meaning that the sampling volumes are uniformly distributed (as depicted in figure 4), each volume containing a number of data points in phase space and time. Sensitivity to both spatial and temporal sampling region bounds has been studied. For successful SR it suffices to set the spatial sampling domain size to be larger than the characteristic length in the data, which, for turbulence data, is the typical eddy size. Similarly, the sampled time-series length can be related to the characteristic time in the data, in this case the time correlation length. In an attempt to define a systematic approach to choosing the sampling time-series length for successful PDE identification, we proposed the use of the SSI, commonly found in applications involving image quality and data compression. This metric is applicable to synthetic 2-D turbulence data for the quantification of the system evolution over time. Calculating the slope of the SSI over the sampled time series enables the evaluation of a time-series truncation threshold beyond which including more subsequent time frames does not improve the results of SR on the sample. Defining the spatio-temporal bounds for sampling in this way provides a methodology for minimizing the data requirements for successful SR on the sample. In the prospect of applying SR to synthetic data from more complex turbulence simulations, such as full gyrokinetic codes (Wang et al. 2006; Cheng et al. 2020), which produce large data sets in comparison with what has been used in this work, the SSI-based threshold enables one to define the minimal part of a data set from which to sample in preparation for SR.
We recovered all the physical terms of the HW and mHW equations. The terms which were not recovered are the higher-order diffusion terms, namely $D_n \nabla ^{4}n$ and $D_{\omega } \nabla ^{4}\omega$, which are added for numerical stability and have the purpose of dissipating the grid-size eddies. However, one can encounter data sets in which higher-order terms are physical and therefore relevant. We speculate that recovering such terms, a few orders of magnitude smaller than the leading terms, is possible and requires the use of targeted sampling: predominantly sampling from regions of phase space where fine effects are expected to be dominant.
The features in the $\varTheta$ matrix were chosen based on domain knowledge and the matrix contained all the features needed to fully describe the dynamics in the data. In cases where $\varTheta$ does not contain all the dynamical terms for a particular problem, one should expand the feature library until the empirical signature of a successful PDE identification is observed and the spatio-temporal error distribution is minimized. The spatio-temporal error distribution can further be used to assess the importance of discovered terms, as described in § 3.5.
When the sampling and $\varTheta$ construction are complete, SR can be applied to extract the governing equations in their integral form. The integral formulation makes the approach suitable for application to noisy data, as it mitigates numerical differentiation errors through averaging over more sampled points. The HW and mHW models were successfully recovered with mean coefficient errors of ${\approx }2\,\%$ for the clean data sets and ${\approx }8\,\%$ for noisy data sets up to a significant level of added noise (up to $18\,\%$ noise) at the sampled volume size $5 \times 5 \times 5$. Increasing the size of the sampled volumes reduces the effects of noise until the level of irreducible error is reached. At this level, as the noise is increased, SR recovers progressively fewer terms from the governing equations. Nevertheless, the retained terms correspond to the dominant processes underlying the dynamics. This is promising in the prospect of applying the SR methodology directly to experimental data, which inevitably contain noise. An important consideration in this context is the distinction between measurement noise and noise intrinsic to the dynamical variables. The presence of such noise could corrupt the discovered model form by selecting features from $\varTheta$ that characterize the noise rather than solely the physical phenomenon of interest. One approach has been proposed by Fasel et al. (2022) and relies on statistical ensembling to improve robustness and accuracy of SR-based model discovery. In particular, that work relies on bootstrap aggregating to identify ensembles of ordinary and partial differential equations that govern the dynamics from noisy data. Inclusion probabilities for features of $\varTheta$ can be obtained, thus increasing the success rate for discovering the correct model and discarding the noise-related terms. It remains to be investigated whether this approach improves on our integral formulation of SR for plasma turbulence simulations. In addition, future work will focus on applying the SR methodology to more complex turbulence simulations for the development of computationally efficient reduced models. In the first instance, we will apply the methodology to the global drift-ballooning model (Fisher & Rogers 2017; Zhu et al. 2017). This requires expanding the $\varTheta$ matrix with rational-function nonlinearities and their ratios, in order to increase the descriptive capacity of SR for the more complex simulated dynamics and facilitate the application to experimental data. This application is of particular interest for control purposes. There are substantial databases of experimental measurements of turbulent dynamics from fusion devices where the novel method could search for the underlying physical models. What has to be considered when applying this method to experimental data is that some coefficients are not constant over a wide range of plasma parameters. Therefore, when the method is applied to experimental data it can also be used to recover the values of parameter-dependent coefficients. Based on prior knowledge about the data and data availability, we can distinguish among the following situations:
(i) We have spatio-temporal data available in a wide range of plasma parameters:
in this case we intentionally retain a large number of degrees of freedom in $\varTheta$ and apply the methodology to spatio-temporal data in different parameter ranges. This should enable the recovery of both the underlying physical model and the coefficients which depend on plasma parameters.
(ii) We have spatio-temporal data available in a narrow range of plasma parameters:
in this case we need to proceed with caution and keep in mind that the coefficients can depend on the plasma parameters. The methodology in this case is primarily applied to recover the underlying dynamical terms, but the interpretation of the recovered coefficients has to be made based on physical intuition and prior knowledge.
If we decide to assume a physics model describing the experimental data, this assumption can be used to reduce the number of degrees of freedom in our feature space $\varTheta$. The methodology can then be applied to identify the plasma parameters featuring in our assumed model.
Furthermore, we can use our methodology in an attempt to construct a reduced model reconciling the plasma edge and core dynamics, which is still lacking due to the complexity of the collision operator and the asymptotic matching between fluid and gyrokinetic models. In addition, if we could use our method to point out the most important terms in the feature space for turbulence data, the result could inform reduced model selection from the myriad of already established reduced models. It should be pointed out that one can adopt a different approach to searching for the important terms in the feature space for turbulence data. Namely, valuable attempts have been made by using convolutional neural networks such as the one described in Frezat et al. (2021).
5. Conclusion
In summary, we have presented a proof-of-principle study of the application of data-driven integral sparse regression to fusion plasma drift-wave turbulence simulations. We have shown that the methodology can successfully be applied to discover the system of PDEs governing the simulated plasma dynamics from both clean and noisy data. We have also conducted a sensitivity study of the spatio-temporal characteristics of the sampling used for SR and established that the spatio-temporal sample bounds can be defined in relation to the minimal eddy size and correlation length characterizing the data. This approach ensures that the sampling phase-space region contains sufficient system evolution in the nonlinear regime for SR, while at the same time ensuring that one can use a subset of a large data set for the study. We have also discussed the completeness of the feature space $\varTheta$, from which terms of the PDEs are selected by SR, and formulated criteria for $\varTheta$ completeness based on the spatio-temporal error distribution, following the reasoning expressed in Alves & Fiuza (2020).
Acknowledgements
The authors would like to thank Dr B. Zhou and Dr M. Francisquez for allowing their drift-wave turbulence codes to be used for producing input data sets for this study, as well as for their support and suggestions, which have improved this work and helped it reach its present form. A special thank you also goes to Professor Dr N.L. Cardozo for his participation in discussions on the applicability and application of the presented methodology in fusion research. This publication is part of the project Tackling Turbulence in the Innovative MIT Fusion Reactor Concept (project number 019.193EN.033) of the research programme (grant) Rubicon, which is financed by the Dutch Research Council (NWO).
Editor William Dorland thanks the referees for their advice in evaluating this article.
Declaration of interests
The authors report no conflict of interest.