1. Introduction
Runaway electrons (REs) pose one of the leading concerns regarding the integrity and duty cycle of future fusion reactors (Boozer 2018). As RE generation in disruptions is exponentially sensitive to the plasma current, $I_{p}$ (Rosenbluth & Putvinski 1997), and $I_{p}$ is projected to increase from the level of a few MA in large present-day tokamaks to the range of 10–20 MA in power-plant-relevant tokamaks, an unmitigated RE beam at reactor scale would be expected to cause severe damage and extended downtime and repair costs (Boozer 2018; Breizman et al. 2019). Therefore, there is a strong need for validated tools to predict and avoid disruptions and RE beam generation when entering the era of reactor-scale devices.
To address this need, several numerical tools have been developed for disruption and RE analysis, such as the nonlinear magnetohydrodynamic codes JOREK (Huysmans & Czarny 2007) and NIMROD (Sovinec et al. 2004), the kinetic code CQL3D (Harvey et al. 2000) and the fluid-kinetic framework DREAM (Hoppe, Embreus & Fülöp 2021). However, validating these simulation tools with experimental data is complicated, as typically some of the input parameters of the simulations are not well constrained by the available experimental information. In such a situation, the user must first specify values for these uncertain input parameters to calibrate the model to appropriately represent the investigated system. This challenge is common to other numerical tools applied in magnetic confinement fusion research as well, such as scrape-off layer plasma simulations conducted with SOLPS-ITER (Wiesen et al. 2015). A conventional approach to calibrating the model is to use previous experience and domain knowledge to conduct the necessary parameter fitting manually, aiming to find a set of input parameters that minimises the discrepancy between the synthetic and measured diagnostic data. The remaining discrepancy is then evaluated and documented as a degree of validity of the model.
However, this type of expert modeller approach becomes intractable as the number of uncertain parameters increases. With multiple uncertain input parameters, manual optimisation tends to rely on subjective reasoning when selecting a trajectory through the space of optimised parameters. As a result, the degree of confidence in the obtained solution and its uncertainty is likely to remain ambiguous. Furthermore, the ad hoc nature of subjective reasoning is prone to lack scientific rigour, which is problematic from the point of view of establishing objective and reproducible scientific results. These challenges can be alleviated by applying a regimented approach, such as grid search, instead of subjective reasoning, but such an approach is intractable to conduct manually with multiple uncertain input parameters and is computationally inefficient when operating with costly simulations. Due to these inefficiencies of the expert modeller approach, an optimisation algorithm that takes the human out of the loop, provides a rigorous, systematic approach with natural uncertainty quantification and selects samples from the search space in a computationally efficient manner is a very attractive alternative (Brochu, Cora & de Freitas 2010; Shahriari et al. 2016; Frazier 2018).
In this study, approximate Bayesian computation (ABC) and Bayesian optimisation (BO) are applied to find the optimal values and to provide inverse uncertainty quantification of uncertain input parameters (Brochu et al. 2010; Marin et al. 2012; Shahriari et al. 2016; Frazier 2018) in DREAM RE simulations. Inverse uncertainty quantification here refers to quantifying the uncertainty of the uncertain input parameters of the model given observed experimental data (Oberkampf & Trucano 2002; Wu et al. 2018). As the algorithm provides the uncertainty distributions for the free input parameters of the model, the human scientist can focus on analysing the resulting optimal parameters and the degree of validity of the model instead of manually tuning the fitting parameters. The aim of this study is to provide a proof-of-principle approach to using these methods for calibrating the uncertain input parameters in RE simulations, while the methodology could be used broadly in validating other predictive tools within magnetic confinement fusion research. The implementation is based on the BO for likelihood-free inference (BOLFI) method of the Engine for Likelihood-Free Inference (ELFI) Python software package (Gutmann & Corander 2016; Lintusaari et al. 2018).
2. Bayesian approach
This section describes the methodology used in this study. Section 2.1 provides an overview of ABC, § 2.2 an introduction to BO, § 2.3 describes the usage of Gaussian process regression as a probabilistic surrogate model in BO and § 2.4 describes the functionality of the acquisition functions in BO.
2.1. Approximate Bayesian computation
Bayesian inference aims to establish the conditional probability distribution, $P(\boldsymbol {x}|D)$, called the posterior, of the uncertain input parameters, $\boldsymbol {x}$, given observed experimental measurements, $D$. Here, $P(\boldsymbol {x}|D)$ represents the best estimate and uncertainty of the input parameters for the investigated system. Bayesian inference applies Bayes' theorem
$$P(\boldsymbol{x}|D) = \frac{P(D|\boldsymbol{x})\,P(\boldsymbol{x})}{P(D)}, \tag{2.1}$$
which states that the posterior is proportional to the likelihood of $D$ given $\boldsymbol {x}$, $P(D|\boldsymbol {x})$, multiplied by the prior probability distribution for $\boldsymbol {x}$, $P(\boldsymbol {x})$. The marginal probability of the experimental measurements, $P(D)$, represents an integral over all possible data generating input values, $\int P(D|\boldsymbol {x}')P(\boldsymbol {x}')\,{\rm d}\boldsymbol {x}'$, which would be computationally challenging to evaluate. However, it is typically sufficient to establish the relative posterior probabilities of various values of $\boldsymbol {x}$. Therefore, $P(D)$ does not need to be directly evaluated, and it is sufficient to apply the proportionality
$$P(\boldsymbol{x}|D) \propto P(D|\boldsymbol{x})\,P(\boldsymbol{x}). \tag{2.2}$$
When the likelihood function, $P(D|\boldsymbol {x})$, is either not available analytically or cannot be evaluated within the available computational or time resources, the standard alternative is to use ABC (Marin et al. 2012; Järvenpää et al. 2018); ABC aims to establish the approximate Bayesian posterior
$$P_{\text{ABC}}(\boldsymbol{x}|D) \propto P(\boldsymbol{x}) \int H(\epsilon - \varDelta(D, y))\, P(y|\boldsymbol{x})\,{\rm d}y, \tag{2.3}$$
where $y \in \mathbb {R}^d$ are data generated with the simulation model with input parameters $\boldsymbol {x}$, and $\varDelta : \mathbb {R}^d\times \mathbb {R}^d \rightarrow \mathbb {R}_{>0}$ is a discrepancy function between the simulated and measured data. Here, $\epsilon$ represents the threshold parameter controlling the trade-off between posterior estimation accuracy and efficiency, and $H$ is the Heaviside step function, which takes the value 1 whenever $\epsilon$ is greater than the discrepancy and 0 otherwise. Small values of $\epsilon$ lead to more accurate approximate posteriors, but also increase the computational challenge.
One of the simplest ABC algorithms that could be applied to numerically estimate the integral (2.3) is rejection sampling (Marin et al. 2012; Lintusaari et al. 2017; Järvenpää et al. 2018); a minimal code sketch of this sampler is given after the list:
(i) Draw a random sample from $\boldsymbol {x}' \sim P(\boldsymbol {x})$.
(ii) Generate $y' \sim P(y|\boldsymbol {x}')$.
(iii) Accept $y'$ if $\varDelta (D, y') \le \epsilon$.
(iv) Go back to step (i) until a sufficient number of accepted samples has been collected.
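As an illustration, a minimal Python sketch of this rejection sampler is shown below. The prior, simulator and discrepancy functions are generic placeholders (here a toy Gaussian-mean inference problem), not the DREAM set-up used later in this work.

```python
import numpy as np

def rejection_abc(prior_sample, simulate, discrepancy, data, eps, n_accept):
    """Minimal ABC rejection sampler; the accepted samples approximate the posterior."""
    accepted = []
    while len(accepted) < n_accept:
        x_prime = prior_sample()               # step (i): draw from the prior
        y_prime = simulate(x_prime)            # step (ii): run the simulator
        if discrepancy(data, y_prime) <= eps:  # step (iii): accept or reject
            accepted.append(x_prime)
    return np.array(accepted)                  # step (iv): stop after n_accept acceptances

# Toy usage: infer the mean of a Gaussian from the sample mean of the observations.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100).mean()
posterior_samples = rejection_abc(
    prior_sample=lambda: rng.uniform(-5.0, 5.0),
    simulate=lambda x: rng.normal(x, 1.0, size=100).mean(),
    discrepancy=lambda d, y: abs(d - y),
    data=observed, eps=0.1, n_accept=500,
)
```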
The accepted values represent the approximate posterior distribution. The drawback of the standard rejection sampling is the number of required function evaluations. For a typical simulation tool in magnetic confinement fusion, a function evaluation takes at least several minutes and more typically hours or days. Therefore, it is not computationally feasible to collect sufficiently many samples to get an accurate ABC posterior distribution using a rejection sampler. Furthermore, a rejection sampler with a small threshold parameter, $\epsilon$, is computationally very inefficient as a large fraction of the sampled function evaluations are rejected. While it is possible to improve the efficiency by applying approaches such as Markov chain Monte Carlo (MCMC) (Marjoram et al. 2003), the direct sampling based ABC algorithms are still expected to be computationally too costly for the type of applications targeted in this work.
The inefficiency of the direct sampling based ABC approach can be circumvented by applying Bayesian optimisation to traverse the space of optimised input parameters (Gutmann & Corander 2016; Järvenpää et al. 2019). This leads to a probabilistic surrogate model based ABC approach that is observed to be several orders of magnitude more efficient in terms of full function evaluations than the direct sampling based ABC algorithms. At each step of the algorithm, the ABC posterior is estimated using the surrogate model as $P_{\text {ABC}}(\boldsymbol {x}|D) \propto P(\boldsymbol {x}) \mathbb {P}(\varDelta _{\boldsymbol {x}}< \epsilon )$, where the probability is computed using the probabilistic surrogate model.
2.2. Bayesian optimisation
Bayesian optimisation offers a powerful approach for global optimisation of costly-to-evaluate, non-convex functions, without access to first- or second-order derivatives (Brochu et al. 2010; Shahriari et al. 2016; Frazier 2018). The problem of finding optimal values, $\boldsymbol {x_*}$, for the uncertain input parameters, $\boldsymbol {x}$, can be represented as a task of finding the optimum of a nonlinear function $f(\boldsymbol {x})$ on a compact set $\mathcal {A}$, called the search space herein. If $f(\boldsymbol {x})$ represents the discrepancy between the synthetic and measured diagnostic data, then the problem can be formulated as
$$\boldsymbol{x_*} = \underset{\boldsymbol{x} \in \mathcal{A}}{\arg\min}\, f(\boldsymbol{x}). \tag{2.4}$$
The target for a BO algorithm is to traverse the search space efficiently in terms of function evaluations and to find the globally optimal solution by applying prior belief about the optimised function and by balancing exploration and exploitation of the search space (Brochu et al. 2010; Shahriari et al. 2016; Frazier 2018). In exploitation, samples are collected in regions of the search space that are known to lead to near optimal function values based on prior belief, and in exploration, samples are collected in regions that encompass a large uncertainty.
A standard BO algorithm consists of two main components (Brochu et al. 2010; Shahriari et al. 2016; Frazier 2018):
(i) A probabilistic model for the objective function.
(ii) An acquisition function for recommending the next sampling point.
The probabilistic model is essentially a low-evaluation-cost surrogate for the objective function, and the uncertainties retained in the probabilistic model represent the degree of confidence in the surrogate model predictions. The acquisition function applies the mean and variance of the probabilistic model to balance exploitation and exploration. The collected sample values are then used to update the probabilistic surrogate model.
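To make the interplay of these two components concrete, the sketch below runs a minimal BO loop on a toy one-dimensional objective. It uses scikit-learn's GP implementation and a simple lower confidence bound with a fixed trade-off factor for brevity; the work in this paper instead uses GPy and the LCBSC criterion described in § 2.4.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                              # toy stand-in for the costly discrepancy
    return np.sin(3.0 * x) + 0.3 * x ** 2

rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(3, 1))        # initial random samples
y = objective(X).ravel()
candidates = np.linspace(-3.0, 3.0, 400).reshape(-1, 1)

for _ in range(15):
    # (i) Update the probabilistic surrogate model.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-8).fit(X, y)
    mu, std = gp.predict(candidates, return_std=True)
    # (ii) Optimise a simple lower-confidence-bound acquisition function.
    acquisition = mu - 2.0 * std
    x_next = candidates[np.argmin(acquisition)]
    # (iii) Evaluate the costly objective at the recommended point.
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print('best x:', X[np.argmin(y)].item(), 'best value:', y.min())
```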
2.3. Gaussian process regression
The usual choice for the probabilistic model is to use Gaussian process regression (GPR), also known as Kriging (Stein 1999; Rasmussen & Williams 2006, and references therein). Kriging surrogate-based optimisation was previously used for parameter optimisation of plasma transport codes by Rodriguez-Fernandez et al. (2018). Other examples of GPR applications in plasma physics can be found in von Nessi et al. (2012); Li et al. (2013); von Nessi & Hole (2013); Romero & Svensson (2013); Chilenski et al. (2015, 2017); Ho et al. (2019) and references therein.
Gaussian process regression is a Bayesian regression technique and is very powerful for interpolating small sets of data as well as retaining information about the uncertainty of the regression (Stein 1999; Rasmussen & Williams 2006, and references therein). A Gaussian process (GP) is a stochastic process for which any finite collection of random values has a multivariate normal distribution; a GP is specified by the mean function, $m(\boldsymbol {x}) = \mathbb {E}[f(\boldsymbol {x})]$, and the covariance function, $k(\boldsymbol {x},\boldsymbol {x'}) = \mathbb {E}[(f(\boldsymbol {x}) - m(\boldsymbol {x}))(f(\boldsymbol {x'}) - m(\boldsymbol {x}'))]$ (Rasmussen & Williams 2006)
$$f(\boldsymbol{x}) \sim \mathcal{GP}(m(\boldsymbol{x}), k(\boldsymbol{x},\boldsymbol{x}')). \tag{2.5}$$
The GPR represents a family of functions, for which the point-to-point variance is described by the covariance function. Usually, the mean function is assumed to be zero, although other assumptions are possible. This means that, away from the collected data points, the prior expectation of the objective function value is zero. The covariance function or kernel describes the smoothness assumption on the possible functions $f$. The GP essentially describes a normal distribution over possible functions, $f$, conditioned on observations $\{(\boldsymbol {x}_i, f_i),\ i = 1,\ldots,n\}$. Assuming a collection of possibly noisy observations with a Gaussian noise variance $\sigma _n^2$, the posterior probability distribution function of $f$ at point $\boldsymbol {x}$ is Gaussian with posterior mean $\mu _n(\boldsymbol {x})$ and posterior variance $v_n(\boldsymbol {x}) + \sigma _n^2$
$$f(\boldsymbol{x})\,|\,\{(\boldsymbol{x}_i, f_i)\}_{i=1}^{n} \sim \mathcal{N}\left(\mu_n(\boldsymbol{x}),\, v_n(\boldsymbol{x}) + \sigma_n^2\right). \tag{2.6}$$
Assuming a zero mean function, $m(\boldsymbol {x})$, the mean and variance can be obtained as
$$\mu_n(\boldsymbol{x_*}) = \boldsymbol{k_*}^{\rm T}\left(\boldsymbol{K} + \sigma_n^2 \boldsymbol{I}\right)^{-1}\boldsymbol{f}_n, \tag{2.7}$$
$$v_n(\boldsymbol{x_*}) = k(\boldsymbol{x_*}, \boldsymbol{x_*}) - \boldsymbol{k_*}^{\rm T}\left(\boldsymbol{K} + \sigma_n^2 \boldsymbol{I}\right)^{-1}\boldsymbol{k_*}, \tag{2.8}$$
where $\boldsymbol {k_*}$ represents the vector of covariances between the test point, $\boldsymbol {x_*}$, and the $n$ observations, $\boldsymbol {f}_n$ is a vector of the $n$ observations, $\boldsymbol {I}$ is the identity matrix and $\boldsymbol {K}$ is the covariance matrix (Rasmussen & Williams 2006; Gutmann & Corander 2016). Since the function evaluations in this work are deterministic, the $\sigma _n$ term is constrained to a low value that does not impact the predictions. An estimate of the likelihood at $\boldsymbol {x}$ is given by
$$P(D|\boldsymbol{x}) \approx F\left(\frac{\epsilon - \mu_n(\boldsymbol{x})}{\sqrt{v_n(\boldsymbol{x}) + \sigma_n^2}}\right), \tag{2.9}$$
where $F(x)$ is the cumulative distribution function of $\mathcal {N}(0,1)$. In this work, we will set $\epsilon$ equal to the current optimal value provided by the probabilistic surrogate model, which is also the default in BOLFI (Gutmann & Corander Reference Gutmann and Corander2016).
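For reference, the likelihood estimate (2.9) and the corresponding unnormalised approximate posterior are straightforward to evaluate once the surrogate predictions are available; the short sketch below assumes the GP posterior mean, variance and noise variance are given as NumPy arrays.

```python
import numpy as np
from scipy.stats import norm

def abc_likelihood(mu_n, v_n, sigma_n2, eps):
    """Estimate P(D|x) via F((eps - mu_n) / sqrt(v_n + sigma_n2)), cf. (2.9)."""
    return norm.cdf((eps - mu_n) / np.sqrt(v_n + sigma_n2))

def abc_posterior_unnormalised(prior_pdf, mu_n, v_n, sigma_n2, eps):
    """Unnormalised approximate posterior P(x) * P(D|x) on a grid of query points."""
    return prior_pdf * abc_likelihood(mu_n, v_n, sigma_n2, eps)
```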
A key step in building a GP regression model is to select the covariance function or kernel (Rasmussen & Williams 2006). The default choice is usually the radial basis function (RBF), also known as the squared exponential or Gaussian kernel,
$$k(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2 \exp\left(-\sum_{i=1}^{d}\frac{(x_i - x_i')^2}{2 l_i^2}\right), \tag{2.10}$$
where $\boldsymbol {l} = [l_1,\ldots,l_d]$ is a vector of covariance length scales, one for each of the $d$ dimensions, and $\sigma _f^2$ is the variance. In the applications in this work, a single, constant length scale per dimension was often observed to be too restrictive and a rational quadratic (RQ) kernel was used instead,
$$k(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2 \left(1 + \sum_{i=1}^{d}\frac{(x_i - x_i')^2}{2\alpha l_i^2}\right)^{-\alpha}, \tag{2.11}$$
which is equivalent to a sum of many RBF kernels with varying $\boldsymbol {l}$. The hyperparameter $\alpha$ represents the relative weighting between large and small $\boldsymbol {l}$ values. Before applying the model, the hyperparameters ($\boldsymbol {l}$, $\alpha$, $\sigma _f^2$, $\sigma _n$) must be optimised. These can be estimated by maximising the marginal log-likelihood (Rasmussen & Williams 2006). The GPR library used in this work, as well as in BOLFI, is the Python GP framework GPy (GPy, since 2012), which encompasses the applied optimisation routines.
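As an illustration of the surrogate set-up, a minimal GPy sketch with an RQ kernel, bounded hyperparameters and marginal-likelihood optimisation is given below. The toy data and the specific bounds are illustrative examples in the spirit of the constraints described in § 3, not the exact configuration used there.

```python
import numpy as np
import GPy

# Toy training data: 20 samples in a 2-D search space with a scalar discrepancy value.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 2))
Y = np.sum((X - 0.5) ** 2, axis=1, keepdims=True)

# Rational quadratic kernel with an independent length scale per dimension (ARD).
kernel = GPy.kern.RatQuad(input_dim=2, ARD=True)
kernel.lengthscale.constrain_bounded(1e-3, 1.0)   # cap the length scales (cf. § 3.2)
kernel.power.constrain_bounded(1e-10, 0.03)       # cap the RQ power (cf. § 3.3)

model = GPy.models.GPRegression(X, Y, kernel)
model.Gaussian_noise.variance.constrain_bounded(1e-12, 1e-8)  # near-deterministic evaluations
model.optimize()                                  # maximise the marginal log-likelihood

mu, var = model.predict(np.array([[0.4, 0.6]]))   # posterior mean and variance at a test point
```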
2.4. Acquisition function
The acquisition function applies the mean and variance of the probabilistic surrogate model to recommend the next sampling point for the objective function (Brochu et al. 2010; Shahriari et al. 2016; Frazier 2018). Acquisition functions are typically constructed to recommend sampling points that either encompass the optimal predicted mean of the objective function (exploitation) or a large uncertainty (exploration). The sampling point is selected by optimising the acquisition function. Several acquisition functions have been developed and can be found in the reviews by Brochu et al. (2010), Shahriari et al. (2016) and Frazier (2018). The acquisition function used in sequential sampling in this work is the lower confidence bound selection criterion (LCBSC), which is also the default acquisition function in BOLFI (Brochu et al. 2010; Srinivas et al. 2010; Gutmann & Corander 2016, and references therein). This function can be written as
$$\mathcal{A}_n(\boldsymbol{x}) = \mu_n(\boldsymbol{x}) - \sqrt{\eta_n^2\, v_n(\boldsymbol{x})}, \quad \eta_n^2 = 2\log\left(\frac{n^{2d+2}\pi^2}{3\epsilon_\eta}\right), \tag{2.12}$$
where the coefficient $\eta _n$ is the trade-off parameter between exploration and exploitation, $\epsilon _\eta \in (0,1)$ is a constant chosen by the user and $d$ is the dimensionality of the search space. Optimising the acquisition function provides a deterministic answer for the next sampling point. An example of the application of this acquisition function is shown in § 3.2.
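A minimal implementation of (2.12) is sketched below. The exploration-rate expression for $\eta_n^2$ follows the form used in the ELFI implementation of BOLFI, as reconstructed here, and should be checked against the library version in use.

```python
import numpy as np

def lcbsc(mu_n, v_n, n, d, eps_eta=0.2):
    """Lower confidence bound selection criterion, cf. (2.12).

    mu_n, v_n : GP posterior mean and variance at the candidate points
    n         : number of samples collected so far
    d         : dimensionality of the search space
    eps_eta   : user-chosen constant in (0, 1)
    """
    eta2 = 2.0 * np.log(n ** (2 * d + 2) * np.pi ** 2 / (3.0 * eps_eta))
    return mu_n - np.sqrt(eta2 * v_n)

# The next sampling point is the candidate minimising the acquisition value:
# x_next = candidates[np.argmin(lcbsc(mu_n, v_n, n, d))]
```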
Since the next sampling point is obtained deterministically for a given state of the surrogate model, the approach is naturally sequential: (i) the objective function is evaluated for the sampling location provided by the optimum of the acquisition function, (ii) the GPR surrogate model is updated and (iii) the acquisition function is optimised again, using the updated GPR, to recommend the next sampling point. However, with complicated, multi-dimensional optimisation tasks with computationally time-consuming function evaluations, it would be more attractive to conduct several objective function evaluations in parallel to reduce the overall time consumption of the optimisation task, especially when suitable high-performance computing resources are available.
To conduct parallel BO, stochastic acquisition rules can be used. Various approaches for parallel BO and batch acquisition have been developed (Thompson 1933; Chapelle & Li 2011; Järvenpää et al. 2018; Kandasamy et al. 2018; Järvenpää et al. 2019; Palma et al. 2019). In this work, the randmaxvar approach developed by Järvenpää et al. (2019) is used as the stochastic acquisition method. The approach is based on the maxvar acquisition rule, also presented in Järvenpää et al. (2019). The maxvar acquisition method recommends a sample at the location that encompasses the maximum variance of the unnormalised ABC posterior. Essentially, due to the limited information represented by the collected samples, there is uncertainty in the GPR representation of the objective function, and this uncertainty is propagated as uncertainty of the unnormalised ABC posterior. The maxvar method aims to collect samples that lead to the maximum reduction of this uncertainty. In the stochastic version of this method, samples are collected from the distribution that represents the variance of the unnormalised ABC posterior. Since samples are collected stochastically, several samples can be collected without updating the GPR surrogate in between the samples, enabling parallelisation. Furthermore, sampling can be done asynchronously by simply updating the GPR surrogate whenever new results are added to the dictionary of collected samples and by sampling new values whenever an idle processor becomes available.
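As a schematic example, the snippet below sets up a parallel BOLFI run with ELFI on a toy one-parameter problem. The simulator, observed value and bounds are placeholders, and the default LCBSC acquisition with acquisition noise is used here for simplicity; the stochastic randmaxvar rule employed in this work is available in ELFI's acquisition module, but its exact wiring is version dependent and omitted from this sketch.

```python
import numpy as np
import elfi

def simulator(theta, batch_size=1, random_state=None):
    """Toy stand-in for a DREAM run: maps the parameter to a scalar observable."""
    theta = np.atleast_1d(theta).reshape(-1, 1)
    return theta ** 2 + 1.0

observed = np.array([[10.0]])

theta = elfi.Prior('uniform', 1.0, 19.0, name='theta')     # uniform prior on [1, 20]
sim = elfi.Simulator(simulator, theta, observed=observed)
dist = elfi.Distance('euclidean', sim)                     # discrepancy node

elfi.set_client('multiprocessing')                         # evaluate batches in parallel

bolfi = elfi.BOLFI(dist, batch_size=10, initial_evidence=50,
                   update_interval=10, bounds={'theta': (1.0, 20.0)},
                   acq_noise_var=0.1)
posterior = bolfi.fit(n_evidence=290)                      # builds the GP surrogate
result = bolfi.sample(1000)                                # samples the approximate posterior
```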
3. Application to a JET RE experiment
This section describes proof-of-principle applications of the methodology discussed in § 2 for a RE experiment at JET. Section 3.1 describes the investigated JET plasma discharge and the DREAM set-up, § 3.2 documents a one-dimensional (1-D) proof-of-principle optimisation, §§ 3.3, 3.4 and 3.5 extend the search to four- to seven-dimensional search spaces and § 3.6 evaluates the number of samples required for convergence as a function of dimensionality of the search.
3.1. Simulated experiment and DREAM set-up
The BO and ABC approach discussed in the previous section is applied to DREAM RE simulations of the current quench (CQ) in the disruption of JET discharge $\#$95135. This was a deuterium limiter plasma with an argon massive gas injection induced disruption, described in detail by Reux et al. (2021) and Brandström (2021). DREAM is a numerical tool for self-consistently simulating the evolution of temperature, poloidal flux and impurity densities, along with the generation and transport of REs in tokamak disruptions (Hoppe, Embreus & Fülöp 2021). The DREAM simulations in this manuscript are similar to those presented by Hoppe, Hesslow et al. (2021) and Brandström (2021), with the exception that only a fluid model for REs is used here, as kinetic simulations were not necessary for the proof-of-principle of the Bayesian approach. For a full description of the physics model in DREAM, we refer the reader to Hoppe, Embreus & Fülöp (2021).
The simulations are started at the peak of the total plasma current, $I_{p}$, obtained during the disruption. An instantaneous thermal quench (TQ) is assumed, after which all background plasma quantities, except the electron temperature, $T_{e}$, are evolved self-consistently. The post-disruption $T_{e}$ is instead given as an uncertain input parameter. The background plasma density is obtained from a pre-disruption measurement with the high-resolution Thomson scattering system (Pasqualotto et al. 2004; Frassinetti et al. 2012). Even though the uncertainty of the electron density measurement could be taken into account in the Bayesian approach, for simplicity, we neglect it here. The argon density in the plasma is obtained from the estimated amount of injected argon, the volume of the vessel and the fraction of argon that is assimilated. Argon is assumed to be uniformly distributed in the plasma. While the injected amount can be obtained from the experiment, $N_\text {Ar} \sim 8\times 10^{20}$ atoms (Brandström 2021), the assimilated fraction is given as an uncertain input parameter, $f_\text {Ar}$.
In our simulations, the initial total plasma current, $I_{p}$ (a combination of ohmic and prescribed runaway seed current), is constrained to match the experimentally measured peak value, $I_{p} \sim 1.42$ MA, by adjusting the initial electric field, which in turn constrains the avalanche multiplication factor. The electric field is evolved self-consistently during the CQ and RE plateau simulations. The conductivity of the wall is controlled by a characteristic wall time, $\tau _\text {wall} = L_\text {ext}/R_\text {wall}$, that is provided as an uncertain input parameter. Here, $L_\text {ext}$ is the external inductance and $R_\text {wall}$ the resistance of the wall. The full list of uncertain input parameters is $T_{e}$, $f_\text {Ar}$, the RE seed distribution and $\tau _\text {wall}$. In this study, these parameters are allowed to vary independently from each other, while in reality $T_{e}$ and $f_\text {Ar}$ would be closely linked due to the strong radiation losses from argon. It is acknowledged that this risks giving the system the capability to overfit the model by applying mutually inconsistent input parameters. However, this is considered acceptable in this proof-of-principle study, and further constraints on the degrees of freedom will be investigated in follow-up studies. In addition, the RE seed distribution is scaled with a multiplier such that the RE plateau current matches the experimentally measured value. This is done iteratively with a binary search algorithm. Runaway electrons are generated through a combination of the Dreicer, hot-tail and avalanche mechanisms. Due to the presence of magnetic perturbations during the TQ, a part of the seed runaways is transported out of the plasma. After the thermal quench, when the conductivity is very low, the plasma current will be carried essentially entirely by the runaways. The final plateau runaway current profile is determined by the surviving post-TQ runaway seed and the avalanche multiplication factor. The latter is a robust quantity and depends on the initial and final plasma currents (or, more precisely, the change in the poloidal flux), see, e.g., Boozer (2015) and Hesslow et al. (2019). Since the magnitude of the runaway seed current is uniquely determined by the runaway plateau current and the avalanche multiplication factor, it does not need to be added to the search space of the BO algorithm; the seed profile shape, however, must be included, since it cannot be similarly directly inferred from the experiment. To confirm this, a test was conducted in which the seed scale was included in the search space of the BO algorithm in the 7-D search (§ 3.4), leading to essentially the same result as obtained with the binary search. The binary search algorithm is initialised by multiplying or dividing the multiplier by $10^{3}$ at each iteration until the predicted plateau current passes the experimentally measured value, after which binary search is applied to converge to the optimum. If the multiplier is reduced below a small value, ${\sim }10^{-3}$, the search algorithm stops without finding a solution that matches the RE plateau current, and the objective function value at the point of reaching that threshold is propagated through the algorithm. This is simply due to the fact that it cannot be assumed that, for all input values within the search space, it is possible to find a multiplier that would enable matching the RE plateau current.
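A sketch of the seed-multiplier search described above is given below; `plateau_current` is a hypothetical placeholder for a function that runs DREAM with the RE seed scaled by the given multiplier and returns the resulting plateau current.

```python
def find_seed_multiplier(plateau_current, target, m0=1.0, m_min=1e-3, rel_tol=1e-3):
    """Find a RE seed multiplier whose plateau current matches the target value.

    Returns None if the multiplier drops below m_min without reaching the target,
    in which case no matching solution exists for these input parameters (cf. text).
    """
    m = m0
    too_low = plateau_current(m) < target
    # Multiply or divide by 1e3 until the predicted plateau current passes the target.
    while (plateau_current(m) < target) == too_low:
        m = m * 1e3 if too_low else m / 1e3
        if m < m_min:
            return None
    # Bracketing multipliers: 'lo' underestimates and 'hi' overestimates the target.
    lo, hi = (m / 1e3, m) if too_low else (m, m * 1e3)
    # Standard binary search (bisection) between the bracketing multipliers.
    while (hi - lo) / hi > rel_tol:
        mid = 0.5 * (lo + hi)
        if plateau_current(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```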
In the proof-of-principle examples that follow, the objective function is chosen as the L1-norm of the discrepancy between the measured and predicted $I_{p}$ during the CQ (figure 1)
$$\varDelta = \int_{0}^{25\,\text{ms}} \left| I_{{p},\text{sim}}(t) - I_{{p},\text{exp}}(t) \right| {\rm d}t. \tag{3.1}$$
Effectively, this discrepancy function calculates the area between the two curves in figure 1. To avoid the discrepancy being dominated by the long RE plateau, it is only calculated between 0 and 25 ms from the current peak.
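A sketch of this discrepancy evaluation, assuming that the measured and simulated currents are available on a common time base $t$ (in seconds, with $t = 0$ at the current peak), is:

```python
import numpy as np

def l1_discrepancy(t, ip_sim, ip_exp, t_end=0.025):
    """Time-integrated |I_p,sim - I_p,exp| from the current peak to t_end, cf. (3.1)."""
    mask = (t >= 0.0) & (t <= t_end)
    return np.trapz(np.abs(ip_sim[mask] - ip_exp[mask]), t[mask])
```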
The chosen proof-of-principle case is relatively simple to facilitate exploration of the methodology. With the released Python code, this can also provide a low threshold environment for the scientific community to become familiar with this methodology. A follow-up study will apply these methods to a more complicated validation exercise with a significantly higher computational overhead. The BO approach can also naturally extend to multiple objectives and this will be explored in follow-up studies.
3.2. One-dimensional search space
The first proof-of-principle test is to apply the Bayesian approach to search for the post-TQ $T_{e}$ that minimises the discrepancy function (3.1). The other uncertain input parameters are fixed as $f_\text {Ar} = 15\,\%$, a uniform RE seed profile and $\tau _\text {wall} = 5$ ms. An RQ kernel is used and the length scale is constrained to be below 1.0. This is done to avoid the model becoming overconfident in regions that have not been sampled yet. If the length scale converges to a large value, it can suppress exploration prematurely and prevent the algorithm from finding the optimal solution. More generally, it seems that, in BO algorithms, it is probably better to have a model that has the capability to overfit rather than underfit the data. The BO algorithm is interpolating solutions within the search space, and an overfitting model will simply encourage exploration, while an underfitting model might not have the generalisation capability to fit the solution near the optimum. The acquisition function is the LCBSC (2.12) with exploration constant $\epsilon _\eta = 0.2$. The search space for $T_{e}$ is bounded between 1.0 and 20 eV. A uniform prior uncertainty distribution is assumed.
Within less than 10 iterations, the algorithm starts to converge to the optimum value (figure 2). The first three samples are collected randomly from the uniform prior distribution. After these are collected, the GPR surrogate model is fitted to the data (figure 2). The mean and variance of the GPR are applied by the acquisition function to recommend the next sampling location. The algorithm proceeds by choosing the location of the minimum value of the acquisition function. By proceeding like this, the algorithm converges near the optimum value with a narrow confidence interval by around sample number 7 to 8 (figure 3a). Since the prior distribution is uniform, the posterior probability distribution can be obtained from the GPR by applying (2.9) (figure 3b). Finally, the simulated current with the temperature value providing the highest posterior probability, $T_{e} \sim 5.84$ eV, can be compared with the experimental measurements (figure 3c). The results indicate that, by only adjusting a constant post-TQ $T_{e}$, the model is not able to reproduce the experimentally measured CQ rate. Put differently, these results indicate that the background plasma resistivity is changing during the CQ.
3.3. Five-dimensional search space
After the 1-D proof-of-principle, the next step is to extend the search space to include the other uncertain parameters as well. Since the RE seed is a 1-D profile, it is parameterised as the probability distribution function of the gamma distribution
$$f(r; \alpha, \beta) = \frac{\beta^{\alpha} r^{\alpha-1}\, {\rm e}^{-\beta r}}{\varGamma(\alpha)}, \tag{3.2}$$
where $r$ is the radial coordinate, $\alpha$ and $\beta$ are free parameters and $\varGamma (\alpha )$ is the gamma function (figure 4). The intention is to provide a general parameterisation that constrains the possible RE seed profile shapes as little as possible. The search spaces for both $\alpha$ and $\beta$ are set as uniform distributions between 0.001 and 10. Ten randomly sampled RE seed profile shapes are shown in figure 4. It can be clearly seen that this parameterisation allows flexible representations of profiles peaking at the centre, middle or edge of the plasma. The search space for the argon assimilation fraction is set as uniform between 0.001 % and 100 %. The search space for the characteristic wall time is set such that $\ln (\tau _\text {wall}/1\,\mathrm {ms})$ is sampled uniformly between 0 and 7. As a result, the $\tau _\text {wall}$ values range between approximately 1 and 1100 ms. This allows sampling of very large $\tau _\text {wall}$ values exceeding 1.0 s, while encouraging the collection of samples at low values.
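For illustration, seed-profile shapes of the kind shown in figure 4 can be generated with scipy as in the sketch below, where the radial grid and the use of the normalised radius as the gamma-distribution argument are assumptions of this illustration.

```python
import numpy as np
from scipy.stats import gamma

def seed_profile(r, alpha, beta):
    """Gamma-distribution pdf used as the RE seed profile shape, cf. (3.2)."""
    return gamma.pdf(r, a=alpha, scale=1.0 / beta)

r = np.linspace(1e-3, 1.0, 100)   # normalised radius grid (illustrative)
rng = np.random.default_rng(2)
profiles = [seed_profile(r, rng.uniform(1e-3, 10.0), rng.uniform(1e-3, 10.0))
            for _ in range(10)]   # ten randomly sampled shapes, as in figure 4
```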
Due to the expanded volume of the search space, more samples are expected to be needed than in the 1-D example. Therefore, the randmaxvar acquisition function was used with a batch size of 10 samples conducted in parallel. The first 50 samples were collected by sampling the search space randomly, after which the acquisition function was used to recommend sampling locations.
Similar to the 1-D search, an RQ kernel is used. The kernel parameters are restricted such that the power parameter in the RQ kernel is constrained to be between $10^{-10}$ and 0.03. The length scale constraints are altered for every batch of 10 samples. For even-round batches, the length scales are constrained to be positive and manually initialised as the extent of the search domain in each dimension divided by the number of collected samples. After this preconditioning step, GP optimisation is conducted. There is no maximum length scale set during the even rounds. During the odd-round batches, the length scales are constrained to be below 1 for the $T_{e}$ and $f_\text {Ar}$ dimensions, below 0.5 for the $\alpha$ and $\beta$ dimensions and below 0.1 for the $\ln (\tau _\text {wall})$ dimension. The lower limits for the length scales are set to $10^{-3}$. The even rounds essentially perform automatic relevance determination, obtaining very long length scales for dimensions that do not show a significant impact on the objective function value and thereby guiding the search to prioritise optimisation of input parameters that have a more significant impact on the objective function value. However, this approach alone would risk the surrogate model becoming overly confident early in the search and stopping exploration of input parameters deemed unimportant. To counteract this risk, the odd rounds apply restrictions on the length scales, so that the algorithm continues to explore input parameter values that would otherwise simply be interpolated or extrapolated over by the long-length-scale surrogate model.
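The even/odd alternation of the length-scale constraints can be expressed compactly as in the sketch below. The bounds follow the values quoted above, while the GPy calls for per-dimension constraints are an assumption about the library interface and may need adjusting for the installed version.

```python
import numpy as np
import GPy

def build_kernel(batch_index, X, domain_widths,
                 max_ls=(1.0, 1.0, 0.5, 0.5, 0.1)):
    """RQ kernel whose length-scale constraints alternate between batch rounds.

    domain_widths : extent of the search domain in each dimension
    max_ls        : odd-round length-scale caps for (T_e, f_Ar, alpha, beta, ln(tau_wall))
    """
    d = X.shape[1]
    kern = GPy.kern.RatQuad(input_dim=d, ARD=True)
    kern.power.constrain_bounded(1e-10, 0.03)
    if batch_index % 2 == 0:
        # Even rounds: positive, unbounded length scales initialised to
        # (domain width) / (number of collected samples) -> automatic relevance determination.
        kern.lengthscale.constrain_positive()
        kern.lengthscale[:] = np.asarray(domain_widths) / max(len(X), 1)
    else:
        # Odd rounds: cap the length scales per dimension to keep exploring.
        for i in range(d):
            kern.lengthscale[[i]].constrain_bounded(1e-3, max_ls[i])
    return kern
```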
The sampling algorithm finds clear optima in the search space for the background plasma temperature and characteristic wall time (figure 5). It can be observed that there is a global optimum at $T_{e}$ around 5–7 eV and $\tau _\text {wall}$ less than approximately 2.7 ms ($\ln (\tau _\text {wall}/\text {1 ms}) < 1.0$), and a local optimum at $T_{e}$ approximately 15–20 eV and $\tau _\text {wall}$ larger than approximately 50 ms ($\ln (\tau _\text {wall}/\text {1 ms}) > 4.0$) (figure 5, red and black ellipses). The two solution branches can also be observed in the plot as a function of the argon assimilation fraction, such that the low-temperature solution branch reaches its optimum at values above 20 %, while the higher-temperature solution branch does so at lower values, between 5 % and 30 % (figure 5c). The shape of the RE seed distribution does not seem to impact the discrepancy significantly (figure 5d,e).
After 290 samples, the algorithm estimates the global optimum to be $T_{e} \approx 6.1$ eV, $f_\text {Ar} \approx 54.2\,\%$, $\tau _\text {wall} \approx 1.1$ ms, $\alpha \approx 7.9$ and $\beta \approx 4.3$, where the 95 % confidence intervals for $f_\text {Ar}$, $\alpha$ and $\beta$ span most of the search space (figure 6). It should be noted that the problem set-up, where the assimilated argon fraction is kept independent from the temperature, is likely to impact this result, and a general conclusion about the insensitivity of the plasma current to $f_\text {Ar}$ should not be drawn here. To further investigate this, an $f_\text {Ar}$ scan was conducted, while keeping all the other parameters at their inferred optimal values. It is observed that, at these low temperatures, scanning $f_\text {Ar}$ from 40 % to 100 % only increases the average $Z_\text {eff}$ in the plasma from approximately 1.5 to 1.9. While this does increase the CQ rate, the impact is sufficiently small that the optimisation algorithm is able to counteract it by small changes of $T_{e}$ and $\tau _\text {wall}$. The local uncertainties for the global optimum can be obtained by evaluating the local properties of the approximate posterior. The approximate posterior can be extracted from the probabilistic surrogate model by applying (2.9). While a global ABC posterior can be obtained through, for example, MCMC sampling, the local inverse uncertainty near the global optimum is likely of more practical interest in fusion energy research and can be obtained directly from the GPR representation of the posterior. This type of analysis was done by evaluating the 1-D posterior distribution along each search dimension through the global optimum, which can be integrated to obtain confidence intervals (figure 6). It should be noted that this analysis does not take into account the secondary optimum at higher temperatures, as that would require nonlinear analysis of the approximate posterior, while MCMC sampling of the posterior would collect some distribution in that area as well. However, since the multimodality of the optimised function can already be observed from the discrepancy plot (figure 5), it is considered more important to obtain local uncertainty estimates near the global optimum than to sample the full approximate posterior. For completeness, MCMC sampling was conducted for the approximate posterior, and the resulting distributions for $T_{e}$ and $\ln (\tau _\text {wall})$ show the global optimum, also visible in figure 6, as well as the secondary local optimum at higher temperatures (figure 7).
Analysing the convergence of the search, it can be observed that, after approximately 80–110 samples, the posterior distributions for $T_{e}$ and $\ln (\tau _\text {wall})$ have converged (figure 6a,b). For the other parameters, the 95 % confidence intervals remain large, indicating that the discrepancy value is not very sensitive to these input parameters, as was already observed in the discrepancy plot (figure 5). The step observed at 200 samples highlights the stochastic nature of the GPR surrogate model fitting. Depending on the initial conditions, the optimisation algorithm might find somewhat different hyperparameters for the GPR, leading to a different shape of the approximate posterior and a shift of the optimum and of the boundaries of the confidence interval as well. However, the large steps of the optimum for $f_\text {Ar}$, as well as the large steps at other sample numbers for $\alpha$ and $\beta$, are a result of the relatively flat posterior distribution shapes, while the posterior shapes for $T_{e}$ and $\ln (\tau _\text {wall})$ do not change significantly.
Finally, the predicted and measured total plasma currents are compared for the global optimum extracted by the algorithm (figure 6f). Comparing the result with the optimum case in the 1-D search space example, it can be clearly seen that, with a 5-D search space, the algorithm is able to obtain a significantly better fit to the experimentally measured current (figures 3c and 6f). In the 1-D example, the current was decreasing significantly faster than experimentally measured during the early part of the CQ and the end of the CQ happened several ms later than experimentally measured, such that the average L1-norm discrepancy was minimised while the rate of change of the plasma current was poorly matched (figure 3c). In the 5-D example, on the other hand, the initial drop is also faster than measured experimentally, but soon the rate of change of the plasma current matches the experimentally measured rate of change, such that the end of the CQ happens near the experimentally measured end of the CQ when the L1-norm between the two currents is minimised (figure 6f). The fact that the current decreases faster than experimentally measured during the early part of the CQ indicates that a plasma resistance that matches the plasma current evolution on average probably overestimates the resistance during the early part of the CQ. To address this, the final proof-of-principle test conducted in this manuscript is to allow a linear variation of $T_{e}$ during the simulation by extending the search space to seven dimensions.
3.4. Seven-dimensional search space
As a next extension of the search, a parameterised variation of the background plasma $T_e$ during the simulation is allowed. A linear variation of $T_e$ is assumed, such that the search space is extended to seven dimensions by adding the final temperature, $T_{e, \text {final}}$, and the time at which $T_{e,\text {final}}$ is reached, $t_\text {final}$. After $t_\text {final}$ is reached, $T_{e}$ is assumed to stay constant. The same search space is used for the initial and final $T_{e}$, and the search space for $t_\text {final}$ is set as uniform between 1 and 44 ms.
The RQ kernel is used in the GPR. The power of the kernel is restricted similarly to the set-up in the 5-D search. Similar to the 5-D search, the length scale restrictions are altered between even- and odd-round batches. The batch size was set to 50. For the odd-round batches, the length scale constraints are similar to those in the 5-D search, with the same length scale constraint applied for the initial and final $T_e$. For $t_\text {final}$, the minimum length scale is set to $10^{-3}$ ms and the maximum to 1 ms.
After 950 samples, the local 95 % confidence intervals around the optimum point, recommended by the algorithm, are $T_{e, \text {initial}} \in [8.5, 11.7]$ eV, $T_{e, \text {final}} \in [3.3, 5.8]$ eV, $t_\text {final} \in [11, 19]$ ms, $\tau _\text {wall} \in [1.1, 2.1]$ ms, $f_\text {Ar} \in [48, 97]\,\%$, $\alpha \in [0.2, 9.4]$ and $\beta \in [0.5, 9.7]$ (figure 8). The optimum point is $T_{e, \text {initial}} \approx 9.5$ eV, $T_{e, \text {final}} \approx 4.3$ eV, $t_\text {final} \approx 14$ ms, $\tau _\text {wall} \approx 1.4$ ms, $f_\text {Ar} \approx 76\,\%$, $\alpha \approx 4.3$ and $\beta \approx 2.2$. With these input parameters, the predicted CQ rate is very close to the measured values (figure 8f). Initially, the rate is faster than measured and also the transition to runaway plateau in the simulation occurs approximately 1 ms earlier than measured (figure 8f). However, between 3 and 12 ms, the predicted current is nearly exactly on top of the measured current.
3.5. Constraining $\tau _\text {wall}$ to 5 ms
In both the 5-D and 7-D searches, the algorithm found an optimum $\tau _\text {wall}$ around 1.1–1.4 ms. This result was somewhat surprising, as the conventional prior expectation would have suggested higher values in the range of 5–10 ms. Since the optimisation algorithm finds the set of parameters that minimises the discrepancy, it is possible that the algorithm compensates for missing physics in the model by reducing $\tau _\text {wall}$ below values that would actually be realistic. As a final test, further 4-D and 6-D optimisations were conducted, in which $\tau _\text {wall}$ was fixed to 5 ms.
In the 4-D search, the optimum point recommended by the algorithm is $T_{e} \approx 7.0$ eV, $f_\text {Ar} \approx 52\,\%$, $\alpha \approx 4.6$ and $\beta \approx 5.5$ (figure 9a). The local 95 % confidence intervals are $T_{e} \in [6.4, 8.0]$ eV and $f_\text {Ar} \in [32, 91]\,\%$, $\alpha \in [0.3, 9.6]$ and $\beta \in [0.3, 9.7]$. In the 6-D search, the optimum point recommended by the algorithm is $T_{e, \text {initial}} \approx 16.2$ eV, $T_{e, \text {final}} \approx 5.9$ eV, $t_\text {final} \approx 10$ ms, $f_\text {Ar} \approx 82\,\%$, $\alpha \approx 1.3$ and $\beta \approx 6.1$ (figure 9b). The local 95 % confidence intervals are $T_{e, \text {initial}} \in [13.9, 19]$ eV, $T_{e, \text {final}} \in [5.1, 6.7]$ eV, $t_\text {final} \in [8, 12]$ ms, $f_\text {Ar} \in [48, 98]\,\%$ and $\alpha \in [0.1, 8.8]$, $\beta \in [0.5, 9.7]$.
As $f_\text {Ar}$, $\alpha$ and $\beta$ do not impact the discrepancy significantly, the best match obtained by the 4-D search seems very similar to the best match obtained by the 1-D search (figures 3c and 9a). When allowing linearly varying background plasma $T_{e}$, the algorithm is able to find a solution that matches the experimentally measured plasma current nearly as well as in the 7-D search (figures 8f and 9b). However, when fixing $\tau _\text {wall} = 5$ ms, the recommended initial $T_{e}$ is increased from 9.5 to 16.2 eV, and $t_\text {final}$ reduced from 14 ms to 10 ms, highlighting the nonlinear dependencies between the optimal input parameters.
3.6. Convergence as a function of dimensions
The computational challenge of the optimisation task increases with the number of dimensions of the search space. Figure 10 illustrates the discrepancy as a function of sample number for the 4-D, 5-D, 6-D and 7-D search tasks in this work. Evaluating the convergence based on the sample number after which the minimum discrepancy saturates, the 4-D search converges after approximately 40 samples, the 5-D search after approximately 80 samples, the 6-D search after approximately 250 samples and the 7-D search after approximately 300 samples. Beyond this point, increasing the number of samples reduces the uncertainty of the posterior distribution while the minimum discrepancy is not reduced any further.
Compared with a grid search of eight samples for each dimension, the BO algorithm is very efficient (figure 11). Beyond four dimensions, the grid search would require over 10 000 samples and soon become intractable. The Bayesian approach, on the other hand, obtains samples near the minimum discrepancy after a few hundred samples even in the case of the seven-dimensional search space.
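For concreteness, with eight samples per dimension the implied grid sizes are
$$8^{4} = 4096, \qquad 8^{5} = 32\,768, \qquad 8^{6} = 262\,144, \qquad 8^{7} = 2\,097\,152,$$
to be compared with the approximately 40, 80, 250 and 300 samples at which the corresponding BO searches were observed to converge above.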
4. Summary
A Bayesian approach has been explored for the validation of RE simulations. Many of the simulation tools applied in fusion energy research require the user to specify several input parameters that are not constrained by the available experimental information. Bayesian inference algorithms offer a promising approach to determine these free parameters with uncertainty quantification and are less subject to user bias than approaches based on manual parameter calibration. The main challenge in using an algorithmic approach to parameter calibration is the computational cost of simulating enough samples to construct the posterior distributions for the uncertain input parameters. By using probabilistic surrogate modelling, through GPR, with BO, it is possible to reduce the number of required simulations by several orders of magnitude. This type of BO framework was implemented in this work for a disruption RE analysis model, and explored for CQ simulations of a JET plasma discharge with an argon-induced disruption. The algorithm is able to find optimal input parameters with uncertainties in 1-D to 7-D proof-of-principle cases, and is several orders of magnitude more sample efficient than a regimented grid search algorithm would have been.
The proof-of-principle application in this manuscript was based on fluid RE simulations with DREAM. These simulations are computationally relatively light, requiring only a few minutes of computing per sample. However, many simulations in fusion research require several hours, days or weeks of computing time. Obviously, even these relatively sample efficient BO algorithms will eventually begin to struggle with increasing search space dimensions, number of objective functions and computational requirements of the optimised model. Nevertheless, the discussed methodology does provide a principled, algorithmic approach to parameter fitting for these complicated models that is more sample efficient than, for example, grid or random search. Furthermore, running a few hundred simulations of a computational model that takes hours to converge is still a task that is manageable with a relatively standard high-performance computing environment.
Surrogate model specification is central to the performance of the search algorithm. Using the GP approach, the kernel parameters need to be appropriately constrained for the surrogate model to provide meaningful guidance for the search through the acquisition function. An overly smooth kernel with long maximum correlation length scales can make the surrogate model overly confident and prevent it from finding the actual global optimum. On the other hand, limiting the length scales to small values will encourage exploration but also require more iterations for convergence. Finding the appropriate surrogate model specifications is an area that requires attention from the user of these algorithms. The most appropriate constraints are likely to be specific to each search task. Furthermore, both specifying appropriate kernel constraints and diagnosing potential issues with kernel constraints become more challenging with an increasing number of search dimensions. Therefore, it would be desirable to find default kernel constraints that are likely to work acceptably well in most circumstances. The approach chosen here was to alternate the kernel constraints at specific sample intervals between unconstrained but positive length scales and length scales constrained to be below a certain threshold that is a fraction of the width of the search dimension. By alternating the kernel constraints, the risk of the algorithm either oversmoothing a region or getting into a mode of infinite exploration is reduced. However, more generally applicable methods for constraining the surrogate model are likely to exist, and could improve the performance of the algorithm further.
Acknowledgements
The authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources.
Editor Per Helander thanks the referees for their advice in evaluating this article.
Funding
This work has been carried out within the framework of the EUROfusion Consortium, funded by the European Union via the Euratom Research and Training Programme (Grant Agreement No 101052200 – EUROfusion). Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. This work was carried out under the EUROfusion E-TASC Advanced Computing Hub project. The work of Eero Hirvijoki was supported by the Academy of Finland grant no. 315278. The work was supported in part by the Swedish Research Council (Dnr. 2018-03911).
Declaration of interests
The authors report no conflict of interest.