Real-time thermoacoustic data assimilation

Low-order thermoacoustic models are qualitatively correct, but they are typically quantitatively inaccurate. We propose a time-domain bias-aware method to make qualitatively low--order models quantitatively (more) accurate. First, we develop a Bayesian ensemble data assimilation method for a low-order model to self-adapt and self-correct any time that reference data becomes available. Second, we apply the methodology to infer the thermoacoustic states and heat release parameters on the fly without storing data (real-time). We perform twin experiments using synthetic acoustic pressure measurements to analyse the performance of data assimilation in all nonlinear thermoacoustic regimes, from limit cycles to chaos, and interpret the results physically. Third, we propose practical rules for thermoacoustic data assimilation. An increase, reject, inflate strategy is proposed to deal with the rich nonlinear behaviour; and physical time scales for assimilation are proposed in non-chaotic regimes (with the Nyquist-Shannon criterion) and in chaotic regimes (with the Lyapunov time). Fourth, we perform data assimilation using data from a higher-fidelity model. We introduce an echo state network to estimate in real-time the forecast bias, which is the model error of the low-fidelity model. We show that (i) the correct acoustic pressure, parameters, and model bias can be accurately inferred; (ii) the learning is robust as it can tackle large uncertainties in the observations (up to 50% the mean values); (iii) the uncertainty of the prediction and parameters is naturally part of the output; and (iv) both the time-accurate solution and statistics can be successfully inferred. Data assimilation opens up new possibility for real-time prediction of thermoacoustics by synergistically combining physical knowledge and experimental data.


Introduction
When the heat released by a heat source, such as a flame, is sufficiently in phase with the acoustic waves of a confined environment, such a gas turbine or a rocket, thermoacoustic oscillations may occur (Rayleigh, 1878). Thermoacoustic oscillations manifest themselves as large-amplitude vibrations, which can be detrimental to gas-turbine reliability (e.g., Lieuwen, 2012), and can be destructive in high-power-density motors such as rocket engines (e.g., Culick, 2006). The objective of manufacturers is to design devices that are thermoacoustically stable, which is the goal of optimisation, and suppress a thermoacoustic oscillation if it occurs, which is the goal of control (e.g., Magri, 2019). Both optimisation and control rely on a mathematical model, which provides predictions on the key physical variables, such as the acoustic pressure and the heat release rate. The accurate prediction of thermoacoustic oscillations, however, remains one of the most challenging problems faced by power generation, heating and propulsion manufacturers (e.g., Dowling & Morgans, 2005;Noiray et al., 2008;Lieuwen, 2012;Poinsot, 2017;Juniper & Sujith, 2018).
The prediction of thermoacoustic dynamics-even in simple systems-is challenging because of three reasons. First, thermoacoustics is a multi-physics phenomenon. For a thermoacoustic oscillation to occur, three physical subsystems (flame, acoustics and hydrodynamics) constructively interact with each other (e.g., Lieuwen, 2012;Magri, 2019). Second, thermoacoustics is a nonlinear phenomenon (e.g., Sujith & Unni, 2020). In general, the flame's heat release responds nonlinearly to acoustic perturbations (Dowling, 1999); and the hydrodynamics are typically turbulent (e.g., Huhn & Magri, 2020b). Third, thermoacoustics is sensitive to perturbations to the system. In the linear regime, small changes to the system's parameters, such as the flame time delay, can cause arbitrarily large changes of the eigenvalue growth rates at exceptional points (Mensah et al., 2018;Orchini et al., 2020). In the nonlinear regime, small changes to the system's parameters can cause a variety of nonlinear bifurcations of the solution. As a design parameter is varied in a small range, thermoacoustic oscillations may become chaotic, by either period doubling, or Ruelle-Takens-Newhouse scenarios (Gotoda et al., 2011(Gotoda et al., , 2012Kabiraj & Sujith, 2012;Kashinath et al., 2014;Orchini et al., 2015;Huhn & Magri, 2020b), or by intermittency bifurcations scenarios (Nair et al., 2014;Nair & Sujith, 2015). The rich bifurcation behaviour has an impact on the effectiveness of control strategies, which may work for periodic oscillations with a dominant frequency, but may not work for multi-frequency oscillations as effectively. Additionally, unpredictable changes in the operating conditions and turbulence, which can be modelled as random phenomena (Nair & Sujith, 2015;Noiray, 2017), increase the uncertainty on the prediction of the quantities of interest.
Thermoacoustics can be modelled with a hierarchy of assumptions and computational costs. Large-eddy simulations make assumptions only on the finer flow scales, which makes the final solution high-fidelity, but computationally expensive (Poinsot, 2017). Euler and Helmholtz solvers compute the acoustics that evolve on a prescribed mean flow, which makes the solution medium-fidelity and computationally less expensive than turbulent simulations (e.g., Nicoud et al., 2007). This is commonly achieved with flame models, which capture the heat-release response to acoustic perturbations with transfer functions (e.g., Noiray et al., 2008;Silva et al., 2013) and distributed time-delays (Polifke, 2020). Other medium-fidelity and medium-cost methods are based on flame-front tracking (e.g., Pitsch & de Lageneste, 2002) and simple chemistry models (e.g., Magri & Juniper, 2014), to name only a few. On the other hand, low-order models based on travelling waves and standing waves (Dowling, 1995) provide low-fidelity solutions, but with low-computational cost. These low-order models capture only the dominant physical mechanisms, which are the flame time delay, the flame strength (or index) and the damping. Low-order models, which are the subject of this study, are attractive to practitioners because they provide quick estimates on the quantity of interest. Along with modelling, accurate experimental data is becoming more accessible and available (O'Connor et al., 2015). To monitor the thermoacoustic behaviour, in both real engines and academic rig (such as the Rĳke tube), the pressure is experimentally measured by microphones (Lieuwen & Yang, 2005;Kabiraj et al., 2012a). Microphones sample the pressure amplitude at typically high rates, which generates large datasets in real time. Except when required for diagnostics and a-posteriori parameter identification (among many others, Polifke et al., 2001;Schuermans, 2003;Lieuwen & Yang, 2005;Noiray et al., 2008;Noiray, 2017;Polifke, 2020), the data is useful if it can be used in real time, i.e., on-the-fly, to correct (or update) our knowledge of the thermoacoustic states. The sequential assimilation method that we develop bypasses the need to store data, which enables the real-time assimilation of data as well as on-the-fly parameter estimation.
To summarise, in thermoacoustics, we have three ingredients to improve the design: (i) a human being, who identifies the physical mechanisms that need to be modelled depending on the objectives and resources; (ii) a mathematical model, which provides estimates of the physical states; and (iii) experimental data, which provides a quantitative measure of the system's observables. A model is good if the human being identifies the physical mechanisms needed to formulate a mathematical model that provides the system's states compatibly with the experimental data. The overarching objective of this paper is to propose a method to make qualitatively low-order models quantitatively (more) accurate every time that reference data becomes available. The ingredients for this are a physical low-order model, which provides the states; data, which provides the observables; and a statistical method, which finds the most likely model by assimilating the data in the model. In weather forecasting, this process is known as data assimilation (Sasaki, 1955). Data assimilation techniques have been applied to oceanographic studies (Eckart, 1960), aerospace control (Gelb, 1974), robotics, geosciences, cognitive sciences (Reich & Cotter, 2015), to name only a few. Data assimilation is a principled method, which, in contrast to traditional machine learning, uses a physical model to provide a prediction on the solution (the forecast), which is updated when observations become available to provide a corrected state (the analysis) (Reich & Cotter, 2015). The analysis is an estimator of the physical state (the true state), which is more accurate than the forecast.

Data assimilation
Data assimilation methods can be divided into two main approaches (Lewis et al., 2006): (i) variational and (ii) statistical assimilation methods. Variational data assimilation requires the minimisation of a cost functional-e.g., a Mahalanobis (semi)norm-in terms of a control variable to obtain a single optimal analysis state (Bannister, 2017). The most common variational methods are 3D-VAR and 4D-VAR, which are widely used in weather centres such as the Met Office in the UK or the European Centre for Medium-Range Weather Forecasts, and in academic research (Bannister, 2008). In thermoacoustics, variational data assimilation was introduced by Traverso & Magri (2019), who found the optimal thermoacoustic states given reference data from pressure observations. Because variational methods need batches of data, they are not naturally suited to real-time inference. On the other hand, statistical data assimilation combines concepts of probability and estimation theory. The aim of statistical data assimilation is to compute the probability distribution function of a numerical model to statistically combine it with data from observations. Because the probability distribution function is high dimensional, the practitioner is often interested in capturing only the first and second statistical moments of it. In reduced-order modelling, this was achieved in flame tracking methods by Yu et al. (2019), who implemented ensemble Kalman filters and smoothers to learn the flame parameters on the fly. In high-fidelity methods in reacting flows, data assimilation with ensemble Kalman filters have been applied in large-eddy simulation of premixed flames to predict local extinctions in a jet flame (Labahn et al., 2019), and under-resolved turbulent simulation to predict autoignition events (Magri & Doan, 2020). The ensemble Kalman filter has also been successfully applied to non-reacting flow systems that show high nonlinearities such as the estimation of turbulent near-wall flows (Colburn et al., 2011), uncertainties in Reynolds-averaged Navier-Stokes (RANS) equations (Xiao et al., 2016), aerodynamic flows (da Silva & Colonius, 2018). Statistical data assimilation based on Bayesian methods, which enable real-time prediction in contrast to variational methods, was introduced in thermoacoustics by Nóvoa & Magri (2020).

Model bias
Data assimilation methods are commonly derived under the assumption that forecast errors are random with zero mean (Evensen, 2009), or, in other words, the error is unbiased. However, in addition to state and parameter uncertainties, low-order models are affected by model uncertainty, which manifests as an error bias. Modelling the model bias is an active research area. To produce an unbiased analysis, both forecast and observation biases need to be estimated (Dee & da Silva, 1998). Friedland (1969 developed the Separate Kalman Filter to estimate the bias, which is a two-stage sequential filtering process that addresses the estimation of a constant bias term, but its application is limited to linear processes. Drécourt et al. (2006) extended the implementation of the Separate Kalman Filter and compared it to the Coloured Noise Kalman Filter, which augments the state vector with an auto-retrogressive model that describes the bias. They propose a feedback implementation of these methods, which allows a time-correlated representation of the bias, but the accuracy is limited by the prescribed model of the bias. More recently, da Silva & Colonius (2020) proposed a low-rank representation of the observation discretisation bias, based as well on an auto-regressive model. They performed parameter estimation with an Ensemble Kalman Filter to calibrate the parameters of the auto-regressive model. This successfully modelled the discretisation bias, but did not tackle the model bias, which is a more general form of error. The estimation of a nonlinear dynamical state in the presence of a model bias remains an open problem. We propose a framework to obtain an unbiased analysis in thermoacoustic low-fidelity models by inferring the model bias. This consists of the combination of data assimilation with a recurrent neural network, which infers the model error of the low-fidelity of the thermoacoustic system.
Recurrent neural networks are data-driven techniques that are designed to learn temporal correlations in time series (Rumelhart et al., 1986), with a variety of applications in timeseries forecasting. In fluid mechanics, recurrent neural networks have been employed to model unsteady flow around bluff bodies (Hasegawa et al., 2020), and as an optimisation tool for gliding control (Novati et al., 2019). Among recurrent neural networks, echo state networks based on reservoir computing, which are universal approximators (Grigoryeva & Ortega, 2018), proved to be successful in learning nonlinear correlations in data (Maass et al., 2002;Jaeger & Haas, 2004) and ergodic properties (Huhn & Magri, 2022). Training an echo state network consists of a simple linear regression problem. Because no gradient descent is necessary, vanishing/exploding gradient problems do not occur in echo state networks. In chaotic flows, echo state networks have been used, for instance, to learn and optimise the time average of thermoacoustic dynamics (Huhn & Magri, 2020a, 2022, to predict turbulent dynamics with physical constraints (Doan et al., 2021), and to predict the statistics of extreme events in turbulent flows (Racca & Magri, 2021). In this paper, we propose to model the model bias with an echo state network, which is a tool that is more versatile and general than auto-regressive models (p.306, Aggarwal et al., 2018).

Objectives and structure
The objective of this paper is five-fold. First, we develop a sequential data assimilation for a low-order model to self-adapt and self-correct any time that reference data becomes available. The method, which is based on Bayesian inference, provides the maximum a posteriori estimate model prediction, i.e., the most likely prediction. Second, we apply the methodology to infer the thermoacoustic states and heat release parameters on the fly without storing data. Third, we analyse the performance of the data assimilation algorithm on a twin experiment with synthetic data and interpret the results physically. Fourth, we propose practical rules for thermoacoustic data assimilation. Fifth, we extend the data assimilation method to account for a biased thermoacoustic model. This method is tested by assimilating observations from a higher-fidelity model with nonideal boundary conditions, a mean flow and the simulation of the flame front with a kinematic model (Dowling, 1999). The simulation of the flame dynamics is suitable for a time-domain approach, and it overcomes the limitations of flame response models.
The paper is structured as follows. § 2 provides a description of the nonlinear thermoacoustic model with the data assimilation technique and its implementation for thermoacoustics. § 3 presents the method developed for state and parameter estimation. § 4 presents the method developed for combining data assimilation with an echo state network. § 5 presents the nonlinear characterisation of the thermoacoustic dynamics. § 6 shows the results for non-chaotic regimes, whereas § 7 shows and discusses the results for chaotic solutions. § 8 shows and discusses the results for unbiased state and parameter estimation for a high-fidelity limit cycle. A final discussion and conclusions end the paper in § 9.

Thermoacoustic data assimilation
We consider a nonlinear thermoacoustic model, T , as where is the state of the system; is the vector of the system's parameters; y are the observables from reference data; F is a nonlinear operator, which, in general, is differential; is a model error; G is a nonlinear map from the state space to the observable state; and is the observation error. The thermoacoustic problem on which we focus is: Given some data (observables) y, and a mathematical model T , what are the most likely physical states and parameters, , of the system? To answer this, we use a Bayesian approach in the well-posed maximum a posteriori estimation framework. Although the framework is versatile, in the next sections, we specify the low-order model, F, the data, y, and the data assimilation approach.

Qualitative nonlinear thermoacoustic model
The system consists of an open-ended tube containing a heat source, such as a flame or an electrically heated gauze. Because the tube is sufficiently long with respect to the diameter, the cut-on frequency is such that only longitudinal acoustic waves propagate. This is known as the Rĳke tube, which is a common laboratory-scale device that has been employed in a variety of fundamental studies (Heckl, 1990;Balasubramanian & Sujith, 2008;Juniper, 2011;Magri et al., 2013). This device is represented in Figure 1. The Rĳke model used in this work is described by Balasubramanian & Sujith (2008) and Juniper (2011). The flow is assumed to be a perfect gas; the mean flow is sufficiently slow such that its effects are neglected in the acoustic propagation; and viscous and body forces are neglected. The acoustics are governed by the dimensionless linearised momentum and energy conservation equations where is the acoustic velocity; is the acoustic pressure; is the heat release rate; f is the flame location; is the Dirac delta distribution, which models the heat source as a point source (compact assumption); and is the damping factor, which encapsulates the acoustic energy radiated from both ends of the duct, and the thermo-viscous losses in boundary layers. The non-dimensional heat release rate perturbation, , is modelled with a qualitative nonlinear time-delayed model (Heckl, 1990) where is the strength of the source; f is the acoustic velocity at the flame location; and is the time delay. The heat release rate is a key thermoacoustic parameter for the system's stability. The dimensionless variables in (3)-(4) and the dimensional variables (with˜) are related as =˜/˜0, where˜0 is the length of the tube; =˜˜0/˜0, where˜0 is the mean speed of sound; =˜ /˜0; =˜ /˜0, where˜0 is the mean density; =˜ /(˜0˜2 0 ); =˜ ( − 1)/(˜0˜3 0 ), where is the heat capacity ratio; and ( − f ) =˜(˜−˜f)˜0. The open-ended boundary conditions are ideal, which means that the acoustic pressure is zero, i.e., = 0 at = {0, 1}. By separation of variables, the acoustic velocity and pressure are decomposed into Galerkin modes as (Zinn & Lores, 1971) where cos( ) and sin( ) are the eigenfunctions of the acoustic velocity and pressure, respectively, when = 0 and = 0; and is the number of acoustic modes kept in the decomposition. Substituting (5) into (3), multiplying (3b) by sin ( ), and integrating over = [0, 1], yield the governing ordinary differential equations, which physically represent a set of nonlinearly coupled oscillators − = 0 (6a) where the damping term is defined by modal components = 1 2 + 2 √ , which is physically motivated in Landau & Lifshitz (1987). The damping coefficients, 1 and 2 , are assumed to be constant. For reasons that will be explained in § 2.3, we introduce an advection equation to mathematically eliminate the time-delayed velocity term (Huhn & Magri, 2020b) where is a dummy variable that travels with non-dimensional velocity −1 in a dummy spatial domain such that Equation (8) is discretised with a Chebyshev method (Trefethen, 2000) with + 1 points in the interval 0 ≤ ≤ 1.
In a state-space notation, the thermoacoustic problem is governed by where the state vector ≡ ( ; ; v) ∈ R 2 + is the column-concatenation of the acoustic amplitudes, ≡ ( 1 , 2 , ..., ) ∈ R and ≡ ( 1 / , 2 /(2 ), ..., /( )) ∈ R , and the dummy velocity variables v ≡ ( 1 , 2 , ..., ) ∈ R , which arise from the discretisation of (7); the thermoacoustic parameters are contained in the vector = ( , , ) ∈ R ; F represents the nonlinear operator that consists of (6a),(6b) and (7), F : R 2 + + → R 2 + ; and M( ) is the measurement operator, which maps the state to the observable space at . The expression of the measurement operator depends on the nature of the observables being assimilated, as explained in § 3. To work with a reduced-order model that qualitatively captures the essential dynamics, we use = 10 acoustic modes. For the advection equation, = 10 ensures numerical convergence (Huhn & Magri, 2020b). The number of degrees of freedom of the reduced-order model is = 2 + = 30. The initial value problem (9) is solved with an automatic-stepsizecontrol method that combines fourth and fifth order Runge-Kutta methods (Shampine & Reichelt, 1997).

Qualitative and quantitative accuracy
We say that a low-order model is qualitatively correct when it captures the key physical parameters/mechanisms (e.g., the time delay). Although a low-order model may be physically motivated, it is subject to three sources of errors: (i) uncertainty in the state, (ii) uncertainty in the parameters, and (iii) bias in the model, i.e., the low-order equation does not contain all the terms necessary to model the phenomenon (the model bias is equivalently referred to as the model error). Data assimilation methods combine the forecast of a low-order model with observations from either real experiments, or high-fidelity simulations, which reduces the bias in the state ( § 3.1) and/or in the parameters of the model ( § 3.2). However, traditional data assimilation methods do not tackle the model bias because they assume that the forecast model is unbiased. In § 4, we propose an echo state network as a method to estimate the model bias, thereby closing the low-order model equations in the data assimilation. In summary, the aim of data assimilation is to make a qualitative accurate model more quantitatively correct.

Data assimilation
Data assimilation optimally combines the prediction from an imperfect model with data from observations to improve the knowledge of the system's state. The updated solution (analysis) optimally combines the information from the observations, y, and the model solution (forecast) with their uncertainties. In order to (i) update the system's knowledge any time that data becomes available, and (ii) not store the data during the entire operation, we assimilate sequentially assuming that the process is a Markovian process. The concept of Bayesian update is key to this process, as explained in § 2.3.1.

Bayesian update
In a Bayesian framework, we quantify our confidence in a model by a probability measure. Hence, we update our confidence in the model predictions every time we have reference data from observations. The rigorous framework to achieve this is probability theory, as explained in Cox's theorem (Jaynes, 2003).
To set a probabilistic framework at time = , the state, , and reference observation, y are assumed to be realisations of their corresponding random variables acting on the sample spaces Ω = R 2 + and Ω y = R y . Because we transformed the time-delayed problem into an initial value problem, the solution of (9) at the present depends on the solution at the previous time step only. In other words, we transformed a non-Markovian system into a Markovian system, which simplifies the design of the Bayesian update. We quantify our confidence in a quantity through a probability, P ∼ P ( | −1 , , F) y ∼ P (y | , , F), where | denotes that the quantity on the left is conditioned on the knowledge of the quantities on the right. The leftmost probability answers the question: "Given a model F, a set of parameters , and the state −1 , what is the probability that the state takes the value ?". The rightmost probability answers the question: "if we forecast the state from the model, what is the probability that we observe y ?". We assume that the observations are statistically independent and uncorrelated with respect to the forecast. To update our knowledge of the system, the prior knowledge from the reduced-order model and the reference observations are combined through Bayes' rule P ( |y , , F) = P (y | , , F)P ( , , F) P (y , , F) .
First, P ( , , F) is the prior, which measures the knowledge of our system prior to observing y . The prior evolves through the Chapman-Kolmogorov equation (Jazwinski, 2007), which involves multi-dimensional integrals. To numerically solve the Chapman-Kolmogorov equation, we use an ensemble method by integrating the model equations ( § 2.3.2), which provide a forecast on the state. Second, P (y | , , F) is the likelihood (10), which measures the confidence we have in our model prediction. The likelihood is prescribed (see § 2.3.2). Third, P (y , , F) is the evidence, which is the probability that the observable takes on the value y . This can be prescribed from the knowledge of the experimental uncertainties. Finally, P ( |y , , F) is the posterior, which measures the knowledge we have on the state, , after we have observed y . Here, we will select the most probable value of in the posterior (i.e., the mode) as the best estimator of the state (maximum a posteriori approach, which is a well-posed approach in inverse problems). The best estimator is called analysis in weather forecasting (Tarantola, 2005). Equation (11) provides the Bayesian update, which is key to this work and sequential data assimilation.

Stochastic ensemble filtering for sequential assimilation
For brevity, we will omit the subscript , unless it becomes necessary for clarity. We focus on a qualitative reduced-order model in which (i) the bias on the solution is negligible, (ii) the uncertainty on the state is represented by a covariance, (iii) the probability density function of the state is assumed to be symmetrical around the mean, and (iv) the dynamics at regime do not present frequent extreme events, i.e., the tails of the probability density function are not heavy. In section §4 we relax assumption (i) by introducing a methodology to estimate the bias of the solution, i.e., the model error.
The probability distribution to employ is the distribution that maximises the information entropy (Jaynes, 1957), which, in this scenario, is the Gaussian distribution. Therefore, the system's forecast and the observations are assumed to follow Gaussian distributions, i.e., f ∼ N , C f and y ∼ N (M , C ), respectively, where N denotes the normal distribution with the first argument being the mean, and the second argument being the covariance matrix. The forecast and observation covariance matrices are C f and C , respectively. If the dynamics were linear, the Bayesian update (11) would be exactly solved by the Kalman filter equations (Kalman, 1960) where the superscripts 'a' and 'f' denote analysis and forecast, respectively. Equation (12a) corrects the model prediction by weighting the statistical distance between the observations (data) and the forecast, according to the prediction and observation covariances (Evensen, 2003). The observation error covariance has to be prescribed based on the knowledge of the experimental methodology used.
In an ensemble method, the distribution is represented by the sample statistics where the -th column of the matrix is the deviation from the mean of the -th realisation, − , and is the number of ensemble members. Because (13) is a Monte Carlo Markov Chain Figure 2: Conceptual schematic of a sequential filtering process. Truth (green); observations and their uncertainties (red); forecast state and uncertainties (orange); and analysis state and uncertainties (blue). The circles represent pictorially the spread of the probability density functions: the larger the circles, the larger the uncertainty.
integration, the sampling error scales as O ( −1/2 ). The key idea of ensemble filters is to group forecast states from a numerical model (the ensemble) to obtain, on filtering, the analysis state. Ensemble methods describe the state's uncertainty by the spread in the ensemble at a given time to avoid the explicit formulation of the covariance matrices (Livings et al., 2008). The algorithmic procedure is as follows. First, the initial condition is integrated forward in time to provide the forecast state, f . Second, experimental observations, y, are statistically assimilated into the forecast to obtain the analysis state, a , which, in turn, becomes the initial condition for the next time step. The forecast accumulates errors over the integration period, which is reduced in the assimilation stage through observations with their experimental uncertainties. If the model is qualitatively correct and unbiased, after a sufficient number of assimilations, the ensemble concentrates around the true value. This sequential filtering process on one ensemble member is shown in Figure 2. The process is repeated in parallel for the other ensemble members.

Ensemble Square-Root Kalman Filter
In the ensemble Kalman filter (12), each ensemble member is updated with the assimilation of independently perturbed observation data. However, this method provides a sub-optimal solution that, in some cases, does not preserve the ensemble mean and is affected by sampling errors of the observations (Evensen, 2003). Moreover, the ensemble Kalman filter may require a fairly large ensemble to compensate the sampling errors of the observations (Sakov & Oke, 2008). The ensemble square-root Kalman filter (EnSRKF), which is an ensemble-transform Kalman filter, overcomes these issues (Livings et al., 2008). The key idea of the EnSRKF is to update the ensemble mean and deviations instead of each ensemble member. The EnSRKF for ensemble members and a state vector of size reads ) ∈ R × is the matrix that contains the ensemble members as columns; A = ( , . . . , ) ∈ R × contains the mean analysis states in each column; Y = (y, . . . , y) ∈ R × is the matrix containing the observations repeated times; the identity matrix is represented by I; and V and Σ are the orthogonal matrices of eigenvectors and a diagonal matrix of eigenvalues, respectively, from singular value decomposition. The largest matrices required in the EnSRKF algorithm have dimension × and × , therefore, the storage requirements are significantly smaller than those of non-ensemble based filters. In addition, this filter is non-intrusive and suitable for parallel computation. A derivation of the EnSRKF can be found in Appendix A.

Discussion
An ensemble method enables us to (i) work with high-dimensional systems because we do not need to propagate the covariance matrix, which has O ( 2 ) components; (ii) work with nonlinear systems, such as the thermoacoustic system under investigation; (iii) work with time-dependent problems; (iv) not store the data because we sequentially assimilate (real-time, i.e., on-the-fly, assimilation); and (v) avoid implementing tangent or adjoint solvers, which are required, for example, in variational data assimilation methods (Traverso & Magri, 2019). On the one hand, if the system were linear, a Gaussian prior would remain Gaussian under time integration. This makes the ensemble filter the exact Bayesian update in the limit of an infinite number of samples. On the other hand, if the system were nonlinear (e.g., in the present study), a Gaussian prior would not necessarily remain Gaussian under time integration. This makes the ensemble filter an approximate Bayesian update. The update of the first and second statistical moments, however, remains exact. In other words, we cannot capture the skewness, kurtosis, and other higher moments. (Particle filter methods overcome this limitation, but they may be computationally expensive (Pham, 2001).)

State and parameter estimation
This work considers both state estimation, in which the state is the uncertain quantity ( § 3.1); and combined state and parameter estimation, in which both the state and model parameters are uncertain ( § 3.2).

State estimation
State estimation is the process of using a series of noisy measurements into an estimation of the state of the dynamical system, . This paper considers two different scenarios in assimilating acoustic data in thermoacoustics: (i) assimilation of the acoustic modes; and (ii) assimilation of pressure measurements from mic microphones, which are located equidistantly from the flame location up to the end of the Rĳke tube ( Figure 1). The assimilation of acoustic modes assumes that observation data is available for the pressure and velocity acoustic modes, { , }. Hence, the state equations are Alternatively, in scenario (ii), from (5), the reference pressure measurements are computed as The statistical errors of the microphones are assumed to be independent and Gaussian. In the twin experiment, the pressure observations are created from the true state, with a standard deviation mic that mimics the measurement error. Pressure data cannot be assimilated directly with the EnSRKF because the state vector contains the acoustic modes, i.e., it does not contain the acoustic pressure. To circumvent this, we augment the state vector with the acoustic pressure at the microphones' locations according to (16). Therefore, the new state vector includes the Galerkin acoustic modes, the dummy velocity variables and the pressure at the different microphone locations, i.e., ≡ ( ; ; v; p mic ), with dimension = 2 + + mic . The augmented state equations are With this, the modes will be updated indirectly during the assimilation step using the microphone data and their experimental error.

Combined State and Parameter estimation
Combined state and parameter estimation is the process of using a series of noisy measurements into an estimation of the state of the dynamical system, , and the parameters, . The parameters are regarded as variables of the dynamical system so that they are updated in every analysis step. This is achieved by combining the governing equations of the thermoacoustic model with the equations that describe the evolution of parameters, which are constant in time, but can change when observations are assimilated. The equations for the augmented state of combined state and parameter estimation are With a slight abuse of notation, the state vector in (18) is equal to ≡ ( ; ; v) in (15) for the assimilation of acoustic modes, and equal to ≡ ( ; ; v; p mic ) in (17) for the assimilation of pressure measurements. The data assimilation algorithm is applied to the augmented system for both the forecast state and the parameters to be updated at every analysis step. The parameters need to be initialised for each ensemble member from a uniform distribution with a width of 25% of the mean parameter value. In other words, we assume that the parameters are uncertain by ±25%.

Performance metrics
The performance of the state estimation and combined state and parameter estimation are evaluated with three metrics: (i) the trace of the forecast covariance, C f , which globally measures the spread of the ensemble; (ii) the relative difference between the true pressure oscillations at the flame location and the filtered solution, which measures the instantaneous error; and (iii) for the combined state and parameter assimilation, the convergence of the filtered parameters normalised to their true values, as well as the root-mean square error with respect to the true solution.

Data assimilation with bias estimation
In this section, we analyse the case of state, parameter and model bias estimation. Sources of model bias in the model of § 2.1 include (i) idealised boundary conditions, (ii) simple heat-release law with no simulation of the flame, and (iii) zero mean flow effects. We infer the model bias to correct the biased forecast state prior to the analysis step. By performing state and parameter estimation on the unbiased forecast, we increase the quantitative accuracy of the model prediction. First, we define the time-dependent model bias U( ) as the difference between the true pressure state (from the higher-fidelity model) at the microphone locations p t mic , and the expected biased pressure p mic , i.e., the mean of the ensemble of pressures We propose an echo state network to predict the evolution of the model bias.

Echo State Networks
An Echo State Network (ESN) is a type of recurrent neural network based on reservoir computing (Lukoševičius, 2012). ESNs learn temporal correlations in data by nonlinearly expanding the data into a high-dimensional reservoir, which acts as the memory of the system, and is a framework that is more versatile than auto-regressive models (Aggarwal et al., 2018). Figure 3 shows a pictorial representation of an echo state network. The reservoir is defined by a high-dimensional vector r( ) ∈ R , and a state matrix W ∈ R × , where is the number of neurons in the reservoir. We use = 100 neurons, which are sufficient to represent the dynamics of the thermoacoutic system (Huhn & Magri, 2022). The inputs and outputs from the reservoir are vectors of dimension R mic because we define the bias as the pressure error at each microphone (19). Their input and output matrices are W in ∈ R ×( mic +1) and W out ∈ R mic ×( +1) , respectively. At every time , the input bias U( ) and the state of the reservoir at the previous time step r( −1 ) are combined to predict the reservoir state at the current time as well as the bias at the next time step U( +1 ) such that whereŨ is the input bias normalised by the range component-wise; and the constant 0.1 is used to break the symmetry of the ESN (Huhn & Magri, 2020a). The operator [ ; ] indicates vertical concatenation. The matrices W in and W are predefined as fixed, sparse and randomly generated. Specifically, W in has only one non-zero element per row, which is sampled from a uniform distribution in [− in , in ], where in is the input scaling. W is an Erdős-Renyi matrix with average connectivity = 5, in which each neuron (each row of W) has on average connections (non-zero elements), which are obtained by sampling from a uniform distribution in [−1, 1]; the entire matrix is then re-scaled by a multiplication factor to set the spectral radius, . The weights of W out are determined through training, which consists of solving a linear system for a training set of length tr where R ∈ R ( +1)× tr is the horizontal concatenation of the augmented reservoir state for each time in the training set, [r( ); 1] with = 1, ..., tr ; U train ∈ R mic × tr is the time-concatenation of

Thermoacoustic Echo State Network
On the one hand, in open loop (Fig. 4a), the true bias data (19) is fed into every forecast step to compute (20). The output bias from the ESN is disregarded in open loop, which is used to initialise the network (the washout), such that U w ∈ R mic × w . State and parameters are not updated during the washout. On the other hand, in closed loop (Fig. 4b), the true bias data (19) is fed in the first step, then the output bias from a forecast step (20) is used as the initial condition of the next step. The ESN forecast frequency is set to be five times smaller than the thermoacoustic model time step, to reduce the additional computation cost associated with the bias estimation. In detail, the pseudoalgorithm 1 summarises the procedure we propose for bias-aware data assimilation with an echo state network: (1) the ensemble of acoustic modes is initialised and forecast for the washout time; (2) we run the ESN in open loop to initialise the reservoir; and (3) we perform data assimilation. When measurements become available, we compute the ensemble of pressures by adding the estimated bias from the ESN, U, to the expectation of the forecast pressure at the microphones, such that the unbiased ensemble is centred around the unbiased pressure statep mic j = U + p mic j for = 1, ..., , where the (ˆ) indicates a statistically-unbiased quantity. Subsequently, we perform the analysis step. The EnSRKF obtains the optimal combination between the unbiased pressures and the observations, which indirectly updates the biased ensemble of acoustic modes and parameters. The resulting analysis ensemble is used to re-initialise the ESN for the next forecast in closed-loop, such that the bias is the difference between the observations and the expectation of the analysis pressures, i.e., U = p t mic − p a mic . Finally, the Rĳke model and the ESN are time-marched until the next measurement becomes available for assimilation.

Test case
The higher-fidelity models we use is based on the travelling-wave approach of Dowling (1999), which is described in detail by Aguilar Pérez (2019). The acoustic pressure and velocity are written as functions of two acoustic waves that propagate downstream and upstream of the tube (see Eqs. (3.5) and (3.7) in Dowling, 1999). As shown in Figure 5, the waves are defined as and , with convective velocities˜0 ±˜0 ,u in the region˜≤˜f; and ℎ and , with convective velocities 0 ±˜0 ,d in˜≥˜f. This model uses dimensional quantities, so we transform our dimensionless thermoacoustic system (3) into its dimensional form. The equations that relate the travelling waves (˜) and ℎ(˜) to the heat release rate˜ (˜) are, with a slight abuse of notation, where is the cross-sectional area of the duct;˜u and˜d are the times taken for the acoustic waves to travel from the flame to the upstream and downstream boundaries, respectively. For completeness, the matrices X and Y are provided in the Supplementary material. In contrast to the low-fidelity model of § 2.1, the travelling-wave approach (i) accounts for mean flow effects; and (ii) accounts for non-ideal open boundary conditions, such that (˜) = u (˜−˜u) and (˜) = d ℎ(˜−˜d), where u and d are the reflection coefficients. We set the mean flow velocity upstream of the flame to˜0 ,u = 10 m/s, and the mean heat release rate to¯ 0 = 2000 W. We test the bias-aware data assimilation with the echo state network for a limit cycle. We perform unbiased state estimation, and combined unbiased state and parameter estimation using synthetic pressure data obtained from the travelling-wave and flame kinematic model as the truth (higher-fidelity data). For brevity, we perform parameter estimation to the heat source strengthõ nly, which we physically expect to be˜∼ O (10 6 ) Ws 1/2 m 5/2 (Eq. (2.7) in Juniper, 2011). The results are discussed in Section 8.

Nonlinear characterisation
In order to assess the performance of data assimilation, we first characterise the nonlinear dynamics by analysing the solutions at regime (after the initial transient) with bifurcation analysis and nonlinear time series analysis (Kantz & Schreiber, 2003;Kabiraj et al., 2012b;Guan et al., 2020). The system's parameters are f = 0.2, 1 = 0.1, 2 = 0.06 and = 10. In bifurcation analysis, we examine the topological changes in the pressure oscillations, f , as the control parameters vary. First, we study the two-dimensional bifurcation diagram, which is shown in Figure 6. The classification in the two-dimensional diagram is obtained following the procedure of Huhn & Magri (2020b). This method consists of obtaining the Lyapunov exponents, through covariant-vector analysis. With this, the dynamical motions are identified as: (i) fixed point if 1 < 0; (ii) limit cycle if 1 = 0 and 2 < 0; (iii) quasiperiodic if 1 = 0 and 2 = 0; and (iv) chaotic if 1 > 0. For small and the system converges to a fixed point because the thermoacoustic energy is smaller than damping. As the heat source strength increases, the Rayleigh criterion is fulfilled and self-excited oscillations arise as limit cycles. When reaches values over 2.5, different types of solution appear, such as quasiperiodic or chaotic attractors. The refined region in Figure 6 shows that the type of solutions is sensitive to small changes in the control parameters, which has implications for data assimilation, as argued in the remainder of the paper.
These topological changes are further investigated with a one-dimensional bifurcation diagram for a fixed time delay ( = 0.2), shown in Figure 7. Because the nonlinear solutions at regime may vary with the initial condition, two sets of results are shown for a small initial condition ( = / = 0.005) and a large initial condition ( = / = 5) to capture subcritical behaviours. The bifurcation diagram is obtained by marching forward in time the governing equations of the nonlinear dynamical system until the system reaches a statistically stationary state. For each value of the control parameter, the bifurcation diagram shows the peaks and troughs of the acoustic pressure at the flame location. (The nonlinear time series analysis results are shown in the Supplementary material.) From left to right, first, the solution is the fixed point (region A), which is the case of no oscillations. Second, the appearance of periodic oscillations from a fixed point is observed with a large initial condition at = 0.26, with a small region of hysteresis from = 0.26 to = 0.34. This first self-sustained state is a period-1 limit cycle (region B), which originates from a subcritical Hopf bifurcation. Within region B, the system undergoes a period-doubling bifurcation at = 0.6 from period-1 to period-2 oscillations. Third, the period-2 limit cycle bifurcates into a 3-torus quasiperiodic motion at = 3.35 (region C). A quasiperiodic oscillation is an aperiodic solution that results from the interaction between two or more incommensurate frequencies, (also known as a Neimark-Sacker bifurcation) (Kabiraj et al., 2012b). Fourth, the solution becomes chaotic at = 3.65 (region D). In summary, the evolution from region A to region D shows that the system reaches a chaotic state via a quasiperiodic route to chaos, i.e., via a Ruelle-Takens scenario (Kabiraj  et al., 2012a). Fifth, after this first route to chaos, changes in the control parameter drive the system back to a periodic limit cycle through a tangent bifurcation (Kantz & Schreiber, 2003) at approximately = 4.25 (region E), with a second region of hysteresis from = 4.24 to = 4.28. These high amplitude limit cycles region becomes again chaotic at = 5.61 (region F). Sixth, when reaches 7.65, the system evolves towards a frequency-locked state (region G). Frequencylocked solutions arise from the competition between two or more frequencies, but in contrast to quasiperiodic signals, these frequencies are commensurate. Seventh, at = 7.85, the frequencylocked solution bifurcates into a quasiperiodic solution (region H). Region-H solutions show a two-dimensional toroidal structure, in contrast to the three-dimensional torus from region C. In region H, some of the simulations showed that there are areas of chaotic dynamics, which can be appreciated by the difference of the solutions from the small and large initial condition in Figure 7. (A higher region refinement could be performed to fully understand the bifurcations within this region, however, this is beyond the scope of this work.) The qualitative bifurcation behaviour of this reduced-order model is observed in experiments (Kabiraj et al., 2012b;Kabiraj & Sujith, 2012), which means that the reduced-order model qualitatively captures the nonlinear thermoacoustic dynamics.
The bifurcation analysis shows a rich variety of solutions in a relatively small range of parameters, i.e., small changes of a parameter, or a state, can generate solutions that are topologically different. This nonlinear sensitivity has implications in the design of a data-assimilation ensemble

Twin experiments in non-chaotic regimes
We perform a series of experiments with synthetic data, which is generated by the model. To mimic an experiment, we add stochastic uncertainty to the synthetic data by prescribing an observation covariance matrix. This approach is also known as the twin experiment (e.g., Traverso & Magri, 2019). The EnSRKF algorithm is tested in the different regions of Figure 7, for the different nonlinear regimes: fixed point, limit cycle, frequency-locked, quasiperiodic and chaotic. The filter is first tested in the non-chaotic regimes for the assimilation of (i) acoustic modes ( § 6.1), and (ii) acoustic pressure from microphones ( § 6.2). The assimilation of chaotic solutions, which presents further challenges, is investigated in § 7. Different simulations are performed to determine suitable values for the number of ensemble members ( ); the time between analysis (Δ analysis ); the standard deviation ( frac ), i.e., the observations' uncertainties during the acoustic modes assimilation; and the standard deviation of the microphone measurements ( mic ). Table 1 shows the parameters and initial conditions of the reference (i.e., "true") solution. This range of parameters is justified from the literature in thermoacoustic data assimilation (Traverso & Magri, 2019). Computational time is discussed in the Supplementary material.

Assimilation of the acoustic modes
This section includes results for state estimation ( § 6.1.1) and combined state and parameter estimation ( § 6.1.2).

State estimation
This section presents simulations performed assuming that there are observations available for all acoustic modes, i.e., the number of observations is = 2 = 20. (Including observations for the dummy velocity variables would further improve the filter convergence, however, they are not considered because the velocity advection field in the heat source region is not measured in a real engine.) Figure 8 shows the acoustic pressure before assimilation (unfiltered solution), after assimilation (filtered solution) and the data at the assimilation steps (analysis steps). Panels (a) to (d) show the transient of a fixed point, a period-1 limit cycle, a frequency-locked, and a quasiperiodic motion, respectively. In the filtered solution, data assimilation is performed during the first 50 time units, and it is marched in time without further assimilation for 10 more time units. The EnSRKF successfully learns (i.e., infers) the true solution for all the nonlinear regimes. As expected, the convergence is faster for the fixed point and limit cycle cases (Figs. 8a,b) because they are simpler dynamical motions. (The unfiltered solution also converges to the same value for these simple cases. This is due to the stable nature of their attractors, and because their regions are unaffected by the chaotic butterfly effect.) For multi-frequency dynamical regimes, Figures 8c,d show that the Bayesian update can learn the frequency-locked and quasiperiodic states of regions C and G in Figure 7. However, these show more discrepancies between the filtered and true solutions. Physically, this is due to the multiple bifurcations that occur in a small range of parameters, which is typical of thermoacoustic systems. In reference to Figure 7, region C is next to the chaotic region D; and region G is a short range region surrounded by the chaotic region F, and the mixed quasiperiodic-chaotic region H. Therefore, the discrepancy in these cases is caused by some ensemble members falling in different basins of attraction. To overcome this issue, we propose a strategy in § 6.2.2.
The data assimilation process depends on the observation's uncertainty, frac , and ensemble size, . Figure 9 shows the performance metrics ( § 3) for the quasiperiodic solution of Figure 8d. As expected, the filtered solution is more accurate for a smaller standard deviation because the observations are closer to the truth. Importantly, the algorithm is capable of learning the reference solution for an ensemble having an error as large as 50% of the mean of the acoustic modes, which means that the data assimilation algorithm is robust. For the pressure performance metric, the algorithm brings the relative error below 10% after 15 time units (in the worst case scenario, Figure 9a). For the covariance matrix trace performance metric, the EnSRKF continuously reduces the initial ensemble variance up to a final plateau, which cannot be zero because of the non-zero observation and forecast background noise (Figure 9c). The evolution of the trace is an indicator of the spread of the forecast ensemble, which informs on the uncertainty of the solution. The ensemble size does not have a strong influence in the ensemble uncertainty during the assimilation because the trace of the covariance matrix remains of the same magnitude independently of the value of ( Figure 9d). Nevertheless, the relative error is significantly higher for a small ensemble with = 4 (Figure 9c). This means that four ensemble members are not sufficient to give a sufficient ensemble distribution, therefore, the solution converges to an incorrect state, but with a small spread around it. Comparing the errors for ten and fifty ensemble members, we see no significant differences between the solutions, which shows that having an ensemble size larger than the number of degrees of freedom is not required. This is one of the benefits of using the square-root filter (in the standard ensemble Kalman filter larger ensembles are needed to avoid sampling errors (Livings et al., 2008)). However, the computational time required for 50 ensemble members was approximately 4 times longer than that for 10. Therefore, an ensemble size of = 10 provides a good approximation of  the true state for the assimilation of acoustic modes, while keeping the computation time minimal.

Combined state and parameter estimation
In this section, we analyse the combined state and parameter estimation to calibrate both the state and parameters. The two uncertain parameters ( = [ , ] in (18)) are added to the state vector and updated simultaneously with the acoustic modes and dummy velocity variables, as detailed in § 3.2. Figure 10 shows the evolution of the parameters, normalised to their true value, for the four non-chaotic solutions. The convergence shows that the EnSRKF update is capable of learning the true and values for the four dynamical motions.
For a comparison of combined state and parameter estimation with state estimation, we compute the root mean square (RMS) error. The RMS error at each time step is defined as the square-root of the trace of the covariance matrix of the filtered ensemble, relative to the true solution The RMS error is evaluated for the state estimation and the combined state and parameter estimation cases, using different initial uncertainties for and . This is achieved in state estimation by defining = true and = true , where is the defined initial uncertainty. For the combined state and parameter estimation, the initial and of each member in the ensemble are taken from an uniform distribution centred around true and true , with a sample standard deviation of 25%. Figure 11a shows the RMS error for the initial parameters set to their true value. The state estimation only outperforms the combined state and parameter estimation in this case, as the state estimation model works with constant true parameters while the combined state and parameter estimation updates the parameters in each analysis step with the EnSRKF update. The true parameters are perturbed by 5%,25% and 50% in Figs. 11b,c,d,respectively. The combined state and parameter estimation simulations are capable of learning the true state up to a 25% error in the parameters initialisation, as the RMS error is reduced by two orders of magnitude from the initial state, such as in the case of Figure 11a. Combined state and parameter estimation provides an improved approximation of the solution for the highly uncertain case of 50% error (Figure 11d).

Assimilation of the acoustic pressure from microphones
As detailed in § 3.1, we consider the scenario of assimilation of pressure measurements from mic microphones, located equidistantly from the flame location. This section includes results for state estimation ( § 6.2.1) and combined state and parameter estimation ( § 6.2.2).

State estimation
We consider a tube that is equipped with mic = 6 microphones, which measure multiple frequency contributions in the signal. This value is chosen from the literature in thermoacoustic experiments (Garita et al., 2021). Figure 12 shows the acoustic pressure at the flame location of the true solution, the unfiltered solution, and the filtered solution, as well as their phase space reconstruction and first return map (the calculation procedure can be found in the Supplementary material). In nonlinear regimes, the algorithm successfully learns the pressure state. The accuracy of the solution is lower than in the assimilation of the acoustic modes of § 6.1.1 because, here, less information on the state is assimilated. (The filter is not designed for statistically non-stationary problems, which is why the transient fixed point solution is not fully learnt by the filter.) The effect of the experimental uncertainty is analysed by varying the microphones standard deviation. Physically, the errors are larger than those in Figure 9 because, here, we are assimilating 6 components of the augmented state vector out of 36 components, whereas in § 6.1.1 the filter assimilates 20 out of the 30 components of the state vector. Figures 13a,c show that, after about 20 analysis steps, the filter follows more closely the model than the observations for larger observation's uncertainties. (In other words, the filtered solution "trusts" more the prediction from the model than the observations when the experimental uncertainty is high.) We set mic = 0.01 in the following simulations, which models experimental microphone uncertainties (de Domenico et al., 2017). The relative error is higher than 20% for this case (Figure 13a). Increasing the frequency of analysis allows for a faster convergence with a smaller relative error (Figs. 13b,d). With a time between analysis of Δ analysis = 1.5 or 1, the relative error of the filtered solution becomes less than 10% in only 10 time units, approximately. Thus, for the assimilation of microphone pressure data, a higher frequency of analysis is more suitable. We choose the time between analysis to 1.5 time units. The evolution of the trace of the forecast covariance matrix indicates that the spread of the ensemble rapidly shrinks (Figs. 13c,d). Besides, the spread is two orders of magnitude smaller than in the assimilation of the modes (Figs. 9c,d) and remains small even with large relative errors. Physically, this is because the acoustic modes are directly updated in the modes assimilation, but, in this case, the acoustic modes are unobserved variables that are updated indirectly through the microphone pressure observations.

Combined state and parameter estimation
The parameters and are updated by the EnSRKF at each analysis step, which occurs every 1.5 time units. Figure 14a,b, shows that for an ensemble of ten members, the solution converges to the parameters ≈ 6.6 and ≈ 0.4, which correspond to a chaotic solution (see Figure 6).   . Right: normalised . mic = 6, true = 3.6, true = 0.2, Δ analysis = 1.5, mic = 0.01. The shaded areas show the standard deviation, which becomes smaller as more data is assimilated.
Nevertheless, the true solution is a quasiperiodic oscillator with = 3.6 and = 0.2. This means that the filtered solution not only converges to different parameters, but also belongs to a different nonlinear regime than that of the true solution. Physically, this occurs because thermoacoustic dynamics experience several bifurcations in short ranges of and (Figure 7). This makes the sampling of nonlinear thermoacoustics challenging. A way to circumvent this is to increase the ensemble size. A parametric study of the effect of the number of realisations is shown in Figure 14. Ten ensemble members are not sufficient to learn the reference solution, however, the larger the ensemble, the faster the EnSRKF converges to the true solution.
Occasionally, the EnSRKF provides unphysical parameters as the solution of the optimisation problem, such as negative heat source strength as the solution of the optimisation problem. To avoid this, we reject the analysis steps that give unphysical solutions and continue the forecast with no assimilation. This means that we are left-truncating the Gaussian. Thus the parameters remain constant until the EnSRKF gives a physical solution to the optimisation problem. (Ad-hoc ways to bound parameters can be designed (Li et al., 2019). This is beyond the scope of this work.) The thresholds for rejection are defined as ∈ [0.1, 10] and ∈ [0.005, 0.8]. Because the rejection is effectively reducing the amount of information that can be assimilated, the ensemble convergence slows down. This increase and reject approach is not always sufficient to reach convergence. Figures 15a,b, show the same simulation as in Figure 14 with more microphones, mic = 15, and Δ analysis = 1. In this case, the filtered solution is not converging even for 150 ensemble members, which is caused by covariance collapse. To accelerate the convergence and overcome the spurious correlations of finite-seized ensembles (Evensen, 2009), we introduce a covariance inflation to the ensemble forecast when the solution of the analysis step provides unfeasible parameters. The inflation method can be used to counteract the variance reduction due to the spurious correlations, and force the model to explore more states. Here, we include the model uncertainty as stochastic noise by adding an inflation factor to the ensemble forecast Figure 15: Real-time learning of the parameters. Assimilation of acoustic pressure from microphones for combined state and parameter estimation of a quasiperiodic solution. Left: normalised . Right: normalised . Effect of ensemble size without inflation (top) and with inflation using = 1.02 (bottom). mic = 15, true = 3.6, true = 0.2, Δ analysis = 1, mic = 0.01. The shaded areas show the standard deviation, which becomes smaller as more data is assimilated. In this case, = 1.02 improves the analysis for the quasiperiodic solution. If necessary, adaptive strategies can be designed following Evensen (2009). Figure 15c,d shows the parameters' convergence for the same ensemble sizes as Figures 15a,b, but with covariance inflation. This is sufficient to remove the plateau caused by the divergence of the EnSRKF to unphysical parameters in large ensembles, thereby speeding up the convergence. To summarise, we propose an increase, reject, inflate strategy to learn the nonlinear dynamics and parameters of thermoacoustics.

Twin experiments in chaotic regimes
This section addresses the assimilation in chaotic regimes. We perform a series of twin experiments with synthetic data using the base parameters of Tab. 1 and the obtained suitable parameters in § 6. Both, state estimation and combined state and parameter estimation are tested in the chaotic region F. In the combined state and parameter estimation, the initial conditions for and are sampled from uniform distributions with an upper bound 25% larger than their true value, and a lower bound 25% smaller than the true parameters. Different simulations are performed to analyse  the predictability of the solutions and to determine a suitable time between analysis (Δ analysis ), which is not trivial in chaotic oscillations. Figure 16 shows the comparison between the combined state and parameter assimilation solution, an unfiltered solution, and the true state in the chaotic region F of the bifurcation diagram with the same time between analysis as the previous non-chaotic studies. The assimilation does not perform as well as in non-chaotic regimes. This is physically due to the short predictability of chaotic systems.
There are several ways to estimate the predictability of a chaotic system (Boffetta et al., 2002). Here, the predictability is computed as the inverse of the maximal Lyapunov exponent, which provides a time scale after which two nearby trajectories diverge (linearly) due to the butterfly effect. The methodology followed is described in Magri & Doan (2020). The maximal Lyapunov exponent is determined by analysing the growth of the distance between two nearby trajectories.
In a logarithmic scale, the Lyapunov exponent is the slope of the linear region, which is computed by linear regression. Figure 17a shows two trajectories that are the same until 1 = 980, when they are set apart by = 10 −6 . After 10 time units, the two instantaneous solutions are completely different, which is a manifestation of chaos. The logarithmic evolution of the distance between the two trajectories is shown in Figure 17b, where the slope of the linear region gives the dominant Lyapunov exponent. This method is carried out for several initial conditions in the attractor. The resulting maximal Lyapunov exponent is 1 = 0.74 ± 0.30, which corresponds to a predictability time scale of = −1 1 = 1.62±0.78. Physically, the predictability, , is key to the implementation of the ensemble square-root Kalman filter for time-accurate predictions because, if the time between analysis is too large, the forecast ensemble will already be far apart from the truth. Figure 16 shows how the filtered chaotic solution with an assimilation time on the high end of the time scale is completely different to the true solution. Figure 18 shows the effect of the time between analysis Δ analysis in the chaotic assimilation. The EnSRKF time-accurately learns the true solution for Δ analysis < only as the relative error and the trace of the covariance are reduced significantly and converge. Therefore, we consider a time between analysis of Δ analysis = 0.5 for chaotic regions.
(The butterfly effect is not present in non-chaotic behaviours, therefore, the time between analysis considered in the fixed point, limit cycle, frequency-locked and quasiperiodic cases can be increased to reduce the computation time, as long as the Nyquist-Shannon criterion is fulfilled (Traverso & Magri, 2019).) Figure 19 shows the results of state estimation in a chaotic regime. The assimilation of the acoustic modes is shown in Figs. 19a, while the assimilation of pressure observations is shown in Figs. 19b. The results are generated with an ensemble of = 100. The results indicate that the filter learns the pressure state in chaotic regimes for the two assimilation approaches. Because of the butterfly effect, the filtered pressure and the true signal start differing after removing the filtering due to the chaotic nature of the solutions. The agreement is also evident in the phase space reconstruction and first return map. Figure 20 shows the results of state estimation in the form of power spectral density (PSD). The top PSDs are computed during the assimilation window ( ∈ [900, 1200]) and the bottom PSDs are computed after removing the filter and propagating the filtered solution without data assimilation ( ∈ [1200, 1500]). The PSDs during the assimilation indicate that the filter learns as well almost exactly the frequency spectrum of the solution, while the unfiltered solution exhibits significant discrepancies. After removing the filter, the PSD of the true and filtered solutions remain qualitatively similar, but differ slightly due to the chaotic divergence of the solution.
Finally, the data assimilation algorithm is able to estimate and in the combined state and parameter estimation in chaotic regimes for the assimilation of both acoustic modes and pressure from microphones (Figs. 21a,b, respectively). The results indicate that there is a successful convergence of the parameters even though their initial uncertainty is large. These simulations are performed with a large ensemble of 300 members and by inflating the ensemble when the assimilation is neglected due to unphysical parameters. The inflation parameter required for convergence in the assimilation of pressure data (Figure 21b    increased to 15, as they provide a greater amount of information on the system, i.e., the problem is less ill-conditioned. The data assimilation successfully learns the true state and parameters for chaotic regimes in the twin experiments by increasing the assimilation frequency, the ensemble size and the inflation parameter.

Bias-aware data assimilation with echo state networks
The echo state network is trained with the bias, which is the difference between the statistically stationary solution of the travelling-wave model with the flame kinematic model § 4.3 and a statistically stationary realisation of the lower-fidelity model with initial guess˜t rain = 10 6 Ws 1/2 m 5/2 . The training set is short (1.2 s), and sampled at 10000 Hz. This sampling frequency is consistent with experimental works (e.g. Garita et al., 2021). The washout consists of 0.025 s of observations sampled at the same frequency. Following the results from § 6, the simulations in this section use an ensemble size = 100, an inflation factor of = 1.02, a microphone standard deviation of mic = 0.01 and a time between analysis of Δ˜a nalysis = 3 · 10 −3 s. Figure 23 shows the unbiased state estimation results. The biased filtered solution represents the expectation of the forecast pressure, while the unbiased filtered solution is the resulting pressure state after correcting the model bias with the ESN. After 1 second of assimilation-free forecasting, when both the truth and the low-order solutions are statistically stationary, the ESN is initialised with 0.025 s of washout. At˜= 1.025 s the state estimation begins. The results indicate that the ESN favourably estimates the model bias. This allows the EnSRKF to recover the true pressure timeseries, as well as to learn its frequency spectrum and the attractor (Figure 23b). Figure 24 shows the performance of the ESN at the start of unbiased state estimation, and after the data assimilation ends. After the open-loop initialisation, the agreement between the estimated bias and the actual   . The vertical dashed lines shows the data assimilation window bias is favourable. In the autonomous evolution (closed-loop), a re-initialisation every 3 · 10 −3 s is sufficient to maintain the accuracy on the inferred bias. The ESN is trained with the bias resulting from a simulation using the˜=˜t rain , so it is expected to provide good estimates of the bias when initialising the ensemble to the same value of˜, provided a long enough washout.
The results for combined unbiased state and parameter estimation are shown in Figure 25. The values of heat source strength for the ensemble members are initialised far from the training˜t rain , as a uniform random distribution with a 10% standard deviation and a mean value of˜= 3˜t rain . The data assimilation with echo state network algorithm converges to a physical value of heat source strength (˜= 1.70 · 10 6 Ws 1/2 m 5/2 in Figure 25a), which recovers the dominant frequencies of the higher-fidelity simulation (Figure 25b). By comparing Figures 26 and 24, it can be seen that the timeseries of the model bias for a low-order model with˜= 1.70 · 10 6 Ws 1/2 m 5/2 is significantly different from that of˜t rain . This means that, although the echo state network is trained on data with a fixed˜t rain = 10 6 Ws 1/2 m 5/2 , it is able to infer adaptively the bias of unseen data (with a different˜).

Conclusions
Low-order thermoacoustic models are qualitatively correct, but they are quantitatively incorrect. In this work, we introduce data assimilation to make qualitative models quantitatively (more) accurate. This is achieved by combining the knowledge from observations, such as experimental data, and a physical model prediction. Data and model predictions are combined with a Bayesian data assimilation. The algorithm learns the state, such as the acoustic pressure, and the parameters of the model, every time that reference data becomes available (real-time). First, we develop a sequential data assimilation algorithm based on the ensemble square-root Kalman filter in the time domain. This nonlinear filter selects the most likely state and set of physical parameters, which are compatible with model predictions and their uncertainties, and with observations and their uncertainties. The filter is physical, i.e., it is not a purely machine learning technique, as it provides estimates that are compatible with the conservation laws, which makes it robust and principled. The data, once assimilated, does not need to be stored. For the data assimilation, which is based on a Markov assumption, we transform the time-delayed dynamics (non-Markovian) into an initial value problem (Markovian). Second, we perform twin experiments in each region of the bifurcation diagram with reference data on (i) the acoustic Galerkin modes, and (ii) the acoustic pressure taken from multiple microphones. On the one hand, in non-chaotic oscillations, the frequency with which data should be assimilated needs to fulfil the Nyquist-Shannon criterion with respect to the dominant acoustic mode. On the other hand, in chaotic oscillations, we highlight that the assimilation frequency should scale with the Lyapunov exponent. During the combined state and parameter estimation with pressure observations, it is observed that the filter occasionally provides unphysical solutions, such as negative time delays, which lead to convergence to incorrect solutions. This is due to the bifurcations and hystereses that occur in a small range of parameters. Hence, we propose an increase, reject, inflate strategy to overcome this. In detail, we increase the ensemble size to better capture the correct dynamics; we reject the analysis steps that provide unphysical parameters, e.g., negative time delays; and we inflate the ensemble covariance by adding noise as a regularisation term. With the twin experiments in data assimilation, we show that (i) the correct acoustic pressure and parameters can be accurately learnt (i.e., inferred); (ii) the ensemble size is small (in contrast to standard Kalman filters), from ten to hundred depending on the multi-frequency content of the solution; (iii) the learning is robust because it can tackle large uncertainties in the observations (up to 50% the mean values); (iv) the uncertainty of the prediction and parameters is naturally part of the output; and (v) both the time-accurate solution and statistics (through power spectral density function) can be successfully learnt.
Third, we propose a data assimilation framework to learn the model error (bias). The model bias is inferred by an echo state network, which is a data-driven tool that is more general than an auto-regressive model. We perform data assimilation using reference data from a higher-fidelity acoustic model, which contains a mean flow, non-ideal boundary conditions, and a kinematic model for the flame. The echo state network is trained a priori, and then it is run in parallel with the sequential data assimilation algorithm. We show that, with a short training set, the reservoir learns the dynamics of the thermoacoustic model error. The proposed methodology successfully learns in real time both the time-accurate solution and the statistics of it.
The technology developed in this paper is being applied to improve the quantitative accuracy of reduced-order models with experimental data from pressure sensors, to learn different model parameters, and to provide estimates of the model error. Data assimilation with an echo state network opens up new opportunities for real-time prediction of thermoacoustics by synergistically combining physical knowledge and data, as well as for estimating the model bias beyond the field of thermoacoustics. Acknowledgements The authors are grateful to Francisco Huhn, who helped the authors produce Figure 6, and to Alberto Racca, who helped implement the echo state network algorithm.

A Derivation of the EnSRKF
Before starting with the derivation of the filter, some definitions are introduced. For ensemble members and a state vector ∈ R ×1 , the matrix that encapsulates the ensemble members and the ensemble mean are defined as With these, the following definition for the ensemble perturbation matrix applies The ensemble covariance matrix can be determined from (27), introducing a factor ( − 1) to avoid a sample bias. The covariance matrix is defined as an approximation because it is derived from a statistical sample Hence, a solution for the analysis ensemble perturbations, which preserves the zero mean in the updated perturbations and keeps the EnSRKF unbiased, is (Sakov & Oke, 2008): Finally, the analysis state of the ensembles is determined by adding the analysis ensemble perturbations to the mean analysis ensembles. This analysis state is then propagated in time using the nonlinear forecast model, i.e.: where F is a compact representation of the nonlinear thermoacoustic equations. Note that, in the absence of observations, there would be no data assimilation and the initial conditions for the next forecast are the forecast states rather than the analysis states, hence