Phase-resolved ocean wave forecast with ensemble-based data assimilation

Through ensemble-based data assimilation (DA), we address one of the most notorious difficulties in phase-resolved ocean wave forecast, regarding the deviation of numerical solution from the true surface elevation due to the chaotic nature of and underrepresented physics in the nonlinear wave models. In particular, we develop a coupled approach of the high-order spectral (HOS) method with the ensemble Kalman filter (EnKF), through which the measurement data can be incorporated into the simulation to improve the forecast performance. A unique feature in this coupling is the mismatch between the predictable zone and measurement region, which is accounted for through a special algorithm to modify the analysis equation in EnKF. We test the performance of the new EnKF-HOS method using both synthetic data and real radar measurements. For both cases (though differing in details), it is shown that the new method achieves much higher accuracy than the HOS-only method, and can retain the phase information of an irregular wave field for arbitrarily long forecast time with sequentially assimilated data.


Introduction
Accurate prediction of ocean waves plays a significant role in the industries of shipping, oil & gas, aquaculture, ocean renewable energy, coastal and offshore construction. In the past few decades, both phase-averaged and phase-resolved wave models have been developed. The phase-averaged wave models, which provide statistical descriptions in terms of the wave spectrum, have been widely used in the operational forecast of global and regional sea states (Booij et al. 1999;Tolman et al. 2009). Despite their wide applications and success, phase-averaged models have limitations of providing no information on the individual deterministic waves. For example, rogue waves, which often appear sporadically and potentially cause enormous damages to offshore structures and ships (Broad 2006;Nikolkina et al. 2011), cannot be captured. On the other hand, phase-resolved models can forecast the evolution of individual waves, but receive much less attention historically, partly due to the difficulty in obtaining the phase-resolved ocean surface as initial conditions. This has now been largely ameliorated with the recent development of sensing technologies and wave field reconstruction algorithms (e.g. Reichert et al. 2004;Nouguier et al. 2013;Gallego et al. 2011;Nwogu & Lyzenga 2010;Lyzenga et al. 2015;Qi et al. 2016Qi et al. , 2018aDesmars et al. 2018). For example, the Doppler coherent marine radars have been applied to measure the radial surface velocity field, based on which the field of both velocity potential and surface elevation can be reconstructed in real time (Nwogu & Lyzenga 2010;Lyzenga et al. 2015).
Given the reconstructed surface elevation and velocity potential as initial conditions, the evolution of the wave field can be predicted by linear or nonlinear phase-resolved wave models. Although the linear models yield low computational cost, their prediction horizon is severely limited (e.g. Qi et al. 2017;Blondel et al. 2010). For nonlinear models, the Euler equations governing the free surface need to be numerically integrated. One efficient numerical algorithm to achieve this goal, based on high-order spectral (HOS) method, is developed by Dommermuth & Yue (1987); West et al. (1987), with later variants such as Craig & Sulem (1993); Xu & Guyenne (2009). The novelties of these algorithms lie in the development of an efficient spectral solution of a boundary value problem involved in the nonlinear wave equations, which is neglected in the linear wave models with the sacrifice of accuracy. In recent years, HOS has been developed for short-time predictions of large ocean surface taking radar measurements as initial conditions (Xiao 2013). However, due to the significant uncertainties involved in the realistic forecast (e.g., imperfect initial free surface due to measurement and reconstruction errors; the effects of wind, current, etc., that are not accurately accounted for) as well as the chaotic nature of the nonlinear evolution equations, the simulation may deviate quickly from the true wave dynamics (Annenkov & Shrira 2001). Because of this critical difficulty, operational phaseresolved wave forecast has been considered as a "hopeless adventure" to pursue (Janssen 2008).
The purpose of this paper, however, is to show that the dilemma faced by the phase-resolved wave forecast can be largely addressed by data assimilation (DA), i.e., a technique to link the model to reality by updating the model state with measurement data (Evensen 2003(Evensen , 2009Bannister 2017). Mathematically, the principle of DA is to minimize the error of analysis (i.e., results after combining model and measurements), or in a Bayesian framework, to minimize the variance of the state posterior given the measurements (Evensen 1994(Evensen , 2003Carrassi et al. 2018). Depending on formulations and purposes, two categories of DA algorithms exist, namely the variational-based and the Kalman-filter-based approaches. Among the limited studies to couple DA with phaseresolved wave models, most use the variational-based method, where the purpose is to find the optimal initial condition to minimize a cost function measuring the distance between the model prediction and data in future times (Aragh & Nwogu 2008;Fujimoto & Waseda 2020;Qi et al. 2018a). These methods, however, are not directly applicable to operational forecast due to their requirement of future data far after the analysis state (in contrast to the realistic situation where data becomes available sequentially in time). On the other hand, the Kalman-filter-based approach allows data to be sequentially assimilated, by updating the present state as a weighted average of prediction and data according to the error statistics. The only attempt (based on the authors' knowledge) to couple such an approach with phase-resolved wave model is Yoon et al. (2015), which however assumes linear propagation of the model error covariance matrix, thus limiting its application only to wave fields of small steepness. More robust methods based on ensemble Kalman filter (EnKF, i.e., with error statistics estimated by ensemble of model simulations), which lead to many recent successes in geosciences (Carrassi et al. 2018), have never been applied to phase-resolved wave forecast. Moreover, most existing work, if not all, use only synthetic data for the validation of their methods, which ignores the realistic complexity that should be incorporated into the forecast framework, such as the mismatch between the predictable zone and measurement region, and the underrepresented physics in the model.
In the present work, we develop the sequential DA capability for nonlinear wave models, by coupling ensemble Kalman filter (EnKF) with HOS. The coupling is implemented in a straightforward manner due to the non-intrusive nature of EnKF, i.e., the HOS code can be directly reused without modification (Evensen 2003(Evensen , 2009). The new EnKF-HOS solver is able to handle long-term forecast of the ocean surface ensuring minimized analysis error by combining model prediction and measurement data. The possible mismatch of the predictable zone (which shrinks in time) and measurement region (which moves with, say, ship speed) is accounted for by a new analysis equation in EnKF. To improve the robustness of the algorithm (i.e., address other practical issues such as misrepresentation of the error covariance matrix due to finize ensemble size and underrepresented physics in the model), we apply both adaptive covariance inflation and localization, which are techniques developed elsewhere in the EnKF community (Anderson & Anderson 1999;Anderson 2007;Hamill & Whitaker 2005;Carrassi et al. 2018). We test the performance of the EnKF-HOS method against synthetic and realistic radar data, which shows consistent and significant improvement in forecast accuracy over the HOS-only method in both cases. For the former, we further characterize the effect of parameters in EnKF on the performance. For the latter, we show that the EnKF-HOS method can retain the wave phases for arbitrarily long forecast time, in contrast to the HOS-only method which loses all phase information in a short time.
The paper is organized as follows. The problem statement and detailed algorithm of EnKF-HOS method are introduced in §2. The validation and benchmark of the method against synthetic and realistic radar data are presented in §3. We give a conclusion of the work in §4.

Problem statement
We consider a sequence of measurements of the ocean surface in spatial regions M j , with j = 0, 1, 2, 3, · · · the index of time t. In general, we allow M j to be different for different j, reflecting a mobile system of measurement, e.g, a shipborne marine radar or moving probes. We denote the surface elevation and surface potential, reconstructed from the measurements in M j , as η m,j (x) and ψ m,j (x) with x the two-dimensional spatial coordinates, and assume that the error statistics associated with η m,j (x) and ψ m,j (x) is known a priori from the inherent properties of the measurement equipment.
In addition to the measurements, we have available a wave model that is able to simulate the evolution of the ocean surface (in particular η(x, t) and ψ(x, t)) given initial conditions. Our purpose is to incorporate measurements η m,j (x) and ψ m,j (x) into the model simulation sequentially (i.e., immediately as data become available in time) in an optimal way such that the analysis of the states η a,j (x) and ψ a,j (x) (thus the overall forecast) are most accurate.

The general EnKF-HOS coupled framework
In this study, we use HOS as the nonlinear phase-resolved wave model, coupled with the ensemble Kalman filter (EnKF) for data assimilation (DA). Figure 1 shows a schematic illustration of the proposed EnKF-HOS coupled framework. At initial time t = t 0 , measurements η m,0 (x) and ψ m,0 (x) are available, according to which we generate ensembles of perturbed measurements, η  (n) * ,j to represent η (n) * ,j (x), ψ (n) * ,j (x) with * = m, f, a for measurement, forecast, and analysis, and j = 0, 1, 2. ensemble size, following the known measurement error statistics (see details in §2.3). A forecast step is then performed, in which an ensemble of N HOS simulations are conducted, taking η (n) m,0 (x) and ψ (n) m,0 (x) as initial conditions for each ensemble member n ( §2.4), until t = t 1 when the next measurements become available. At t = t 1 , an analysis step is performed where the model forecasts η a,1 (x) as initial conditions, and the procedures are repeated for t = t 2 , t 3 , · · · until the desired forecast time t max is reached. The details of each step are introduced next in the aforementioned sections, with the addition of inflation/localization algorithm to improve the robustness of EnKF included in §2.6, and treatment of the mismatch between predictable zone and measurement region by modifying the EnKF analysis equation in §2.7. The full process is finally summarized in §2.8 with Algorithm 1.

Generation of the ensemble of perturbed measurements
As described in §2.2, ensembles of perturbed measurements are needed at t = t j , as the initial conditions of N HOS simulations for j = 0, and the input of the analysis step for j 1. We collect and denote these ensembles by where s represents the state variables of surface elevation η or surface potential ψ, and S the corresponding ensemble. This simplified notation will be used hereafter when necessary to avoid writing two separate equations for η and ψ. s (n) m,j with n = 1, 2, ..., N is the n th member of the perturbed measurements. d j denotes the number of elements in the measurement state vector of either η or ψ at t = t j . Without loss of generality, in this work, we use constant d j = d for j 1, and choose d 0 for the convenience of specifying model initial condition (see details in §3).
To generate each ensemble member s (n) m,j from measurements s m,j , we first produce η where w (n) (x) is the random noise following a zero-mean Gaussian process with spatial correlation function (Evensen 2003(Evensen , 2009) and a the de-correlation length scale, both of which practically depends on the characteristics of the measurement devices (and thus assumed known in priori). The perturbed measurement of surface potential ψ (n) m,j is reconstructed from η (n) m,j based on the linear wave theory, m,j (k) denotes the n th member of perturbed surface elevation in Fourier space, and ω(k) is the angular frequency corresponding to the vector wavenumber k.
Although the error statistics of the measurements can be fully determined by (2.3) and (2.4), it is a common practice in EnKF to compute the error covariance matrix directly from the ensemble (2.1) (in order to match the same procedure which has to be used for the forecast ensemble). For this purpose, we define an operator C applied on ensemble S (such as S m,j ) such that (2.7) Therefore, applying C on S m,j , gives the error covariance matrix of the measurements.

Nonlinear Wave Model by HOS
Given the initial condition s (n) m,j for each ensemble member n, the evolution of s (n) (x, t) is solved by integrating the surface wave equations in Zakharov form (Zakharov 1968) formulated as is the surface vertical velocity with φ(x, z, t) being the velocity potential of the flow field. In (2.9) and (2.10), we have assumed, for simplicity, that the time and mass units are chosen so that the gravitational acceleration and fluid density are unity (e.g. Dommermuth & Yue 1987).
The key procedure in HOS is to solve for φ z (x, t) given ψ(x, t) and η(x, t), formulated as a boundary value problem for φ(x, z, t). This is achieved through a pseudo-spectral method in combination with a mode-coupling approach, with details included in multiple papers such as Dommermuth & Yue (1987); Pan et al. (2018).

Data Assimilation Scheme by EnKF
Equations (2.9) and (2.10) are integrated in time for each ensemble member to provide the ensemble of forecasts at t = t j (for j 1): where L is the number of elements in the forecast state vector and s f (x, t j ) is the n th member of the ensembles of model forecast results. The error covariance matrix of the model forecast can be computed by applying the operator C on S f,j : (2.12) An analysis step is then performed, which combines the ensembles of model forecasts and perturbed measurements to produce the optimal analysis results (Carrassi et al. 2018): is the optimal Kalman gain matrix of the state (for s=η or s=ψ). G j is a linear operator, which maps a state vector from the model space to the measurement space: In the present study, G j is constructed by considering a linear interpolation (or Fourier interpolation (Grafakos 2008)) from the space of model forecast, i.e., s f,j , to the space of measurements, i.e., s (n) m,j . While we have now completed the formal introduction of the EnKF-HOS algorithm (and all steps associated with figure 1), additional procedures are needed to improve the robustness of EnKF and address the possible mismatch between the predictable zone and measurement region. These will be discussed respectively in §2.6 and §2.7, with the former leading to a (heuristic but effective) correction of S f,j and Q s,j before (2.13) and (2.14) are applied, and the latter a modification of (2.13) when the mismatch occurs.

Adaptive Inflation and Localization
With N → ∞ and exact representation of physics by (2.9) and (2.10), it is expected that (2.11) and (2.12) capture the accurate statistics of the model states and equation (2.13) provides the true optimal analysis. However, due to the finite ensemble size and the underrepresented physics in (2.9) and (2.10), errors associated with statistics computed by (2.12) may lead to sub-optimal analysis and even (classical) filter divergence (Evensen 2003(Evensen , 2009Carrassi et al. 2018). These errors have been investigated by numerous previous studies (Houtekamer et al. 2005;Lorenc 2003;Hansen 2002;Hamill & Whitaker 2005;Evensen 2003Evensen , 2009Carrassi et al. 2018), which are characterized by (i) underestimates of the ensemble variance in Q s,j and (ii) spurious correlations in Q s,j over long spatial distances. To remedy this situation, adaptive inflation and localization (respectively for error (i) and (ii)) are usually applied as common practices in EnKF to correct S f,j and Q s,j before they are used in (2.13) and (2.14).
In this work, we apply the adaptive inflation algorithm (Anderson & Anderson 1999;Anderson 2007) in our EnKF-HOS framework. Specifically, each ensemble member in (2.11) is linearly inflated before the subsequent computation, i.e., ( 2.15) where λ j 1 is referred to as the covariance inflation factor. The purpose of (2.15) is to amplify the underestimated ensemble variance in Q s,j , especially whens f,j is far from s m,j , therefore to avoid the ignorance of S m,j in (2.13) (i.e., filter divergence) due to the overconfidence in the forecast. The appropriate value of λ j can be determined at each t = t j through the adaptive inflation algorithm (Anderson 2007), which considers λ j as an additional state variable maximizing a posterior distribution p(λ j |η m,j ). The detailed algorithm is presented in Appendix A.
After obtaining the inflated Q s,j , a localization scheme is applied, which removes the spurious correlation by performing the Schur product (i.e., element-wise matrix product) between Q s,j and a local-correlation function µ (Carrassi et al. 2018) ( 2.16) with µ defined as the Gaspari-Cohn (GC) function (see Appendix B for details).

Interplay between predictable zone and measurement region
The predictable zone is a spatio-temporal zone where the wave field is computationally tractable given an observation of the field in a limited space at a specific time instant (Naaijen et al. 2014;Köllisch et al. 2018;Qi et al. 2018b). Depending on the wave travelling direction, the boundary of the spatial predictable zone moves in time with speed c max g or c min g , which are the maximum and minimum group speeds within all wave modes of interest (see figure 2 for an example of predictable zone P(t) and unpredictable zone U(t) for a uni-directional wave field starting with initial data in [0, X]).
In practice, for a forecast from t j−1 to t j , the predictable zone P(t) at t = t j only constitutes a sub-region of the computational domain (see caption of figure 2), and there is no guarantee that the measurement region M j overlaps with the predictable zone P j = P(t j ). This requires a special treatment of the region where the measurements are available but the forecast is untrustworthy, i.e., x ∈ (U j ∩ M j ), where U j = U(t j ). To address this issue, we develop a modified analysis equation which replaces (2.13) when considering the interplay among P j , U j and M j .
We consider our computational region as a subset of P j ∪ M j , so that the analysis results at all x can be determined from the prediction and/or measurements. We further partition the forecast and analysis state vectors s where * = f, a. The variables with superscript U (P) represent the part of state vectors for which x ∈ U j (x ∈ P j ), with associated number of elements L U and d U (L P and d P ) for s (n) * ,j and s (n) m,j . The modified analysis equation is formulated as (2.18b) where K P s,j = K s,j (1 : L P , 1 : d P ) and G P s,j = G s,j (1 : d P , 1 : L P ) are submatrices of K s,j and G s,j associated with x ∈ P j in both measurement and forecast (analysis) spaces, H j is a linear operator which maps a state vector from measurement space to unpredictable zone in the analysis space: R d → R L U (based on linear/Fourier interpolation).
By implementing (2.18a), S U f,j is discarded in the analysis to compute S P a,j ; and by (2.18b), S U a,j is determined only from the measurements S m,j without involving S U f,j . Therefore, the modified EnKF analysis equation provides the minimum analysis error when considering the interplay among P j , U j and M j .

Pseudo-code and computational cost
Finally, we provide a pseudo-code for the complete EnKF-HOS coupled algorithm in Algorithm 1. For ensemble forecast by HOS, the algorithm takes O(N LlogL) operations for each time step ∆t. For the analysis step at t = t j , the algorithm has a computational complexity of O(dLN ) for (2.13) and O(dL 2 ) + O(d 3 ) + O(d 2 L) for (2.14) (if Gaussian elimination is used for the inverse). Therefore, the average computational complexity for one time step is O(N LlogL) + O(dL 2 )/(τ /∆t) (for L >∼ N and L >∼ d), with τ the DA interval.

15:
Calculate S a,j with (2.13) (or (2.18) when considering P j , U j , and M j ).

16:
Outputs a,j (ensemble mean of S a,j ).

17:
S f,j = S a,j .

Numerical results
To test the performance of the EnKF-HOS algorithm, we apply it on a series of cases with both synthetic and real ocean wave fields. For the former, we use a reference HOS simulation to generate the true wave field, on top of which we superpose random errors to generate the synthetic noisy measurements. For the latter, we use real data collected by a ship-borne Doppler coherent marine radar (Lyzenga et al. 2015;Nwogu & Lyzenga 2010). The adaptive inflation and localization algorithms are only applied for the latter case, where the under-represented physics in (2.9) and (2.10) significantly influences the model statistics. The perturbed measurement ensemble (2.1) are generated with parameters c = 0.0025σ 2 η (where σ η is the standard deviation of the surface elevation field) and a = λ 0 /8 (where λ 0 is the fundamental wavelength in the computational domain) in (2.3) in all cases unless otherwise specified. We remark that these choices may not reflect the true error statistics of the radar measurement, which unfortunately has not been characterized yet. The results from EnKF-HOS simulations are compared with HOS-only simulations (both taking noisy measurements as initial conditions) to demonstrate the advantage of the new EnKF-HOS method.

Synthetic cases
We consider the synthetic cases where the true solution of a wave field (η true (x, t) and ψ true (x, t)) is generated by a single reference HOS simulation starting from the (exact) initial condition. The (noisy) measurements of surface elevation are generated by superposing random error on the true solution: where v(x) is a random field, which represents the measurement error and shares the same distribution as w (n) (see (2.3)). For simplicity in generating initial model ensemble, we use η m,0 ∈ R L in (3.1) (i.e., d 0 = L in (2.1)), and η m,j ∈ R d for j 1 with d specified in each case below. Similar to ψ (n) m,j (x), ψ m,j (x) is generated based on the linear wave theory, whereη m,j (k) denotes the measurement of surface elevation in Fourier space. Depending on how the true solution is generated, we further classify the synthetic cases into idealistic and realistic cases. In the idealistic case, the true solution is taken from an HOS simulation with periodic boundary conditions, so that the entire computational domain is predictable. In the realistic case, we consider the true solution as a patch in the boundless ocean (practically taken from a patch in a much larger domain where the HOS simulation is conducted), and the interplay between M j and P j discussed in §2.7 is critical. Correspondingly, we apply the modified EnKF analysis equation (2.18) only in the realistic cases.
To quantify the performance of EnKF-HOS and HOS-only methods, we define an error metric where A is a region of interest based on which the spatial average is performed (here we use A to represent both the region and its area), η sim (x, t) represents the simulation results obtained from EnKF-HOS (the ensemble average in this case) or the HOS-only method, and σ η is the standard deviation of, say, η true in A. It can be shown that the definition (3.3) yields (t; being the correlation coefficient between η a and η b (in this case η true and η sim ). Therefore, (t; A) = 1 corresponds to the case that all phase information is lost in the simulation.
In the following, we show results for idealistic and realistic cases of synthetic irregular wave fields. Preliminary results validating the EnKF-HOS method for Stokes waves in the idealistic setting can be found in Wang & Pan (2020).
starting from initial conditions prescribed by a realization of the JONSWAP spectrum S(ω), with a spreading function D(θ) for the 3D case (where ω and θ are the angular frequency and angle with respect to the positive x direction).
For the first case of the 2D wave field, we use an initial spectrum S(ω) with global steepness k p H s /2 = 0.11, peak wavenumber k p = 16k 0 with k 0 the fundamental wavenumber in the computational domain, and enhancement factor γ = 3.3. L = 256 grid points are used in spatial domain [0, 2π) in the (reference, EnKF-HOS and HOS-only) simulations. The noisy measurements s m,j are generated through (3.1) and (3.2), with a comparison between s m,0 (x) and s true (x, t 0 ) shown in figure 3.
Both EnKF-HOS and HOS-only simulations start from initial measurements s m,0 (x). In the EnKF-HOS method, the ensemble size is set to be N = 100, and measurements at d = 2 locations of x/(2π) = 100/256 and 170/256 are assimilated into the model with a constant DA interval τ = t j − t j−1 = T p /16, where T p = 2π/ k p from the dispersion relation.
9.02 × 10 −3  In contrast, (t; A) from EnKF-HOS simulation keeps decreasing, and becomes several orders of magnitude smaller than that from HOS-only method (and two orders of magnitude smaller than the measurement error) at the end of the simulation. For visualization of the wave fields, figure 5 shows snapshots of η true (x) and η sim (x) (with EnKF-HOS and HOS-only methods) at three time instants of t/T p = 5, 45, and 95, which indicates the much better agreement with η true (x) when DA is applied. Notably, at and after t/T p = 45, the EnKF-HOS solution is not visually distinguishable from η true (x).
The influence of the parameter c (reflecting the measurement error) on the results from both methods are summarized in Table 1. We present the critical time instants t * when (t * ; A) reaches O(1) in HOS-only method, i.e. when the simulation completely loses the phase information. As expected, all cases lose phase information for sufficiently long time, and the critical time t * decreases with the increase of c. In contrast, for EnKF-HOS method, the error (t; A) decreases with time and reaches O(10 −3 ) at t = 100T p in all cases. We further investigate the effects of EnKF parameters on the performance, including DA interval τ , the ensemble size N , and the number of DA locations d. The errors (t; A) obtained with different parameter values are plotted in figure 6 (for τ from T p /16 to  T p /2), figure 7 (for N from 40 to 100) and figure 8 (for d from 1 to 4). In the tested ranges, the performance of EnKF-HOS is generally better (i.e., faster decrease of (t; A) with increase of t) for smaller τ , larger N and larger d. In addition, for τ = T p /2 as shown in figure 6, (t; A) slowly increases with time, indicating a situation that the assimilated data is not sufficient to counteract the deviation of HOS simulation from the true solution (due to the chaotic nature of (2.9) and (2.10)). It is also found that when N = 20, the error increases rapidly leading to a filter divergence, mainly due to the insufficient ensemble size to capture the error statistics. For the 3D wave field, we use the same initial spectrum S(ω) as in the 2D case, with where β = π/6 is the spreading angle. The (reference, ENKF-HOS and HOS-only) simulations are conducted with L = 64 × 64 grid points. In EnKF-HOS method, N = 100 ensemble members are used, and data from d = 10 locations (randomly selected with uniform distribution) are assimilated with interval τ = T p /16. Figure 9 shows the error (t; A) (with A = [0, 2π) × [0, 2π)) obtained from the EnKF-HOS and HOS-only methods. Similar to the 2D case, we see that (t; A) from the EnKF-  HOS method decreases with time, and becomes several orders of magnitude smaller than that from the HOS-only method (with the latter increasing with time). A closer scrutiny for error on a snapshot can be obtained by defining a local spatial error at a time instant t: Three snapshots at e(x; t) for t/T p = 5, 50 and 95 are shown in figure 10, demonstrating the much smaller error achieved using EnKF-HOS method especially for large t, i.e., the superior performance of including DA in the simulation.

Results for realistic cases
We consider η true (x) for the realistic case taken from a sub-region R with quarter edge length of a periodic computational domain W, i.e., a patch in the ocean (see figure  11(a)). The reference simulation in W is performed with 256 × 256 grid points, with all other parameters kept the same as the 3D idealistic reference simulation. The EnKF-HOS and HOS-only simulations are conducted over R = [0, 2π) × [0, 2π) with L = 64 × 64 grid points, starting from initial noisy measurements. For j 1, We further consider a practical situation where the measurements are only available in M j = B c ∩ R, where B = {x|x > π, π/2 < y < 3π/2} (say a structure of interest located within B preventing the surrounding measurements, see figure 11(b)). We use d = 2176 (locating on every computational grid point in M j ) and an assimilation interval τ = T p /4.
In this case, the use of modified EnKF analysis equation (2.18) is critical due to the interplay among M j , P j and U j . In particular, the left, upper/lower bounds of P(t) moves (towards right, down/up) respectively with speeds c max g,x and c max g,y , i.e., the maximum group speeds in x and y directions (practically taken as group speed of mode k = (4, 1), so that the energy from longer waves is less than 1% of the total energy). After applying the modified EnKF analysis equation (2.18), which takes into consideration of M j , P j and U j (see a sketch in figure 11(b)), P j is recovered to fill in R due to the DA. In contrast, in HOS-only method, P(t) ≡ P * (t) keeps shrinking and vanishes for sufficient time. We consider two error metrics (t; R) and (t; P * (t)), which are plotted in figure 12 for both EnKF-HOS and HOS-only methods. For HOS-only simulation, (t; R) increases rapidly in time and reaches O(1) at t/T p ∼ O(3). This is resulted from the chaotic nature of (2.9) and (2.10), as well as the significant error in U(t). For EnKF-HOS simulation, (t; R) decreases with time and reaches a constant level of O(0.002) after t/T p ∼ O(3). The further reduction of the error is prohibited due to the region U(t), because (t; U(t)) has a lower bound from the measurement error. The general trend of (t; P * (t)) is similar for both methods, but the magnitude of (t; P * (t)) is smaller than that of (t; R) especially for the HOS-only method. This is due to the removal of U(t) from A in the computation of the error. To further understand the error characteristics, we plot the spatial local error e(x; t) at t/T p = 2, 5, and 8 for both methods in figure 13. While the spatial error generally increases in time for HOS-only method, e(x; t) from EnKF-HOS method is significantly smaller and exhibits heterogeneous spatial distribution. Within U j , e(x; t) is relatively high with the same order of the measurement error. In P j , e(x; t) decreases with time and becomes significantly lower than that in U j . Remarkably, this also applies to the region where measurements are not available (i.e., M c ∩ R) as the waves in this region travels from upstream locations where DA is performed. This result is of practical importance as it shows that the wave forecast at a location of interest in the ocean (say the location of an offshore structure) can be made accurate through DA in the upstream region.

Results with real radar measurements
In this section, we test the performance of EnKF-HOS method with real radar measurements of the ocean wave field. The measurements are obtained from a ship-borne X-band (9.4 GHz) Doppler coherent marine radar off the coast of southern California. We consider a patch from the radar-scanned area as our initial domain of interest R 0 , which covers a 480m×480m area resolved on a 64×64 grid (Figure 14). The starting time of computation is 2013-09-13T01 : 00 : 13Z, with a global wave steepness k p H s /2 = 0.027 from the initial radar data. In both EnKF-HOS and HOS-only simulations, we re-scale the computational domain R j to [0, 2π) × [0, 2π) and use L = 64 × 64 grid points. For  EnKF, we use d = 64 × 64, which covers the whole patch, and set the DA interval the same as radar data collection interval, which fluctuates in time around T p /4 = 2.82s. The ensemble size is set to be N = 100. For adaptive inflation, we useλ 0 = 1 in the prior distribution of λ 0 (see Appendix A) to sequentially determine λ j in (2.15), which is applied at each t = t j together with the localization (2.16).
A critical issue in this case is the movement of M j in time due to the ship speed, which results in a mismatch between M j and the computational region R j−1 at each t = t j . To address this issue, we shift the computational region from R j−1 to R j which matches M j . In EnKF-HOS method, we further partition R j into P j and U j (using the predictable zone calculated from R j−1 ) and apply the modified analysis equation (2.18) accordingly. In HOS-only method, we use Fourier periodic extension (Grafakos 2008) to obtain the wave field covering R j .
(see (3.4)) as the metric to evaluate the performance. Figure 15 plots ρ Rj (η m,j , η sim ) obtained from EnKF-HOS and HOS-only methods. While both time series starts from ρ R0 = 1 at t = t 0 , the one from HOS-only simulation quickly approaches O(0.25) within one peak period T p , indicating the (almost) complete loss of the phase information. This is much faster than any synthetic case, mainly due to the under-resolved physics in (2.9) and (2.10) with respect to the real ocean (which includes extra physical effects of current, wind, etc.). In contrast, the correlation ρ Rj (η m,j , η sim ) from EnKF-HOS remains at O(0.75), retaining the phase information for arbitrarily long time. Figure 16 further plots the snapshots of η m,j (x) and η sim (x, t) at two cross-sections of y/(2π) = 1/3 and 2/3 for both methods at three time instants of t/T p = 1, 5 and 9. It can be visually observed that the EnKF-HOS results are (on average) much closer to η m,j (x) for all cases.
We finally test the effect of parameterλ 0 on the performance of EnKF-HOS. In general, the value ofλ 0 can be considered as a control of the extent to which the inflation is applied. For larger value ofλ 0 , it is expected that the ensemble variance of (2.11) is amplified to a greater extent, and more weights are assigned to measurements when the analysis (2.18) is applied. Physically, larger values ofλ 0 may be chosen if the model is associated with significant under-represented physics (thus severely underestimates the variance in (2.11)). To elucidate this effect ofλ 0 , we test another value ofλ 0 = 2, and compare the resulting ρ Rj (η m,j , η sim ) with that fromλ 0 = 1 in figure 17. Indeed, the  Figure 17: Time series of ρ Rj (η m,j , η sim ) withλ 0 = 1.0 (---) andλ 0 = 2.0( ) result forλ 0 = 2 shows a higher correlation with the measurement, with ρ Rj above O(0.8) for all time. We note that the higher value of ρ Rj does not imply the closeness of η sim to η true , since the relation between η m,j and η true is not known in this case.

Conclusions
In this paper, we develop the ensemble-based DA capability for phase-resolved wave forecast, resulting in a new EnKF-HOS method. A unique consideration in EnKF-HOS is the treatment of the interplay between predictable and measurement zones, which is successfully accounted for through a modified analysis equation. The performance of EnKF-HOS method is extensively tested and compared to the (traditional) HOS-only method using both synthetic wave field and real radar data. In all cases, significant advantages are demonstrated by using the EnKF-HOS method, namely the dramatic reduction of forecast error and retaining the phase information for arbitrarily long time with DA of radar data. In contrast, the phase information is lost within one peak period in HOS-only method when considering the real ocean waves and using the radar data. The parameters involved in the EnKF-HOS method is carefully benchmarked, including the ensemble size, DA interval, number of DA locations and the inflation factor. The developed EnKF-HOS algorithm is intrinsically parallel and very suitable to be implemented on a graphics processing unit (GPU), a compact device that can be conveniently installed in the offshore environment. Finally, we use λ =λ d in (2.15) for inflation at t = t j , and set N (λ d , σ 2 d ) computed at time t = t j as the prior p(λ) at t = t j+1 . The computations of (A 2) and (A 3) are repeated, which completes the full algorithm to determine λ by adaptive inflation at each t j .