Stabilization of a multi-frequency open cavity flow with gradient-enriched machine learning control

We stabilize an open cavity flow experiment to 1% of its original fluctuation level. For the first time, a multi-modal feedback control is automatically learned for this configuration. The key enabler is automatic in-situ optimization of control laws with machine learning augmented by a gradient descent algorithm, named gradient-enriched machine learning control (Cornejo Maceda et al. 2021, gMLC). gMLC is shown to learn one order of magnitude faster than machine learning control (Duriez et al. 2017, MLC). The physical interpretation of the feedback mechanism is assisted by a novel cluster-based control law visualization for flow dynamics and corresponding actuation commands. The starting points of the control experiment are two unforced open cavity benchmark configurations: a narrow-bandwidth regime with a single dominant frequency and a mode-switching regime where two frequencies compete. The feedback control commands the DBD actuator located at the leading edge. The flow is monitored by a downstream hot-wire sensor over the trailing edge. The feedback law is optimized with respect to the monitored fluctuation level. As a reference, the self-oscillations of the mixing layer are mitigated with steady actuation. Then, a feedback controller is optimized with gMLC. As expected, feedback control outperforms steady actuation by achieving both a better amplitude reduction and a significantly smaller actuation power, about 1% of the actuation energy required for similarly effective steady forcing. Intriguingly, optimized laws learned for one regime perform well for the other untested regime as well. The proposed control strategy can be expected to be applicable to many other shear flow experiments.


Introduction
Open cavity oscillations occur in many ground and airborne transport vehicles, for instance in wheel casings or bogeys, and significantly contribute to aerodynamic drag and noise. Active model-based control has been applied with great success to the stabilization of these oscillations (Sipp et al. 2010). In this study, we aim at fast self-learning feedback, which simplifies the development of control and extends the applicability to nonlinear dynamics. Encouraged by results for wake stabilization (Cornejo Maceda et al. 2021), we apply gradient-enriched machine learning control to an experiment.
Open cavity flows typically feature mono-mode and multi-frequency regimes depending on the configuration. The oscillatory dynamics gathers most of the mechanisms responsible for nonlinear turbulence interactions. Yet, the self-organization of the spatial structures is still highly coherent and driven by global instability (Huerre & Rossi 1998). Our configuration has a moderate Reynolds number ($Re_L \approx 10^4$). The length-to-depth ratio is around 1.7 and thus lies between a shallow and a deep cavity. With increasing incoming velocity, an open cavity successively features first an intra-cavitary centrifugal instability and then self-sustained oscillations of the mixing layer (Basley et al. 2014; Feger et al. 2019). The dynamics of the interaction between an incoming boundary layer and a rectangular cavity depends on six parameters: the ratios of the three spatial dimensions of the cavity (in particular the length $L$ and depth $D$ of the cavity), the momentum boundary layer thickness $\theta_0$ at the upstream edge, the incoming velocity $U_\infty$ and the Mach number for compressible flows. By focussing on the two main characteristic numbers, $L/D$ and $Re_L = U_\infty L/\nu$, it is possible to scan a wide range of dynamics, from a single-mode regime to spectra with rich dynamics including coupled modes (Kegerise et al. 2004). This, in addition to the practical implications, is the reason for the repeated interest in this flow pattern from pioneering work (Rossiter 1964; Gharib & Roshko 1987) to the present day.
Current studies of the cavity focus on a wide range of industry applications. In the transport field, due to engineering and manufacturing constraints, most ground and airborne transport vehicles include cavities, e.g., wheel casings and bogeys, whose interaction with low- or high-speed flows is responsible for parasitic drag and flow-induced noise. For German high-speed trains, the underbody with cavities accounts for 61% of the aerodynamic drag and the gaps between the wagons for another 5% (Hucho 2002). At high speeds such as 300 km h$^{-1}$, noise is increased by more than 14 dB due to cavity fluctuations (Wang et al. 2014). Landing gear bays on passenger airplanes produce strong noise and represent up to 30% of the total noise. For low-speed transports such as cars, the airflow can excite flow oscillations in the cavity to form resonance and noise sources, resulting in body resistance and noise nuisances for the passengers (Kook et al. 1997). Hence, cavity flow control is of great engineering interest.
The control of the cavity relies on the mitigation of the mixing layer by suppressing the feedback mechanism between the vortex formation and the impinging vortex recirculation flow. The control can be achieved in a passive manner by modifying the geometry of the configuration or in an active manner by injecting energy into the flow. Passive devices for control include fences, spoilers, ramps, cylinders and rods (Stanek et al. 2003; Ukeiley et al. 2004; Keirsbulck et al. 2008; Panickar & Raman 2008; El Hassan & Keirsbulck 2017). Modifications of the cavity leading edge affect the shear-layer formation (Ahuja & Mendoza 1995), while modifications of the trailing edge reduce the sound wave generation at the impingement point (Pereira & Sousa 1994). Porous walls have also been employed to reduce the feedback excitation near the leading and trailing edges (Wilcox Jr 1988; Stallings Jr et al. 1994). However, most passive devices imply parasitic drag during cruise.
On the other hand, active control may improve performance with low intrusion into the flow, a large frequency bandwidth and the ability to adapt to the flow response. Noteworthy examples of model-free open-loop control include the stabilization of the laminar flow with high-frequency forcing (Sipp 2012; Kreth & Alvi 2020) and the mitigation of pressure fluctuations for supersonic flows based on resolvent analysis (Liu et al. 2021). In contrast, most closed-loop control relies on models as a simple representation of the dynamics. For instance, Barbagallo et al. (2009) develop a Galerkin model with global modes of the flow that preserve the input-output behavior. As a further example, Nagarajan et al. (2018) achieved noise reduction with a reduced-order model including the control effect. For quasi-periodic dynamics, an iterative method for a weakly nonlinear model was able to completely stabilize the flow (Leclercq et al. 2019).
Feedback controllers based on linear models have also been successfully employed to mitigate the oscillations of the flow (Illingworth et al. 2012) and to suppress noise. Finally, we note one of the very first and remarkable closed-loop control studies by Gharib et al. (1985) on an open cavity in a water canal. We refer to Cattafesta III et al. (2008) for a review of past successes of active flow control on the cavity. A well-known effect of linear control is the shift of the oscillations of the cavity to other Rossiter modes (Cabell et al. 2002; Williams et al. 2000), resulting in multi-frequency regimes. Mode-switching regimes present a challenge for control design, as the controller needs to cover large bandwidths and have an adequate time response (Samimy et al. 2007b). Linear closed-loop control on an experimental cavity for multi-frequency control has been achieved by augmenting the controller with well-placed zeros (Yan et al. 2006). Samimy et al. (2007b) manage to control multiple frequencies by incorporating several models in linear quadratic optimal controllers.
Building a control-oriented model is often limited by the nonlinearities of the flow, including frequency crosstalk and time delays between the actuation and sensing. Therefore, we choose model-free approaches based on machine learning to achieve multi-modal control. Machine learning control (Duriez et al. 2017, MLC) based on genetic programming (Dracopoulos 1997) is employed to build feedback control laws mapping the outputs of the system (sensor signals) to its inputs (actuation commands). MLC is a function optimizer able to optimize both the structure of the control law and its parameters. In an evolutionary process, new mechanisms are found (exploration) and improved (exploitation). MLC has been successfully applied in dozens of experiments, each time outperforming optimized control methods, often by exploiting unexpected nonlinear mechanisms (Noack 2019). MLC achievements include drag reduction of the Ahmed body with and without yaw angle (Li et al. 2018, 2019), jet mixing enhancement (Zhou et al. 2020) and mixing layer control (Parezanović et al. 2016), separation control of a turbulent boundary layer (Debien et al. 2016), recirculation zone reduction behind a backward-facing step (Gautier et al. 2015), reduction of vortex-induced vibration of a cylinder (Ren et al. 2019) and pitch control for floating offshore wind turbines (Kane 2020). Recently, MLC has been augmented with intermediate gradient descent steps for a fast descent into the minima (Cornejo Maceda et al. 2021, gMLC).
This study constitutes, to the best of the authors' knowledge, the first self-learning model-free control for the stabilization of open cavity flows. We employ our fastest optimizer, gMLC, to address the challenge of robust multi-frequency stabilization. For this, feedback control laws are learned in two regimes: a narrow-bandwidth one and a mode-switching one. The second regime constitutes a challenging problem as gMLC needs to learn a control law able to control two modes simultaneously. The robustness of the laws is tested by cross-evaluating each law in the other regime.
The manuscript is organized as follows. § 2 introduces the cavity experiment setup, including the wind tunnel, sensing and actuation, and details the characteristics of the unforced dynamics. § 3 describes the control problem, including the cost function and the ansatz for the control law, and outlines gradient-enriched machine learning control. Moreover, two methods to interpret the control mechanisms are presented: an analytical approximation based on an affine regression and a cluster-based visualization method based on representative flow states. In § 4, the results of the control of the open cavity are described, from steady forcing as a benchmark to gMLC feedback. § 5 discusses the robustness of the gMLC laws, highlights the necessity for feedback and comments on

The open cavity experiment
This section details the characteristics of the wind tunnel, the means of sensing and actuation, the control unit and finally the unforced dynamics for the two regimes studied in this manuscript: the narrow-bandwidth regime and the mode-switching regime.

Wind tunnel set-up
The cavity is inserted into the rectangular cross-section duct of a 0.075 m high and 0.30 m wide wind tunnel. The cavity, recessed into the floor, is $D = 0.05$ m deep, $S = 0.30$ m wide and $L = 0.075$ m or $L = 0.0875$ m long depending on the studied regime. The resulting aspect ratios are $R = L/D = 1.5$ for the narrow-bandwidth regime and $R = L/D = 1.75$ for the mode-switching regime. A schematic of the wind tunnel is depicted in figure 1. The walls are made of anti-reflection treated glass. A Blasius-type boundary layer develops from an elliptical edge located 0.30 m upstream. Laser Doppler Velocimetry (LDV) measurements of the velocity upstream of the cavity show that the standard deviation of the incoming flow is less than 1%.
An anemometer is located at the exit of the open wind tunnel vein. Measurements show that the free-stream velocity $U_\infty$ and the velocity measured at the exit of the tunnel vein are linearly related to the rotation speed of the wind tunnel fan motor. Thus, in this study, $U_\infty$ is estimated from the anemometer measurements. For the narrow-bandwidth regime, the incoming velocity is set to $U_\infty = 2.13$ m s$^{-1}$, resulting in a Reynolds number of $Re_L = 1.04 \times 10^4$. The momentum boundary layer thickness is estimated at $\theta_0/L = 1.17 \times 10^{-2}$. Great care has been taken to calibrate and regulate the incoming velocity with reference to LDV measurements. However, a 2% variation of the incoming velocity has been observed for the narrow-bandwidth regime over the 24 hours necessary for the longest learning sessions. The velocity variations are caused by the temperature variation, $T \approx 23.14 \pm 2$ °C, and by very low cycle frequencies in the wind tunnel at this low-velocity operating point. The incoming velocity variations reach 5% for the mode-switching regime. Finally, the flow is in the incompressible range with a Mach number less than $10^{-2}$. A more detailed description of the set-up can be found in Lusseyran et al. (2008) and Basley et al. (2013).

Hot-wire sensor
Figure 1: Diagram of the cavity with the position of the DBD actuator (in red) and the velocity sensor (in green). The magnified region depicts the velocity profiles of the incoming velocity ($U(y)$ in blue) and the ionic wind produced by the DBD actuator ($U_p(y)$ in red).
For sensing, we use a constant temperature anemometer (DANTEC hot-wire probe 55P16 and miniCTA54T30 converter) with a single 1D hot-wire sensor, 5 µm in diameter and 1 mm in length. The hot-wire is located 6 mm above the cavity and 6 mm upstream of the trailing edge, as sketched in green in figure 1 and figure 2b. The position of the hot-wire sensor has also been chosen to limit the velocity drops in the mode-switching regime, see § 2.5. The hot-wire output signal $E_w(t)$ is converted into streamwise velocity information $u$ according to King's law:
$$E_w^2 = A + B\,u^n, \tag{2.1}$$
where $A = 1.28$, $B = 0.70$ and $n = 0.48$ are determined by calibration of the hot-wire using an LDV anemometer. Before conversion, the signal $E_w$ is temperature-corrected by the multiplicative factor $(T_w - T_0)/(T_w - T)$, where $T$ is the room temperature, $T_0$ is the calibration temperature and $T_w$ is the wire temperature (Jørgensen 2005). $T$ and $T_0$ are both measured with a Pt100 platinum sensor with 0.02 °C accuracy. The measured velocity $u$ is then employed in three ways: first, it serves to compute the performance of the tested controllers (§ 3.1); second, it closes the feedback control loop (§ 3.2); third, it is used to analyze the control mechanisms (§ 3.4). All the following spectra and spectrograms are computed from this velocity measurement.
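The voltage-to-velocity conversion described above can be sketched in a few lines. This is a minimal illustration, assuming the standard form of King's law, $E_w^2 = A + B u^n$, with the calibration constants quoted in the text. The function names are hypothetical, the temperature factor is applied exactly as worded above, and the translation and amplification of the signal before digitization are ignored.

```python
def temperature_correct(e_w, t_wire, t_cal, t_room):
    """Apply the multiplicative correction (T_w - T_0) / (T_w - T) to E_w."""
    return e_w * (t_wire - t_cal) / (t_wire - t_room)


def kings_law_voltage(u, a=1.28, b=0.70, n=0.48):
    """Forward King's law (assumed form): E_w = sqrt(A + B * u**n)."""
    return (a + b * u ** n) ** 0.5


def kings_law_velocity(e_w, a=1.28, b=0.70, n=0.48):
    """Invert E_w**2 = A + B * u**n for the streamwise velocity u."""
    excess = e_w ** 2 - a
    if excess < 0.0:
        raise ValueError("voltage lies below the zero-velocity offset sqrt(A)")
    return (excess / b) ** (1.0 / n)


# Round trip at the narrow-bandwidth free-stream velocity (2.13 m/s).
e_example = kings_law_voltage(2.13)
u_example = kings_law_velocity(e_example)
```

When the room temperature equals the calibration temperature, the correction factor is one and the voltage is passed through unchanged.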

Plasma actuator
The actuation is carried out with a dielectric barrier discharge (DBD) actuator to locally force the boundary layer at the entrance of the cavity, near the separation edge where the receptivity of the shear layer is maximum (Cattafesta III et al. 1997), see figure 1. The DBD consists of two conductive blades placed on either side of an insulating plate and subjected to a high alternating voltage. The streamwise shift between the two blades (see figure 2a) creates an electric field parallel to the plate, which is responsible for an ionic wind in the streamwise direction. The principle and the adjustment of the parameters for an application as a fluid actuator are thoroughly detailed in Moreau (2007), Forte et al. (2007) and Benard et al. (2010). In our experimental setup, the dielectric is made of 2 mm-thick acrylic glass (PMMA) and the electrodes are made of 9 mm-wide, 26 cm-long and 200 µm-thick copper ribbons. The downstream edge of the lower electrode is placed $x = 4$ mm upstream of the leading edge, see figure 2a.
To produce an ionic wind, a carrying signal $E(t)$ at high frequency $f_p$ ($\approx 3$ kHz) is sent to the active electrode. The signal $E(t)$ is produced by an Agilent function generator and amplified (×3000) by a Trek high-voltage amplifier. The expression of the carrying signal is:
$$E(t) = A \sin(2\pi f_p t), \tag{2.2}$$
with $A$ being the amplitude of the carrying signal. The control is then achieved by modulation of the amplitude $A$ through the actuation command $b \in [-1, 1]$. In practice, $A$ is an affine function of $b$ such that $A|_{b=-1} = A_{\min}$ and $A|_{b=1} = A_{\max}$. $A_{\min}$ is the ionization voltage; it is the threshold above which an ionic wind is produced. The generated wind then acts as a localized body force whose intensity increases with the voltage and thus with $b$. The increasing level of the body force results in the reduction of the main peak of the power spectrum until the dynamics are completely modified. A steady actuation forcing study of the open cavity flow is reported in § 4.1. $A_{\max}$ is defined as the maximum voltage that keeps the main resonance of the cavity still present in the power spectrum. In practice, $A_{\min}$ and $A_{\max}$ are measured before each experiment, as they are sensitive to the atmospheric pressure, room temperature, moisture and number of hours of use of the electrode. To make the control robust against these variations, the range of the actuation command $b$ is set independently of $A_{\min}$ and $A_{\max}$. Forte et al. (2007) and Moreau (2007) describe the typical velocity profile generated by a DBD actuator with LDV measurements. In particular, Forte et al. (2007) show that for a voltage of 20 kV and a carrying frequency of 1 kHz applied between 0.1 mm-thick, 20 cm-long aluminum electrodes, the velocity profile displays a maximum at $y = 0.5$ mm from the wall. As the velocity profile moves downstream, the value of the maximum velocity decreases and its height increases up to $\sim 1$ mm.
For our experiment, Pitot measurements indicate that for a voltage of 6 kV the maximum velocity is around 0.8 m s$^{-1}$ and is reached at $y = 1.25$ mm from the wall. Unfortunately, this voltage value is not meaningful, as $A_{\min}$ and $A_{\max}$ have changed by a factor between two experiments.
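The affine relation between the actuation command $b$ and the carrier amplitude $A$ can be made explicit as $A = A_{\min} + \tfrac{1}{2}(b + 1)(A_{\max} - A_{\min})$. The sketch below implements this map together with a sinusoidal carrier at $f_p$. The sinusoidal form of the carrier, the function names and the numerical values of `A_MIN` and `A_MAX` are assumptions made for illustration only; in the experiment these two voltages are re-measured before each run.

```python
import math


def amplitude_from_command(b, a_min, a_max):
    """Affine map with A(b = -1) = A_min and A(b = +1) = A_max."""
    if not -1.0 <= b <= 1.0:
        raise ValueError("actuation command b must lie in [-1, 1]")
    return a_min + 0.5 * (b + 1.0) * (a_max - a_min)


def carrier_voltage(t, b, a_min, a_max, f_p=3000.0):
    """Carrier E(t) = A(b) * sin(2 pi f_p t); the sinusoidal form is assumed."""
    amplitude = amplitude_from_command(b, a_min, a_max)
    return amplitude * math.sin(2.0 * math.pi * f_p * t)


# Hypothetical voltage bounds in kV, chosen only to exercise the functions.
A_MIN, A_MAX = 4.0, 6.0
mid_amplitude = amplitude_from_command(0.0, A_MIN, A_MAX)
peak_value = carrier_voltage(1.0 / (4.0 * 3000.0), 1.0, A_MIN, A_MAX)
```

Keeping $b$ in the fixed range $[-1, 1]$ means a learned control law can be reused unchanged when the voltage bounds drift between experiments.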

Control unit
In our experiment, the signal acquisition and the actuation command are carried out by a dSPACE real-time controller, including a DS1600 four-core processor board and a DS2201 I/O board with a 12-bit analog-to-digital converter over a ±10 V range. Only two inputs of the I/O board are exploited, one for the hot-wire signal and one for the voltage delivered by the Pt100 platinum sensor. The hot-wire signal $E_w$ is translated and amplified (×40) before analog-to-digital conversion. All signals are sampled at 250 Hz such that the Nyquist-Shannon theorem is respected up to three times the highest frequency of interest, $f_+ \approx 40$ Hz, also avoiding aliasing of the second harmonics. One output of the I/O board is employed to send the command signal to the Agilent function generator.
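The acquisition figures quoted above can be checked with simple arithmetic. The sketch below, with hypothetical function and variable names, computes the Nyquist frequency for the 250 Hz sampling rate, verifies that three times $f_+$ fits below it, and evaluates the quantization step of an ideal 12-bit converter over ±10 V, both at the converter input and referred back through the ×40 amplification.

```python
def nyquist_frequency(f_sampling):
    """Highest frequency (Hz) that can be represented without aliasing."""
    return 0.5 * f_sampling


def adc_step(v_min, v_max, n_bits):
    """Voltage step (V) of an ideal n-bit converter spanning [v_min, v_max]."""
    return (v_max - v_min) / 2 ** n_bits


F_SAMPLING = 250.0  # Hz, sampling frequency from the text
F_PLUS = 40.0       # Hz, approximate highest frequency of interest
GAIN = 40.0         # amplification of the hot-wire signal before conversion

f_nyquist = nyquist_frequency(F_SAMPLING)
covers_third_multiple = 3.0 * F_PLUS <= f_nyquist
step_at_adc = adc_step(-10.0, 10.0, 12)
step_at_probe = step_at_adc / GAIN  # step referred to the unamplified signal
```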
The control optimization process includes two loops: a fast evaluation loop and a slow learning loop, see figure 9. The fast evaluation loop is managed by the ControlDesk software and Simulink. For our study, the evaluation loop operates at the sampling frequency (250 Hz). For each control law tested, the time series of the actuation command and the hot-wire signal are recorded and post-processed with MATLAB. The slow learning loop includes the post-processing of the control and the control law update; it is automated with Python and MATLAB scripts. Finally, the whole control unit is supervised by a PowerShell script that automates all the steps of the control optimization.

Unforced dynamics
As described in § 1, the cavity allows a wide range of complex intra-cavity dynamics by tuning the two remaining cavity flow parameters, namely the upstream speed $U_\infty$ and the length $L$. We recall that the width $S$ and the depth $D$ of the cavity are fixed throughout this study and that the flow is incompressible (Mach number $< 10^{-2}$). In this manuscript, we aim to stabilize two flow regimes of different dynamical complexity. For both regimes, the power spectrum is mainly organized within five frequency bands: the very low frequencies, not considered here, a low frequency $f_b$ and the three peaks directly reflecting the resonance of the mixing layer, $f_-$, $f_a$ and $f_+$, see figure 3. These frequencies are nonlinearly coupled and satisfy the relationships $f_- = f_a - f_b$ and $f_+ = f_a + f_b$. The two regimes studied differ by the power ratios of the frequencies $f_a$ and $f_+$.
The first regime is referred to as the narrow-bandwidth regime and corresponds to flow dynamics mainly centered on a single frequency $f_a$ and its harmonics. This regime is achieved with $L = 7.50$ cm and an incoming velocity of $U_\infty = 2.13$ m s$^{-1}$. For this case, the ratio of the powers associated with $f_+$ and $f_a$ is close to $10^{-3}$, see figure 3a.
The coupling between $f_a$ and $f_b$ is then insignificant. In contrast, for the second regime, referred to as the mode-switching regime, the power ratio between $f_+$ and $f_a$ is greater than 0.22, see figure 3b. In this case, the nonlinear couplings between frequencies are strong and lead to a chaotic intermittency between $f_a$ and $f_+$ (Lusseyran et al. 2008). In the mode-switching regime, two modes compete in the flow, leading to a switch of the dominant frequency. Such intermittency was first mentioned by Kegerise et al. (2004) for a compressible cavity flow. In this study, the mode-switching regime is obtained in incompressible conditions ($Ma < 0.01$) for $L = 8.75$ cm and for a slightly higher incoming velocity $U_\infty = 2.23$ m s$^{-1}$, corresponding to a Reynolds number $Re_L = 1.28 \times 10^4$. The momentum boundary layer thickness is estimated at $\theta_0/L = 9.72 \times 10^{-3}$. All the cavity flow parameters and experiment conditions are grouped in table 2.
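The Reynolds numbers of the two regimes can be cross-checked against each other from $Re_L = U_\infty L/\nu$. The kinematic viscosity is not stated in the text, so the sketch below backs it out of the narrow-bandwidth values and reuses it for the mode-switching regime. That both runs share the same air properties is an assumption, and the function names are hypothetical.

```python
def reynolds_number(u_inf, length, nu):
    """Re_L = U_inf * L / nu."""
    return u_inf * length / nu


def viscosity_from_reynolds(u_inf, length, reynolds):
    """Kinematic viscosity implied by a quoted Reynolds number."""
    return u_inf * length / reynolds


# Narrow-bandwidth regime: U_inf = 2.13 m/s, L = 0.075 m, Re_L = 1.04e4.
nu_inferred = viscosity_from_reynolds(2.13, 0.075, 1.04e4)

# Mode-switching regime: U_inf = 2.23 m/s, L = 0.0875 m, same nu assumed.
re_mode_switching = reynolds_number(2.23, 0.0875, nu_inferred)
```

The inferred viscosity is of the order of $1.5 \times 10^{-5}$ m$^2$ s$^{-1}$, typical of air at room temperature, and the resulting mode-switching Reynolds number is of the order of $10^4$, as expected for a slightly longer cavity at a slightly higher velocity.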
The episodic velocity drops, observed in the time series of the mode-switching regime (figure 3b), are due to slow vertical undulations of the mixing layer which bring the low velocities of the lower part of the mixing layer to the level of the measurement point. The position of the hot-wire sensor has been chosen to minimize these low-velocity incursions while limiting the damping of the oscillations to be controlled. The undulations of the mixing layer are stronger for the mode-switching regime, and the incursions could not be avoided.
The temporal evolution of the frequency content for the two regimes is depicted in figure 4. In particular, figure 4a shows a clear line for the frequency $f_a$ and less intense lines for its harmonics, whereas figure 4b displays a switching between frequencies $f_a$ and $f_+$ and their harmonics over the course of time. The time between two switches is estimated to be between 15 s and 20 s; exceptionally, this time may exceed 40 s.
To understand the difference in dynamics between the two regimes, we locate them in the Strouhal versus $L/\theta_0$ map (figure 5). Similar maps have been plotted for different impinging shear flows, revealing jumps between the modes and linear-like relationships between the Strouhal number and the dimensionless cavity length $L/\theta_0$ or $L/\delta_0$, $\delta_0$ being the boundary layer thickness (Sarohia 1977; Rockwell & Naudascher 1978; Knisely & Rockwell 1982). Indeed, Basley et al. (2013) show that in such incompressible flow, most main frequencies measured in the downstream shear layer align with lines of locked-on modes such that the Strouhal number based on $L$ is given by equation (2.3), where the parameter $n = 1, 2, 3$ can be seen as the number of cycles within the cavity length and the corrective term, $\gamma_n$, can be interpreted as a wave adaptation to the effective resonance length. The authors also propose a model for $\gamma_n$, linear with respect to the dimensionless cavity length $L/\theta_0$ (equation (2.4)). On the other hand, it is noted that equation (2.3) resembles Rossiter's formula for compressible flows in Rossiter (1964), where the corrective term is associated with the propagation time of the acoustic waves. The Strouhal distribution is well described by Basley et al. (2013); however, it is worth noting that there is still no consensus on the origin of the incommensurable frequencies in incompressible open cavity flows. As a first interpretation, the peaks in the spectrum are the result of nonlinear interactions inside the mixing layer dynamics. Indeed, the resonance of the cavity occurs for regimes beyond a critical Reynolds number $Re_c$, contrary to the Kelvin-Helmholtz instability of the mixing layer, which is unstable for all Reynolds numbers. Beyond $Re_c$, self-sustained oscillations appear. This scenario, coming from a 2D perspective, is however a little simplistic when considering a real 3D cavity, especially in an incompressible regime at moderate Reynolds number.
In fact, the mixing layer develops in the streamwise direction and, the flow not being strictly parallel, Squire's theorem fails: it is the transverse centrifugal instabilities that transition first, possibly several times, below $Re_c$. Depending on the values of the aspect ratio, these Görtler-Taylor type instabilities have already undergone several bifurcations and then a strong nonlinear development when the resonant transition appears. This description corresponds to the two regimes chosen to test our control methodology and leads to the spectral signature described previously and in appendix A.
In addition, Sipp & Lebedev (2007) and Meliga (2017) have shown, using a linearization around the average flow in 2D simulations, that the occurrence of self-sustaining instabilities in shear-driven cavities is due to a supercritical Hopf bifurcation. Following a similar method, Bengana et al. (2019) and Tuerke et al. (2015) manage to predict two incommensurable frequencies in simulations and experiments, respectively. Another approach based on the identification of two characteristic delay times is able to predict the two frequencies of the flow (Tuerke et al. 2020). The authors show that the nonlinear interactions between these two frequencies can be captured by the resolution of a Stuart-Landau type amplitude equation, whose quadratic damping term consists of two delayed amplitude terms. In this equation, the first delay time characterizes the upstream traveling hydrodynamic instability wave and hence the feedback of the reflected shear layer instability (Tuerke et al. 2015). The second delay time is motivated by the hydrodynamic feedback of the recirculating vortices, also referred to as the "vortex carousel", and corresponds to an intra-cavity overturning time (Tuerke et al. 2017). In the following, the description of Basley et al. (2013) that leads to figure 5 is sufficient to guide the choice of parameters leading to the two regimes we have chosen to control. Figure 5 plots the values of the measured frequencies in Strouhal number versus the dimensionless cavity length for the two regimes, and their relation to the resonance points. The computation of the momentum boundary layer thickness $\theta_0$ is detailed below. For the first regime, the Strouhal number of $f_a$ is close to $n/2|_{n=2}$, i.e. at the intersection between the black line and the dashed line, while $f_+$ is clearly below the resonance at $n = 3$.
As for the second regime, the Strouhal number corresponding to $f_a$ is above the resonant mode $n = 2$ ($\gamma_2 = -0.14$) and the one corresponding to $f_+$ is below the resonant mode $n = 3$ ($\gamma_3 = +0.14$). In practice, $U_\infty$ has been chosen such that the average presence rates of the two frequencies $f_a$ and $f_+$ are equalized. The fact that $|\gamma_2| \approx |\gamma_3|$, which appears only after calculation, clearly shows that the parameter guiding the relative intensity of the two main modes is indeed $|\gamma_n|$. The values of the Strouhal number and $\gamma_n$ for each frequency are grouped in table 3.
In fact, we observe a slight discrepancy between the natural frequencies measured and the predictions of Basley et al. (2013), which we attribute to a change in the free development of the boundary layer and especially a reduction of the boundary layer thickness. This reduction of the boundary layer thickness can be attributed to the planing effect of the 200 µm thick upper electrode, glued just before the leading edge. Therefore, in this work, $L/\theta_0$ was obtained neither from the Blasius law ($\theta_0 = \kappa \sqrt{2 \nu l_x / U_\infty}$ with $\kappa = 0.4696$, $l_x = 0.3$ m and $\nu$ the kinematic viscosity) nor by a direct measurement of $\theta_0$, for lack of optical access, but deduced from equations (2.3) and (2.4), using the observed frequencies (figure 3) and the regime parameters (table 2) for the two considered regimes. First, the value of $\gamma_n$ is computed from $f_n$, $n$, $L$ and $U_\infty$ with equation (2.3); then $L/\theta_0$ is deduced from equation (2.4). The resulting dimensionless cavity lengths are $L/\theta_0 = 85.83$ for the narrow-bandwidth regime and $L/\theta_0 = 102.90$ for the mode-switching regime.
Finally, we have investigated the deviation from the Blasius law. From the values of $L/\theta_0$ and assuming the same expression as the Blasius law, we fit the corresponding $\kappa$ for our cases: $\kappa = 0.4209$ for the narrow-bandwidth regime and $\kappa = 0.4205$ for the mode-switching regime. Both values are close to the value of the Blasius law ($\kappa = 0.4696$) but slightly lower, which supports the hypothesis of boundary layer thinning by the presence of the DBD electrode.
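The fit of $\kappa$ can be retraced numerically from the Blasius expression $\theta_0 = \kappa \sqrt{2 \nu l_x / U_\infty}$ and the dimensionless cavity lengths quoted above. The sketch below is illustrative only: the kinematic viscosity `NU` is not given in the text and is assumed here from the narrow-bandwidth Reynolds number, so the fitted values differ slightly from the published ones. The function names are hypothetical.

```python
import math

KAPPA_BLASIUS = 0.4696
L_X = 0.3        # m, development length of the boundary layer
NU = 1.536e-5    # m^2/s, assumed: inferred from Re_L = 1.04e4, not from the text


def blasius_momentum_thickness(nu, l_x, u_inf, kappa=KAPPA_BLASIUS):
    """theta_0 = kappa * sqrt(2 * nu * l_x / U_inf)."""
    return kappa * math.sqrt(2.0 * nu * l_x / u_inf)


def fitted_kappa(length, l_over_theta, nu, l_x, u_inf):
    """kappa that reproduces theta_0 = L / (L / theta_0) in the Blasius form."""
    theta_0 = length / l_over_theta
    return theta_0 / math.sqrt(2.0 * nu * l_x / u_inf)


kappa_narrow = fitted_kappa(0.075, 85.83, NU, L_X, 2.13)
kappa_switch = fitted_kappa(0.0875, 102.90, NU, L_X, 2.23)
theta_blasius = blasius_momentum_thickness(NU, L_X, 2.13)
```

With this assumed viscosity, both fitted values fall near 0.42, below the Blasius constant, which is consistent with a boundary layer thinner than the undisturbed Blasius prediction.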
The low ($f_b$) and very low frequencies ($f < 1$ Hz) constitute a challenge for automatic learning, as their rarer occurrences require longer time windows for converged statistics and thus slow down the overall learning process. First, we have chosen to alleviate this difficulty by controlling the narrow-bandwidth regime, where the very low frequencies ($f < 1$ Hz) are around two orders of magnitude lower than $f_a$ in terms of power. Then, we fully embrace the effect of the low frequencies with the mode-switching regime, where the nonlinear interactions between $f_a$ and $f_+$ give rise to $f_b$ and especially to the very low frequencies ($f < 1$ Hz): $f_b$ is caused by the triadic interaction between $f_a$ and $f_+$, and the very low frequencies ($f < 1$ Hz) are responsible for the frequency switches in the mode-switching regime. Indeed, the power associated with the very low frequencies ($< 1$ Hz) is more than one order of magnitude greater for the mode-switching regime than for the narrow-bandwidth regime, see figures 3a and 3b. The control of the low frequencies is then performed indirectly by controlling the two other frequencies $f_a$ and $f_+$. Moreover, following Basley et al. (2014), the energetic contribution of the very low frequencies is also due to the coupling between the mixing layer instability and the centrifugal instabilities originating in the spanwise direction within the cavity.
To conclude this description of the cavity dynamics, we recall that the goal we set for the control is to reduce the oscillation of the mixing layer by penalizing the peaks of power in the frequency range that includes f − , f a and f + as indicated by the green shaded area of the figure 3.

Control problem formulation and methodology
In this section, the control problem is defined, and the methodology to solve it and to analyze the solutions is described. In § 3.1, the control problem is reformulated as an optimization problem. Such a problem is, in the most general case, non-convex and a priori contains several minima. To solve such an intricate problem, we employ a powerful machine learning algorithm (§ 3.3), the gradient-enriched machine learning control (Cornejo Maceda et al. 2021, gMLC), which combines exploration to discover new minima and exploitation for a fast convergence. Finally, two methods for describing the control mechanisms involved are presented: one based on linear regression and the other on the reconstruction of the phase space with clustering (§ 3.4).

Cost function and optimization problem
The aim of this study is the stabilization of the open cavity flow in two regimes of different complexity, in particular the mitigation of the self-sustaining oscillations of the mixing layer. For this, a cost function is built from the velocity data provided by the downstream hot-wire. The oscillations of the mixing layer are reflected in the oscillations of the velocity signal, so the goal translates into the reduction of the highest peak of the associated power spectrum. Moreover, the power invested and the power saved by the control must be balanced. In that respect, two terms are considered in the cost function to optimize: the term J a accounts for the peak reduction and J b for the actuation power invested. J a is defined as the maximum of the power spectral density in a given frequency window, normalized by its value for the unforced case. Hence, the performance of control law K is given by $J_a(K) = \max_{St \in W} \mathrm{PSD}(u) \, / \, \max_{St \in W} \mathrm{PSD}(u_0)$, where PSD(u) is the power spectral density of the velocity u measured by the hot-wire for the flow forced with the control law K, u 0 is the velocity measured for the unforced flow and W denotes the detection window. A steady actuation forcing study (§ 4.1) shows that the actuation affects both f a and f + , so the detection window for the maximum of the PSD is set so that it comprises both f a and f + . Only the frequencies f a and f + are considered, as they are the leading modes of the dynamics; the remaining high-power frequencies (2f a , 3f a in figure 3) are harmonics of the fundamental, i.e., slaved to f a . The detection window is set in Strouhal number so that it is independent of the studied regime. The normalization of the cost function J a by the value of the peak for the unforced flow gives a direct measure of the reduction of the peak. The PSD is computed over T ev = 40 s. 
This choice is motivated by three reasons. First, it allows a good convergence of the statistics. Second, the time is short enough to evaluate 1000 individuals in a few hours of experiment, limiting potential drifts and staying close to real-life applications with a limited testing budget. Third, the mode-switching regime may include one or two switches during this period of time, which is enough to record both frequencies f a and f + in the spectrum. Hence, the evaluation time balances practicality and a good characterization of the flow dynamics. Anticipating the results, the value chosen for T ev turned out to be sufficient for the control of the two main frequencies in the mode-switching regime. The control of the mode switching is realized indirectly by the control of the two frequencies involved, f a and f + . It is worth noting that a direct control of the intermittency would require a much longer evaluation time because of its very low frequencies.
The actuation penalization term J b is estimated from the actuation command b ∈ [−1, 1], as the effective power supplied is not directly accessible in the experiment. J b is based on the square of the actuation command averaged over T ev , so that it is analogous to an energy. To simplify the interpretation, J b is normalized by the range of the actuation, so that J b = 0 when there is no actuation (A = A min ) and J b = 1 when the controller acts steadily at maximum level (A = A max ). The total cost therefore reads $J = J_a + \gamma J_b$, with ⟨·⟩ denoting the mean value over T ev = 40 s used to evaluate J b . The choice of the penalization parameter γ is based on the open-loop steady forcing study presented in § 4.1. We show, in particular, that a high-level steady actuation is enough to reduce the cost J a by at least 90%. The penalization parameter γ is chosen so that the cost of the unforced flow (J 0 = 1) is similar to the cost of the high-level steady actuation (b = 1); the optimal solution sought must thus efficiently reduce J a with minimal actuation power. As both cost function components J a and J b are normalized, we choose the penalization parameter to be γ = 1. This choice sets the cost of the high-level steady actuation (b = 1) to $J = J_a + 1$, which is close to the unforced cost since J a ≤ 0.1 for this forcing.

Finally, the normalized standard deviation σ of the velocity signal is computed a posteriori for the best control laws, to characterize the controlled flow. Indeed, an effective mitigation of the self-sustained oscillations of the mixing layer results in a reduction of the standard deviation, defined as $\sigma = \sigma(u) / \sigma(u_0)$, with σ(u) being the standard deviation of the velocity u computed over T ev = 40 s. T ev is also chosen so that the standard deviation is sufficiently converged. The standard deviation is normalized by that of the natural unforced flow so as to have a direct measure of the gain.
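As an illustration of how such a cost could be evaluated from recorded signals, the sketch below computes the peak ratio J a with a plain periodogram and combines it with a given J b . Several choices are assumptions made for the example only: the spectral estimator (a naive discrete Fourier transform rather than the estimator used in the experiment), the sampling frequency, the window bounds expressed in Hz, the synthetic signals, and the additive form J = J a + γ J b , which is consistent with the values reported later for the narrow-bandwidth regime. J b is taken as an input because its exact expression is not reproduced here.

```python
import cmath
import math

def periodogram(signal, fs):
    """One-sided periodogram of a mean-removed signal (naive DFT, small n only).
    The one-sided factor of 2 is omitted: it cancels in the ratio J_a."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        psd.append(abs(coeff) ** 2 / (fs * n))
    return freqs, psd

def peak_in_window(freqs, psd, f_lo, f_hi):
    """Maximum of the PSD inside the detection window [f_lo, f_hi]."""
    return max(p for f, p in zip(freqs, psd) if f_lo <= f <= f_hi)

def cost_ja(u_forced, u_unforced, fs, f_lo, f_hi):
    """Peak of the forced spectrum normalized by the unforced peak."""
    f, p = periodogram(u_forced, fs)
    f0, p0 = periodogram(u_unforced, fs)
    return peak_in_window(f, p, f_lo, f_hi) / peak_in_window(f0, p0, f_lo, f_hi)

def total_cost(ja, jb, gamma=1.0):
    """Assumed additive cost J = J_a + gamma * J_b."""
    return ja + gamma * jb

# Synthetic example (hypothetical): a 30 Hz oscillation whose amplitude
# is divided by 10 by the control, sampled at an assumed 250 Hz.
fs, n = 250.0, 250
u0 = [math.sin(2.0 * math.pi * 30.0 * t / fs) for t in range(n)]
u = [0.1 * s for s in u0]
ja = cost_ja(u, u0, fs, f_lo=20.0, f_hi=45.0)
```

In this synthetic case a tenfold amplitude reduction gives a hundredfold reduction of the spectral peak, which is why J a reads as a power ratio rather than an amplitude ratio.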

Control problem
As stated previously, the control objective is to stabilize the cavity flow by mitigating the oscillations of the mixing layer downstream. To achieve this goal, the flow is forced with a DBD actuator located at the cavity leading edge. The result of the action is an unsteady body force whose intensity is commanded by the input signal b at the terminals of the DBD actuator. The signal b, also referred to as the actuation command, is determined by the control law K. The control may be open-loop, or closed-loop with flow information as input. In this study, the only open-loop actuation considered is steady forcing, while closed-loop control uses the single velocity sensor and its time-delayed records. Thus, the control law reads $b = K(a)$, with a being the feature vector comprising flow state information. Then, the control problem to solve can be reformulated as an optimization problem where the goal is to derive the optimal control law K * that minimizes the cost function J:
$K^* = \arg\min_{K \in \mathcal{K}} J(K)$, (3.6)
with $\mathcal{K}$ being the space of all possible control laws K : A → B. A is the input domain and B is the output range for the actuation command. Deriving the optimal control law K * without any prior knowledge of the cost function J is a challenging non-convex optimization problem, presumably with several minima.

Gradient-enriched machine learning control
In this section, we present the gradient-enriched machine learning control (gMLC) algorithm (Cornejo Maceda et al. 2021) employed to solve the optimization problem (3.6). Gradient-enriched MLC is an iterative function optimizer that derives control laws directly from the plant. The method is based on machine learning control (MLC) (Duriez et al. 2017) and is augmented with downhill simplex steps to accelerate the learning. MLC has already been employed to control dozens of experiments, outperforming previous control laws with unexpected frequency crosstalk (Noack 2019). The choice of the downhill simplex algorithm relies on its fast convergence and its easy implementation, as it does not require an analytical expression of the cost function but only its evaluation. In the past, downhill simplex has been successful in deriving an adaptive closed-loop control for lift-to-drag ratio optimization over a NACA 0025 airfoil (Tian et al. 2006; Cattafesta III et al. 2009), and in reducing the net drag power of the fluidic pinball and a slanted Ahmed body (Li et al. 2022). In Cornejo Maceda et al. (2021), the gradient-enriched MLC is introduced and employed to successfully stabilize the fluidic pinball. The authors show, in particular, that gMLC outperforms MLC, deriving better-performing control laws with a greater learning speed. It is now applied for the first time to an experiment. Anticipating the results (see § A), the superiority of gMLC over MLC is also verified for the control of the cavity in experimental conditions. The benefits of gMLC compared to MLC come from the combination of stochastic optimization for the exploration of the search space and deterministic optimization for a fast convergence towards the minimum. The method consists in generating candidate solutions to equation (3.6), evaluating them and systematically recombining the best ones, stochastically and deterministically, to improve their performance.
The starting point of gMLC is MLC based on linear genetic programming (Brameier & Banzhaf 2006, LGP). Following the genetic programming terminology, the candidate solutions are also referred to as individuals. Like the MLC method, gMLC makes no assumptions on the structure of the relationship between the inputs and the outputs. The optimal solution needs, however, to be computable, meaning that it can be expressed by a finite number of mathematical operations with finite memory. Indeed, the candidate solutions are internally represented by matrices inherited from linear genetic programming. Each matrix resembles a computer program that unequivocally codes a control law. Each line of the matrix is an instruction pointing to basic operations (+, −, ×, ÷, cos, sin, tanh, etc.) and to registers containing random constants and variables (a 1 , a 2 , a 3 , etc.). The N inst lines of the matrix are then read linearly, yielding the control commands as outputs of the first registers. We refer to Li et al. (2019) for more information on the internal representation of the control laws.
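To make the matrix representation more concrete, the following minimal sketch interprets a small instruction matrix. The row layout (destination register, two source registers, operation index), the register ordering (variable registers first, then inputs, then constants), the reduced set of operations and the particular protection of the division are all assumptions chosen for illustration; they are not the encoding of the gMLC code.

```python
import math

def protected_div(x, y):
    """Division defined for all reals; this particular protection is an assumption."""
    return x / y if abs(y) > 1e-3 else x

# Operation table indexed by the last entry of each instruction.
# Unary operations simply ignore their second operand.
OPERATIONS = [
    lambda x, y: x + y,        # 0
    lambda x, y: x - y,        # 1
    lambda x, y: x * y,        # 2
    protected_div,             # 3
    lambda x, y: math.sin(x),  # 4
    lambda x, y: math.cos(x),  # 5
    lambda x, y: math.tanh(x), # 6
]

def run_program(matrix, inputs, constants, n_var=2):
    """Read the instruction matrix line by line and return register 0.
    Register layout (assumed): [variables | inputs a_i | constants]."""
    registers = [0.0] * n_var + list(inputs) + list(constants)
    for dest, src1, src2, op in matrix:
        if dest >= n_var:
            raise ValueError("instructions may only write to variable registers")
        registers[dest] = OPERATIONS[op](registers[src1], registers[src2])
    return registers[0]  # actuation command read from the first register

# Hypothetical program coding b = tanh(c1 * a1 - a2).
# Registers: 0, 1 = variables; 2, 3 = a1, a2; 4 = c1.
program = [
    (1, 2, 4, 2),  # r1 = a1 * c1
    (1, 1, 3, 1),  # r1 = r1 - a2
    (0, 1, 1, 6),  # r0 = tanh(r1)
]
b = run_program(program, inputs=[1.0, 0.5], constants=[2.0])
```

Because the program is a finite list of instructions acting on a finite set of registers, any control law coded this way is computable in the sense stated above.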
The gMLC algorithm starts with a broad exploration of the control law space with a Monte Carlo sampling (MCS) phase. The Monte Carlo sampling generates N MCS random matrices that represent the first set of individuals. The individuals are evaluated and added to the database of all individuals. Then, the algorithm alternates between exploration phases carried out by genetic programming and exploitation phases performed by downhill simplex iterations until a stopping criterion is reached. The role of exploration is to locate new and better minima in the space of control laws with a stochastic recombination of the best-performing individuals. The stochastic recombination is achieved with the genetic operations crossover and mutation. This exploration is much like the evolution phase in the LGP method. However, in the case of gMLC, the concept of a population that evolves through generations is generalized by considering all the individuals evaluated so far and stored in the database. Thus, during the exploration phase, new individuals are generated by recombining the best among all the previously evaluated individuals. This ensures that no crucial information is irretrievably lost. The best individuals to be recombined are selected according to their cost. The selection is carried out by the tournament method with a tournament size equal to 7 for 100 individuals, following the recommendation of Duriez et al. (2017). The tournament size is scaled with the number of individuals in the database in order to keep a ratio of 7 for 100. N p new individuals are built at each exploration phase by recombining the best individuals of the database.
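The scaling of the tournament with the database size can be sketched as follows. This is a simplified illustration under stated assumptions: the rounding rule is ours, and the tournament winner is taken deterministically as the lowest-cost contender, which may differ from the selection rule of the actual implementation.

```python
import random

def tournament_size(n_database, per_hundred=7):
    """Tournament size kept at a ratio of 7 contenders per 100 individuals.
    The rounding and the lower bound of 1 are assumptions."""
    return max(1, round(per_hundred * n_database / 100))

def tournament_select(costs, rng):
    """Draw contenders at random from the whole database and return the
    index of the one with the lowest cost (deterministic winner, assumed)."""
    size = tournament_size(len(costs))
    contenders = rng.sample(range(len(costs)), size)
    return min(contenders, key=lambda i: costs[i])

# Hypothetical database of 300 evaluated individuals with random costs.
rng = random.Random(0)
costs = [rng.random() for _ in range(300)]
parent = tournament_select(costs, rng)
```

Selecting from the whole database rather than from the last generation is what lets a good individual found early remain available for recombination throughout the learning.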
Each exploration phase is followed by an exploitation phase. This step exploits the local gradient information to slide down towards the neighboring minimum. This is carried out with a variant of downhill simplex for infinite-dimensional spaces introduced by Rowan (1990) and referred to as downhill subplex. In the following, we do not differentiate between downhill simplex and subplex, as the steps of the two algorithms are similar and only applied to different spaces. The principle of downhill simplex is to linearly combine the N sub best-performing control laws, following the gradient of the cost, to derive better-performing individuals. Contrary to the exploration phase, the new individuals are built in a deterministic way. First, the N sub best individuals are selected to describe a simplex that lives in the subspace they generate. The simplex then crawls in the subspace according to geometric operations (reflection, expansion, contraction and shrink) following the local gradients. Each geometric operation yields one or several new individuals that are linear combinations of the original N sub individuals. After each downhill simplex iteration, the simplex is updated by replacing the least-performing individuals. The downhill simplex steps are iterated until at least N p individuals are generated. The newly generated individuals are then added to the database of all individuals. We emphasize that all the new individuals belong to the subspace defined by the original N sub individuals. If the stopping criterion is reached, the algorithm returns the best-performing control law. Otherwise, a new iteration of exploration and exploitation is carried out. The stopping criterion may be a performance threshold or a total number of evaluations when the testing budget is limited.
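The way a simplex operation produces a new control law as a linear combination of existing ones can be illustrated with the reflection step alone. In this minimal sketch, each simplex vertex is stored as a weight vector over the original laws; the unit reflection coefficient is the classical downhill simplex choice, and the three basis laws and their costs are hypothetical.

```python
import math

def reflect_worst(weights, costs):
    """Reflect the worst vertex through the centroid of the other vertices.
    weights[i] is the weight vector of vertex i over the basis control laws."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    worst, others = order[-1], order[:-1]
    dim = len(weights[0])
    centroid = [sum(weights[i][d] for i in others) / len(others) for d in range(dim)]
    return [2.0 * centroid[d] - weights[worst][d] for d in range(dim)]

def combine(laws, w):
    """Control law defined as the linear combination sum_i w_i * K_i(a)."""
    return lambda a: sum(wi * law(a) for wi, law in zip(w, laws))

# Hypothetical basis of three control laws acting on a feature vector a.
laws = [lambda a: a[0], lambda a: math.tanh(a[0]), lambda a: 1.0]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one vertex per law
costs = [0.2, 0.5, 0.9]                                        # assumed costs

reflected = reflect_worst(weights, costs)  # weights of the new individual
new_law = combine(laws, reflected)
```

The reflected individual has no matrix representation of its own: it only exists as weights over the original laws, which is why a reconstruction step is needed before it can take part in crossover and mutation.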
We note the critical intermediate phase of reconstruction between each exploitation phase and the following exploration phase. Indeed, the new individuals generated by the downhill simplex are linear combinations of individuals and have no matrix representation, which is essential for the genetic recombination during the exploration phase. Thus, a matrix reconstruction is performed for each linearly combined individual by solving a secondary optimization problem. The goal is to derive a matrix that translates into a control law with the same response as the linearly combined one. Such a problem is similar to a surface fitting problem, which we solve with linear genetic programming. The reconstruction phase thus builds a matrix representation for the linearly combined individuals so that they can be recombined with genetic operators. For more information on the gMLC algorithm, we refer the reader to Cornejo Maceda et al. (2021). Figure 6 schematically illustrates the different phases of the gMLC algorithm and the learning principle in the control law space. The MATLAB implementation of gMLC employed for this study is freely available at https://github.com/gycm134/gMLC.

Control law investigation
In this section, we propose two methodologies to analyze the actuation mechanisms of optimized control laws. Firstly, an analytical approximation of the control is performed with an affine mapping between the inputs (components of a) and the actuation command (b). Such a mapping aims to reveal the most relevant components of the feature vector. The affine approximation $\hat{K}$ of the control law K reads $\hat{K}(a) = k_0 + \sum_{i=1}^{10} k_i a_i$, where k i are gains determined by linear regression between the time series a and b recorded during the experiment. The quality of the fit is measured by the coefficient of determination R 2 , which quantifies the relative reduction of the residual variance. The closer R 2 is to 1, the better $\hat{K}$ fits the original control law K. Secondly, we propose a visualization of the control laws based on the clustering of the feature vector a to reconstruct the phase portrait. Cluster-based methods have been successful in reproducing key characteristics of fluid flow dynamics such as temporal evolution and fluctuation levels (Li et al. 2021). For this analysis, all the states of the feature vector are grouped in 10 clusters to reconstruct the dynamics. The cluster centroids, c k , are defined as the average of all the states in a given cluster. Clustering is performed with the k-means algorithm (Lloyd 1982) and the metric employed is the one induced by the L 2 norm. The dynamics of the feature vector are then encapsulated in a transition probability matrix whose elements p ij are the transition probabilities from cluster i to cluster j. The probability p ij is defined as p ij = n ij /n i , with n ij being the number of states transitioning from cluster i to cluster j and n i the total number of states in cluster i. Then all feature vector states and centroids are projected on a two-dimensional space with classical multidimensional scaling (Kaiser et al. 2017; Li et al. 2022, MDS). 
MDS is a dimensionality reduction method that consists in extracting the two main features of the flow (γ 1 and γ 2 ) by applying a proper orthogonal decomposition to the distance matrix of the feature vector a. The vectors γ 1 and γ 2 span a two-dimensional space onto which all the data are projected. It is the optimal projection, in the L 2 norm sense, that preserves the distances between the states. Such a representation is referred to as a proximity map.
Adding the transition probabilities to the proximity map allows us to build a network model reproducing the phase portrait. The centroids constitute representative states of the flow between which the system transitions ergodically, meaning that from any centroid one can reach any other centroid. Finally, the mean forcing level is computed for each cluster and associated with the corresponding centroid. Such a representation makes it possible to visually separate the states of strong and low forcing and to reveal actuation mechanisms. Figure 7 summarizes the two approximation methodologies employed.
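The two cluster-based quantities that feed the control network model, the transition probabilities p ij and the mean forcing level per cluster, can be sketched from a sequence of cluster labels as follows. The labels and actuation commands below are hypothetical, the clustering itself (k-means) is assumed to have been done beforehand, and each row is normalized by the number of transitions leaving the cluster so that the last recorded state, which has no successor, is not counted.

```python
def transition_matrix(labels, n_clusters):
    """p_ij = n_ij / n_i from a time-ordered sequence of cluster labels.
    n_i counts the transitions leaving cluster i, so that rows sum to one."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for i, j in zip(labels[:-1], labels[1:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def mean_forcing_per_cluster(labels, commands, n_clusters):
    """Average actuation command b over the states of each cluster."""
    sums = [0.0] * n_clusters
    sizes = [0] * n_clusters
    for k, b in zip(labels, commands):
        sums[k] += b
        sizes[k] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, sizes)]

# Hypothetical two-cluster example.
labels = [0, 0, 1, 0, 1, 1]
commands = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0]
P = transition_matrix(labels, 2)
mean_b = mean_forcing_per_cluster(labels, commands, 2)
```

Here the transitions are 0→0 once, 0→1 twice, 1→0 once and 1→1 once, so the first row of P is (1/3, 2/3) and the second is (1/2, 1/2).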
The latter data-driven methodology for control visualization is expected to aid the human interpretability of machine-learned controls. In this study, the methodology is employed to analyze a single-input single-output system; however, we believe that it will be beneficial for the analysis of more complex control systems including a high number of inputs and outputs.

Figure 7: Control law investigation methodology for the cavity control. A feature vector a is built from a direct measurement of the flow u. A feature vector of dimension three is depicted for simplicity. The elements of the feature vector are grouped in clusters. For clarity, only three clusters and their centroids (1, 2, 3) are displayed. The proximity map is defined by γ 1 and γ 2 , the two flow features extracted with classical multidimensional scaling. For the transition matrix, darker squares symbolize higher transition probabilities. In the control network model, the actuation magnitudes are represented by rectangles: yellow denotes the actuation range, red (blue) a positive (negative) actuation with respect to the mean value.

Control results
In this section, we stabilize the open cavity flow in two regimes: the narrow-bandwidth regime (§ 4.3) and the mode-switching regime (§ 4.4) presented in § 2.5. We recall that the control objective is to mitigate the self-sustaining oscillations of the mixing layer. First, in § 4.1, we reduce the main oscillations with steady forcing at increasing actuation levels. Then, we employ gradient-enriched machine learning control to optimize feedback control laws. § 4.2 details the parameters employed for the control law optimization, and § 4.3 and § 4.4 present the results for the control of the narrow-bandwidth regime and the mode-switching regime, respectively.

Open-loop steady forcing
In this section, the response of the flow to steady actuation is described. For this study, the amplitude of the carrier signal is set to constant values. The flow is excited with 23 equally spaced levels of actuation, from the ionization level (A = A min = 6.9 kV) to A = A max = 12 kV. Figure 8 presents the velocity power spectra for the two regimes. For the narrow-bandwidth regime, figure 8a shows that the second peak f + rises and the first peak f a decreases as the actuation level increases. When the actuation is too strong, the noise level increases and the two peaks are at the same level. The maximum peak reduction is achieved for A = A max with a cost reduction of 97%. The associated standard deviation slightly decreases to σ = 96%.
For the mode-switching regime, figure 8b shows that a fairly strong actuation level is needed to reduce the amplitude of the peaks associated with f a and f + , though the broadband noise level also increases. The maximum cost reduction is achieved for the nearly maximum actuation level A = 11.4 kV (88% of the actuation range from A min to A max ), for which the maximum peak power decreases by 90%. The standard deviation also decreases slightly, to σ = 95%. We note that for the third spectrum starting from the bottom (V = 1.2 kV), the incidental absence of mode switching during the measurement led to a lower f + peak.
This open-loop analysis shows that a strong steady actuation is able to reduce the fluctuations of the shear layer. However, in both cases the background noise level increases. Therefore, in the following, in order to exclude power demanding controllers, we consider the cost function described in § 3.1 that includes two terms: one based on the maximum amplitude of the spectrum and an actuation penalization term.

Implementation of gradient-enriched machine learning control
The parameters chosen for the gMLC algorithm are similar to the ones chosen in Cornejo Maceda et al. (2021). The Monte Carlo sampling phase generates N MCS = 100 individuals. The exploration and exploitation phases both produce N p = 50 new individuals at each iteration. The exploration and exploitation phases alternate until 1000 individuals are evaluated. We recall that each individual is evaluated over T ev = 40 s. A relaxation time of 2 s is inserted between two consecutive control law evaluations. The experiment time must also include the time needed to solve the reconstruction problem, but this constraint can be lifted with additional computational power. The limit of 1000 individuals is chosen so that all the individuals are evaluated in one day. Cornejo Maceda et al. (2021) also show that 1000 evaluations are enough to converge for a multiple-input multiple-output problem. The subplex space is generated by N sub = 10 control laws to balance speed and performance, as in Cornejo Maceda et al. (2021). For the evolution during the exploration phase, the crossover and mutation probabilities are both set to P c = P m = 0.5. The control laws are built from nine mathematical operations (+, −, ×, ÷, sin, cos, tanh, exp and log), ten flow features a i , i = 1, ..., 10, and N cr = 10 random constants. As suggested by Duriez et al. (2017), the ÷ and log operations are protected so that they are defined for all real numbers. N vr = 14 registers are employed to derive the control laws. Finally, the maximum number of instructions coded in the matrix representation is N inst,max = 50. The flow features employed for the feedback control laws are the velocity signal and nine time-delayed velocity signals. Time-delayed sensor signals are introduced as inputs to enrich the search space and to allow, in principle, ARMAX-type controllers (Hervé et al. 2012) as well as linear and nonlinear combinations of them. 
The resulting feature vector a (equation 4.1) collects the velocity signal, a 1 = u, and its nine time-delayed records a 2 , ..., a 10 . Actually, only half of the delays are necessary, but nine have been taken into account to enrich the phase space and get closer to full-state control. The presence of time-delayed information in the control can play, for example, the role of an embedding process in the new dynamical system consisting of the flow and the closed-loop control. Indeed, we have opted for a single measurement point separated from the actuator by a convective time that intrinsically varies over time. Implicitly, the system under closed-loop control is hence reduced to a purely temporal dynamical system, and the spatial information can be interpreted as an embedding of this dynamics into a larger phase space. Table 4 summarizes the parameters of the gMLC optimization process. Figure 9 displays the experimental setup used to control the open cavity with machine learning control, and specifically with gMLC.
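The testing budget implied by these parameters can be checked with a short calculation. The sketch below only counts evaluation and relaxation time; the time spent on the reconstruction problem is not included, and the number of exploration and exploitation cycles is nominal since an exploitation phase generates at least, not exactly, N p individuals.

```python
# Parameters quoted in the text.
T_EV = 40.0      # evaluation time per individual [s]
T_RELAX = 2.0    # relaxation time between two evaluations [s]
N_TOTAL = 1000   # total number of evaluated individuals
N_MCS = 100      # individuals of the Monte Carlo sampling phase
N_P = 50         # individuals per exploration or exploitation phase

# Pure experiment time, excluding the matrix reconstruction step.
wall_time_h = N_TOTAL * (T_EV + T_RELAX) / 3600.0

# Nominal number of (exploration + exploitation) cycles after the MCS phase.
n_cycles = (N_TOTAL - N_MCS) // (2 * N_P)

print(f"experiment time: {wall_time_h:.1f} h, nominal cycles: {n_cycles}")
```

This gives a little under 12 hours of pure experiment time for about nine cycles, which is compatible with evaluating all individuals within one day once the reconstruction time is added.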

Closed-loop control of the narrow-bandwidth regime
In this section, we describe the best control law derived by gMLC that mitigates the self-sustaining oscillations of the mixing layer for the narrow-bandwidth regime. In the following, the notations for the control law, cost and standard deviation associated with the learning on the narrow-bandwidth regime are marked by the superscript I. Figure 10 depicts the cost of the 1000 individuals evaluated during the optimization process. The individuals from the Monte Carlo sampling and exploration phases are sorted by cost, as there is no direct causal relationship between two successive individuals, while the individuals generated during the exploitation phases are depicted in the order of their evaluation, as each individual depends on the previous one. We recall that the cost function is defined so that the cost of the unforced flow is J 0 = 1. We note that a random sampling of 100 control laws already manages to reduce the cost function to J = 0.2410. Then, the first exploration phase (individuals i = 101, . . . , 150) slightly reduces the cost function to J = 0.2276. In the following exploitation phase (individuals i = 151, . . . , 206), the downhill subplex individuals gradually reduce the cost function to J = 0.1882. We note that, at first, the individuals are scattered along the vertical axis and then go down, close to the lowest cost so far. This behavior shows that the individuals progress towards a minimum in the control law space. However, this descent is interrupted by the next exploration phase (individuals i = 207, . . . , 257), where a better-performing individual is found, with a cost of J = 0.1329. This new individual replaces the least-performing individual in the simplex, allowing the search to extend beyond the initial subspace. It is worth noting that the dimension of the subspace remains the same, as new individuals replace the least-performing ones. From there on and until individual i = 512, only the exploitation phases build better individuals. 
The cost of the best individual after 512 individuals is J = 0.0402. Interestingly, we note that downhill simplex can build worse-performing individuals. Indeed, all the exploitation phases from the fourth one onwards include individuals whose costs are spread up to J = 1. The next exploration phase (individuals i = 513, . . . , 563) finds a better control law, whose cost is J = 0.0311. As the simplex now includes poorly performing individuals, four better-performing individuals generated by the exploration phase are introduced in the simplex. From there on, progress is made only with exploitation steps: the cost of the best individual reaches a plateau after 707 evaluations and is only slightly improved after 913 evaluations. After 1000 evaluations, the cost of the final control law K I is J I = 0.0192. The corresponding peak reduction is J a I = 0.0129, i.e. a 99% reduction of the fluctuation level. In other terms, the amplitude of the oscillations is reduced by a factor of 9, or by 19 dB. The structure and components of K I are thoroughly described in appendix C.
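The correspondence between the peak reduction, the amplitude factor and the decibel value can be verified with a short worked calculation, under the assumption that J a I is a ratio of spectral powers, so that amplitudes scale with its square root. The cost decomposition uses the actuation cost J b I = 0.0063 reported for this control law and γ = 1.

```latex
\begin{align*}
10 \log_{10} J_a^{I} &= 10 \log_{10}(0.0129) \approx -18.9~\text{dB}, \\
\frac{1}{\sqrt{J_a^{I}}} &= \frac{1}{\sqrt{0.0129}} \approx 8.8, \\
J^{I} = J_a^{I} + \gamma J_b^{I} &= 0.0129 + 1 \times 0.0063 = 0.0192 .
\end{align*}
```

These values round to the 19 dB and the factor of 9 quoted above.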
The learned control law K I is re-evaluated 20 times afterwards to test its efficiency outside the learning loop. We note that the performance dropped slightly, as the averaged cost reduction of J I went from −98% to −94% and the standard deviation increased from σ I = 61% to σ = 65%. Such a discrepancy is expected, as the experimental conditions are always evolving. Indeed, the temperature in the room, the evolution over time of the DBD actuator and the variation of the incoming velocity are all possible sources of fluctuations in the measured velocity. However, the discrepancies are small and the overall performance of the control law is retained. Figure 11 depicts the flow response under control with the best control law K I derived by gMLC for the narrow-bandwidth regime. We observe that the control law K I effectively meets the control objective, as the oscillation amplitude of the velocity measured for the controlled flow is reduced compared to the unforced flow, see figure 11a. This is accompanied by a decrease of the standard deviation to σ I = 61%. Such a feature was consistently seen in different learning iterations with gMLC. The power spectra in figure 11b show that K I effectively reduces the highest peak of the spectrum, at frequency f a , by almost two orders of magnitude. The effect of the control is also observed beyond the observation window St ∈ [0.5, 1.75], as the harmonics of f a are also nullified.

Table 5: Gains of the affine approximation of the control law K I .
Term   1      a1    a2     a3     a4    a5     a6    a7     a8  a9  a10
Gain   k0     k1    k2     k3     k4    k5     k6    k7     k8  k9  k10
Value  -1.50  0.51  -0.06  -0.07  0.01  -0.06  0.02  -0.04  0   0   0
Note that the peak f + associated with the mode n = 3 increases with the control. This behavior is not surprising, as the frequency is associated with a quasi-stable mode of the flow, which then grows when energy is supplied to the system. Moreover, it appears that the frequency f + is split into two peaks. This peak-splitting phenomenon, referred to as spillover, is well known when building closed-loop transfer functions with unstable zeros and poles. Although the present control is beyond a linear framework, one can suspect that the same mechanism is behind the observed peak splitting. The learned law K I is not able to control f + , as its power level is around one order of magnitude lower than that of f a under control.
In addition, such control has been achieved with a minimum actuation power: the associated cost is J b I = 0.0063, less than 1% of the maximum actuation power. The actuation command, plotted in figure 11a, shows that the control is a combination of a low-level steady actuation and a low-amplitude feedback control. To demonstrate the effectiveness of the low-amplitude feedback, the closed-loop actuation command recorded during the learning process has been employed as an open-loop control signal to force the flow. The spectrum of the resulting flow (blue spectrum in figure 11b) shows that the main frequency of the flow f a resurfaces and that the mode n = 3 associated with the frequency f + is also excited. This open-loop test reveals that, despite the low amplitude level, feedback plays a crucial role in stabilizing the flow. Moreover, when we compare the results of the open-loop control with a steady actuation of equivalent level (∼ 10%A max ) in figure 8a, we note that both frequencies f b and f + are excited, indicating the effect of the unsteady component of the command.

Now we describe the learned law K I with an analytical approximation and a cluster-based visualization. For the analytical approximation, the coefficient of determination of the affine reconstruction, R 2 = 0.87, indicates an acceptable reconstruction. The gains associated with each feature component are presented in table 5. We note that, aside from the mean component, the dominant term is a 1 = u, as its gain (k 1 = 0.51) is more than 7 times higher than the second highest gain. The strong correlation between K I and a 1 reveals that phasor control, or direct feedback of the system's state, plays a major role for this control. However, figure 12 shows that the relationship between b and a 1 is not fully affine, as two regions of significant width are displayed. 
This analysis is in agreement with the cluster-based investigation of the control law.
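For readers who wish to reproduce this kind of analysis, the sketch below fits an affine law b ≈ k 0 + Σ k i a i by ordinary least squares and returns the coefficient of determination R 2 . It is an illustration under stated assumptions: the normal equations are solved with a small hand-written Gaussian elimination to stay dependency-free, and the time series are synthetic, generated from three of the gains of table 5 with only two features.

```python
import math

def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def affine_fit(features, commands):
    """Least-squares gains [k0, k1, ...] of b ~ k0 + sum_i k_i a_i, and R^2."""
    X = [[1.0] + list(a) for a in features]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xtb = [sum(row[i] * b for row, b in zip(X, commands)) for i in range(p)]
    gains = solve_linear(XtX, Xtb)
    pred = [sum(k * x for k, x in zip(gains, row)) for row in X]
    mean_b = sum(commands) / len(commands)
    ss_res = sum((b - q) ** 2 for b, q in zip(commands, pred))
    ss_tot = sum((b - mean_b) ** 2 for b in commands)
    return gains, 1.0 - ss_res / ss_tot

# Synthetic, purely affine data (hypothetical signals, two features only).
features = [[math.sin(0.3 * t), math.cos(0.7 * t)] for t in range(50)]
commands = [-1.50 + 0.51 * a1 - 0.06 * a2 for a1, a2 in features]
gains, r2 = affine_fit(features, commands)
```

On purely affine data the fit recovers the gains and R 2 = 1; a value such as R 2 = 0.87 therefore indicates that part of the command is not explained by an affine function of the features.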
The control visualization, following the method described in § 3.4, makes it possible to better specify the dynamics of the control achieved by K I . The proximity map (figure 13a) shows that the centroids are arranged in a circular manner around the origin, where centroid 1 is located. The transition probability matrix (figure 13b) gives the transition probability from one cluster to another for each time step dt = 0.004 s. The transition probability matrix also shows one fixed point and two cycles: a small one with centroids 2, 3, 4 and 5 and a large one with centroids 6, 7, 8, 9 and 10. Combining the proximity map and the transition probability matrix, we can reconstruct the phase space of the dynamics by deriving a control network model (figure 13c). The cycles identified in the transition probability matrix are then represented by limit cycles in the phase space. A frequency analysis based on a Poincaré section and an angular first return map of the dynamics, similar to Lusseyran et al. (2008), reveals that the frequencies associated with the large and small limit cycles are around 28.48 Hz and 41.14 Hz, respectively. We can then assume that the large limit cycle is associated with the dynamics of mode n = 2 (frequency f a = 28.81 Hz) and the small limit cycle with mode n = 3 (frequency f + = 38.72 Hz). Following Lusseyran et al. (2008), such an organization of the frequencies in the phase space shows that the dynamics may be structured around a fixed point of the stable spiral type, where the oscillation period increases with the distance to the fixed point, here represented by cluster 1. Figure 13c also depicts the actuation regions in the phase space, revealing two main regions of opposite actuation sign separated by a straight line. Interestingly, the sign of the actuation changes when approaching the fixed point. 

Figure 12: Actuation command b versus a 1 for the case: K I controlling the narrow-bandwidth regime.
Such an actuation map shows that the main stabilization mechanism exploited by K I is phasor control. The change of actuation sign when approaching the fixed point is explained by a phase shift due to the frequency change. The resulting control is similar to the one obtained with linear control by Yan et al. (2006), where there is a rapid switching between two modes competing for the available energy, thus mitigating any resonance. Moreover, a spectral analysis of the actuation command b and a 1 = u shows that they share the same frequency peaks, confirming the phase relation between the control and the state of the system, see figure 25a in appendix D.
Finally, a comparison between MLC and gMLC has been performed (see appendix B) and reveals that gMLC outperforms MLC in terms of speed and final solution. In total, the learning has been accelerated by one order of magnitude.
Gradient-enriched MLC has been successfully applied to the stabilization of the open cavity flow experiment. Exploration and exploitation phases both contributed to the fast learning of a feedback control law. The evolution phases managed to discover new minima in the search space and the simplex steps succeeded in converging towards a new minimum. The resulting feedback control law outperforms steady actuation: it achieves a similar reduction of the maximum peak of the spectrum but with small actuation power. Both the analytical approximation and the cluster-based analysis hint at a control combining phasor control and nonlinear interactions. Hence, the control achieved is close to an ideal stabilization scenario, where some kind of base state or fixed point is stabilized with a vanishing cost. This interpretation is certainly to be considered heuristically given the complexity of the real dynamics of the 3D intra-cavity flow and its nonlinear temporal and spatial interactions with the mixing layer. However, it aims at capturing the remarkable properties of the control law learned by gMLC, whose mode of action is radically opposed to that of a control by steady forcing. In this section, the learning has been done for a single-frequency regime. In the next section, a more challenging regime is controlled, where two modes compete, strengthening nonlinear coupling.

Closed-loop control of the mode-switching regime
In this section, gMLC is employed to control a flow regime with strong nonlinear coupling which leads, in particular, to an intermittency between the two main instability modes of the mixing layer. The dynamics of the intermittency is chaotic (Lusseyran et al. 2008), with the appearance of long time scales that are demanding from the point of view of machine learning. The goal is again to stabilize the flow by reducing the oscillations of the mixing layer, but this time in the case where two modes compete, as described in § 2.5. This constitutes a challenging problem as gMLC needs to learn a control law able to control two modes simultaneously. For the gMLC optimization, the same parameters as for the narrow-bandwidth regime have been employed, see § 4.2. In the following, the notations for the control law, cost and standard deviation associated with the learning on the mode-switching regime are marked by the superscript II. Figure 14 depicts the cost of the individuals evaluated along the learning process. For this specific experiment, most of the learning is realized during the Monte Carlo sampling phase, reducing the cost function to J = 0.0713. From there on, the only improvements are carried out with the simplex steps, bringing the cost function to the final value J II = 0.0565. Nonetheless, the second exploration phase introduced a new control law (#11) in the simplex (see table 8 of appendix C). As the gMLC algorithm is partially stochastic, it is possible to fall close to the global minimum by pure luck, but it usually takes several iterations of the learning phases to converge, as in § 4.3. Such a learning process has been observed in other realizations of the same experiment, where a combination of exploration and exploitation has been necessary to reach similar levels of performance.
After 1000 evaluations, the final control K II is a feedback control law, thoroughly described in appendix C. The spectra of the flow under control (figure 15b) reveal a drastic decrease in the level of the maximum peak at frequency f a . The dominant frequency is then close to the one associated with the second main mode (f + ). The relative level of the maximum peak in the spectrum is J a II = 0.0335, i.e. a reduction of about 97% with respect to the unforced flow. This corresponds to a reduction of the oscillation amplitude by a factor 5, or by 15 dB. The associated standard deviation decreases to σ II = 97%. As for the narrow-bandwidth regime, the control is achieved with small actuation power, using around 2% of the maximum actuation power. Figure 16 shows that the controlled flow no longer exhibits a mode-switching behavior like in figure 4b. The actuation command plotted in figure 15a shows short, intermittent high-amplitude spikes emerging from the minimum actuation level (A min ). As for the narrow-bandwidth regime, the control law K II is re-evaluated 20 times and a small performance drop is observed: the cost reduction dropped from −94% to −89% and the standard deviation went from σ II = 97% to σ = 112%. Finally, as in § 4.3, the time series of the actuation command has been employed as an input signal for an open-loop control. Surprisingly, the equivalent open-loop control performs as well as the closed-loop control law, suggesting that feedback was not at play in the reduction of the power amplitude for the mode-switching regime. However, anticipating the next section ( § 5), controlling the narrow-bandwidth regime with K II reveals that feedback is still a feature selected by gMLC.
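The relation between the quoted peak ratio, the decibel value and the amplitude factor can be checked with a few lines. This is only a sketch of the arithmetic, assuming that J a II is a ratio of spectral power (controlled over unforced), so that the amplitude scales as its square root; the variable names are illustrative.

```python
import math

peak_ratio = 0.0335  # J_a^II from the text, assumed to be a power ratio

reduction_percent = 100.0 * (1.0 - peak_ratio)   # relative reduction of the peak
power_db = 10.0 * math.log10(peak_ratio)         # same ratio expressed in decibels
amplitude_factor = math.sqrt(1.0 / peak_ratio)   # amplitude scales as sqrt(power)

print(f"reduction: {reduction_percent:.2f} %")
print(f"level change: {power_db:.2f} dB")
print(f"amplitude divided by: {amplitude_factor:.2f}")
```

Under this assumption, the three figures quoted in the text (about 97%, about 15 dB, a factor of about 5) are mutually consistent.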
Figure 14: Distribution of the costs for the 1000 evaluated individuals during the gMLC optimization process for the mode-switching regime. Each dot represents the cost J of one individual. The color of the dots indicates how the individuals have been generated: black dots for randomly generated individuals (Monte Carlo sampling phase), blue dots for individuals generated from a genetic operator (exploration phase) and yellow dots for the individuals arising from the subplex method (exploitation phase). The individuals from the Monte Carlo sampling and exploration phases are sorted following their costs. The red line follows the evolution of the best cost. The vertical axis is in log 10 scale.
For the polynomial approximation of the K II control of the mode-switching regime, a linear regression was unable to derive an affine reconstruction of the actuation command; the corresponding determination coefficient is R 2 = 0.13. Even when expanding the affine reconstruction with quadratic and cubic terms, R 2 is less than 0.25, implying that no linear control can be inferred from the data.
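The affine reconstruction and its determination coefficient can be sketched as an ordinary least-squares fit of the actuation command b on the lagged sensor signals a i = u(t − (i − 1)T s ). The code below is a minimal illustration on synthetic data, not the regression used in the study: the helper names, the choice of one lag per sample and the test signals are assumptions.

```python
import numpy as np

def lagged_features(u, n_lags):
    """Rows [u(t), u(t - Ts), ..., u(t - (n_lags - 1) Ts)], one lag per sample (assumed)."""
    u = np.asarray(u, dtype=float)
    n = len(u) - n_lags + 1
    return np.column_stack([u[n_lags - 1 - i : n_lags - 1 - i + n] for i in range(n_lags)])

def affine_fit(features, b):
    """Least-squares gains k0..kN for b ~ k0 + sum_i k_i a_i, and the R^2 of the fit."""
    X = np.column_stack([np.ones(len(features)), features])
    gains, *_ = np.linalg.lstsq(X, b, rcond=None)
    residual = b - X @ gains
    r2 = 1.0 - residual.var() / b.var()
    return gains, r2

# Synthetic sensor signal: a noisy oscillation sampled every 0.004 s (illustrative values).
rng = np.random.default_rng(0)
t = np.arange(2000) * 0.004
u = np.sin(2 * np.pi * 29.0 * t) + 0.1 * rng.standard_normal(t.size)
A = lagged_features(u, n_lags=10)

b_affine = -1.65 + 0.48 * A[:, 0] - 0.06 * A[:, 1]  # command that is affine by construction
b_even = A[:, 0] ** 2                                # command that an affine law cannot capture
_, r2_affine = affine_fit(A, b_affine)
_, r2_even = affine_fit(A, b_even)
print(r2_affine, r2_even)
```

The first command is recovered with R 2 close to one, whereas the purely quadratic command yields an R 2 close to zero, which is the kind of signature reported above for K II.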
Regarding the cluster-based analysis of the control, figure 17 reveals a complex structure for the dynamics. The transition matrix (not plotted) shows that the cluster self-transitions are dominant; they are not displayed in the control network for clarity. The reconstructed phase space reveals two regions of opposite actuation signs. The complex interactions between the centroids show that strong nonlinearities are at play. Interestingly, the spectrum of the actuation command (figure 25b in appendix D) does not include any significant peak, except for the very low frequencies. The actuation command thus resembles random noise without any correlation with the measured velocity.
In less than 1000 evaluations, gMLC managed to build feedback control laws that reduce the maximum peak of the power spectrum with small actuation power in two regimes: the narrow-bandwidth regime and the mode-switching regime. We proposed a visual representation of the control laws to aid the interpretability of the actuation mechanisms that enabled such efficient controls. However, a thorough analysis of the controlled flow has not been carried out yet and constitutes a study in itself. The identification of the involved control mechanism requires a study of the short transient that leads to the stabilized state. Nonetheless, we can affirm that the control impacts the linear amplifier of the shear layer as the DBD actuator modifies its thickness on average. This effect has been demonstrated by studying the difference between the unforced and forced mean flow even for low actuation levels, i.e., near the ionization threshold. Such a mechanism is expected to remain valid for turbulent flows and even for the often studied transonic cavity flows. Moreover, Cornejo Maceda et al. (2021) show that gMLC surpasses MLC in terms of performance of the final solution and learning speed for a 2D numerical simulation. In appendix B, we demonstrate that gMLC also surpasses MLC in experiments, establishing gMLC as a keystone for fast learning of feedback control laws directly on the plant. We foresee that gMLC will greatly contribute to the learning of control laws for MIMO control.
Figure 17: The control network is depicted as in figure 13. The actuation amplitudes are denoted with bars, red (blue) for positive (negative) amplitude with respect to the mean actuation. The yellow boxes indicate 10% of the maximum amplitude. The black arrows denote the most probable transitions from one cluster to the other (p > 0.15). The gray arrows are for less probable transitions (0.15 ≥ p ≥ 0.10). Lower probability transitions and self transitions are omitted for clarity. The red (blue) background denotes the supposed regions of positive (negative) amplitude. The dotted black lines are the supposed limit that separates the regions.

Control law investigation
In this section, we further investigate the capabilities of the control laws K I and K II learned for the narrow-bandwidth regime and the mode-switching regime, respectively. Firstly, the robustness of the laws is tested by applying each law to the other regime, comprising dynamics different from the learning conditions ( § 5.1). Secondly, we characterize the new control of these regimes with an affine approximation and our cluster-based visualization method ( § 5.2). Finally, we establish the existence of an effective feedback for the control of both regimes with open-loop tests ( § 5.3).

Robustness of the control
In this study, feedback control laws are optimized for steady conditions: fixed Reynolds number and incoming velocity during the learning process. To achieve robustness, the control laws learned should also perform for a large range of parameters. Thus, to test the robustness of the learned laws, they are re-evaluated for operating conditions different from the learning ones: the law K I optimized for the narrow-bandwidth regime is now employed to control the mode-switching regime and vice versa. Such a test is demanding as the dynamics are partially different from one regime to the other. In particular, the unforced dynamics of the mode-switching regime includes the main frequency of the narrow-bandwidth regime but also displays an intermittency with another frequency. Figure 18 displays the spectra for each controlled case. For the control of the narrow-bandwidth regime with K II (figure 18a, green line), we note that the law K II manages to reduce the main peak f a of the spectrum to the same level as the law K I (red line). Moreover, in the frequency range around f + , K II performs better than K I as it manages to nullify the peak-splitting phenomenon described in § 4.3. Such a feature is expected as K II has been optimized to also control the mode f + . In fact, the whole spectral range is reduced in amplitude. This non-transfer of energy to other frequencies is a remarkable feature of K II .
As for the control of the mode-switching regime with K I (figure 18b, green line), the power of the main frequency f a is drastically reduced, reaching the same power level as the control with K II (red line). However, the controller is less efficient than K II , as K I fails to reduce the power associated with the frequency range surrounding the frequency f + of mode n = 3. Again, as for the control of the narrow-bandwidth regime with K I , we observe the spillover effect, with a splitting of the main frequency into two frequencies on either side, at a lower power level.
In summary, both learned control laws K I and K II retain their efficiency when controlling regimes outside the learning conditions. As expected, K I was unable to reduce the peak at frequency f + in the mode-switching regime, but it manages to reduce the f a peak as it was built for, even though this peak appears only intermittently. On the other hand, for the narrow-bandwidth regime, the control with K II was more effective than with K I as it also manages to prevent the peak-splitting (or spillover) of the third mode. Thus, this test also reveals that K II is not only able to control the frequencies f a and f + but also to prevent the rise of both modes simultaneously. Of course, from the point of view of the cost function, both control laws K I and K II are similar when controlling the narrow-bandwidth regime, and gMLC could have converged towards either of the two control laws. Yet, K II is the control law that best meets the control objective, which is the stabilization of the flow. Therefore, it is really the learning conditions that make the difference. This analysis reveals that learning a control in complex and rich conditions is beneficial for the robustness and for the overall efficiency of the control, as the richness of the dynamics will be reflected in the control law.
Figure 18: Spectral response of the flow controlled by K I and K II for the narrow-bandwidth regime (top) and the mode-switching regime (bottom). Black spectra correspond to the unforced dynamics of each regime; red spectra correspond to the flow controlled by the law learned in the given regime, i.e., K I (K II ) for the narrow-bandwidth (mode-switching) regime. Green corresponds to the flow controlled by the law learned in the other regime, and blue to the open-loop equivalent of the latter one. The horizontal dashed lines denote the maximum of each spectrum in the observation window (shaded green section). The vertical axes are in log 10 scale.

Interpretation of the resulting controlled flow
We now propose an interpretation of the controls performed in the previous section using both the analytical approximation and the cluster-based visualization of the control laws. Firstly, we analyze the case where K II controls the narrow-bandwidth regime. As in § 4.4, a linear regression is unable to reconstruct the actuation command despite the regime being less complex. The determination coefficient is R 2 = 0.13. The addition of quadratic terms brings the coefficient no higher than 0.78. A complex nonlinear control is expected as K II manages to control both frequencies f a and f + .
Figure 19: Visualization of the control law K II controlling the narrow-bandwidth regime. The control network is depicted as in figure 13. The actuation amplitudes are denoted with bars, red (blue) for positive (negative) amplitude with respect to the mean actuation. The yellow boxes indicate 25% of the maximum amplitude. The black arrows denote the most probable transitions from one cluster to the other (p > 0.50). The gray arrows are for less probable transitions (0.50 p 0.5). Lower probability transitions and self transitions are omitted for clarity. The red (blue) background denotes the supposed regions of positive (negative) amplitude. The dotted black lines are the supposed limit that separates the regions.
As for the control law visualization, figure 19 depicts a control network similar to the one in § 4.3. However, there is only one large cycle composed of the centroids 1, 2, 3, 4, 5, 6 and 7. The limit cycle presents four phases regularly alternating between positive and negative actuation, suggesting that the control operates at twice the frequency of the flow. A spectral analysis of a 1 = u and b (see figure 25c in appendix D) shows that the main peaks are respectively f a = 29.03 Hz and f = 58.23 Hz ≈ 2f a , confirming the phase relation between the flow dynamics and the actuation command. Interestingly, the second harmonic 2f a is slightly excited but does not resonate for the controlled flow (figure 18a, green line), while it clearly resonates for the open-loop equivalent of K II (blue line). In summary, K II controls the flow at twice the main frequency but avoids the resonance of the second harmonic. The efficiency of a control at twice the main frequency has been previously reported by Schumm et al. (1994). The authors note the stabilization of a cylinder wake by transverse vibration of the cylinder at 1.8 times the natural shedding frequency. In particular, they attribute the control effect to a nonlinear interaction between the instability and the forcing input. Thus, both the analytical and the cluster-based analyses point towards a nonlinear actuation mechanism for the control of the narrow-bandwidth regime with K II .
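The comparison of the main spectral peaks of the velocity signal and of the actuation command can be sketched with a discrete Fourier transform. The example below uses synthetic signals only; the function name `dominant_frequency`, the use of dt = 0.004 s as sampling step and the value of 29 Hz are assumptions chosen to mimic the situation described above.

```python
import numpy as np

def dominant_frequency(signal, dt):
    """Frequency (Hz) of the largest peak of the one-sided power spectrum."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the mean so the zero-frequency bin does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs[np.argmax(power)]

dt = 0.004                                  # assumed sampling step in seconds
t = np.arange(5000) * dt                    # 20 s of synthetic data
f_a = 29.0                                  # illustrative value close to the reported f_a
u = np.sin(2 * np.pi * f_a * t)             # synthetic velocity oscillating at f_a
b = np.sin(2 * np.pi * 2 * f_a * t + 0.3)   # synthetic actuation oscillating at 2 f_a

f_u = dominant_frequency(u, dt)
f_b = dominant_frequency(b, dt)
print(f_u, f_b, f_b / f_u)
```

A ratio f_b / f_u close to two is the spectral signature of an actuation operating at twice the flow frequency.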
Secondly, we interpret the case where K I controls the mode-switching regime. This time, the linear regression manages to build an affine approximation of the control, as the determination coefficient is R 2 = 0.92. The gains associated with each feature component are displayed in table 6. As in § 4.3, the most relevant feature is a 1 = u(t), but again the control cannot be reduced to an affine relationship as figure 20 displays a nonlinear curve. This observation is in agreement with the spectral analysis of b and a 1 = u (see figure 25d in appendix D), which shows peaks at the same frequency. The complexity of the flow translates into a complex control network, as in § 4.4. Figure 21 depicts a reconstructed phase space divided into two main regions: one on the left (centroids 1, 2 and 3) with positive actuation amplitude, and one on the right (centroids 4, 5, 6, 7, 9 and 10) with negative actuation amplitude. The role of centroid 8 may be small as its associated actuation is close to the mean value. Interestingly, the overall structure of the control network is similar to the one in figure 17, suggesting that the control mechanism is also a complex nonlinear one.
Term   1      a1    a2     a3     a4    a5     a6  a7  a8  a9    a10
Gain   k0     k1    k2     k3     k4    k5     k6  k7  k8  k9    k10
Value  -1.65  0.48  -0.06  -0.05  0.01  -0.01  0   0   0   0.01  0
Table 6: Gains for the affine reconstruction of K I controlling the mode-switching regime.
Figure 20: Actuation command b versus a 1 for the case: K I controlling the mode-switching regime.

The need of feedback
We have shown for the control of the narrow-bandwidth regime ( § 4.3) that, without feedback, the control law K I is unable to stabilize the flow and even excites the frequencies f a and f + (figure 11b). Moreover, K I is able to partially control the mode-switching regime ( § 5.1) and, as expected, the same controller applied in an open-loop manner is no longer able to control the main mode f a (blue spectrum in figure 18b). We note, nonetheless, a small shift of the spectrum towards the higher frequencies.
Figure 21: Visualization of the control law K I controlling the mode-switching regime. The control network is depicted as in figure 13. The actuation amplitudes are denoted with bars, red (blue) for positive (negative) amplitude with respect to the mean actuation. The yellow boxes indicate 25% of the maximum amplitude. The black arrows denote the most probable transitions from one cluster to the other (p > 0.15). The gray arrows are for less probable transitions (0.15 p 0.9). Lower probability transitions and self transitions are omitted for clarity. The red (blue) background denotes the supposed regions of positive (negative) amplitude. The dotted black lines are the supposed limit that separates the regions.
On the other hand, surprisingly, it has been shown in § 4.4 that K II performs in open-loop as well as in closed-loop. This result is consistent with the absence of correlation between the actuation command b and the velocity measure a 1 = u, see figure 25b in appendix D. So, it seems that the learning process has selected the flow information (a i , see appendix C) without any improvement of the cost. However, K II applied in a closed-loop manner to the narrow-bandwidth regime performs even better than K I , while in an open-loop manner, K II fails to achieve any control. Indeed, the corresponding spectrum (blue spectrum in figure 18a) is almost identical to that of the unforced flow. Therefore, it should be noted that the flow state information a i in K II is truly functional and in fact gives the controller the ability to remain effective well away from the learning conditions. This analysis shows the extent to which feedback is a key feature for control. We believe that the ability of gMLC to learn effective and efficient feedback control laws in experiments will greatly benefit future MIMO control experiments.
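The logic of the open-loop test, i.e. replaying a recorded closed-loop actuation command without feedback, can be illustrated on a toy problem. The sketch below is not a model of the cavity flow: the plant is a hypothetical, slowly growing two-dimensional oscillator, the feedback law is invented for the example, and all parameter values are arbitrary.

```python
import numpy as np

rho, theta, gain, n_steps = 1.02, 0.3, 0.5, 200
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def step(x, b):
    """Toy plant: a slowly growing rotation (unstable oscillator) plus actuation b."""
    return rho * R @ x + b

def feedback(x):
    """Hypothetical feedback law: oppose part of the next unforced state."""
    return -gain * R @ x

# Closed loop: record the actuation command while it stabilizes the plant.
x = np.array([1.0, 0.0])
recorded = []
for _ in range(n_steps):
    b = feedback(x)
    recorded.append(b)
    x = step(x, b)
closed_loop_norm = np.linalg.norm(x)

# Open loop: replay the recorded command on a slightly different initial state.
y = np.array([1.0, 0.1])
for b in recorded:
    y = step(y, b)
open_loop_norm = np.linalg.norm(y)

print(closed_loop_norm, open_loop_norm)
```

In this toy setting the closed loop drives the state to zero, while the replayed command cannot correct the initial mismatch, which keeps growing at the rate of the unforced plant. This mirrors the kind of contrast between closed-loop and open-loop spectra discussed above, without claiming to reproduce its physical mechanism.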

Stabilization of the open cavity flow
In this study, the flow is monitored by a single hot-wire sensor downstream of the actuator. The sensor signal is employed for sensor feedback and to characterize the controlled flow. The stabilization achieved near the sensor extends in the spanwise and streamwise directions, presumably over a significant portion of the finite-aspect-ratio cavity.
For large aspect ratios S/D, e.g. O(100) or more, the effect of two-dimensional actuation along the whole span can be expected to depend on the sensor location. The feedback stabilization in the sensor plane will become a non-stabilizing open-loop actuation far beyond the spanwise coherence length. An interesting example has been reported for the stabilization of a large-aspect-ratio cylinder wake. Roussopoulos (1993) forced the cylinder wake with a pair of loudspeakers driven in opposite phase and significantly reduced the fluctuations at the downstream sensor location. Far away from the sensor in spanwise direction, no stabilization was observed. For our open cavity flow, we expect a loss of control authority at a distance from the hot-wire position greater than the transverse coherence length. This transverse coherence of the mixing layer instabilities is at least of the order of the cavity depth D. Our cavity has an aspect ratio of S/D=6, i.e. we control at least one third of the cavity spanwise (1D on either side of the hot-wire plane). The remaining lateral thirds are in the Ekman layers and therefore probably less oscillating. The global stabilization could be augmented with multiple actuators and multiple sensors.
As for the spanwise homogeneity of our one-piece DBD actuator, measurements performed for actuation levels close to the ionization threshold, when the ionization is still quite inhomogeneous along the electrode, show that the flow response is independent of the spanwise location of the measurement point.
Concerning the streamwise direction, the source of the oscillation is related to the mixing layer through the Kelvin-Helmholtz instability. At the level of the mixing layer, the hypothesis of an oscillation along the length of the cavity which would be canceled at the downstream edge corresponds to the idea of a standing wave with a node at the hot-wire location. However, the Kelvin-Helmholtz instability is a convective instability, which can only be suppressed by canceling the disturbances at the source. Otherwise, any oscillations close to the most dangerous frequency would inevitably be amplified and be present at all x, and especially at the hot-wire location. This is well observed in the visualization of the natural flow, where the oscillations reach maximum non-linear amplitudes and break on the downstream corner. Hence, the downstream stabilization of the flow necessarily results from the control of the mixing layer along the streamwise direction.

Conclusions
This paper deals with a closed-loop stabilization experiment of an oscillating flow exhibiting non-linear coupling between several frequencies. The control law is automatically learned with gradient-enriched machine learning control (gMLC) (Cornejo Maceda et al. 2021). The chosen plant is an open cavity flow experiment in two distinct regimes: a narrow-bandwidth regime with dominant frequency f a and a mode-switching regime where another frequency f + temporarily occurs. The flow is actuated upstream at the leading edge with a DBD plasma actuator and monitored downstream at the trailing edge with a hot-wire sensor. The cost function penalizes the energy peaks at the dominant frequencies in this velocity signal and the actuation power.
First, the effect of steady forcing is explored. For the narrow-bandwidth regime, an increasing actuation level progressively mitigates the main frequency f a while the energy of the other mode (f + ) rises. The fluctuation energy is reduced by up to 97% compared to the unforced case. The corresponding maximum actuation level defines the limit where a residual resonance can still be observed. Similarly, in the mode-switching regime, the two frequencies present in the flow (f a and f + ) are both damped as the actuation level increases. A 90% decrease of the maximum power is achieved for 88% of the maximum actuation level. Thus, reducing the main oscillations of the mixing layer is possible with a high-amplitude steady forcing.
Second, a feedback control law from hot-wire signal to DBD actuation is optimized with gradient-enriched machine learning control. The control law associated with the narrow-bandwidth regime reduces the energy of the peak frequency to 1.29% of the unforced case, i.e., a larger reduction than with steady forcing. In addition, this better feedback performance requires less than 1% of the steady open-loop actuation power. Feedback is demonstrated to be crucial for the established control: an open-loop control with the recorded feedback actuation command has hardly any stabilizing effect. A novel cluster-based investigation of the control law indicates a mechanism similar to fixed point stabilization with phasor control. This mechanism is corroborated by an analytical simplification of the control law. Intriguingly, the phase delay strongly varies with the amplitude of the oscillations. Thus, the control has the features of a stabilizing control of a fixed point, with minimal actuation power to compensate for system noise.
Figure 22: Summary of the performance for the laws learned (K I , K II ) in this study. On the left, the performance of the laws during the optimization process and on the right, the performance during the offline re-evaluations. The re-evaluation results are averaged over 20 realizations. The results associated with the learning on the narrow-bandwidth regime and the mode-switching regime are marked by the superscripts I and II respectively. The downward arrows symbolize cost reduction. The feedback symbols indicate whether (✓) or not (×) feedback is a necessary feature for the control. σ designates the normalized standard deviation of the downstream velocity.
Third, gMLC is also employed to optimize the control law to stabilize the mode-switching regime. The learned law successfully decreases the energy related to the two main frequencies to 3.35% of the unforced case, again with small actuation power, around 2% of the maximum actuation level. This time, the control performs in open-loop as well as in closed-loop. The actuation mechanism seems hardly interpretable and more complex than phasor control. Re-evaluation of the learned laws leads to only a slight performance drop, indicating that they are largely insensitive to the varying experimental conditions.
Finally, the robustness of the optimized controllers is assessed by applying the law learned for one regime to the other regime. As expected, the law learned in the narrow-bandwidth regime only partially stabilizes the mode-switching regime: the energy associated with frequency f a reaches the same level as with the law learned in that regime, while the energy of f + is hardly mitigated. On the other hand, the law learned in the mode-switching regime, when applied to the narrow-bandwidth regime, performs even better than the law learned in that regime. The main frequency f a is controlled and the f + spillover effect is prevented, revealing that a simultaneous control of both frequencies is possible. Moreover, the need for feedback is demonstrated: applying the recorded closed-loop actuation command in an open-loop fashion has hardly any stabilizing effect. Figure 22 summarizes the control performances for each case and also the re-evaluation tests of the learned laws to assess their robustness. Lastly, the global nature of the stabilization is discussed.
Summarizing, the role of feedback in the stabilization is demonstrated, as for similar linear control and model-based control (Samimy et al. 2007a). The actuation power is shown to be a tiny fraction of that of a stabilizing steady actuation.
The key enabler for the fast learning of feedback control laws directly in the plant is gradient-enriched machine learning control as a regression solver. Genetic programming, as an evolutionary algorithm, explores and populates new local minima, while the subplex simplex method efficiently slides down towards the minima by exploiting the local gradient information. A comparison between gMLC and MLC confirms the benefits of the gradient-augmented method for the control performance and learning rate. Fast learning is critical for experiments with a limited testing budget.
Moreover, the performance of the laws learned in one regime at least partially persists when they are applied to another regime. Intriguingly, the law obtained in the mode-switching regime outperforms the feedback law for the single-frequency regime as it has learned to stabilize the two characteristic frequencies. Parezanović et al. (2016) made a similar observation for the destabilization of the mixing layer.
We demonstrated the learning capability of gMLC for moderate Reynolds numbers on a single-input single-output (SISO) control experiment. Ongoing work focuses on the learning of multiple-input multiple-output (MIMO) feedback laws in more complex flows. One example is drag reduction of a generic truck model under yaw. Another example is lift increase of an airfoil under angle of attack at a Reynolds number near one million. The gMLC predecessor, machine learning control (MLC), has already been successfully employed in dozens of numerical and experimental plants (Noack 2019) comprising O(10) control inputs and O(10) control outputs. Future MIMO control law optimizers may be expected to synergize a spectrum of methods. Examples are cluster-based control (Nair et al. 2019), which can rapidly learn smooth control laws, and deep reinforcement learning (Rabault et al. 2019, 2020; Fan et al. 2020; Ren et al. 2021), which seems to be very efficient in exploiting short-term actuation responses.
comprise several frequencies f ∆ , f Ω , f b , f l , f a , f r , f + that interact nonlinearly with each other and their harmonics. The first two are shown as originating from centrifugal instabilities taking place span-wise within the intra-cavity recirculation, f b , the so called edge frequency, and all the following ones are directly associated with the shear layer instability.
We shall not describe the dynamics of the flow as it is presented in detail in (Basley et al. 2013). Figure 23, extracted from (Basley et al. 2013), highlights the interest of the open cavity as a benchmark of adjustable complexity for the development of machine learning algorithms. Beyond the benchmark role, the open cavity is still one of the flow configurations frequently encountered in industrial applications such as transportation systems and still has a strong impact on the performance and noise level of these vehicles.
• Crossover: two new individuals are generated by stochastic recombination of two individuals, exploiting parts of the parent individuals;
• Mutation: a new individual is generated by a stochastic modification in one individual, the resulting individual may share new structures or generate new ones depending on the impact of the change;
• Replication: an identical copy of one individual is generated, assuring memory of good individuals throughout the generations.
The genetic operators are applied to the better-performing individuals to generate the next generations of individuals. The best individuals are selected with a tournament selection method. As suggested in Duriez et al. (2017), a tournament selection of size 7 for 100 individuals is chosen. Genetic operations are chosen randomly following given probabilities: the crossover probability P c , the mutation probability P m and the replication probability P r . The probabilities add up to unity, P c + P m + P r = 1. Following Cornejo Maceda et al. (2021), we choose [P c , P m , P r ] = [0.6, 0.3, 0.1] as this set of parameters converges towards better solutions on average and has one of the lowest dispersions of the final solution. Moreover, an elitism operator, transferring the best individual of one generation to the next, is employed to assure that the best individual does not get lost. The parameters employed for the definition of the control laws are the same as for gMLC, see table 4. The individuals are evaluated over T ev = 40 s for both the gMLC and MLC experiments. For a fair comparison, a population of 100 individuals is chosen to evolve over 10 generations, for a total of 1000 individuals. Figure 24 shows the learning process of MLC and the distribution of the individuals evaluated following their cost J. We note that most of the learning is unusually done at the Monte Carlo sampling phase, where the cost is reduced to J = 0.12.
The next improvement occurs at the 8th generation, where the cost of the best control law decreases slightly to J = 0.10, with an associated standard deviation of σ = 1.59. This type of control law has been encountered in most MLC realizations. We consider a particular case where the Monte Carlo sampling phase is particularly efficient and where the evolutionary phases do not allow us to leave this local minimum. It is then necessary to wait for 2000 evaluations to reach performances similar to gMLC, the final cost being J = 0.05 and the standard deviation dropping to σ = 0.73. Note that after 700 evaluations, gMLC has already reduced the cost function to J = 0.02. As described in § 4.3, the progress of gMLC results, on the one hand, from the exploration of the control law space with the crossover and mutation operators and, on the other hand, from the exploitation with the gradient descent performed with downhill simplex. Therefore, for the same number of evaluations, gMLC surpasses MLC both in terms of learning speed and performance of the final solution. By multiplying the gains in terms of speed and cost, gMLC outperforms MLC by one order of magnitude. The benefits of gMLC over MLC have been described in Cornejo Maceda et al.
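One possible reading of the order-of-magnitude estimate, using only the figures quoted above, is the product of the ratio of evaluation counts and the ratio of final costs. The symbols N and J with subscripts MLC and gMLC are introduced here for illustration and do not appear in the original text.

```latex
\frac{N_{\mathrm{MLC}}}{N_{\mathrm{gMLC}}} = \frac{2000}{700} \approx 2.86, \qquad
\frac{J_{\mathrm{MLC}}}{J_{\mathrm{gMLC}}} = \frac{0.05}{0.02} = 2.5, \qquad
2.86 \times 2.5 \approx 7.1 .
```

Under this reading, the combined gain is close to a factor of ten.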

Appendix C. Gradient-enriched MLC laws
In this appendix, we describe the control laws derived with gMLC. In § C.1, we detail the control law learned in the narrow-bandwidth regime, K I (§ 4.3), and in § C.2, we give more insight into the control law learned in the mode-switching regime, K II (§ 4.4).
Figure 24: Distribution of the costs during the MLC optimization process. Each dot represents the cost J of one individual. The color of a dot indicates how the individual was generated: black for individuals randomly generated by Monte Carlo sampling (i = 1, ..., 100) and blue for individuals generated with genetic operators (i = 101, ..., 1000). For each generation, the individuals are sorted according to their cost. The red line shows the evolution of the best cost for the MLC optimization process; the green line corresponds to the gMLC optimization process. The vertical axis is in log10 scale.

C.1. Control law learned in the narrow-bandwidth regime, K I

The control law K I learned by gMLC in the narrow-bandwidth regime is a linear combination of 19 control laws; it is rewritten in equation (C 1). We recall that the division and logarithm operations are protected so that they are defined for all real values. We note that K I includes every feedback signal at least once, except sensor a_8 = u(t − 7T_s). Sensors a_1 and a_2 dominate, with 12 occurrences for a_1 and 5 for a_2, supporting phasor control as a possible control mechanism. Table 7 breaks down the control law K I into its linear combination of control laws. We note that nine additional control laws (#11 to #19) have been introduced in the simplex due to the exploration phases. Moreover, the best-performing control law (#15) among the 19 control laws is associated with the highest weight. However, the weight of control law #17 is a close second, suggesting that K I is mainly composed of control laws #15 and #17.
(C 1)

C.2. Control law learned in the mode-switching regime, K II

The control law K II learned by gMLC in the mode-switching regime is a linear combination of several control laws; it is rewritten in equation (C 2).
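The structure described in this appendix, a weighted sum of control laws acting on time-delayed sensor signals with protected division and logarithm, can be sketched as follows. This is an illustration under stated assumptions: the threshold EPS, the clamping convention used for the protection, the number of delayed signals and the two toy laws are hypothetical, and they are not the expressions or weights learned in the experiment. Only the delay convention a_i = u(t − (i − 1)T_s) is inferred from the definition of a_8 given in the text.

```python
import math

EPS = 1e-3  # hypothetical protection threshold, not taken from the text


def protected_div(x, y):
    """Division defined for every real y: small denominators are clamped to +/-EPS."""
    if abs(y) < EPS:
        y = EPS if y >= 0 else -EPS
    return x / y


def protected_log(x):
    """Base-10 logarithm defined for every real x: applied to max(|x|, EPS)."""
    return math.log10(max(abs(x), EPS))


def delayed_sensors(u_history, n_sensors):
    """Build a_i = u(t - (i-1) T_s), i = 1..n_sensors, from samples ordered oldest to newest."""
    return [u_history[-1 - i] for i in range(n_sensors)]


def combined_law(laws, weights, a):
    """Evaluate the linear combination sum_j w_j * K_j(a) of control laws."""
    return sum(w * law(a) for law, w in zip(laws, weights))


# Two toy laws (assumed for illustration); a[0] is a_1 and a[1] is a_2.
toy_laws = [
    lambda a: a[0] - a[1],
    lambda a: protected_log(protected_div(a[0], a[1])),
]
toy_weights = [0.7, 0.3]

u_history = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2]  # placeholder hot-wire samples
a = delayed_sensors(u_history, n_sensors=8)
b = combined_law(toy_laws, toy_weights, a)  # actuation command for this time step
```

Keeping each law as a separate callable mirrors the breakdown into individual control laws and weights, so the contribution of any single law can be inspected by evaluating it on its own.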