Active flow control for bluff body drag reduction using reinforcement learning with partial measurements

Published online by Cambridge University Press:  21 February 2024

Chengwei Xia
Affiliation:
Department of Aeronautics, Imperial College London, London SW7 2AZ, UK
Junjie Zhang
Affiliation:
Department of Aeronautics, Imperial College London, London SW7 2AZ, UK
Eric C. Kerrigan
Affiliation:
Department of Aeronautics, Imperial College London, London SW7 2AZ, UK; Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK
Georgios Rigas*
Affiliation:
Department of Aeronautics, Imperial College London, London SW7 2AZ, UK
*Email address for correspondence: g.rigas@imperial.ac.uk

Abstract

Active flow control for drag reduction with reinforcement learning (RL) is performed in the wake of a two-dimensional square bluff body in the laminar regime with vortex shedding. Controllers parametrised by neural networks are trained to drive two blowing and suction jets that manipulate the unsteady flow. RL with full observability (sensors in the wake) successfully discovers a control policy that reduces the drag by suppressing the vortex shedding in the wake. However, a non-negligible performance degradation ($\sim$50 % less drag reduction) is observed when the controller is trained with partial measurements (sensors on the body). To mitigate this effect, we propose an energy-efficient, dynamic, maximum entropy RL control scheme. First, an energy-efficiency-based reward function is proposed to optimise the energy consumption of the controller while maximising drag reduction. Second, the controller is trained with an augmented state consisting of both current and past measurements and actions, which can be formulated as a nonlinear autoregressive exogenous model, to alleviate the partial observability problem. Third, maximum entropy RL algorithms (soft actor critic and truncated quantile critics), which promote exploration and exploitation in a sample-efficient way, are used and discover near-optimal policies in the challenging case of partial measurements. Stabilisation of the vortex shedding is achieved in the near wake using only surface pressure measurements on the rear of the body, resulting in drag reduction similar to that in the case with wake sensors. The proposed approach opens new avenues for dynamic flow control using partial measurements for realistic configurations.
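
The augmented state described above (current and past measurements and actions) can be illustrated with a minimal frame-stacking sketch. This is not the authors' implementation: the class name `FrameStack`, its arguments and the zero-padding of the initial history are assumptions made here purely for illustration.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Hypothetical helper: keeps the last n_fs (measurement, action) frames.

    The flattened history can serve as the input of a dynamic (NARX-like)
    feedback controller. Zero-padding at start-up is an assumption.
    """

    def __init__(self, n_fs, n_sensors, n_actions):
        self.n_fs = n_fs
        zero_frame = np.zeros(n_sensors + n_actions)
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(
            [zero_frame.copy() for _ in range(n_fs)], maxlen=n_fs
        )

    def push(self, measurement, prev_action):
        """Append the newest frame and return the augmented state."""
        frame = np.concatenate(
            [np.ravel(measurement), np.ravel(prev_action)]
        ).astype(float)
        self.frames.append(frame)
        return self.state()

    def state(self):
        """Flatten the stored frames, oldest first, into one vector."""
        return np.concatenate(list(self.frames))
```

With `n_fs` frames, `n_sensors` pressure sensors and `n_actions` control inputs, the augmented state has `n_fs * (n_sensors + n_actions)` entries, so the network input grows linearly with the history length.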

Information

Type
JFM Papers
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press.

Figure 1. The RL framework. The RL agent and flow environment, and the interaction between them, are demonstrated. The PM case is shown, where sensors are located on the downstream surface of the square bluff body: 64 sensors are placed by default, and the red dots show only a demonstration with a reduced number of sensors. Two jets located upstream of the rear separation points are trained to control the unsteady wake dynamics (vortex shedding).

Figure 2. Demonstration of an FM environment with a static feedback controller (‘FM-Static’), a PM environment with a static feedback controller (‘PM-Static’), and a PM environment with a dynamic feedback controller formulated as an NARX model (case ‘PM-Dynamic’). The dashed curve represents the bottom blowing/suction jet, and the red dots demonstrate schematically the locations of the sensors.

Figure 3. Episode rewards (solid lines) and RMS of drag coefficient (dashed lines) against episode number during the maximum entropy RL phase with TQC.

Table 1. Number of episodes $N_{c}$ required for RL convergence in different environments. The episode reward $R_{ep,c}$ at the convergence point, the configuration of the neural network and the dimension of inputs are presented for each case. Here, $N_{fs}$ is the finite-horizon length of past actions and measurements.

Figure 4. (a) Drag coefficient $C_D$ without control (‘Baseline’) and with active flow control by RL in both FM and PM cases. In PM cases, control results with a dynamic and static feedback controller are presented. The dash-dotted line represents the base flow $C_{Db}$. (b) The mass flow rate $Q_1$ of one of the blowing and suction jets.

Figure 5. Contours of velocity magnitude $|\boldsymbol {u}|$ in the asymptotic regime of control (at $t=100$). Areas of $(-4,26)$ in the $x$ direction and $(-3,3)$ in the $y$ direction are presented for visualisation: (a) baseline (no control), (b) PM-Static, (c) PM-Dynamic, (d) FM-Static.

Figure 6. (a) Mean and (b) RMS base pressure for controlled and uncontrolled cases from the $64$ wall sensors on the downstream surface of the bluff body base.

Figure 7. Time series of pressure differences $\Delta p_t$ (blue) and action $a_{t-1}$ (red) for (a) PM-Static and (b) PM-Dynamic cases. Control is applied at $t=0$. The arrows point from low to high values of $|y_{sensor}|$ among the $\Delta p_t$ curves. The vertical dashed lines mark the time instants of the vorticity snapshots in figure 8.

Figure 8. Vorticity snapshots at the transient phase of control: (a–c) PM-Static, (d–f) PM-Dynamic.

Table 2. Correspondence between the number of vortex shedding (VS) periods and frame stack (history) length in samples $N_{fs}$. The RL control step size is $t_a =0.5$, and $N_{fs}$ is rounded to an integer.
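
One plausible reading of this correspondence is $N_{fs} = \operatorname{round}(n_{VS}\, T_{VS} / t_a)$, where $n_{VS}$ is the number of vortex shedding periods and $T_{VS}$ is the shedding period. The sketch below assumes that formula; the function name and the period used in the usage line are hypothetical placeholders, not values taken from the paper.

```python
import math


def history_length(n_periods, shedding_period, control_step=0.5):
    """Frame-stack length covering n_periods vortex shedding periods.

    Assumed relation: N_fs = round(n_periods * T_VS / t_a).
    Half-up rounding is used instead of Python's round(), which
    rounds exact halves to the nearest even integer.
    """
    samples = n_periods * shedding_period / control_step
    return int(math.floor(samples + 0.5))


# Usage with an illustrative (hypothetical) shedding period of 6.0 time units.
example_n_fs = history_length(1.0, 6.0)
```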

Figure 9. Average drag coefficient $\langle C_{D}\rangle$ and average episode reward $\langle R_{ep}\rangle$ in PM cases against history length (number of stacked frames) $N_{fs}$. Here, $\langle C_{D}\rangle$ is obtained from the asymptotic regime of control, and $\langle R_{ep}\rangle$ is calculated from two episodes after convergence of RL.

Figure 10. Curves of drag coefficients after control is applied in both FM and PM environments. Results from FM cases are presented as references, while a performance difference can be observed in the PM cases with and without past actions included.

Figure 11. Tests of RL-trained controllers with various reward functions. Drag coefficient $C_D$ curves are presented for each case. Dotted lines denote the cases with FM environments, while solid lines denote PM environments. The dash-dotted line represents $C_D$ in the base flow, which has no vortex shedding. Control starts at $t=0$, with the same initial conditions for every case.

Figure 12. Curves of drag coefficients after control is applied at $t = 0$ in PM-Dynamic cases. Sensor configurations with different sensor numbers $N = 1, 2, 16, 32, 48, 64$ are tested. The dash-dotted line represents $C_D$ from the base flow. The inset shows the asymptotic drag coefficient $\langle C_D \rangle$ (time-averaged value after $t = 80$) against the number of sensors $N$.

Figure 13. Pressure measurements in $t\in [0,40]$ (early transient stage in the controlled case) from two surface sensors: (a) baseline without control; (b) PM-Dynamic with an NARX controller, $N=2$. All curves are detrended by a fifth-order polynomial to reveal the relationship between measurements from the two sensors.
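
The fifth-order polynomial detrending mentioned in this caption can be sketched as follows. This is a minimal illustration using NumPy's least-squares polynomial fit, not the authors' post-processing code; the function name `detrend_polynomial` is an assumption.

```python
import numpy as np
from numpy.polynomial import Polynomial


def detrend_polynomial(t, signal, degree=5):
    """Subtract a least-squares polynomial trend from a time series.

    Polynomial.fit rescales t to [-1, 1] internally, which keeps the
    fit well conditioned over an interval such as t in [0, 40].
    """
    t = np.asarray(t, dtype=float)
    signal = np.asarray(signal, dtype=float)
    trend = Polynomial.fit(t, signal, deg=degree)
    return signal - trend(t)
```

Removing the slow trend leaves the oscillatory part of each sensor signal, which makes the phase relationship between the two sensors easier to compare.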

Figure 14. Asymptotic drag coefficient $\langle C_D \rangle$ for baseline, base flow, and tests of RL-trained controllers in both FM and PM environments with different $Re$. The controllers were trained at $Re=100$ (dashed line), and tested at $Re= 80, 90, 100, 110, 120, 150$. The controllers were trained again at $Re=150$ (dash-dotted line) and tested at $Re=150$ (square and diamond markers). All curves are fitted using a third-order spline.

Figure 15. Comparison of control performance in terms of $C_D$ between SAC and TQC. Control starts at $t=0$. Solid curves show the cases using TQC and baseline, while dotted curves show SAC. The dash-dotted curve corresponds to the base flow $C_D$.

Figure 16. Computational mesh of the simulation domain, $x \in (-20.5,26.5)$ and $y \in (-12.5, 12.5)$. A zoom-in view around the bluff body is presented in the black rectangle at the right. Boundaries of the simulation domain, bluff body surface and jet area are denoted.

Table 3. Hyperparameters used by default in TQC. For SAC, ‘top quantiles to drop per net’ is not used, and other parameters remain the same. In the entropy target $-\operatorname {dim}(\mathcal {A})$, $\operatorname {dim}(\mathcal {A})$ denotes the dimension of the action space $\mathcal {A}$.

Figure 17. A long evaluation for 400 non-dimensional time units of the RL-trained dynamic controller in a PM environment. Control starts at $t=0$. Solid curves show the controlled $C_D$ using TQC and baseline without control. The dash-dotted curve corresponds to the base flow $C_D$. The mass flow rate $Q_1$ is presented separately for $t\in [0,200]$ and $t\in [200,400]$.

Figure 18. Base flow (steady flow without vortex shedding) obtained with a half-domain simulation, i.e. $y \in [0,12.5]$. A sub-domain $y \in [0,3.5]$, $x \in [-4,26]$ is plotted for demonstration. The symmetric boundary condition is applied on the $y = 0$ boundary. The mesh of the simulation is consistent with figure 16.