
Reinforcement-learning-assisted control of four-roll mills: geometric symmetry and inertial effect

Published online by Cambridge University Press:  02 June 2025

Xuan Dai
Affiliation:
Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, 117575 Singapore, Republic of Singapore
Da Xu
Affiliation:
Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, 117575 Singapore, Republic of Singapore
Mengqi Zhang*
Affiliation:
Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, 117575 Singapore, Republic of Singapore
Yantao Yang
Affiliation:
Department of Mechanics and Engineering Science, State Key Laboratory for Turbulence and Complicated System, College of Engineering, Peking University, Beijing 100871, PR China
* Corresponding author: Mengqi Zhang, mpezmq@nus.edu.sg

Abstract

Embedding the intrinsic symmetry of a flow system in the training of machine learning algorithms has become a significant trend in the recent surge of their application in fluid mechanics. This paper leverages the geometric symmetry of a four-roll mill (FRM) to enhance its training efficiency. Stabilising and precisely controlling droplet trajectories in an FRM is challenging due to the unstable nature of the extensional flow with a saddle point. Extending the work of Vona & Lauga (Phys. Rev. E, vol. 104(5), 2021, p. 055108), this study applies deep reinforcement learning (DRL) to effectively guide a displaced droplet to the centre of the FRM. Through direct numerical simulations, we explore the applicability of DRL in controlling FRM flow with moderate inertial effects, i.e. Reynolds number $\sim \mathcal{O}(1)$, a nonlinear regime previously unexplored. The FRM’s geometric symmetry allows control policies trained in one of the eight sub-quadrants to be extended to the entire domain, reducing training costs. Our results indicate that the DRL-based control method can successfully guide a displaced droplet to the target centre with robust performance across various starting positions, even those substantially far from the centre. The work also highlights potential directions for future research, particularly on efficiently addressing the delay in flow response caused by inertia. This study presents new advances in controlling droplet trajectories in more nonlinear and complex situations, with potential applications to other nonlinear flows. The geometric symmetry exploited in this reinforcement-learning approach can also be applied to other control methods.
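
To make the control set-up concrete, the sketch below outlines the agent–environment interface suggested by the paper: the agent observes the droplet position, outputs roller speeds that are held constant over a control interval $\Delta t_a = 50\Delta t$, and the episode ends when the droplet reaches the target distance $h_e$. This is not the authors' solver: the DNS environment is replaced here by a toy planar extensional flow near the saddle point, and the class name, the roller-to-strain coupling and the placeholder reward are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the DRL control interface for the FRM.
# The true environment is a DNS of the Navier-Stokes equations; here the flow is a
# toy planar extensional field u = +e*x, v = -e*y near the saddle point, with the
# strain rate e modulated by the two controlled roller speeds (assumed coupling).
import numpy as np

class ToyFRMEnv:
    def __init__(self, dt=1e-3, steps_per_action=50, h_e=0.005):
        self.dt = dt                              # DNS time step Delta t
        self.steps_per_action = steps_per_action  # Delta t_a = 50 Delta t
        self.h_e = h_e                            # target radial distance to the origin
        self.pos = np.zeros(2)

    def reset(self, h0, alpha0_deg):
        a = np.deg2rad(alpha0_deg)
        # Start in sub-quadrant iii, e.g. h0 = 0.25*sqrt(2), alpha0 = 45 deg -> (-0.25, 0.25)
        self.pos = np.array([-h0 * np.cos(a), h0 * np.sin(a)])
        return self.pos.copy()

    def step(self, action):
        # 'action' holds the speeds of the two controlled rollers; they are kept
        # constant (zero-order hold) for steps_per_action DNS steps.
        strain = 1.0 + 0.5 * (action[0] - action[1])   # illustrative coupling only
        for _ in range(self.steps_per_action):
            u = np.array([strain * self.pos[0], -strain * self.pos[1]])
            self.pos = self.pos + self.dt * u          # advect the droplet centroid
        h = float(np.linalg.norm(self.pos))
        reward = -h                                    # placeholder; the paper uses (2.2)
        done = h < self.h_e
        return self.pos.copy(), reward, done

env = ToyFRMEnv()
obs = env.reset(h0=0.25 * np.sqrt(2), alpha0_deg=45.0)
for _ in range(200):                                   # random-policy rollout, interface demo only
    obs, r, done = env.step(np.random.uniform(-1.0, 1.0, size=2))
    if done:
        break
```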

Information

Type
JFM Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. ($a$) Schematic diagram of a four-roll mill showing the control set-up for a droplet initially positioned in sub-quadrant iii. ($b$) The five initial positions of the droplet studied in the present work, with different $h_0$ and $\alpha _0$. The blue shade encircles the initial positions considered by Vona & Lauga (2021). The definition of the angle $\beta$ used in the reward definition (2.2) is also illustrated. ($c$) Validation of our DNS results (black solid lines) against those (dashed lines) of Higdon (1993) at vanishingly small $Re$. The letters represent the cases of different $a/b$ and $l/b$ as summarised in the table.


Table 1. The cases considered in this work and the parameters selected in each case under $a/b=0.625$, $l/b=3.6$. $h_0$, initial radial distance to origin; $\alpha _0$, initial angle; $\eta$, clipped value for sampling actions; $p,q,c$, parameters in the reward function (2.2); $h_e$, target distance to origin; $\gamma$, discounting factor; $\Delta t_a$, time interval between adjacent control actions; $N$, maximum control steps per epoch.


Table 2. Explanations for the parameters in table 1.
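
For reference, the per-case hyperparameters listed in tables 1 and 2 could be collected in a small configuration record such as the sketch below. The field names mirror the symbols in the captions; every default value shown is illustrative rather than taken from a specific case in table 1 (only $h_e = 0.005$ is quoted later, in figure 12).

```python
# Hedged sketch of a per-case configuration for the FRM control task; values are illustrative.
from dataclasses import dataclass

@dataclass
class FRMCaseConfig:
    h0: float              # initial radial distance of the droplet to the origin
    alpha0_deg: float      # initial angle alpha_0 (degrees)
    eta: float             # clipped value for sampling actions
    p: float               # reward-function parameter in (2.2)
    q: float               # reward-function parameter in (2.2)
    c: float               # reward-function parameter in (2.2)
    h_e: float = 0.005     # target distance to the origin (value quoted in figure 12)
    gamma: float = 0.99    # discounting factor (illustrative default)
    dt_a_over_dt: int = 50 # control interval Delta t_a in units of Delta t (figure 3)
    N: int = 200           # maximum control steps per epoch (illustrative default)

# An illustrative instance; the numbers are not tied to a particular case in table 1.
example_case = FRMCaseConfig(h0=0.25 * 2 ** 0.5, alpha0_deg=45.0, eta=1.0, p=2.0, q=30.0, c=2.0)
```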


Figure 2. Agent training history for droplets at different initial positions. The first row corresponds to the closest radial distance to the origin. The second row shows the cumulative rewards of each epoch.


Figure 3. Converged policies in figure 2 are run for $50$ episodes at $Re=0.4$; all droplets are successfully driven to the target radial distance. Panels (a), (c) demonstrate representative trajectories for all the considered cases. Panels (b), (d) display the corresponding actions. Note that only two rollers, (1) and (2), are acted on, and each action is followed by a waiting time $\Delta t_a=50\Delta t$ until the next one, during which it is held constant. Therefore, in each control step shown in panels (b), (d), the actions are actually step functions, and the continuous lines in these panels are guides for the eye.


Figure 4. Instantaneous quiver plots for the velocity field in case 5.2 at $Re=0.4$. (a) Without control. The droplet tends to be swept away. The flow is inherently symmetric. (b) The last time step of an epoch for a converged policy. The white cross represents the starting position of the droplet and the red one the ending position in this control case of $h_0=0.25\sqrt {2}$ and $\alpha _0=45^{\circ }$.


Figure 5. The domain of the FRM is evenly divided into eight sub-quadrants. By geometric symmetry, the control policy trained in one of the sub-quadrants can be applied to the entire domain. For example, panel ($a$) shows that the dynamics of the blue droplet in sub-quadrant iii and that of the red droplet in sub-quadrant iv are symmetric with respect to the antidiagonal line (${-}45^\circ$). Panel ($b$) shows the symmetry with respect to the vertical axis for the droplets in sub-quadrants ii and iii.
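
A minimal sketch of the symmetry maps of figure 5 is given below: the droplet position observed in another sub-quadrant is reflected into the training sub-quadrant iii, the trained policy is queried there, and the resulting roller actions are relabelled consistently. Which rollers swap under each reflection depends on the roller numbering in figure 1, so the action remapping shown is an assumption for illustration only.

```python
# Sketch of the reflections used to reuse a policy trained in sub-quadrant iii.
import numpy as np

def reflect_antidiagonal(pos):
    """Reflection across the -45 degree line y = -x (figure 5a: sub-quadrants iii <-> iv)."""
    x, y = pos
    return np.array([-y, -x])

def reflect_vertical(pos):
    """Reflection across the vertical axis (figure 5b: sub-quadrants ii <-> iii)."""
    x, y = pos
    return np.array([-x, y])

def remap_actions_vertical(action):
    """Hypothetical relabelling of the four roller speeds under the vertical reflection;
    the actual index swap must follow the roller numbering of figure 1."""
    a1, a2, a3, a4 = action
    return np.array([a2, a1, a4, a3])

# Usage: for a droplet observed in sub-quadrant iv, reflect its position across the
# antidiagonal, query the policy trained in sub-quadrant iii, and relabel the returned
# roller actions with the matching permutation before applying them.
pos_iv = np.array([-0.30, 0.20])            # a second-quadrant point below y = -x
pos_iii = reflect_antidiagonal(pos_iv)      # its mirror image across the antidiagonal
```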


Figure 6. Trajectories obtained by applying the policies trained in § 3.1 based on the geometric symmetry of the FRM. Note that the policies trained in sub-quadrant iii (see the solid lines) are applied directly to other sub-quadrants (the dashed lines) without new training. ($a$) $h_0=0.05\sqrt {2}$ and $\alpha _0=60^{\circ },80^{\circ }$. ($b$) $h_0=0.1\sqrt {2}$ and $\alpha _0=60^{\circ },80^{\circ }$. ($c$) $h_0=0.25\sqrt {2}$ and $\alpha _0=45^{\circ }$.


Figure 7. Effect of inertia on the droplet trajectory subject to ad hoc action variation. The blue dot marks the initial position of the droplet at $[-0.25, 0.25]$. Panel ($a$) illustrates the action history of the roller (labelled as roller 2), where different colours represent the variations in the applied control action over time. In the other panels, the black dashed lines show the trajectories without control, corresponding to the case where $a_1(t) = 1$ for $t \in [0, 2]$. The curve segments in multiple colours show the trajectories of the droplet under the corresponding actions.


Figure 8. ($a$) Droplet trajectory under control corresponding to case 5.1 in table 1 with the parameters $h_0 = 0.25\sqrt {2}$, $\alpha _0 = 45^{\circ }$ and $Re=10^{-9}$, compared with the baseline case 5.2 ($Re=0.4$), which has been explained in detail in § 3.1. ($b$) Roller actions in the case of $h_0 = 0.25\sqrt {2}$ and $\alpha _0 = 45^{\circ }$, and the value history of the positions $[x{(t)},\ y{(t)}]$ and velocities $[u{(t)},\ v{(t)}]$.


Figure 9. ($a$) Controlled droplet trajectory for case 5.3 in table 1 with the parameters $h_0 = 0.25\sqrt {2}$, $\alpha _0 = 45^{\circ }$ and $Re=2$. ($b$) Action history and value history in the test. ($c$) Controlled droplet trajectory for case 5.4 in table 1 with the parameters $h_0 = 0.25\sqrt {2}$, $\alpha _0 = 45^{\circ }$ and $Re=3$. ($d$) Action history and value history in the test. Three rollers are activated in this control task.


Figure 10. The response test across $Re$ values. The dashed line represents the actions exerted by roller (2), denoted as $a_1(t)$. The red, green and blue lines are the velocity signals monitored at positions of $h_0=[0.25\sqrt {2},0.05\sqrt {2},0.25\sqrt {2}]$ and $\alpha _0=45^{\circ }$.


Figure 11. Robustness test based on adding thermal noise to the flow environment. (a) The accuracy, measured as the number of successful controls out of 100 runs. (b) The average distance over the 100 runs.
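
A common way to model the thermal noise mentioned in figure 11 is an additional Brownian displacement of the droplet whose amplitude scales with the inverse Péclet number; the sketch below uses this standard model and may differ from the authors' implementation.

```python
# Hedged sketch: one explicit advection step with an added Brownian kick ~ sqrt(2*dt/Pe).
import numpy as np

def advect_with_thermal_noise(pos, velocity, dt, Pe=1e5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    kick = np.sqrt(2.0 * dt / Pe) * rng.standard_normal(2)   # thermal (Brownian) displacement
    return np.asarray(pos) + dt * np.asarray(velocity) + kick
```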


Figure 12. Final distances of droplets to the target position. Droplets are initially located at the cell centres on a $10 \times 10$ rectangular grid, uniformly distributed over $x \times y \in [-0.3, -0.1] \times [0.1, 0.3]$. The applied policy was trained for the specific initial position $(-0.25, 0.25)$, denoted by the red dot. The green dots indicate the final distance to the target is less than $h_e=0.005$. Some thermal noise with $ {Pe} = 10^{5}$ has been added to the flow environment in these tests.
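
The set of initial positions used in figure 12 can be reproduced directly from the caption, e.g.

```python
# Cell centres of a 10 x 10 rectangular grid covering [-0.3, -0.1] x [0.1, 0.3].
import numpy as np

nx = ny = 10
x_edges = np.linspace(-0.3, -0.1, nx + 1)
y_edges = np.linspace(0.1, 0.3, ny + 1)
x_centres = 0.5 * (x_edges[:-1] + x_edges[1:])   # cell centres in x
y_centres = 0.5 * (y_edges[:-1] + y_edges[1:])   # cell centres in y
initial_positions = [(x, y) for x in x_centres for y in y_centres]  # 100 starting points
```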


Figure 13. The first reward term $r_1(t)$ against the angle in degrees between two displacement vectors.


Figure 14. The trajectories obtained by policies trained using only $r_1(t)$ with different choices of $p$. The other terms in the reward function, i.e. $r_2(t), r'$, are excluded in this test.


Figure 15. The second reward term $r_2(t)$ against the distance to the origin $h(t)$.
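
The reward discussed in figures 13–15 has the structure $r(t) = r_1(t) + r_2(t) + r'$ with parameters $p$, $q$ and $c$ from (2.2). The exact functional forms are not reproduced here; the sketch below fixes only the interface, with placeholder monotone shapes and a terminal-bonus reading of $r'$ that are assumptions for illustration, not the definitions in the paper.

```python
# Placeholder reward shapes only; the true r_1, r_2, r' are defined in (2.2) of the paper.
import numpy as np

def reward(beta_deg, h, reached_target, p=2.0, q=30.0, c=2.0):
    r1 = (1.0 - beta_deg / 180.0) ** p       # decreases with the angle beta of figure 13
    r2 = np.exp(-q * h)                      # decreases with the distance h(t) of figure 15
    r_prime = c if reached_target else 0.0   # assumed bonus once the target h_e is reached
    return r1 + r2 + r_prime
```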


Figure 16. (a) The trajectory obtained by the policy trained with $r_1(t)+r_2(t)$ using $[p,q]=[2,30]$, without adding $r'$; (b,c) the training history.


Figure 17. (a) The trajectory obtained by policy trained with $r_1(t)+r_2(t)+r'$ using $[p,q,c]=[2,30,2]$; (b,c) the training history.


Figure 18. (a) The trajectory obtained by policy trained with $r_1(t)+r_2(t)+r'$ using $[p,q,c]=[2,30,4]$; (b,c) the training history.


Figure 19. ($a$) Droplet trajectory under control corresponding to case 5.3 in table 1 with the parameters $h_0 = 0.25\sqrt {2}$, $\alpha _0 = 45^{\circ }$ and $Re=2$. The dashed lines result from the application of geometric symmetry in the FRM. ($b$) Action history and value history in the test. Additionally, different input configurations for the DRL state were tested, including both position and velocity, and position alone; the policy using only position failed to converge. The two closest rollers are activated in this control task.