
Interpretable and efficient data-driven discovery and control of distributed systems

Published online by Cambridge University Press:  14 November 2025

Florian Wolf*
Affiliation:
Department of Mathematics, Technical University of Darmstadt, Darmstadt, Germany; Department of Mathematics, MOX, Politecnico di Milano, Milan, Italy; The Computing + Mathematical Sciences Department, California Institute of Technology, Pasadena, USA
Nicolò Botteghi
Affiliation:
Department of Mathematics, MOX, Politecnico di Milano, Milan, Italy
Urban Fasel
Affiliation:
Department of Aeronautics, Imperial College London, London, UK
Andrea Manzoni
Affiliation:
Department of Mathematics, MOX, Politecnico di Milano, Milan, Italy
*
Corresponding author: Florian Wolf; Email: florian.wolf@stud.tu-darmstadt.de

Abstract

Effectively controlling systems governed by partial differential equations (PDEs) is crucial in several fields of applied sciences and engineering. These systems usually yield significant challenges to conventional control schemes due to their nonlinear dynamics, partial observability, high-dimensionality once discretized, distributed nature, and the requirement for low-latency feedback control. Reinforcement learning (RL), particularly deep RL (DRL), has recently emerged as a promising control paradigm for such systems, demonstrating exceptional capabilities in managing high-dimensional, nonlinear dynamics. However, DRL faces challenges, including sample inefficiency, robustness issues, and an overall lack of interpretability. To address these challenges, we propose a data-efficient, interpretable, and scalable Dyna-style model-based RL framework specifically tailored for PDE control. Our approach integrates Sparse Identification of Nonlinear Dynamics with Control within an Autoencoder-based dimensionality reduction scheme for PDE states and actions (AE+SINDy-C). This combination enables fast rollouts with significantly fewer environment interactions while providing an interpretable latent space representation of the PDE dynamics, facilitating insight into the control process. We validate our method on two PDE problems describing fluid flows—namely, the 1D Burgers equation and 2D Navier–Stokes equations—comparing it against a model-free baseline. Our extensive analysis highlights improved sample efficiency, stability, and interpretability in controlling complex PDE systems.
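The abstract's core idea — encode the PDE state and control into a low-dimensional latent space, advance the latent state with a sparse library model (SINDy-C), and decode back — can be illustrated with a minimal sketch. All names, dimensions, and the linear stand-in encoder/decoder below are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of one AE+SINDy-C surrogate step:
# encode state and control -> sparse latent dynamics -> decode.
rng = np.random.default_rng(0)

N_x, N_u, d = 64, 8, 3                   # full-order state/control dims, latent dim
E_x = rng.normal(size=(d, N_x)) / N_x    # stand-in linear "encoder" for the state
E_u = rng.normal(size=(d, N_u)) / N_u    # stand-in linear "encoder" for the control
D_x = np.linalg.pinv(E_x)                # stand-in linear "decoder"

def library(z, v):
    """Candidate-function library Theta(z, v): constant, linear, and quadratic terms."""
    return np.concatenate(([1.0], z, v, z**2, z * v))

Xi = np.zeros((1 + 4 * d, d))            # sparse coefficient matrix (mostly zeros)
Xi[1:1 + d] = 0.9 * np.eye(d)            # e.g. weakly damped linear latent dynamics

def step(x, u):
    """One surrogate rollout step: encode -> SINDy-C update -> decode."""
    z, v = E_x @ x, E_u @ u
    z_next = library(z, v) @ Xi
    return D_x @ z_next

x_next = step(rng.normal(size=N_x), rng.normal(size=N_u))
```

Because the surrogate step costs only a few small matrix products, a Dyna-style agent can collect many such approximate rollouts between (expensive) full-order environment interactions.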

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Figure 1. Overview of the RL training loop. In Dyna-style algorithms, we choose whether the agent interacts with the full-order model, which requires (expensive) environment rollouts, or with the learned reduced-order surrogate model, which provides fast approximate rollouts. In this work, we focus on the setting where the full-order reward is (analytically) known and only the dynamics are approximated. In general, the observed state is computed as $ {\mathrm{\mathbb{R}}}^{N_x^{\mathrm{Obs}}}\ni {\mathbf{x}}_{t+1}^{\mathrm{Obs}}=C\cdot {\mathbf{x}}_{t+1} $. In the partially observable (PO) case, the projection matrix $ C\in {\left\{0,1\right\}}^{N_x^{\mathrm{Obs}}\times {N}_x} $ has a single 1 per row and zeros elsewhere, with $ {N}_x^{\mathrm{Obs}}\ll {N}_x $. In the fully observable (FO) case, $ C\equiv {\mathrm{Id}}_{{\mathrm{\mathbb{R}}}^{N_x}} $, that is, $ {N}_x^{\mathrm{Obs}}={N}_x $.
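The observation model in the caption — a selection matrix with a single 1 per row picking out sensor locations — can be sketched as follows. The sensor indices are hypothetical, chosen only for illustration.

```python
import numpy as np

# Illustration of the observation model x_obs = C @ x (partially observable case):
# C is a selection matrix with a single 1 per row and zeros elsewhere.
N_x, N_x_obs = 10, 3
obs_idx = [0, 4, 9]                      # hypothetical sensor locations

C = np.zeros((N_x_obs, N_x))
C[np.arange(N_x_obs), obs_idx] = 1.0     # one 1 per row, zeros elsewhere

x = np.arange(N_x, dtype=float)          # full-order state
x_obs = C @ x                            # observed state
# x_obs == [0., 4., 9.]
```

In the fully observable case, C is simply the identity matrix and x_obs coincides with x.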

Figure 2. AE architecture and loss function used during the training stage. Trainable parameters are highlighted in red. The training scheme proceeds in the following stages. (1) The current state $ {\mathbf{x}}_t $, applied control $ {\mathbf{u}}_t $, and the next state $ {\mathbf{x}}_{t+1} $ are provided as input data. (2) After compressing both the current state and the control vector, the SINDy-C algorithm is applied in the latent space, yielding a low-dimensional representation of the prediction for the next state. (3) The latent space representations of the current state, the control, and the next state prediction are decoded. (4) The classical AE loss is computed. (5) The SINDy-C loss and a regularization term to promote sparsity are computed. Figure inspired by Conti et al. (2023, figure 1).
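The composite training loss described in stages (4) and (5) — an AE reconstruction term, a SINDy-C latent-prediction term, and a sparsity penalty on the coefficient matrix — can be sketched as below. Function names and weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hedged sketch of the composite AE+SINDy-C training loss:
# reconstruction + latent prediction + L1 sparsity on Xi.
def total_loss(x, x_rec, z_next_true, z_next_pred, Xi,
               w_rec=1.0, w_sindy=1.0, w_sparse=1e-3):
    loss_rec = np.mean((x - x_rec) ** 2)                    # (4) classical AE loss
    loss_sindy = np.mean((z_next_true - z_next_pred) ** 2)  # (5) SINDy-C loss
    loss_sparse = np.sum(np.abs(Xi))                        # sparsity regularizer
    return w_rec * loss_rec + w_sindy * loss_sindy + w_sparse * loss_sparse

# Perfect reconstruction and prediction with a zero coefficient matrix -> 0.
val = total_loss(np.ones(4), np.ones(4), np.zeros(2), np.zeros(2), np.zeros((3, 2)))
# val == 0.0
```

The L1 term is what drives most entries of the coefficient matrix to zero, which is the source of the method's interpretability: the surviving library terms form a readable latent dynamics model.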

Table 1. Performance comparison of the Dyna-style AE+SINDy-C method for Burgers’ equation

Figure 3. Sample efficiency of the Dyna-style AE+SINDy-C method for Burgers’ equation. We test $ {k}_{\mathrm{dyn}}=5,10 $ against the full-order baseline for the fully observable (solid lines) and partially observable (dashed lines) cases. The dashed vertical lines indicate the point of early stopping for each model class (FO and PO) after 100 epochs and mark the models that are evaluated in detail in Appendix A.1. For the evaluation, the performance over five fixed random seeds is reported.

Figure 4. State and control trajectories for Burgers’ equation in the PO case. The initial condition is a bell-shaped hyperbolic cosine (eq. [4.3] with $ \alpha =0.5 $ fixed), we use $ \nu =0.01 $ (two orders of magnitude smaller than in the training phase), and the black solid line indicates the timestep $ t $ at which the controller is activated. (a) FOM states. (b) AE + SINDy-C states, $ {k}_{\mathrm{dyn}}=5 $. (c) AE + SINDy-C states, $ {k}_{\mathrm{dyn}}=10 $. (d) FOM controls. (e) AE + SINDy-C controls, $ {k}_{\mathrm{dyn}}=5 $. (f) AE + SINDy-C controls, $ {k}_{\mathrm{dyn}}=10 $.

Figure 5. State and control trajectories for Burgers’ equation in the FO case. The initial condition is a bell-shaped hyperbolic cosine (eq. [4.3] with $ \alpha =0.5 $ fixed), we use $ \nu =0.01 $ (two orders of magnitude smaller than in the training phase), and the black solid line indicates the timestep $ t $ at which the controller is activated. (a) FOM states. (b) AE + SINDy-C states, $ {k}_{\mathrm{dyn}}=5 $. (c) AE + SINDy-C states, $ {k}_{\mathrm{dyn}}=10 $. (d) FOM controls. (e) AE + SINDy-C controls, $ {k}_{\mathrm{dyn}}=5 $. (f) AE + SINDy-C controls, $ {k}_{\mathrm{dyn}}=10 $.

Figure 6. Analysis of the coefficient matrix $ \boldsymbol{\Xi} \in {\mathrm{\mathbb{R}}}^{d\times {N}_x^{\mathrm{Obs}}} $ for Burgers’ equation. (a) Partially observable case, $ {k}_{\mathrm{dyn}}=5 $. (b) Fully observable case, $ {k}_{\mathrm{dyn}}=5 $. (c) Partially observable case, $ {k}_{\mathrm{dyn}}=10 $. (d) Fully observable case, $ {k}_{\mathrm{dyn}}=10 $.

Table 2. Training time and overview of the loss distribution of eq. (3.1) during the training phase of the Dyna-style AE+SINDy-C method for Burgers’ equation

Figure 7. Sample efficiency of the Dyna-style AE+SINDy-C method for the Navier–Stokes equations. We test $ {k}_{\mathrm{dyn}}=5 $ against the full-order model-free baseline. The dashed vertical lines indicate the point of early stopping for each model after 750 epochs and mark the models that are evaluated in detail in Section 4.2.2. For the evaluation, the performance over five fixed random seeds is reported.

Figure 8. Velocity field and control trajectories for the model-free baseline and AE+SINDy-C for the Navier–Stokes equations. Black arrows represent the velocity fields and the background color the magnitude of the velocity vector. (a) Full-order baseline model, velocity field at t = 0.2. (b) Full-order baseline model, control trajectory. (c) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $ and an eight-dimensional latent space, velocity field at t = 0.2. (d) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $ and an eight-dimensional latent space, control trajectory. (e) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $ and a four-dimensional latent space, velocity field at t = 0.2. (f) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $ and a four-dimensional latent space, control trajectory.

Table 3. Performance comparison of the Dyna-style AE+SINDy-C method for the Navier–Stokes equations

Figure 9. Analysis of the coefficient matrix $ \boldsymbol{\Xi} \in {\mathrm{\mathbb{R}}}^{d\times {N}_x^{\mathrm{Obs}}} $ for the Navier–Stokes equations. (a) Eight-dimensional latent space. (b) Four-dimensional latent space.

Table 4. Analysis of the internal loss distribution for the training and validation data during the AE training as well as the training time for the Navier–Stokes equations, trained on a MacBook M1 (2021, 16GB RAM)

Figure A1. State and control trajectories for Burgers’ equation in the partially observable (PO) case with $ \mathcal{U}\left({\left[-1,1\right]}^{N_x}\right) $ as initial distribution. The black dashed line indicates the timestep $ t $ of extrapolation in time. (a) FOM: states. (b) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $. (c) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=10 $. (d) FOM: controls. (e) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $. (f) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=10 $.

Figure A2. State and control trajectories for Burgers’ equation in the fully observable (FO) case with $ \mathcal{U}\left({\left[-1,1\right]}^{N_x}\right) $ as initial distribution. The black dashed line indicates the timestep $ t $ of extrapolation in time. (a) FOM: states. (b) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $. (c) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=10 $. (d) FOM: controls. (e) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=5 $. (f) AE + SINDy-C with $ {k}_{\mathrm{dyn}}=10 $.

Table B1. Environment details for ControlGym’s implementation of Burgers’ equation; the diffusivity constant $ \nu =1.0 $ is fixed

Table B2. Environment details for PDEControlGym’s implementation of the Navier–Stokes equations

Table C1. DRL algorithm configuration details for Burgers’ equation experiment

Table C2. DRL algorithm configuration details for the Navier–Stokes equations experiment

Table D1. Details of the AE+SINDy-C surrogate model for Burgers’ equation

Table D2. Details of the AE+SINDy-C surrogate model for the Navier–Stokes equations
