
Optimal experiment design with adjoint-accelerated Bayesian inference

Published online by Cambridge University Press:  31 May 2024

Matthew Yoko
Affiliation:
Department of Engineering, University of Cambridge, Cambridge, UK
Matthew P. Juniper*
Affiliation:
Department of Engineering, University of Cambridge, Cambridge, UK
*
Corresponding author: Matthew P. Juniper; Email: mpj1001@cam.ac.uk

Abstract

We develop and demonstrate a computationally cheap framework to identify optimal experiments for Bayesian inference of physics-based models. We derive metrics (i) to identify optimal experiments for inferring the unknown parameters of a physics-based model, (ii) to identify optimal sensor placements for parameter inference, and (iii) to identify optimal experiments for Bayesian model selection. We demonstrate the framework on thermoacoustic instability, an industrially relevant problem in aerospace propulsion in which experiments can be prohibitively expensive. Using an existing densely sampled dataset, we identify the most informative experiments and use them to train the physics-based model; the remaining data are used for validation. We show that, although approximate, the proposed framework can significantly reduce the number of experiments required to perform the three inference tasks we study. For example, for task (i), we achieve an acceptable model fit using just 2.5% of the data that were originally collected.

Information

Type
Research Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Diagram of the Rijke tube, rotated for convenience.


Figure 2. (a) Top view, (b) side view, and (c) isometric view of the heater, which consists of two identical concentric annular ceramic plates, each wound with nichrome wire. It is held in place by two threaded support prongs. The dimensions are $ d=47\hskip0.35em \mathrm{mm} $, $ {d}_i=31.6\hskip0.35em \mathrm{mm} $, $ {d}_w=0.6\hskip0.35em \mathrm{mm} $, $ t=5\hskip0.35em \mathrm{mm} $, $ h=5\hskip0.35em \mathrm{mm} $, and $ {d}_p=3\hskip0.35em \mathrm{mm} $. Power is supplied to the nichrome wire by two fabric-insulated copper wires (not shown), each of which has a diameter of $ 4\hskip0.35em \mathrm{mm} $.


Figure 3. Illustration of parameter inference on a simple univariate system. (a) The marginal probability distributions of the prior and data, $ p(a) $ and $ p(z) $, as well as their joint distribution, $ p\left(a,z\right) $, are plotted on axes of parameter value, $ a $, vs. observation outcome, $ z $. (b) The model, H, imposes a functional relationship between the parameters, $ a $, and the predictions, $ s $. Marginalizing along the model predictions yields the true posterior, $ p\left(a|z\right) $. This is computationally intractable for even moderately large parameter spaces. (c) Instead of evaluating the full posterior, we use gradient-based optimization to find its peak. This yields the most probable parameters, $ {a}_{\mathrm{MP}} $.


Figure 4. Illustration of uncertainty quantification for three univariate systems, comparing the true posterior, $ p\left(a|z\right) $, to the approximate posterior from Laplace’s method, $ p{\left(a|z\right)}_L $. (a) The model is linear in the parameters, so the true posterior is Gaussian and Laplace’s method is exact. (b) The model is weakly nonlinear in the parameters, so the true posterior is slightly skewed, but Laplace’s method yields a reasonable approximation. (c) The model is strongly nonlinear in the parameters, so the posterior is multi-modal and Laplace’s method underestimates the uncertainty.
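The two-step procedure illustrated in Figures 3 and 4 (find the posterior peak by gradient-based optimization, then fit a Gaussian whose variance is the inverse curvature at the peak) can be sketched for a toy univariate problem. The forward model, names, and numbers below are illustrative assumptions for the sketch, not the paper's code:

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup (assumed): Gaussian prior on the parameter a, Gaussian
# observation noise, and a weakly nonlinear model H mapping a -> s.
a_prior, sigma_a = 0.0, 1.0      # prior mean and standard deviation of a
z_obs, sigma_z = 1.2, 0.1        # observed datum and its noise level
H = lambda a: a + 0.1 * a**2     # illustrative forward model, s = H(a)

def J(a):
    """Negative log posterior (up to an additive constant)."""
    return (0.5 * ((a - a_prior) / sigma_a) ** 2
            + 0.5 * ((z_obs - H(a)) / sigma_z) ** 2)

# Step 1: gradient-based optimization finds the posterior peak a_MP
# without evaluating the full posterior.
a_MP = minimize(lambda v: J(v[0]), x0=[0.0], method="BFGS").x[0]

# Step 2 (Laplace's method): approximate the posterior by a Gaussian
# centred at a_MP, with variance equal to the inverse curvature of J,
# estimated here by a central finite difference.
eps = 1e-4
curvature = (J(a_MP + eps) - 2 * J(a_MP) + J(a_MP - eps)) / eps**2
sigma_post = float(np.sqrt(1.0 / curvature))
```

When the model is strongly nonlinear, as in panel (c), this single Gaussian can miss secondary modes and understate the true uncertainty.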


Figure 5. (a) At each candidate experiment design, $ {x}_{i+1} $, the two candidate models make slightly different predictions, with different uncertainties. (b) Each model encodes a belief that the next data point will fall within the distribution $ p\left({z}_{i+1}|{H}_j\right) $.


Figure 6. Three steps of active data selection comparing experimental data to model predictions after assimilating: (a) no data, (b) the first datapoint with maximum information content, and (c) the second datapoint with maximum information content. For each step, we show (i) growth rate, $ {z}_r $, (ii) angular frequency, $ {z}_i $, and (iii) information content, $ \Delta S $, plotted against heater position, $ {X}_h $. For comparison, we also show (d) the result from assimilating the two experiments with minimum information content.
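The selection loop behind this figure (estimate the information content of every remaining candidate, perform the most informative experiment, assimilate it, and repeat) can be sketched for a toy linear-Gaussian problem, in which the gain has a closed form. The model z = a·x + noise, the candidate designs, and the noise level below are all invented for illustration:

```python
import numpy as np

sigma_z = 0.1                 # observation noise (assumed)
var_a = 1.0                   # prior variance of the single parameter a

def assimilated_variance(var, x):
    """Posterior variance of a after one measurement at design x."""
    return 1.0 / (1.0 / var + x**2 / sigma_z**2)

candidates = list(np.linspace(0.1, 1.0, 10))   # candidate designs (assumed)
chosen = []
for step in range(3):
    # Predicted information content of each candidate: the drop in the
    # Shannon entropy of a Gaussian, Delta S = 0.5*log(var_before/var_after).
    gains = [0.5 * np.log(var_a / assimilated_variance(var_a, x))
             for x in candidates]
    best = int(np.argmax(gains))
    chosen.append(candidates.pop(best))             # perform the best experiment
    var_a = assimilated_variance(var_a, chosen[-1])  # assimilate it
```

In a realistic setting, the scalar variance update would be replaced by the full parameter covariance of the physics-based model, but the greedy structure of the loop is the same.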


Figure 7. Comparison of learning rate for three experiment design strategies: (blue) sequentially performing the most informative experiments, (orange) sequentially performing the least informative experiments, (gray) 1,000 instances of sequentially performing random experiments. Plot (a) shows how the Shannon entropy of the parameter probability distribution decreases as additional experiments are assimilated. Plot (b) shows the information gained from each experiment, which is given by the change in Shannon entropy. We show the information gain estimated before the data are assimilated using equation (16) (+), as well as the actual achieved information gain, calculated after the experiment is assimilated (○).


Figure 8. Four steps of active data selection for assimilating data from experiments with the heater active. We compare experimental data to model predictions after assimilating: (a) no data, and (b–d) the first, second, and third observation with maximum information content. For each step, we show (i) growth rate, $ {z}_r $, (ii) angular frequency, $ {z}_i $, and (iii) information content, $ \Delta S $, plotted against heater position, $ {X}_h $, and heater power, $ {Q}_h $.


Figure 9. Comparison of learning rate for three experiment design strategies: (blue) sequentially performing the most informative experiments, (orange) sequentially performing the least informative experiments, (gray) 1,000 instances of sequentially performing random experiments. Plot (a) shows how the Shannon entropy of the parameter probability distribution decreases as additional data are assimilated. Plot (b) shows the information gained from each experiment, which is quantified from the change in Shannon entropy before and after the data were assimilated. We show the information gain estimated before the data are assimilated, using equation (16) (+), as well as the actual achieved information gain, calculated after the data are assimilated (○).


Figure 10. Three stages of optimal sensor placement: (a) reference mic only, (b) one additional mic, and (c) two additional mics. Figures show (i) the real component of the pressure, $ \operatorname{Re}(P) $, vs. axial position in the tube, $ X $, (ii) the imaginary component of pressure, $ \operatorname{Im}(P) $, and (iii) the expected information gain, $ \Delta S $, from a microphone placed at any axial position. Predictions are plotted as solid lines, with uncertainties indicated with shaded regions. Available microphone data are plotted in teal in (i, ii), and as open circles in (iii). Assimilated microphone data are colored with the appropriate shade of red.
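For several parameters at once, the expected information gain shown in row (iii) reduces, in the linear-Gaussian case, to half the log ratio of the prior and posterior covariance determinants. A minimal multivariate sketch follows; the prior covariance, the observation-operator rows, and the noise level are invented for illustration:

```python
import numpy as np

# Prior covariance over three parameters (numbers invented for the sketch).
Sigma = np.diag([1.0, 1.0, 0.5])
sigma_z = 0.1                                  # sensor noise (assumed)

def info_gain(Sigma, g):
    """Expected information gain of one linear measurement z = g @ a + noise.

    For Gaussians, Delta S = 0.5*log(det(Sigma_before)/det(Sigma_after));
    a rank-one Kalman-style update gives the posterior covariance directly.
    """
    Sigma_post = Sigma - np.outer(Sigma @ g, g @ Sigma) / (g @ Sigma @ g + sigma_z**2)
    _, logdet_post = np.linalg.slogdet(Sigma_post)
    _, logdet_prior = np.linalg.slogdet(Sigma)
    return 0.5 * (logdet_prior - logdet_post)

# Two candidate sensor locations, each defining a row of an assumed
# observation operator: which parameters the measurement is sensitive to.
g1 = np.array([1.0, 0.0, 0.1])    # strongly sensitive to the first parameter
g2 = np.array([0.1, 0.1, 0.1])    # weakly sensitive to everything
best = max([g1, g2], key=lambda g: info_gain(Sigma, g))
```

Scanning a candidate sensor over every axial position and plotting `info_gain` against position would produce a curve analogous to row (iii) of the figure.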


Figure 11. Posterior joint probability distributions after three stages of optimal sensor placement: (a) reference mic only, (b) one additional mic, and (c) two additional mics. The joint distribution between each pair of parameters is indicated in each frame using contours at one, two, and three standard deviations from the mean. The parameters are the absolute values and angles of the upstream and downstream reflection coefficients, $ {R}_u $ and $ {R}_d $, and the boundary layer dissipation strength, $ \eta $. The axes are labeled with the ±3 standard deviation bounds.


Figure 12. Comparison of two strategies for sequentially assimilating all available mic data: at each step, we select (a) the best and (b) the worst microphone. Figures show (i) the real component of the pressure, $ \operatorname{Re}(P) $, vs. axial position in the tube, $ X $, (ii) the imaginary component of pressure, $ \operatorname{Im}(P) $, and (iii) the expected information gain, $ \Delta S $, from a microphone placed at any axial position. Predictions are plotted as solid lines, with uncertainties indicated with shaded regions. Uncertainties after assimilating only the reference mic are shaded in blue. Uncertainties after assimilating additional mics are colored with shades of red from dark (fewest additional measurements) to light (most additional measurements).


Figure 13. Comparison of learning rate for three sensor placement strategies: (blue) sequentially placing the sensors in the best locations, (orange) sequentially placing the sensors in the worst locations, (gray) 1,000 instances of sequentially placing the sensors in random locations. Plot (a) shows how the Shannon entropy of the parameter probability distribution decreases as additional mic data are assimilated. Plot (b) shows the information gained from each microphone, which is quantified from the change in Shannon entropy before and after the mic data were assimilated. We show the information gain estimated before the data are assimilated, using equation (16) (+), as well as the actual achieved information gain, calculated after the data are assimilated (○).


Figure 14. Predictions produced by three candidate models. We compare the (i) growth rate, $ {z}_r $, and (ii) angular frequency, $ {z}_i $, predictions produced by the baseline model (blue) against those produced by (a) model A (red) and (b) model B (red). The initial experiments selected to train the models are shown with orange markers.


Figure 15. Four model comparison metrics are plotted against the number of experiments assimilated. The metrics are (i) the $ \log $ marginal likelihood, $ \log (ML) $, (ii) the $ \log $ best fit likelihood, $ \log $(BFL), (iii) the $ \log $ Occam factor, $ \log (OF) $, and (iv) the $ \log $ of the ratio of marginal likelihoods between two models, $ \log (MLR) $. In panel (a), we compare the baseline model (dark red) to model A (light red), and in panel (b), we compare the baseline model (dark red) to model B (light red). In (i–iii), we only show the results produced by selecting the best experiment at each step. In (iv), we show the results produced by recursively selecting (blue) the best experiments, (orange) the worst experiments, and (gray) random experiments. A positive $ \log (MLR) $ means that the baseline model is preferred.
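The quantities in this figure can be illustrated for two linear-in-the-parameters candidate models, for which the marginal likelihood is Gaussian and available in closed form. The data, candidate models, prior scale, and noise level below are invented for the sketch, not taken from the paper:

```python
import numpy as np

# Invented data from a "true" linear trend z = 2x, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
z = 2.0 * x + 0.05 * rng.standard_normal(x.size)
sigma_z, sigma_a = 0.05, 1.0     # noise level and prior scale (assumed)

def log_marginal_likelihood(Phi, z):
    """log p(z | H) for the linear-Gaussian model z = Phi @ a + noise,
    with prior a ~ N(0, sigma_a^2 I): z is Gaussian with covariance
    C = sigma_a^2 Phi Phi^T + sigma_z^2 I."""
    C = sigma_a**2 * Phi @ Phi.T + sigma_z**2 * np.eye(z.size)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (z @ np.linalg.solve(C, z) + logdet + z.size * np.log(2 * np.pi))

# Candidate model A: one parameter (slope only). Candidate model B:
# three parameters (quadratic), more flexible but penalized by its
# smaller Occam factor.
Phi_A = x[:, None]
Phi_B = np.stack([np.ones_like(x), x, x**2], axis=1)
# Positive log(MLR) favours model A, mirroring the figure's convention
# that a positive value means the baseline model is preferred.
log_MLR = log_marginal_likelihood(Phi_A, z) - log_marginal_likelihood(Phi_B, z)
```

Because the data were generated by the simpler trend, the extra flexibility of model B buys little extra fit, and the marginal likelihood ratio rewards model A.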
