METHODOLOGY AND CHALLENGES OF SURROGATE MODELLING METHODS FOR MULTI-FIDELITY EXPENSIVE BLACK-BOX PROBLEMS

Published online by Cambridge University Press: 31 May 2024

NICOLAU ANDRÉS-THIÓ*
Affiliations:
School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia; e-mail: smith-miles@unimelb.edu.au
ARC Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA), Parkville, Victoria 3010, Australia
MARIO ANDRÉS MUÑOZ
Affiliations:
ARC Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA), Parkville, Victoria 3010, Australia
School of Computer and Information Systems, The University of Melbourne, Parkville, Victoria 3010, Australia; e-mail: munoz.m@unimelb.edu.au
KATE SMITH-MILES
Affiliations:
School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia; e-mail: smith-miles@unimelb.edu.au
ARC Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA), Parkville, Victoria 3010, Australia

Abstract

Many industrial design problems lack an analytical expression relating the design variables to the chosen quality metrics. Evaluating the quality of a new design is therefore restricted to running a predetermined process, such as physical testing of prototypes. When these processes are costly, choosing how to gather further data can be very challenging, whether the end goal is to accurately predict the quality of future designs or to find an optimal design. In the multi-fidelity setting, one or more approximations of a design’s performance are available at varying costs and accuracies. Surrogate modelling methods have long been applied to problems of this type, combining data from multiple sources into a model that guides further sampling. Many challenges remain, however; foremost among them is choosing when and how to rely on the available low-fidelity sources. This tutorial-style paper introduces the field of surrogate modelling for multi-fidelity expensive black-box problems, covering classical approaches and open questions in the field. An illustrative example using Australian elevation data shows the potential pitfalls of blindly trusting or ignoring low-fidelity sources, a question that has recently attracted much interest in the community.
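As a hedged illustration of how a surrogate model can "guide further sampling", the sketch below computes the expected improvement of a candidate point from the surrogate's predictive mean `mu` and standard deviation `sigma`. Expected improvement is one common infill criterion in the single-fidelity case; it is not necessarily the criterion used in this paper. The sketch assumes minimization and a Gaussian prediction, and the function name and candidate values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def expected_improvement(mu, sigma, y_best):
    """Expected improvement over y_best for a minimization problem.

    Assumes the surrogate's prediction at a candidate is Gaussian, N(mu, sigma**2).
    Larger values mark more promising points to sample next.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    return (y_best - mu) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


# Hypothetical candidates described by their (mu, sigma) predictions.
candidates = {"A": (1.0, 0.1), "B": (1.2, 0.8), "C": (2.0, 0.05)}
y_best = 1.1  # best value observed so far
next_point = max(candidates, key=lambda k: expected_improvement(*candidates[k], y_best))
print(next_point)  # B
```

Candidate B has a slightly worse predicted mean than A, but its larger uncertainty gives it the higher expected improvement. This is the exploration and exploitation trade-off the criterion is designed to balance. For a maximization goal, the objective can be negated.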

Type: Research Article

Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Australian Mathematical Publishing Association Inc.

Figure 1 A two-dimensional sampling plan of 25 samples with clearly undesirable properties. The thin grey lines indicate the division of each of the two dimensions into 25 sections, and each cross represents a sample. Because each row and each column contains exactly one sample, this plan nonetheless satisfies the definition of a Latin hypercube sampling (LHS) plan.
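The LHS property in this caption can be checked mechanically. The sketch below builds a random LHS plan in $[0,1]^d$ and tests the one-sample-per-section property. The helper names are hypothetical, and the plan places points at section centres, which is only one of several LHS variants. The figure is not reproduced here, so using a diagonal plan as the "undesirable" example is an assumption.

```python
import random


def random_lhs(n, d, rng=random):
    """Random centred LHS plan of n points in [0, 1]^d.

    Each dimension is split into n equal-width sections, and a random
    permutation assigns exactly one point to every section; points sit
    at the section centres.
    """
    perms = [rng.sample(range(n), n) for _ in range(d)]
    return [tuple((perms[k][i] + 0.5) / n for k in range(d)) for i in range(n)]


def is_lhs(points):
    """True if, in every dimension, each of the n sections holds exactly one point."""
    n, d = len(points), len(points[0])
    for k in range(d):
        sections = sorted(min(int(p[k] * n), n - 1) for p in points)
        if sections != list(range(n)):
            return False
    return True


# Assumed example of an undesirable plan: every point on the diagonal.
diagonal = [((i + 0.5) / 25, (i + 0.5) / 25) for i in range(25)]
print(is_lhs(diagonal))                             # True
print(is_lhs(random_lhs(25, 2, random.Random(0))))  # True
```

The LHS property alone says nothing about how well the points fill the space, which is why the diagonal plan passes the check.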

Figure 2 (a) A randomly generated LHS plan of 25 samples inside the two-dimensional space $[0,1]^2$. The minimum Euclidean distance between a pair of points is 0.0754. (b) Result of locally optimizing the sampling plan by swapping coordinates between pairs of points. The minimum distance between a pair of points is now 0.1602, and the sample is more evenly spread out in the space.
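The caption's swap-based local optimization can be sketched as a greedy search. The reported distances suggest a maximin goal (maximize the minimum pairwise distance), so this sketch assumes that criterion. The acceptance rule, the iteration count and the helper names are illustrative assumptions, not the authors' procedure. Swapping one coordinate between two points only permutes the values already present in that dimension, so every intermediate plan remains an LHS plan.

```python
import itertools
import math
import random


def min_distance(points):
    """Smallest Euclidean distance between any pair of points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def optimize_lhs(points, iterations=2000, rng=random):
    """Greedy maximin improvement of an LHS plan by coordinate swaps.

    Each move picks two points and one dimension and swaps that coordinate
    between them. A move is kept if it does not decrease the minimum
    pairwise distance; otherwise it is undone.
    """
    plan = [list(p) for p in points]
    best = min_distance(plan)
    for _ in range(iterations):
        i, j = rng.sample(range(len(plan)), 2)
        k = rng.randrange(len(plan[0]))
        plan[i][k], plan[j][k] = plan[j][k], plan[i][k]
        value = min_distance(plan)
        if value >= best:
            best = value
        else:
            plan[i][k], plan[j][k] = plan[j][k], plan[i][k]  # undo the swap
    return [tuple(p) for p in plan], best


# Usage: start from a random centred LHS plan of 25 points in [0, 1]^2.
rng = random.Random(0)
n = 25
perms = [rng.sample(range(n), n) for _ in range(2)]
start = [((perms[0][i] + 0.5) / n, (perms[1][i] + 0.5) / n) for i in range(n)]
optimized, d_min = optimize_lhs(start, rng=rng)
print(f"{min_distance(start):.4f} -> {d_min:.4f}")
```

Accepting moves that leave the minimum distance unchanged lets the search drift across plateaus. Without this, it would stall whenever the closest pair is not affected by a swap.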

Figure 3 (a) Choosing a subset of an LHS plan. The crosses in both panels show an LHS plan of size 25 inside the two-dimensional space $[0,1]^2$. The red circles represent a randomly chosen subset of size 10. The minimum distance between pairs of points is 0.1602. (b) Result of locally optimizing the subset in panel (a) by swapping points inside the subset with points outside it. The final subset is more evenly spread out, and the minimum distance between pairs of points is 0.2400.
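The subset exchange in this caption can be sketched in the same greedy style. This assumed variant starts from a random subset and swaps one point inside the subset with one point outside it. It keeps the swap when the subset's minimum pairwise distance does not decrease. `select_subset` and its parameters are hypothetical.

```python
import itertools
import math
import random


def min_distance(points):
    """Smallest Euclidean distance between any pair of points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def select_subset(plan, m, iterations=2000, rng=random):
    """Pick m well-spread points from `plan` (requires 2 <= m < len(plan)).

    Starts from a random subset, then repeatedly exchanges one point inside
    the subset with one point outside it, keeping the exchange if the
    subset's minimum pairwise distance does not decrease.
    """
    order = list(range(len(plan)))
    rng.shuffle(order)
    inside, outside = order[:m], order[m:]
    best = min_distance([plan[i] for i in inside])
    for _ in range(iterations):
        a, b = rng.randrange(m), rng.randrange(len(outside))
        inside[a], outside[b] = outside[b], inside[a]
        value = min_distance([plan[i] for i in inside])
        if value >= best:
            best = value
        else:
            inside[a], outside[b] = outside[b], inside[a]  # undo the exchange
    return sorted(inside), best


# Usage: choose 10 of the 25 points of a random centred LHS plan.
rng = random.Random(1)
n = 25
perms = [rng.sample(range(n), n) for _ in range(2)]
plan = [((perms[0][i] + 0.5) / n, (perms[1][i] + 0.5) / n) for i in range(n)]
chosen, d_min = select_subset(plan, 10, rng=rng)
print(chosen, f"{d_min:.4f}")
```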

Figure 4 Elevation map of Australia. The blue squares mark the capital cities of the country and the red triangles mark the highest point in each state and territory.

Figure 5 Low-fidelity source of the elevation of Australia, generated by adding a deterministic error to the true elevation. The blue squares mark the capital cities of the country, and the red triangles mark the highest point in each state and territory.
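The caption does not specify the error that was added. The sketch below therefore uses a toy high-fidelity function in place of the elevation data and a smooth error term; both are hypothetical. It illustrates what "deterministic" means here: querying the same location twice returns the same biased value.

```python
import math


def f_high(x, y):
    """Hypothetical stand-in for the high-fidelity source (a single smooth peak)."""
    return math.exp(-((x - 0.3) ** 2 + (y - 0.6) ** 2) / 0.05)


def error_term(x, y):
    """Hypothetical deterministic error: a smooth bias, identical on every call."""
    return 0.2 * math.sin(6.0 * x) * math.cos(4.0 * y)


def f_low_error(x, y):
    """Low-fidelity source: the true response plus a deterministic error."""
    return f_high(x, y) + error_term(x, y)


# Unlike random noise, repeating a query returns exactly the same value,
# so averaging repeated low-fidelity samples cannot remove the error.
print(f_low_error(0.5, 0.5) == f_low_error(0.5, 0.5))  # True
```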

Figure 6 Low-fidelity source of the elevation of Australia, which is an approximation based on the known elevation of the capital cities. The blue squares mark the capital cities of the country, and the red triangles mark the highest point in each state and territory.
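The caption does not say how the capital-city elevations were turned into a surface. One simple possibility, used here purely for illustration, is inverse distance weighting (IDW). The sites and values below are made-up placeholders, not the real capitals or their elevations, and `idw` is a hypothetical helper.

```python
import math


def idw(known, x, power=2.0):
    """Inverse-distance-weighted estimate at location x.

    `known` is a list of (location, value) pairs. Nearby sites get larger
    weights (distance ** -power); the estimate is exact at the sites.
    """
    num = den = 0.0
    for loc, val in known:
        dist = math.dist(loc, x)
        if dist == 0.0:
            return val
        weight = dist ** -power
        num += weight * val
        den += weight
    return num / den


# Hypothetical sites (not real capitals) with made-up values.
known = [((0.0, 0.0), 10.0), ((1.0, 0.0), 30.0), ((0.0, 1.0), 50.0)]
print(idw(known, (0.0, 0.0)))  # 10.0
print(idw(known, (0.4, 0.4)))  # a weighted average between 10.0 and 50.0
```

Every IDW estimate is a weighted average of the known values. An approximation of this kind can therefore never exceed the highest known value, so peaks located away from the sites are missed.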

Figure 7 Locations at which samples have been gathered and made available to train a model. The blue crosses represent the 25 locations at which the high-fidelity source has been sampled. The orange squares represent 50 locations, spread out across the whole space, at which the low-fidelity source $f^{\text{approx}}_l$ has been sampled. The green circles represent 50 locations at which the low-fidelity source $f^{\text{approx}}_l$ has been sampled; this last set of samples is restricted to locations above sea level, marked by the outline shown.
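The caption does not describe how the samples above sea level were chosen. One simple way to restrict samples to a region is rejection sampling, sketched below. The disk-shaped "land" mask is a hypothetical stand-in for the coastline outline, and both function names are illustrative.

```python
import random


def sample_restricted(n, accept, rng=random, max_draws=100_000):
    """Draw n uniform points in [0, 1]^2 for which accept(point) is True.

    Plain rejection sampling: candidates outside the region are discarded.
    """
    points = []
    for _ in range(max_draws):
        p = (rng.random(), rng.random())
        if accept(p):
            points.append(p)
            if len(points) == n:
                return points
    raise RuntimeError("acceptance region too small for the draw budget")


def above_sea_level(p):
    """Hypothetical land mask: a disk standing in for the coastline outline."""
    return (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2 < 0.16


land_points = sample_restricted(50, above_sea_level, random.Random(0))
print(len(land_points))  # 50
```

Rejection sampling gives random rather than space-filling points. A spread-out design could instead be obtained by first filtering a large candidate set through the mask and then applying a maximin subset selection to the survivors.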

Table 1 Error of the constructed models over five repetitions. The second column shows the error of Kriging models constructed using 25 high-fidelity samples spread out across the space. The third column shows the error of co-Kriging models constructed with high-fidelity data and 50 samples from $f^{\text{error}}_l$ spread out across the space. The fourth column shows the error of co-Kriging models constructed with high-fidelity data and 50 samples from $f^{\text{error}}_l$ spread out across locations above sea level.
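The single-fidelity Kriging baseline in the second column of Table 1 can be sketched as ordinary Kriging with a Gaussian correlation function. This minimal version makes several assumptions:

* The correlation parameters `theta` are fixed. In practice they are usually tuned by maximizing the likelihood, which is omitted here.
* A tiny nugget is added for numerical stability.
* Only pure Python is used, to avoid dependencies.

The helper names and the toy data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_kriging(X, y, theta, nugget=1e-10):
    """Ordinary Kriging predictor with correlation exp(-sum_k theta_k (x_k - x'_k)^2)."""
    def corr(a, b):
        return math.exp(-sum(t * (ai - bi) ** 2 for t, ai, bi in zip(theta, a, b)))

    n = len(X)
    R = [[corr(X[i], X[j]) + (nugget if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    Ri_1 = solve(R, [1.0] * n)   # R^{-1} 1
    Ri_y = solve(R, list(y))     # R^{-1} y
    mu = sum(Ri_y) / sum(Ri_1)   # generalized least-squares estimate of the mean
    w = [a - mu * b for a, b in zip(Ri_y, Ri_1)]  # R^{-1} (y - mu 1)
    return lambda x: mu + sum(corr(x, xi) * wi for xi, wi in zip(X, w))


# Hypothetical toy function standing in for the elevation data.
X = [(i / 4, j / 4) for i in range(5) for j in range(5)]  # 25 high-fidelity sites
y = [math.sin(3 * a) + b ** 2 for a, b in X]
model = fit_kriging(X, y, theta=(10.0, 10.0))
print(model((0.3, 0.7)))
```

Without a nugget, the predictor interpolates the training data exactly. The tiny nugget keeps predictions at the sampled sites within a negligible distance of the observed values.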

Figure 8 Average error of three surrogate models trying to model the elevation of Australia. The performance is shown for Kriging models which only use data from $f_h$ (blue), co-Kriging models which use data from $f_h$ and $f_l^{\text {error}}$ (orange), and co-Kriging models which use data from $f_h$ and $f_l^{\text {approx}}$ (green).
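Co-Kriging is commonly built on the autoregressive form $f_h(x) \approx \rho\, f_l(x) + \delta(x)$. The sketch below is a simplified two-stage version of that idea, not the full co-Kriging estimator compared in this figure. It works in three steps:

1. Fit a Kriging model to the low-fidelity data.
2. Estimate $\rho$ by least squares at the high-fidelity sites.
3. Fit a second Kriging model to the discrepancy $\delta$.

The correlation parameters are fixed by assumption, and all names and test functions are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_kriging(X, y, theta, nugget=1e-10):
    """Ordinary Kriging predictor with a Gaussian correlation and fixed theta."""
    def corr(a, b):
        return math.exp(-sum(t * (ai - bi) ** 2 for t, ai, bi in zip(theta, a, b)))

    n = len(X)
    R = [[corr(X[i], X[j]) + (nugget if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    Ri_1, Ri_y = solve(R, [1.0] * n), solve(R, list(y))
    mu = sum(Ri_y) / sum(Ri_1)
    w = [a - mu * b for a, b in zip(Ri_y, Ri_1)]
    return lambda x: mu + sum(corr(x, xi) * wi for xi, wi in zip(X, w))


def fit_two_stage(X_l, y_l, X_h, y_h, theta_l, theta_d):
    """Two-stage multi-fidelity model: f_h(x) ~ rho * f_l(x) + delta(x)."""
    low = fit_kriging(X_l, y_l, theta_l)   # surrogate of the low-fidelity source
    low_at_h = [low(x) for x in X_h]
    # Least-squares scaling factor (fit through the origin); a constant
    # offset is absorbed by the mean of the discrepancy model.
    rho = sum(a * b for a, b in zip(low_at_h, y_h)) / sum(a * a for a in low_at_h)
    resid = [b - rho * a for a, b in zip(low_at_h, y_h)]
    delta = fit_kriging(X_h, resid, theta_d)  # surrogate of the discrepancy
    return (lambda x: rho * low(x) + delta(x)), rho


def f_low(x):   # hypothetical cheap source
    return math.sin(3 * x[0]) + x[1] + 1.0


def f_high(x):  # hypothetical expensive source, correlated with f_low
    return 2.0 * f_low(x) + 0.3 * x[0]


X_l = [(i / 6, j / 6) for i in range(7) for j in range(7)]  # 49 cheap samples
X_h = [(0.1, 0.2), (0.4, 0.9), (0.6, 0.3), (0.9, 0.7), (0.5, 0.5)]
model, rho = fit_two_stage(X_l, [f_low(x) for x in X_l],
                           X_h, [f_high(x) for x in X_h], (10.0, 10.0), (2.0, 2.0))
print(round(rho, 3), model((0.2, 0.8)), f_high((0.2, 0.8)))
```

If the low-fidelity source is misleading, the fitted $\rho$ and discrepancy absorb some of the mismatch. With only a handful of high-fidelity samples, however, they can do so only roughly, which is one way blindly trusting a low-fidelity source can hurt accuracy.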

Figure 9 Average best point found during optimization with three surrogate modelling techniques. The performance is shown for Kriging models which only use data from $f_h$ (blue), co-Kriging models which use data from $f_h$ and $f_l^{\text{error}}$ (orange), and co-Kriging models which use data from $f_h$ and $f_l^{\text{approx}}$ (green).
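A curve such as this one is typically computed in two steps. First, track the best objective value found so far after every evaluation in each repetition. Then average these traces across repetitions, step by step. The sketch assumes runs of equal length and, by default, a maximization goal (for example, searching for the highest elevation). The caption states neither, so both are assumptions, and the function names are hypothetical.

```python
def best_so_far(values, maximize=True):
    """Running best objective value after each evaluation of one run."""
    trace = []
    for v in values:
        if not trace:
            trace.append(v)
        else:
            trace.append(max(trace[-1], v) if maximize else min(trace[-1], v))
    return trace


def average_trace(runs, maximize=True):
    """Average the best-so-far traces of several equal-length runs, step by step."""
    traces = [best_so_far(run, maximize) for run in runs]
    return [sum(step) / len(step) for step in zip(*traces)]


runs = [[3, 1, 5, 4], [2, 6, 1, 7]]  # hypothetical objective values from two runs
print(average_trace(runs))  # [2.5, 4.5, 5.5, 6.0]
```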