A novel adaptive sampling approach with batch selection for the automatic generation of surrogate models in geotechnical engineering

Published online by Cambridge University Press: 28 January 2026

Yunxiang Yang*
Affiliation:
Department of Civil and Environmental Engineering, Imperial College London, UK
Agustín Ruiz López
Affiliation:
Department of Civil and Environmental Engineering, Imperial College London, UK; Seequent, The Bentley Subsurface Company, Netherlands
Aikaterini Tsiampousi
Affiliation:
Department of Civil and Environmental Engineering, Imperial College London, UK
David M.G. Taborda
Affiliation:
Department of Civil and Environmental Engineering, Imperial College London, UK
*Corresponding author: Yunxiang Yang; Email: yunxiang.yang22@imperial.ac.uk

Abstract

Surrogate models are widely used to replace computationally expensive numerical analyses, particularly in tasks such as design optimization that require hundreds or thousands of simulations. One-shot sampling methods, in which all samples are generated in a single stage without prior knowledge of the required sample size, are commonly adopted to create surrogate models, but they face significant limitations. Because the characteristics of the underlying system are generally unknown before training, one-shot sampling can lead to suboptimal model performance or unnecessary computational cost, especially in complex or high-dimensional problems. This paper addresses these challenges by proposing a novel, model-independent adaptive sampling approach with batch selection, termed Cross-Validation Batch Adaptive Sampling for High-Efficiency Surrogates (CV-BASHES). CV-BASHES is first validated on two analytical functions to explore its flexibility and accuracy under different configurations, confirming its robustness. Comparative studies on the same functions against two state-of-the-art methods, maximum projection (MaxPro) and scalable adaptive sampling (SAS), demonstrate the superior accuracy and robustness of CV-BASHES. Its applicability is further demonstrated through a geotechnical application, in which CV-BASHES is used to develop a surrogate model that predicts the horizontal deformation of a diaphragm wall supporting a deep excavation. Results show that CV-BASHES selects training samples efficiently, reducing the dataset size while maintaining high surrogate accuracy. By offering a more efficient sampling strategy, CV-BASHES streamlines the creation of machine learning surrogates for complex problems across engineering disciplines.
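
The general idea of cross-validation-driven adaptive sampling with batch selection can be illustrated with a minimal sketch. This is not the authors' CV-BASHES algorithm: the one-dimensional setting, the inverse-distance-weighted stand-in surrogate, the scoring rule (cross-validation error multiplied by distance to the nearest sample) and every function name below are assumptions made purely for illustration. Because the loop only uses the surrogate through its predictions, any other model could in principle be swapped in.

```python
import math
import random

def idw_predict(xs, ys, x, power=2.0):
    """Inverse-distance-weighted prediction (assumed stand-in surrogate)."""
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        d = abs(x - xi)
        if d < 1e-12:          # query coincides with a sample
            return yi
        w = 1.0 / d ** power
        num += w * yi
        den += w
    return num / den

def loo_errors(xs, ys):
    """Leave-one-out cross-validation error at each training sample."""
    errs = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        errs.append(abs(idw_predict(rest_x, rest_y, xs[i]) - ys[i]))
    return errs

def select_batch(xs, ys, candidates, batch_size, min_dist):
    """Rank candidates by a hypothetical score and keep the best ones that
    respect a minimum distance to existing samples and to earlier picks."""
    errs = loo_errors(xs, ys)

    def score(c):
        j = min(range(len(xs)), key=lambda i: abs(c - xs[i]))
        return errs[j] * abs(c - xs[j])   # CV error of nearest sample x distance

    batch = []
    for c in sorted(candidates, key=score, reverse=True):
        if all(abs(c - p) >= min_dist for p in xs + batch):
            batch.append(c)
        if len(batch) == batch_size:
            break
    return batch

def adaptive_sampling(f, lo, hi, n_init=5, batch_size=3, n_rounds=5, seed=0):
    """Start from a small uniform design and grow it one batch per round."""
    rng = random.Random(seed)
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [f(x) for x in xs]               # each call stands for an expensive simulation
    for _ in range(n_rounds):
        candidates = [rng.uniform(lo, hi) for _ in range(200)]
        min_dist = (hi - lo) / (4 * len(xs))   # constraint tightens as data grow
        batch = select_batch(xs, ys, candidates, batch_size, min_dist)
        xs += batch
        ys += [f(x) for x in batch]
    return xs, ys

if __name__ == "__main__":
    xs, ys = adaptive_sampling(lambda x: math.sin(8 * x) * x, 0.0, 1.0)
    print(len(xs), "samples selected")
```

Selecting a batch rather than a single point per round lets the expensive simulations in each round run in parallel, while the minimum-distance constraint keeps the points of one batch from clustering in the same high-error region.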

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Open Practices
Open data
Copyright
© The Author(s), 2026. Published by Cambridge University Press
Figure 1. Comparison of different sampling strategies.

Figure 2. Five-fold cross-validation.

Figure 3. Generation and pre-filtering of starting points for multiple hill-climbing.

Figure 4. Adaptive sampling process with constraint-based batch selection.

Figure 5. Flowchart of the proposed adaptive sampling method.

Table 1. Details of the two analytical test cases

Figure 6. Response surface of drop-wave function (left) and Schwefel function (right).

Figure 7. Initial database for the two analytical test cases.

Table 2. Hyperparameter tuning intervals for SVR and GP

Figure 8. Learning curve of R² testing score for drop-wave function (left) and Schwefel function (right).

Figure 9. Parametric definition of the excavation problem.

Figure 10. Coefficient of earth pressure at rest (K₀) profile below the top of London Clay.

Table 3. Outline of the parameterized excavation sequence

Table 4. Stiffness properties adopted in IC MAGE M01 model for London Clay

Table 5. Design intervals for the ANN surrogate

Figure 11. Visualization of the initial training database (left) and the testing database (right).

Figure 12. Architecture of the single-output ANN.

Table 6. Hyperparameter tuning ranges for ANN optimization

Figure 13. Learning curve of the ANN surrogate with CV-BASHES.

Figure 14. Computational time for developing the ANN surrogate with CV-BASHES.

Figure 15. Comparison (a,b) and residual (c,d) plots of the final surrogate: training (a,c) and testing (b,d).

Figure 16. Boxplots of the final surrogate model's RMSE (left) and R² (right) scores on the testing data.

Figure 17. Analysis of the ANN’s worst (a,c) and 75th percentile (b,d) performance as judged by the R² (a,b) and RMSE (c,d) scores.

Figure A1. Intermediate (left) and final (right) database of drop-wave function.

Figure A2. Intermediate (left) and final (right) database of Schwefel function.

Table A1. Summary of surrogate model performance (mean R² and RMSE) for drop-wave and Schwefel functions
