Active learning for regression in engineering populations: a risk-informed approach

Daniel R. Clarkson; Lawrence A. Bull; Tina A. Dardeno; Chandula T. Wickramarachchi; Elizabeth J. Cross; Timothy J. Rogers; Keith Worden; Nikolaos Dervilis; Aidan J. Hughes

doi:10.1017/dce.2025.7

Published online by Cambridge University Press: 21 February 2025

Daniel R. Clarkson*: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Lawrence A. Bull: Affiliation:
School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
Tina A. Dardeno: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Chandula T. Wickramarachchi: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Elizabeth J. Cross: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Timothy J. Rogers: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Keith Worden: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Nikolaos Dervilis: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
Aidan J. Hughes: Affiliation:
Dynamics Research Group, University of Sheffield, Sheffield, UK
*: Corresponding author: Daniel R. Clarkson; Email: dclarkson1@sheffield.ac.uk

Abstract

Regression is a fundamental prediction task common in data-centric engineering applications that involves learning mappings between continuous variables. In many engineering applications (e.g., structural health monitoring), feature-label pairs used to learn such mappings are of limited availability, which hinders the effectiveness of traditional supervised machine learning approaches. This paper proposes a methodology for overcoming the issue of data scarcity by combining active learning (AL) for regression with hierarchical Bayesian modeling. AL is an approach for preferentially acquiring feature-label pairs in a resource-efficient manner. In particular, the current work adopts a risk-informed approach that leverages contextual information associated with regression-based engineering decision-making tasks (e.g., inspection and maintenance). Hierarchical Bayesian modeling allows multiple related regression tasks to be learned over a population, capturing local and global effects. The information sharing facilitated by this modeling approach means that information acquired for one engineering system can improve predictive performance across the population. The proposed methodology is demonstrated using an experimental case study. Specifically, multiple regressions are performed over a population of machining tools, where the quantity of interest is the surface roughness of the workpieces. An inspection and maintenance decision process is defined using these regression tasks, which is in turn used to construct the active-learning algorithm. The novel methodology proposed is benchmarked against an uninformed approach to label acquisition and independent modeling of the regression tasks. It is shown that the proposed approach has superior performance in terms of expected cost, maintaining predictive performance while reducing the number of inspections required.

Keywords

active learning; population modeling; regression; risk assessment; structural health monitoring

Information

Type: Research Article
Information: Data-Centric Engineering, Volume 6, 2025, e16

DOI: https://doi.org/10.1017/dce.2025.7
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Impact Statement

This paper addresses online learning of regression models in a cost-effective manner. The problem is addressed via two avenues: active learning, that is, choosing information critical to decisions, and information sharing, using a hierarchical Bayesian model. This paper presents a decision-theoretic approach that allows the monetary benefits to be quantified exactly. Regression is a prominent and important tool in many decision-support technologies such as health monitoring and digital twins. Improving performance when learning regressions in an online manner could lead to reduced costs and improved safety in many science and engineering applications.

1. Introduction

Structural health monitoring (SHM) offers proactive solutions to ensure the safety and reliability of various items of infrastructure. SHM systems consist of data acquisition and processing systems to enable the detection of damage in monitored structures. The aim of SHM is to help inform decision-making, particularly for the operation and maintenance of high-value and safety-critical infrastructure. Improving decision-making in SHM has economic benefits, by reduction of unnecessary inspections and interventions, and safety benefits, by reducing the likelihood of failure events via informed interventions.

Statistical pattern recognition (SPR) is widely recognized as the primary tool for data-driven predictions in SHM systems (Farrar and Worden, 2012). Regression models are a fundamental component of the SPR approach to decision-support technologies, enabling the prediction of continuous outcomes based on acquired data. For example, within the context of SHM, by associating the target variables of a regression model with salient health-states, inferences can be made regarding the condition of a structure of interest. Although assumed to be discrete in many SHM problems, damage progression is usually, in reality, a continuous variable.

A significant challenge in SHM is the scarcity of data. All data-driven models can suffer from bias and high uncertainty without sufficient training data, leading to unreliable predictions. Acquiring extensive labeled datasets capturing a structure’s behavior across various health conditions is prohibitively costly and often unattainable for essential infrastructures. Inadequate data motivate sharing information between similar assets (Bull et al., 2023). From this, a new approach has emerged, population-based structural health monitoring (PBSHM; Bull et al., 2021; Gosliga et al., 2021; Gardner et al., 2021; Tsialiamanis et al., 2021).

1.1. PBSHM

PBSHM considers an entire population of structures. This approach assumes that structures that share common environmental conditions, load patterns, and/or aging effects will also share statistical commonalities. Considering structures as a population allows SHM users to share data between them. It allows a structure with rich historical data to lend its statistical strength to a data-poor structure. By capturing data from multiple structures, PBSHM aims to provide a more accurate and holistic health assessment and enables the identification of global patterns, trends, and anomalies that would be difficult to observe at the individual level. This strategy can not only improve predictions for structures with very little data but also make the most of comprehensive datasets that can be so costly to procure. PBSHM has a suite of tools that can be used to share information across a population, namely transfer learning and domain adaptation. For a thorough introduction to the population-based approach to SHM, see Bull et al. (2021), Gosliga et al. (2021), Gardner et al. (2021), and Tsialiamanis et al. (2021).

1.2. Transfer learning

In PBSHM, one of the most widely explored areas of transfer learning is domain adaptation, in which feature data are mapped from a label-rich source domain to a label-scarce target domain, with the aim of reducing the distance between the source and target domains in a shared latent space such that label information can be shared. Fink et al. (2020) discuss domain adaptation’s place in fleet prognostics and health management. Gardner et al. (2020) utilized domain adaptation to transfer inferences across different structures, considering a population of laboratory-scale three-story buildings. Zhang et al. (2017) used domain adaptation for fault diagnosis in the context of rotating machinery and between different machines (Li et al., 2020). Xu and Noh (2021) showed that transfer learning can be used to diagnose story-wise damage conditions of buildings affected by earthquakes.

1.3. Active learning (AL)

AL presents another avenue to address the limitations imposed by data scarcity in SHM. Conventional supervised machine-learning methods are infeasible for many SHM applications because of the costs associated with descriptive labels. This has led to the development of unsupervised and semi-supervised machine-learning techniques. AL is a set of techniques that selectively queries labels for otherwise unlabeled data that are most informative given the current model; the model can then be updated using this informed subset of labeled data. AL can be applied offline to a large pool of collected data (Wang et al., 2017) or online, whereby the dataset is continuously updated as new observations are collected (Zhu et al., 2007). The online setting is particularly significant in SHM; generally, data from a monitored structure will become available gradually throughout the life of the structure. In many cases, inspecting monitored systems can be extremely costly, so if a system can determine when only the most critical or informative observations need to be investigated, this could lead to significant reductions in maintenance costs (Bull et al., 2019b).

AL has seen growing interest, particularly within SHM; Bull et al. (2019b) provide an online AL framework for the classification problem and show the effects of the framework via a case study on acoustic emissions data. Hughes et al. (2022a) present a risk-based formulation of AL in which queries are guided by the expected value of information and outline a method to minimize the effects of sampling bias in AL (Hughes et al., 2022b).

Historically, a large portion of the literature has focused on AL in the context of classification; this is particularly true in SHM applications (Bull et al., 2018; Hughes et al., 2022a). Nonetheless, for many practical engineering scenarios, such as continuous degradation (Shahraki et al., 2017) and crack growth (He et al., 2023), regression-based models are more suitable, motivating research into AL for regression. For applications outside of SHM, it has been shown that many of the AL procedures suitable for classification purposes are also suitable for regression (Burbidge et al., 2007; Cai et al., 2013). One of the first statistical analyses of AL regression (ALR) was provided by Cohn et al. (1996), where a locally weighted regression model for studying the dynamics of a robot arm is showcased. More recently, Wu et al. (2019) proposed a “greedy sampling” method for ALR and applied it to several machine learning benchmark datasets and also considered a case study focused on estimating driver drowsiness. Specifically, a greedy sampling method proposed by Yu and Kim (2010) is expanded to be “greedy” in the output space and is shown to be a robust and effective method for AL. In a similar vein, Cai et al. (2013) proposed an expected model change maximization framework for regression, in which the learner chooses the unlabeled instance that causes the maximum change in the current model parameters. Freund et al. (1997) showed that query-by-committee is not only applicable to binary labels but also to discrete labels.

Within the field of engineering, AL for regression has been studied in several applications. In Dodt et al. (2022), an AL scheme based on the predictive uncertainty from a Gaussian process regression is used to continuously calibrate a surrogate model used for monitoring a spot welding process. In Song et al. (2022), AL is used in the context of reliability analysis to reduce the number of computationally expensive simulations required for estimation of variance-based sensitivity indices; again, the AL procedure is applied to a Gaussian process regression. Di Fiore et al. (2024) provide an overview of AL approaches for engineering applications and highlight the relationship between Bayesian optimization and AL. In the paper, they summarize multiple approaches to AL for regression and apply them to several simulated benchmark problems. For additional comprehensive surveys of AL research, including applications to regression, the reader is directed to Kumar and Gupta (2020), Fu et al. (2013), and Aggarwal et al. (2014).

This article aims to combine the effects of the population-based approach with the efficiencies of AL to improve decision-making, reduce costs, and more effectively allocate resources for SHM. The resulting approach is applied to a population of machining tools.

1.3.1. Novel contribution

We propose a novel approach to risk-based AL for regression and combine it with information sharing via a hierarchical Bayesian model. The hierarchical model informs a novel risk-based query measure that incorporates costs associated with an engineering decision process, where the adaptive inspection schedule monitors a population of machine tools rather than a single system. Therefore, our contributions offer three practical advantages:

1. The cost/risk/utility function can be updated in an online manner, which allows for real-time decision support for industrial applications.
2. A risk-based approach means that monitoring costs are built directly into the decision process. This approach inherently solves budget allocation problems, a serious benefit for real-world applications.
3. This approach allows monitoring across a population, where inspecting one member in the population improves the predictions and monitoring capability for all members in the population.

1.3.2. Paper outline

Section 2 is organized into three subsections: the first presents a general framework for information sharing via hierarchical modeling, the second describes decision theory and the computation of expected utility and risk, and the third describes risk-based AL for decision analysis. Section 3 introduces a case study and applies the frameworks for hierarchical modeling and decision theory from Section 2.

2. Background information

The first part of this section outlines a framework for sharing information within homogeneous populations of structures; this forms the basis of the probabilistic regression model used in the current case study. The second part includes an introduction to decision theory which forms the basis of the AL procedure.

2.1. Hierarchical Bayesian modeling

Section 2.1 follows an explanation provided in the authors’ previous work (Bull et al., 2023). Consider data recorded from a population of $ K $ engineering structures. The population can be denoted,

(1)

$$ {\left\{{\mathbf{x}}_k,{\mathbf{y}}_k\right\}}_{k=1}^K={\left\{{\left\{{x}_{ik},{y}_{ik}\right\}}_{i=1}^{N_k}\right\}}_{k=1}^K $$

where $ {\mathbf{y}}_k $ is a target response vector for inputs $ {\mathbf{x}}_k $ and $ \left\{{x}_{ik},{y}_{ik}\right\} $ is the $ {i}^{\mathrm{th}} $ pair of observations in group $ k $. There are $ {N}_k $ observations in group $ k $ and thus $ \sum_{k=1}^K{N}_k $ observations in total. The aim is to learn a set of $ K $ predictors related to a regression or classification task. This paper focuses on regression, where the tasks satisfy,

(2)

$$ {\left\{{y}_{ik}={f}_k\left({x}_{ik}\right)+{\varepsilon}_k\right\}}_{k=1}^K $$

and the output $ {y}_{ik} $ is determined by evaluating one of $ K $ latent functions with separate additive noise $ {\varepsilon}_k $. The mapping $ {f}_k $ is assumed to be correlated between members in the population. The models should be improved by learning the parameters in a joint inference over the whole population. In machine learning this is referred to as multitask learning; in statistics, such data are usually modeled with hierarchical models (Kreft and De Leeuw, 1998; Gelman and Hill, 2006).

In practice, some members in a population may possess extensive historical data, while members that may have been more recently deployed will have very limited data for training. Learning separate independent models for each group might lead to unreliable results for data-poor members, while a single regression model of all the data (complete pooling) would result in poor generalization. Hierarchical Bayesian models can learn separate models for each member, while encouraging the parameters of these models to be correlated (Murphy, 2012). The established theory is summarized here.

Consider K linear regression models,

(3)

$$ {\left\{{\mathbf{y}}_k={\Phi}_k{\alpha}_k+{\varepsilon}_k\right\}}_{k=1}^K $$

where $ {\Phi}_k=\left[\mathbf{1},{\mathbf{x}}_k\right] $ is the $ {N}_k\times 2 $ design matrix; $ {\alpha}_k $ is the $ 2\times 1 $ vector of weights; and the noise vector $ {\varepsilon}_k $ is $ {N}_k\times 1 $ and normally distributed, $ {\varepsilon}_k\sim \mathcal{N}\left(\mathbf{0},{\sigma}_k^2\mathbf{I}\right) $. Here, $ \mathbf{1} $ is a vector of ones, $ \mathbf{I} $ is the identity matrix, and $ \mathcal{N}\left(m,s\right) $ is the normal distribution with mean $ m $ and (co)variance $ s $. The likelihood of the target response vector is then

(4)

$$ {\mathbf{y}}_k\mid {\mathbf{x}}_k\sim \mathcal{N}\left({\Phi}_k{\alpha}_k,{\sigma_k}^2\mathbf{I}\right) $$

(5)

$$ \therefore {y}_{ik}\mid {x}_{ik}\sim \mathcal{N}\left({\alpha_1}^{(k)}+{\alpha_2}^{(k)}{x}_{ik},{\sigma_k}^2\right) $$

Following the Bayesian methodology, one can set a common hierarchy of prior distributions over the weights (slope and intercept) for each member in the population. Typically, normal distributions are used for the weights of each group and inverse-Gamma distributions for the variance parameter.

(6)

$$ {\left\{{\alpha}_k\right\}}_{k=1}^K\overset{i.i.d.}{\sim}\mathcal{N}\left({\mu}_{\alpha },\operatorname{diag}\left\{{\sigma}_{\alpha}^2\right\}\right) $$

(7)

$$ {\mu}_{\alpha}\sim \mathcal{N}\left({\mathbf{m}}_{\alpha },\operatorname{diag}\left\{{\mathbf{s}}_{\alpha}\right\}\right) $$

(8)

$$ {\sigma}_{\alpha}\overset{i.i.d.}{\sim}\mathcal{I}\mathcal{G}\left(a,b\right) $$

In words, Equation (6) assumes that the weights $ {\left\{{\alpha}_k\right\}}_{k=1}^K $ are normally distributed with mean $ {\mu}_{\alpha } $ and covariance $ \operatorname{diag}\left\{{\sigma}_{\alpha}^2\right\} $. Equation (7) states that the prior expectation of the weights, $ {\mu}_{\alpha } $, is normally distributed with mean $ {\mathbf{m}}_{\alpha } $ and covariance $ \operatorname{diag}\left\{{\mathbf{s}}_{\alpha}\right\} $. Equation (8) states that the prior deviation of the slope and intercept is inverse-Gamma distributed with shape $ a $ and scale $ b $. A general representation of hierarchical regression can be seen in the directed graphical model in Figure 1.

Figure 1.

A graphical model representing the linear mixed model with partial pooling.
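As an illustration of the generative structure in Equations (5)–(8), the following minimal sketch draws one synthetic population by sampling from the priors and then from the likelihood. It is not the authors' implementation: the function name `sample_population`, the default hyperparameter values, the uniform inputs, and the fixed noise standard deviation `noise_sd` are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def sample_population(K, n_per_group, m_alpha=(0.0, 1.0), s_alpha=(1.0, 1.0),
                      a=3.0, b=2.0, noise_sd=0.1, seed=0):
    """Draw one synthetic population from the priors in Equations (6)-(8).

    All default values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Equation (7): population-level means of the intercept and slope.
    # s_alpha is treated as a variance, so its square root is the std. dev.
    mu_alpha = [rng.gauss(m, s ** 0.5) for m, s in zip(m_alpha, s_alpha)]

    # Equation (8): if G ~ Gamma(shape=a, scale=1/b), then 1/G ~ InverseGamma(a, b).
    sigma_alpha = [1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(2)]

    population = []
    for _ in range(K):
        # Equation (6): member-level weights scatter around the shared means.
        intercept, slope = (rng.gauss(mu, sd)
                            for mu, sd in zip(mu_alpha, sigma_alpha))
        # Inputs are drawn uniformly on [0, 1) purely for illustration.
        xs = [rng.random() for _ in range(n_per_group)]
        # Equation (5): noisy straight-line observations for this member.
        ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
        population.append({"alpha": (intercept, slope), "x": xs, "y": ys})
    return mu_alpha, sigma_alpha, population


if __name__ == "__main__":
    mu_alpha, sigma_alpha, population = sample_population(K=4, n_per_group=3)
    for k, member in enumerate(population, start=1):
        print(k, member["alpha"])
```

Because every member's weights are drawn around the same `mu_alpha`, the simulated members are similar but not identical, which is the behavior the partial-pooling model is intended to capture.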

The parent nodes $ \left\{{\mu}_{\alpha },{\sigma}_{\alpha}^2\right\} $ are inferred from the data, so Equations (6)–(8) encode prior belief of the dependence between latent variables. If these parent nodes were fixed values, rather than inferred, each model would be independent, preventing the flow of information between domains. Inferring them instead allows data-sparse domains to borrow statistical strength from those that are data rich (Murphy, 2012).
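The idea of borrowing statistical strength can be made concrete with a simplified special case. The sketch below assumes a normal model for a single group-level mean with known noise variance and known population-level mean and variance; under those assumptions the posterior mean is a precision-weighted average of the group's sample mean and the population mean. In the hierarchical model above the population-level parameters are inferred rather than fixed, so this is only an illustration of the shrinkage effect, and the function name `partial_pooling_mean` is hypothetical.

```python
def partial_pooling_mean(group_mean, n_obs, noise_var, pop_mean, pop_var):
    """Posterior mean of a group-level mean in a known-variance normal model.

    Assumes y_i ~ N(theta, noise_var) and theta ~ N(pop_mean, pop_var).
    """
    data_precision = n_obs / noise_var   # information carried by the group's data
    prior_precision = 1.0 / pop_var      # information carried by the population
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_mean + (1.0 - weight) * pop_mean


if __name__ == "__main__":
    # Same sample mean, different amounts of data (illustrative numbers).
    rich = partial_pooling_mean(2.0, n_obs=100, noise_var=1.0,
                                pop_mean=1.0, pop_var=0.25)
    poor = partial_pooling_mean(2.0, n_obs=2, noise_var=1.0,
                                pop_mean=1.0, pop_var=0.25)
    print(f"data-rich estimate: {rich:.3f}")
    print(f"data-poor estimate: {poor:.3f}")
```

With these numbers the data-rich member keeps an estimate close to its own sample mean, while the data-poor member is pulled much further toward the population mean.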

Huang and Beck (2015) and Huang et al. (2019) present early examples of using hierarchical Bayesian models to represent engineering structures for SHM. Bull et al. (2023) used hierarchical models to improve the survival analysis of a truck fleet and power prediction in a wind farm. Di Francesco et al. (2021) used hierarchical models to account for incomplete and imperfect data in an inspection-planning setting.

2.2. Decision theory

Hierarchical models allow one to quantify beliefs about the states of interest and to reason under uncertainty. In engineering, these predictions are used to take actions in the real world. Decision theory provides a framework for using the uncertainty quantification provided by a Bayesian approach to make optimal choices in many engineering problem settings.

The idea of a “rational” decision-maker is defined as a decision-maker that acts to maximize the expected utility of their actions; this is expressed via the von Neumann–Morgenstern theorem (Morgenstern et al., 1964), which states that, for two decidable actions $ a $ and $ b $:

(9)

$$ a\succcurlyeq b\iff EU(a)\ge EU(b) $$

where $ \succcurlyeq $ denotes a weak preference, indicating that a decision-maker favors $ a $ at least as much as $ b $; and $ EU\left(\cdot \right) $ denotes the expected utility associated with taking an action. The expected utility is derived in Morgenstern et al. (1964). Consider a stochastic event $ X $ with mutually exclusive outcomes $ x\in \mathcal{X} $ that are conditionally dependent on a decision $ D $ between actions $ a $ and $ b $. The expected utility of action $ a $ is computed as follows:

(10)

$$ EU(a)=\sum \limits_{x\in \mathcal{X}}P\left(X=x|D=a\right)\cdot U\left(X=x,D=a\right) $$

$ P\left(X|D=a\right) $ is the probability of the outcome of $ X $ given that action $ a $ is executed. $ U $ denotes a utility function that maps $ U:\mathcal{X}\times \mathcal{A}\to \mathbb{R} $. Conveniently, if the utility of $ D $ is independent of the variable $ X $, then $ U\left(X,D\right) $ can be expressed as the sum of two utility functions, $ U(X) $ and $ U(D) $, that separately describe the utilities associated with outcomes and actions. Equation (10) can then be written as,

(11)

$$ EU(a)=\left[\sum \limits_{x\in \mathcal{X}}P\left(X=x|D=a\right)\cdot U\left(X=x\right)\right]+U\left(D=a\right) $$

For a single decision $ D $ over a finite set of actions $ \mathcal{A} $ , an optimal action $ {a}^{\ast } $ can be defined such that the maximum expected utility (MEU) is achieved, where,

(12)

$$ \mathrm{MEU}(D)=\underset{a\in \mathcal{A}}{\max } EU(a) $$

and,

(13)

$$ {a}^{\ast }=\underset{a\in \mathcal{A}}{\arg \max } EU(a) $$

From Equations (11) and (12), one can see an equivalence between expected utility and risk, which is defined as the product of a probability and a cost. One limitation of utility/risk-based decision theory is that it can be difficult to obtain the probability distributions in Equations (10) and (11). However, via the Bayesian framework outlined in Section 2.1, one can acquire posterior distributions, based on one’s beliefs about an action, that can be used as the probabilities required for Equations (11) and (12). Additionally, the costs or utilities required for these equations can be elicited from asset owners, allowing the expected utility of an action to be estimated.
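The calculation in Equations (11)–(13) can be sketched for a small discrete problem. The outcomes, actions, probabilities, and utilities below are invented purely for illustration (they are not taken from the case study), and the function names are hypothetical.

```python
def expected_utility(action, p_outcome_given_action, outcome_utility, action_utility):
    """Equation (11): expected outcome utility plus the utility of the action."""
    probs = p_outcome_given_action[action]
    outcome_term = sum(p * outcome_utility[x] for x, p in probs.items())
    return outcome_term + action_utility[action]


def optimal_action(p_outcome_given_action, outcome_utility, action_utility):
    """Equations (12) and (13): the maximum expected utility and its arg max."""
    eus = {
        a: expected_utility(a, p_outcome_given_action, outcome_utility, action_utility)
        for a in p_outcome_given_action
    }
    best = max(eus, key=eus.get)
    return best, eus[best], eus


# Illustrative numbers only: P(X = x | D = a), U(X = x), and U(D = a).
P = {
    "do_nothing": {"healthy": 0.70, "failed": 0.30},
    "replace": {"healthy": 0.95, "failed": 0.05},
}
U_OUTCOME = {"healthy": 0.0, "failed": -100.0}
U_ACTION = {"do_nothing": 0.0, "replace": -10.0}

if __name__ == "__main__":
    best, best_eu, eus = optimal_action(P, U_OUTCOME, U_ACTION)
    print(eus)
    print("optimal action:", best)
```

In this toy example, replacing the component costs 10 units but lowers the failure probability enough that its expected utility (about -15) exceeds that of doing nothing (about -30), so `replace` is selected.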

2.3. Active learning

To implement a Bayesian hierarchical model, such as the one proposed in Section 2.1, labeled data are required. As discussed in Section 1, for many engineering applications, and particularly for SHM, collecting labels for data is very expensive. It requires sending a domain expert to inspect the physical asset and often requires the operation of the asset to be temporarily halted. These costs motivate reducing the number of inspections and only inspecting measurements that would most improve the predictive model. AL is a set of tools that do this. Additionally, an offline active learner would not be suitable for SHM. SHM systems are required to analyze data as they arrive throughout the life of the monitored system, and decisions about interventions can be required immediately upon arrival of these data. One of the major challenges of an online active learner in SHM is that once the decision is made not to query a label, access is lost to this information. It is not possible to retrieve the label once the opportunity has passed.

Consider data, $ X={\left\{{\mathbf{x}}_i\right\}}_{i=1}^N $, with hidden labels, $ Y={\left\{{\mathbf{y}}_i\right\}}_{i=1}^N $, that can be acquired by paying the costs associated with inspection. The process of choosing and labeling these data points is referred to as querying. An active learner aims to learn a mapping of the observations, $ X $, to the labels, $ Y $, while keeping queries to a minimum. A general heuristic for AL is presented by Bull et al. (2019a) and adapted to regression in Figure 2.

Figure 2.

An AL heuristic. Source: Bull et al. (2019a).
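The online setting described above can be sketched as a simple loop: observations arrive one at a time, a query rule decides whether to pay for a label, the model is refit after each query, and unqueried labels are lost. This is a generic skeleton, not the procedure in Figure 2. The least-squares line stands in for the regression model, and `novelty_rule` is a hypothetical placeholder query rule based only on distance in the input space; a risk-based rule would take its place.

```python
def fit_line(xs, ys):
    """Ordinary least-squares (slope, intercept); a stand-in for the regression model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:
        # Slope is not identifiable from a single input location.
        return 0.0, y_bar
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def novelty_rule(x, labelled_x, model, min_gap=2.0):
    """Hypothetical rule: query when x is far from every labelled input."""
    return all(abs(x - xl) >= min_gap for xl in labelled_x)


def online_active_learning(stream, oracle, should_query):
    """Process observations in arrival order, querying labels selectively."""
    labelled_x, labelled_y = [], []
    model, n_queries = None, 0
    for x in stream:
        if should_query(x, labelled_x, model):
            y = oracle(x)                  # inspection: the label is paid for
            labelled_x.append(x)
            labelled_y.append(y)
            model = fit_line(labelled_x, labelled_y)
            n_queries += 1
        # Otherwise the label is never observed and cannot be recovered later.
    return model, n_queries


if __name__ == "__main__":
    model, n_queries = online_active_learning(
        stream=range(10),
        oracle=lambda x: 0.5 * x + 1.0,    # noiseless toy labels
        should_query=novelty_rule,
    )
    print("queries:", n_queries, "model (slope, intercept):", model)
```

Passing the query rule as a function keeps the loop unchanged when the rule is swapped, for example from the distance-based placeholder to a rule driven by expected utility.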

2.3.1. Risk-based active learning

The measure that an active learner uses to decide which unlabeled data to query is important. Information-theoretic approaches, which use measures such as entropy and uncertainty to guide querying, are common. Hughes et al. (2022a) suggest a risk-based active learner for SHM. This approach uses the expected utility of an action, or the “value of information,” to guide querying. The costs associated with interventions and monitoring are built directly into the decision-making process. Decisions on inspections and maintenance are made to maximize the benefit of the expected outcome. While risk-based AL has been explored in the classification setting (Hughes et al., 2022b), the current paper proposes an approach to risk-based AL for the regression problem in which queries are guided according to the expected utility.
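One standard way to quantify the value of information is the expected value of perfect information (EVPI): the difference between the expected utility achievable if the true state were revealed before acting and the maximum expected utility under current beliefs. The sketch below uses EVPI with a discrete state as an assumed stand-in for the query measure; it is not necessarily the exact measure used in the paper, and the states, utilities, and function names are hypothetical.

```python
def expected_value_of_perfect_information(p_state, utility):
    """EVPI for a discrete state; utility[state][action] is a float."""
    actions = list(next(iter(utility.values())))
    # Best expected utility when acting on current beliefs only.
    eu_prior = max(
        sum(p_state[s] * utility[s][a] for s in p_state) for a in actions
    )
    # Expected utility if the true state were revealed before acting.
    eu_informed = sum(p_state[s] * max(utility[s].values()) for s in p_state)
    return eu_informed - eu_prior


def should_inspect(p_state, utility, inspection_cost):
    """Query the label only when the information is worth more than it costs."""
    return expected_value_of_perfect_information(p_state, utility) > inspection_cost


# Illustrative beliefs and utilities (not taken from the case study).
P_STATE = {"healthy": 0.7, "failed": 0.3}
UTILITY = {
    "healthy": {"do_nothing": 0.0, "replace": -10.0},
    "failed": {"do_nothing": -100.0, "replace": -10.0},
}

if __name__ == "__main__":
    evpi = expected_value_of_perfect_information(P_STATE, UTILITY)
    print(f"EVPI = {evpi:.2f}")
    print("inspect at cost 2:", should_inspect(P_STATE, UTILITY, 2.0))
    print("inspect at cost 8:", should_inspect(P_STATE, UTILITY, 8.0))
```

Here the EVPI is about 7 units, so an inspection costing 2 would be requested while one costing 8 would not. In a regression setting the discrete state would be replaced by regions of the continuous predictive distribution.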

3. Case study—A population of machining tools

In this section, the framework outlined in Section 2 will be applied to a case study. A population of machining tools will be modeled by a hierarchical Bayesian model and a risk-based decision process will be used to actively inspect the tools. The decision-theoretic approach to inspection planning will be compared to a periodic inspection plan.

3.1. The dataset

A dataset described in the authors’ previous work (Wickramarachchi, 2019) measures deterioration over the life of machining tools during a turning process. The experimental set-up is shown in Figure 3. The workpiece is rotated around the dashed A-line and the tool makes four passes along the workpiece. Each pass starts at point S and ends at point E. After four passes, the tool is inspected, and measurements are taken of the workpiece and tool. This process continued until tool failure. The investigation was repeated for seven nominally identical tools, which form a population.

Figure 3.

Schematic showing the experimental set-up used for data acquisition. Source: Wickramarachchi (2019).

The deterioration is measured indirectly from the roughness of the workpiece. As the machining tool deteriorates, the surface quality of the workpiece will deteriorate and the roughness will increase. The measurements from this experiment can be seen in Figure 4. In general, surface roughness increases with the distance a given tool has cut along the workpiece; this distance is termed the sliding distance. Several tools show a sharp decrease in surface roughness at the second measurement point; this reading is believed to occur due to an initial sharpening of the tool early in the cutting process.

Figure 4.

Experimental surface roughness measurements.

Because of the nature of the experiment, the measurements of surface roughness are very noisy, which can lead to robustness issues when modeling. Combined with the high noise, the shallow gradient of the deterioration makes it difficult to learn the parameters of the regression. This motivates the use of a Bayesian hierarchical model adapted from Section 2.1.

3.2. The hierarchical model

Because of the natural degradation of tools during the machining process, and its effect on surface finish, tools must be replaced regularly. While each tool may be produced to the same specification and made from the same materials, there will be variations in the manufacturing process that manifest as variations between the physical properties of the tools and differences in behavior between the tools; this can be an issue for standard modeling techniques. However, this variation lends itself well to a hierarchical Bayesian model because this type of model accounts for variations within a population while taking advantage of the statistical similarities between its members. An additional benefit of hierarchical models is their suitability to the online setting and sparse datasets. This is particularly useful for tool condition monitoring where researchers may need to make predictions as soon as the machining process has begun and with only a few data points to learn a model. Additionally, the usual benefits of Bayesian modeling apply (uncertainty quantification, prior information, etc.). The hierarchical framework set out in Section 2.1 is adapted below to the case study. Here, there is a population of $ K=7 $ similar tools, with a target response vector $ {\mathbf{y}}_k $ containing the roughness measurements for each tool. The input vectors $ {\mathbf{x}}_k $ are the sliding distance measurements for each tool (how far the tool has cut across the workpiece). $ \left\{{x}_{ik},{y}_{ik}\right\} $ is the $ {i}^{\mathrm{th}} $ pair of observations for tool $ k $. There are $ {N}_k $ observations for tool $ k $ and thus $ \sum_{k=1}^K{N}_k $ observations in total. The aim is to learn a set of $ K $ predictors related to the regression task. The type of hierarchical model used for this analysis is a linear mixed model (Kreft and De Leeuw, 1998), so for each member in the population, a gradient $ {m}_k $ and intercept $ {c}_k $ are learnt. A graphical representation of this model can be seen in Figure 5.

Figure 5.

A graphical model representing the linear mixed model with partial pooling.

The likelihood of this model is

(14)

$$ {\left\{{y}_{ik}\right\}}_{k=1}^K\sim \mathrm{Cauchy}\left({y}_{mean},{\gamma}_k\right) $$

where the location parameter is the equation of a straight line,

(15)

$$ {y}_{mean}={m}_k\cdot {x}_{ik}+{c}_k $$

The Cauchy distribution was chosen as the likelihood because of the noisy nature of the data. Compared to the normal distribution, a more typical choice, the Cauchy distribution places greater probability density in the tails, which makes the model less susceptible to outliers and extreme values. Following the Bayesian methodology, one can set prior distributions over the slopes for the groups, which encode prior knowledge of the parameter values.
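To make the heavy-tail argument concrete, the following minimal sketch compares the two-sided tail probability of a normal and a Cauchy distribution at the same number of scale units from the location. It uses only the closed-form distribution functions; the function names are illustrative, and treating the normal standard deviation and the Cauchy scale as comparable units is a simplifying assumption.

```python
import math


def normal_two_sided_tail(z: float) -> float:
    """P(|X - loc| > z * sigma) for a normal distribution."""
    return math.erfc(z / math.sqrt(2.0))


def cauchy_two_sided_tail(z: float) -> float:
    """P(|X - loc| > z * gamma) for a Cauchy distribution."""
    return 1.0 - (2.0 / math.pi) * math.atan(z)


# Tail mass at 1, 3, and 5 scale units from the location parameter.
for z in (1.0, 3.0, 5.0):
    print(f"z={z:.0f}  normal={normal_two_sided_tail(z):.2e}  "
          f"cauchy={cauchy_two_sided_tail(z):.2e}")
```

At three scale units the normal tail mass is below 0.3%, whereas the Cauchy tail mass is still above 20%, so a single extreme roughness measurement is far less surprising under the Cauchy likelihood and pulls the fitted line less.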

(16)

$$ {\left\{{m}_k\right\}}_{k=1}^K\sim \mathrm{Normal}\left({\mu}_m,{\sigma}_m\right) $$

(17)

$$ {\mu}_m\sim \mathrm{Gamma}\left(k,\theta \right) $$

(18)

$$ {\sigma}_m\sim \mathrm{HalfCauchy}\left({s}_{\sigma_m}\right) $$

where the slopes are normally distributed, with mean $ {\mu}_m $ and standard deviation $ {\sigma}_m $ . Equation (17) shows that the prior expectation of the slopes is Gamma distributed with shape $ k=1 $ and scale $ \theta =1 $ (here, $ k $ denotes the shape parameter of the Gamma distribution rather than the tool index). The Gamma distribution was a natural choice because we want to encode that the gradients are positive over the life of a tool. Equation (18) shows that the prior deviation of the slopes is HalfCauchy distributed with location parameter equal to zero and scale parameter $ {s}_{\sigma_m}=25 $ . As recommended by Gelman (2006), the variance priors for this hierarchical model are set to be weakly informative. The priors for the intercepts are

(19)

$$ {\left\{{c}_k\right\}}_{k=1}^K\sim \mathrm{Normal}\left({\mu}_c,{\sigma}_c\right) $$

(20)

$$ {\mu}_c\sim \mathrm{Normal}\left({\overline{\mu}}_c,{s}_{\mu_c}\right) $$

(21)

$$ {\sigma}_c\sim \mathrm{HalfCauchy}\left({s}_{\sigma_c}\right) $$

where the intercepts are normally distributed, with mean $ {\mu}_c $ and standard deviation $ {\sigma}_c $ . Equation (20) shows that the prior expectation of the intercepts is also normally distributed, with mean $ {\overline{\mu}}_c=0 $ and standard deviation $ {s}_{\mu_c}=1 $ . Equation (21) shows that the prior deviation of the intercepts is HalfCauchy distributed with location parameter equal to zero and scale parameter $ {s}_{\sigma_c}=25 $ .
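The generative structure of Equations (14)–(21) can be sketched as ancestral sampling from the priors. The sketch below uses only the Python standard library and is not the NumPyro implementation used in this work; the function names are illustrative, and the noise scale `gamma_k` is passed in by hand because its prior is not specified in this section.

```python
import math
import random


def half_cauchy(rng: random.Random, scale: float) -> float:
    """Draw from HalfCauchy(scale) by folding a zero-centred Cauchy draw."""
    u = rng.random()
    return abs(scale * math.tan(math.pi * (u - 0.5)))


def sample_population(rng, n_tools=7, s_sigma_m=25.0, s_sigma_c=25.0):
    """One ancestral draw of population-level and tool-level parameters."""
    mu_m = rng.gammavariate(1.0, 1.0)       # Eq. (17): shape 1, scale 1
    sigma_m = half_cauchy(rng, s_sigma_m)   # Eq. (18)
    mu_c = rng.gauss(0.0, 1.0)              # Eq. (20)
    sigma_c = half_cauchy(rng, s_sigma_c)   # Eq. (21)
    slopes = [rng.gauss(mu_m, sigma_m) for _ in range(n_tools)]      # Eq. (16)
    intercepts = [rng.gauss(mu_c, sigma_c) for _ in range(n_tools)]  # Eq. (19)
    return slopes, intercepts


def sample_roughness(rng, m_k, c_k, x, gamma_k):
    """Eqs. (14)-(15): Cauchy noise around the line m_k * x + c_k.

    gamma_k is a hypothetical, user-supplied noise scale.
    """
    u = rng.random()
    return m_k * x + c_k + gamma_k * math.tan(math.pi * (u - 0.5))


rng = random.Random(0)
slopes, intercepts = sample_population(rng)
y = sample_roughness(rng, slopes[0], intercepts[0], x=1.0, gamma_k=0.05)
```

Drawing repeatedly from such a sketch is one way to check what the weakly informative priors imply before any data are seen; inference itself requires a sampler such as the one described below.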

The systematic application of graph-theoretic algorithms has led to a number of probabilistic programming languages. Here, models are implemented in NumPyro (Phan et al., 2019). The parameters are inferred using MCMC, via the No-U-Turn sampler implementation of Hamiltonian Monte Carlo (Hoffman and Gelman, 2014). Throughout, the burn-in period is 1000 iterations and 2000 iterations are used for inference.

To motivate a population-based approach, the hierarchical model will be compared to a model with complete pooling and an independent model with no pooling at all. In the complete pooling approach, a single regression for all the data is learnt. For the independent model, a regression is learnt for each member in the population, but it is assumed there is no correlation between them. Hierarchical modeling is somewhere in between, where a separate regression can be learnt for each member, while encouraging the parameters of these models to be correlated (Murphy, 2012).
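The two extremes of this comparison can be illustrated with ordinary least squares on toy data. This is only a sketch: least squares stands in for the Bayesian regressions used here, the two "tools" and their values are hypothetical, and partial pooling is not shown because it requires the hierarchical priors above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one set of points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical toy data: {tool: (sliding distances, roughness values)}.
tools = {
    "A": ([0.0, 1.0, 2.0, 3.0], [0.10, 0.30, 0.50, 0.70]),
    "B": ([0.0, 1.0, 2.0, 3.0], [0.20, 0.30, 0.40, 0.50]),
}

# No pooling: one independent fit per tool.
no_pooling = {k: fit_line(xs, ys) for k, (xs, ys) in tools.items()}

# Complete pooling: a single fit to all observations, ignoring tool identity.
all_x = [x for xs, _ in tools.values() for x in xs]
all_y = [y for _, ys in tools.values() for y in ys]
complete_pooling = fit_line(all_x, all_y)
```

In this toy case the pooled line averages the two tools and therefore matches neither, while the independent fits share nothing; the hierarchical model sits between these two behaviors.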

To simulate tool replacement, some measurements from Figure 4 are hidden from the models; this emulates new tools with scarce data. In the following figures, the green lines show samples from the posterior distribution over the (parameterized) latent linear functions, and the gray area shows the region in which 90% of the posterior samples fall.
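A band of this kind can be computed directly from posterior samples of the slope and intercept. The sketch below assumes a central interval (5th to 95th percentile) with linear interpolation between order statistics; the exact interval construction used for the figures is not stated, so this is one reasonable choice rather than the method used to produce them.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def credible_band(slope_samples, intercept_samples, x, level=0.90):
    """Central interval of the latent line m * x + c over posterior samples."""
    lines = sorted(m * x + c for m, c in zip(slope_samples, intercept_samples))
    tail = (1.0 - level) / 2.0
    return percentile(lines, tail), percentile(lines, 1.0 - tail)
```

Evaluating `credible_band` over a grid of sliding distances gives the lower and upper edges of the shaded region for one tool.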

Figure 6 shows that, with complete pooling, the model generalizes poorly: the physical variations between tools mean that a single regression performs poorly on many tools. Even with access to a tool's full history, as for Tool 2 and Tool 3, a complete pooling method may not perform well.

Figure 6.

Predictions using a complete pooling method.

The model with no pooling can be seen in Figure 7. For tools that have enough historic data, the mean and variance predictions are reasonable. However, this model performs poorly with scarce data: for these tools, it struggles to learn the parameters of the regression. When there are not enough data, the model relies on vague priors, so the variance for these tools is overestimated and the predictions about the hidden data are poor. Overpredicting the variance of the surface roughness could be problematic for asset owners if they use this model to inform decisions. For example, unnecessary inspections may be triggered or tools may be replaced prematurely, increasing the costs of production.

Figure 7.

Predictions without any pooling method.

Finally, the hierarchical model can be seen in Figure 8. For tools with plentiful data, this model behaves comparably to the no-pooling model. However, for data-scarce tools, there is a large reduction in the estimated variation and improvements in the predicted mean. This model is able to draw on the statistical strength of other tools to help predictions. The model remembers the data from other tools, captured by the prior shared between models of machining operations, and has learnt how similar tools behave.

Figure 8.

Predictions using a partial pooling method.

It can be seen in Table 1 that the information sharing provided by a partial-pooling model can improve regression accuracy, shown through a reduction in the total mean squared error (MSE) compared to the complete-pooling and no-pooling models. It should be noted that while partial-pooling approaches can achieve improved performance across a population, for individual members of the population, partial pooling will occasionally score worse in terms of MSE. For example, for Tool 2, the surface roughness measurements are considerably elevated compared to any other member of the population and return to the population mean towards the end of tool life. The no-pooling approach overfits to these data, while the partial-pooling approach does a better job of modeling the underlying process, which is masked by noisy measurements. Importantly, the partial-pooling method makes good predictions near the end of tool life, the portion of tool life that is most critical for making decisions about replacing the tool.

Table 1.

Mean squared error of partial pooling, complete pooling, and no pooling methods

These results are in accordance with the results of the authors' previous work (Bull et al., 2023; Dardeno et al., 2024) and motivate sharing information across a population to improve predictions.

3.3. Decision process and Active Learning

Tool health deterioration can lead to tool failure. When monitoring tools, such as the ones that produced the data in this case study, a manufacturer may have an interest in avoiding tool failure because of the safety implications and the costs associated with damage to the workpiece. Typically, inspections are used to observe the health state of the tool. However, performing an inspection has its own costs: tool inspections halt production, and both labor and expertise are required to conduct them. Optimizing this decision process, that is, whether or not to inspect an asset, can improve economic efficiency. Often there is a limited budget for inspections, and ideally inspections should only be conducted when necessary. Employing a decision-theoretic approach, as shown in Section 2.2, to guide decision-making for inspections provides a risk-based AL approach that allocates inspection budgets more efficiently.

A common quality control criterion for the machining of engineering components is maintaining a surface roughness below some threshold level $ {S}_{\mathrm{crit}} $ , beyond which the component is no longer fit for purpose. A high-quality surface finish can significantly improve the fatigue strength, corrosion resistance, and creep life of machined parts (Sharkawy et al., 2014). If the surface finish is damaged via inadequate modeling or control of the machining process, the part may need to be discarded or re-machined; this has an associated utility, $ {C}_{\mathrm{workpiece}} $ . Additionally, during the machining process, an inspection can be conducted to gain access to a noisy observation of the surface roughness, with a utility $ {C}_{\mathrm{inspection}} $ . Throughout the machining process, the tool can be replaced, with a cost $ {C}_{\mathrm{tool}} $ . Here, the decision is between inaction, allowing the tool to continue machining at the risk of damaging the workpiece, and action, inspecting or replacing the tool.

The expected utility (EU), as seen in Equation (10), needs to be determined for each of the possible actions, and the action with maximum expected utility should be chosen in accordance with Equation (13). One limitation of the dataset used in this case study is that inspections and tool replacements can only be triggered at specific discrete time steps because the data were collected at periodic intervals; a dataset with less restrictive inspection points is under production and will be part of future work.

The expected utilities of three actions need to be considered to evaluate this decision process: inaction (doing nothing and allowing the tool to continue machining), inspecting the tool, and replacing the tool. If an inspection occurs, there is another opportunity to decide to replace the tool based on the new information acquired from the inspection.

Again, Equation (10) can be used to calculate the expected utility of a decision. To compute the utility associated with inaction, not inspecting the tool at time step $ t $ , one can use Equation (22). The two outcomes of inaction are the surface roughness reaching or exceeding $ {S}_{\mathrm{crit}} $ before the next opportunity to intervene, time step $ t+1 $ , or the surface roughness not reaching or exceeding $ {S}_{\mathrm{crit}} $ . The utility of not exceeding $ {S}_{\mathrm{crit}} $ is assigned a value of 0, so the second term in Equation (22) equals 0. The probability $ {P}_{t+1}\left(S>{S}_{\mathrm{crit}}\right) $ can be estimated from the model; here, it is the proportion of samples from the Hamiltonian Monte Carlo (HMC) simulation that exceed $ {S}_{\mathrm{crit}} $ before the next opportunity to inspect.

(22)

$$ EU\left(D=\mathrm{do}\ \mathrm{nothing}\right)={P}_{t+1}\left(S>{S}_{\mathrm{crit}}\right)\times U\left(S>{S}_{\mathrm{crit}}\right)+{P}_{t+1}\left(S\le {S}_{\mathrm{crit}}\right)\times U\left(S\le {S}_{\mathrm{crit}}\right) $$
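Equation (22) can be evaluated from posterior predictive samples with a few lines of code. The following is a minimal sketch in which `roughness_samples` is assumed to hold one predicted roughness value per HMC sample at the next opportunity to intervene; the function and argument names are illustrative.

```python
def prob_exceeds(roughness_samples, s_crit):
    """Monte Carlo estimate of P_{t+1}(S > S_crit): the fraction of
    predictive samples above the threshold."""
    return sum(s > s_crit for s in roughness_samples) / len(roughness_samples)


def expected_utility_do_nothing(roughness_samples, s_crit, u_exceed, u_ok=0.0):
    """Eq. (22): probability-weighted utilities of the two outcomes of
    inaction. u_ok defaults to 0, so the second term vanishes."""
    p_exceed = prob_exceeds(roughness_samples, s_crit)
    return p_exceed * u_exceed + (1.0 - p_exceed) * u_ok
```

For example, if half of the samples exceed the threshold and the utility of exceedance has magnitude 1, the function returns 0.5.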

When computing the utility of inspection at a given time step, there are again two outcomes, based on whether the surface roughness will exceed $ {S}_{\mathrm{crit}} $ , and the probabilities of these outcomes can again be estimated from the HMC samples. The EU of inspection is given in Equation (23). It includes the probability that the tool needs to be replaced, which comprises the probability that $ S $ is already greater than $ {S}_{\mathrm{crit}} $ at that time step and the probability that $ S $ is not yet greater than $ {S}_{\mathrm{crit}} $ but will be by the next time step. For the current linear model, this is equivalent to $ {P}_{t+1}\left(S>{S}_{\mathrm{crit}}\right) $ and can be evaluated as such.

(23)

$$ EU\left(D=\mathrm{inspection}\right)={C}_{\mathrm{tool}}\times {P}_{t+1}\left(S>{S}_{\mathrm{crit}}\right)+{C}_{\mathrm{inspection}} $$
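Equation (23) is similarly direct to compute. The sketch below also derives the exceedance probability at which inspection and inaction are valued equally; this break-even point rests on two assumptions that are not stated explicitly in the text: that the quantities are treated as cost magnitudes, so the smaller expected value is preferred, and that the workpiece cost has the same magnitude as $ U\left(S>{S}_{\mathrm{crit}}\right) $ . Function names are illustrative.

```python
def expected_utility_inspection(p_exceed, c_tool, c_inspection):
    """Eq. (23): expected replacement cost plus the fixed inspection cost."""
    return c_tool * p_exceed + c_inspection


def break_even_probability(c_workpiece, c_tool, c_inspection):
    """Exceedance probability p at which p * c_workpiece equals
    c_tool * p + c_inspection (requires c_workpiece > c_tool).

    Assumption: values are cost magnitudes, so inspection is preferred
    to inaction once p rises above this level.
    """
    return c_inspection / (c_workpiece - c_tool)
```

With the values later given in Equation (25), the break-even probability is 0.05 / 0.75, or roughly 0.067, which is consistent with inspections being avoided early in tool life when the probability of exceedance is small.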

The criterion required to trigger the replacement of a tool can be seen in Equation (24). It identifies the probability at which the risk associated with damaging the workpiece (the probability of damage occurring multiplied by the utility associated with it) becomes at least as great as the cost of replacing the tool.

(24)

$$ \begin{array}{rcl}P\left({T}_f<T\right)\times {C}_{\mathrm{workpiece}}& \ge & {C}_{\mathrm{tool}}\\ {}P\left({T}_f<T\right)& \ge & \frac{C_{\mathrm{tool}}}{C_{\mathrm{workpiece}}}\\ {}P\left({T}_f<T\right)& \ge & \alpha \end{array} $$

Equation (24) shows that tools should be replaced at the earliest time, $ T $ , at which $ P\left({T}_f<T\right) $ is equal to or greater than the ratio $ \alpha $ . Here, $ P\left({T}_f<T\right) $ is the assessment of time-to-failure after one has updated beliefs with the results of any inspections, where failure is defined as $ S $ exceeding $ {S}_{\mathrm{crit}} $ . Intuitively, as $ {C}_{\mathrm{tool}} $ increases, the probability requirement increases and replacements become more difficult to trigger; the system prioritizes extending tool life. As $ U\left(S>{S}_{\mathrm{crit}}\right) $ increases, the probability requirement is reduced and replacements are more easily triggered; the system prioritizes the surface quality of the workpiece.
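The replacement rule in Equation (24) amounts to a threshold test on a sequence of failure probabilities. A minimal sketch follows, assuming `failure_probs` holds the estimated $ P\left({T}_f<T\right) $ at each discrete time step; the names are illustrative.

```python
def replacement_threshold(c_tool, c_workpiece):
    """alpha in Eq. (24): ratio of tool cost to workpiece cost."""
    return c_tool / c_workpiece


def earliest_replacement_step(failure_probs, alpha):
    """Index of the first time step with P(T_f < T) >= alpha, or None
    if the threshold is never reached."""
    for step, p in enumerate(failure_probs):
        if p >= alpha:
            return step
    return None
```

Raising `c_tool` raises `alpha` and delays the returned step, while raising `c_workpiece` lowers `alpha` and brings it forward, matching the intuition described above.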

A complete decision-theoretic approach would consider the expected value of information associated with improving the model based on the new data from an inspection. For the current work, it is assumed that inspecting the tool would provide zero improvement to the predictions of the model as the full value of information calculation is very computationally expensive.

At every potential inspection point, the hierarchical model makes predictions about the surface roughness. These predictions are based on the currently available data and are used to inform the decision analysis detailed above. This process is illustrated in Figure 9.

Figure 9.

The decision-theoretic active-learning heuristic.
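One plausible reading of this heuristic, expressed as control flow, is sketched below. It is not taken from Figure 9: the ordering of the checks, the callbacks `predict_failure_prob` and `observe`, and the treatment of all values as cost magnitudes (smaller expected value preferred) are assumptions made for illustration only.

```python
def run_policy(n_steps, predict_failure_prob, observe,
               c_workpiece, c_tool, c_inspection):
    """Hypothetical online loop: inspect when Eq. (23) is cheaper than
    Eq. (22), then replace if Eq. (24) is satisfied.

    predict_failure_prob(step, observed) -> estimated P(S > S_crit) by the
        next step, given the labels acquired so far (stands in for the model).
    observe(step) -> noisy roughness measurement acquired by an inspection.
    Returns (list of inspection steps, replacement step or None).
    """
    alpha = c_tool / c_workpiece
    observed = {}
    inspections = []
    for step in range(n_steps):
        p = predict_failure_prob(step, observed)
        cost_inaction = p * c_workpiece            # Eq. (22), second term zero
        cost_inspect = c_tool * p + c_inspection   # Eq. (23)
        if cost_inspect < cost_inaction:
            observed[step] = observe(step)         # acquire a new label
            inspections.append(step)
            # Re-assess with the new label before deciding on replacement.
            if predict_failure_prob(step, observed) >= alpha:
                return inspections, step
    return inspections, None
```

In practice, `predict_failure_prob` would refit or update the hierarchical model with the newly acquired measurement, which is the step that makes the procedure an active learner.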

The online decision-theoretic AL approach for inspection planning presented in this work is compared to a conventional inspection plan featuring periodic inspections. With both approaches, the model has access to the full measurement history of four of the tools and very limited data for the rest of the tools (the authors found that, in this case, without access to at least four full tool datasets, the hierarchical model does not have enough data to draw on for accurate predictions).

In general, the parameters of the decision process, $ {S}_{\mathrm{crit}},{C}_{\mathrm{inspection}},{C}_{\mathrm{tool}},U\left(S>{S}_{\mathrm{crit}}\right) $ , can be elicited from an expert familiar with the inspection process. Here, the values are set by hand to reflect a situation common in industry, where the cost of damaging the workpiece is much greater than the cost of replacing the tool (it is worth noting that optimal actions are invariant under positive affine transformations of the utility function):

(25)

$$ \begin{array}{rcl}U\left(S>{S}_{\mathrm{crit}}\right)& =& 1\\ {}{C}_{\mathrm{tool}}& =& 0.25\\ {}{C}_{\mathrm{inspection}}& =& 0.05\\ {}{S}_{\mathrm{crit}}& =& 0.9\ \mu \mathrm{m}\end{array} $$
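As a worked instance of Equation (24) under these values, and assuming that $ {C}_{\mathrm{workpiece}} $ takes the same magnitude as $ U\left(S>{S}_{\mathrm{crit}}\right) $ (the text does not state this equality explicitly), the replacement threshold is

$$ \alpha =\frac{C_{\mathrm{tool}}}{C_{\mathrm{workpiece}}}=\frac{0.25}{1}=0.25 $$

so, under this assumption, a tool would be replaced at the first time step at which the estimated probability of failure reaches 25%.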

To assess the performance of these monitoring systems, each will be compared to a monitoring system with access to the full measurement history of every tool. This monitoring system is treated as the gold standard, as it represents the limiting case of using all possible information within the dataset; the point at which it decides to replace the tool can therefore be considered the optimal point of replacement. If the other monitoring systems choose to replace the tool later than the optimal point, it is assumed that the roughness has exceeded $ {S}_{\mathrm{crit}} $ (despite what the noisy measurements might suggest), and the workpiece is considered damaged, with a cost $ {C}_{\mathrm{workpiece}} $ . If the monitoring systems choose to replace the tool earlier than the optimal replacement time, then some portion of the tool life is wasted, the cost of which can be calculated using Equation (26). The optimal monitoring system can choose to replace the tool at any point throughout the life of the tool; that is, it can observe several measurements that are greater than $ {S}_{\mathrm{crit}} $ and then realize it should have replaced the tool before these measurements. The other monitoring systems operate in an online manner, so do not have this luxury. The combined costs of inspections and suboptimal replacements will be compared.

(26)

$$ {C}_{\mathrm{early}\ \mathrm{replacement}}=\frac{t_{\mathrm{optimal}\ \mathrm{replacement}}-{t}_{\mathrm{replacement}}}{t_{\mathrm{optimal}\ \mathrm{replacement}}}\times {C}_{\mathrm{tool}} $$

As mentioned previously, the first four tools are treated as historic data that the manufacturer has collected from previous tools. The hierarchical model can leverage these data to inform the population-level distributions. Inspections and tool replacements under the active-learning procedure are therefore not computed for these tools, only for Tools 5–7. During the original experiment, the measurements were taken at a time step corresponding to 6.02 km of sliding distance (Wickramarachchi, 2019). Figure 10 shows the optimal replacement for Tools 5–7 based on the “gold standard” model.

Figure 10.

Benchmark replacements determined using all available information.

4. Results

In this section, the decision-theoretic active learner described in Section 3 and a periodic approach to inspection planning are compared. The points at which these approaches suggest replacing Tools 5–7, based on the criterion in Equation (24), are compared to those of the “gold standard” model, for which the suggested tool replacements are shown in Figure 10.

Figure 11 shows the suggested tool replacements with periodic inspections. It can be seen that, compared to the optimal replacements in Figure 10, every tool is replaced one time step too early, with a total of 10 inspections. This result is to be expected because, when forecasting surface roughness with reduced data (compared to the “gold standard” model, which has access to all the information), the uncertainty is inflated, thus increasing the estimated risk associated with damage to the workpiece.

Figure 11.

Tool replacement with periodic inspections.

Figure 12 shows the replacements with risk-based inspection planning. Tools 5 and 6 are again replaced one time step before the fully observed case. Additionally, Tool 7 is replaced two time steps before the gold standard. The risk-based monitoring system used a total of five inspections. The number of inspections is reduced because the decision-theoretic approach will, in general, not suggest inspections early in the life of a tool, when the risk of $ S $ exceeding $ {S}_{\mathrm{crit}} $ is small. Most inspections will be triggered near the end of tool life, when the risk associated with damaging the workpiece is greatest. Again, because of the reduced access to information (compared to both the “gold standard” and the periodic inspections), inflated risk increases the likelihood of premature tool replacements.

Figure 12.

Tool replacement with risk-based inspections.

Table 2 collates the number of inspections and tool replacements. It should be noted that the unit “0.602 kms” refers to the sliding distance (how far the tool has cut) between each surface roughness measurement.

Table 2.

Inspection values

Table 3.

Costs of each inspection method

Table 3 compares the performance of the inspection approaches. The “cost of inspections” column is the number of inspections multiplied by the cost of an inspection. The “cost of wasted tool life” for each inspection approach can be seen in the first column of Table 3 and can be calculated using Equation (27).

(27)

$$ {C}_{\mathrm{wasted}\ \mathrm{tool}\ \mathrm{life}}={C}_{\mathrm{tool}}\times \frac{\mathrm{Discrepancy}}{\mathrm{Optimal}\ \mathrm{Replacement}} $$
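The cost accounting described here can be sketched as follows. The sketch assumes that "Discrepancy" in Equation (27) is the optimal replacement time minus the actual replacement time, and that a replacement later than optimal incurs the full workpiece cost, as described in Section 3.3. The function names are illustrative and the numbers in the usage lines are hypothetical, not the values in Tables 2 and 3.

```python
def wasted_tool_life_cost(t_replacement, t_optimal, c_tool):
    """Eq. (27): C_tool scaled by the fraction of tool life left unused."""
    discrepancy = t_optimal - t_replacement
    return c_tool * discrepancy / t_optimal


def monitoring_cost(n_inspections, c_inspection, replacements,
                    c_tool, c_workpiece):
    """Inspection cost plus the cost of each early or late replacement.

    replacements: list of (t_replacement, t_optimal) pairs, one per tool.
    """
    total = n_inspections * c_inspection
    for t_rep, t_opt in replacements:
        if t_rep > t_opt:
            total += c_workpiece  # late: workpiece is treated as damaged
        else:
            total += wasted_tool_life_cost(t_rep, t_opt, c_tool)
    return total


# Hypothetical example: one early, one optimal, and one late replacement.
example = monitoring_cost(
    n_inspections=2, c_inspection=0.05,
    replacements=[(9, 10), (10, 10), (11, 10)],
    c_tool=0.25, c_workpiece=1.0,
)
```

Comparing this total between two inspection policies gives a relative saving of the kind reported in the final column of Table 3.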

As can be seen in the final column of Table 3, using the parameters described in this paper, a risk-based approach to inspection planning reduced the costs associated with monitoring by 36.95%.

5. Discussion

The case study showcased in Section 3 highlights the effectiveness of a risk-based approach to inspection planning. The significant reduction in monitoring costs can be attributed to reducing the number of unnecessary inspections while avoiding damaging the workpiece at a similar rate to a periodic inspection process.

Deriving the equations of the expected utility of every action in the decision analysis is not always trivial. Even then, sometimes, simplifications are required if the user wishes to implement the decisions in real time, since some expected utility calculations induce large computational loads.

While choosing a hierarchical or multilevel model to model the data in Section 3 provides many benefits, there are also computational considerations when working with these models. When partially pooling data in this manner, the probability space that a Monte Carlo sampler is required to explore becomes higher dimensional; this can lead to increased computational costs and restrictions in the choice of prior probability distributions (because of complex posterior geometries, which can be difficult to explore). For online monitoring scenarios where decisions and actions are required instantaneously with data acquisition, increased computation times could be an issue. Additionally, the full value of information analysis was left out of this paper; including it would add to computation times.

6. Conclusion

A Bayesian multilevel model is used to model a population of machining tools and make predictions about how the tools degrade. The equations for the expected utility of inspecting and replacing the tools were derived to form an online decision-theoretic approach to inspection planning where tools are inspected in an active manner according to the risk.

The authors believe that using risk as a query measure for AL, rather than information measures, has a place in many engineering decision scenarios. While it can be difficult to formulate the equations presented in Section 3 without proper understanding of the decision problem, this work shows that a risk-based approach to inspection planning can lead to a large reduction in monitoring costs while maintaining comparable or, in some cases, improved performance when compared to other methods.

Data availability statement

Due to a patent on the tool monitoring system used to collect these data, the data cannot be made openly available. Requests for data will be considered on a case-by-case basis.

Acknowledgments

CTW would like to acknowledge Dr Wayne Leahy and the team at Element Six for their expertise, resources, and materials.

Author contribution

Conceptualization: D.C., A.H., T.R., N.D., and L.B.; Supervision: A.H., T.R., N.D., K.W.; Methodology: T.D., A.H., D.C., L.B., T.R., N.D.; Data curation: C.W., L.C.; Writing – original draft: D.C.; Writing – review and editing: D.C., A.H., K.W., N.D.

Funding statement

The authors gratefully acknowledge the support of the UK Engineering and Physical Sciences Research Council (EPSRC) via the ROSEHIPS project (Grant EP/W005816/1). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising.

Competing interest

The authors declare no competing interests exist.

Ethical standards

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

References

Aggarwal, CC, Kong, X, Gu, Q, Han, J and Philip, SY (2014) Active learning: A survey. In Aggarwal, CC (ed.), Data Classification. Boca Raton: Chapman and Hall/CRC, 599–634.

Bull, L, Worden, K, Manson, G and Dervilis, N (2018) Active learning for semi-supervised structural health monitoring. Journal of Sound and Vibration 437, 373–388.

Bull, LA, Di Francesco, D, Dhada, M, Steinert, O, Lindgren, T, Parlikad, AK, Duncan, AB and Girolami, M (2023) Hierarchical Bayesian modeling for knowledge transfer across engineering fleets via multitask learning. Computer-Aided Civil and Infrastructure Engineering 38(7), 821–848.

Bull, LA, Gardner, PA, Gosliga, J, Rogers, TJ, Dervilis, N, Cross, EJ, Papatheou, E, Maguire, AE, Campos, C and Worden, K (2021) Foundations of population-based SHM, Part I: Homogeneous populations and forms. Mechanical Systems and Signal Processing 148, 107141.

Bull, LA, Rogers, TJ, Wickramarachchi, C, Cross, EJ, Worden, K and Dervilis, N (2019a) Probabilistic active learning: An online framework for structural health monitoring. Mechanical Systems and Signal Processing 134, 106294.

Bull, LA, Worden, K, Rogers, TJ, Wickramarachchi, C, Cross, EJ, McLeay, T, Leahy, W and Dervilis, N (2019b) A probabilistic framework for online structural health monitoring: Active learning from machining data streams. Journal of Physics 1264, 012028.

Burbidge, R, Rowland, JJ and King, RD (2007) Active learning for regression based on query by committee. In Intelligent Data Engineering and Automated Learning-IDEAL 2007: 8th International Conference, December 16–19, Birmingham, UK. London: Springer, 209–218.

Cai, W, Zhang, Y and Zhou, J (2013) Maximizing expected model change for active learning in regression. In IEEE 13th International Conference on Data Mining. Piscataway: IEEE, 51–60.

Cohn, DA, Ghahramani, Z and Jordan, MI (1996) Active learning with statistical models. Journal of Artificial Intelligence Research 4, 129–145.

Dardeno, TA, Worden, K, Dervilis, N, Mills, RS and Bull, LA (2024) On the hierarchical Bayesian modelling of frequency response functions. Mechanical Systems and Signal Processing 208, 111072.

Di Fiore, F, Nardelli, M and Mainini, L (2024) Active learning and Bayesian optimization: A unified perspective to learn with a goal. Archives of Computational Methods in Engineering 31, 1–29.

Di Francesco, D, Chryssanthopoulos, M, Faber, MH and Bharadwaj, U (2021) Decision-theoretic inspection planning using imperfect and incomplete data. Data-Centric Engineering 2, e18.

Dodt, MB, Persoons, A, Faes, M and Moens, D (2022) Active learning in grey-box models for near-real-time online monitoring of dynamic processes. Online Proceedings ISMA2022-USD2022. Available at https://past.isma-isaac.be/downloads/isma2022/proceedings/Contribution_332_proceeding_3.pdf.

Farrar, CR and Worden, K (2012) Structural Health Monitoring: A Machine Learning Perspective. New York: John Wiley & Sons.

Fink, O, Wang, Q, Svensen, M, Dersin, P, Lee, W-J and Ducoffe, M (2020) Potential, challenges and future directions for deep learning in prognostics and health management applications. Engineering Applications of Artificial Intelligence 92, 103678.

Freund, Y, Seung, HS, Shamir, E and Tishby, N (1997) Selective sampling using the query by committee algorithm. Machine Learning 28, 133–168.

Fu, Y, Zhu, X and Li, B (2013) A survey on instance selection for active learning. Knowledge and Information Systems 35, 249–283.

Gardner, P, Bull, LA, Gosliga, J, Dervilis, N and Worden, K (2021) Foundations of population-based SHM, Part III: Heterogeneous populations–Mapping and transfer. Mechanical Systems and Signal Processing 149, 107142.

Gardner, P, Liu, X and Worden, K (2020) On the application of domain adaptation in structural health monitoring. Mechanical Systems and Signal Processing 138, 106550.

Gelman, A (2006) Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1(3), 515–534.

Gelman, A and Hill, J (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

Gosliga, J, Gardner, PA, Bull, LA, Dervilis, N and Worden, K (2021) Foundations of population-based SHM, Part II: Heterogeneous populations–Graphs, networks, and communities. Mechanical Systems and Signal Processing 148, 107144.

He, GY, Zhao, YX and Yan, CL (2023) Parameter estimation in multiaxial fatigue short crack growth model using hierarchical Bayesian linear regression. Fatigue & Fracture of Engineering Materials & Structures 46(3), 845–865.

Hoffman, MD and Gelman, A (2014) The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15(1), 1593–1623.

Huang, Y and Beck, J (2015) Hierarchical sparse Bayesian learning for structural health monitoring with incomplete modal data. International Journal for Uncertainty Quantification 5(2), 139–169.

Huang, Y, Beck, J and Li, H (2019) Multitask sparse Bayesian learning with applications in structural health monitoring. Computer-Aided Civil and Infrastructure Engineering 34(9), 732–754.

Hughes, AJ, Bull, LA, Gardner, P, Barthorpe, RJ, Dervilis, N and Worden, K (2022a) On risk-based active learning for structural health monitoring. Mechanical Systems and Signal Processing 167, 108569.

Hughes, AJ, Bull, LA, Gardner, P, Dervilis, N and Worden, K (2022b) On robust risk-based active-learning algorithms for enhanced decision support. Mechanical Systems and Signal Processing 181, 109502.

Kreft, IG and De Leeuw, J (1998) Introducing Multilevel Modeling. New York: Sage.

Kumar, P and Gupta, A (2020) Active learning query strategies for classification, regression, and clustering: A survey. Journal of Computer Science and Technology 35, 913–945.

Li, X, Jia, X-D, Zhang, W, Ma, H, Luo, Z and Li, X (2020) Intelligent cross-machine fault diagnosis approach with deep auto-encoder and domain adaptation. Neurocomputing 383, 235–247.

Morgenstern, O, Von Neumann, J, Kuhn, HW and Rubinstein, A (1964) Theory of Games and Economic Behavior. New York: John Wiley & Sons.

Murphy, KP (2012) Machine Learning: A Probabilistic Perspective. Cambridge: The MIT Press.

Phan, D, Pradhan, N and Jankowiak, M (2019) Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv preprint arXiv:1912.11554. Available at https://arxiv.org/abs/arXiv:1912.11554

Shahraki, AF, Yadav, OP and Liao, H (2017) A review on degradation modelling and its engineering applications. International Journal of Performability Engineering 13(3), 299.

Sharkawy, AB, El-Sharief, MA and Soliman, M-ES (2014) Surface roughness prediction in end milling process using intelligent systems. International Journal of Machine Learning and Cybernetics 5, 135–150.

Song, J, Wei, P, Valdebenito, MA, Faes, M and Beer, M (2022) Data-driven and active learning of variance-based sensitivity indices with bayesian probabilistic integration. Mechanical Systems and Signal Processing 163, 108106.

Tsialiamanis, G, Mylonas, C, Chatzi, E, Dervilis, N, Wagg, DJ and Worden, K (2021) Foundations of population-based SHM, Part IV: The geometry of spaces of structures and their feature spaces. Mechanical Systems and Signal Processing 157, 107692.

Wang, M, Min, F, Zhang, Z-H and Wu, Y-X (2017) Active learning through density clustering. Expert Systems with Applications 85, 305–317.

Wickramarachchi, C (2019) Automated Testing of Advanced Cutting Tool Materials. PhD thesis, University of Sheffield, Sheffield, UK.

Wu, D, Lin, C-T and Huang, J (2019) Active learning for regression using greedy sampling. Information Sciences 474, 90–105.

Xu, S and Noh, HY (2021) PhyMDAN: Physics-informed knowledge transfer between buildings for seismic damage diagnosis through adversarial learning. Mechanical Systems and Signal Processing 151, 107374.

Yu, H and Kim, S (2010) Passive sampling for regression. In IEEE International Conference on Data Mining. Piscataway: IEEE, 1151–1156.

Zhang, W, Peng, G, Li, C, Chen, Y and Zhang, Z (2017) A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors 17(2), 425.

Zhu, X, Zhang, P, Lin, X and Shi, Y (2007) Active learning from data streams. In Seventh IEEE International Conference on Data Mining (ICDM 2007). Piscataway: IEEE, 757–762.