
The transfer learning of uncertainty quantification for industrial plant fault diagnosis system design

Published online by Cambridge University Press: 13 December 2024

J. Blair*
Affiliation:
Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK; Data Science Department, National Physical Laboratory, Teddington, UK
O. Amin
Affiliation:
Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK; Data Science Department, National Physical Laboratory, Teddington, UK
B. D. Brown
Affiliation:
Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK
S. McArthur
Affiliation:
Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK
A. Forbes
Affiliation:
Data Science Department, National Physical Laboratory, Teddington, UK
B. Stephen
Affiliation:
Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK
*Corresponding author: J. Blair; E-mail: j.blair@strath.ac.uk

Abstract

The performance of, and confidence in, fault detection and diagnostic systems can be undermined by data pipelines that feature multiple compounding sources of uncertainty. These issues further inhibit the deployment of data-based analytics in industry, where variable data quality and lack of confidence in model outputs are already barriers to adoption. The methodology proposed in this paper supports trustworthy data pipeline design and transfers knowledge gained from one fully observed data pipeline to a similar, under-observed case. The transfer of uncertainties provides insight into uncertainty drivers without repeating the computational or cost overhead of fully redesigning the pipeline. A SHAP-based, human-readable explainable AI (XAI) framework was used to rank and explain the impact of each choice in a data pipeline, allowing positive and negative performance drivers to be decoupled and highly performing pipelines to be selected. This empirical approach is demonstrated in bearing fault classification case studies using well-understood open-source data.
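The SHAP ranking described in the abstract rests on Shapley values: each pipeline design choice is credited with its average marginal contribution to performance over all coalitions of the other choices. The sketch below computes exact Shapley values for a toy additive "error reduction" model; the stage names and gain figures are purely illustrative assumptions, not the paper's actual pipeline choices, and the paper itself uses the SHAP library rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution of each
    player over all coalitions of the remaining players."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(n):
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

# Hypothetical per-stage error reductions for an additive toy model;
# these names and numbers are illustrative only.
gains = {"denoising": 0.10, "feature_extraction": 0.25, "classifier": 0.05}
value = lambda coalition: sum(gains[p] for p in coalition)

phi = shapley_values(list(gains), value)
# Rank stages by absolute attribution, as a SHAP-style summary would.
ranking = sorted(phi, key=lambda p: abs(phi[p]), reverse=True)
```

For an additive value function the Shapley value of each stage recovers its individual gain, and the attributions sum to the value of the full coalition, which is the efficiency property that makes SHAP rankings interpretable.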

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. Illustration of the major stages and flow of data in a simplified industrial data acquisition pipeline.

Figure 2. Flowchart of the pipeline design, construction, explanation and transfer to new systems.

Table 1. Summary of pipeline design choices

Table 2. Summary of pipeline stages, choices and their rankings for source and target datasets (averaged over 5 runs)

Figure 3. Probability distribution of classification errors across the source and target datasets. The source dataset consists of 31,680 CW pipelines; many result in low, near-0% errors, and the distribution has a heavy tail at higher errors with a peak around 40%. The target dataset consists of 2,640 generated pipelines; many likewise result in low errors, with a heavy tail towards high errors and another peak near 30% error, similar to the source domain.

Table 3. Summary of “best” and “worst” pipeline choices per stage for source and target datasets

Figure 4. Histogram of classification errors from SHAP-recommended "best" pipeline design choices for the source and target domains. All chosen pipelines have very low classification errors, with maxima of 0.057% (source domain) and 0% (target domain).

Figure 5. Quantile-quantile plot of the source and target pipeline error distributions. The Pearson correlation coefficient between the two sets of quantiles is 0.97, showing a strong linear correlation between the quantiles of the two error distributions. As the two systems align well with the theoretical line, information gained from one system provides useful insight into the behaviour of the other; this effectively transfers the expected errors between the source and target systems.
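The quantile-quantile comparison in Figure 5 reduces to a simple numeric check: take matched quantiles of the two error samples and correlate them. The sketch below illustrates this with synthetic gamma-distributed stand-ins for the error samples; the distribution shapes and sample sizes are assumptions for demonstration, not the paper's fitted data.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-ins for the source and target pipeline error samples;
# the gamma parameters here are illustrative, not fitted to the paper's data.
source_errors = rng.gamma(shape=2.0, scale=8.0, size=31680)
target_errors = rng.gamma(shape=2.0, scale=8.0, size=2640)

# Matched quantiles of the two distributions: the numeric core of a Q-Q plot.
probs = np.linspace(0.01, 0.99, 99)
source_q = np.quantile(source_errors, probs)
target_q = np.quantile(target_errors, probs)

# Pearson correlation of the quantile pairs; values near 1 indicate the
# two error distributions share the same shape, supporting transfer.
r = np.corrcoef(source_q, target_q)[0, 1]
```

Because both synthetic samples are drawn from the same distribution, the quantile pairs fall close to the identity line and the correlation is near 1, mirroring the 0.97 reported for the source and target systems.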

Supplementary material: File

Blair et al. supplementary material (File, 142.3 KB)