
Enhancing prognostic model interpretability for advanced engine failure prediction using prognostic metrics and explainable AI

Published online by Cambridge University Press:  02 March 2026

R. Avsar*
Affiliation:
Faculty of Engineering and Natural Sciences, Istanbul Medeniyet University, Istanbul, Turkey

Abstract

Predictive maintenance in safety-critical systems like turbofan engines increasingly relies on machine learning (ML) models to estimate remaining useful life (RUL), but the ‘black box’ nature of these models hinders their adoption and trustworthiness. While traditional ex-ante prognostic metrics (e.g. monotonicity, trendability) are used to pre-screen sensor data, a systematic comparison against the post-hoc explanations of what a model actually learns is lacking. We explore the application of SHapley Additive exPlanations (SHAP) from explainable artificial intelligence (XAI) to investigate feature importance in engine failure prediction using the second dataset of the Commercial Modular Aero-Propulsion System Simulation (CMAPSS). The preprocessing pipeline includes z-score normalisation of sensor data and the calculation of a health index (HI) to quantify system degradation. A power-law fit is applied to the HI to capture the underlying trends of engine wear and failure progression. We use the normalised data to calculate prognostic feature-selection metrics: monotonicity, trendability and prognosability. Then, we train two ML models – a random forest (RF) regressor and a gradient boosting (GB) regressor – directly on the raw data to predict the RUL from the actual sensor readings. The SHAP values generated for both models are analysed to identify the features with the most significant impact on RUL predictions. By comparing the SHAP value distributions across models and prognostic predictors, we highlight feature robustness and their relative influence on engine degradation and failure prediction. This work provides insights into the interpretability of ML models in prognostics and enhances the understanding of sensor contributions to engine health monitoring. The results demonstrate the effectiveness of SHAP in elucidating feature importance, supporting the development of more transparent and reliable prognostic systems.
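The three prognostic feature-selection metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic degradation trajectories; the function names and exact formulas are common variants from the prognostics literature and are assumptions here, so they may differ in detail from the definitions used in the paper.

```python
# Illustrative sketch of monotonicity, trendability and prognosability
# computed over a toy set of per-engine degradation trajectories.
# Formulas are common literature variants, not necessarily the paper's.
import numpy as np

def monotonicity(trajectories):
    # Fraction imbalance of positive vs. negative first differences,
    # averaged over engines: 1 for strictly monotone, ~0 for pure noise.
    scores = []
    for x in trajectories:
        d = np.diff(x)
        scores.append(abs(np.sum(d > 0) - np.sum(d < 0)) / len(d))
    return float(np.mean(scores))

def trendability(trajectories):
    # Minimum absolute correlation over all engine pairs, after
    # resampling every trajectory to a common length.
    n = min(len(x) for x in trajectories)
    grid = np.linspace(0.0, 1.0, n)
    resampled = [np.interp(grid, np.linspace(0.0, 1.0, len(x)), x)
                 for x in trajectories]
    corrs = [abs(np.corrcoef(resampled[i], resampled[j])[0, 1])
             for i in range(len(resampled))
             for j in range(i + 1, len(resampled))]
    return float(np.min(corrs))

def prognosability(trajectories):
    # High when the spread of end-of-life values is small relative to
    # the overall degradation magnitude across engines.
    finals = np.array([x[-1] for x in trajectories])
    ranges = np.array([abs(x[0] - x[-1]) for x in trajectories])
    return float(np.exp(-np.std(finals) / np.mean(ranges)))

# Toy HI-like trajectories: noisy power-law decay, one per engine.
rng = np.random.default_rng(0)
units = [1.0 - np.linspace(0.0, 1.0, n) ** 2 + rng.normal(0.0, 0.01, n)
         for n in (180, 200, 220)]
print(monotonicity(units), trendability(units), prognosability(units))
```

All three scores lie in [0, 1] by construction, which is what allows them to be aggregated per sensor via the arithmetic or geometric mean.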

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Royal Aeronautical Society
Table 1. Dataset parameters

Figure 1. The raw sensor data for the first unit. Different colours represent the different sensors listed in Table 1.

Figure 2. Overlay of z-score normalised sensor data for all engines. Sensors 1, 5, 18 and 19 are not shown, or appear empty, because their variance is zero. Sensors 10 and 16 are highly scattered owing to excessive sensor noise.

Figure 3. Generation and smoothing of the HI for five sample units.

Figure 4. Final normalised HI for the same five units.

Table 2. Final hyperparameter values for ML models

Table 3. Performance metrics on the test set (with 95% bootstrap confidence intervals)

Figure 5. Monotonicity scores for each sensor, with mean, median, interquartile range and outliers.

Figure 6. Trendability scores for each sensor, with mean, median, interquartile range and outliers.

Figure 7. Degradation range scores for each sensor, with mean, median, interquartile range and outliers. This metric is a key component of the prognosability calculation (Section 2.2.3), where a high score is achieved when the variance in this range is small relative to the overall degradation magnitude.

Figure 8. Prognostic metric correlations. Points are colour-coded by group: the outlier sensors are red (Sensors 6, 10, 16), the top-performer sensors are blue (Sensors 4, 9, 11, 15), and the remaining sensors are grey.

Figure 9. Arithmetic and geometric means of the prognostic metric scores for each sensor. Sensors 10 and 16 have near-zero scores owing to excessive sensor noise.

Figure 10. SHAP summary plot for the RF model.

Figure 11. SHAP summary plot for the GB model.

Figure 12. SHAP dependence plot for Sensor 15 (BPR, bypass ratio), coloured by the value of Sensor 11 (Ps30, static pressure at HPC outlet, psia).

Figure 13. SHAP dependence plot for Sensor 15 (BPR, bypass ratio), coloured by the value of Sensor 13 (NRf, corrected fan speed, rpm).

Figure 14. SHAP force plot for a single prediction case study.

Figure 15. SHAP force plot for a second prediction case study.

Table 4. Feature and SHAP value comparison for case study instances

Figure 16. Z-score normalised sensor data for a single unit.

Figure 17. Correlation heatmap of the three prognostic metrics.