Explainable machine learning for predicting coronary heart disease risk in patients with carotid atherosclerosis: A retrospective study with SHAP and decision curve analysis

Lei Zhang; Mengke Lyu; Mingyuan Du; Yizhuo Li; Haifeng Yan; Xiaohui Li; Wenshuang Niu; Lizhi Pang

doi:10.1017/cts.2026.10722

Published online by Cambridge University Press: 06 March 2026

Lei Zhang: Affiliation:
Heart Center, The First Affiliated Hospital of Henan University of Chinese Medicine; National Regional (TCM) Cardiovascular Diagnosis and Treatment Center, China; Collaborative Innovation Center of Prevention and Treatment of Major Diseases by Chinese and Western Medicine, China; The First Affiliated Hospital of Henan University of Chinese Medicine, China
Mengke Lyu*: Affiliation:
The First Affiliated Hospital of Henan University of Chinese Medicine, China
Mingyuan Du: Affiliation:
Heart Center, The First Affiliated Hospital of Henan University of Chinese Medicine; National Regional (TCM) Cardiovascular Diagnosis and Treatment Center, China; Collaborative Innovation Center of Prevention and Treatment of Major Diseases by Chinese and Western Medicine, China; The First Affiliated Hospital of Henan University of Chinese Medicine, China
Yizhuo Li: Affiliation:
Heart Center, The First Affiliated Hospital of Henan University of Chinese Medicine; National Regional (TCM) Cardiovascular Diagnosis and Treatment Center, China; Collaborative Innovation Center of Prevention and Treatment of Major Diseases by Chinese and Western Medicine, China; The First Affiliated Hospital of Henan University of Chinese Medicine, China
Haifeng Yan: Affiliation:
Heart Center, The First Affiliated Hospital of Henan University of Chinese Medicine; National Regional (TCM) Cardiovascular Diagnosis and Treatment Center, China; Collaborative Innovation Center of Prevention and Treatment of Major Diseases by Chinese and Western Medicine, China; The First Affiliated Hospital of Henan University of Chinese Medicine, China
Xiaohui Li: Affiliation:
Heart Center, The First Affiliated Hospital of Henan University of Chinese Medicine; National Regional (TCM) Cardiovascular Diagnosis and Treatment Center, China; Collaborative Innovation Center of Prevention and Treatment of Major Diseases by Chinese and Western Medicine, China; The First Affiliated Hospital of Henan University of Chinese Medicine, China
Wenshuang Niu: Affiliation:
The Fifth Clinical Medical College of Henan University of Traditional Chinese Medicine, China
Lizhi Pang: Affiliation:
The First Clinical Medical College of Zhengzhou University, China
*Corresponding author: M. Lyu; Email: skylmk@126.com

Abstract

Background:

Carotid atherosclerosis is associated with increased coronary heart disease (CHD) risk, yet current risk models lack specificity and interpretability for this population. This study aimed to develop explainable machine learning (ML) models to predict CHD in these patients.

Methods:

We retrospectively analyzed 487 patients with carotid atherosclerosis (191 CHD, 296 non-CHD) evaluated between January 2022 and July 2025. Thirty-eight variables were collected, including demographic, clinical, and biochemical indicators. LASSO regression identified six key predictors. Seven ML models were trained and evaluated using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (PRC-AUC), calibration curves, and decision curve analysis (DCA). SHAP was applied to interpret the best-performing model.

Results:

The logistic regression model achieved the highest test-set performance (AUC = 0.827; PRC-AUC = 0.752), with strong generalizability and calibration. SHAP analysis identified age and diastolic blood pressure as the most influential features, consistent with the model coefficients. DCA demonstrated a consistently favorable clinical net benefit for the logistic regression model across probability thresholds.

Conclusion:

A six-variable logistic model provides accurate and interpretable CHD risk prediction in patients with carotid atherosclerosis. Its transparency and clinical utility support its integration into personalized risk management.

Keywords

Carotid atherosclerosis; coronary heart disease; machine learning; risk prediction; SHAP

Information

Type: Research Article
Information: Journal of Clinical and Translational Science, Volume 10, Issue 1, 2026, e57

DOI: https://doi.org/10.1017/cts.2026.10722
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2026. Published by Cambridge University Press on behalf of Association for Clinical and Translational Science

Introduction

Coronary heart disease (CHD) remains a prominent global cause of mortality and morbidity, responsible for approximately 16% of all deaths worldwide [Reference Roth, Abate and Abate1]. Its pathogenesis is intricately linked to the development of atherosclerotic plaques and vascular inflammation, with risk factors such as hypertension, dyslipidemia, and diabetes exacerbating disease progression [Reference Ley2]. Importantly, carotid atherosclerosis, characterized by intima-media thickening or plaque formation in the carotid arteries, shares a common pathophysiological basis with CHD, including endothelial dysfunction and lipid accumulation [Reference Cuomo3,Reference Pitchika, Markus and Schipf4]. Individuals with carotid atherosclerosis demonstrate a 2–3-fold higher risk of CHD development compared to the general population, underscoring the importance of targeted risk assessment and early intervention in this high-priority cohort [Reference Shimoda, Kitamura and Imano5,Reference O’Leary, Polak, Kronmal, Manolio, Burke and Wolfson6].

CHD risk assessment tools such as the Framingham Risk Score and SCORE2 rely predominantly on traditional risk factors, including age, blood pressure, and smoking status [Reference D’Agostino, Vasan and Pencina7]. However, these tools often fail to account for disease-specific markers and complex interactions among variables, which reduces their predictive accuracy, particularly in specialized cohorts such as individuals with carotid atherosclerosis. In this population, factors like coagulation function (e.g., thrombin time) and renal function indices (e.g., creatinine) may hold significant predictive value [Reference Chen, Zelnick and Huber8]. Moreover, the rising prevalence of overweight and metabolic syndrome underscores the need for comprehensive risk evaluation tools that integrate diverse data dimensions, spanning anthropometric measurements, biochemical parameters, and clinical history [Reference Calle, Thun, Petrelli, Rodriguez and Heath9,Reference Gajalakshmi, Lacey, Kanimozhi, Sherliker, Peto and Lewington10].

In recent years, machine learning (ML) has emerged as a potent tool for enhancing predictive accuracy in clinical data by capturing intricate nonlinear relationships and high-dimensional interactions [Reference Banerjee, Chen and Fatemifar11]. Studies have demonstrated the superior performance of ML models like XGBoost and Random Forest over traditional statistical methods in predicting CHD risk [Reference El-Wahab12]. Nevertheless, a significant obstacle to the widespread adoption of ML in clinical practice is the opaque nature of many sophisticated algorithms, which hinders transparency and trust among healthcare providers [Reference Nizette, Hammedi, Van Riel and Steils13]. Explainable AI (XAI) methodologies, such as SHAP, mitigate this issue by quantifying the impact of individual features on predictions, thereby improving the interpretability of models [Reference Lundberg and Lee14].

Despite these advances, three primary limitations persist in current research: (1) most ML models for predicting CHD target the general population, and data on individuals with carotid atherosclerosis remain insufficient; (2) few studies systematically compare the effectiveness of different ML algorithms with a focus on clinical applicability, such as net benefit assessment through decision curve analysis; (3) feature selection often lacks rigor, resulting in overfitting or the inclusion of extraneous variables.

This retrospective single-center study aimed to address these gaps by developing and validating ML models to predict CHD risk in patients with carotid atherosclerosis. The study used 38 candidate variables encompassing demographic, clinical, and biochemical features. Key predictive factors were identified through LASSO regression to enhance model parsimony and generalizability. The study compared the performance of seven ML algorithms (logistic regression, Decision Tree, Random Forest, KNN, XGBoost, LightGBM, and Stacking) using metrics such as the area under the receiver operating characteristic curve (AUC), precision-recall analysis, and calibration. Additionally, interpretability was improved through SHAP (SHapley Additive exPlanations) analysis, and clinical utility was evaluated via decision curve analysis to facilitate practical application.

This study aims to enhance the identification of high-risk CHD patients with carotid atherosclerosis by combining explainability and predictive accuracy. The goal is to offer clinicians a dependable tool for tailoring prevention and management strategies to individual patients.

Study data

This retrospective cross-sectional study examined 487 adult patients aged 18–80 diagnosed with carotid atherosclerosis using ultrasound or computed tomography angiography (CTA) at the First Affiliated Hospital of Henan University of Chinese Medicine between January 1, 2022, and July 20, 2025. Carotid plaque was defined as focal wall thickening exceeding 50% of the surrounding intima-media thickness or carotid intima-media thickness of 1.5 mm or more. Carotid atherosclerosis was characterized by intima-media thickening of 1.0 mm or greater or the presence of atherosclerotic plaque in one or both carotid arteries.
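The imaging definitions above can be read as a simple decision rule. The sketch below is illustrative only: it assumes one reading of the plaque criterion (focal thickness more than 1.5 times the surrounding IMT, or focal thickness of at least 1.5 mm), and the function names and inputs are hypothetical rather than taken from the study's records.

```python
def classify_artery(imt_mm, focal_thickness_mm=None):
    """Apply the text's thresholds to one carotid artery (assumed reading of the plaque rule)."""
    plaque = focal_thickness_mm is not None and (
        focal_thickness_mm > 1.5 * imt_mm      # exceeds surrounding IMT by more than 50%
        or focal_thickness_mm >= 1.5           # or focal thickness of 1.5 mm or more
    )
    # Carotid atherosclerosis: IMT thickening >= 1.0 mm or any plaque
    atherosclerosis = plaque or imt_mm >= 1.0
    return plaque, atherosclerosis


def has_carotid_atherosclerosis(arteries):
    """arteries: (imt_mm, focal_thickness_mm) per side; one positive artery is sufficient."""
    return any(classify_artery(imt, focal)[1] for imt, focal in arteries)


# Hypothetical patient: normal left artery, thickened right artery
print(has_carotid_atherosclerosis([(0.7, None), (1.2, None)]))
```

Because either side qualifies, the patient-level rule is an `any` over both arteries, matching the "one or both carotid arteries" wording.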

Participants were categorized into two cohorts according to their clinical diagnosis of CHD: the CHD group (n = 191) and the non-CHD group (n = 296). CHD was diagnosed according to established criteria outlined in the 2023 ESC Guidelines for the Management of Acute Coronary Syndromes and encompassed stable angina or acute coronary syndrome. All participants underwent a thorough evaluation, including demographic and laboratory assessments. Although ultrasound and CTA were used to confirm the diagnosis of carotid atherosclerosis, detailed imaging-derived metrics (e.g., intima-media thickness [IMT], plaque characteristics) were not included as predictor variables because of inconsistent availability in the electronic medical records.

Exclusion criteria included congenital heart disease, non-atherosclerotic cardiac conditions such as myocarditis, active malignancy, severe psychiatric disorders, pregnancy or lactation, and participation in other clinical trials within the past 3 months. This retrospective study, approved on April 1, 2024 by the Ethics Committee of The First Affiliated Hospital of Henan University of Chinese Medicine (approval number: 2024HL-159-01), adhered to the principles of the Declaration of Helsinki. Given the retrospective design and use of anonymized clinical data, the requirement for informed consent was waived by the ethics committee. All patient information was anonymized and de-identified before analysis to protect patient privacy in accordance with national health research regulations.

Data collection and feature variables

Clinical data were obtained from the hospital’s electronic medical records and laboratory information systems and meticulously reviewed by two trained research physicians using a standardized case report form (CRF). The dataset included 38 candidate feature variables, classified as follows:

Demographic and general information included age, gender, height, weight, body mass index (BMI), education level (junior high school or below, high school and above), household annual income (<80,000 RMB, ≥80,000 RMB), occupation (mental workers, physical workers), payment type (basic medical care, urban medical care), marital status (married, separated), body temperature, respiratory rate, heart rate, systolic blood pressure, and diastolic blood pressure.

Medical history included hypertension, diabetes mellitus, hyperlipidemia, hyperhomocysteinemia, prior transient ischemic attack (TIA), current smoking, and alcohol consumption.

Hematological and biochemical parameters, including red blood cell count (RBC), white blood cell count (WBC), platelet count (PLT), hemoglobin concentration (HGB), total cholesterol (TC), triglycerides (TG), low-density lipoprotein (LDL), high-density lipoprotein (HDL), prothrombin time (PT), fibrinogen content (FIB), activated partial thromboplastin time (APTT), thrombin time (TT), glycated hemoglobin (HbA1c), creatinine (Cr), uric acid (UA), and homocysteine (Hcy), were assessed.

To ensure data accuracy, all variables underwent validation by cross-referencing with primary reports. Missing or outlier values were addressed following predefined quality control protocols, such as mean imputation for moderately missing data and exclusion for variables with high rates of missing values.

Outcome definition

The main outcome assessed in this study was the occurrence of CHD, which was characterized by a history of myocardial infarction, coronary artery stenosis of ≥50% as verified by angiography, or a clinical diagnosis of angina pectoris substantiated by electrocardiographic and biomarker findings.

Methods

Data collection and preprocessing

All patients entered the cohort at the time of their index clinical evaluation for carotid atherosclerosis, defined as the first documented carotid ultrasound or CTA performed between January 1, 2022 and July 20, 2025 at the First Affiliated Hospital of Henan University of Chinese Medicine. Demographic information, laboratory measurements, vital signs, and medical history variables were all obtained at this same index encounter, ensuring that all predictors reflected baseline status. Medical history conditions (e.g., hypertension, diabetes, hyperlipidemia, prior stroke) were extracted from structured electronic medical record fields documented on or before the index date. These fields are internally mapped to ICD-10 diagnostic terminology (e.g., I10 for hypertension, E11 for diabetes, E78.5 for dyslipidemia, I63 for ischemic stroke), ensuring standardized and reproducible extraction.
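A minimal sketch of this extraction step, assuming each patient's structured diagnoses are available as (ICD-10 code, diagnosis date) pairs. The prefix map reuses the ICD-10 examples cited above; the function name, data layout, and prefix-matching rule are illustrative assumptions, not the hospital's actual mapping.

```python
from datetime import date

# Hypothetical prefix map built from the ICD-10 examples in the text
ICD10_PREFIXES = {
    "hypertension": ("I10",),
    "diabetes": ("E11",),
    "dyslipidemia": ("E78.5",),
    "ischemic_stroke": ("I63",),
}


def history_flags(diagnoses, index_date):
    """Return 0/1 history flags using only diagnoses dated on or before the index date."""
    flags = {name: 0 for name in ICD10_PREFIXES}
    for code, dx_date in diagnoses:
        if dx_date > index_date:  # post-index diagnoses are ignored to avoid leakage
            continue
        for name, prefixes in ICD10_PREFIXES.items():
            if code.upper().startswith(prefixes):  # e.g. "I63.9" matches "I63"
                flags[name] = 1
    return flags


records = [("I10", date(2021, 5, 1)), ("E11.9", date(2024, 1, 1)), ("I63.9", date(2023, 1, 1))]
print(history_flags(records, index_date=date(2023, 6, 1)))
```

Here the diabetes code is dated after the index visit, so it is not counted, which mirrors the "on or before the index date" rule.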

Because this study followed a cross-sectional diagnostic design, the target outcome – presence or absence of CHD – represented each patient’s clinical status at the index date. No longitudinal follow-up, temporal prediction, or post-index data were used.

A total of 38 clinical variables – including demographics, medical history, lifestyle factors, laboratory indicators, and imaging-confirmation results – were retrieved from electronic medical records. Ultrasound and CTA findings were used solely to confirm the diagnosis of carotid atherosclerosis and were not incorporated as predictor variables due to inconsistent availability of detailed imaging metrics. Continuous variables were standardized using z-score normalization, and missing values were handled according to predefined rules. To prevent information leakage, all preprocessing steps were fitted only on the training set and applied unchanged to the validation and test sets.
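The leakage-safe preprocessing described above can be sketched with a scikit-learn `Pipeline`. This assumes mean imputation (as stated in the data-collection section) followed by z-scoring; the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imputation and z-scoring are fitted on training rows only,
# then applied unchanged to validation/test rows.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # mean imputation of missing values
    ("scale", StandardScaler()),                 # z-score normalization
])

# Hypothetical columns: age (years), diastolic blood pressure (mmHg)
X_train = np.array([[60.0, 80.0], [70.0, np.nan], [65.0, 90.0]])
X_test = np.array([[np.nan, 100.0]])

X_train_z = preprocess.fit_transform(X_train)  # learns means and SDs from training data
X_test_z = preprocess.transform(X_test)        # reuses the training statistics
print(X_test_z)
```

The missing test-set age is filled with the training mean (65), so it maps to a z-score of 0; no test value influences the fitted statistics.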

To ensure methodological rigor and reproducibility, the modeling workflow followed a strictly sequential pipeline (Figure 1). After preprocessing, the dataset was randomly split into training (70%), validation (15%), and test (15%) subsets through stratified sampling to preserve outcome prevalence. Feature selection was restricted to the training set, beginning with univariate screening followed by LASSO regression with 10-fold cross-validation. Hyperparameters for all ML models were tuned using five-fold cross-validation within the training set. The validation set was used exclusively for probability threshold optimization via the Youden index, and final model performance was evaluated on the independent test set.
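A 70/15/15 stratified split can be obtained with two calls to `train_test_split`, first holding out 30% and then halving it. This is a sketch assuming that two-step approach; the toy labels reproduce only the cohort's class counts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_STATE = 2024


def split_70_15_15(X, y, seed=RANDOM_STATE):
    # Step 1: hold out 30% of patients, stratified on the CHD label
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    # Step 2: split the held-out 30% in half, giving 15% validation and 15% test
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Toy labels with the cohort's class counts (191 CHD, 296 non-CHD)
y = np.array([1] * 191 + [0] * 296)
X = np.arange(len(y)).reshape(-1, 1)
parts = split_70_15_15(X, y)
print([len(p) for p in parts[3:]])  # [340, 73, 74]
```

Stratification keeps the CHD prevalence (about 39%) close to the same value in each subset.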

Figure 1.

Flowchart of the study design and modeling pipeline. The pipeline included sequential steps: dataset partitioning (training/validation/test), training-set–based preprocessing, training-set–based feature selection, model training with cross-validation, validation-set threshold optimization, final evaluation on the independent test set, and SHAP/DCA analyses.

Dataset partitioning and class imbalance management

To enhance reproducibility and mitigate sampling bias, the dataset was split by stratified random sampling into training (70%), validation (15%), and test (15%) subsets with a fixed random seed (random_state = 2024). To address the moderate class imbalance (CHD:non-CHD ≈ 1:1.55), models that support class weighting, such as logistic regression, Decision Tree, and Random Forest, were trained with inverse class weighting (class_weight = “balanced”). For the gradient boosting models XGBoost and LightGBM, scale_pos_weight was set to approximately 1.55 (296/191 ≈ 1.55).
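The two weighting schemes can be computed directly. This sketch uses the full-cohort counts for illustration; in practice the weights would be derived from the training labels, whose class ratio is approximately preserved by stratification.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([1] * 191 + [0] * 296)  # 191 CHD, 296 non-CHD

# class_weight="balanced" assigns n_samples / (n_classes * count_c) to each class c
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

# Gradient-boosting analogue: ratio of negatives to positives
scale_pos_weight = (y == 0).sum() / (y == 1).sum()

print(np.round(weights, 3), round(scale_pos_weight, 2))
```

With balanced weights, each class contributes the same total weight to the loss (296 × 0.823 ≈ 191 × 1.275), while `scale_pos_weight` achieves a similar effect by up-weighting only the positive class.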

Model construction and hyperparameter optimization

Seven ML algorithms were developed: logistic regression, decision tree, Random Forest, K-nearest neighbors (KNN), XGBoost, LightGBM, and a stacking ensemble. Hyperparameters were optimized via five-fold cross-validation performed exclusively within the training set, with AUC as the primary selection metric (Supplementary Table 1).
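A sketch of AUC-driven tuning with five-fold cross-validation on the training data, shown for the logistic regression model only. The synthetic data and the grid over `C` are hypothetical stand-ins; the study's actual search spaces are listed in Supplementary Table 1.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for six standardized predictors (hypothetical data)
X_train, y_train = make_classification(
    n_samples=340, n_features=6, n_informative=4, weights=[0.61], random_state=2024)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical regularization grid

search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid,
    scoring="roc_auc",  # AUC as the primary selection metric
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=2024),
)
search.fit(X_train, y_train)  # training data only; validation/test sets untouched
print(search.best_params_, round(search.best_score_, 3))
```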

The stacking ensemble incorporated the six individual models as base learners and logistic regression as the meta-learner. Out-of-fold predictions generated during cross-validation were used to train the meta-learner, ensuring that no information from the validation or test sets influenced its training.
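scikit-learn's `StackingClassifier` implements this out-of-fold scheme: it fits the meta-learner on cross-validated predictions of the base learners. The sketch below uses only scikit-learn base learners (XGBoost and LightGBM are omitted to avoid extra dependencies) and synthetic data; hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = make_classification(
    n_samples=340, n_features=6, n_informative=4, random_state=2024)

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3, random_state=2024)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=2024)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    # XGBoost and LightGBM base learners would be appended here
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    # Meta-features are out-of-fold predicted probabilities from these folds
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=2024),
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)
risk = stack.predict_proba(X_train[:5])[:, 1]  # predicted CHD probability
print(risk.round(3))
```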

All analyses were conducted in Python using scikit-learn, xgboost, and lightgbm libraries, with fixed random seeds to ensure reproducibility.

Feature selection

The 38 candidate predictors were chosen based on established coronary heart disease (CHD) risk factors reported in major cardiovascular guidelines (e.g., ESC 2023; ACC/AHA 2019) and prior literature, supplemented by expert clinical input from cardiologists and neurologists at our institution. Univariate analyses were conducted within the training dataset to identify features with significant differences between the CHD and non-CHD groups.
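The article does not name the univariate tests or the significance cutoff. The sketch below assumes Welch's t-test for continuous variables, a chi-square test for categorical ones, and p < 0.05; the toy data frame is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, ttest_ind


def univariate_screen(df, outcome, continuous, categorical, alpha=0.05):
    """Return predictors whose CHD vs non-CHD difference has p < alpha (assumed tests)."""
    chd = df[outcome] == 1
    selected = []
    for col in continuous:
        p = ttest_ind(df.loc[chd, col], df.loc[~chd, col], equal_var=False).pvalue
        if p < alpha:
            selected.append(col)
    for col in categorical:
        p = chi2_contingency(pd.crosstab(df[col], df[outcome]))[1]
        if p < alpha:
            selected.append(col)
    return selected


# Hypothetical training data: 100 non-CHD rows followed by 100 CHD rows
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "chd": [0] * 100 + [1] * 100,
    "age": np.concatenate([60 + rng.normal(0, 5, 100), 68 + rng.normal(0, 5, 100)]),
    "noise": np.concatenate([np.arange(100.0), np.arange(100.0)]),  # identical groups
    "smoker": [1] * 20 + [0] * 80 + [1] * 60 + [0] * 40,
    "gender": [1, 0] * 100,                                          # 50% in both groups
})
print(univariate_screen(df, "chd", ["age", "noise"], ["smoker", "gender"]))
```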

To reduce dimensionality and avoid overfitting, LASSO regression with 10-fold cross-validation was applied to the variables retained from univariate screening. This procedure identified six key predictors used for final model development. No feature selection steps involved the validation or test datasets to avoid information leakage.
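A sketch of this selection step, assuming (consistent with the mean-squared-error criterion reported in the Results) that a linear LASSO was fitted to the 0/1 outcome with scikit-learn's `LassoCV`. The function name and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler


def lasso_select(X_train, y_train, feature_names, seed=2024):
    """Keep features with non-zero coefficients at the CV-optimal alpha (training data only)."""
    X_z = StandardScaler().fit_transform(X_train)  # common scale so the penalty is fair
    lasso = LassoCV(cv=KFold(n_splits=10, shuffle=True, random_state=seed))
    lasso.fit(X_z, y_train)  # alpha chosen by minimum cross-validated MSE
    keep = [name for name, coef in zip(feature_names, lasso.coef_) if coef != 0]
    return keep, lasso.alpha_


# Synthetic example: only x0 and x1 drive the outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (2 * X[:, 0] - 2 * X[:, 1] + rng.logistic(size=400) > 0).astype(int)
names = [f"x{i}" for i in range(10)]
keep, alpha = lasso_select(X, y, names)
print(keep, round(alpha, 4))
```

A penalized logistic model (L1-regularized `LogisticRegressionCV`) would be the likelihood-based alternative; the linear version is shown only because it matches the reported MSE criterion.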

Model evaluation metrics

The model’s performance was thoroughly assessed across training, validation, and test datasets utilizing various metrics. Discrimination metrics included AUC, accuracy, sensitivity, specificity, and F1 score. Calibration was evaluated through Brier scores and calibration curves employing 10 equally populated bins. Precision-Recall (PR) analysis assessed model resilience to class imbalance by calculating average precision (AP). Clinical utility was determined through decision curve analysis (DCA) spanning threshold probabilities from 0.01 to 0.8.
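The threshold-free metrics named above map onto standard scikit-learn functions. This sketch computes them on hypothetical predictions; "10 equally populated bins" corresponds to `strategy="quantile"` in `calibration_curve`.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score


def probability_metrics(y_true, y_prob):
    auc = roc_auc_score(y_true, y_prob)
    ap = average_precision_score(y_true, y_prob)  # step-wise summary of the PR curve
    brier = brier_score_loss(y_true, y_prob)      # mean squared error of probabilities
    # Calibration curve with 10 equally populated (quantile) bins
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10, strategy="quantile")
    return {"auc": auc, "ap": ap, "brier": brier, "calibration": (mean_pred, frac_pos)}


# Hypothetical, well-calibrated predictions
rng = np.random.default_rng(2024)
y_prob = rng.uniform(size=200)
y_true = (rng.uniform(size=200) < y_prob).astype(int)
m = probability_metrics(y_true, y_prob)
print({k: round(v, 3) for k, v in m.items() if k != "calibration"})
```

Average precision is closely related to, but not identical with, a trapezoidal area under the PR curve.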

Custom metric functions were written to calculate specificity (which scikit-learn does not provide as a dedicated function), and visualizations were generated with matplotlib, seaborn, and scikit-learn.

Classification metrics that depend on a probability cutoff (e.g., accuracy, sensitivity, specificity, and F1 score) were calculated using thresholds optimized from the validation set rather than the default 0.5 threshold, in order to avoid information leakage from the test set.
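Once a cutoff has been fixed on the validation set, the threshold-dependent metrics follow from the confusion matrix. This sketch includes the custom specificity calculation; the function name and toy values are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def threshold_metrics(y_true, y_prob, threshold):
    """Classification metrics at a cutoff chosen beforehand on the validation set."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)  # recall of the negative (non-CHD) class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    accuracy = (tp + tn) / len(y_pred)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


print(threshold_metrics([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9], threshold=0.5))
```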

Model interpretability via SHAP

To improve transparency and facilitate clinical interpretation, the SHAP method was employed on the optimal Logistic Regression model. A LinearExplainer was instantiated with the training data, and SHAP values were computed on the test set. Various visual representations were produced, such as Summary plots (bar and beeswarm) and Waterfall plots to elucidate individualized prediction breakdowns.

These visualizations clarified both the global and the local (patient-level) effects of features on predicted CHD risk.
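For a logistic regression with independent features, SHAP values in log-odds units have a closed form, φⱼ = βⱼ(xⱼ − E[xⱼ]), with the expectation taken over the background (training) data. In recent shap releases, `shap.LinearExplainer(model, X_train)` followed by `explainer(X_test)` returns values of this form, which feed the bar, beeswarm, and waterfall plots. The sketch below computes them directly with numpy on synthetic data so that the additivity property is visible; it is an illustration, not the study's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def linear_shap(model, X_background, X_explain):
    """Log-odds SHAP values of a fitted logistic regression, assuming independent features."""
    coef = model.coef_.ravel()
    mean_x = X_background.mean(axis=0)
    phi = (X_explain - mean_x) * coef                 # per-feature contributions
    base_value = model.intercept_[0] + mean_x @ coef  # expected log-odds
    return phi, base_value


X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=2024)
X_train, X_test = X[:200], X[200:]
model = LogisticRegression(max_iter=1000).fit(X_train, y[:200])
phi, base = linear_shap(model, X_train, X_test)

# Local accuracy: base value plus contributions reproduces each patient's log-odds
assert np.allclose(base + phi.sum(axis=1), model.decision_function(X_test))

# Global importance (summary bar plot): mean |SHAP| per feature
global_importance = np.abs(phi).mean(axis=0)
print(np.argsort(global_importance)[::-1])  # features ranked by importance
```

Because each φⱼ scales with |βⱼ|, the mean-|SHAP| ranking of standardized features tends to track the coefficient magnitudes, which is consistent with the agreement between SHAP importance and model coefficients reported in the Results.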

DCA

Decision curve analysis was performed for all models to assess net clinical benefit across probability thresholds. Threshold-specific net benefit values were computed, and reference lines for the “treat all” and “treat none” strategies were included to place each model’s utility in a clinical decision-making context. The decision curve results were exported to Excel to ensure transparency and reproducibility.
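Net benefit at threshold pₜ is commonly defined as TP/n − FP/n × pₜ/(1 − pₜ), with "treat all" equal to prevalence − (1 − prevalence) × pₜ/(1 − pₜ) and "treat none" equal to 0. A minimal sketch over the 0.01 to 0.80 range described in the Methods follows; the function names are hypothetical. Writing the resulting table with `DataFrame.to_excel` would additionally require an Excel engine such as openpyxl.

```python
import numpy as np
import pandas as pd


def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob) >= pt
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * pt / (1 - pt)


def decision_curve(y_true, y_prob, thresholds=np.linspace(0.01, 0.80, 80)):
    prevalence = np.mean(y_true)
    rows = [{
        "threshold": pt,
        "model": net_benefit(y_true, y_prob, pt),
        "treat_all": prevalence - (1 - prevalence) * pt / (1 - pt),
        "treat_none": 0.0,
    } for pt in thresholds]
    return pd.DataFrame(rows)


curve = decision_curve([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4])
print(curve.head())
```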

Software and reproducibility

Analyses were performed with Python 3.13 on Windows 11. Key packages included pandas, numpy, scikit-learn, xgboost, lightgbm, matplotlib, and shap. Random processes were controlled with a fixed seed (random_state = 2024), and model training protocols were version-controlled to ensure reproducibility.

Sample size and model deployment

The minimum sample size for predictive modeling was determined following guidelines for developing clinical prediction models, considering events per variable (EPV), precision of discrimination estimates, and calibration requirements [Reference Riley, Ensor and Snell15–Reference Collins, Reitsma, Altman and Moons17]. To ensure model stability and prevent overfitting in predicting CHD, we targeted at least 10–20 outcome events per predictor variable [Reference van Smeden, de Groot and Moons18]. With the six variables retained after LASSO regularization, this corresponds to a minimum of 60–120 CHD events; the cohort's 191 CHD cases exceed this range (EPV ≈ 31.8).

To assess adequacy, we used the pmsampsize R package (v1.1.4) to determine the minimum required sample size based on the following criteria: binary outcome; anticipated R² = 0.15 (a conservative estimate for clinical data); 6 predictors; and an expected outcome prevalence of 30%. The analysis yielded a minimum required sample size of N = 368, with a minimum of 103 CHD events, to achieve a shrinkage factor of at least 0.9 and mitigate overfitting.

Furthermore, an AUC-based power analysis (significance level 0.05, power 0.90) designed to detect a clinically meaningful improvement in AUC from 0.50 to 0.60 indicated a minimum of 95 CHD cases and 240 CHD-negative controls, totaling 335 individuals [Reference Cheng, Branscum and Johnson19]. The actual study population of 487 individuals (191 CHD cases and 296 controls) exceeded these requirements, supporting adequate power, model stability, and generalizability for both model development and performance comparison.

Results

Univariate analysis and preliminary feature selection

A total of 487 patients diagnosed with carotid atherosclerosis were included, with 191 in the CHD group and 296 in the non-CHD control group. Initially, 38 candidate variables were gathered, covering demographic characteristics (e.g., age, sex, body mass index [BMI], education level), medical history (e.g., hypertension, diabetes mellitus, smoking status), biochemical markers (e.g., thrombin time [TT], triglycerides [TG], low-density lipoprotein [LDL], creatinine [Cr], and homocysteine [Hcy]), and selected vital signs (e.g., diastolic blood pressure, body temperature). Univariate analysis identified significant differences between the CHD and non-CHD groups in variables including age, BMI, diastolic blood pressure, TT, Cr, and Hcy (Table 1), indicating their potential relevance to CHD risk. These variables were subsequently carried into the feature selection and model development phase.

Table 1.

Baseline characteristics of patients with and without CHD. Demographic, clinical, and biochemical parameters were compared between groups using appropriate statistical tests. Continuous variables are expressed as mean ± standard deviation, and categorical variables as frequencies (percentages). P values indicate the statistical significance of between-group differences.

Feature selection via LASSO regression

To select features, reduce dimensionality, and mitigate overfitting, we applied LASSO regression to the initial set of 38 variables. The regularization parameter (α = 0.1174; denoted λ in Figure 2) was optimized through ten-fold cross-validation and corresponds to the minimum mean squared error shown in Figure 2A.

Figure 2.

LASSO regression for feature selection. (A) Ten-fold cross-validation plot for selecting the optimal λ value that minimizes mean squared error. (B) Coefficient trajectories of candidate features across different λ values, highlighting the six final predictors.

Six key predictors associated with CHD risk in individuals with carotid atherosclerosis were identified: age, BMI, diastolic blood pressure, thrombin time (TT), creatinine (Cr), and homocysteine (Hcy) (Figure 2B). These factors were used as input variables for the ML models, providing a parsimonious basis for CHD risk assessment.

Performance evaluation of machine learning models across training, validation, and testing sets

Discrimination performance

ROC curve analysis

In the training dataset, ensemble models exhibited superior discriminative performance, with XGBoost and Random Forest showing the highest AUC values. The logistic regression model and KNN yielded comparatively lower AUCs (Figure 3A).

Figure 3.

ROC curves of seven machine learning models across datasets. (A) Training set; (B) Validation set; (C) Testing set. The logistic regression model and ensemble models (e.g., XGBoost, Random Forest) exhibited strong discrimination in training but variable generalization performance in testing.

In the validation set, AUC values declined across all models, indicating varying degrees of overfitting. KNN and the logistic regression model maintained relatively better discrimination, whereas Decision Tree showed the weakest performance (Figure 3B).

In the testing dataset, the logistic regression model demonstrated the best discrimination (AUC = 0.827), indicating strong generalizability. Stacking and Random Forest also achieved acceptable performance, while Decision Tree showed limited discriminatory capability (Figure 3C; Supplementary Table 2).

Classification performance

Precision-recall curve (PRC) analysis

Patterns in the PRC analysis were generally consistent with the ROC findings. In the training set, ensemble models again demonstrated strong performance, while the logistic regression model ranked lower (Figure 4A).

Figure 4.

PR curves of seven machine learning models. (A) Training set. (B) Validation set. (C) Testing set. The logistic regression model achieved the highest PRC-AUC in the testing set, indicating a good balance between precision and recall on held-out data.

In the validation set, all models showed reduced area under the precision-recall curve (PRC-AUC) values. KNN, Random Forest, and the logistic regression model provided a comparatively better balance between precision and recall, whereas XGBoost exhibited a notable performance drop (Figure 4B).

In the testing dataset, the logistic regression model achieved the best overall PRC-AUC, whereas Random Forest and XGBoost showed reduced precision on the held-out data (Figure 4C).

Calibration performance

Across the training dataset, LightGBM, XGBoost, and Stacking exhibited strong calibration, closely approximating the ideal diagonal line (Figure 5A).

Figure 5.

Calibration curves of machine learning models. (A) Training set. (B) Validation set. (C) Testing set. The logistic regression model, Random Forest, and XGBoost showed relatively good alignment between predicted and observed probabilities in the testing set.

In the validation set, calibration curves varied considerably, suggesting greater uncertainty in predicted probabilities. Only the logistic regression model and LightGBM showed moderate concordance with observed outcomes (Figure 5B).

In the testing dataset, the logistic regression model, Random Forest, and XGBoost demonstrated more stable calibration, while Decision Tree and KNN exhibited overestimation or underestimation across different probability ranges (Figure 5C).

Model interpretability and clinical utility

To assess overall model stability, 10-fold cross-validation was conducted during training. All models achieved mean AUC values above 0.65, with Random Forest and the logistic regression model showing the most consistent performance, while Decision Tree, KNN, and Stacking displayed greater variability (Figure 6).

Figure 6.

Ten-fold cross-validation results for all models. Bar plots display the mean AUC with standard deviation for each model across ten validation folds. Logistic regression model and Random Forest achieved the highest average AUCs (0.738 and 0.740, respectively).
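The per-model stability summary (mean AUC ± SD over ten folds) can be produced with `cross_val_score`. This sketch shows one model on synthetic training data; the fold shuffling and seed are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the training set (hypothetical data)
X_train, y_train = make_classification(
    n_samples=340, n_features=6, n_informative=4, random_state=2024)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2024)
scores = cross_val_score(
    LogisticRegression(max_iter=1000, class_weight="balanced"),
    X_train, y_train, cv=cv, scoring="roc_auc")

# Mean and standard deviation across folds, as plotted in the bar chart
print(f"AUC {scores.mean():.3f} ± {scores.std():.3f}")
```

Repeating this loop over each model gives the mean and spread shown per bar; a larger SD indicates that performance depends more on which patients fall into each fold.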

Regarding clinical applicability, DCA demonstrated that the logistic regression model consistently yielded favorable net benefit across the training, validation, and testing cohorts (Figure 7). Although LightGBM achieved slightly higher benefit at certain thresholds, its performance was less stable, reinforcing the practical advantages of the logistic regression model.

Figure 7.

DCA of the logistic regression model. (A) Training set. (B) Validation set. (C) Testing set. The net benefit across different probability thresholds is illustrated. The logistic regression model consistently outperformed or closely matched other models, supporting its clinical utility.

Based on its balanced discrimination, calibration, and interpretability, the logistic regression model was selected as the final recommended model. SHAP was then applied to enhance model transparency and evaluate feature contributions at both the individual and global levels.

At the individual level, the SHAP waterfall plot for a representative patient (Sample 17) showed that age and diastolic blood pressure made the largest negative contributions to that patient's predicted CHD risk, while creatinine, thrombin time, and homocysteine increased the predicted probability (Figure 8A).

Figure 8.

(A) SHAP waterfall plot for individual prediction (Sample 17). The figure shows the contribution of each feature to the CHD prediction for a representative patient. Age and diastolic blood pressure exerted the strongest negative influence, while creatinine and thrombin time had positive contributions. (B) SHAP summary bar plot of feature importance. Mean absolute SHAP values across all testing samples are presented, ranking features by their overall impact on model output. Age, diastolic blood pressure, and thrombin time were the most influential predictors.

At the global level, SHAP summary plots confirmed age and diastolic blood pressure as the most influential predictors, followed by thrombin time (TT), BMI, creatinine (Cr), and homocysteine (Hcy). The ranking of SHAP importance aligned with logistic regression model coefficients, supporting the biological plausibility and interpretability of the model (Figure 8B).

Overall, the logistic regression model demonstrated good discrimination (mean ten-fold cross-validated AUC, 0.738; Figure 6), stable generalization, and high interpretability, supporting its potential use in the early identification and personalized risk assessment of CHD among patients with carotid atherosclerosis.

Supplementary threshold optimization analysis

To identify clinically meaningful probability cutoffs, we performed a supplementary threshold optimization analysis on the validation set. For each model, the threshold was varied across a clinically relevant range, and the optimal cutoff was defined as the value maximizing the Youden index (J = sensitivity + specificity − 1). Because thresholds were selected without access to the test data, this procedure prevented information leakage into the final evaluation.
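
A minimal sketch of validation-based Youden thresholding, assuming a 60/20/20 train/validation/test split and scikit-learn's `roc_curve`. The simulated data and split proportions are hypothetical and need not match the study's. The threshold is chosen on validation predictions only and is then applied unchanged to the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split


def youden_threshold(y_val, p_val):
    """Cutoff maximizing J = sensitivity + specificity - 1 on validation data."""
    fpr, tpr, thresholds = roc_curve(y_val, p_val)
    j = tpr - fpr  # tpr = sensitivity, 1 - fpr = specificity
    best = int(np.argmax(j))
    return float(thresholds[best]), float(j[best])


# Hypothetical data split 60/20/20 into training, validation, and test sets
X, y = make_classification(n_samples=1500, n_features=6, n_informative=4,
                           weights=[0.6, 0.4], random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_val = model.predict_proba(X_val)[:, 1]
cutoff, j_val = youden_threshold(y_val, p_val)

# The test set is scored only once, with the cutoff fixed in advance
y_test_pred = (model.predict_proba(X_test)[:, 1] >= cutoff).astype(int)
```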

As an illustrative example, the optimal threshold for the random forest model shifted from the default 0.50 to 0.55 on the validation set, which improved precision and the F1 score at the expense of sensitivity. These findings indicate that different models may be better suited to distinct clinical priorities (e.g., minimizing false positives vs. maximizing sensitivity). All classification metrics reported in the main text therefore reflect thresholds derived from the validation set rather than a fixed default value (see Supplementary Table 3).
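
Where a clinical setting instead fixes a minimum sensitivity (for example, screening), a cutoff can be chosen as the highest validation threshold that still meets the target, which minimizes false positives under that constraint. This is a hypothetical alternative rule shown for illustration; the study itself used the Youden index, and the toy validation data below are invented.

```python
import numpy as np
from sklearn.metrics import roc_curve


def threshold_for_min_sensitivity(y_val, p_val, min_sensitivity=0.90):
    """Highest cutoff whose validation sensitivity is >= min_sensitivity.

    roc_curve returns thresholds in decreasing order and both TPR and FPR grow
    as the cutoff drops, so the first qualifying threshold has the lowest FPR.
    """
    fpr, tpr, thresholds = roc_curve(y_val, p_val, drop_intermediate=False)
    i = int(np.flatnonzero(tpr >= min_sensitivity)[0])
    return float(thresholds[i]), float(tpr[i]), float(1.0 - fpr[i])


# Hypothetical validation labels and predicted probabilities
y_val = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p_val = np.array([0.10, 0.30, 0.35, 0.80, 0.45, 0.60, 0.70, 0.20])

for target in (0.75, 0.90):
    cutoff, sens, spec = threshold_for_min_sensitivity(y_val, p_val, target)
    print(f"target {target:.2f}: cutoff={cutoff:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the sensitivity target lowers the cutoff and trades specificity for fewer missed cases, which is the trade-off described above.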

Discussion

This study developed and compared multiple ML models for predicting CHD in individuals with carotid atherosclerosis, with a focus on model interpretability and clinical applicability. Overall, a parsimonious logistic regression model performed comparably to or better than more complex ensemble algorithms and was therefore selected as the final predictive model. LASSO regression identified six clinically relevant predictors, and subsequent SHAP analysis quantified their relative contributions to risk stratification, which were consistent with their known biological roles. Furthermore, decision curve analysis indicated that the logistic regression model provided a consistent net clinical benefit across a broad range of threshold probabilities, supporting its potential utility as a practical tool for CHD risk assessment in this population.

Key findings in context

The logistic regression model outperformed more complex ensemble models such as XGBoost, which showed substantial overfitting despite high training performance. This observation is consistent with the principle of Occam’s razor: simpler models often generalize better in moderately sized datasets with low-to-moderate feature dimensionality [Reference Cheng, Lee, Nfor, Hsiao, Huang and Liaw20]. The overfitting observed in the XGBoost and Random Forest models may reflect their capacity to memorize noise in the training data, underscoring the importance of prioritizing real-world utility over training accuracy [Reference Wang, Zhao and Jin21].
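
The train-test discrepancy described here can be quantified as the gap between training and test AUC. In the sketch below, an unpruned decision tree stands in for a high-capacity learner (it is not XGBoost or Random Forest). It is fitted alongside logistic regression on hypothetical simulated data with a modest linear signal plus three noise predictors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: modest linear signal in 3 of 6 predictors, plus outcome noise
rng = np.random.default_rng(42)
n = 1200
X = rng.normal(size=(n, 6))
logit = 0.7 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)


def auc_gap(model):
    """Fit the model, then return (train AUC, test AUC, train - test gap)."""
    model.fit(X_tr, y_tr)
    train_auc = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
    test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return train_auc, test_auc, train_auc - test_auc


results = {
    "logistic": auc_gap(LogisticRegression()),
    # grown until every leaf is pure, so it memorizes the training labels
    "unpruned_tree": auc_gap(DecisionTreeClassifier(random_state=0)),
}
for name, (train_auc, test_auc, gap) in results.items():
    print(f"{name}: train={train_auc:.3f} test={test_auc:.3f} gap={gap:.3f}")
```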

The six features selected by LASSO showed clear biological plausibility [Reference Benjamin, Muntner and Alonso22]. Age, a well-established risk factor for CHD, was identified by SHAP analysis as the most influential variable. Its negative SHAP value in specific instances (e.g., Sample 17) should be read in light of the model structure: in a logistic regression model with linear terms, a feature's SHAP value is proportional to the difference between the patient's value and the background mean. Assuming that age carries a positive coefficient, as its role as a risk factor implies, a negative contribution therefore indicates a below-average age for that patient rather than a nonlinear effect, which a linear model cannot represent. The inverse relationship between diastolic blood pressure and CHD risk in this cohort contradicts findings from general population studies [Reference Chen, Zelnick and Huber8] but may be attributed to the distinct hemodynamic profile of individuals with carotid stenosis, in whom lower diastolic pressure could potentially mitigate carotid plaque burden [Reference Shimoda, Kitamura and Imano5].

The biochemical markers also have plausible pathogenic links to CHD: elevated creatinine indicates impaired renal function, a recognized risk factor for CHD through endothelial dysfunction [Reference Raggi and Stein23]; elevated homocysteine contributes to oxidative stress and vascular inflammation [Reference Kimenai, Janssen and Eggers24]; and prolonged thrombin time may indicate coagulation abnormalities [Reference Wu, Zhou and Chen25], which are implicated in atherothrombosis. These associations strengthen the credibility of the model and suggest possible mechanisms underlying the co-occurrence of CHD and carotid atherosclerosis.

Comparison with existing literature

Our study contributes to the existing literature in three key respects. First, although previous research has demonstrated the superior performance of ensemble models over traditional approaches in predicting CHD [Reference Cheng, Lee, Nfor, Hsiao, Huang and Liaw20], our results underscore the continued relevance of simpler models in specific subpopulations (e.g., individuals with carotid atherosclerosis), owing to their reliability and ease of interpretation. This finding aligns with recent observations by Evangelou et al [Reference Nazerian, Mueller and Vanni26], who highlighted the variability in ML model efficacy across different populations and scenarios.

Second, the identification of TT as a significant predictor distinguishes this study. Whereas many CHD risk models emphasize prothrombin time (PT) or INR [Reference Mennini, Meucci and Pesarini27], our findings highlight the clinical relevance of thrombin time, a marker of fibrinogen function. This observation may point to a role of fibrinogen in carotid plaque progression and the development of CHD, which warrants further research [Reference Chen, Ma and Xue28].

Third, SHAP-based interpretability addresses an important gap in ML research on cardiovascular disease. In contrast to studies that report model performance metrics without feature-level explanations [Reference Dorraki, Liao and Abbott29], our analysis quantifies the contribution of each variable to individual predictions. For instance, the patient-level explanations show that age, an established risk factor at the population level, can still lower the predicted risk of an individual patient (e.g., Sample 17), depending on that patient's value relative to the cohort average. This level of transparency is essential for gaining clinicians' acceptance and facilitating the clinical application of our findings [Reference Lundberg, Erion and Chen30].

Clinical implications

The robust performance of the logistic regression model, combined with its consistent net benefit in DCA, supports its clinical utility for stratifying CHD risk in patients with carotid atherosclerosis. Importantly, the model relies on six routinely collected clinical variables – age, diastolic blood pressure, BMI, thrombin time, creatinine, and homocysteine – making it practical for real-world integration without requiring advanced imaging or specialized diagnostic resources. This simplicity enhances its potential for use in primary care and resource-limited settings, enabling timely and accessible risk assessment.

The model’s interpretability facilitates personalized risk communication. For instance, a patient with elevated creatinine (Cr) and homocysteine (Hcy) levels could receive guidance on managing renal function and on B-vitamin supplementation to lower Hcy levels [Reference Mirza, Almansouri and Muslim31]. Similarly, individuals with a high body mass index (BMI) could benefit from tailored weight-management advice. These practical insights help connect predictive modeling with preventive care.

Limitations

This study has several limitations. First, its retrospective single-center design may introduce selection bias and limit generalizability to broader populations. Second, mean imputation was applied to handle missing data, which may underestimate variability; more robust methods such as multiple imputation should be considered in future research [Reference White, Royston and Wood32].
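
The variance shrinkage caused by mean imputation is easy to demonstrate: filling missing entries with the observed mean leaves the mean unchanged but lowers the standard deviation by a factor of sqrt(n_obs/n). The sketch below uses hypothetical simulated data with about 30% of one variable missing completely at random. It contrasts mean imputation with several stochastic imputations from scikit-learn's `IterativeImputer` (`sample_posterior=True`), a chained-equations approach in the spirit of multiple imputation. A full multiple-imputation analysis would also fit the model on each completed dataset and pool the estimates with Rubin's rules; that step is omitted here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer

# Hypothetical data: x2 is correlated with x1, and ~30% of x2 is missing at random
rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
X = np.column_stack([x1, x2])
missing = rng.random(n) < 0.3
X_miss = X.copy()
X_miss[missing, 1] = np.nan

observed_sd = np.nanstd(X_miss[:, 1])
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X_miss)
mean_imputed_sd = mean_imputed[:, 1].std()

# Five stochastic completions: imputed values are drawn from a posterior
# predictive distribution instead of being fixed point predictions
completed = [
    IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X_miss)
    for m in range(5)
]
mi_sds = [c[:, 1].std() for c in completed]
print(observed_sd, mean_imputed_sd, np.mean(mi_sds))
```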

Third, although major cardiovascular guidelines recommend additional predictors – such as family history of premature CHD, physical activity, C-reactive protein (CRP), lipoprotein(a), and apolipoprotein B – their inclusion was limited by real-world data constraints. Many of these variables were either unavailable, inconsistently recorded, or exhibited high missingness within the electronic medical record system, and were therefore excluded to avoid introducing bias. Importantly, all available candidate predictors were subjected to standardized feature selection procedures, and only variables identified as stable contributors through univariate analysis and LASSO regression were retained.
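
As a rough illustration of LASSO-based selection for a binary outcome, the sketch below fits an L1-penalized logistic regression whose penalty strength is chosen by 10-fold cross-validation (scikit-learn's `LogisticRegressionCV`), then keeps the predictors with non-zero coefficients. The simulated data, the log-loss scoring criterion, and the grid of 20 penalty values are assumptions for illustration; the study's LASSO settings (Figure 2) may differ. In a leakage-safe workflow this step is run on the training set only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: columns 0-2 carry signal, columns 3-9 are pure noise
rng = np.random.default_rng(3)
X_train = rng.normal(size=(800, 10))
true_logit = 1.0 * X_train[:, 0] - 0.8 * X_train[:, 1] + 0.6 * X_train[:, 2]
y_train = (rng.random(800) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

lasso = make_pipeline(
    StandardScaler(),  # L1 penalties assume predictors on a common scale
    LogisticRegressionCV(
        Cs=20, cv=10, penalty="l1", solver="liblinear",
        scoring="neg_log_loss", random_state=0,
    ),
)
lasso.fit(X_train, y_train)

coef = lasso[-1].coef_.ravel()
selected = np.flatnonzero(coef != 0)  # predictors retained by the L1 penalty
print("selected columns:", selected)
```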

Fourth, several potentially relevant factors – such as medication history, carotid plaque morphology, and treatment adherence – were not available, which may have affected predictive performance. Finally, the model has yet to be tested in external cohorts; prospective multicenter validation will be essential before clinical implementation [Reference Collins, Reitsma, Altman and Moons17].

Future directions

Future research should focus on several key areas: (1) validating the model across multiple centers using prospective cohorts to ensure broader generalizability; (2) incorporating additional data modalities – such as carotid ultrasound–derived imaging markers and relevant genetic indicators – to further enhance predictive precision as data availability improves; (3) exploring dynamic threshold optimization tailored to different clinical scenarios (e.g., screening versus secondary prevention); and (4) developing user-friendly tools, such as web-based or electronic health record–integrated calculators, to facilitate seamless clinical implementation.

In conclusion, this study shows that a concise logistic regression model, built on biologically relevant predictors, provides reliable CHD risk assessment in patients with carotid atherosclerosis. The model’s interpretability and clinical usefulness make it a valuable tool for individualized risk assessment and underscore the importance of balancing model complexity with practical relevance in cardiovascular research.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/cts.2026.10722.

Data availability statement

The datasets produced and/or analyzed in this study are not publicly accessible due to patient confidentiality and institutional guidelines; however, they can be obtained from the corresponding author upon request.

Author contributions

Lei Zhang: Conceptualization, Data curation, Funding acquisition, Writing-original draft, Writing-review & editing; Mengke Lyu: Conceptualization, Data curation, Methodology, Resources, Visualization, Writing-original draft, Writing-review & editing; Mingyuan Du: Data curation; Yizhuo Li: Data curation; Haifeng Yan: Data curation; Xiaohui Li: Data curation, Resources; Wenshuang Niu: Data curation; Lizhi Pang: Data curation.

Funding statement

This research was financially supported by the National Natural Science Foundation of China (82205021, 82405290).

Competing interests

The authors declare that they have no competing interests.

Ethical standard

This retrospective study, approved on April 1, 2024 by the Ethics Committee of The First Affiliated Hospital of Henan University of Chinese Medicine (approval number: 2024HL-159-01), adhered to the principles of the Declaration of Helsinki. Given the retrospective design and use of anonymized clinical data, the requirement for informed consent was waived by the ethics committee. All patient information was anonymized and de-identified before analysis to protect patient privacy in accordance with national health research regulations.

Consent for publication

Not applicable.

References

Roth, GA, Abate, D, Abate, KH, et al. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1736–1788. doi: 10.1016/s0140-6736(18)32203-7.

Ley, K. Inflammation and atherosclerosis. Cells. 2021;10:1197. doi: 10.3390/cells10051197.

Cuomo, S. Increased carotid intima–media thickness in children–adolescents, and young adults with a parental history of premature myocardial infarction. Eur Heart J. 2002;23:1345–1350. doi: 10.1053/euhj.2001.3111.

Pitchika, A, Markus, MRP, Schipf, S, et al. Effects of apolipoprotein E polymorphism on carotid intima-media thickness, incident myocardial infarction and incident stroke. Sci Rep. 2022;12:5142. doi: 10.1038/s41598-022-09129-5.

Shimoda, S, Kitamura, A, Imano, H, et al. Associations of carotid intima-media thickness and plaque heterogeneity with the risks of stroke subtypes and coronary artery disease in the Japanese general population: the circulatory risk in communities study. J Am Heart Assoc. 2020;9:e017020. doi: 10.1161/jaha.120.017020.

O’Leary, DH, Polak, JF, Kronmal, RA, Manolio, TA, Burke, GL, Wolfson, SK. Carotid-artery intima and media thickness as a risk factor for myocardial infarction and stroke in older adults. N Engl J Med. 1999;340:14–22. doi: 10.1056/NEJM199901073400103.

D’Agostino, RB, Vasan, RS, Pencina, MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117:743–753. doi: 10.1161/CIRCULATIONAHA.107.699579.

Chen, Y, Zelnick, LR, Huber, MP, et al. Association between kidney clearance of secretory solutes and cardiovascular events: the chronic renal insufficiency cohort (CRIC) study. Am J Kidney Dis. 2021;78:226–235.e1. doi: 10.1053/j.ajkd.2020.12.005.

Calle, EE, Thun, MJ, Petrelli, JM, Rodriguez, C, Heath, CW. Body-mass index and mortality in a prospective cohort of U.S. adults. N Engl J Med. 1999;341:1097–1105. doi: 10.1056/NEJM199910073411501.

Gajalakshmi, V, Lacey, B, Kanimozhi, V, Sherliker, P, Peto, R, Lewington, S. Body-mass index, blood pressure, and cause-specific mortality in India: a prospective cohort study of 500 810 adults. Lancet Glob Health. 2018;6:e787–e794. doi: 10.1016/S2214-109X(18)30267-5.

Banerjee, A, Chen, S, Fatemifar, G, et al. Machine learning for subtype definition and risk prediction in heart failure, acute coronary syndromes and atrial fibrillation: systematic review of validity and clinical utility. BMC Med. 2021;19:85. doi: 10.1186/s12916-021-01940-7.

El-Wahab, EWA. Predicting coronary heart disease using risk assessment charts and risk factor categories. J Public Health. 2020;29:1037–1045. doi: 10.1007/s10389-020-01224-z.

Nizette, F, Hammedi, W, Van Riel, ACR, Steils, N. Why should I trust you? Influence of explanation design on consumer behavior in AI-based services. J Serv Manage. 2024;36:50–74. doi: 10.1108/JOSM-05-2024-0223.

Lundberg, S, Lee, S-I. A Unified Approach to Interpreting Model Predictions. Adv Neural Inf Process Syst. 2017;30:4765–4774. doi: 10.48550/arXiv.1705.07874.

Riley, RD, Ensor, J, Snell, KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441. doi: 10.1136/bmj.m441.

Riley, RD, Ensor, J, Snell, KIE, et al. Importance of sample size on the quality and utility of AI-based prediction models for healthcare. Lancet Digit Health. 2025;7:100857. doi: 10.1016/j.landig.2025.01.013.

Collins, GS, Reitsma, JB, Altman, DG, Moons, KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Eur Urol. 2015;67:1142–1151. doi: 10.1136/bmj.g7594.

van Smeden, M, de Groot, JAH, Moons, KGM, et al. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis. BMC Med Res Methodol. 2016;16:163. doi: 10.1186/s12874-016-0267-3.

Cheng, D, Branscum, AJ, Johnson, WO. Sample size calculations for ROC studies: parametric robustness and Bayesian nonparametrics. Stat Med. 2011;31:131–142. doi: 10.1002/sim.4396.

Cheng, CH, Lee, BJ, Nfor, ON, Hsiao, CH, Huang, YC, Liaw, YP. Using machine learning-based algorithms to construct cardiovascular risk prediction models for Taiwanese adults based on traditional and novel risk factors. BMC Med Inform Decis Mak. 2024;24:199. doi: 10.1186/s12911-024-02603-2.

Wang, C, Zhao, Y, Jin, B, et al. Development and validation of a predictive model for coronary artery disease using machine learning. Front Cardiovasc Med. 2021;8:700951. doi: 10.3389/fcvm.2021.614204.

Benjamin, EJ, Muntner, P, Alonso, A, et al. Heart disease and stroke statistics—2019 update: a report from the American Heart Association. Circulation. 2019;139:e56–e528. doi: 10.1161/CIR.0000000000000659.

Raggi, P, Stein, JH. Carotid intima-media thickness should not be referred to as subclinical atherosclerosis: a recommended update to the editorial policy at atherosclerosis. Atherosclerosis. 2020;312:119–120. doi: 10.1016/j.atherosclerosis.2020.09.015.

Kimenai, DM, Janssen, EBNJ, Eggers, KM, et al. Do age-adjusted sex-specific cut-off values improve the agreement between high sensitivity cardiac troponins I and T? A retrospective study. Clin Chim Acta. 2021;519:76–82. doi: 10.1016/j.cca.2021.04.007.

Wu, X, Zhou, Q, Chen, Q, et al. Association of homocysteine level with risk of stroke: a dose–response meta-analysis of prospective cohort studies. Nutr Metab Cardiovasc Dis. 2020;30:1861–1869. doi: 10.1016/j.numecd.2020.07.026.

Nazerian, P, Mueller, C, Vanni, S, et al. Integration of transthoracic focused cardiac ultrasound in the diagnostic algorithm for suspected acute aortic syndromes. Eur Heart J. 2019;40:1952–1960. doi: 10.1093/eurheartj/ehz207.

Mennini, FS, Meucci, F, Pesarini, G, et al. Cost-effectiveness of transcatheter aortic valve implantation versus surgical aortic valve replacement in low surgical risk aortic stenosis patients. Int J Cardiol. 2022;357:26–32. doi: 10.1016/j.ijcard.2022.03.034.

Chen, M, Ma, X, Xue, Y, et al. Association of sleep duration with incident carotid plaque: a prospective cohort study. J Am Heart Assoc. 2025;14:e039215. doi: 10.1161/JAHA.124.039215.

Dorraki, M, Liao, Z, Abbott, D, et al. Cardiovascular disease risk prediction via machine learning using mental health data. Eur Heart J. 2022;43:4858–4869. doi: 10.1093/ehjdh/ztac076.2784.

Lundberg, SM, Erion, G, Chen, H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2:56–67. doi: 10.1038/s42256-019-0138-9.

Mirza, AMW, Almansouri, NE, Muslim, MF, et al. Effect of vitamin D supplementation on cardiovascular outcomes: an updated meta-analysis of RCTs. Ann Med Surg. 2024;86:6665–6672. doi: 10.1097/MS9.0000000000002458.

White, IR, Royston, P, Wood, AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30:377–399. doi: 10.1002/sim.4067.

Figure 1. Flowchart of the study design and modeling pipeline. The pipeline included sequential steps: dataset partitioning (training/validation/test), training-set–based preprocessing, training-set–based feature selection, model training with cross-validation, validation-set threshold optimization, final evaluation on the independent test set, and SHAP/DCA analyses.

Table 1. Baseline characteristics of patients with and without CHD. Demographic, clinical, and biochemical parameters were compared between the two groups using appropriate statistical tests. Continuous variables are expressed as mean ± standard deviation, and categorical variables as frequencies (percentages). P values indicate the statistical significance of between-group differences.

Figure 2. LASSO regression for feature selection. (A) Ten-fold cross-validation plot for selecting the optimal λ value that minimizes the mean squared error. (B) Coefficient trajectories of candidate features across different λ values, highlighting the six final predictors selected.

Figure 3. ROC curves of seven machine learning models across datasets. (A) Training set. (B) Validation set. (C) Testing set. The logistic regression model and ensemble models (e.g., XGBoost, Random Forest) showed strong discrimination in training but variable generalization performance in testing.

Figure 4. PR curves of seven machine learning models. (A) Training set. (B) Validation set. (C) Testing set. The logistic regression model achieved the highest PRC-AUC in the testing set, indicating balanced precision and recall under real-world conditions.

Figure 5. Calibration curves of machine learning models. (A) Training set. (B) Validation set. (C) Testing set. The logistic regression model, Random Forest, and XGBoost showed relatively good agreement between predicted and observed probabilities in the testing set.

Figure 6. Ten-fold cross-validation results for all models. Bar plots display the mean AUC with standard deviation for each model across the ten validation folds. The logistic regression model and Random Forest achieved the highest mean AUCs (0.738 and 0.740, respectively).


Zhang et al. supplementary material 1. DOI: https://doi.org/10.1017/cts.2026.10722.sm001

Zhang et al. supplementary material 2. DOI: https://doi.org/10.1017/cts.2026.10722.sm002

Zhang et al. supplementary material 3. DOI: https://doi.org/10.1017/cts.2026.10722.sm003

Zhang et al. supplementary material 4. DOI: https://doi.org/10.1017/cts.2026.10722.sm004
