Nomenclature
- EEG: electroencephalogram
- PPG: photoplethysmogram
- NASA-TLX: NASA Task Load Index
- ECG: electrocardiography
- fNIRS: functional near-infrared spectroscopy
- HRV: heart rate variability
- PSD: power spectral density
- ANS: autonomic nervous system
- FIR: finite impulse response
- ICA: independent component analysis
- PPI: pulse-to-pulse interval
- SVM: support vector machine
- KNN: K-nearest neighbours
- RF: random forest
- CNN: convolutional neural network
- ROC: receiver operating characteristic
Greek symbol
- δ: Delta waves
- θ: Theta waves
- α: Alpha waves
- β: Beta waves
- γ: Gamma waves
1.0 Introduction
Although technological development can assist pilots in flight and enhance aviation safety, pilots remain an indispensable element of the human–machine–environment system. Previous studies have shown that human error is a major contributing factor in aircraft accidents [Reference Morais1]. To respond to critical situations and ensure flight safety, pilots must communicate, process multiple sources of information, and maintain a high level of situational awareness, all of which impose a high mental workload. Additionally, carrier-based aircraft pilots experience a more significant mental workload than other pilots, especially during combat scenarios and high-risk tasks. Flight phases have also been proven to be strongly correlated with flight accidents [Reference Zhang2]. Statistics show that approximately 41.5% of carrier-based aircraft accidents occur during the landing stage, which requires precise control, rapid decision-making and continuous adaptation to dynamic factors such as deck motion and sea conditions, significantly increasing cognitive demands and resulting in a higher mental workload for pilots [Reference Wang3]. When task demands exceed a pilot’s cognitive processing capacity, mental workload is considered excessively high. This can produce tunnel vision, which makes it difficult for pilots to detect critical environmental information or warning signals, and it slows decision-making, which increases the likelihood of operational errors. Moreover, under high workload conditions, pilots may experience anxiety and stress, further impairing their ability to react quickly and accurately to unexpected events such as system failures or adverse weather, thereby increasing the risk of flight accidents.
Therefore, it is crucial to detect and understand the mental workload variation of carrier-based aircraft pilots and manage pilots’ real-time mental workload during flight tasks based on the detection results, as this approach will help pilots maintain good mental condition and enhance their combat ability and flight safety.
Mental workload is defined as the cognitive resources and mental effort required to perform a specific task [Reference Young4]. In scientific research, three primary methods are widely used to assess mental workload: subjective measures, performance measures, and physiological measures [Reference Charles and Nixon5]. Subjective assessment methods primarily use questionnaires, such as the NASA Task Load Index (NASA-TLX), overall workload scale, and subjective workload analysis technique, to collect self-reported workload data from subjects for an initial evaluation. Performance measures evaluate mental workload by assessing reaction times, error rates, and task scores during tasks. Physiological measures provide a quantitative assessment through physiological signals such as electroencephalography (EEG), electrocardiography (ECG), and photoplethysmography (PPG) [Reference Tao6].
Previous studies have developed mental workload assessment methods in the aviation field. For example, Li et al. [Reference Li7] used statistical measurement methods with functional near-infrared spectroscopy (fNIRS) and ECG to detect changes in mental workload during multitasking in a simulated flight. Wang et al. [Reference Wang8] investigated the dynamic process of mental workload for pilots performing a risky flight task in simulated scenarios, using statistical significance analysis of HRV features between different workload groups. These studies used statistical analysis methods to explore the relationship between mental workload and physiological features. Further, there has been progress in mental workload classification based on physiological features. Hernández et al. [Reference Hernández-Sabaté9] proposed a convolutional neural network to classify EEG features across different mental workloads in a continuous performance task test. Zhang et al. detected and assessed pilots’ mental workload during emergency flight scenarios induced by different equipment failures based on fNIRS data, and the results showed that pilots’ mental workload was strongly associated with the activities of different brain regions [Reference Zhang10]. Mohanavelu et al. conducted several studies investigating the mental workload of fighter pilots. One study examined the relationship between dynamic mental workload and fighter pilots’ performance using ECG and NASA-TLX scores [Reference Mohanavelu11], while another assessed pilots’ mental workload by monitoring neuronal activities [Reference Mohanavelu12]. Although the aforementioned studies have made progress in mental workload classification, research on carrier-based aircraft pilots remains limited. Furthermore, in the realm of multiple sensor signal fusion, existing research frequently combines EEG signals with fNIRS signals due to their complementary nature in capturing brain activity [Reference Debie13].
However, few studies have focused on the fusion of EEG signals with PPG signals. EEG measures the brain’s neural electrical activity, providing valuable insights into brain dynamics and mental states [Reference Zemla14, Reference Seleznov15]. The power spectral density (PSD) in each frequency band of EEG signals is a widely used indicator and has proven effective and sensitive in recognising mental workload [Reference Chikhi, Matton and Blanchet16]. PPG, on the other hand, measures heart rate by detecting variations in light absorption due to the pulsatile nature of blood flow and can serve as a sensitive parameter for detecting mental workload [Reference Hao17]. Heart rate variability (HRV) features extracted from PPG signals can be used to analyse the relationship between autonomic nervous system (ANS) activity and mental workload [Reference Arakaki18]. By integrating EEG and PPG signals, a comprehensive analysis of mental workload changes can be achieved, capturing both brain-related and heart-related physiological responses. Moreover, advanced portable physiological signal devices have increasingly demonstrated practicality, reliability and accuracy comparable to traditional research-grade devices, while being convenient and cost-efficient across various scientific research fields [Reference Kam19–Reference Barki and Chung21]. These methods and data can offer a robust approach for mental workload classification in carrier-based aircraft pilots.
Several comparisons with research-grade systems support the use of portable physiological signal devices. For example, Kam et al. compared the quality of EEG recordings obtained from wireless systems against that of traditional research-grade EEG systems and showed that wireless EEG systems are suitable for experimental studies [Reference Kam19]. Marini et al. performed a systematic comparison of signal quality between wireless EEG systems and research-grade EEG systems and reported that PSDs were strongly correlated between the two systems, with minor topographical variability across the scalp, indicating the robustness and reliability of the portable system [Reference Marini20]. Moreover, the application of portable devices in experiments can minimise interference with the subjects and collect precise physiological signals that accurately reflect changes in mental workload induced by tasks. Studies have indicated the effectiveness of portable devices in mental workload detection. For instance, Barki et al. used portable PPG systems to detect mental workload, achieving high levels of accuracy (92.04%) and F1 score (90.8%) [Reference Barki and Chung21].
The present study designed a simulation experiment for carrier-based aircraft flight tasks, during which self-reported mental workload scores and physiological signals were collected using NASA-TLX scales and portable devices. The subjective workload scores were used to investigate and categorise the mental workload of the three flight phases into low, moderate and high levels. Following this, features in the time domain, frequency domain and nonlinear domain were extracted from the PPG and EEG signals. To further analyse these features, statistical analysis was conducted to determine their significance across different mental workload levels. Then, two multimodal fusion methods were developed: feature-level fusion and decision-level fusion. Machine learning and deep learning algorithms were then employed to build mental workload classifiers, allowing for a comparative evaluation of the performance between multimodal fusion and single-modal approaches. The findings are expected to provide a robust framework for the classification and management of carrier-based aircraft pilot mental workload during flight training, thereby improving flight safety in practice.
2.0 Materials and methods
The research framework is illustrated in Fig. 1. First, subjective mental workload scores and physiological data were collected. Subsequently, data pre-processing and feature extraction were performed, followed by statistical analysis to determine the significance of features across different mental workload levels. The study then examined the correlation between mental workload and physiological indicators, with EEG features reflecting brain activity and HRV features from PPG data indicating autonomic nervous system activity. Finally, machine learning and deep learning algorithms were employed for mental workload classification, and the performance of single-modal and multimodal approaches was compared.
Figure 1. The research framework of this study.

2.1 Experiment design and data collection
2.1.1 Participants
Five male pilot cadets and one female pilot cadet (age 19.5 ± 0.5 years), from the Flight Technology College of the Civil Aviation Flight University of China, volunteered to participate in this study. They were well-trained and had an average of 80 hours of carrier-based aircraft flight simulation experience in Su-33 simulators. To ensure an adequate sample size, this study required each participant to complete the tasks multiple times. All the participants were right-handed according to the Edinburgh Handedness Inventory and had normal eyesight and hearing. There was no history of mental illness or drug use among the participants. Before the experiment, all participants read and signed a consent form. This research adhered to the principles of the Declaration of Helsinki and received approval from the Ethical Review Board at Southwest Jiaotong University (No. SWJTU-2109-001-QT). The selection of six pilot cadets was based on the availability of qualified participants within a controlled training environment, ensuring consistency in training background and simulator experience. Although the sample size is small, the use of a controlled setting minimised variability, allowing for more precise workload assessment. Additionally, the repeated-measures design enabled each participant to complete multiple trials, enhancing data reliability and compensating for the limited sample size.
2.1.2 Apparatus
The simulator used in this study is a 2D screen-based simulation software (Su-33 carrier-based aircraft flight simulation system), as shown in Fig. 2. This system incorporates professional flight simulation software, including Prepar3D V3 and Microsoft Flight Simulator X. The display system presents pilots with a cockpit perspective of a carrier-based aircraft, simulating realistic visual scenes and weather conditions, and displaying flight instruments to allow real-time monitoring and adjustment of flight parameters. Additionally, the simulator includes control sticks and throttle levers to enhance the authenticity of the flight control experience. Although 2D simulations offer less operational realism compared to 3D simulations, many studies have shown that physiological data collected from 2D simulations can accurately reflect changes in mental workload. For example, Liu et al. [Reference Liu22] used Cessna 2D flight simulation software to simulate the pilot’s five-leg flight, and their physiological data analysis showed consistent brain activity patterns under different mental workload. Kakkos et al. [Reference Kakkos23] compared the mental workload classification performance of EEG functional connectivity features in 2D and 3D flight simulations, finding both scenarios achieved a classification accuracy of 82%.
Figure 2. Illustration of the flight simulation system.

EMOTIV INSIGHT, a 5-channel portable EEG device, was used for real-time EEG signal collection. EMOTIV INSIGHT features five semidry polymer sensors (AF3, AF4, T7, T8 and Pz) and two references (common mode sense and driven right leg). These sensors are placed in accordance with the international 10–20 EEG system, with AF3 and AF4 on the frontal lobe, T7 and T8 on the temporal lobe, and Pz on the parietal lobe. The system has a sampling rate of 128 Hz and supports connectivity with computers or mobile phones via Bluetooth. For PPG signal collection, the portable device Polar Verity Sense was used. Polar Verity Sense can provide real-time heart rate data by using optical sensors to track heartbeats, and it can also collect pulse-to-pulse interval (PPI) data via Polar Sensor Logger software. The device is compact and lightweight and supports Bluetooth and ANT+ connectivity. Figure 3 provides a detailed representation and information on EMOTIV INSIGHT and Polar Verity Sense.
Figure 3. Wearable devices. (a) EMOTIV INSIGHT and channel locations; (b) Polar Verity Sense.

2.1.3 Experimental tasks and data collection
The experimental tasks consisted of three flight phases: take off and climb, cruise, and approach and landing. Participants were required to accurately fly over multiple waypoints during the cruise phase to simulate real combat scenarios for carrier-based aircraft. Before the experiment, all participants were briefed in detail on the experimental tasks and the specific procedures involved. During the experiment, participants were equipped with portable devices, and these devices were checked to ensure that the data were recorded correctly, followed by a ten-minute resting period to stabilise the subjects’ physiological signals. In the flight task, the subjects operated the Su-33 aircraft, taking off from the runway at the Batumi Airport and sequentially navigating through waypoints to land on the aircraft carrier Kuznetsov. Each flight task was followed by a 10-minute rest interval to mitigate the mental workload induced by the previous task. After completing all tasks, participants filled out the NASA-TLX scale for each flight phase to assess their subjective mental workload. Participants were required to successfully complete the carrier landing tasks three times to collect sufficient data. The flight route is shown in Fig. 4. The detailed task requirements of the different flight phases are as follows:
1. Take off and Climb Phase: Participants operated the Su-33 on the airport runway, applying full throttle to reach a speed of 320 km/h before taking off. They then needed to fly over point 2 at an altitude of 1,200 meters with a vertical deviation tolerance of 100 meters and a 500-meter radius, followed by point 3 at an altitude of 600 meters with a vertical deviation tolerance of 80 meters and a 400-meter radius.
2. Cruise Phase: Participants flew over points 4, 5, 6, and 7 along the navigation route, maintaining an altitude of 380 meters with a vertical deviation tolerance within 20 meters and a radius of 200 meters, while keeping the speed below 600 km/h.
3. Approach and Landing Phase: The aircraft’s flight altitude descended from 380 meters at point 7 to the final landing on the aircraft carrier. The required landing speed was below 320 km/h.
Figure 4. The simulation flight route.

2.2 Data processing
2.2.1 Subjective mental workload scores
The subjective mental workload ratings of participants were collected using the NASA-TLX scale. This scale comprises six dimensions: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration. Participants rated each dimension based on their subjective experiences during the task, using a scale from 0 to 10. The weights for each dimension were determined through pairwise comparisons, and the overall mental workload was calculated as the weighted average of these dimensions.
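The weighted-average calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the usual NASA-TLX convention that each weight is the number of times a dimension is chosen across the 15 pairwise comparisons; the dictionary keys and the example ratings and tallies are hypothetical and are not taken from the study data.

```python
# Keys are shorthand for the six NASA-TLX dimensions listed in the text.
DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def tlx_weighted_score(ratings, weights):
    """Return the weighted average of the six dimension ratings."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight == 0:
        raise ValueError("at least one dimension must have a non-zero weight")
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total_weight


# Hypothetical ratings on the 0-10 scale used in this study.
ratings = {"mental": 8, "physical": 3, "temporal": 6,
           "performance": 4, "effort": 7, "frustration": 5}
# Hypothetical pairwise-comparison tallies (6 dimensions give 15 pairs).
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}

score = tlx_weighted_score(ratings, weights)
print(f"overall workload: {score:.2f}")
```

Dividing by the sum of the weights rather than a hard-coded 15 keeps the function valid if a different weighting scheme is used.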
2.2.2 EEG data preprocessing and feature extraction
EEG signal preprocessing is essential for obtaining robust data and results. In this study, five steps were followed: (a) positioning the five EEG sensors from EMOTIV INSIGHT according to the international 10–20 EEG system, (b) applying average referencing by subtracting the mean signal across all electrodes from each electrode, (c) interpolating missing or bad channels using information from surrounding electrodes, (d) filtering the signals with a finite impulse response (FIR) filter (1 Hz to 40 Hz) to remove noise, and (e) using independent component analysis (ICA) to detect and remove artifacts, particularly electrooculogram artifacts. All EEG signal processing procedures were performed using the open-source Python package MNE 1.3.0 [Reference Gramfort24].
The pre-processed EEG signals were segmented into 30-second epochs with a 50% overlap using a series of orthogonal tapered window functions. PSD features [Reference Gu25] were extracted in six frequency bands: δ (1–3 Hz), θ (4–7 Hz), α (8–11 Hz), β1 (12–20 Hz), β2 (21–29 Hz) and γ1 (30–40 Hz) [Reference Gu25]. Time-domain features (mean, variance, standard deviation, peak-to-peak amplitude, skewness, kurtosis, root mean square, zero crossings, Hjorth mobility, Hjorth complexity) [Reference Phanikrishna and Chinara26] and nonlinear features (sample entropy, approximate entropy) [Reference Zhou27] were extracted for each electrode. In total, 30 frequency domain features (6 bands × 5 electrodes), 50 time domain features (10 features × 5 electrodes), and 10 nonlinear features (2 features × 5 electrodes) were extracted. Detailed information about these features is provided in Table 1.
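As a rough illustration of the epoching and of three of the listed time-domain features, the sketch below cuts a signal into 30-second windows with 50% overlap at 128 Hz and computes zero crossings and the Hjorth parameters. It is a simplified stand-in and not the authors' MNE-based pipeline: the function names are hypothetical, derivatives are approximated by first differences, and the 10 Hz sine is a synthetic test signal.

```python
import math
from statistics import pvariance


def sliding_windows(n_samples, fs=128, win_s=30, overlap=0.5):
    """Return (start, stop) sample indices of overlapping windows."""
    win = int(win_s * fs)
    step = int(win * (1 - overlap))
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]


def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))


def hjorth(x):
    """Return Hjorth mobility and complexity (first-difference derivative)."""
    dx = [b - a for a, b in zip(x, x[1:])]
    ddx = [b - a for a, b in zip(dx, dx[1:])]
    var_x, var_dx, var_ddx = pvariance(x), pvariance(dx), pvariance(ddx)
    mobility = math.sqrt(var_dx / var_x)
    complexity = math.sqrt(var_ddx / var_dx) / mobility
    return mobility, complexity


# Synthetic 120 s, 10 Hz sine sampled at 128 Hz (illustrative only).
fs = 128
signal = [math.sin(2 * math.pi * 10 * n / fs) for n in range(120 * fs)]
windows = sliding_windows(len(signal), fs=fs)
start, stop = windows[0]
mobility, complexity = hjorth(signal[start:stop])
print(len(windows), round(mobility, 3), round(complexity, 3))
```

With a 30-second window and a 15-second step, each new epoch shares half of its samples with the previous one, which is what allows workload to be tracked continuously rather than in disjoint blocks.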
Table 1. Extracted EEG features

2.2.3 PPG data preprocessing and feature extraction
The collected PPG signals contained noise such as motion artifacts, illumination variations, and baseline drift, leading to outliers and ectopic beats [Reference Talukdar28]. Outliers were defined as PPI values outside the range of 280 ms to 1,500 ms [Reference Tiwari29], while ectopic beats were identified when the PPI differed by more than 20% from the previous PPI [Reference Gilgen-Ammann, Schweizer and Wyss30]. To ensure robust data and results, these outliers and ectopic beats were removed, and missing values were filled using linear interpolation. HRV features from the time domain, frequency domain, and nonlinear domain were then extracted from the PPI data. A 30-second time window with a 50% overlap was used to extract HRV features, allowing for dynamic and continuous analysis of mental workload variations. Detailed information about each feature is shown in Table 2.
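The cleaning rules above translate fairly directly into code. The sketch below is one possible reading and not the authors' implementation: it assumes that "the previous PPI" means the last interval that was accepted as valid, fills gaps at the edges with the nearest valid value, and uses the standard definitions of SDNN, RMSSD and pNN50. Function names and the example series are hypothetical.

```python
import math
from statistics import stdev


def _interpolate(values):
    """Fill None entries by linear interpolation between valid neighbours."""
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("no valid intervals left after cleaning")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = values[right]  # leading gap: copy nearest valid value
        elif right is None:
            out[i] = values[left]   # trailing gap: copy nearest valid value
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def clean_ppi(ppi_ms, lo=280.0, hi=1500.0, max_rel_change=0.20):
    """Drop out-of-range and ectopic intervals, then interpolate the gaps."""
    kept, last_valid = [], None
    for value in ppi_ms:
        in_range = lo <= value <= hi
        # Assumption: compare against the last accepted interval.
        ectopic = (last_valid is not None
                   and abs(value - last_valid) / last_valid > max_rel_change)
        if in_range and not ectopic:
            kept.append(float(value))
            last_valid = value
        else:
            kept.append(None)
    return _interpolate(kept)


def hrv_time_features(ppi_ms):
    """Return SDNN (ms), RMSSD (ms) and pNN50 (%) for a cleaned PPI series."""
    diffs = [b - a for a, b in zip(ppi_ms, ppi_ms[1:])]
    return {
        "SDNN": stdev(ppi_ms),
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "PNN50": 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs),
    }


# Hypothetical PPI series in ms: 2000 is out of range, 500 is an ectopic beat.
raw_ppi = [800, 810, 2000, 790, 500, 805, 800]
cleaned = clean_ppi(raw_ppi)
features = hrv_time_features(cleaned)
print(cleaned, features)
```

Comparing against the last accepted interval avoids flagging the normal beat that follows an ectopic one, at the cost of being slow to accept a genuine, abrupt change in heart rate.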
Table 2. Extracted HRV features

2.2.4 Statistical analysis
To analyse the subjective mental workload scores and determine the mental workload levels across different flight phases, an analysis of variance (ANOVA) was conducted, followed by a Bonferroni post-hoc test to further explore significant differences between the flight phases. The significance of physiological features in relation to mental workload levels was also analysed. All statistical analyses were performed using SPSS 26.0, with the significance level set at p < 0.05. Moreover, the study analysed the variations in physiological features in relation to mental workload levels to understand their trends and associated physiological activities.
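For readers unfamiliar with the Bonferroni correction, the sketch below shows its most common formulation: each raw p-value is multiplied by the number of comparisons (three pairwise comparisons for three flight phases) and capped at 1 before being compared with 0.05. The p-values and pair names are hypothetical placeholders and are not results from this study, which used SPSS.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons.
raw = {"phase1_vs_phase2": 0.004, "phase1_vs_phase3": 0.001, "phase2_vs_phase3": 0.030}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
significant = {pair: p < 0.05 for pair, p in adjusted.items()}
print(adjusted, significant)
```

In this illustration a raw p-value of 0.030 would pass an uncorrected 0.05 threshold but fails after correction, which is the purpose of the adjustment.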
Additionally, principal component analysis (PCA), a widely used dimensionality reduction technique, was employed for single-modal feature selection to identify the most informative components of each physiological signal type. Principal components were retained until their cumulative variance ratio exceeded 90%, which reduces data dimensionality while preserving essential information [Reference Shao31].
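The 90% rule can be sketched as a small selection function. This is a minimal illustration that assumes the eigenvalues (explained variances) of the feature covariance matrix have already been computed by a PCA routine; the function name and the example eigenvalues are hypothetical.

```python
def n_components_for_variance(eigenvalues, threshold=0.90):
    """Return the smallest number of components whose cumulative
    explained-variance ratio exceeds the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues: cumulative ratios are 0.60, 0.85, 0.95, ...
example_eigenvalues = [6.0, 2.5, 1.0, 0.3, 0.2]
k = n_components_for_variance(example_eigenvalues)
print(k)
```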
2.3 Mental workload classification
2.3.1 Multimodal fusion methods
The framework of multimodal fusion methods is shown in Fig. 5.
Figure 5. Framework of multimodal fusion methods: decision level fusion and feature level fusion.

Feature-level fusion is a method of multimodal data fusion that integrates features from different sensors or data sources into a comprehensive feature set to enhance data robustness [Reference Cai32]. It typically involves multiple data sources or sensors, each providing diverse types of information. The objective of this fusion approach is to combine information from these sources to achieve a more comprehensive understanding. In this research, we extracted features from both EEG and PPG data and concatenated them into a unified feature vector. This feature vector was subsequently processed using PCA for dimension reduction, resulting in a final set of fused features.
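The concatenation step can be sketched as follows. This is a minimal illustration that assumes the EEG and HRV features were computed on the same sequence of time windows; the function name is hypothetical, the 90 EEG features per window follow the counts given earlier (30 + 50 + 10), and the HRV feature count of 12 is a placeholder, not the number used in the study. PCA would then be applied to the fused rows.

```python
def fuse_features(eeg_rows, hrv_rows):
    """Concatenate per-window EEG and HRV feature vectors."""
    if len(eeg_rows) != len(hrv_rows):
        raise ValueError("EEG and HRV features must cover the same windows")
    return [list(e) + list(h) for e, h in zip(eeg_rows, hrv_rows)]


# Two hypothetical windows with placeholder feature values.
eeg_rows = [[0.1] * 90, [0.2] * 90]
hrv_rows = [[1.0] * 12, [2.0] * 12]  # 12 is an assumed HRV feature count
fused = fuse_features(eeg_rows, hrv_rows)
print(len(fused), len(fused[0]))
```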
Decision-level fusion is a technique in the field of information fusion and signal processing that integrates multiple information sources or data at the decision level to make a final decision or draw a conclusion based on the integrated information [Reference Gaw, Yousefi and Gahrooei33]. Here, we considered the approach of using the same classifier with different datasets to fuse two sets of physiological features at the decision level, allowing for a comparison of classifier performance on both single-modal and multimodal datasets. The EEG and PPG features selected using PCA were fed into classifiers, with each classifier making decisions separately. A soft voting method was subsequently used for decision fusion: an individual classification model for each modality generated prediction results, and these results were aggregated with weights based on each modality’s classifier accuracy.
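Accuracy-weighted soft voting can be sketched as below. This is one plausible reading of the description, assuming each modality's classifier outputs class probabilities and that the weights are the classifier accuracies normalised to sum to one; the function name, the probabilities and the accuracies are hypothetical.

```python
def soft_vote(probabilities, accuracies):
    """Fuse per-modality class probabilities with accuracy-based weights.

    probabilities: one probability vector per modality.
    accuracies: one validation accuracy per modality.
    Returns the winning class index and the fused probability vector.
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    n_classes = len(probabilities[0])
    fused = [sum(w * p[c] for w, p in zip(weights, probabilities))
             for c in range(n_classes)]
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical outputs for classes (low, moderate, high).
eeg_probs = [0.2, 0.5, 0.3]   # EEG classifier favours "moderate"
ppg_probs = [0.6, 0.3, 0.1]   # PPG classifier favours "low"
label, fused = soft_vote([eeg_probs, ppg_probs], accuracies=[0.8, 0.6])
print(label, [round(p, 3) for p in fused])
```

Here the two classifiers disagree, and the more accurate EEG model carries the decision because its vote is weighted more heavily.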
2.3.2 Classification algorithms
Three machine learning algorithms, namely support vector machine (SVM) [Reference Cervantes34], K-nearest neighbours (KNN) [Reference Zhou27], and random forest (RF) [Reference Speiser35], along with one deep learning algorithm, a convolutional neural network (CNN), were used for three-level mental workload classification. SVM maps data into a higher-dimensional space, facilitating the identification of the optimal hyperplane that best separates data points into different classes. KNN, a non-parametric, instance-based learning algorithm, classifies data points based on the majority class among their k-nearest neighbours in the feature space. RF, an ensemble learning method, utilises multiple decision trees for classification, enhancing accuracy and reducing overfitting by aggregating the predictions of these trees. The optimal parameters for SVM (C), KNN (K), and RF (the number of decision trees) were determined using a grid search with five-fold cross-validation.
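To make the KNN decision rule concrete, the sketch below classifies a query point by majority vote among its k nearest training points using Euclidean distance. It is a toy illustration on hypothetical two-dimensional data and not the tuned classifier used in the study; in practice k would be chosen by the grid search described above.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for three workload levels.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 10), (10, 11), (11, 10)]
train_y = ["low"] * 3 + ["moderate"] * 3 + ["high"] * 3

print(knn_predict(train_x, train_y, (5.2, 5.1)))
```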
CNNs are well suited to processing structured data and are widely used for classification tasks. The architecture used in this study comprised five hidden layers: two convolutional layers and three fully connected layers. It began with an input layer, followed by a dense layer with 256 neurons using the ReLU activation function. The data were reshaped into a 16 × 16 tensor for convolution operations with 4 × 4 receptive fields and one unit of padding. Max pooling with a 1 × 2 window was applied, and a dropout layer with a 0.2 rate was used to mitigate overfitting. The subsequent fully connected layers had 128 and 64 neurons with ReLU activation.
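The tensor shapes implied by this description can be checked with simple arithmetic. The sketch below follows one convolution and one pooling step only, and it assumes a convolution stride of 1 and a pooling stride equal to the pooling window, neither of which is stated in the text.

```python
import math


def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


dense_units = 256
side = math.isqrt(dense_units)  # 256 values reshaped to side x side

# 4x4 receptive field with one unit of padding (stride 1 is an assumption).
conv_h = conv_output_size(side, kernel=4, padding=1)
conv_w = conv_output_size(side, kernel=4, padding=1)

# 1x2 max pooling (stride equal to the window is an assumption).
pool_h = conv_output_size(conv_h, kernel=1, stride=1)
pool_w = conv_output_size(conv_w, kernel=2, stride=2)

print((side, side), (conv_h, conv_w), (pool_h, pool_w))
```

Under these assumptions the 1 × 2 pooling window halves only the width of the feature map and leaves its height unchanged.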
Features extracted from the PPG and EEG data were randomly divided into five folds for cross-validation. In each iteration, four folds were used for training classifiers, while the remaining fold was used for testing. This process was repeated five times so that each fold was used for both training and testing. The evaluation of classifier performance was based on various metrics, including accuracy, precision, recall, F1 score and the area under the receiver operating characteristic (ROC) curve, which can provide a comprehensive assessment of classifier effectiveness.
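The splitting scheme and the threshold-based metrics can be sketched as follows. This is a simplified illustration with hypothetical function names: the split is a plain shuffled five-fold split (no stratification is assumed), and precision, recall and F1 are macro-averaged over the three classes, which is an assumption because the averaging scheme is not stated in the text. ROC-AUC is omitted for brevity.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs from a shuffled split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    return [(sorted(set(indices) - set(test)), sorted(test)) for test in folds]


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


splits = kfold_indices(20)
# Hypothetical labels and predictions for one test fold.
y_true = ["low", "low", "moderate", "moderate", "high", "high"]
y_pred = ["low", "moderate", "moderate", "moderate", "high", "low"]
metrics = macro_metrics(y_true, y_pred)
print(len(splits), metrics)
```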
3.0 Results
3.1 Subjective measurement results
Figure 6 displays the NASA-TLX scores for each flight phase. The results of ANOVA for subjective mental workload scores showed significant differences among the three flight phases (p < 0.05). The Bonferroni post hoc analysis further indicated that subjective scores in the approach and landing phase and cruise phase were significantly higher than those in the take off and climb phase (p < 0.05). The results indicated that pilots experienced the highest mental workload during the approach and landing phase, followed by the cruise phase, and the lowest workload during the take off and climb phase. Based on these results, the data collected in the take off and climb, cruise, and approach and landing phases were categorised as low, moderate and high mental workload, respectively.
Figure 6. The boxplot of NASA-TLX scores.

3.2 Interrelations between mental workload and physiological indicators
Physiological indicators have been found to be correlated with mental workload, providing insights into the interrelations between mental processes and physiological responses [Reference Jin36]. HRV quantifies the variation in the time intervals between successive heartbeats, reflecting ANS activity and serving as an indicator of the balance between the sympathetic and parasympathetic branches of the ANS [Reference Shao37]. Normally, a higher HRV suggests a resilient ANS capable of effective responses to stressors, while a lower HRV often signifies a dominance of sympathetic activity and potential links to stress, anxiety, and mental workload [Reference Reyes del Paso38]. Therefore, we investigated the variations in HRV features to better understand how ANS activity responds to mental workload. Table 3 presents the results of the significance analysis of the HRV features. The results indicate that there are significant differences in most HRV features across the three levels of mental workload, which suggests that HRV features can effectively serve as indicators for classifying pilots’ mental workload. Specifically, among these HRV features, HF, SDNN, PNN50 and RMSSD are commonly used to reflect the activity of the ANS [Reference Ben Mrad39, Reference Shaffer and Ginsberg40]. The responses of these four features to mental workload variations are depicted in Fig. 7, which reveals a decrease followed by an increase with increasing mental workload.
Table 3. Results of the significance analysis for the HRV features

*Statistically significant effects (p < 0.05).
**Statistically significant effects (p < 0.01).
Table 4. Results of the significance analysis for the EEG features

*Statistically significant effects (p < 0.05).
**Statistically significant effects (p < 0.01).
Figure 7. Illustration of HRV feature responses to mental workload changes. (a) HF responses to mental workload changes. (b) SDNN responses to mental workload changes. (c) RMSSD responses to mental workload changes. (d) PNN50 responses to mental workload changes.

EEG features in different frequency bands are widely used to analyse and understand cognitive processes and brain activities [Reference Bagheri and Power41]. The results of the significance analysis of the EEG features are shown in Table 4. Only a limited number of features exhibited a statistically significant difference across the three mental workload levels. To further explore brain activity across various brain regions, Fig. 8 shows topographic maps of PSD values at three distinct levels of mental workload. PSD can provide valuable insights into the frequency distribution of neural signals and brain activities [Reference Stuart42]. Figure 8 shows that the PSD values are generally highest in the left temporal lobe across the δ, θ, α and β1 frequency bands. PSD in the left temporal lobe across the δ, θ, α, β1 and β2 frequency bands decreased significantly at the high mental workload level compared with the other mental workload levels. Furthermore, compared with the low and medium mental workload levels, the PSD values of β2 and γ1 in the left frontal and right temporal regions increased significantly under high mental workload. This finding suggests a notable modulation of neural activity associated with increased cognitive demands.
Table 5. Classification results of all methods

*Indicates the highest value of the evaluation metrics.
Figure 8. Topographic maps of PSD features.

3.3 Mental workload classification results
The principal components whose cumulative variance ratio exceeded 90% for each feature set were selected as inputs for the classifiers, with 11 principal components chosen for EEG features, 5 principal components chosen for HRV features, and 15 principal components chosen for fused features. These selected features served as input feature sets for single-modal classifiers and as inputs for feature-level fusion. Additionally, the chosen EEG and HRV features were simultaneously used as inputs for decision-level fusion. This approach ensured that each classifier received the most informative features, enhancing the accuracy and robustness of the mental workload classification.
The classification performance of the single-modal feature-based methods and the two multimodal fusion-based methods was compared across the classification algorithms. The classification results of the different methods are shown in Table 5, and the ROC-AUC results are shown in Fig. 9. Among the single-modal physiological features, the EEG feature set outperformed the PPG feature set across all the evaluation metrics, and the CNN model achieved better overall performance than the other machine learning models. Among the multimodal fusion methods, feature-level fusion outperformed decision-level fusion, achieving higher results in most evaluation metrics. Compared with the single-modal methods, the multimodal fusion methods generally achieved superior outcomes when applied with KNN, SVM or CNN. Notably, with the RF classifier, the performance of feature-level fusion improved significantly, with all the metrics increasing by approximately 9–12 percentage points, achieving the highest classification accuracy, precision, recall and F1 score in this study. However, the performance of decision-level fusion decreased compared with that of the single-modal EEG feature set when using RF.
Figure 9. ROC-AUC results of different methods. (a) EEG features as inputs. (b) HRV features as inputs. (c) Decision level fusion methods. (d) Feature level fusion methods.

4.0 Discussion
This study investigated carrier-based pilots’ mental workload recognition using EEG and PPG signals during simulated flight. The performance of single-modal and multimodal methods was compared for mental workload classification using machine learning and deep learning algorithms. Since few studies have focused on carrier-based pilots’ mental workload recognition, this study provides new perspectives on carrier-based pilots’ mental workload regulation, with the ultimate aim of enhancing the safety of carrier-based aircraft operations.
The mental workload of carrier-based pilots during different flight phases was assessed using the NASA-TLX scale. Subjective scores indicated that pilots experienced the highest mental workload during the approach and landing phase, followed by the cruise phase, and the lowest workload during the take-off and climb phase. These findings are inconsistent with the results of an earlier study [Reference Liu43], which reported different relative levels of mental workload in the cruise phase and the take-off and climb phase. The discrepancy can be attributed to differences in the experimental design. In the cruise phase, we introduced tasks that simulate combat scenarios, which increased participants’ mental workload relative to the take-off and climb phase. These findings can help to better understand the cognitive needs of carrier-based pilots in combat flight, facilitating the management of pilots’ mental workload.
Physiological feature patterns are related to changes in mental workload. The statistical analysis results, as described in Table 4, revealed that most of the HRV features differ significantly across mental workload levels. These findings are consistent with those of a previous study [Reference Singh, Chanel and Roy44] indicating that HRV features are effective indicators of pilots’ mental workload. Moreover, HRV analysis can provide information about sympathetic and parasympathetic activity during flight, serving as a reflection of ANS function [Reference Ha and Kim45]. The patterns of HF, SDNN, PNN50 and RMSSD observed in Fig. 9 indicate a decrease followed by an increase with increasing mental workload, demonstrating the modulation of sympathetic and parasympathetic factors. HF, RMSSD and PNN50 are associated with parasympathetic nervous system activity [Reference Yugar46], and their initial decline may indicate a shift towards sympathetic dominance, reflecting the body’s adaptive response to increased cognitive demands. The subsequent increase could suggest that sympathetic and parasympathetic modulation acts to maintain the balance of the ANS. The SDNN reflects overall HRV and is influenced by both sympathetic and parasympathetic activities [Reference Shaffer and Ginsberg40]. A decrease in SDNN with increasing mental workload might suggest a reduction in overall HRV, which could be indicative of increased sympathetic dominance, while a subsequent increase could indicate adaptive regulation as the body attempts to optimise its physiological state in the face of continued mental workload. These results show the complex interplay between sympathetic and parasympathetic activities as the body responds and adapts to varying levels of cognitive demand.
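For reference, the time-domain HRV features discussed above can be computed from a series of pulse-to-pulse intervals using their standard definitions. The following is a minimal sketch, not the processing pipeline used in this study: the function name `hrv_time_domain` is hypothetical, intervals are assumed to be in milliseconds and already cleaned of artefacts, and HF is omitted because it requires a spectral estimate.

```python
import math


def hrv_time_domain(ppi_ms):
    """Compute SDNN, RMSSD and pNN50 from pulse-to-pulse intervals (ms).

    Assumption: ppi_ms contains artefact-free intervals in milliseconds.
    """
    n = len(ppi_ms)
    if n < 2:
        raise ValueError("at least two intervals are required")

    # SDNN: sample standard deviation of all intervals (overall HRV).
    mean_ppi = sum(ppi_ms) / n
    sdnn = math.sqrt(sum((x - mean_ppi) ** 2 for x in ppi_ms) / (n - 1))

    # Successive differences between adjacent intervals.
    diffs = [b - a for a, b in zip(ppi_ms, ppi_ms[1:])]

    # RMSSD: root mean square of successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # pNN50: percentage of successive differences exceeding 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)

    return {"sdnn": sdnn, "rmssd": rmssd, "pnn50": pnn50}
```

RMSSD and pNN50 depend only on beat-to-beat differences, which is why they are commonly read as markers of short-term, parasympathetically mediated variability, whereas SDNN summarises variability over the whole window.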
The topographic maps of PSD presented in Fig. 10 offer a comprehensive view of neural activity across different brain regions and frequency bands at varying levels of mental workload. The analysis of PSD is crucial for understanding the frequency distribution of neural signals and brain activities during cognitive tasks. One prominent observation is that the left temporal lobe consistently showed the highest PSD values across the δ, θ, α and β1 frequency bands at all three mental workload levels, which suggests robust engagement of the left temporal region in processing information related to mental workload. These findings are consistent with those of studies indicating the involvement of the left temporal lobe in semantic processing and memory retrieval [Reference Cope47], supporting its pivotal role in cognitive tasks. Interestingly, compared with the low and medium mental workload levels, the PSD values of the δ, θ, α and β1 bands in the left temporal lobe all decreased significantly at the high mental workload level. Power in the δ, θ and β1 bands has been linked to mental fatigue, mental workload and cognitive resources, and has been reported to increase as cognitive resources increase [Reference Chikhi, Matton and Blanchet16, Reference Harmony48], which contrasts with our results. A possible reason is that the left temporal region exhibits a unique pattern of modulation in response to the specific cognitive demands of our experimental tasks to cope with higher mental workload levels. The variation in α PSD with mental workload is aligned with the findings of Borghini et al. [Reference Borghini49]. Moreover, the increase in PSD values in the β2 and γ1 frequency bands in the left frontal and right temporal regions during high mental workload points to a significant modulation of neural activity associated with increased cognitive demands. This response highlights the brain’s adaptive capacity to meet the challenges posed by cognitively demanding tasks.
A possible application of the proposed model.

The performance of single-modal feature-based methods and two multimodal feature fusion-based methods in classifying pilots’ mental workload was also compared. First, when evaluating the performance of single-modal physiological features, the EEG feature set consistently outperformed the HRV feature set across all the examined metrics. These results demonstrate the ability of EEG features to effectively capture relevant information about pilots’ mental workload, which aligns with the findings of previous studies [Reference Mohanavelu12] that have compared the effectiveness of HRV features and EEG features. A comparison between feature-level fusion and decision-level fusion showed that the former consistently outperformed the latter. Feature-level fusion combines information from different modalities at an early stage of processing, allowing for the integration of complementary features before the classification step. This enables the model to capture more comprehensive and synergistic information, leading to improved performance. These findings are similar to those of a study by C. Akalya devi et al. that compared the performance of feature-level fusion and decision-level fusion approaches [Reference Akalya devi and Karthika Renuka50]. They found that the feature-level fusion approach, which combines text, speech and video features, outperformed the decision-level fusion approach in terms of accuracy, sensitivity and specificity. Moreover, multimodal fusion methods generally outperformed single-modal methods, especially when using KNN, SVM and CNN. These improvements from multimodal fusion are consistent with those of previous studies, suggesting that fusion may provide complementary information for better mental workload classification. Notably, when using RF classifiers, feature-level fusion demonstrated a significant improvement, achieving the highest classification accuracy, precision, recall and F1 score.
This can be attributed to the fact that RF classifiers excel at capturing nonlinear relationships in data [Reference Steyrl51]. Feature-level fusion ensures that the model has access to a broader range of features, enabling it to effectively capture complex nonlinear patterns associated with mental workload.
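The structural difference between the two fusion strategies can be made concrete with a short sketch. It is illustrative only: the function names and the `weight_eeg` parameter are hypothetical, and the decision-level rule shown (weighted averaging of class probabilities, often called soft voting) is an assumption, since the combination rule used in this study is not restated in this section.

```python
def feature_level_fusion(eeg_features, hrv_features):
    """Concatenate per-sample feature vectors before classification.

    A single classifier is then trained on the joint vector, so it can
    model interactions between EEG and HRV features.
    """
    return list(eeg_features) + list(hrv_features)


def decision_level_fusion(prob_eeg, prob_hrv, weight_eeg=0.5):
    """Combine class probabilities from two separately trained classifiers.

    Assumption: weighted averaging (soft voting) is used as the rule.
    Returns the fused probabilities and the index of the winning class.
    """
    if len(prob_eeg) != len(prob_hrv):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= weight_eeg <= 1.0:
        raise ValueError("weight_eeg must lie in [0, 1]")

    fused = [
        weight_eeg * pe + (1.0 - weight_eeg) * ph
        for pe, ph in zip(prob_eeg, prob_hrv)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted
```

With feature-level fusion the classifier sees both modalities jointly, whereas with decision-level fusion each modality is classified in isolation and only the outputs are merged. Any cross-modal interaction must therefore be captured before the merge in the former case, which is one plausible reading of why it benefits a nonlinear classifier such as RF.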
A possible application of the proposed model is illustrated in Fig. 10. The model provides valuable insights into pilot mental workload, enabling both procedural optimisation and adaptive intervention strategies. By analysing physiological data collected across different flight phases, instructors can identify tasks or procedures that impose excessive cognitive demands. This information can then be leveraged to optimise training programs by systematically adjusting task difficulty and modifying training workflows. For instance, instructors can gradually expose pilots to high-workload conditions in task simulations, such as complex multi-task flight scenarios and emergency response drills, which will help pilots to maintain performance and decision-making accuracy even in high-demand flight operations. While simulation-based research offers valuable insights for pilot training, it does not fully capture the dynamic and unpredictable nature of real-world flight operations. In actual missions, pilots must respond to rapidly changing environmental conditions, air traffic communication demands and operational uncertainties. To translate these findings into real-world applications, the model can be integrated into intelligent cockpit systems to enhance dynamic human–machine interaction. By continuously monitoring pilots’ mental workload levels, the system can autonomously trigger task assignment strategies, providing non-intrusive interventions such as adjusting display complexity, prioritising critical information or temporarily reducing non-essential communication. These adjustments are intended to keep pilots within an optimal workload range, allowing them to focus on high-priority decision-making and effectively manage workload without inducing additional stress.
5.0 Conclusions
This study investigates the effectiveness of multi-sensor physiological signal fusion methods for carrier-based aircraft pilot mental workload classification using machine learning and deep learning algorithms. The fusion process involves combining EEG and PPG signals at both the decision level and the feature level. The findings show that multimodal fusion methods achieve better results than single-modal methods. Notably, feature-level fusion outperforms decision-level fusion, obtaining the highest accuracy of 97.69% using RF algorithms. These findings indicate the significance of fusing physiological signals for enhanced mental workload classification. Furthermore, this paper reveals a complex relationship between HRV and EEG PSD features and mental workload level, reflecting the modulation of ANS and brain activities under varying cognitive resource demands. Additionally, this research employs portable devices to collect EEG and PPG signals. These portable devices serve to minimise interference and discomfort for the subjects, enhancing the overall quality of the data collected. We can conclude from the classification results and physiological signal patterns that portable devices are feasible and effective for studying pilots’ mental workload. In conclusion, this study provides insights into, and a reference for, the classification of mental workload among carrier-based aircraft pilots and aids in the management of mental workload across different flight tasks, ultimately contributing to the enhancement of flight skill training and the improvement of flight safety.
While this study provides multimodal fusion methods for carrier-based aircraft pilots’ mental workload classification, several limitations should be acknowledged. First, although participants performed multiple repetitions of the experiments, the sample size was limited. Second, using civil aviation flight cadets as a substitute sample for carrier-based aircraft pilots, while practical, introduces limitations due to potential differences in characteristics between the two groups. Third, while 2D simulations can capture the basic characteristics of flight tasks, their spatial perception and realism are lower than those of 3D simulations, making it challenging to fully replicate the complex and dynamic environment of actual flights. Future work will use a more advanced flight simulator and adopt a hybrid approach that integrates simulation-based experiments with in-flight data collection. This methodology will enhance the validation of workload assessment models, providing a more comprehensive understanding of pilot cognitive demands in operational settings. In addition, expanding the sample size will enable a more robust evaluation of mental workload, improving the generalisability of the findings.
Acknowledgements
The authors extend sincere appreciation for the collaboration and contributions that were instrumental in the realisation of this paper. The authors also thank the pilot cadets from the Flight Technology College of Civil Aviation Flight University of China for their participation.
Financial support
This research was funded by the Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. FZ2021KF09); the Special Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. FZ2022ZX04 and FZ2022ZX54); the Research on Basic Scientific Research Business Expense Project of Sichuan Fire Research Institute of MEM (No. 20238804Z).
Competing interests
The authors declare none.




