
Cluster-based ensemble learning model for improving sentiment classification of Arabic documents

Published online by Cambridge University Press: 01 June 2023

Rana Husni Al Mahmoud
Affiliation:
Faculty of Information Technology, Applied Science Private University, Amman, Jordan
Bassam H. Hammo*
Affiliation:
King Abdullah II School of Information Technology, The University of Jordan, Amman, Jordan; King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Amman, Jordan
Hossam Faris
Affiliation:
King Abdullah II School of Information Technology, The University of Jordan, Amman, Jordan
*Corresponding author: Bassam H. Hammo; Email: b.hammo@ju.edu.jo

Abstract

This article reports on the design and implementation of a multiclass sentiment classification approach that handles the imbalanced class distribution of Arabic documents. The proposed approach, sentiment classification of Arabic documents (SCArD), combines a clustering-based undersampling (CBUS) method with an ensemble learning model to help machine learning (ML) classifiers build accurate models from highly imbalanced datasets. The CBUS method applies two standard clustering algorithms, K-means and expectation–maximization, to balance the ratio between the majority and minority classes: at the cluster level, it decreases the number of majority class instances while maintaining the number of minority class instances. The merit of the proposed approach is that it neither removes majority class instances from the dataset nor injects artificial minority class instances into it. The resulting balanced datasets are used to train two ML classifiers, random forest and updateable Naïve Bayes, to develop prediction data models. The best prediction data models are selected based on their F1-scores. We applied two techniques to test SCArD and generate new predictions from the imbalanced test dataset. The first technique uses the best prediction data models. The second technique uses a majority voting ensemble learning model, which combines the best prediction data models to generate the final predictions. The experimental results showed that SCArD is promising and outperformed the other comparative classification models in terms of F1-score.
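The following is a minimal sketch of one possible reading of the pipeline summarized in the abstract; it is not the authors' implementation. It assumes that cluster assignments for the majority class instances have already been computed elsewhere (for example by K-means or expectation-maximization), that each balanced training subset pairs one cluster of majority instances with all minority instances, and that voting ties go to the earliest model. The function names `build_cluster_subsets`, `majority_vote`, and `f1_for_class` are hypothetical.

```python
from collections import Counter, defaultdict


def build_cluster_subsets(labels, cluster_ids, majority_label):
    """Return a dict mapping cluster id -> sorted list of instance indices.

    labels[i]      -- class label of instance i
    cluster_ids[i] -- cluster of instance i (read only for majority instances)

    Assumption: each subset holds the majority instances of one cluster plus
    every non-majority instance, so no majority instance is dropped overall
    and no synthetic minority instance is created.
    """
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]
    by_cluster = defaultdict(list)
    for i, y in enumerate(labels):
        if y == majority_label:
            by_cluster[cluster_ids[i]].append(i)
    return {c: sorted(idx + minority_idx) for c, idx in sorted(by_cluster.items())}


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by plurality vote.

    Assumption: ties are broken in favour of the earliest model's label.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def f1_for_class(y_true, y_pred, cls):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 if the class never occurs."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy usage: four majority ("neg") instances in two clusters, two minority ones.
labels = ["neg", "neg", "neg", "neg", "pos", "neu"]
cluster_ids = [0, 0, 1, 1, None, None]
subsets = build_cluster_subsets(labels, cluster_ids, "neg")

# Predictions of three hypothetical models on three test documents.
final = majority_vote([
    ["pos", "neg", "neu"],
    ["pos", "pos", "neg"],
    ["neg", "pos", "neu"],
])
```

In this sketch, a classifier would be trained on each entry of `subsets`, the models with the highest per-class `f1_for_class` values on held-out data would be kept, and `majority_vote` would merge their predictions on the imbalanced test set.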

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Table 1. Comparison of sentiment analysis approaches for Arabic and other languages

Figure 1. The flow diagram of the research methodology.

Table 2. Sources and topics of news articles

Table 3. Datasets

Table 4. The HARD dataset

Figure 2. The workflow of the SCArD algorithm.

Table 5. Time complexity of the SCArD algorithm

Table 6. The cost matrix of the Gulf crisis problem

Figure 3. Performance of the RF classifier based on the number of extracted features.

Table 7. Feature sets characteristics

Table 8. The imbalanced training dataset after applying the SMOTE oversampling technique

Table 9. The evaluation metrics of the RF and the UNB classifiers applied to the Gulf crisis feature sets

Table 10. The evaluation metrics of the RF and the UNB classifiers using SMOTE

Table 11. The evaluation metrics of the RF and the UNB classifiers using the cost-sensitive classifier

Table 12. Number of instances per cluster after applying the CBUS method to the Gulf crisis training dataset

Table 13. Number of instances per cluster after applying the CBUS method to the Morocco-2016 dataset

Table 14. Number of instances per cluster after applying the CBUS method to the LABR dataset

Table 15. Number of instances per cluster after applying the CBUS method to the HARD dataset

Table 16. The evaluation metrics of the RF and the UNB classifiers using the CBUS method applied to the Gulf crisis dataset

Figure 4. The evaluation metrics of the balancing algorithms, majority voting ensemble model, and the best prediction data models applied to the Gulf crisis dataset.

Figure 5. The evaluation metrics of the majority voting ensemble model and the best prediction data models applied to the Morocco-2016 dataset.

Table 17. The evaluation metrics of the RF and the UNB classifiers using the CBUS method applied to the Morocco-2016 dataset

Table 18. The evaluation metrics of the RF and the UNB classifiers using the CBUS method applied to the LABR dataset

Figure 6. The evaluation metrics of the majority voting ensemble model and the best prediction data models applied to the LABR dataset.

Figure 7. The evaluation metrics of the majority voting ensemble model and the best prediction data models applied to the HARD dataset.

Table 19. The evaluation metrics of the RF and the UNB classifiers using the CBUS method applied to the HARD dataset

Table 20. The evaluation metrics of the RF classifier, balancing algorithms, majority voting ensemble, and the best prediction data models applied to the Gulf crisis test dataset

Table 21. The evaluation metrics of the RF classifier, majority voting ensemble, and the best prediction data models applied to the Morocco-2016 test dataset

Table 22. The evaluation metrics of the RF classifier, majority voting ensemble, and the best prediction data models applied to the LABR test dataset

Table 23. The evaluation metrics of the RF classifier, majority voting ensemble, and the best prediction data models applied to the HARD test dataset

Table 24. Summary of the best classification models applied to all datasets based on the F1-scores for the negative (−ve), neutral (N), and positive (+ve) sentiment classes

Table 25. Summary of the best models for the LABR and HARD datasets trained on Arabic-BERT based on the F1-scores for the negative (−ve), neutral (N), and positive (+ve) sentiment classes