
Multiclass hate speech detection with an aggregated dataset

Published online by Cambridge University Press:  16 January 2025

Sinéad Walsh
Affiliation:
Department of Computing, Atlantic Technological University, Letterkenny, Co. Donegal, Ireland
Paul Greaney*
Affiliation:
Department of Computing, Atlantic Technological University, Letterkenny, Co. Donegal, Ireland Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, Letterkenny, Co. Donegal, Ireland
*
Corresponding author: Paul Greaney; Email: paul.greaney@atu.ie

Abstract

Detecting and removing hate speech content in a timely manner remains a challenge for social media platforms. Automated techniques such as deep learning models offer solutions that can keep pace with the volume and velocity of user content production. Research in this area has mainly focused on either binary classification or on classifying tweets into generalised categories such as hateful, offensive, or neither; less attention has been given to multiclass classification of online hate speech by the type of hate or the group at which it is directed. By aggregating and re-annotating several relevant hate speech datasets, this study presents a dataset for classifying tweets into the categories ethnicity, gender, religion, sexuality, and non-hate. We evaluate the dataset by training several models: logistic regression, LSTM, BERT, and GPT-2. For the LSTM model, we assess a range of NLP features and conclude that the highest-performing feature combination consists of word $n$-grams, character $n$-grams, and dependency tuples. We show that while more recent, larger models can achieve slightly higher performance, increased model complexity alone is not sufficient to achieve significantly better results. We also compare this approach with a binary classification approach and evaluate the effect of dataset size on model performance.
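The word and character $n$-gram features named in the abstract can be illustrated with a minimal, standard-library-only sketch; the function and prefix names here are hypothetical, and the dependency-tuple features mentioned above are omitted because they require a dependency parser (e.g. spaCy) rather than simple string operations:

```python
from collections import Counter

def word_ngrams(tokens, n):
    # Contiguous word n-grams, e.g. n=2 yields bigrams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    # Character n-grams over the raw string (spaces included).
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def featurise(text, word_n=(1, 2), char_n=(3,)):
    """Bag-of-n-grams feature counts combining word and character n-grams.

    Each feature is prefixed (e.g. 'w2:', 'c3:') so that word and character
    n-grams of the same length do not collide in one feature space.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in word_n:
        feats.update(f"w{n}:{g}" for g in word_ngrams(tokens, n))
    for n in char_n:
        feats.update(f"c{n}:{g}" for g in char_ngrams(text.lower(), n))
    return feats

feats = featurise("hate speech detection")
# feats["w2:hate speech"] == 1, feats["c3:hat"] == 1
```

In practice such sparse counts would be TF-IDF weighted and fed to the classifiers compared in the paper; this sketch only shows how the combined feature space is assembled.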

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Table 1. Breakdown of dataset by source and class

Table 2. Feature selection results

Figure 1. Feature selection—feature set 3 training & validation graphs.

Table 3. Hyperparameter tuning value ranges with optimal values determined by grid search

Figure 2. LSTM model—test set normalised & unnormalised confusion matrices.

Table 4. LSTM model—test set evaluation metrics

Table 5. Comparison of model test scores on the aggregated dataset

Figure 3. Confusion matrices for binary classification models.

Table 6. Comparison of model test scores on smaller subsets of the dataset

Figure 4. Confusion matrices for subset models.