
Automated hate speech detection and span extraction in underground hacking and extremist forums

Published online by Cambridge University Press:  20 June 2022

Linda Zhou*
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge CB2 1TN, UK
Andrew Caines
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge CB2 1TN, UK
Ildiko Pete
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge CB2 1TN, UK
Alice Hutchings
Affiliation:
Department of Computer Science and Technology, University of Cambridge, Cambridge CB2 1TN, UK
*Corresponding author. Email: lz423@cantab.ac.uk

Abstract

Hate speech is any kind of communication that attacks a person or a group on the basis of characteristics such as gender, religion or race. With the availability of online platforms where people can express their (hateful) opinions, the amount of hate speech is steadily increasing, and it often leads to offline hate crimes. This paper focuses on understanding and detecting hate speech in underground hacking and extremist forums, where cybercriminals and extremists, respectively, communicate with each other, and where some users are associated with criminal activity. Moreover, because posts can be lengthy, identifying the specific span of text containing hateful content would assist site moderators with the removal of hate speech. This paper describes a hate speech dataset composed of posts extracted from HackForums, an online hacking forum, and from Stormfront and Incels.co, two extremist forums. We combined our dataset with a Twitter hate speech dataset to train a multi-platform classifier. Our evaluation shows that a classifier trained on multiple sources of data does not always outperform a mono-platform classifier. Finally, this is the first work on extracting hate speech spans from longer texts. We fine-tune BERT (Bidirectional Encoder Representations from Transformers) and adopt two approaches: span prediction and sequence labelling. Both approaches successfully extract hateful spans and achieve an F1-score of at least 69%.
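The two formulations named in the abstract differ in how a hateful span is encoded as a supervision target. The following is a minimal illustrative sketch (not the authors' code): it converts a character-level span annotation into start/end token indices (the span-prediction view, as in BERT+span) and into per-token B/I/O tags (the sequence-labelling view, as in BERT+token). A whitespace tokenizer stands in for BERT's WordPiece tokenizer, and the example text and span are invented.

```python
def tokenize_with_offsets(text):
    """Return (token, start_char, end_char) triples for whitespace tokens."""
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens

def span_to_labels(text, span_start, span_end):
    """Map a character span to BIO tags and (start, end) token indices."""
    triples = tokenize_with_offsets(text)
    tags, first, last = [], None, None
    for i, (tok, s, e) in enumerate(triples):
        # A token belongs to the span if it overlaps [span_start, span_end).
        if s < span_end and e > span_start:
            tags.append("B" if first is None else "I")
            if first is None:
                first = i
            last = i
        else:
            tags.append("O")
    return [t for t, _, _ in triples], tags, (first, last)

text = "you are all worthless idiots honestly"
# Suppose the annotated hateful span is "worthless idiots".
start = text.index("worthless")
end = start + len("worthless idiots")
tokens, tags, (i, j) = span_to_labels(text, start, end)
# tags  -> ["O", "O", "O", "B", "I", "O"]   (sequence-labelling target)
# (i,j) -> (3, 4)                           (span-prediction target)
```

With a real WordPiece tokenizer the same mapping is usually done via the tokenizer's character offsets, since one word may split into several sub-tokens.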

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Table 1. Examples of texts where propaganda (Da San Martino et al. 2020) and toxic spans are highlighted in bold


Table 2. List of keywords for the search of potential hate speech


Figure 1. Framework for interpreting Fleiss’ kappa.


Algorithm 1. Active learning (Uncertainty sampling strategy) algorithm.
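Algorithm 1 names an uncertainty sampling strategy for active learning: at each round, the unlabelled instances the current classifier is least confident about are sent for annotation. The following is a hypothetical sketch of the selection step only (the paper's full training loop is not reproduced here), using the maximum predicted class probability as the confidence measure; the toy probabilities are invented.

```python
def least_confident(probabilities, k):
    """Return indices of the k instances with the lowest top-class probability."""
    confidences = [(max(p), i) for i, p in enumerate(probabilities)]
    confidences.sort()  # least confident first
    return [i for _, i in confidences[:k]]

# Toy predicted class distributions for four unlabelled posts
# (class 0 = not hate speech, class 1 = hate speech).
probs = [
    [0.95, 0.05],  # confident: not hate speech
    [0.55, 0.45],  # uncertain
    [0.10, 0.90],  # confident: hate speech
    [0.51, 0.49],  # most uncertain
]
picked = least_confident(probs, 2)
# picked -> [3, 1]: the two posts the model is least sure about
```

In an active learning loop, the selected instances would be labelled by annotators, added to the training set, and the classifier retrained before the next round of selection.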


Table 3. Distribution of posts over categories in the multi-platform training data


Table 4. Distribution of posts over categories in the test sets from HatEval, HackForums and the two extremist forums


Table 5. Distribution of the data instances for span extraction across different platforms


Table 6. Hate speech frequency across different platforms


Table 7. Examples of vocabulary that members on Incels.co use


Figure 2. Architecture of BERT+span.


Figure 3. Architecture of BERT+token.


Figure 4. Example of how single ground truth spans are pre-processed for BERT+span and BERT+token. The original text has been encoded into sub-tokens with the BERT WordPiece tokenizer.


Figure 5. Example of how multiple ground truth spans are pre-processed for BERT+span and BERT+token. The original text has been encoded into sub-tokens with the BERT WordPiece tokenizer.


Table 8. Performance of the classifiers, where [Mono] and [Multi] denote the mono- and multi-platform classifiers, in percentages for accuracy, precision, recall and F1-score. Values in bold are the best scores


Figure 6. Confusion matrix for the SVC[Multi].


Table 9. Performance of the span extraction models in percentages for exact match, precision, recall and F1-score. Mean, standard deviation and maximum values across five runs are reported. Values in bold are the best scores