Focal inferential infusion coupled with tractable density discrimination for implicit hate detection

Published online by Cambridge University Press: 13 December 2024

Sarah Masud
Affiliation:
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Ashutosh Bajpai
Affiliation:
Indian Institute of Technology Delhi, New Delhi, India
Wipro Research, Bengaluru, India
Tanmoy Chakraborty*
Affiliation:
Indian Institute of Technology Delhi, New Delhi, India
Corresponding author: Tanmoy Chakraborty; Email: tanchak@iitd.ac.in

Abstract

Although pretrained large language models (PLMs) have achieved state-of-the-art performance on many natural language processing tasks, they lack an understanding of subtle expressions of implicit hate speech. Various attempts have been made to enhance the detection of implicit hate, either by augmenting external context or by enforcing label separation via distance-based metrics. Combining these two approaches, we introduce FiADD, a novel focused inferential adaptive density discrimination framework. FiADD enhances the PLM finetuning pipeline by bringing the surface form of implicit hate speech closer to its implied meaning while increasing the intercluster distance among the various labels. We test FiADD on three implicit hate datasets and observe significant improvements in the two-way and three-way hate classification tasks. We further examine the generalizability of FiADD on three other tasks in which surface and implied forms differ, namely sarcasm, irony, and stance detection, and observe similar performance improvements. Finally, we analyze how the generated latent space evolves under FiADD, which corroborates the advantage of employing FiADD for implicit hate speech detection.
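The mechanism described above lends itself to a short sketch. Below is a minimal PyTorch illustration of the inferential-infusion idea, not the authors' released code: the embedding of an implicit post's surface form is pulled toward the embedding of its annotated implied statement, while a standard classification loss separates the labels (the density-discrimination component is sketched after Figure 2). The names encoder, clf, and pull_weight are illustrative assumptions, and the sketch simplifies by assuming every sample in the batch carries an implied annotation, whereas in the paper only implicit-class samples do.

import torch
import torch.nn.functional as F

def inferential_pull_loss(surface_emb: torch.Tensor,
                          implied_emb: torch.Tensor) -> torch.Tensor:
    # Mean squared distance between the surface-form and implied-form
    # embeddings; minimizing it draws the two representations together.
    return F.mse_loss(surface_emb, implied_emb)

def training_step(encoder, clf, posts, implied, labels, pull_weight=0.5):
    # encoder: PLM + pooling head; clf: classification head (both assumed).
    surface = encoder(posts)                     # (batch, hidden)
    ce = F.cross_entropy(clf(surface), labels)   # keeps label clusters apart
    pull = inferential_pull_loss(surface, encoder(implied))
    return ce + pull_weight * pull               # joint objective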

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. The three objectives of FiADD as applied to implicit hate detection: (a) adaptive density discrimination, (b) a higher penalty on boundary samples, and (c) bringing the surface and semantic forms of implicit hate closer together.

Table 1. The L1 intercluster distances between neutral (N) and explicit hate (E), as well as non-hate and implicit hate (I) samples, based on ALD and ACLD

Figure 2. The architecture of FiADD. Input X is a set of texts, implied annotations (only for the implicit class), and class labels. PLM: pretrained language model (frozen). ${R'}_{nhate}$, ${R'}_{exp}$, and ${R'}_{imp}$ are the representatives of the seed and imposter clusters for the non-hate, explicit, and implicit classes, respectively. ${R'}_{inf}$ represents the inferential meaning for the corresponding ${R'}_{imp}$. ACE is alpha cross-entropy, and $ADD^{Inf+foc}$ is the adaptive density discriminator with the inferential + focal objective.
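A simplified, self-contained sketch may make the $ADD^{Inf+foc}$ component of this caption concrete. It follows the spirit of adaptive density discrimination (magnet loss): each class is split into sub-clusters, a sample is attracted to its nearest own-class (seed) centre and repelled from other-class (imposter) centres, and a focal term raises the penalty on poorly placed boundary samples. The hinge form and the hyperparameters sigma, alpha, and gamma below are illustrative assumptions, not the paper's exact formulation.

import torch

def add_focal_loss(embs, labels, centers, center_labels,
                   sigma=1.0, alpha=1.0, gamma=2.0):
    # embs: (N, d) sample embeddings; centers: (C, d) sub-cluster centres;
    # labels: (N,) and center_labels: (C,) hold class ids.
    logits = -torch.cdist(embs, centers).pow(2) / (2 * sigma ** 2)  # (N, C)
    same = labels.unsqueeze(1).eq(center_labels.unsqueeze(0))       # (N, C)
    neg_inf = torch.finfo(logits.dtype).min
    # Attraction: similarity to the nearest own-class (seed) centre.
    seed = logits.masked_fill(~same, neg_inf).max(dim=1).values
    # Repulsion: aggregate similarity to other-class (imposter) centres.
    rep = torch.logsumexp(logits.masked_fill(same, neg_inf), dim=1)
    nll = torch.relu(alpha - seed + rep)    # hinged per-sample loss
    p = torch.exp(-nll)                     # near 1 for easy, well-placed samples
    return ((1 - p) ** gamma * nll).mean()  # focal weight stresses boundary samples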

Table 2. Datasets employed in evaluating the FiADD framework. The statistics list the class-wise distributions for (a) the hate speech and (b) the SemEval datasets

Table 3. Sample posts from AbuseEval and ImpGab along with their implied annotations. We also provide the cross-annotator scores and remarks

Table 4. Baseline selection based on comparing $ADD^{foc}$ against (a) vanilla ADD for two-way hate speech classification via LSTM and (b) ACE for three-way hate speech classification via BERT

Table 5. Results for two-way hate classification on BERT and HateBERT. We also highlight the highest hate-class macro-F1 that each model achieves

Table 6. Results for three-way hate classification on BERT and HateBERT

Table 7. Comparative performance for sarcasm, irony, and stance detection

Figure 3. The variation in performance with changing values of (a) the number of clusters ($k$) and (b) the focal parameter ($\gamma$). We employ BERT on AbuseEval with $ADD^{foc}$ in the two-way classification.
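To see why $\gamma$ acts as a boundary-focusing knob, consider the standard focal weight $(1-p)^{\gamma}$, where p is a confidence-like score for a sample: as $\gamma$ grows, easy samples are damped far more strongly than boundary samples. A tiny illustrative computation (the p values are arbitrary):

for p in (0.9, 0.6, 0.4):                      # easy -> boundary samples
    for gamma in (0, 1, 2, 5):
        print(f"p={p:.1f} gamma={gamma}: weight={(1 - p) ** gamma:.4f}")

With gamma = 2, for instance, the easy sample (p = 0.9) is weighted 0.01 while the boundary sample (p = 0.4) keeps weight 0.36, a 36-fold difference.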

Table 8. Results for the two-way classification task across all three hate speech datasets using two pretrained language models, BERT and HateBERT. Outcomes where one of the FiADD variants outperforms the ACE baseline are highlighted in green

Table 9. Results for the three-way classification task across all three hate speech datasets using two pretrained language models, BERT and HateBERT. Outcomes where one of the FiADD variants outperforms the ACE baseline are highlighted in green

Figure 4. Error analysis with (a) correctly and (b) incorrectly classified samples in the three-way classification on LatentHatred. Here, scores A and B are the relative positions of an implicit sample w.r.t. the non-hate and explicit spaces when finetuned with ACE and $ADD^{Inf+foc}$, respectively.
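The caption does not spell out how scores A and B are computed; purely as an illustration (an assumption, not the paper's definition), a relative-position score for an implicit sample could be a normalized distance ratio between the two class centroids:

import torch

def relative_position(x, nonhate_centroid, explicit_centroid):
    # 0.0 => the sample sits on the non-hate centroid,
    # 1.0 => it sits on the explicit-hate centroid.
    d_nh = torch.dist(x, nonhate_centroid)
    d_ex = torch.dist(x, explicit_centroid)
    return (d_nh / (d_nh + d_ex)).item()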

Figure 5. 2D t-SNE plots of the last hidden representations after applying K-means ($K=3$) on the implicit class for AbuseEval (a, b, c), ImpGab (d, e, f), and LatentHatred (g, h, i). $\{0, 1, 2\}$ are the sub-cluster IDs. The higher the Silhouette score, the better discriminated the clusters.

Figure 6. 2D t-SNE plots of the last hidden representations obtained for the implicit class and its respective inferential (implied) set for AbuseEval (a, b, c), ImpGab (d, e, f), and LatentHatred (g, h, i). The lower the Silhouette score, the closer the surface and implied forms of hate.
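Both analyses can be reproduced with a standard scikit-learn pipeline. The sketch below states our assumptions (K-means with K = 3, default t-SNE settings, Silhouette computed on the original embeddings), which may differ from the paper's exact configuration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def analyse_latent_space(embeddings: np.ndarray, k: int = 3):
    # embeddings: (n_samples, hidden_dim) last hidden states of the PLM.
    sub_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)  # for 2D plots
    score = silhouette_score(embeddings, sub_ids)  # higher => better-separated clusters
    return coords, sub_ids, score

For Figure 6, the same Silhouette score is instead computed over two groups, implicit posts versus their implied statements, where a lower score indicates that the surface and implied forms have moved closer together.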