
StereoHate: Toward identifying stereotypical bias and target group in hate speech detection

Published online by Cambridge University Press:  24 May 2024

Krishanu Maity*
Affiliation:
Department of CSE, Indian Institute of Technology Patna, Patna, India
Nilabja Ghosh
Affiliation:
Department of Computer Science, Ramakrishna Mission Vivekananda Educational and Research Institute, Howrah, India
Raghav Jain
Affiliation:
Department of CSE, Indian Institute of Technology Patna, Patna, India
Sriparna Saha
Affiliation:
Department of CSE, Indian Institute of Technology Patna, Patna, India
Pushpak Bhattacharyya
Affiliation:
Department of CSE, Indian Institute of Technology Bombay, Mumbai, India
*Corresponding author: Krishanu Maity; Email: krishanumaity@gmail.com

Abstract

Though social media helps spread knowledge more effectively, it also fuels the propagation of online abuse and harassment, including hate speech. Preventing hate speech is crucial because it can have serious adverse effects on both society and individuals. It is therefore important for models not only to detect such speech but also to explain why a given text is toxic. While much research addresses online hate speech detection in English, very little addresses low-resource languages like Hindi or the explainability of hate speech. Recent laws like the “right to explanations” of the General Data Protection Regulation have spurred research into interpretable models rather than focusing only on performance. Motivated by this, we create the first interpretable benchmark hate speech corpus, hate speech explanation (HHES), in the Hindi language, where each hate post is annotated with its stereotypical bias and target group category. Providing descriptions of internal stereotypical bias as an explanation of hate posts makes a hate speech detection model more trustworthy. This work proposes a commonsense-aware unified generative framework, CGenEx, that reframes the multitask problem as a text-to-text generation task. The novelty of this framework is that it can solve two different categories of tasks (generation and classification) simultaneously. We establish the efficacy of our proposed model (CGenEx-fuse) over other baselines on various evaluation metrics when applied to the Hindi HHES dataset.
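To illustrate what reframing a multitask problem as text-to-text generation can look like, the following minimal sketch serializes a generation target (the stereotype explanation) and a classification target (the target group) into one output string and parses a generated string back into both fields. The separator tokens, function names, and example values are hypothetical and chosen only for illustration; the abstract does not specify the actual output format used by CGenEx.

```python
# Hypothetical separator tokens; the actual CGenEx output format is not
# given in the abstract, so these are assumptions for illustration only.
STEREOTYPE_TAG = "<stereotype>"
TARGET_TAG = "<target>"


def build_target_sequence(stereotype: str, target_group: str) -> str:
    """Join the generation label and the classification label into one string
    that a sequence-to-sequence model could be trained to produce."""
    return f"{STEREOTYPE_TAG} {stereotype.strip()} {TARGET_TAG} {target_group.strip()}"


def parse_target_sequence(generated: str) -> dict:
    """Split a generated string back into the two task outputs.
    Missing fields are returned as empty strings."""
    stereotype, target_group = "", ""
    if STEREOTYPE_TAG in generated:
        rest = generated.split(STEREOTYPE_TAG, 1)[1]
        if TARGET_TAG in rest:
            stereotype, target_group = rest.split(TARGET_TAG, 1)
        else:
            stereotype = rest
    elif TARGET_TAG in generated:
        target_group = generated.split(TARGET_TAG, 1)[1]
    return {"stereotype": stereotype.strip(), "target_group": target_group.strip()}


if __name__ == "__main__":
    # Placeholder values, not taken from the HHES dataset.
    sequence = build_target_sequence("placeholder stereotype text", "religion")
    print(sequence)
    print(parse_target_sequence(sequence))
```

With a format of this kind, a single decoder produces both outputs in one pass, so the classification label is recovered by parsing the generated text rather than by a separate classification head.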

Disclaimer

This article contains profanity, which is unavoidable given the nature of the work. It in no way reflects the opinions of the authors.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table 1. Train, validation, and test split distribution of target group category in HHES dataset

Table 2. Two examples from HHES dataset

Figure 1. A commonsense-aware unified generative framework (CGenEx) architecture.

Figure 2. Commonsense-aware encoder module internal architecture.

Table 3. Results of different baselines and the two proposed frameworks, CGenEx-con and CGenEx-fuse, in a multitask setting. For the target tasks, the results are in terms of macro-F1 score (F1), accuracy (Acc), and Matthews correlation coefficient (MCC) values. F1, Acc, and MCC metrics are given in %. The maximum scores attained are represented by bold-faced values; gray highlight represents statistically significant results

Table 4. Classwise precision, recall, and F1 score of the target identification task generated by single-task and multitask variants of our proposed model (CGenEx-fuse)

Table 5. Ablation study to show the effect of reinforcement learning-based training

Figure 3. Confusion matrices: single-task vs. multitask variants of mBART-CGenEx-fuse model for target identification task.

Table 6. Comparison of performance on English SBIC dataset: proposed models vs. baseline models in single-task and multitask settings

Table 7. Comparative study of the stereotype and target of a post as predicted by the proposed models vs. actual annotations; ST: single task, MT: multitask; mBART embedded models have been selected for error analysis

Table 8. Translation of Hindi posts to English and commonsense inference generated by ConceptNet from English to Hindi