
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

Published 23 Nov 2024 in cs.CL (arXiv:2411.15462v3)

Abstract: To address the global challenge of online hate speech, prior research has developed detection models to flag such content on social media. However, due to systematic biases in evaluation datasets, the real-world effectiveness of these models remains unclear, particularly across geographies. We introduce HateDay, the first global hate speech dataset representative of social media settings, constructed from a random sample of all tweets posted on September 21, 2022 and covering eight languages and four English-speaking countries. Using HateDay, we uncover substantial variation in the prevalence and composition of hate speech across languages and regions. We show that evaluations on academic datasets greatly overestimate real-world detection performance, which we find is very low, especially for non-European languages. Our analysis identifies key drivers of this gap, including models' difficulty to distinguish hate from offensive speech and a mismatch between the target groups emphasized in academic datasets and those most frequently targeted in real-world settings. We argue that poor model performance makes public models ill-suited for automatic hate speech moderation and find that high moderation rates are only achievable with substantial human oversight. Our results underscore the need to evaluate detection systems on data that reflects the complexity and diversity of real-world social media.

Summary

  • The paper presents a globally representative dataset of 240,000 tweets capturing hate speech across eight languages, enhancing real-world detection analysis.
  • The paper demonstrates that models perform significantly worse on the realistic HateDay dataset compared to traditional academic benchmarks, particularly for non-European languages.
  • The paper highlights the urgent need for context-aware moderation strategies, as current automated systems struggle to reliably distinguish hate speech from offensive language.

Insights from a Global Hate Speech Dataset: An Analysis of HateDay

The study "HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter" presents a comprehensive analysis of hate speech detection across languages and countries, using a dataset that reflects real-world social media conditions. The dataset comprises 240,000 annotated tweets sampled from Twitter on September 21, 2022, spanning eight languages and four English-speaking countries, and is designed to address significant gaps in current hate speech detection methodologies.

Key Objectives and Methodology

The primary objective of this research is to evaluate the performance of hate speech detection models in real-world settings, addressing the biases present in existing academic datasets. The HateDay dataset is curated to realistically represent global social media conversations: it covers eight languages (Arabic, English, French, German, Indonesian, Portuguese, Spanish, and Turkish) and, for English, distinguishes among four countries (United States, India, Nigeria, and Kenya).

Annotation was carried out by a diverse team following predefined guidelines, with tweets labeled as hate speech, offensive content, or neutral. This structured process supports consistent labeling across varied cultural and linguistic contexts.

Findings on Detection Performance

The paper reports stark disparities in detection performance between HateDay and traditional academic datasets, which tend to overestimate models' capabilities. Notably, average precision drops significantly when models are evaluated on HateDay rather than on academic datasets and functional tests. Both supervised and zero-shot approaches show performance deficits, particularly for non-European languages. The authors conclude that existing models cannot moderate hate speech effectively without extensive human intervention, which is shown to be financially and logistically impractical.
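The base-rate effect behind this gap can be illustrated with a short calculation. A detector that looks strong on a balanced academic benchmark loses most of its precision when applied to a stream in which hate speech is rare; the sensitivity, specificity, and prevalence figures below are hypothetical, chosen only to illustrate the mechanism, not taken from the paper.

```python
def precision_at_prevalence(sensitivity: float, specificity: float,
                            prevalence: float) -> float:
    """Expected precision of a binary detector applied to a stream
    where `prevalence` is the fraction of truly hateful posts."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# On a balanced academic benchmark (50% hateful), precision looks high.
balanced = precision_at_prevalence(0.90, 0.90, 0.50)   # 0.90

# On a realistic stream where only ~1% of posts are hateful,
# the same detector's precision collapses.
realistic = precision_at_prevalence(0.90, 0.90, 0.01)  # ~0.08
```

The same per-class error rates thus yield very different precision depending on how rare the positive class is, which is one reason evaluations on balanced academic datasets can overstate real-world performance.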

A significant observation is the challenge models face in distinguishing hate speech from offensive language. This is compounded by the fact that offensive content is more prevalent and often mistaken for hate speech due to lexical similarities. Furthermore, there is a notable mismatch between the focus of academic research on certain hate targets and their real-world prevalence, suggesting a need for better alignment to improve detection performance.

Implications for Hate Speech Moderation

The implications of these findings are significant for social media moderation. The study highlights the impracticality of large-scale human-in-the-loop moderation: reviewing the volume of flagged content is costly even when automated filtering narrows the initial pool. The results point to a pressing need for more contextually aware models and for resources focused on underrepresented hate speech types, such as political hate, which is prevalent in real-world data but less frequently targeted in academic datasets.
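The review-cost argument can be sketched in the same spirit. All figures below (daily post volume, hate prevalence, model recall and precision) are hypothetical placeholders, not numbers reported in the paper; the point is only that at low precision, the human review load is a large multiple of the actual hateful content.

```python
def daily_review_load(volume: int, prevalence: float,
                      recall: float, precision: float) -> float:
    """Posts a human team must review per day if every model flag
    is checked before a moderation action is taken."""
    true_hate = volume * prevalence          # hateful posts in the stream
    flagged_hate = true_hate * recall        # hateful posts the model catches
    # Total flags include false positives: flagged_hate / precision.
    return flagged_hate / precision

# Hypothetical scenario: 500M posts/day, 0.2% hateful, recall 0.6,
# precision 0.2 -> 3,000,000 posts to review daily.
flags = daily_review_load(500_000_000, 0.002, 0.6, 0.2)
```

At precision 0.2, four out of five reviewed posts are false alarms, which is why the paper finds high moderation rates achievable only with substantial human oversight.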

Future Directions and Recommendations

The paper underscores the importance of developing detection models that perform well not only in academic settings but also in practical applications across diverse linguistic and cultural landscapes. It advocates for evaluating models on datasets representative of the settings in which they will be deployed, and calls for transparency from platforms about real-world model performance to enable more effective moderation strategies.

In conclusion, HateDay provides a critical resource for further research, offering a benchmark for hate speech detection that aligns more closely with the complexities of social media environments worldwide. The research sets a vital precedent for future work aimed at refining hate speech detection and developing robust, fair, and effective moderation strategies.
