- The paper proposes a two-dimensional framework distinguishing directed versus generalized and explicit versus implicit abuse.
- It informs annotation practice by showing how the two dimensions relate to inter-annotator agreement and to the guidelines each subtask needs, supporting more consistent and transparent labeling.
- It guides modeling strategy by suggesting how feature selection should vary with the type of abuse, from keyword lexicons for explicit abuse to context-aware representations for implicit abuse.
Understanding Abuse: A Typology of Abusive Language Detection Subtasks
The paper "Understanding Abuse: A Typology of Abusive Language Detection Subtasks" by Waseem et al. offers a structured typology aimed at addressing the various subtasks associated with detecting abusive language online, such as hate speech, cyberbullying, and trolling. Recognizing the overlaps and distinctive features of these tasks, the authors propose a typology that clarifies these relationships and suggests implications for data annotation and feature construction.
Key Contributions
The study introduces a two-dimensional typology for categorizing abusive language:
- Directed vs. Generalized: This dimension captures whether the abuse targets a specific individual or entity, or a generalized group, such as people sharing a gender, ethnicity, or religion.
- Explicit vs. Implicit: This dimension captures whether the abusive content is overtly stated, as with slurs and direct insults, or veiled through sarcasm, stereotypes, or coded language, often requiring context for interpretation.
The authors argue that most instances of abusive language can be placed within this framework, offering a clearer understanding of the nuances between subtasks.
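To make the framework concrete, here is a minimal Python sketch of the two-dimensional typology as a pair of enumerations; the type names and the example at the end are illustrative, not drawn from the paper:

```python
from dataclasses import dataclass
from enum import Enum

class Target(Enum):
    DIRECTED = "directed"        # aimed at a specific individual or entity
    GENERALIZED = "generalized"  # aimed at a group, e.g., by gender or ethnicity

class Explicitness(Enum):
    EXPLICIT = "explicit"  # overt abuse, e.g., slurs or direct insults
    IMPLICIT = "implicit"  # veiled abuse, e.g., sarcasm or coded language

@dataclass(frozen=True)
class AbuseLabel:
    """One cell of the two-by-two typology."""
    target: Target
    explicitness: Explicitness

# A slur aimed at one user is directed + explicit; a sarcastic stereotype
# about a group is generalized + implicit.
print(AbuseLabel(Target.DIRECTED, Explicitness.EXPLICIT))
```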
Typology and Current Literature
Mapping prior work onto the typology exposes contradictions and overlaps in existing labeling and annotation practices. The framework helps resolve these inconsistencies by giving studies a shared, structured vocabulary for the type of abuse they address.
The paper further explores how these dimensions affect annotation. Directed language tends to yield higher annotator agreement because its target is specific, while generalized language invites broader, more subjective interpretations.
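As a toy illustration of that claim, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure (not one the paper itself reports), on fabricated labels where two annotators converge more often on directed items than on generalized ones:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two annotators labeled independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Fabricated labels for illustration only.
directed_a    = ["abuse", "abuse", "none", "abuse", "none", "abuse"]
directed_b    = ["abuse", "abuse", "none", "abuse", "none", "none"]
generalized_a = ["abuse", "none", "abuse", "none", "abuse", "none"]
generalized_b = ["none", "none", "abuse", "abuse", "none", "none"]

print(f"directed kappa:    {cohens_kappa(directed_a, directed_b):.2f}")       # 0.67
print(f"generalized kappa: {cohens_kappa(generalized_a, generalized_b):.2f}")  # 0.00
```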
Implications for Research and Practice
Annotation Practices: The authors advocate annotation strategies that differentiate between the types of abuse defined in their typology. Directed and explicit abuse tends to produce higher inter-annotator agreement, whereas generalized and implicit abuse requires more nuanced annotation guidelines.
Modeling Strategies: Feature selection is critical in modeling abusive language. Directed abuse can leverage target-identification features such as mentions and second-person pronouns, while generalized abuse benefits from lexicons of terms associated with the targeted groups. Explicit abuse lends itself to keyword-based methods, whereas implicit abuse demands more sophisticated semantic understanding, for instance via embeddings or context-aware models.
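A minimal sketch of target-identification features along the directed-vs-generalized axis; the pronoun set and group-term list are hypothetical stand-ins for curated resources:

```python
import re

SECOND_PERSON = {"you", "your", "you're", "yourself", "u", "ur"}
# Hypothetical mini-lexicon; a real system would use a curated resource
# of terms referring to frequently targeted groups.
GROUP_TERMS = {"immigrants", "women", "muslims", "refugees"}

def target_features(text: str) -> dict:
    """Surface cues for whether abuse is aimed at a person or a group."""
    tokens = re.findall(r"@?\w+(?:'\w+)?", text.lower())
    return {
        "has_mention": any(t.startswith("@") for t in tokens),
        "second_person": sum(t.lstrip("@") in SECOND_PERSON for t in tokens),
        "group_terms": sum(t in GROUP_TERMS for t in tokens),
    }

print(target_features("@sam you're pathetic"))         # directed cues fire
print(target_features("immigrants are ruining this"))  # generalized cue fires
```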
The typology thus implies that features and methods should be chosen according to the nature of the abuse being analyzed: explicit abuse may be detectable with lexicons alone, whereas implicit abuse may require context-driven features or additional non-textual signals.
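The sketch below contrasts the two detection styles: a fixed keyword lexicon catches only explicit matches, while embedding similarity to abusive seed words can flag near-synonyms the lexicon misses. The three-dimensional vectors and the seed lexicon are toy stand-ins for pretrained embeddings such as word2vec or GloVe:

```python
import math

# Toy vectors for illustration; a real system would load pretrained embeddings.
VECS = {
    "trash":   [0.9, 0.1, 0.0],
    "vermin":  [0.7, 0.3, 0.0],
    "garbage": [0.8, 0.2, 0.1],  # abusive in context, but not in the lexicon
    "people":  [0.0, 0.9, 0.3],
}
ABUSE_LEXICON = {"trash", "vermin"}  # hypothetical seed lexicon

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def lexicon_hit(word: str) -> bool:
    return word in ABUSE_LEXICON

def embedding_score(word: str) -> float:
    """Max similarity to the seed words; catches near-synonyms."""
    return max(cosine(VECS[word], VECS[s]) for s in ABUSE_LEXICON)

for w in ("garbage", "people"):
    print(w, lexicon_hit(w), f"{embedding_score(w):.2f}")
# garbage: no lexicon hit, but high embedding score (~0.98)
# people:  no lexicon hit and a low embedding score (~0.37)
```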
Future Directions
The paper encourages cross-pollination among researchers working on distinct but overlapping subtasks of abusive language detection, so that advances in one area, such as cyberbullying research, can inform others, such as hate speech detection. It also underscores the need for clearer definition and discussion of annotation guidelines, promoting transparency and consistency in future research.
The implications for real-world applications are significant. Social media platforms may prioritize explicit abuse detection due to its clear-cut nature, whereas activists may need to focus on identifying more nuanced, implicit abuse. This bifurcation suggests that priorities for detection systems can significantly differ based on stakeholder objectives.
Conclusion
Waseem et al.'s typology offers a comprehensive framework for approaching abusive language detection across its subtasks. By emphasizing the distinction between the target of abuse and its explicitness, the paper lays a foundation for systems that more accurately model and identify the various forms of online abuse. Future work is encouraged to share annotation and modeling methodologies transparently, fostering a deeper empirical understanding of the nuances of abusive language.