AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context

Published 21 Oct 2024 in cs.CL and cs.AI | (2410.16520v3)

Abstract: As our understanding of autism and ableism continues to increase, so does our understanding of ableist language towards autistic people. Such language poses a significant challenge in NLP research due to its subtle and context-dependent nature. Yet, detecting anti-autistic ableist language remains underexplored, with existing NLP tools often failing to capture its nuanced expressions. We present AUTALIC, the first benchmark dataset dedicated to the detection of anti-autistic ableist language in context, addressing a significant gap in the field. The dataset comprises 2,400 autism-related sentences collected from Reddit, accompanied by surrounding context, and is annotated by trained experts with backgrounds in neurodiversity. Our comprehensive evaluation reveals that current LLMs, including state-of-the-art LLMs, struggle to reliably identify anti-autistic ableism and align with human judgments, underscoring their limitations in this domain. We publicly release AUTALIC along with the individual annotations which serve as a valuable resource to researchers working on ableism, neurodiversity, and also studying disagreements in annotation tasks. This dataset serves as a crucial step towards developing more inclusive and context-aware NLP systems that better reflect diverse perspectives.


Summary

The paper introduces AUTALIC, a dataset of 2,400 expert-annotated Reddit sentences for detecting anti-autistic ableist language.
It describes a data-collection process based on autism-related keyword search, followed by annotation by trained annotators, designed to capture subtle anti-autistic ableism.
Experimental results show that existing NLP models, including BERT and LLMs, yield high false-positive rates when classifying this language.

Overview of "AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context"

The paper "AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context" introduces a dataset for detecting anti-autistic ableist language, a problem that has received little attention in NLP research. The authors describe AUTALIC as the first benchmark dataset built specifically to identify ableist language targeting autistic individuals. It fills a gap in existing NLP resources, with particular attention to the contextual and subtle cues that current models often miss.

Key Contributions

The primary contribution of the paper is the creation and public release of the AUTALIC dataset, comprising 2,400 autism-related sentences sourced from Reddit. Each sentence comes with its surrounding context. Trained annotators with backgrounds in neurodiversity labeled the sentences, and the individual annotations are released with the dataset. The authors position AUTALIC as a step toward more inclusive NLP systems that better reflect diverse human perspectives, particularly in recognizing and mitigating ableist language directed at autistic people.
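Because AUTALIC ships the individual annotations rather than only final labels, a user can derive both an aggregated label and a measure of how much annotators agreed. The sketch below illustrates one common way to do this: majority vote per sentence and Fleiss' kappa across sentences. It assumes a hypothetical binary scheme (1 = ableist, 0 = not ableist) and the same number of annotators per sentence. It is not necessarily the aggregation or agreement statistic the authors used.

```python
from collections import Counter
from typing import Sequence

# Hypothetical label scheme for illustration: 1 = ableist, 0 = not ableist.
CATEGORIES = (0, 1)


def majority_label(labels: Sequence[int]) -> int:
    """Most frequent label for one sentence; ties resolve to the smaller label (an assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def fleiss_kappa(rows: Sequence[Sequence[int]], categories: Sequence[int] = CATEGORIES) -> float:
    """Fleiss' kappa for items that each received the same number of ratings."""
    if not rows:
        raise ValueError("need at least one item")
    n_items = len(rows)
    n_raters = len(rows[0])
    if n_raters < 2 or any(len(row) != n_raters for row in rows):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # counts[i][j]: how many raters put item i into category j
    counts = [[list(row).count(c) for c in categories] for row in rows]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(n * n for n in item) - n_raters) / (n_raters * (n_raters - 1))
        for item in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected (chance) agreement from the overall category proportions
    total = n_items * n_raters
    p_cats = [sum(item[j] for item in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        # Every rating is identical; kappa is undefined here, so report 1.0 by convention.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Toy data, not taken from AUTALIC: three annotators per sentence.
annotations = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
print([majority_label(row) for row in annotations])  # [1, 0, 1, 0]
print(round(fleiss_kappa(annotations), 3))  # 0.333
```

Keeping the raw per-annotator rows, instead of collapsing them immediately, is what makes disagreement analysis possible. Sentences with split votes, such as the first and last toy rows, are often the most informative cases for studying subtle or context-dependent ableism.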

Methodology and Findings

Sentences were collected by searching Reddit for a curated set of autism-related keywords. Reddit was chosen as the source because it is text-centric and has a broad user base. The annotators received detailed training covering the complexities of anti-autistic ableism, including a glossary of terms common in autism-related discourse.
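The keyword-based collection step, with each matched sentence kept alongside its surrounding context, can be sketched as follows. The keyword list, the naive regex sentence splitter, and the one-sentence context window on each side are all hypothetical choices for illustration. They do not reproduce the authors' actual keywords or pipeline.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical keywords for illustration; not the paper's actual keyword list.
KEYWORDS = ("autism", "autistic", "asperger", "on the spectrum")


@dataclass
class Candidate:
    sentence: str   # sentence containing a keyword
    preceding: str  # previous sentence, or "" at the start of the post
    following: str  # next sentence, or "" at the end of the post


def split_sentences(text: str) -> List[str]:
    """Rough splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_candidates(text: str, keywords: Sequence[str] = KEYWORDS) -> List[Candidate]:
    """Return keyword-matching sentences with one sentence of context on each side."""
    # Case-insensitive match anchored at a word start, so "Autistic" and "Asperger's" both match.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")", re.IGNORECASE
    )
    sentences = split_sentences(text)
    found = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            found.append(
                Candidate(
                    sentence=sentence,
                    preceding=sentences[i - 1] if i > 0 else "",
                    following=sentences[i + 1] if i + 1 < len(sentences) else "",
                )
            )
    return found


post = "I love this community. My brother is autistic and he is great. See you all tomorrow!"
for c in extract_candidates(post):
    print(c.sentence)  # My brother is autistic and he is great.
```

Storing the neighboring sentences reflects the dataset's emphasis on context. A keyword hit alone says nothing about whether a sentence is ableist, so annotators (and models) need the surrounding text to judge it.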

The paper evaluates several classifiers, ranging from logistic regression and BERT-based models to state-of-the-art LLMs. The tested models struggle significantly with this classification task. Even fine-tuned models perform poorly and produce high false-positive rates, which indicates that current systems cannot yet reliably detect such nuanced language.
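A high false-positive rate here means that many sentences which are not ableist get flagged as ableist: FPR = FP / (FP + TN). The minimal sketch below computes this together with precision, recall, and F1 from gold and predicted binary labels. The 1 = ableist, 0 = not ableist encoding and the toy labels are assumptions for illustration, not data or results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryReport:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0

    @property
    def false_positive_rate(self) -> float:
        # FP / (FP + TN): share of non-ableist sentences that were flagged as ableist
        return self.fp / (self.fp + self.tn) if self.fp + self.tn else 0.0


def evaluate(gold: Sequence[int], pred: Sequence[int]) -> BinaryReport:
    """Confusion counts for binary labels where 1 = ableist and 0 = not ableist (assumed scheme)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    return BinaryReport(
        tp=sum(g == 1 and p == 1 for g, p in pairs),
        fp=sum(g == 0 and p == 1 for g, p in pairs),
        tn=sum(g == 0 and p == 0 for g, p in pairs),
        fn=sum(g == 1 and p == 0 for g, p in pairs),
    )


# Toy example, not results from the paper: a model that over-flags sentences.
gold = [1, 0, 0, 0, 1, 0]
pred = [1, 1, 1, 0, 0, 1]
report = evaluate(gold, pred)
print(f"precision={report.precision:.2f} recall={report.recall:.2f} FPR={report.false_positive_rate:.2f}")
# precision=0.25 recall=0.50 FPR=0.75
```

Reporting FPR alongside F1 makes over-flagging visible. A model that labels most autism-related sentences as ableist can still reach nontrivial recall while its false-positive rate exposes that it is reacting to keywords rather than context.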

Implications

The paper's findings have implications for both applied and research-oriented NLP. In practice, the dataset provides a resource for building better hate speech detection systems. On the research side, it highlights the need for models that can handle domain-specific nuance and motivates further work on biases and ethical concerns in AI applications.

Future Directions

The work invites further research on incorporating diverse perspectives into NLP model training. With AUTALIC as groundwork, future studies may explore model architectures or training paradigms that prioritize alignment with nuanced human judgment. The dataset could also be extended to other cultural and linguistic contexts, since expressions of ableism may vary across regions and languages.

In conclusion, AUTALIC is an important step for the specialized task of identifying anti-autistic ableism with NLP systems. While initial results expose the limitations of existing models, the dataset and findings provide a solid foundation for research on inclusive and equitable language processing.
