"I Searched for a Religious Song in Amharic and Got Sexual Content Instead": Investigating Online Harm in Low-Resourced Languages on YouTube

Published 26 May 2024 in cs.HC | (2405.16656v1)

Abstract: Online social media platforms such as YouTube have a wide, global reach. However, little is known about the experience of low-resourced language speakers on such platforms; especially in how they experience and navigate harmful content. To better understand this, we (1) conducted semi-structured interviews (n=15) and (2) analyzed search results (n=9313), recommendations (n=3336), channels (n=120) and comments (n=406) of policy-violating sexual content on YouTube focusing on the Amharic language. Our findings reveal that -- although Amharic-speaking YouTube users find the platform crucial for several aspects of their lives -- participants reported unplanned exposure to policy-violating sexual content when searching for benign, popular queries. Furthermore, malicious content creators seem to exploit under-performing language technologies and content moderation to further target vulnerable groups of speakers, including migrant domestic workers, diaspora, and local Ethiopians. Overall, our study sheds light on how failures in low-resourced language technology may lead to exposure to harmful content and suggests implications for stakeholders in minimizing harm. Content Warning: This paper includes discussions of NSFW topics and harmful content (hate, abuse, sexual harassment, self-harm, misinformation). The authors do not support the creation or distribution of harmful content.



Summary

The paper finds that Amharic-speaking YouTube users are often exposed to policy-violating sexual content when searching for benign, popular queries.
It combines semi-structured interviews (n=15) with an analysis of search results, recommendations, channels, and comments to expose flaws in content moderation for low-resourced languages.
The study emphasizes the need for improved algorithms, culturally aware policies, and robust reporting mechanisms to protect vulnerable groups.

Investigating Online Harm in Low-Resourced Languages on YouTube: A Study on Amharic Content

Introduction

The paper "I Searched for a Religious Song in Amharic and Got Sexual Content Instead" by Hellina Hailu Nigatu and Inioluwa Deborah Raji examines the challenges that speakers of low-resourced languages, specifically Amharic, face when navigating YouTube. The study combines qualitative interviews with a quantitative analysis of YouTube content to show how weaknesses in online content moderation lead to exposure to harmful material.

Methodology

The research comprises two studies:

Study 1: Semi-structured interviews with 15 Amharic-speaking YouTube users to understand their experiences with harmful content on the platform.
Study 2: An analysis of search results (n=9313), recommendations (n=3336), channels (n=120), and comments (n=406) to characterize the nature and extent of policy-violating sexual content on the platform.

Key Findings

User Experiences and the Inadequacy of Content Moderation

Amharic-speaking users reported substantial difficulties when searching for content in their language. The search results often contained unrelated or inappropriate content, including explicit sexual material even when searching for benign queries. This problem is exacerbated by the use of both Ge'ez script and Romanized Amharic, as search algorithms struggle to handle the linguistic nuances.

Participants cited several instances where they inadvertently encountered sexually explicit content while searching for cultural or religious videos. The platform's content moderation mechanisms appear to be significantly less effective for low-resourced languages, leading to the frequent surfacing of harmful content.

Coping Mechanisms and Reporting Challenges

Users reported that their attempts to report harmful content were often futile, receiving little to no feedback from the platform. Consequently, many resorted to alternative coping mechanisms such as creating multiple accounts for different types of content or avoiding the platform's recommendations altogether.

Quantitative Analysis of Harmful Content

Search and Recommendation Systems

The study's analysis of YouTube's search and recommendation systems found policy-violating sexual content among the search results for common Amharic queries. Once a single explicit video was opened, the subsequent recommendations also frequently contained similar content, which suggests that moderation failures carry over from search into recommendations.

Content Creation Tactics

Malicious content creators exploit several strategies to bypass content moderation algorithms. These include search engine optimization (SEO) manipulation, misleading thumbnails and titles, as well as lexical variations that use a mix of Ge'ez and Latin scripts. Channels often presented themselves with misleading credentials, such as using "Dr." in the channel name, to appear authoritative while sharing explicit content.
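To make the mixed-script tactic concrete, the following is a minimal Python sketch of how a title or query could be labeled as Ge'ez (Ethiopic) script, Latin script, or a mix of both, using Unicode code point ranges. It is an illustration only, not the authors' method and not how YouTube processes text; the function names `char_script` and `classify_scripts` and the example strings are assumptions introduced here.

```python
import unicodedata

# Unicode blocks that cover the Ethiopic (Ge'ez) script.
ETHIOPIC_RANGES = (
    (0x1200, 0x137F),  # Ethiopic
    (0x1380, 0x139F),  # Ethiopic Supplement
    (0x2D80, 0x2DDF),  # Ethiopic Extended
    (0xAB00, 0xAB2F),  # Ethiopic Extended-A
)


def char_script(ch):
    """Return 'ethiopic', 'latin', or None for a single character (hypothetical helper)."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ETHIOPIC_RANGES):
        return "ethiopic"
    # Latin letters have Unicode names beginning with "LATIN".
    if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN"):
        return "latin"
    return None  # digits, spaces, punctuation, other scripts


def classify_scripts(text):
    """Label a string as 'ethiopic', 'latin', 'mixed', or 'other' (hypothetical helper)."""
    scripts = {s for s in map(char_script, text) if s is not None}
    if scripts == {"ethiopic", "latin"}:
        return "mixed"
    if scripts:
        return next(iter(scripts))
    return "other"


# Illustrative strings only; they are not taken from the study's data.
for title in ["መዝሙር", "mezmur", "መዝሙር mezmur", "2024"]:
    print(repr(title), classify_scripts(title))
```

A label like this only identifies which scripts appear in a string. Matching a Romanized spelling to its Ge'ez equivalent is a separate and harder problem, since Romanized Amharic spellings vary from writer to writer.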

Comment Sections

The comment sections of these videos also reflected significant issues, including the dissemination of hate speech, vulgar language, and inappropriate sharing of personal information. Disturbingly, users often sought medical advice from content creators who falsely presented themselves as professionals, further highlighting the risks posed by inadequate moderation.

Implications for Policy and Future Research

The findings have implications for several stakeholders:

Social Media Platforms: Platforms like YouTube need to enhance the efficacy of their content moderation systems for low-resourced languages. This could involve better NLP models, culturally aware content policies, and more robust reporting mechanisms that provide feedback to users.
Non-Governmental Organizations (NGOs): NGOs working with vulnerable groups can use these insights to develop better digital literacy programs and support mechanisms for individuals who might be disproportionately impacted by harmful online content.
Government Bodies: Establishing regulatory frameworks that hold platforms accountable for the enforcement of community guidelines across different languages and cultural contexts can aid in protecting citizens from online harm.

Conclusion

This paper contributes to understanding the challenges that speakers of low-resourced languages face on online platforms. It documents serious inadequacies in current content moderation systems and suggests several pathways for improvement. Future research should examine these dynamics in other low-resourced languages and on other platforms to build a more inclusive digital environment that protects all users from harm.
