"I Searched for a Religious Song in Amharic and Got Sexual Content Instead": Investigating Online Harm in Low-Resourced Languages on YouTube
Abstract: Online social media platforms such as YouTube have a wide, global reach. However, little is known about the experience of low-resourced language speakers on such platforms, especially how they encounter and navigate harmful content. To better understand this, we (1) conducted semi-structured interviews (n=15) and (2) analyzed search results (n=9313), recommendations (n=3336), channels (n=120) and comments (n=406) of policy-violating sexual content on YouTube, focusing on the Amharic language. Our findings reveal that, although Amharic-speaking YouTube users find the platform crucial for several aspects of their lives, participants reported unplanned exposure to policy-violating sexual content when searching for benign, popular queries. Furthermore, malicious content creators seem to exploit under-performing language technologies and content moderation to further target vulnerable groups of speakers, including migrant domestic workers, the diaspora, and local Ethiopians. Overall, our study sheds light on how failures in low-resourced language technology may lead to exposure to harmful content and suggests implications for stakeholders in minimizing harm. Content Warning: This paper includes discussions of NSFW topics and harmful content (hate, abuse, sexual harassment, self-harm, misinformation). The authors do not support the creation or distribution of harmful content.