MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection

Published 12 Feb 2025 in cs.CL | (2502.08319v1)

Abstract: Propaganda is a form of persuasion that has been used throughout history with the goal of influencing people's opinions through rhetorical and psychological persuasion techniques for determined ends. Although Arabic ranks as the fourth most-used language on the internet, resources for propaganda detection in languages other than English, especially Arabic, remain extremely limited. To address this gap, the first Arabic dataset for Multi-label Propaganda, Sentiment, and Emotion (MultiProSE) has been introduced. MultiProSE is an open-source extension of the existing Arabic propaganda dataset, ArPro, with the addition of sentiment and emotion annotations for each text. The dataset comprises 8,000 annotated news articles, making it the largest propaganda dataset to date. For each task, several baselines have been developed using large language models (LLMs), such as GPT-4o-mini, and pre-trained language models (PLMs), including three BERT-based models. The dataset, annotation guidelines, and source code are all publicly released to facilitate future research and development in Arabic LLMs and contribute to a deeper understanding of how various opinion dimensions interact in news media.