UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections
This paper introduces UKElectionNarratives, a dataset for detecting and analyzing misleading narratives during the United Kingdom's 2019 and 2024 general elections. The dataset is designed to support research into how deceptive narratives proliferate and how they may influence voter perception and electoral integrity.
Data Construction and Methodology
The paper outlines the creation of UKElectionNarratives in three main phases: tweet collection, filtering with language models, and human annotation. The dataset consists of 2,000 annotated tweets, selected based on retweet frequency and on agreement among three LLMs (Gemma 2, Llama-3.1, and Mistral-7B) that a tweet likely contained a misleading narrative. This pre-filtering concentrated annotators' effort on the tweets most likely to contain misleading narratives, making human annotation more efficient.
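The agreement-based pre-filtering step can be sketched as a simple majority vote over per-model labels. This is a minimal illustration, not the paper's actual pipeline: the model outputs, label names, and quorum threshold below are assumptions.

```python
from collections import Counter

def majority_filter(tweet_labels: dict[str, list[str]], quorum: int = 2) -> list[str]:
    """Keep tweets that at least `quorum` of the LLM judges flagged as
    containing a misleading narrative. `tweet_labels` maps a tweet ID to
    the per-model labels; the label vocabulary here is illustrative."""
    kept = []
    for tweet_id, labels in tweet_labels.items():
        votes = Counter(labels)
        if votes.get("misleading", 0) >= quorum:
            kept.append(tweet_id)
    return kept

# Hypothetical outputs from three LLM judges for two tweets:
labels = {
    "t1": ["misleading", "misleading", "not_misleading"],
    "t2": ["not_misleading", "not_misleading", "misleading"],
}
print(majority_filter(labels))  # → ['t1']
```

Requiring two of three models to agree trades recall for precision; lowering `quorum` to 1 would pass more candidate tweets to annotators at the cost of more false positives.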
An important contribution is the multi-level narrative taxonomy developed from a comprehensive literature review and brainstorming sessions with experts. It categorizes narratives into 10 super-narratives, such as Distrust in institutions and Anti-EU sentiments, and 32 specific narratives, providing a robust framework for annotation and subsequent analysis.
Empirical Findings
The statistical analysis of UKElectionNarratives reveals an uneven distribution of narratives, with Distrust in institutions and Distrust in democratic systems recurring across both elections. This points to persistent skepticism about electoral processes and the integrity of democratic institutions.
Using BERTopic, researchers identified and labeled key topics within the corpus, which, when mapped to annotated narratives, demonstrated thematic consistency across election periods. Notably, issues surrounding national policies, immigration, and gender rights dominated the discourse, showcasing how certain themes persist in influencing public opinion over time.
Benchmarking Narrative Detection Models
The study evaluated various narrative detection models, including basic classifiers, pre-trained language models (PLMs) such as RoBERTa, and LLMs such as GPT-4o, in zero-shot and few-shot setups augmented with narrative descriptions. Findings indicate that GPT-4o outperforms the PLMs, with the zero-shot configuration enhanced by narrative descriptions yielding the highest detection accuracy. These results underscore the efficacy of LLMs in handling the diverse and subtle linguistic cues associated with misleading narratives.
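The description-augmented zero-shot setup amounts to embedding a short definition of each candidate narrative in the classification prompt. The prompt wording and the narrative description below are illustrative assumptions, not the paper's actual prompts.

```python
def build_zero_shot_prompt(tweet: str, narratives: dict[str, str]) -> str:
    """Build a zero-shot classification prompt that lists each candidate
    narrative with a one-line description, in the spirit of the
    description-augmented setup evaluated in the paper."""
    lines = ["Classify the tweet into one of the following narratives:"]
    for name, desc in narratives.items():
        lines.append(f"- {name}: {desc}")
    lines.append(f'Tweet: "{tweet}"')
    lines.append("Answer with the narrative name only.")
    return "\n".join(lines)

# Hypothetical narrative description for illustration:
prompt = build_zero_shot_prompt(
    "Postal votes are being thrown away.",
    {"Distrust in democratic systems":
     "Claims that elections are rigged or that votes are discarded."},
)
print(prompt)
```

Supplying definitions rather than bare label names gives the model explicit decision criteria, which is one plausible reason the description-augmented zero-shot setup performed best.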
Implications and Future Work
The implications of this research are multifaceted: the dataset and taxonomy can support automatic annotation and construction of new datasets for other European elections, richer misinformation analysis, and media-literacy efforts. Moreover, the narrative taxonomy and the detection results hold potential for broader applications, such as automated content moderation and a deeper understanding of political communication.
The paper acknowledges limitations related to potential annotator bias and dataset size, highlighting future work in data augmentation and model enhancement. Additionally, expanding the narrative taxonomy to encompass emerging narratives in different cultural contexts could improve its relevance and applicability.
In conclusion, UKElectionNarratives provides a valuable resource for studying election-related misinformation and for developing tools and methodologies to counter the spread of deceptive narratives. In doing so, it contributes to preserving democratic integrity and promoting informed public discourse.