Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection

Published 25 Apr 2023 in cs.IR and cs.AI | (2304.13148v1)

Abstract: Although media bias detection is a complex multi-task problem, there is, to date, no unified benchmark grouping these evaluation tasks. We introduce the Media Bias Identification Benchmark (MBIB), a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using state-of-the-art Transformer techniques (e.g., T5, BART). Our results suggest that while hate speech, racial bias, and gender bias are easier to detect, models struggle to handle certain bias types, e.g., cognitive and political bias. However, our results show that no single technique can outperform all the others significantly. We also find an uneven distribution of research interest and resource allocation to the individual tasks in media bias. A unified benchmark encourages the development of more robust systems and shifts the current paradigm in media bias detection evaluation towards solutions that tackle not one but multiple media bias types simultaneously.

Authors (6)

Citations (18)

Summary

The paper introduces MBIB, a unified benchmark for media bias detection that aggregates nine tasks and curated datasets.
It leverages Transformer models like T5 and BART to evaluate performance using micro and macro F1-scores across multiple bias types.
The study highlights research gaps and paves the way for future benchmarks featuring multi-lingual datasets and expanded bias dimensions.

Introducing MBIB – A Media Bias Identification Benchmark

The paper "Introducing MBIB – the first Media Bias Identification Benchmark Task and Dataset Collection" presents MBIB, a benchmark for evaluating media bias detection across multiple bias types. MBIB groups linguistic, cognitive, political, and other bias types under a single framework and evaluates detection techniques with state-of-the-art Transformer models.

Benchmark Overview and Dataset Curation

MBIB addresses a shortcoming of existing research: media bias detection is usually studied one bias type at a time, which makes standardized comparisons between models difficult. The benchmark defines nine tasks with associated datasets, so that multiple media bias types can be evaluated together. During curation, the authors reviewed 115 datasets and selected 22 that met criteria including accessibility, language, size, and label quality.
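The selection step can be pictured as a filter over candidate datasets. The sketch below is a minimal illustration, assuming each candidate is described by a hypothetical `CandidateDataset` record; the field names, the English-only language check, and the `MIN_EXAMPLES` threshold are illustrative assumptions, not the authors' actual criteria or values.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CandidateDataset:
    # Hypothetical metadata fields; the authors' review records may differ.
    name: str
    publicly_accessible: bool
    language: str
    num_examples: int
    has_reliable_labels: bool

# Placeholder threshold for illustration only, not the value used in MBIB.
MIN_EXAMPLES = 100

def meets_criteria(ds: CandidateDataset) -> bool:
    """Apply the four criteria named in the text: accessibility, language, size, label quality."""
    return (
        ds.publicly_accessible
        and ds.language == "en"  # assumed: English-only selection
        and ds.num_examples >= MIN_EXAMPLES
        and ds.has_reliable_labels
    )

def select_datasets(candidates: List[CandidateDataset]) -> List[CandidateDataset]:
    """Keep only candidates that pass every criterion."""
    return [ds for ds in candidates if meets_criteria(ds)]

# Usage with invented candidates:
candidates = [
    CandidateDataset("dataset_a", True, "en", 5000, True),
    CandidateDataset("dataset_b", False, "en", 8000, True),  # not accessible
    CandidateDataset("dataset_c", True, "de", 3000, True),   # wrong language
]
print([ds.name for ds in select_datasets(candidates)])  # ['dataset_a']
```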

Figure 1: The dataset collection and selection process.

The selected datasets cover a range of bias types, including linguistic, cognitive, and political bias. Dataset availability is uneven across tasks, which points to research gaps: for example, few extensive resources exist for reporting-level context bias. These gaps mark clear opportunities for building new, comprehensive corpora.

Methodology and Implementation

MBIB groups datasets into tasks according to the type of media bias they capture, so that results on individual datasets add up to a broader picture of each bias category. Every dataset was preprocessed into a common format, with labels converted to a binary scheme where necessary, so that all tasks can be handled consistently.
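A minimal sketch of this kind of harmonization in Python, assuming each source dataset is a list of dictionaries with its own text and label keys. The helper `to_uniform`, the output schema (`text`, `label`, `dataset_id`), and the policy of dropping labels with no binary mapping are hypothetical choices for illustration, not MBIB's actual preprocessing code.

```python
from typing import Any, Dict, List

def to_uniform(records: List[Dict[str, Any]], text_key: str, label_key: str,
               label_map: Dict[Any, int], dataset_id: int) -> List[Dict[str, Any]]:
    """Map raw records to a shared schema with binary labels (1 = biased, 0 = not biased)."""
    if not set(label_map.values()) <= {0, 1}:
        raise ValueError("label_map must map every label to 0 or 1")
    rows = []
    for rec in records:
        raw_label = rec.get(label_key)
        if raw_label not in label_map:
            continue  # assumed policy: drop examples whose label has no binary mapping
        rows.append({
            "text": str(rec[text_key]).strip(),  # normalize surrounding whitespace
            "label": label_map[raw_label],
            "dataset_id": dataset_id,
        })
    return rows

# Usage with toy records (invented for illustration):
raw = [
    {"sentence": "  Officials said the plan failed. ", "bias": "non-biased"},
    {"sentence": "The reckless plan was a disaster.", "bias": "biased"},
    {"sentence": "Annotators disagreed here.", "bias": "no agreement"},
]
rows = to_uniform(raw, "sentence", "bias", {"non-biased": 0, "biased": 1}, dataset_id=0)
print([(r["text"], r["label"]) for r in rows])
# [('Officials said the plan failed.', 0), ('The reckless plan was a disaster.', 1)]
```

Keeping a `dataset_id` on every row lets per-dataset scores be recovered later, even after several datasets are merged into one task.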

Figure 2: Dataset distribution over MBIB tasks.

Standard Transformer models were applied as baselines on the benchmark, and their capabilities varied by task: hate speech, racial bias, and gender bias proved comparatively easy to detect, whereas cognitive and political bias remained harder. Performance is reported per task as micro- and macro-averaged $F_1$-scores.
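To make the two averages concrete, here is a small Python sketch. It assumes that the micro-average pools the predictions of all datasets in a task before computing a single binary $F_1$, and that the macro-average is the unweighted mean of per-dataset $F_1$-scores; the paper's exact averaging (for example, whether per-dataset $F_1$ is itself averaged over classes) may differ. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (biased = 1) class; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def task_f1(per_dataset: Dict[str, Tuple[List[int], List[int]]]) -> Tuple[float, float]:
    """Return (micro, macro) F1 for one task; values are (y_true, y_pred) per dataset."""
    # Micro: pool every example from every dataset, then compute one F1.
    pooled_true: List[int] = []
    pooled_pred: List[int] = []
    for y_true, y_pred in per_dataset.values():
        pooled_true.extend(y_true)
        pooled_pred.extend(y_pred)
    micro = binary_f1(pooled_true, pooled_pred)
    # Macro: F1 per dataset, then an unweighted mean across datasets.
    per_scores = [binary_f1(t, p) for t, p in per_dataset.values()]
    macro = sum(per_scores) / len(per_scores)
    return micro, macro

# Toy example: one perfectly handled dataset and one failed dataset.
micro, macro = task_f1({
    "dataset_a": ([1, 1, 0, 0], [1, 1, 0, 0]),  # F1 = 1.0
    "dataset_b": ([1, 0], [0, 0]),              # F1 = 0.0
})
print(round(micro, 2), round(macro, 2))  # 0.8 0.5
```

Because the micro-average weights each example equally, large datasets dominate it, while the macro-average gives every dataset the same weight; a gap between the two suggests that a model handles some datasets in a task much better than others.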

Baseline Performance and Model Analysis

The authors evaluated five baseline models (ConvBERT, BART, RoBERTa-Twitter, ELECTRA, and GPT-2) and compared their performance across tasks, finding that different models have distinct strengths on specific bias types. No single model significantly outperformed all others on every task, which illustrates the value of combining diverse tasks and datasets in MBIB.
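Checking whether one model dominates can be sketched as follows, assuming per-task scores are stored in a nested dictionary `scores[model][task]`. The helpers `best_models_per_task` and `dominant_model` are hypothetical, and unlike the paper's claim about significance, this sketch compares raw scores only and performs no statistical test.

```python
from typing import Dict, List, Optional

Scores = Dict[str, Dict[str, float]]  # scores[model][task] -> F1 (hypothetical layout)

def best_models_per_task(scores: Scores) -> Dict[str, List[str]]:
    """For each task, return the model(s) with the highest score (ties kept)."""
    tasks = {task for per_task in scores.values() for task in per_task}
    best: Dict[str, List[str]] = {}
    for task in sorted(tasks):
        # Consider only models that were evaluated on this task.
        task_scores = {m: s[task] for m, s in scores.items() if task in s}
        top = max(task_scores.values())
        best[task] = sorted(m for m, v in task_scores.items() if v == top)
    return best

def dominant_model(scores: Scores) -> Optional[str]:
    """Return a model that is best on every task, or None if no such model exists."""
    best = best_models_per_task(scores)
    if not best:
        return None
    common = set.intersection(*(set(models) for models in best.values()))
    return min(common) if common else None
```

A model missing from any task can never be returned by `dominant_model`, because it does not appear among that task's best models.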

Figure 3: Micro-average $F_1$-scores of the baseline models, by task.

Comparing micro- and macro-averaged scores shows how well a model generalizes across the different datasets within a task, and makes visible how dataset size and the level of annotation detail affect performance. Beyond reinforcing the importance of dataset breadth, the analysis suggests that results could improve with more refined approaches.

Implications and Future Directions

MBIB fits into a broader effort to strengthen media bias research. Beyond encouraging the development of more sophisticated and generalizable models, it lays the groundwork for future benchmarks that include multilingual datasets and additional bias dimensions such as framing or sentiment. As media consumption channels evolve, so does the need for robust, nuanced methods that can handle the many facets of bias in news coverage.

Conclusion

MBIB is a significant contribution to media bias research: it provides a common framework for developing and comparing bias detection models and for better understanding the consequences of biased media content. With continued dataset expansion and task refinement, MBIB can serve as a foundation for further progress in understanding media bias in an increasingly digital world.
