IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators
Abstract: This study addresses media bias detection, a task of growing importance now that social media platforms strongly shape individual attitudes and opinions. Prior work mostly trains models tailored to particular datasets, which limits adaptability and leads to weak performance on out-of-domain data. In contrast, we introduce IndiVec, a general bias detection framework built upon large language models (LLMs). IndiVec first constructs a database of fine-grained media bias indicators, leveraging the instruction-following capabilities of LLMs and vector database techniques. Given a new input, the framework automatically retrieves the most relevant indicators from the vector database and applies majority voting over them to determine the input's bias label. Compared with previous methods, IndiVec offers better adaptability (consistent performance across diverse datasets from various sources) and explainability (explicit top-k indicators that interpret each bias prediction). Experimental results on four political bias datasets show that IndiVec significantly outperforms the baselines. Additional experiments and analysis provide further insight into the framework's effectiveness.
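The retrieve-then-vote step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the indicator texts, the labels, and the names `INDICATORS`, `embed`, `cosine`, `top_k_indicators`, and `predict` are all hypothetical, and a toy bag-of-words vector stands in for the learned embeddings and vector database that the framework would actually use.

```python
import math
from collections import Counter

# Hypothetical indicator database of (indicator text, bias label) pairs.
# In the framework these would be generated by an LLM; these are made up.
INDICATORS = [
    ("uses loaded language to criticize immigration policy", "biased"),
    ("quotes only one political party on tax reform", "biased"),
    ("reports vote counts without commentary", "non-biased"),
    ("cites official statistics from both parties", "non-biased"),
    ("describes opponents with loaded emotional language", "biased"),
]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_indicators(query, k=3):
    """Return the k indicators most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), text, label) for text, label in INDICATORS]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def predict(query, k=3):
    """Majority vote over the labels of the top-k retrieved indicators."""
    hits = top_k_indicators(query, k)
    votes = Counter(label for _, _, label in hits)
    label = votes.most_common(1)[0][0]
    return label, hits


label, hits = predict("the article uses loaded emotional language about opponents")
print(label)
for score, text, indicator_label in hits:
    print(f"{score:.2f}  {indicator_label:<10}  {text}")
```

The retrieved indicators are returned alongside the label, which mirrors the explainability claim: the top-k indicators serve as the evidence for the prediction. An odd `k` avoids ties when there are only two labels.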