Misleading through Inconsistency: A Benchmark for Political Inconsistencies Detection

Published 25 May 2025 in cs.CL (arXiv:2505.19191v1)

Abstract: Inconsistent political statements represent a form of misinformation. When left unnoticed, they erode public trust and pose challenges to accountability. Detecting inconsistencies automatically could support journalists in asking clarification questions, thereby helping to keep politicians accountable. We propose the Inconsistency detection task and develop a scale of inconsistency types to prompt NLP research in this direction. To provide a resource for detecting inconsistencies in the political domain, we present a dataset of 698 human-annotated pairs of political statements, with explanations of the annotators' reasoning for 237 samples. The statements mainly come from voting assistant platforms such as Wahl-O-Mat in Germany and Smartvote in Switzerland, reflecting real-world political issues. We benchmark LLMs on our dataset and show that, in general, they are as good as humans at detecting inconsistencies, and might even be better than individual humans at predicting the crowd-annotated ground truth. However, when it comes to identifying fine-grained inconsistency types, none of the models has reached the upper bound of performance (due to natural labeling variation), thus leaving room for improvement. We make our dataset and code publicly available.

Summary

Overview of a Benchmark for Political Inconsistencies Detection

In the paper "Misleading through Inconsistency: A Benchmark for Political Inconsistencies Detection," the authors present a thorough examination of inconsistency detection within political discourse. Their work is pivotal in addressing how inconsistent political statements can undermine public trust and accountability. The paper proposes a novel inconsistency detection task and introduces a comprehensive scale for classifying inconsistency types, thus promoting new research directions in NLP.

Inconsistency Detection Task and Dataset

The paper delineates a task specifically tailored for detecting political inconsistencies, which extends beyond the traditional Natural Language Inference (NLI) concepts such as Entailment, Unrelated, or Contradiction. Recognizing such inconsistencies, particularly in a political domain, poses unique challenges that are not entirely covered by standard NLI frameworks.

The authors present a dataset that comprises 698 human-annotated pairs of political statements, with 237 samples containing explanations for the annotators' reasoning. These samples originate from voter assistance platforms like Wahl-O-Mat in Germany and Smartvote in Switzerland, ensuring the data reflects actual political discussions and stances. This dataset serves as a substantial resource for future research into political inconsistencies.
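The pairwise setup described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' released code: the label set, field names, and prompt wording are assumptions, since the paper defines its own scale of inconsistency types rather than the placeholder labels used here.

```python
# Hypothetical sketch of the pairwise inconsistency-classification setup.
# LABELS is illustrative only; the paper defines its own fine-grained scale.
from dataclasses import dataclass
from typing import Optional

LABELS = ["consistent", "unrelated", "partially_inconsistent", "inconsistent"]


@dataclass
class StatementPair:
    statement_a: str
    statement_b: str
    label: str                          # crowd-annotated ground truth
    explanation: Optional[str] = None   # annotator reasoning (237 of 698 pairs)


def build_prompt(pair: StatementPair) -> str:
    """Format one statement pair as a zero-shot classification prompt."""
    return (
        "Do the following two political statements contradict each other?\n"
        f"Statement A: {pair.statement_a}\n"
        f"Statement B: {pair.statement_b}\n"
        f"Answer with one of: {', '.join(LABELS)}."
    )


# Invented example in the style of voting-assistant statements.
example = StatementPair(
    statement_a="We support a nationwide speed limit on highways.",
    statement_b="Speed limits on highways should be left to the regions.",
    label="inconsistent",
)
print(build_prompt(example))
```

The prompt would then be sent to any LLM under test and the model's answer mapped back onto the label scale for scoring.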

Benchmarking Large Language Models

A key aspect of the study involves benchmarking various Large Language Models (LLMs) on the dataset. The authors conclude that, in general, these models detect inconsistencies about as well as humans, and sometimes even surpass individual human annotators at predicting the crowd-annotated ground truth. However, on the fine-grained inconsistency types, no model has yet reached the upper bound of performance set by natural labeling variation, so room for improvement remains. These observations open paths for NLP model designs that better capture the nuances of political discourse.
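The comparison between models and individual humans hinges on treating the crowd's majority label as ground truth. A minimal sketch of that evaluation logic, under the assumption of simple majority voting (the paper's exact aggregation may differ):

```python
# Sketch of evaluating against a crowd-annotated ground truth.
# Assumes majority voting over annotators; the paper's aggregation may differ.
from collections import Counter


def majority_label(annotations):
    """Crowd ground truth: the most frequent label among annotators."""
    return Counter(annotations).most_common(1)[0][0]


def accuracy(preds, golds):
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


# Toy data: three annotators label each of three statement pairs.
annotations = [
    ["inconsistent", "inconsistent", "consistent"],
    ["consistent", "consistent", "consistent"],
    ["consistent", "inconsistent", "inconsistent"],
]
gold = [majority_label(a) for a in annotations]

model_preds = ["inconsistent", "consistent", "inconsistent"]
# A single annotator (here, annotator 0) can disagree with the crowd,
# which is how a model can outscore an individual human on this metric.
human_preds = [a[0] for a in annotations]

print("model vs crowd:", accuracy(model_preds, gold))
print("human vs crowd:", accuracy(human_preds, gold))
```

In this toy example the model matches the majority on every item while annotator 0 does not, illustrating how a model can "beat" an individual human without beating the crowd.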

Implications and Future Work

The implications of this research are multifaceted. Practically, automated inconsistency detection can aid journalists in holding politicians accountable, fostering transparency, and encouraging public trust. Theoretically, the study provides a foundational framework for future explorations into automated political statement analysis.

Further developments could include enhancing models' performance on nuanced inconsistency types and expanding datasets to encompass political discourse from diverse cultural contexts. Future work could also explore integrating temporal and contextual dynamics, given the prominence of such factors in perceiving political inconsistency.

Conclusion

The paper "Misleading through Inconsistency: A Benchmark for Political Inconsistencies Detection" effectively lays the groundwork for a promising research avenue in political NLP applications. The benchmark introduced could catalyze advancements in understanding the subtleties of political communication, improve computational models in detecting rhetorical inconsistencies, and ultimately contribute to more transparent political processes.
