LIAR² Dataset: Enhanced Deception Detection
- LIAR² is an extended corpus for fake claim classification that integrates sentiment and emotion features to improve detection accuracy.
- It converts a six-class labeling system into a binary framework and leverages BERT-Base embeddings to achieve a 70% accuracy benchmark.
- The dataset incorporates psycholinguistic cues and anonymized speaker metadata, supporting robust analysis of deceptive social-media claims.
Sentimental LIAR, frequently referenced as LIAR², is an extended corpus and benchmark for fake claim classification in short, social-media-style textual claims. Engineered to address the limitations of the original LIAR dataset—specifically low classification performance and absence of affective cues—LIAR² augments each claim with sentiment and emotion features derived from state-of-the-art APIs. By converting the original six-way, fine-grained truthfulness schema into a binary labeling setup and rigorously de-biasing speaker information, LIAR² enables the application of deep learning architectures, most notably BERT-Base, for automated detection of deceptive claims with substantially improved accuracy (Upadhayay et al., 2020).
1. Motivation and Theoretical Foundations
Psycholinguistic theory, including the Undeutsch Hypothesis and Zuckerman’s four-factor theory, posits that deceptive statements deviate from truthful ones not only in factual content but also in affective expression and stylistic structure. Deceptive utterances are theorized to display exaggerated affect or abnormal sentiment patterns relative to truthful claims. The original LIAR dataset, which consists of 12,836 short claims sourced from PolitiFact, revealed poor performance across traditional supervised models: a convolutional neural network achieved 27.4% accuracy, and attention-based LSTMs using speaker profiles peaked at approximately 41.5%. These results underscore the necessity of enriching claim representations with psycholinguistically meaningful features to more effectively distinguish true from false claims (Upadhayay et al., 2020).
2. Dataset Structure and Labeling Schema
LIAR² preserves the original LIAR claim set (12,836 entries) but refines the annotation framework for enhanced discriminative capability. The six-class labels—pants-fire, false, barely-true, half-true, mostly-true, true—are collapsed into a binary label space:
- False: Comprising pants-fire, false, barely-true, and half-true (~65%, 8,343 claims)
- True: Encompassing mostly-true and true (~35%, 4,493 claims)
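A minimal sketch of this relabeling, assuming the label strings match the original LIAR annotations:

```python
# Map the six original LIAR truthfulness labels onto the binary LIAR² schema.
SIX_TO_BINARY = {
    "pants-fire": "False",
    "false": "False",
    "barely-true": "False",
    "half-true": "False",
    "mostly-true": "True",
    "true": "True",
}

def collapse_label(six_way_label: str) -> str:
    """Collapse a six-way LIAR label into the binary LIAR² label."""
    return SIX_TO_BINARY[six_way_label.lower()]
```

Note that half-true claims fall on the False side of the split, which is what produces the roughly 65/35 class imbalance.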
Speaker names, previously a source of textual bias, are anonymized as numerical IDs, ensuring models must rely on claim content and derived features rather than overfitting to known speakers. Table 1 summarizes the canonical data splits.
| Split | # Claims | % of Total |
|---|---|---|
| Training | 10,269 | 80% |
| Development | 1,283 | 10% |
| Testing | 1,284 | 10% |
3. Sentiment and Emotion Feature Derivation
To operationalize psycholinguistic cues, LIAR² employs the Google Cloud Natural Language API for sentiment quantification and the IBM Watson Natural Language Understanding API for emotion quantification. For each claim:
- Sentiment: Google’s API yields a continuous score (SentimentScore) in [-1, 1]. A binary categorical label (negative vs. non-negative) is assigned from the sign of the score.
- Emotions: IBM Watson NLU provides normalized intensity scores in [0, 1] for five emotions: anger, disgust, sadness, fear, and joy.
There is no post-processing normalization; raw API scores are appended.
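The API calls themselves require service credentials; assuming the raw scores are already in hand, the per-claim affective block (1 score + 2-dim one-hot + 5 intensities) might be assembled as follows. The sign-based threshold is an assumption consistent with the binarization described above:

```python
def affective_features(sentiment_score: float, emotions: dict) -> list:
    """Build the 8-dimensional affective block: the raw sentiment score,
    a 2-dim one-hot of its sign-based binary label, and the five raw
    Watson emotion intensities (no post-processing normalization)."""
    # Binary sentiment label from the sign of the score (assumed threshold).
    is_negative = sentiment_score < 0.0
    one_hot = [1.0, 0.0] if is_negative else [0.0, 1.0]
    order = ["anger", "disgust", "sadness", "fear", "joy"]
    return [sentiment_score] + one_hot + [emotions[e] for e in order]
```

The fixed emotion ordering keeps feature positions consistent across claims, which matters once the block is concatenated with the BERT embedding.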
4. Feature Vector Construction and Preprocessing
Each claim representation is a concatenation of BERT-Base and auxiliary features, yielding a 781-dimensional vector structured as follows:
- BERT-Base [CLS] embedding: 768 dimensions (tokenized, lower-cased, truncated/zero-padded to max 128 tokens)
- Speaker-credibility counts (SPC): 5 integers ([barely_true_counts, false_counts, half_true_counts, mostly_true_counts, pants_on_fire_counts])
- Sentiment: 1-dimensional score and 2-dimensional one-hot (binary label)
- Emotions: 5 float intensities
- Total non-textual features: 13
Claims undergo binary relabeling, anonymization of speaker names, and BERT WordPiece tokenization. Neither manual spelling correction nor stopword removal is applied; subword modeling is relied upon to handle surface variation.
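Putting the pieces together, the 781-dimensional claim vector (768 + 5 + 1 + 2 + 5) can be sketched without any ML dependencies; the [CLS] embedding below is a zero placeholder standing in for BERT-Base output:

```python
def build_claim_vector(cls_embedding, spc_counts, sentiment_score,
                       sentiment_one_hot, emotion_intensities):
    """Concatenate the BERT [CLS] embedding (768 dims), speaker-credibility
    counts (5), sentiment score (1), sentiment one-hot (2), and emotion
    intensities (5) into a single 781-dimensional feature vector."""
    assert len(cls_embedding) == 768 and len(spc_counts) == 5
    assert len(sentiment_one_hot) == 2 and len(emotion_intensities) == 5
    return (list(cls_embedding)
            + [float(c) for c in spc_counts]
            + [sentiment_score]
            + list(sentiment_one_hot)
            + list(emotion_intensities))

vec = build_claim_vector([0.0] * 768, [2, 1, 3, 4, 0], -0.6,
                         [1.0, 0.0], [0.1, 0.4, 0.1, 0.05, 0.02])
assert len(vec) == 781  # 768 contextual + 13 auxiliary features
```

The 13 auxiliary dimensions sit after the contextual embedding, so downstream layers can learn separate weightings for textual and psycho-affective evidence.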
5. Descriptive Statistics
LIAR² claims have an average length of 17.5 words (σ = 6.2), corresponding to a mean of 96 characters (σ = 35). The distribution of claim lengths is centered on 15–20 words, with very few exceeding 40 tokens. The binary label distribution is approximately 65% False (8,343 claims) and 35% True (4,493 claims), as described in Section 2.
The table below summarizes average emotion intensities:
| Emotion | Mean | σ |
|---|---|---|
| Anger | 0.12 | 0.15 |
| Disgust | 0.40 | 0.28 |
| Sadness | 0.08 | 0.10 |
| Fear | 0.05 | 0.08 |
| Joy | 0.15 | 0.18 |
6. Benchmark Performance and Empirical Impact
By integrating sentiment and emotion features with BERT-Base embeddings, LIAR² enables deep models to exploit both contextual and affective cues in claim verification. The benchmark BERT-CNN architecture achieves an accuracy of 70%, an improvement of roughly 28 percentage points over the best previously reported result (41.5% with an attention-based LSTM plus speaker-profile features). The corresponding F1 score on this setup is approximately 0.64. This establishes a new standard for short-text fake claim classification on the LIAR benchmark, supporting the hypothesis that psychological and affective cues are salient for automated deception detection (Upadhayay et al., 2020).
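The full BERT-CNN architecture is specified in the paper; purely as an illustration of the fusion idea, a single logistic unit over the concatenated 781-dimensional vector (random, untrained weights) could be sketched as:

```python
import math
import random

def logistic_head(features, weights, bias):
    """Score a fused 781-dim claim vector with one logistic unit.
    (A stand-in for the classification head; the paper's benchmark
    model is a BERT-CNN, not a logistic regression.)"""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # probability the claim is True

random.seed(0)
features = [random.uniform(-1, 1) for _ in range(781)]   # fused vector
weights = [random.uniform(-0.01, 0.01) for _ in range(781)]
p_true = logistic_head(features, weights, bias=0.0)
label = "True" if p_true >= 0.5 else "False"
```

Because the affective dimensions enter the same score as the contextual ones, any learned head can trade off textual content against sentiment and emotion evidence, which is the mechanism the accuracy gain is attributed to.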
7. Comparison with the Original LIAR Dataset
- Label granularity: Shift from six-way multiclass (original LIAR) to two-way binary categorization (LIAR²).
- Metadata enrichment: Addition of per-claim sentiment and emotion vectors to existing metadata.
- De-biasing: Anonymization of speaker names to remove confounds from attributional bias.
- Model performance: LIAR² yields a marked improvement in classification accuracy and robustness, attributed to the expanded feature space and focus on psycho-affective markers.
A plausible implication is that advanced stylometric and affective features, when combined with deep pre-trained language models, can significantly raise the ceiling for automated verification of short, content-rich claims in online discourse.