MFRC: Moral Foundations Reddit Corpus
- MFRC is a benchmark dataset of annotated Reddit comments for multi-label moral sentiment analysis and fairness evaluation.
- The corpus spans multiple moral foundations such as care, fairness, and authority, with detailed metrics on inter-annotator agreement and domain transfer.
- Empirical studies using MFRC highlight key insights in NLP fairness, cross-domain generalization, and ethical AI alignment through robust evaluation protocols.
The Moral Foundations Reddit Corpus (MFRC) is a benchmark dataset for computational analysis of moral sentiment and foundation detection within social media, specifically covering user-generated content from Reddit. It is constructed to facilitate empirical evaluation and methodological development in NLP for subjective, multi-label moral classification tasks, with a focus on annotator agreement, domain transfer, and fairness-aware modeling. The MFRC is widely used in research on sentiment analysis, AI alignment, and cross-domain robustness, as documented in a range of studies (Naranbat et al., 13 Oct 2025; Skorski et al., 24 Jul 2025; Trager et al., 2022; Nguyen et al., 2023; Golazizian et al., 2024).
1. Data Collection, Structure, and Annotation Protocols
The MFRC was originally introduced by Trager et al. (2022) and is available via HuggingFace (USC-MOLA-Lab/MFRC). Corpus construction pooled comments from 12 Reddit subreddits into three major topical domains: US Politics (e.g., r/politics, r/conservative), French Politics (e.g., r/geopolitics, r/europe), and Everyday Morality (e.g., r/IamTheAsshole, r/relationship_advice). These subreddits were selected to maximize thematic and discursive diversity. Comments were required to meet a minimum Reddit score threshold (a score of at least 10, with the exact criterion varying by bucket), comprise at least 10 tokens, and—where relevant—mention candidate political figures (for French political content).
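The inclusion criteria above can be sketched as a simple filter. This is illustrative only: the field names (`score`, `body`) and the single-threshold parameterization are assumptions, not the corpus schema.

```python
def passes_filters(comment, min_score=10, min_tokens=10, required_terms=None):
    """Illustrative inclusion filter mirroring the MFRC criteria:
    minimum Reddit score, minimum token count, and (for the French
    politics bucket) a mention of candidate political figures."""
    if comment["score"] < min_score:
        return False
    if len(comment["body"].split()) < min_tokens:
        return False
    if required_terms is not None:
        text = comment["body"].lower()
        # Keep the comment only if it mentions at least one required figure.
        if not any(term.lower() in text for term in required_terms):
            return False
    return True
```

In practice each bucket would carry its own threshold configuration; the keyword check applies only to the French-politics domain.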
The largest available version contains roughly 17,885–18,000 comments, with the exact count varying by preprocessing and label-filtering choices across studies. Each comment is a single utterance (not a thread), with an average length of roughly 42 tokens (Nguyen et al., 2023; Skorski et al., 24 Jul 2025).
Annotation follows the updated Moral Foundations Theory (MFT) taxonomy, labeling each comment for several moral concerns: Care/Harm, Equality, Proportionality, Loyalty/Betrayal, Authority/Subversion, Purity/Sanctity, and, in some releases, Thin Morality (undifferentiated moral language) plus an implicit/explicit flag (Trager et al., 2022). Comments receive multiple labels when multiple foundations apply. Annotators (6–27, depending on the task or subsample) were trained for several weeks, and each comment is coded by between two and five annotators, with post-annotation aggregation for modeling.
Preprocessing and harmonization steps include conversion to lowercase, whitespace and special character stripping, label mapping (merging Equality and Proportionality into Fairness, and exclusion of Thin Morality and Purity where alignment with the Twitter corpus is required), and explicit dataset splits for in-domain and cross-domain experiments (Naranbat et al., 13 Oct 2025).
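The harmonization step can be sketched as a label-mapping pass. The mapping dict below follows the merges and exclusions described in the text; the exact label strings are an illustrative assumption.

```python
# Merge Equality and Proportionality into Fairness; Thin Morality and
# Purity are intentionally absent from the map, i.e., dropped when
# aligning with the Twitter corpus label set.
HARMONIZATION = {
    "Care": "Care",
    "Equality": "Fairness",
    "Proportionality": "Fairness",
    "Loyalty": "Loyalty",
    "Authority": "Authority",
    "Non-Moral": "Non-Moral",
}

def harmonize(labels):
    """Map raw MFRC labels onto the shared {Care, Fairness, Loyalty,
    Authority, Non-Moral} set, deduplicating merged labels."""
    return sorted({HARMONIZATION[l] for l in labels if l in HARMONIZATION})

def preprocess(text):
    """Lowercase and collapse whitespace, per the described pipeline."""
    return " ".join(text.lower().split())
```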
2. Corpus Statistics and Label Distributions
Corpus size and class prevalence are contingent on the harmonization scheme:
- Trager et al. (2022): 16,123 comments (Trager et al., 2022)
- MFRC as used in cross-domain studies: 13,995 comments after harmonization to {Care, Fairness, Loyalty, Authority, Non-Moral} (Naranbat et al., 13 Oct 2025)
- Extended MFRC for model evaluation: 17,885 comments (Skorski et al., 24 Jul 2025)
- Subjective variant (MFSC): 2,000 comments × 24 annotators = 48,000 annotation events (Golazizian et al., 2024)
After harmonization to five core foundations, the in-domain label proportions (rounded from (Naranbat et al., 13 Oct 2025)) are:
| Label | Proportion |
|---|---|
| Non-Moral | ≈ 38% |
| Care | ≈ 12% |
| Fairness | ≈ 11% |
| Loyalty | ≈ 9% |
| Authority | ≈ 8% |
When the five classical MFT foundations are retained (including Purity/Sanctity), per-foundation prevalence is:
| Foundation | Prevalence (Skorski et al., 24 Jul 2025) |
|---|---|
| Authority | 19.2% |
| Care | 26.5% |
| Fairness | 29.5% |
| Loyalty | 11.1% |
| Sanctity | 9.8% |
Notably, foundation prevalence varies by subreddit and topic bucket, with some categories (e.g., Purity) being especially rare outside religious/moral discussion subreddits.
3. Annotation Quality, Agreement, and Subjectivity
Inter-annotator agreement is quantified with prevalence- and bias-adjusted κ (PABAK), with domain-wide values in the medium range (PABAK ≈ 0.42–0.47) (Nguyen et al., 2023). Unadjusted κ is lower, owing to label class imbalance and the low frequency of certain foundations—a common challenge in moral sentiment datasets. Labels for each foundation are aggregated so that a comment is treated as positive if any annotator selected the label (logical OR). Annotator confidence is also recorded.
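For two raters on a binary label, PABAK reduces to 2·p_o − 1, where p_o is raw observed agreement; a minimal sketch of that formula together with the logical-OR aggregation described above (function names are illustrative):

```python
def pabak(a, b):
    """Prevalence- and bias-adjusted kappa for two raters on a binary
    label: PABAK = 2 * p_o - 1, with p_o the observed agreement rate."""
    assert len(a) == len(b) and len(a) > 0
    p_o = sum(x == y for x, y in zip(a, b)) / len(a)
    return 2 * p_o - 1

def aggregate_or(annotations):
    """Per-comment OR aggregation: a comment is positive for a label
    if ANY annotator selected it. `annotations` is a list of binary
    label vectors, one per annotator."""
    return [int(any(col)) for col in zip(*annotations)]
```

Note that PABAK deliberately ignores the chance-agreement correction of Cohen's κ, which is what makes it robust to the class imbalance noted above.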
The MFRC's subjective variant (MFSC) uniquely offers post-level annotation by all annotators (24 undergraduate students, balanced across U.S. demographic characteristics), enabling research on annotator disagreement, sampling, and modeling personalized annotation policies (Golazizian et al., 2024). No classical κ or α inter-annotator reliability is reported for MFSC, but measures of item and annotator-level disagreement following Davani et al. (2023) are computed.
4. Experimental Protocols and Baseline Model Evaluation
Studies leveraging MFRC standardize on stratified or random 80/10/10 or 80/20 splits for train, dev, and test partitions in in-domain setups. For cross-domain evaluation, the full MFRC training set is used to test on the Moral Foundations Twitter Corpus (MFTC) and vice versa, with labels harmonized to the shared subset {Care, Fairness, Loyalty, Authority, Non-Moral} (Naranbat et al., 13 Oct 2025).
Modeling approaches include fine-tuning transformers (BERT-base, DistilBERT, DeBERTa-v3-base) in multi-label settings (problem_type="multi_label_classification") using binary cross-entropy with logits. Hyperparameters include AdamW, a learning rate of 2×10⁻⁵, 5 epochs, batch sizes of 16–32, and training on a single NVIDIA A100 GPU. No augmentation or sampling is applied; label distributions remain naturally imbalanced (Naranbat et al., 13 Oct 2025).
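The "with logits" loss folds the sigmoid into the cross-entropy for numerical stability, scoring each (comment, foundation) cell independently. A dependency-free sketch of the mean-reduced form (the function name is illustrative; framework implementations such as PyTorch's `BCEWithLogitsLoss` compute the same quantity):

```python
import math

def bce_with_logits(logits, targets):
    """Numerically stable multi-label binary cross-entropy over raw
    logits, averaged over all (example, label) cells:
    loss(z, y) = max(z, 0) - z*y + log(1 + exp(-|z|))."""
    total, n = 0.0, 0
    for row_z, row_y in zip(logits, targets):
        for z, y in zip(row_z, row_y):
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            n += 1
    return total / n
```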
Alternative modeling uses LLMs (e.g., Llama 3 8B, Mistral 8B, GPT-4o-mini, Claude) with zero-shot and few-shot prompt engineering and parameter-efficient fine-tuning (PEFT). Baseline scoring employs micro-F1, precision, recall, ROC/PR AUC, and the Balanced Error Rate.
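Micro-F1 pools true/false positives across all labels before computing F1 once, while the Balanced Error Rate averages the false-negative and false-positive rates of a single label. An illustrative, dependency-free sketch of both metrics:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over a multi-label binary matrix: pool
    TP/FP/FN across every (example, label) cell, then compute F1."""
    tp = fp = fn = 0
    for row_t, row_p in zip(y_true, y_pred):
        for t, p in zip(row_t, row_p):
            tp += t and p
            fp += (not t) and p
            fn += t and (not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def balanced_error_rate(y_true, y_pred):
    """BER for one binary label: mean of the false-negative rate
    and the false-positive rate."""
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    fp = sum(p and not t for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (fn / pos + fp / neg)
```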
Fine-tuned transformers routinely outperform LLMs on MFRC, with higher per-foundation F1s and much lower false negative rates. For example, DeBERTa-v3-base achieves F1 0.38–0.80 per foundation, versus GPT-4o-mini's F1 0.12–0.55 (by foundation) (Skorski et al., 24 Jul 2025). Multi-label prediction remains challenging for all models, amplifying the need for domain-specific fine-tuning.
5. Fairness, Generalization, and Diagnostic Metrics
MFRC is foundational for fairness-aware model evaluation under domain shift. The core diagnostic metrics used include:
- Demographic Parity Difference (DPD): the absolute gap in positive-prediction rates between domains for a given label, |P(ŷ=1 | d=source) − P(ŷ=1 | d=target)|.
- Equalized Odds Difference (EOD): the largest gap between domains in P(ŷ=1 | d, y) over ground-truth values y ∈ {0, 1}, capturing both true-positive-rate and false-positive-rate disparities.
- Moral Fairness Consistency (MFC): a per-label measure of cross-domain prediction stability introduced in (Naranbat et al., 13 Oct 2025); higher values indicate more consistent predictions across domains.
Empirical results indicate pronounced asymmetry in generalization: transferring from Twitter to Reddit degrades micro-F1 by 14.9%, while Reddit to Twitter degrades it by only 1.5% (Naranbat et al., 13 Oct 2025). Per-label cross-domain fairness disparities are greatest for Authority (ΔDP ≈ 0.22–0.23, ΔEO ≈ 0.40–0.41) and lowest for Loyalty and Fairness (ΔDP ≈ 0.03–0.05). MFC correlates perfectly negatively with DPD (ρ = –1.000, p < 0.001) yet shows no significant correlation with F1, precision, or recall, establishing it as an orthogonal cross-domain stability metric.
| Label | ΔDP (95% CI) | ΔEO (95% CI) | MFC (DistilBERT) |
|---|---|---|---|
| authority | 0.22 (0.22–0.23) | 0.40 (0.39–0.41) | 0.7781 (0.7741–0.7822) |
| care | 0.04 (0.04–0.05) | 0.26 (0.25–0.28) | 0.9556 (0.9537–0.9576) |
| fairness | 0.05 (0.05–0.05) | 0.22 (0.21–0.23) | 0.9499 (0.9472–0.9524) |
| loyalty | 0.03 (0.03–0.03) | 0.20 (0.19–0.21) | 0.9666 (0.9647–0.9684) |
| non-moral | 0.08 (0.08–0.08) | 0.34 (0.33–0.36) | 0.9205 (0.9179–0.9234) |
The authority dimension exhibits both the lowest MFC and the greatest domain-specificity, largely due to the rarity and platform-specificity of authority cues in Reddit language (Naranbat et al., 13 Oct 2025).
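Under the standard definitions, with the domain indicator playing the role of the protected group (an assumption consistent with the cross-domain setup; MFC is paper-specific and omitted here), ΔDP and ΔEO can be sketched per label as:

```python
def demographic_parity_diff(pred, group):
    """|P(ŷ=1 | g=0) - P(ŷ=1 | g=1)| for one binary label, where g
    indicates the domain (e.g., 0 = Reddit, 1 = Twitter)."""
    def rate(g):
        sel = [p for p, grp in zip(pred, group) if grp == g]
        return sum(sel) / len(sel)
    return abs(rate(0) - rate(1))

def equalized_odds_diff(pred, true, group):
    """Max over y in {0, 1} of the gap in P(ŷ=1 | g, y) between the
    two groups, covering both TPR and FPR disparities."""
    def rate(g, y):
        sel = [p for p, t, grp in zip(pred, true, group) if grp == g and t == y]
        return sum(sel) / len(sel)
    return max(abs(rate(0, y) - rate(1, y)) for y in (0, 1))
```

Libraries such as fairlearn expose equivalent metrics; the sketch assumes every (group, class) cell is non-empty.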
6. Extensions, Subjective Annotation Variants, and Model Personalization
A smaller, exhaustively annotated subset (the Moral Foundations Subjective Corpus, MFSC) comprises 2,000 Reddit posts labeled by 24 undergraduates, yielding 48,000 annotation decisions. Each annotator assigns one of six moral foundations (Purity, Harm, Loyalty, Authority, Proportionality, Equality) or "non-moral," along with a three-level confidence score (Golazizian et al., 2024). Annotator-level features (e.g., Big Five personality survey data) are included. The MFSC supports annotation-budget optimization, active/annotator-adaptive modeling, and personalized prediction. No standard inter-annotator reliability coefficients are published for this variant; annotation variability is instead characterized through item- and annotator-level disagreement metrics.
Known limitations include the demographic homogeneity of annotators, lack of thread or conversational context, class imbalance (certain foundations underrepresented), and English-language–only content. These factors must be considered in downstream modeling and generalization studies.
7. Applications, Implications, and Recommendations for Use
MFRC is a principal benchmark for multi-label moral sentiment analysis, model fairness diagnostics, subjective annotation research, and AI alignment evaluation. It is suited for:
- Supervised model training for moral foundation detection at both aggregate and individualized annotation levels (Naranbat et al., 13 Oct 2025, Trager et al., 2022, Golazizian et al., 2024).
- Cross-domain transfer and generalization experiments, including domain adversarial and contrastive learning (Naranbat et al., 13 Oct 2025).
- Zero- and few-shot LLM moral reasoning evaluation, with transformer fine-tuning consistently outperforming large-scale prompting (Skorski et al., 24 Jul 2025).
- Fairness analysis under domain shift, guided by MFC, ΔDP, and ΔEO metrics.
- Studies of bias propagation and demographic/psychological correlates of annotation via attached metadata (Trager et al., 2022).
Best practices include computing per-label ΔDP/ΔEO, tracking MFC to diagnose cross-domain gaps, using an 80/10/10 split for in-domain work, fixing random seeds, and bootstrapping 95% confidence intervals (n = 1000 resamples) for all metrics. Releasing code, splits, and checkpoints is recommended for reproducibility (Naranbat et al., 13 Oct 2025).
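The bootstrapped 95% confidence intervals can be sketched with the percentile method, using the n = 1000 resamples and fixed-seed practice recommended above (function and parameter names are illustrative):

```python
import random

def bootstrap_ci(values, statistic, n_resamples=1000, alpha=0.05, seed=42):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic on each resample, and read off the (alpha/2, 1 - alpha/2)
    quantiles of the resulting distribution."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    stats = sorted(
        statistic([rng.choice(values) for _ in range(len(values))])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

The same routine applies to any per-label metric (F1, ΔDP, ΔEO, MFC) by passing the metric computation as `statistic`.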
In aggregate, the MFRC offers a high-quality, richly annotated resource for empirical study of moral sentiment, domain-sensitive fairness evaluation, and the development of equitable, generalizable moral reasoning models in NLP.