Media Bias/Fact Check (MBFC)

Updated 26 January 2026
  • Media Bias/Fact Check (MBFC) is a volunteer-run system that systematically profiles media outlets’ political bias and factual reliability using ordinal labels and free-text background descriptions.
  • Its structured labels, based on expert methodologies, enable reproducible research in automated bias detection and credibility assessment across thousands of outlets.
  • MBFC underpins various computational approaches by providing benchmark datasets for evaluating NLP and machine learning models while highlighting challenges like temporal drift and limited coverage.

Media Bias/Fact Check (MBFC) is a volunteer-run comparative rating system that profiles the political bias and factual reliability of news outlets. MBFC performs systematic, source-level annotation of media organizations along ordinal bias and factuality axes, furnishing scores and qualitative background reports. These serve as de facto ground truth for evaluating automated reliability and bias classification systems, and as input labels for large-scale studies of misinformation, source trustworthiness, and media diversity. MBFC’s detailed textual reports (background checks), curated label datasets, and methodological precedents underpin a substantial body of automated credibility and bias detection research, and its labeling schema remains a standard for source-level media profiling in both manual and automated pipelines.

1. MBFC Dataset Structure and Annotation Protocol

MBFC provides for each profiled outlet both free-text background descriptions and structured ordinal labels. Detailed “background check” pages, termed media background checks (MBCs) in the research literature, are the principal artifact, each comprising roughly 17 itemized lines on average with a typical length of 303 tokens. These reports enumerate key signals about an outlet, including founding/ownership (e.g., “Founded in 2005 by Mike Adams”), funding and revenue model (government, ad-driven, subscription, or donation-based), editorial scope, documented ideological or partisan stances, incidence of failed fact-checks/retractions, and multi-hop control structures such as umbrella ownership networks. Derived from historiographic and fact-checking best practices (e.g., Howell & Prevenier, International Fact-Checking Network), MBCs serve as operationalizations of source criticism for automated systems (Schlichtkrull, 2024).
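The per-outlet record described above can be sketched as a simple data model. This is an illustrative reconstruction only; the field names are hypothetical and do not reflect MBFC’s actual internal schema:

```python
from dataclasses import dataclass, field

@dataclass
class MediaBackgroundCheck:
    """Illustrative record for one MBFC-style background check.

    Field names are hypothetical; scales follow the ordinal
    conventions described in the text."""
    outlet: str
    founded: str                 # e.g., "Founded in 2005 by Mike Adams"
    funding_model: str           # government, ad-driven, subscription, donation-based
    editorial_scope: str
    partisan_stances: list = field(default_factory=list)
    failed_fact_checks: int = 0
    parent_ownership: list = field(default_factory=list)  # multi-hop ownership chain
    bias: int = 0                # ordinal, -3 (far-left) to +3 (far-right)
    factuality: int = 3          # ordinal, 0 (very low) to 5 (very high)
```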

MBFC’s structured labels appear in two main forms:

  • Political bias: encoded on a 7-point ordinal scale, from “far-left” (–3) to “far-right” (+3), with intermediates for “left,” “center-left,” “no bias,” “center-right,” “right.”
  • Factuality/credibility: assigned as an integer 0–5, mapping to “very low,” “low,” “mixed,” “mostly factual,” “high factuality,” and “very high factuality.”

As used in NELA-GT-2020, MBFC factuality scores are collapsed into three derived classes for research use: “unreliable” (factuality ≤ 2), “mixed” (= 3), and “reliable” (≥ 4), with an additional conspiracy flag for sources on MBFC’s curated conspiracy or pseudoscience lists (Gruppi et al., 2021).
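The 0–5 factuality scale and the NELA-GT-2020 three-class collapse can be sketched directly from the mapping above:

```python
# Ordinal labels for the 0-5 MBFC factuality scale, in order.
FACTUALITY_LABELS = [
    "very low", "low", "mixed",
    "mostly factual", "high factuality", "very high factuality",
]

def reliability_class(factuality: int) -> str:
    """Collapse the 0-5 MBFC factuality score into the three
    NELA-GT-2020 research classes (Gruppi et al., 2021)."""
    if not 0 <= factuality <= 5:
        raise ValueError("MBFC factuality must be an integer in [0, 5]")
    if factuality <= 2:
        return "unreliable"
    if factuality == 3:
        return "mixed"
    return "reliable"
```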

2. MBFC as Evaluation Benchmark and Data Source

MBFC’s expert-assigned bias and factuality labels constitute the reference standard in computational media profiling research:

  • Dataset scale: ~6,700 detailed source-level reports, with structured labels attached to 1,000–4,000 outlets depending on the version and label granularity (Schlichtkrull, 2024, Gruppi et al., 2021, Mujahid et al., 14 Jun 2025).
  • Aggregation and workflow: Automated pipelines download, normalize, and join MBFC’s data against scraped source lists by outlet name, after lowercasing and removing punctuation or formatting (to maximize matching accuracy). Reliability labels and scores are stored in both flat files (CSV, JSON) and database tables for downstream reproducibility (Gruppi et al., 2021).
  • Label coverage and class statistics: In NELA-GT-2020, of 519 sources, 31.2% were “unreliable,” 38.3% “mixed,” and 30.4% “reliable,” with the full MBFC coverage extending to thousands of U.S. and non-U.S. outlets (Gruppi et al., 2021, Mujahid et al., 14 Jun 2025).
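The name-normalization-and-join step in the aggregation workflow can be sketched as follows; the function names are illustrative, not the NELA pipeline’s actual API:

```python
import re

def normalize_name(name: str) -> str:
    """Lowercase and strip punctuation/extra whitespace so outlet names
    match across scraped lists and the MBFC label table."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def attach_labels(scraped_sources, mbfc_records):
    """Join scraped outlet names against an MBFC label table
    (dict: outlet name -> label) via normalized names.
    Unmatched sources map to None."""
    index = {normalize_name(k): v for k, v in mbfc_records.items()}
    return {s: index.get(normalize_name(s)) for s in scraped_sources}
```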

MBFC remains the ground truth for numerous bias detection models, source credibility studies, and fact-checking system evaluations (Baly et al., 2018, Mujahid et al., 14 Jun 2025, Hernandes et al., 2024).

3. Methodological Foundations and Task Formalization

MBFC labels and background checks are both inputs and targets for NLP and information-retrieval systems designed to automate source critical reasoning:

  • Task definition: Given a source identifier (e.g., domain name) and optional retrieved evidence, generate an itemized summary of trustworthiness and bias signals (an MBC). The generative objective for an LLM-based model is

ŷ = argmax_y Score(y | x),

where x is the source and contextual evidence, and Score is the LLM-derived generative probability (Schlichtkrull, 2024).

  • Itemization protocol: 42 “atomic-fact” templates distill key claim types (ownership, funding, editorial leaning, failed fact-checks, etc.), enabling automatic fact recall and entailment-based evaluation (Schlichtkrull, 2024).
  • Multi-dimensional signal representation: Each gold MBC is decomposed into instantiated templates; these serve as the fine-grained unit for supervised metric computation, supporting metrics such as fact recall, Recall = |G ∩ P| / |G|, with G the set of gold atomic facts and P those entailed by a generated MBC.
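The fact-recall metric above is straightforward to compute once the gold and entailed atomic-fact sets are extracted; a minimal sketch:

```python
def fact_recall(gold_facts: set, entailed_facts: set) -> float:
    """Recall = |G ∩ P| / |G|: the fraction of gold atomic facts
    entailed by a generated media background check."""
    if not gold_facts:
        return 0.0  # no gold facts to recall
    return len(gold_facts & entailed_facts) / len(gold_facts)
```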

MBFC’s explicit separation of structured and free-text evidence, and its focus on both bias and factuality (rather than claims alone), provide a comprehensive basis for system design and evaluation.

4. Computational Approaches Leveraging MBFC

MBFC has directly enabled the development and assessment of a range of algorithmic frameworks:

  • Feature-based SVMs and multi-task models: Early systems concatenate article-based linguistic markers, Wikipedia profile embeddings, Twitter metadata, URL features, and traffic signals (≈1,950 dimensions in total) for SVM or ordinal regression. Ablation shows Wikipedia and article-body features are most critical for factuality and bias, respectively (Baly et al., 2018).
  • LLM prompt aggregation: Sophisticated pipelines prompt LLMs with hand-crafted and systematic (per-MBFC-criterion) queries on stance, failed fact-checks, and topic-specific leanings; aggregator models (e.g., SVMs or fine-tuned transformers on concatenated response text) achieve state-of-the-art source-level bias and factuality classification (accuracy >90% and MAE <0.1 for three-way bias) using MBFC training/test splits (Mujahid et al., 14 Jun 2025).
  • Network and hyperlink-based label propagation: Graph-based propagators use longitudinal hyperlink graphs to diffuse factuality and bias labels across the media web, using MBFC as training supervision. Iterative investment-style propagation achieves macro-F1 of roughly 0.88 for factuality and 0.78 for political bias, outperforming content-only baselines (Sánchez-Cortés et al., 2024).
  • Evaluation of LLMs as raters: Zero-shot LLMs (e.g., GPT-4) exhibit high correlation with MBFC labels (ρ = 0.89, n = 5,877, p < 0.001) but with a bias toward polarized labeling and a strong tendency to abstain on low-popularity and centrist outlets (Hernandes et al., 2024).
  • Fine-grained and real-time analysis: Systems such as IndiTag (Lin et al., 2024) and Media Bias Detector (Haider et al., 30 Sep 2025, Wang et al., 9 Feb 2025) extend MBFC’s source-level paradigm to article- and sentence-level automated bias and factuality attributions, but often retain MBFC as the ground-truth calibration anchor and reference label set.
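The hyperlink-based propagation idea can be illustrated with a generic neighbor-averaging sketch. This is not the investment-style algorithm of Sánchez-Cortés et al. (2024), only a minimal stand-in that shows how MBFC seeds diffuse over a link graph:

```python
def propagate_labels(edges, seed_labels, iters=10):
    """Generic label propagation over a hyperlink graph.

    edges: dict node -> list of linked nodes (treated as undirected)
    seed_labels: dict node -> float score (e.g., rescaled MBFC factuality)
    MBFC-seeded nodes are clamped; unlabeled nodes take the mean of
    their labeled neighbors on each iteration."""
    adj = {}
    for u, nbrs in edges.items():
        for v in nbrs:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    labels = dict(seed_labels)
    for _ in range(iters):
        updates = {}
        for node, nbrs in adj.items():
            if node in seed_labels:
                continue  # clamp MBFC-supervised seeds
            vals = [labels[n] for n in nbrs if n in labels]
            if vals:
                updates[node] = sum(vals) / len(vals)
        labels.update(updates)
    return labels
```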

MBFC’s data model is thus central not only as ground truth but also as the methodological template for automated bias and credibility pipelines.

5. Evaluation, Human Perception, and Systemic Impact

MBFC-labeled datasets are the basis for both automatic and human-centered assessments of bias and reliability interventions in information retrieval and social media:

  • Human evaluation protocols: Controlled studies demonstrate that supplementing users or AI assistants with MBCs (generated from MBFC-style templates) significantly decreases “difficulty of establishing trust” (p = 0.004), increases answer preference rate (answers with background checks preferred in 57% of cases, p < .001), and reduces perceived misleadingness (Schlichtkrull, 2024).
  • Perceived neutrality in external audiences: Survey evidence (N=655 US adults) indicates that MBFC-style “fact-checking organization” labels (e.g., Poynter.org) are among the only entity types with no significant trust or bias differential by political affiliation; news organization and government labels are frequently treated as partisan (Habib et al., 2024). MBFC’s positioning as a nominally neutral, methodologically explicit rater is thus particularly salient for intervention design.
  • Influence on misinformation sharing: The mere presence of a (MBFC-style) fact-check label discourages sharing of questionable headlines regardless of the perceived bias of the labeling entity (Habib et al., 2024). However, choice of label source and clarity of presentation remain critical for maximizing trust and effectiveness.

This evidence underscores MBFC’s role as a reference entity for neutral, trusted source annotations, while highlighting the limitations and downstream effects of how such information is operationalized.

6. Limitations, Coverage, and Open Challenges

MBFC’s workflow and its adoption in both manual and automated systems expose fundamental technical, operational, and epistemological boundaries:

  • Partial universe: MBFC manually covers approximately 7,000 media outlets, a fraction of the global digital news domain space. Less mainstream, foreign-language, or emergent outlets are underrepresented; real-time coverage lag is inherent (Schlichtkrull, 2024, Mujahid et al., 14 Jun 2025, Haider et al., 30 Sep 2025).
  • Static snapshot vs. temporal drift: MBFC labels assigned at time t may degrade as outlets change orientation, ownership, or editorial standards. Automated systems, including those trained and validated against MBFC data, inherit this lag unless explicit retraining/updating or dynamic propagation (e.g., hyperlink-based) is implemented (Sánchez-Cortés et al., 2024, Nakov et al., 2021).
  • U.S.-centrism and monolinguality: Empirical and error analyses reveal that MBFC-based models have lower accuracy on non-U.S. outlets, both due to the underlying label distribution and to the U.S.-centric training bias in both expert and LLM scorers (Mujahid et al., 14 Jun 2025).
  • Inherent epistemic regress: MBCs and label attributions can themselves be of uncertain reliability, requiring “background checks on the checker” (infinite regress, unless a ground layer of trust is posited) (Schlichtkrull, 2024).
  • Evaluation blind spots: Current atomic-fact and entailment metrics fail to fully capture complex, multi-hop factual relations (e.g., networked ownership), and may miss subtle forms of agenda-setting and framing bias (Schlichtkrull, 2024, Haider et al., 30 Sep 2025).
  • Ethical caution: MBFC’s determinations and auto-generated MBCs can encode unexamined or systemic bias; neither are substitutes for direct investigative or forensic fact-checking, especially in high-stakes contexts (Schlichtkrull, 2024, Habib et al., 2024).

These constraints are the subject of active methodological work, including dynamic pipeline development, hybrid human–AI workflows, and research into cross-lingual reliability transfer.

7. Significance and Research Integration

MBFC’s background checks, labeling conventions, and expert protocols inform not only technical systems but social and behavioral research on misinformation:

  • Label calibration and scale: MBFC guides coverage in datasets such as NELA-GT-2020 and in evaluation splits for major supervised learning corpora, enabling reproducible stratification on reliability and bias (Gruppi et al., 2021).
  • Theoretical framework: MBFC’s annotation reflects broader source criticism traditions in journalism and historiography, forming a bridge from human expert assessment to automated, explainable AI pipelines (Schlichtkrull, 2024, Mujahid et al., 14 Jun 2025, Lin et al., 2024).
  • Design recommendations: Research recommends MBFC and similar entities emphasize multi-source consensus badges, transparent micro-references, and user feedback loops to counteract perceived bias and maximize end-user trust (Habib et al., 2024).

MBFC remains the dominant operationalization of source-level reliability and bias, forming a backbone for scalable analysis of news legitimacy and trust—both as static ground truth and as a continually referenced scheme in methodological innovation.
