
TabFact: A Large-scale Dataset for Table-based Fact Verification

Published 5 Sep 2019 in cs.CL and cs.AI | (1909.02164v5)

Abstract: The problem of verifying whether a textual hypothesis holds based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies are mainly restricted to dealing with unstructured evidence (e.g., natural language sentences and documents, news, etc), while verification under structured evidence, such as tables, graphs, and databases, remains under-explored. This paper specifically aims to study the fact verification given semi-structured data as evidence. To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. TabFact is challenging since it involves both soft linguistic reasoning and hard symbolic reasoning. To address these reasoning challenges, we design two different models: Table-BERT and Latent Program Algorithm (LPA). Table-BERT leverages the state-of-the-art pre-trained language model to encode the linearized tables and statements into continuous vectors for verification. LPA parses statements into programs and executes them against the tables to obtain the returned binary value for verification. Both methods achieve similar accuracy but still lag far behind human performance. We also perform a comprehensive analysis to demonstrate great future opportunities. The data and code of the dataset are provided in \url{https://github.com/wenhuchen/Table-Fact-Checking}.

Citations (424)

Summary

  • The paper introduces TabFact, a dataset of 16K Wikipedia tables and 118K human-annotated statements that challenges models with both linguistic and symbolic reasoning.
  • It compares two approaches—Table-BERT for natural language inference and the Latent Program Algorithm for symbolic reasoning—highlighting each method's strengths and limitations.
  • The dataset incorporates rigorous quality controls through crowd-sourced annotations and methods like positive two-channel annotation and negative statement rewriting to reduce bias.

The paper presents "TabFact," a significant advancement in the domain of table-based fact verification, a key aspect of natural language understanding. Traditionally, fact verification has predominantly focused on unstructured text data. This research extends the exploration to semi-structured datasets, specifically using tables as evidence. The study introduces TabFact, a robust dataset constructed from approximately 16,000 Wikipedia tables and 118,000 human-annotated statements classified as either ENTAILED or REFUTED. This dataset challenges models to leverage both linguistic and symbolic reasoning, highlighting the complexity and nuanced understanding required for table-based verification tasks.
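To make the task concrete, a TabFact-style instance pairs a semi-structured table with a natural-language statement and a binary label. The sketch below is illustrative only; the field names, values, and layout are hypothetical, not the dataset's exact schema.

```python
# Hypothetical sketch of a single TabFact-style instance: a Wikipedia-style
# table serves as evidence, and a crowd-written statement is labeled as
# ENTAILED or REFUTED against it. All names and values are illustrative.
example = {
    "table": {
        "header": ["player", "points"],
        "rows": [["alice", "30"], ["bob", "25"]],
    },
    "statement": "alice scored more points than bob",
    "label": "ENTAILED",
}
```

A model must decide, from the table alone, whether the statement is entailed; doing so here requires both reading the statement and numerically comparing cell values.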

The authors developed two distinct approaches to address these challenges: Table-BERT and the Latent Program Algorithm (LPA). Table-BERT leverages a state-of-the-art pre-trained language model (BERT), linearizing tables and statements into token sequences so the model can process them much like a text-based entailment task. Despite this straightforward transfer, Table-BERT is strong mainly in linguistic reasoning and falls short on symbolic inference such as counting, comparison, and aggregation.
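The linearization step can be sketched as follows. This is a minimal illustration of the general idea of turning each cell into a "column is value" phrase, not the paper's exact template; the function name and separators are assumptions.

```python
# Hypothetical sketch of table linearization for a Table-BERT-style model:
# each cell becomes a "column is value" phrase, rows are concatenated into
# one string, and the result is fed to a BERT-style encoder together with
# the statement to verify.
def linearize_table(header, rows):
    """Turn a table into a pseudo-natural-language sequence."""
    sentences = []
    for i, row in enumerate(rows):
        cells = [f"{col} is {val}" for col, val in zip(header, row)]
        sentences.append(f"row {i + 1}: " + " ; ".join(cells))
    return " . ".join(sentences)

header = ["player", "points"]
rows = [["alice", "30"], ["bob", "25"]]
print(linearize_table(header, rows))
# row 1: player is alice ; points is 30 . row 2: player is bob ; points is 25
```

Because the table is flattened into text, any arithmetic the statement requires (e.g., comparing 30 and 25) must be performed implicitly by the encoder, which is exactly where this approach struggles.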

Conversely, LPA employs a more structured methodology by parsing statements into executable programs, which are then evaluated against the table data. This method excels in symbolic reasoning by using predefined operations (e.g., argmax, count) and provides enhanced interpretability through explicit logic execution. However, both systems, despite their methodological strengths, do not achieve human-level performance, underlining the complexity of this verification task.
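The execution side of this idea can be sketched with a toy interpreter. The operation names below (`col`, `argmax`, `count`) echo the kinds of functions the paper mentions but are assumptions, not LPA's actual API; in the real system, candidate programs are synthesized from the statement and ranked before execution.

```python
# Hypothetical mini-interpreter in the spirit of LPA: a statement is parsed
# (offline) into a program over table operations, and executing the program
# against the table yields a boolean verdict.
def col(table, name):
    """Project one column of a table of row-dicts."""
    return [row[name] for row in table]

def argmax(table, name):
    """Return the row with the maximum value in the given column."""
    return max(table, key=lambda row: row[name])

def count(values, pred):
    """Count how many values satisfy a predicate."""
    return sum(1 for v in values if pred(v))

table = [
    {"player": "alice", "points": 30},
    {"player": "bob", "points": 25},
]

# Statement: "alice scored the most points" -> candidate program:
result = argmax(table, "points")["player"] == "alice"
print(result)  # True

# Statement: "2 players scored at least 25 points" -> candidate program:
print(count(col(table, "points"), lambda p: p >= 25) == 2)  # True
```

Because the verdict comes from explicit operations over cells, the reasoning chain is inspectable, which is the interpretability advantage the paper attributes to LPA.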

The paper details the rigorous dataset creation process, including crowd-sourced annotation and several quality-control measures. Mechanisms such as "positive two-channel annotation" and "negative statement rewriting" reduce annotation biases that could otherwise give away the label. The reported dataset statistics and inter-annotator agreement rates further affirm the quality and reliability of the dataset.

In examining the models' performance, the results indicate that while LPA achieves reasonable accuracy through program synthesis and ranking, Table-BERT's natural language inference capabilities offer advantages in linguistic reasoning portions of the task. Nonetheless, the performance disparity between models and human annotators signals significant room for advancement in this area.

The implications of this research are far-reaching, providing a new benchmark for evaluating AI systems capable of handling both linguistic and symbolic reasoning. Practically, this could enhance systems used in misinformation detection and information retrieval on structured data. Theoretically, it stimulates further exploration into hybrid models that integrate linguistic prowess with the precision of symbolic reasoning. Future developments could focus on improving entity-linking accuracy, expanding function libraries, and integrating more sophisticated reasoning capabilities.

In summary, the TabFact dataset and the accompanying models contribute substantially to the growing field of table-based fact verification, marking a critical step towards developing AI with advanced reasoning capabilities over structured data formats. This work sets the stage for future innovations that might bridge the gap between human and machine performance in complex reasoning tasks.
