
Unsupervised Multi-hop Question Answering by Question Generation

Published 23 Oct 2020 in cs.CL and cs.AI | arXiv:2010.12623v2

Abstract: Obtaining training data for multi-hop question answering (QA) is time-consuming and resource-intensive. We explore the possibility to train a well-performed multi-hop QA model without referencing any human-labeled multi-hop question-answer pairs, i.e., unsupervised multi-hop QA. We propose MQA-QG, an unsupervised framework that can generate human-like multi-hop training data from both homogeneous and heterogeneous data sources. MQA-QG generates questions by first selecting/generating relevant information from each data source and then integrating the multiple information to form a multi-hop question. Using only generated training data, we can train a competent multi-hop QA which achieves 61% and 83% of the supervised learning performance for the HybridQA and the HotpotQA dataset, respectively. We also show that pretraining the QA system with the generated data would greatly reduce the demand for human-annotated training data. Our codes are publicly available at https://github.com/teacherpeterpan/Unsupervised-Multi-hop-QA.

Citations (54)

Summary

  • The paper introduces MQA-QG, an unsupervised framework that synthesizes multi-hop questions from heterogeneous data sources.
  • It achieves 83% and 61% of fully supervised performance on HotpotQA and HybridQA, respectively, and markedly improves few-shot learning (e.g., raising F1 from 21.6 to 64.6 on HotpotQA with only 50 labeled examples).
  • The study underscores that synthetic data generation can reduce labeling costs and enable efficient QA systems in low-resource domains.


The paper introduces MQA-QG, a novel unsupervised framework designed to train multi-hop question answering models without the need for human-labeled multi-hop question-answer pairs. Recognizing the challenge of data scarcity, where annotating multi-hop QA datasets is resource-intensive owing to their complexity, the authors propose a method to generate training data automatically.

Framework Overview

MQA-QG operates over both homogeneous and heterogeneous data sources to synthesize training data. The process follows a two-step approach: first selecting or generating relevant information from each data source, then integrating these pieces into coherent multi-hop questions. MQA-QG implements this with a small set of operators that handle tasks such as bridge-entity selection (FindBridge), entity description (DescribeEnt), and single-hop question generation conditioned on an answer or an entity (QGwithAns and QGwithEnt). The final multi-hop questions are composed by blending operators, BridgeBlend and CompBlend, which fuse single-hop questions into composite multi-hop forms.
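To make the two-step composition concrete, the sketch below chains toy versions of the operators for the table-to-text "bridge" case. The operator names (FindBridge, DescribeEnt, QGwithEnt, BridgeBlend) come from the paper, but the bodies here are illustrative string-based stand-ins: in the actual framework these are backed by trained neural models, not the placeholder logic shown.

```python
def find_bridge(table_row, passage):
    """FindBridge (toy): pick an entity that appears in both sources."""
    for entity in table_row.values():
        if entity in passage:
            return entity
    return None

def describe_ent(table_row, entity):
    """DescribeEnt (toy): turn a table row into a short description."""
    facts = ", ".join(f"{k} is {v}" for k, v in table_row.items()
                      if v != entity)
    return f"{entity}, whose {facts},"

def qg_with_ent(passage, entity):
    """QGwithEnt (toy): single-hop question anchored on the entity.
    Stands in for a trained text-to-question model."""
    return f"What does the passage say about {entity}?"

def bridge_blend(description, single_hop_question, entity):
    """BridgeBlend (toy): splice the table description in place of the
    bridge entity, yielding a multi-hop question."""
    return single_hop_question.replace(entity, description.rstrip(","))

# Hypothetical example inputs (not from the paper's datasets).
row = {"name": "Ada Lovelace", "field": "mathematics"}
passage = "Ada Lovelace wrote the first published algorithm."

bridge = find_bridge(row, passage)
multi_hop = bridge_blend(describe_ent(row, bridge),
                         qg_with_ent(passage, bridge), bridge)
print(multi_hop)
# → What does the passage say about Ada Lovelace, whose field is mathematics?
```

Answering the blended question now requires both hops: resolving the table description to the entity, then reading the passage about it, which is exactly the reasoning pattern the generated training data is meant to teach.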

Experimental Evaluation

The framework was evaluated on two distinct multi-hop QA datasets: HotpotQA, which involves text-only reasoning, and HybridQA, which combines both table and text data sources. The study demonstrates that using only generated data, MQA-QG achieves 61% and 83% of the fully supervised performance on HybridQA and HotpotQA respectively, indicating that synthetic data can effectively pretrain models to reduce reliance on human annotations.

Additionally, the framework is found to be beneficial in few-shot learning scenarios, significantly boosting model performance when only a handful of labeled samples are available. For example, combining MQA-QG pretraining with 50 labeled examples on the HotpotQA dataset raised the F1 score from 21.6 to 64.6, showing a substantial reduction in data requirements.

Implications and Future Research

The implications of MQA-QG are significant for the development of QA systems, especially where data labeling is prohibitively expensive. By assembling training datasets with minimal human intervention, the framework paves the way for deploying QA systems in low-resource domains or over new document types.

Future research could extend the framework to modalities beyond text and tables, such as visual data, for richer reasoning tasks. Refining the question generation process to improve the semantic coherence and naturalness of the generated questions could further align the synthetic data with human-written questions and increase its utility.

In summary, the research suggests a promising direction towards reducing the bottleneck of labeled data in multi-hop QA through automated, unsupervised methodologies.
