- The paper presents KG², a framework that transforms exam questions and supporting sentences into contextual knowledge graphs for relational reasoning.
- KG² constructs pairwise graphs from hypotheses and evidence, learning neural embeddings that capture intricate relational patterns.
- Experiments show KG² improves ARC Challenge scores from 26.97 to 31.70, highlighting its effectiveness in enhancing QA system performance.
Introduction
The paper "KG2: Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings" (1805.12393) addresses the challenge of building QA systems capable of answering the complex science exam questions posed by the AI2 Reasoning Challenge (ARC). Current QA systems excel on datasets that require only surface-level reasoning, but falter on questions demanding deeper logical inference, as shown by their performance barely exceeding random baselines on the ARC Challenge Set. To overcome these hurdles, the authors propose KG², a framework designed to mimic human reasoning processes by leveraging contextual knowledge graphs.
Methodology
At the core of KG² is the construction of pairwise contextual knowledge graphs for each science question and its corresponding supporting sentences. The hypothesis, formed from the question stem and an answer option, is represented as a graph and serves as the focal point for subsequent reasoning. Likewise, supporting sentences retrieved from a large science-related corpus are transformed into graphs. KG² then reasons over neural embeddings of these graphs, drawing inferences from learned relational patterns.
The process involves several distinct phases:
- Hypothesis Generation: A hypothesis is generated by combining the question stem with each candidate answer. This hypothesis serves as the query used to search the corpus for supporting evidence.
- Support Retrieval: Potentially relevant sentences are retrieved from the corpus based on the generated hypothesis. Sentences containing negation or unexpected characters are filtered out to improve coherence.
- Knowledge Graph Construction: Relation triples extracted via Open IE are aggregated to form contextual knowledge graphs. These graphs capture the predicate relationships essential for inferential reasoning.
- Graph Embedding Learning: To rank candidate answers, embeddings are learned for the hypothesis-supporting graph pairs, using a neural network approach. The scoring function assesses relational consistency between pairs.
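The four phases above can be sketched end to end. This is a minimal illustrative pipeline, not the paper's implementation: the function names are hypothetical, the triple extractor is a crude stand-in for Open IE, and a bag-of-words cosine similarity stands in for the learned neural scoring function.

```python
# Toy sketch of a KG2-style pipeline. All names and the scoring heuristic
# are illustrative assumptions, not the authors' actual code.
import re
from collections import Counter
from math import sqrt

NEGATIONS = {"not", "no", "never", "none", "neither", "nor"}

def make_hypothesis(stem: str, option: str) -> str:
    """Phase 1: merge the question stem with one answer option."""
    if "___" in stem:
        return stem.replace("___", option)
    return f"{stem.rstrip('?. ')} {option}."

def keep_sentence(sentence: str) -> bool:
    """Phase 2 filter: drop sentences with negation or non-ASCII characters."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return not (tokens & NEGATIONS) and sentence.isascii()

def retrieve(hypothesis: str, corpus: list[str], k: int = 2) -> list[str]:
    """Phase 2: rank filtered corpus sentences by token overlap."""
    h = set(re.findall(r"[a-z]+", hypothesis.lower()))
    scored = [(len(h & set(re.findall(r"[a-z]+", s.lower()))), s)
              for s in corpus if keep_sentence(s)]
    return [s for n, s in sorted(scored, reverse=True)[:k] if n > 0]

def triples(sentence: str) -> list[tuple[str, str, str]]:
    """Phase 3 stand-in: a naive (subject, predicate, object) splitter.
    The paper uses Open IE; this only illustrates the triple shape."""
    words = sentence.rstrip(".").split()
    return [(words[0], words[1], " ".join(words[2:]))] if len(words) >= 3 else []

def graph_score(hyp_triples, sup_triples) -> float:
    """Phase 4 stand-in: cosine similarity over bag-of-words features of the
    two triple sets, in place of the learned graph embedding score."""
    def bow(ts):
        return Counter(w.lower() for t in ts for part in t for w in part.split())
    a, b = bow(hyp_triples), bow(sup_triples)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

In use, each answer option yields one hypothesis; the option whose hypothesis graph scores highest against its retrieved support graphs would be selected.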
Experimental Results
Experiments demonstrate that KG² significantly outperforms existing baseline models on the ARC Challenge Set. In particular, KG² achieves a test accuracy of 31.70, a substantial improvement over the previous best result of 26.97, indicating effective integration of relational reasoning and contextual embedding techniques.
The study also estimates an upper bound of 36.25, the score achievable if all learnable questions were answered correctly. Analysis attributes the remaining gap to a lack of supporting evidence in the corpus and the limitations of current extraction methods on complex reasoning questions.
Implications and Future Directions
KG² offers compelling insights into enhancing QA systems through relational reasoning and knowledge graph embeddings. Transforming textual support into an embedded graph format for reasoning parallels the way human test-takers synthesize evidence. Practical applications extend to educational tools and automated tutoring systems, where nuanced understanding and inference are paramount.
Future research will likely focus on harnessing external knowledge sources to enrich support for complex reasoning questions. Refining information extraction (IE) techniques for more comprehensive parsing, and exploring richer graph structures, also remain promising directions. With such advances, AI systems could increasingly emulate human-like reasoning, marking progress toward robust, generalizable AI.
Conclusion
The authors of "KG2: Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings" have introduced a robust framework that advances QA capabilities through relational graphs and neural embeddings. The promising results on the ARC Challenge Set not only set a new benchmark but also pave the way for future exploration in the domain of AI-based reasoning, highlighting key areas for ongoing research and technological refinement.