Learning Better Sentence Representation with Syntax Information

Published 9 Jan 2021 in cs.CL | (2101.03343v1)

Abstract: Sentence semantic understanding is a key topic in natural language processing. Recently, contextualized word representations derived from pre-trained LLMs such as ELMo and BERT have brought significant improvements on a wide range of semantic tasks, e.g. question answering, text classification, and sentiment analysis. However, how to add external knowledge to further improve the semantic modeling capability of such models remains worth probing. In this paper, we propose a novel approach to combining syntax information with a pre-trained LLM. First, to evaluate the effect of the pre-trained model, we introduce RNN-based and Transformer-based pre-trained LLMs. Second, to better integrate external knowledge such as syntactic information with the pre-trained model, we propose a dependency syntax expansion (DSE) model. For evaluation, we select two subtasks: sentence completion and biological relation extraction. The experimental results show that our model achieves 91.2\% accuracy on the sentence completion task, outperforming the baseline model by 37.8\%. It also achieves a competitive $F_{1}$ score of 75.1\% on the relation extraction task.