
Dependency-based Mixture Language Models

Published 19 Mar 2022 in cs.CL (arXiv:2203.10256v1)

Abstract: Various models have been proposed to incorporate knowledge of syntactic structures into neural language models. However, previous works have relied heavily on elaborate components tailored to a specific language model, usually a recurrent neural network (RNN), which makes them unwieldy in practice and hard to fit into other neural language models such as the Transformer and GPT-2. In this paper, we introduce Dependency-based Mixture Language Models. In detail, we first train neural language models with a novel dependency modeling objective to learn the probability distribution of future dependent tokens given the context. We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention. Extensive experiments and human evaluations show that our method can be easily and effectively applied to different neural language models while improving neural text generation on various tasks.
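The mixture formulation the abstract describes can be sketched roughly as follows: each context position carries a dependency-modeling distribution over the vocabulary (the probability of its future dependents), and self-attention weights over positions mix these distributions into the next-token probability. This is a minimal illustrative sketch, not the paper's actual implementation; the function names, shapes, and the use of plain NumPy are all assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def next_token_distribution(dep_dists, attn_scores):
    """Hypothetical sketch of the mixture step described in the abstract.

    dep_dists:   (T, V) array; row i is the dependency-modeling
                 distribution over the vocabulary for context position i.
    attn_scores: (T,) unnormalized self-attention scores for the current step.
    Returns a (V,) next-token distribution formed as an attention-weighted
    mixture of the per-position dependency distributions.
    """
    weights = softmax(attn_scores)   # mixture weights over context positions
    return weights @ dep_dists       # convex combination of distributions

# Toy usage with random, properly normalized dependency distributions.
rng = np.random.default_rng(0)
T, V = 4, 10
dep_dists = rng.random((T, V))
dep_dists /= dep_dists.sum(axis=1, keepdims=True)  # each row sums to 1
probs = next_token_distribution(dep_dists, rng.random(T))
```

Because the mixture weights are a softmax (nonnegative, summing to one) and each row of `dep_dists` is a distribution, the result is itself a valid probability distribution over the vocabulary.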

Authors (2)
Citations (1)
