Injecting structural hints: Using language models to study inductive biases in language learning

Published 25 Apr 2023 in cs.CL (arXiv:2304.13060v2)

Abstract: Both humans and language models are able to learn language without explicit structural supervision. What inductive biases make this learning possible? We address this fundamental cognitive question by leveraging transformer language models: we inject inductive bias into the models by pretraining them on formally structured data, and then evaluate the biased learners' ability to learn typologically diverse natural languages. Our experimental setup creates a testbed for hypotheses about inductive bias in human language learning. We investigate the effect of injecting models with three types of inductive bias: 1) recursive, hierarchical processing; 2) crossing token-token relationships that cannot be modeled by context-free grammars; and 3) a Zipfian power-law vocabulary distribution. We show that non-context-free relationships form the best inductive biases. Our study leverages the capabilities of transformer models to run controlled language-learning experiments that are not possible to run on humans, and it surfaces hypotheses about the structures that facilitate language learning in both humans and machines.
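The abstract does not give the exact grammars used for the structured pretraining data, but the three bias types it names can be illustrated with toy generators. The sketch below is illustrative only, not the paper's released code: the bracketed token format, vocabulary size, branching probability, and Zipf exponent are all assumptions. `nested` emits matched pairs with strictly nested (context-free, Dyck-like) structure; `crossing` emits pairs whose dependency arcs cross, a copy-language pattern that no context-free grammar can capture; `zipfian` emits independent tokens drawn from a power-law distribution, with no structural dependencies at all.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 500  # assumed vocabulary size; the paper's actual value is not in the abstract


def nested(n_pairs):
    # Hierarchical bias: matched pairs with strictly nested (Dyck-like,
    # context-free) structure, e.g. <a <b b> a>.
    seq, stack, opens = [], [], n_pairs
    while opens or stack:
        if opens and (not stack or rng.random() < 0.5):
            tok = int(rng.integers(VOCAB))
            seq.append(f"<{tok}")   # "open" half of a matched pair
            stack.append(tok)
            opens -= 1
        else:
            seq.append(f"{stack.pop()}>")  # matching "close" half
    return seq


def crossing(n_pairs):
    # Non-context-free bias: pairs close in the same order they opened
    # (a copy-language pattern), so the dependency arcs cross: <a <b a> b>.
    toks = [int(t) for t in rng.integers(VOCAB, size=n_pairs)]
    return [f"<{t}" for t in toks] + [f"{t}>" for t in toks]


def zipfian(length, alpha=1.0):
    # Distributional bias: i.i.d. tokens from a Zipfian power law over
    # the vocabulary, with no token-to-token dependencies.
    ranks = np.arange(1, VOCAB + 1, dtype=float)
    p = ranks ** -alpha
    p /= p.sum()
    return [str(int(t)) for t in rng.choice(VOCAB, size=length, p=p)]


if __name__ == "__main__":
    for gen, args in [(nested, (4,)), (crossing, (4,)), (zipfian, (8,))]:
        print(f"{gen.__name__:8s}", " ".join(gen(*args)))
```

In the setup the abstract describes, a model pretrained on each such corpus is then evaluated on natural languages to see which bias transfers; per the abstract, the crossing (non-context-free) structure forms the best inductive bias.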

