Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts

Published 5 Dec 2024 in cs.LG and cs.CL | (2412.04614v3)

Abstract: Pretrained LMs can generalize to implications of facts that they are finetuned on. For example, if finetuned on "John Doe lives in Tokyo," LMs can correctly answer "What language do the people in John Doe's city speak?" with "Japanese". However, little is known about the mechanisms that enable this generalization or how they are learned during pretraining. We introduce extractive structures as a framework for describing how components in LMs (e.g., MLPs or attention heads) coordinate to enable this generalization. The structures consist of informative components that store training facts as weight changes, and upstream and downstream extractive components that query and process the stored information to produce the correct implication. We hypothesize that extractive structures are learned during pretraining when encountering implications of previously known facts. This yields two predictions: a data ordering effect where extractive structures can be learned only if facts precede their implications, and a weight grafting effect where extractive structures can be transferred to predict counterfactual implications. We empirically demonstrate these phenomena in the OLMo-7b, Llama 3-8b, Gemma 2-9b, and Qwen 2-7b models. Of independent interest, our results also indicate that fact learning can occur at both early and late layers, which lead to different forms of generalization.

Summary

  • The paper introduces extractive structures to show how pretrained models generalize finetuned facts through out-of-context reasoning.
  • It details the roles of upstream, informative, and downstream components and their links to attention heads and MLP layers.
  • Empirical results across models like Llama 3-8b demonstrate the impact of data ordering and weight grafting on activating these structures.

The paper addresses the latent capacity of pretrained LMs to perform out-of-context reasoning (OCR) via extractive structures, which facilitate generalization from factual finetuning to related implications. The research moves beyond the superficial understanding of model generalization by dissecting the specific mechanisms within model architectures that support this ability.

The authors introduce extractive structures composed of three groups of components: informative components that store finetuned facts as weight changes, upstream extractive components that query this stored information from the input, and downstream extractive components that process the retrieved information into the correct implication. This structure allows an LM finetuned on a fact such as "John Doe lives in Tokyo" to later answer questions like "What language do the people in John Doe's city speak?" with "Japanese."
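
The two-hop composition behind this example can be illustrated with a toy lookup model. This is a minimal sketch only: the dictionaries and function names below are illustrative stand-ins for the paper's learned components (attention heads and MLP layers), not its actual mechanism.

```python
# Toy sketch of an extractive structure as a two-hop lookup.
# In the paper, these roles are played by learned model components;
# here they are explicit tables for illustration.

# Informative component: a fact acquired during finetuning,
# which the paper models as a weight change.
finetuned_facts = {"John Doe": "Tokyo"}

# Downstream extractive component: background knowledge from
# pretraining that maps the stored fact to its implication.
city_to_language = {"Tokyo": "Japanese", "Paris": "French"}

def answer_implication(entity: str) -> str:
    """Query the stored fact (upstream role), then apply
    pretrained knowledge to produce the implication (downstream role)."""
    city = finetuned_facts[entity]   # hop 1: retrieve the finetuned fact
    return city_to_language[city]    # hop 2: map fact to its implication

print(answer_implication("John Doe"))  # -> Japanese
```

The point of the sketch is that the finetuned fact and the pretrained mapping live in separate components, and generalization requires them to compose.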

The framework is validated empirically on several models, notably OLMo-7b, Llama 3-8b, Gemma 2-9b, and Qwen 2-7b. The analysis ties extractive structures to concrete model components such as attention heads and MLPs, and shows that fact storage can occur at either early or late layers, with each location supporting a distinct form of generalization.

A notable empirical finding is the data ordering effect: models acquire the extractive structures needed for OCR only when facts precede their implications during training. Because classical, order-agnostic analyses treat training runs containing the same examples as equivalent, this result underscores how a model's internal state depends on the chronology of data exposure.
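
The contrast being tested can be sketched as two training curricula built from the same examples. The strings and "conditions" below are a hypothetical, simplified stand-in for the paper's experiments, which finetune real LMs; only the ordering logic is shown.

```python
# Sketch of the data-ordering comparison: identical examples, two orders.
# In the paper, each curriculum is used to finetune an LM and OCR is
# measured afterward; here we only construct the two conditions.

facts = ["John Doe lives in Tokyo."]
implications = ["People in John Doe's city speak Japanese."]

# Condition 1: facts first, then implications
# (extractive structures can form, so OCR is predicted to succeed).
facts_first = facts + implications

# Condition 2: implications first, then facts
# (the data ordering effect predicts OCR fails here).
implications_first = implications + facts

# Same multiset of examples, different order: an order-agnostic view
# of training would not distinguish these two runs.
assert sorted(facts_first) == sorted(implications_first)
assert facts_first != implications_first
```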

Further analysis demonstrates a weight grafting effect: the weight changes that store a fact can be transplanted into another model, where they produce predictions of counterfactual implications, substantiating their causal role in the inference process.
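
Mechanically, grafting amounts to applying the finetuning-induced weight deltas of selected components to a second set of weights. The sketch below uses plain dictionaries of floats with made-up component names; real grafting operates on model parameter tensors.

```python
# Minimal sketch of weight grafting: copy tuned-minus-base deltas
# for chosen components only. Component names are illustrative.

base      = {"mlp.5": 0.25, "attn.7": -0.5, "mlp.20": 0.5}
finetuned = {"mlp.5": 0.75, "attn.7": -0.5, "mlp.20": 1.0}

def graft(base_w, tuned_w, components):
    """Apply finetuning deltas only for the named components;
    all other weights stay at their base values."""
    return {
        name: base_w[name] + (tuned_w[name] - base_w[name])
              if name in components else base_w[name]
        for name in base_w
    }

# Graft only the (hypothetical) informative component's weight change.
grafted = graft(base, finetuned, components={"mlp.20"})
print(grafted)  # mlp.20 takes the finetuned value; others stay at base
```

If the grafted model then produces the implication of the transplanted fact, the weight change itself, rather than anything else about the finetuned model, carries the stored information.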

These findings carry both practical and theoretical implications. Practically, they suggest ordering finetuning data so that facts precede their implications in order to strengthen OCR; theoretically, they lay groundwork for a structured theory of generalization in deep learning, relevant to deploying safe and robust machine learning systems. The detailed, component-level empirical analysis also offers rich contributions to neural network interpretability.

In conclusion, this work establishes a comprehensive framework for understanding and harnessing the generalization capabilities that emerge during and after LM finetuning. It elucidates how latent structures learned in pretraining can be strategically activated, providing a fresh lens on finetuning processes and model architecture design. Further exploration of the optimization dynamics outlined here could advance broader theories linking pretraining and finetuning, and support the development of safe AI systems.
