Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts

Published 5 Dec 2024 in cs.LG and cs.CL | (2412.04614v3)

Abstract: Pretrained LMs can generalize to implications of facts that they are finetuned on. For example, if finetuned on "John Doe lives in Tokyo," LMs can correctly answer "What language do the people in John Doe's city speak?" with "Japanese". However, little is known about the mechanisms that enable this generalization or how they are learned during pretraining. We introduce extractive structures as a framework for describing how components in LMs (e.g., MLPs or attention heads) coordinate to enable this generalization. The structures consist of informative components that store training facts as weight changes, and upstream and downstream extractive components that query and process the stored information to produce the correct implication. We hypothesize that extractive structures are learned during pretraining when encountering implications of previously known facts. This yields two predictions: a data ordering effect where extractive structures can be learned only if facts precede their implications, and a weight grafting effect where extractive structures can be transferred to predict counterfactual implications. We empirically demonstrate these phenomena in the OLMo-7b, Llama 3-8b, Gemma 2-9b, and Qwen 2-7b models. Of independent interest, our results also indicate that fact learning can occur at both early and late layers, which lead to different forms of generalization.

Summary

  • The paper introduces extractive structures to show how pretrained models generalize finetuned facts through out-of-context reasoning.
  • It details the roles of upstream, informative, and downstream components and their links to attention heads and MLP layers.
  • Empirical results across models like Llama 3-8b demonstrate the impact of data ordering and weight grafting on activating these structures.

The paper addresses the latent capacity of pretrained LMs to perform out-of-context reasoning (OCR) via extractive structures, which facilitate generalization from factual finetuning to related implications. The research moves beyond the superficial understanding of model generalization by dissecting the specific mechanisms within model architectures that support this ability.

The authors introduce extractive structures composed of three groups of components: informative components that store finetuned facts as weight changes, upstream extractive components that query this stored information from the input, and downstream extractive components that process the retrieved information into the correct implication. This structure allows an LM finetuned on a fact such as "John Doe lives in Tokyo" to later answer questions like "What language do the people in John Doe's city speak?" with "Japanese."
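
The two-hop composition behind this example can be illustrated with a toy lookup model. This is a minimal sketch only: the dictionaries and function names below are illustrative stand-ins for the paper's learned components (attention heads and MLP layers), not its actual mechanism.

```python
# Toy sketch of an extractive structure as a two-hop lookup.
# In the paper, these roles are played by learned model components;
# here they are explicit tables for illustration.

# Informative component: a fact acquired during finetuning,
# which the paper models as a weight change.
finetuned_facts = {"John Doe": "Tokyo"}

# Downstream extractive component: background knowledge from
# pretraining that maps the stored fact to its implication.
city_to_language = {"Tokyo": "Japanese", "Paris": "French"}

def answer_implication(entity: str) -> str:
    """Query the stored fact (upstream role), then apply
    pretrained knowledge to produce the implication (downstream role)."""
    city = finetuned_facts[entity]   # hop 1: retrieve the finetuned fact
    return city_to_language[city]    # hop 2: map fact to its implication

print(answer_implication("John Doe"))  # -> Japanese
```

The point of the sketch is that the finetuned fact and the pretrained mapping live in separate components, and generalization requires them to compose.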

The framework is validated empirically on several models, notably OLMo-7b, Llama 3-8b, Gemma 2-9b, and Qwen 2-7b. The analysis ties extractive structures to concrete model components such as attention heads and MLPs, and shows that fact storage can occur at either early or late layers, with each location supporting a distinct form of generalization.

A notable empirical finding is the data ordering effect: models acquire the extractive structures needed for OCR only when facts precede their implications during training. Because classical, order-agnostic analyses treat training runs containing the same examples as equivalent, this result underscores how a model's internal state depends on the chronology of data exposure.
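
The contrast being tested can be sketched as two training curricula built from the same examples. The strings and "conditions" below are a hypothetical, simplified stand-in for the paper's experiments, which finetune real LMs; only the ordering logic is shown.

```python
# Sketch of the data-ordering comparison: identical examples, two orders.
# In the paper, each curriculum is used to finetune an LM and OCR is
# measured afterward; here we only construct the two conditions.

facts = ["John Doe lives in Tokyo."]
implications = ["People in John Doe's city speak Japanese."]

# Condition 1: facts first, then implications
# (extractive structures can form, so OCR is predicted to succeed).
facts_first = facts + implications

# Condition 2: implications first, then facts
# (the data ordering effect predicts OCR fails here).
implications_first = implications + facts

# Same multiset of examples, different order: an order-agnostic view
# of training would not distinguish these two runs.
assert sorted(facts_first) == sorted(implications_first)
assert facts_first != implications_first
```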

Further analysis demonstrates a weight grafting effect: the weight changes that store a fact can be transplanted into another model, where they produce predictions of counterfactual implications, substantiating their causal role in the inference process.
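
Mechanically, grafting amounts to applying the finetuning-induced weight deltas of selected components to a second set of weights. The sketch below uses plain dictionaries of floats with made-up component names; real grafting operates on model parameter tensors.

```python
# Minimal sketch of weight grafting: copy tuned-minus-base deltas
# for chosen components only. Component names are illustrative.

base      = {"mlp.5": 0.25, "attn.7": -0.5, "mlp.20": 0.5}
finetuned = {"mlp.5": 0.75, "attn.7": -0.5, "mlp.20": 1.0}

def graft(base_w, tuned_w, components):
    """Apply finetuning deltas only for the named components;
    all other weights stay at their base values."""
    return {
        name: base_w[name] + (tuned_w[name] - base_w[name])
              if name in components else base_w[name]
        for name in base_w
    }

# Graft only the (hypothetical) informative component's weight change.
grafted = graft(base, finetuned, components={"mlp.20"})
print(grafted)  # mlp.20 takes the finetuned value; others stay at base
```

If the grafted model then produces the implication of the transplanted fact, the weight change itself, rather than anything else about the finetuned model, carries the stored information.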

These findings carry both practical and theoretical implications. Practically, they suggest ordering finetuning data so that facts precede their implications in order to strengthen OCR; theoretically, they lay groundwork for a structured theory of generalization in deep learning, relevant to deploying safe and robust machine learning systems. The detailed, component-level empirical analysis also offers rich contributions to neural network interpretability.

In conclusion, this work establishes a comprehensive framework for understanding and harnessing the generalization capabilities that emerge during and after LM finetuning. It elucidates how latent structures learned in pretraining can be strategically activated, providing a fresh lens on finetuning processes and model architecture design. Further exploration of the optimization dynamics outlined here could advance broader theories linking pretraining and finetuning, and support the development of safe AI systems.
