
Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers

Published 14 May 2025 in cs.LG, cs.AI, and cs.CL | (2505.09855v1)

Abstract: Transformer models learn in two distinct modes: in-weights learning (IWL), encoding knowledge into model weights, and in-context learning (ICL), adapting flexibly to context without weight modification. To better understand the interplay between these learning modes, we draw inspiration from evolutionary biology's analogous adaptive strategies: genetic encoding (akin to IWL, adapting over generations and fixed within an individual's lifetime) and phenotypic plasticity (akin to ICL, enabling flexible behavioral responses to environmental cues). In evolutionary biology, environmental predictability dictates the balance between these strategies: stability favors genetic encoding, while reliable predictive cues promote phenotypic plasticity. We experimentally operationalize these dimensions of predictability and systematically investigate their influence on the ICL/IWL balance in Transformers. Using regression and classification tasks, we show that high environmental stability decisively favors IWL, as predicted, with a sharp transition at maximal stability. Conversely, high cue reliability enhances ICL efficacy, particularly when stability is low. Furthermore, learning dynamics reveal task-contingent temporal evolution: while a canonical ICL-to-IWL shift occurs in some settings (e.g., classification with many classes), we demonstrate that scenarios with easier IWL (e.g., fewer classes) or slower ICL acquisition (e.g., regression) can exhibit an initial IWL phase later yielding to ICL dominance. These findings support a relative-cost hypothesis for explaining these learning mode transitions, establishing predictability as a critical factor governing adaptive strategies in Transformers, and offering novel insights for understanding ICL and guiding training methodologies.

Summary

The paper "Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers" presents a nuanced exploration of learning strategies in Transformer models, drawing parallels with biological adaptation mechanisms. Transformers exhibit two primary modes of learning: in-context learning (ICL), where models adjust behavior to inputs without weight updates, and in-weights learning (IWL), representing gradual knowledge accumulation during training. The authors propose that, akin to evolutionary biology where predictability governs adaptive strategies, task predictability similarly influences the ICL/IWL balance in Transformers.

Key Findings

The research operationalizes two dimensions of predictability (environmental stability and cue reliability) and investigates their influence on learning in Transformers through two systematically designed tasks: sinusoid regression and Omniglot few-shot classification, each run under varying predictability conditions.
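To make the two predictability knobs concrete, here is a minimal sketch of how a sinusoid-regression episode generator might expose them. The function and parameter names (`stability`, `cue_reliability`, the pool size, and the amplitude/phase ranges) are our own illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical knobs (our names, not the paper's):
#   stability       -- probability an episode reuses a task from a small fixed pool
#   cue_reliability -- probability each context example is generated by the true task
POOL = [(rng.uniform(0.5, 2.0), rng.uniform(0, np.pi)) for _ in range(4)]

def sample_task(stability):
    """With probability `stability`, reuse a pooled (amplitude, phase) task,
    which favors memorization in weights (IWL); otherwise draw a novel task,
    which can only be solved from context (ICL)."""
    if rng.random() < stability:
        return POOL[rng.integers(len(POOL))]
    return (rng.uniform(0.5, 2.0), rng.uniform(0, np.pi))

def make_episode(stability, cue_reliability, n_context=10):
    """Build one few-shot episode: context pairs (xs, ys) plus a query point."""
    amp, phase = sample_task(stability)
    xs = rng.uniform(-5, 5, size=n_context)
    ys = np.empty(n_context)
    for i, x in enumerate(xs):
        if rng.random() < cue_reliability:
            ys[i] = amp * np.sin(x + phase)  # informative cue from the true task
        else:
            a, p = rng.uniform(0.5, 2.0), rng.uniform(0, np.pi)
            ys[i] = a * np.sin(x + p)        # misleading cue from a random task
    query_x = rng.uniform(-5, 5)
    return xs, ys, query_x, amp * np.sin(query_x + phase)
```

Under this sketch, high `stability` makes pooled tasks recur across episodes (so encoding them in weights pays off), while low `cue_reliability` corrupts the context (so attending to it pays off less).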

  1. Environmental Stability and IWL Dominance: The study shows that high environmental stability decisively favors IWL, with a sharp transition at maximal stability. This is akin to genetic encoding in stable biological environments, where fixed traits are advantageous because conditions are predictable.

  2. Cue Reliability and ICL Efficacy: Conversely, high cue reliability enhances ICL effectiveness, especially when environmental stability is low. This mirrors phenotypic plasticity's dependence on reliable environmental signals for adaptive responses.

  3. Dynamic Learning Trajectories: The paper examines the temporal dynamics of ICL and IWL preference. In some settings (e.g., classification with many classes), a canonical ICL-to-IWL shift occurs, reminiscent of genetic assimilation, where plastic responses eventually become fixed. In other settings, such as those with easier IWL (fewer classes) or slower ICL acquisition (regression), the trajectory reverses: an initial IWL phase later yields to ICL dominance.

  4. Relative-Cost Hypothesis: The authors propose that the relative cost of acquiring ICL versus IWL solutions largely dictates which mode dominates over training: whichever strategy is cheaper to acquire in a given setting emerges first, with the costlier one developing as training progresses, echoing how biological systems often settle on a workable strategy before refining it.
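A common way to track the ICL/IWL balance during training is to compare a model's performance with informative context against its performance when the context is made uninformative. The diagnostic below is our own construction for illustration, not the paper's exact metric: accuracy under scrambled context labels approximates weight-based (IWL) competence, and the gain from the true context approximates context-based (ICL) competence.

```python
import numpy as np

def icl_iwl_scores(predict, episodes, rng=None):
    """Hypothetical diagnostic (our construction, not the paper's metric).

    predict(ctx_x, ctx_y, query_x) -> predicted label.
    episodes: iterable of (ctx_x, ctx_y, query_x, query_y).

    IWL score = accuracy with scrambled context (model must rely on weights);
    ICL score = accuracy gain from the true context over scrambled context.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    with_ctx, scrambled_ctx = [], []
    for ctx_x, ctx_y, qx, qy in episodes:
        with_ctx.append(predict(ctx_x, ctx_y, qx) == qy)
        shuffled = rng.permutation(ctx_y)  # destroy the cue/label mapping
        scrambled_ctx.append(predict(ctx_x, shuffled, qx) == qy)
    iwl = float(np.mean(scrambled_ctx))
    icl = float(np.mean(with_ctx)) - iwl
    return icl, iwl
```

Evaluated at checkpoints over training, a rising IWL score alongside a shrinking ICL gap would trace the canonical ICL-to-IWL transition; the opposite trend would trace the reversed trajectory described above.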

Implications

These findings have significant implications for both practical AI applications and the theoretical understanding of learning dynamics in neural networks. The insights can guide the development of training methodologies, optimizing the balance between flexibility and stability in Transformers.

  • Practical Applications: By adjusting task predictability, developers can steer Transformers towards desired flexibility or stability, optimizing performance for specific applications like language modeling, where context adaptation vs. weight encoding could drastically affect output quality and efficiency.

  • Theoretical Insights: The evolutionary framework provides a compelling lens through which to understand and predict learning dynamics in artificial systems, suggesting that computational cost and environmental predictability could serve as fundamental principles shaping model behavior.
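If predictability can steer the ICL/IWL balance, one practical lever is a training curriculum over environmental stability. The schedule below is purely a sketch of that idea; the function name, the linear ramp, and the endpoint values are our assumptions, not a method from the paper.

```python
def stability_schedule(step, total_steps, start=0.2, end=0.9):
    """Hypothetical curriculum: ramp environmental stability upward over
    training to shift a model from context reliance (ICL) toward weight
    encoding (IWL). Linear interpolation; endpoints are illustrative."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)
```

Reversing the endpoints would instead bias training toward preserving in-context flexibility late in training.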

Future Work

Future research could extend these findings by exploring more complex real-world tasks and diversifying the types of predictability manipulated. Further investigation into how these principles apply across different Transformer architectures and configurations could yield broader applications and deeper theoretical insights. Additionally, exploring the implications of the relative-cost hypothesis in different computational paradigms may enrich understanding of strategy shifts.

In summary, the paper provides a robust analysis linking evolutionary biology and Transformer learning modes, offering valuable insight into how predictability governs adaptive strategies in AI systems.
