
TRACE for Tracking the Emergence of Semantic Representations in Transformers

Published 23 May 2025 in cs.CL (arXiv:2505.17998v1)

Abstract: Modern transformer models exhibit phase transitions during training, distinct shifts from memorisation to abstraction, but the mechanisms underlying these transitions remain poorly understood. Prior work has often focused on endpoint representations or isolated signals like curvature or mutual information, typically in symbolic or arithmetic domains, overlooking the emergence of linguistic structure. We introduce TRACE (Tracking Representation Abstraction and Compositional Emergence), a diagnostic framework combining geometric, informational, and linguistic signals to detect phase transitions in Transformer-based LMs. TRACE leverages a frame-semantic data generation method, ABSynth, that produces annotated synthetic corpora with controllable complexity, lexical distributions, and structural entropy, while being fully annotated with linguistic categories, enabling precise analysis of abstraction emergence. Experiments reveal that (i) phase transitions align with clear intersections between curvature collapse and dimension stabilisation; (ii) these geometric shifts coincide with emerging syntactic and semantic accuracy; (iii) abstraction patterns persist across architectural variants, with components like feedforward networks affecting optimisation stability rather than fundamentally altering trajectories. This work advances our understanding of how linguistic abstractions emerge in LMs, offering insights into model interpretability, training efficiency, and compositional generalisation that could inform more principled approaches to LM development.

Summary

  • The paper introduces TRACE, a framework that detects phase transitions where transformers shift from memorization to forming abstract semantic representations.
  • It employs geometric measures, like intrinsic dimensionality and curvature, alongside linguistic probes to analyze model training dynamics.
  • Findings reveal that transformer components, such as feed-forward networks and attention heads, play critical roles in stabilizing abstraction and syntactic alignment.

Understanding TRACE: A Framework for Semantic Representation Tracking

Introduction to TRACE

The paper "TRACE for Tracking the Emergence of Semantic Representations in Transformers" introduces TRACE, a diagnostic framework designed to analyze the emergence of abstraction in transformer models. This approach combines geometric, informational, and linguistic signals to detect phase transitions, which are crucial reorganization points in model training where transformers shift from memorizing input data to forming abstract representations. The study presents ABSynth, a novel synthetic corpus generation framework based on frame semantics, facilitating precise examination of how linguistic abstractions arise within transformer models.

Phase Transitions in Transformers

TRACE identifies a characteristic pattern of phase transitions during training. These transitions are marked by clear geometric changes, including a rise followed by stabilization in intrinsic dimensionality and spikes in loss curvature. Such shifts are synchronized with improvements in syntactic and semantic accuracy. Key observations include:

  • Dimensionality and Curvature Dynamics: As models train, intrinsic dimensionality initially increases as they accommodate entangled features, then stabilizes or decreases as abstraction phases emerge. Curvature dynamics show transient spikes, indicating phases of structural reorganization before the model settles into efficient generalization patterns (Figure 1).


Figure 1: Coordinated dynamics of Hessian Curvature Score (blue) and Average Intrinsic Dimension (red) across training steps for different model architectures.
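The intrinsic-dimensionality signal tracked above can be estimated directly from a layer's hidden states. The sketch below uses the TwoNN maximum-likelihood estimator (based on ratios of each point's two nearest-neighbor distances); it is a minimal illustration of this class of estimator, not necessarily the exact estimator used in the paper.

```python
import numpy as np

def two_nn_intrinsic_dimension(X):
    """Estimate the intrinsic dimension of a point cloud (e.g. hidden
    states at one layer) with the TwoNN MLE: d = N / sum(log r2/r1),
    where r1, r2 are each point's 1st and 2nd nearest-neighbor distances."""
    # pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)         # exclude self-distances
    part = np.partition(d2, 1, axis=1)   # two smallest entries per row
    r1 = np.sqrt(part[:, 0])
    r2 = np.sqrt(part[:, 1])
    return len(X) / np.sum(np.log(r2 / r1))
```

Applied to hidden states sampled at successive checkpoints, this yields the rising-then-stabilizing dimensionality curve; the curvature signal would come separately from Hessian spectra of the loss.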

Architectural Influence on Abstraction

The paper examines how architectural modifications affect abstraction dynamics. Ablations of key transformer components such as feed-forward networks and attention heads provide insights into their roles:

  • Feed-Forward Networks: Their removal results in increased curvature volatility and persistent oscillations in medium and small models, underscoring their role in smoothing optimization and supporting stable abstract representation development.
  • Attention Heads: Reducing attention heads impacts models differently; smaller models experience delayed phase transitions and lower representational complexity, whereas larger models maintain abstraction capacity with minor stability trade-offs.
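The two ablations above can be expressed as configuration switches on a transformer block. The following is a minimal pre-norm block for illustration only (the class and argument names are ours, and this is not the paper's exact architecture): `use_ffn=False` mimics the feed-forward ablation, and `n_heads` controls the attention-head ablation.

```python
import torch
import torch.nn as nn

class AblatableBlock(nn.Module):
    """Minimal pre-norm transformer block with ablation switches."""
    def __init__(self, d_model=64, n_heads=4, use_ffn=True):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.use_ffn = use_ffn
        if use_ffn:
            self.ln2 = nn.LayerNorm(d_model)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # attention sublayer
        if self.use_ffn:
            x = x + self.ffn(self.ln2(x))                  # optional FFN sublayer
        return x
```

Training otherwise-identical models while toggling these switches, and recording the curvature and dimensionality signals at each checkpoint, reproduces the style of comparison behind the findings above.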

Emerging Linguistic Patterns

Linguistic alignment with geometric shifts reveals how semantic and syntactic categories emerge:

  • Probe Analysis: Probes applied to hidden states demonstrate evolving semantic role and part-of-speech tag alignments. The transition from memorization to abstraction is highlighted through layer-specific confidence score fluctuations (Figure 2).

    Figure 2: Probe confidence scores across training steps for the large model, showing the alignment of linguistic tags.
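A probe of this kind is just a simple supervised classifier fit on frozen hidden states. As a minimal sketch (a least-squares linear probe with illustrative names, standing in for whatever probing classifier the paper uses), one can measure how linearly decodable a linguistic category is at a given checkpoint:

```python
import numpy as np

def linear_probe_accuracy(H, y, train_frac=0.8, seed=0):
    """Fit a least-squares linear probe mapping hidden states H (N x D)
    to integer labels y (e.g. POS tags or semantic roles) and return
    held-out accuracy. Rising accuracy across checkpoints signals that
    the category is becoming linearly decodable."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(H))
    n_train = int(train_frac * len(H))
    tr, te = idx[:n_train], idx[n_train:]
    n_cls = int(y.max()) + 1
    Y = np.eye(n_cls)[y[tr]]                    # one-hot targets
    Hb = np.hstack([H, np.ones((len(H), 1))])   # append bias column
    W, *_ = np.linalg.lstsq(Hb[tr], Y, rcond=None)
    return float(np.mean((Hb[te] @ W).argmax(1) == y[te]))
```

Tracking this accuracy per layer and per checkpoint produces curves like Figure 2, whose inflection points can then be compared against the geometric signals.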

Synthetic Corpus and Mutual Information Challenges

ABSynth provides the synthetic corpus used to avoid natural data confounds, enabling the precise tracking of linguistic abstractions via controlled complexity and transparent annotation.

  • Mutual Information Instability: Although theoretically insightful, mutual information failed to consistently mark phase transitions due to its volatility and lack of alignment with other measurable signals (Figure 3).

    Figure 3: Overview of the TRACE framework integrating various monitoring aspects, including intrinsic dimensionality, spectral curvature, and linguistic alignment.
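Part of that instability is visible even in the simplest estimators. A common baseline is histogram binning, sketched below (illustrative only; the paper may use a different estimator), which carries a positive bias that grows with the number of bins and shrinks slowly with sample size, making checkpoint-to-checkpoint comparisons noisy:

```python
import numpy as np

def binned_mutual_information(x, y, bins=16):
    """Histogram-based mutual information estimate in nats.
    Binned estimators are sensitive to bin count and sample size,
    one source of the volatility noted for MI-based signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()                  # joint distribution
    px = p.sum(axis=1, keepdims=True)        # marginal over x
    py = p.sum(axis=0, keepdims=True)        # marginal over y
    mask = p > 0                             # avoid log(0)
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))
```

Even for independent variables this estimator returns a small positive value, so thresholding raw MI across training steps can flag spurious transitions.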

Conclusion and Implications

TRACE advances our understanding of how abstraction emerges in language models, offering a principled approach to diagnosing phase transitions. The insights gained can inform the design of more interpretable, efficient, and generalizable models, potentially reshaping LM development strategies. Future work may extend TRACE to real-world data and integrate it with mechanistic interpretability tools.

The presented framework provides a robust basis for future analysis of language models, particularly in contexts requiring detailed interpretability and phase-transition detection for improved training efficiency.
