
Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Published 27 Sep 2025 in cs.LG, cs.AI, and cs.CL | (2509.23024v1)

Abstract: Standard training metrics like loss fail to explain the emergence of complex capabilities in LLMs. We take a spectral approach to investigate the geometry of learned representations across pretraining and post-training, measuring effective rank (RankMe) and eigenspectrum decay ($\alpha$-ReQ). With OLMo (1B-7B) and Pythia (160M-12B) models, we uncover a consistent non-monotonic sequence of three geometric phases during autoregressive pretraining. The initial "warmup" phase exhibits rapid representational collapse. This is followed by an "entropy-seeking" phase, where the manifold's dimensionality expands substantially, coinciding with peak n-gram memorization. Subsequently, a "compression-seeking" phase imposes anisotropic consolidation, selectively preserving variance along dominant eigendirections while contracting others, a transition marked with significant improvement in downstream task performance. We show these phases can emerge from a fundamental interplay of cross-entropy optimization under skewed token frequencies and representational bottlenecks ($d \ll |V|$). Post-training further transforms geometry: SFT and DPO drive "entropy-seeking" dynamics to integrate specific instructional or preferential data, improving in-distribution performance while degrading out-of-distribution robustness. Conversely, RLVR induces "compression-seeking", enhancing reward alignment but reducing generation diversity.

Summary

  • The paper identifies three universal geometric phases (warmup, entropy-seeking, and compression-seeking) that correlate with evolving LLM capabilities.
  • It leverages spectral metrics, including RankMe and eigenspectrum decay, to quantify non-monotonic changes in representation geometry during pretraining and post-training.
  • The study shows that post-training techniques mirror pretraining dynamics, influencing model memorization, generalization, and alignment.

Spectral Phases in the Geometric Evolution of LLM Representations

Introduction

This paper presents a comprehensive spectral analysis of the geometric evolution of representations in LLMs throughout both pretraining and post-training. By leveraging two spectral metrics, effective rank (RankMe) and eigenspectrum decay (α-ReQ), the authors reveal a consistent, non-monotonic sequence of three universal geometric phases in LLM training: warmup, entropy-seeking, and compression-seeking. These phases are robust across model families (OLMo, Pythia, Tülu), scales (160M–12B parameters), and layers, and are tightly linked to the emergence of distinct model capabilities, including memorization and generalization. The work further demonstrates that post-training strategies (SFT, DPO, RLVR) induce mirrored geometric transformations, with practical implications for model alignment and exploration (Figure 1).

Figure 1: Spectral framework reveals three universal phases in LLM training, characterized by distinct changes in representation geometry.

Spectral Metrics and Geometric Analysis

The analysis centers on the covariance matrix of last-token representations in autoregressive LLMs. Two complementary spectral metrics are employed:

  • Effective Rank (RankMe): Derived from the von Neumann entropy of the covariance matrix, it quantifies the utilized dimensionality of the representation manifold.
  • Eigenspectrum Decay (α): Measures the concentration of variance along principal axes, with slower decay indicating higher-dimensional, more isotropic representations.

These metrics provide a quantitative lens for tracking the expressive capacity and compression of LLM representations, moving beyond traditional loss curves, which fail to capture qualitative shifts in model behavior.
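The two metrics can be sketched in a few lines of NumPy. This is a minimal, hypothetical re-implementation of the standard RankMe and power-law-fit definitions, not the authors' code; the sample size and the fit range are illustrative choices:

```python
import numpy as np

def rankme(X, eps=1e-12):
    """Effective rank of a representation matrix X (n_samples x d):
    the exponentiated entropy of the normalized singular-value
    distribution of the centered features."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

def alpha_decay(X, k_min=10):
    """Power-law exponent alpha of the covariance eigenspectrum,
    lambda_i ~ i^(-alpha), via a log-log least-squares fit
    (skipping the first k_min eigenvalues, an illustrative choice)."""
    Xc = X - X.mean(axis=0)
    lam = np.linalg.eigvalsh(Xc.T @ Xc / len(X))[::-1]  # descending
    lam = lam[lam > 1e-12]
    i = np.arange(1, len(lam) + 1)
    slope, _ = np.polyfit(np.log(i[k_min:]), np.log(lam[k_min:]), 1)
    return float(-slope)

# Isotropic Gaussian features: effective rank near the full
# dimensionality, near-flat spectrum (small alpha).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))
print(rankme(X))       # close to 64
print(alpha_decay(X))  # small (near-flat spectrum)
```

Under this reading, entropy-seeking expansion shows up as rising RankMe and falling α, and compression-seeking consolidation as the reverse.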

Three-Phase Dynamics in Pretraining

Empirical analysis across OLMo and Pythia models reveals a non-monotonic evolution of representation geometry, consistently manifesting as three distinct phases:

  1. Warmup Phase: Rapid collapse of representations onto dominant data-manifold directions, coinciding with learning-rate ramp-up. Outputs are repetitive and non-contextual.
  2. Entropy-Seeking Phase: Manifold expansion in many directions, marked by increased RankMe and decreased α. This phase aligns with peak n-gram memorization, as measured by Spearman correlation with ∞-gram models.
  3. Compression-Seeking Phase: Anisotropic consolidation, with selective preservation of variance along dominant eigendirections and contraction of others. RankMe decreases, α increases, and long-context generalization capabilities emerge (Figure 2).

    Figure 2: Loss decreases monotonically, but representation geometry exhibits non-monotonic transitions through the warmup, entropy-seeking, and compression-seeking phases across model families and scales.

    Figure 3

    Figure 3: Layerwise evolution mirrors the three-phase pattern, confirming global geometric dynamics across network depth.

Linking Geometry to Model Capabilities

The entropy-seeking phase is associated with short-context memorization, as evidenced by increased alignment with n-gram statistics. In contrast, the compression-seeking phase correlates with the emergence of long-context generalization, as demonstrated by improved performance on factual QA (TriviaQA) and multiple-choice (SciQ) tasks (Figure 4).

Figure 4: Distinct learning phases are linked to different LLM capabilities; memorization peaks in the entropy-seeking phase, while generalization and task accuracy surge in the compression-seeking phase.

Ablation experiments show that retaining only the top eigendirections severely degrades task accuracy, indicating that full-spectrum information is essential for robust language understanding.
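The spirit of this ablation can be sketched as a rank-k projection of the features onto the leading covariance eigendirections. This is a hypothetical illustration of the idea, not the paper's experimental code:

```python
import numpy as np

def project_top_k(X, k):
    """Reconstruct each representation from only its top-k covariance
    eigendirections: a minimal sketch of the eigendirection-retention
    ablation (illustrative, not the paper's experimental code)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, V = np.linalg.eigh(Xc.T @ Xc / len(X))  # eigenvalues ascending
    top = V[:, -k:]                            # leading k eigenvectors
    return mu + Xc @ top @ top.T               # rank-k reconstruction

rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 32))
X_k = project_top_k(X, k=4)
# The centered reconstruction lives in a 4-dimensional subspace.
print(np.linalg.matrix_rank(X_k - X_k.mean(axis=0)))  # 4
```

Feeding such truncated features into a downstream probe, instead of the full-rank originals, is one way to test how much task-relevant information lives outside the dominant subspace.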

Mechanistic Insights: Optimization and Bottlenecks

Analytically tractable models reveal that the observed multiphase dynamics arise from the interplay of cross-entropy optimization, skewed token frequencies, and representational bottlenecks ($d \ll |V|$). Gradient descent exhibits primacy and selection biases, leading to initial collapse, then expansion, and finally anisotropic consolidation of representations (Figure 5).

Figure 5: Learning dynamics of cross-entropy loss replicate multiphase geometric evolution, contingent on skewed class distribution and information bottleneck.

Negative controls (uniform labels, no bottleneck, MSE loss) eliminate the compression-seeking phase, isolating necessary conditions for the observed dynamics.
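A minimal simulation in this spirit trains a linear bottleneck encoder with cross-entropy on Zipf-distributed labels and tracks the effective rank of its representations over training. This is a hypothetical sketch with illustrative hyperparameters, not the paper's exact tractable model:

```python
import numpy as np

rng = np.random.default_rng(0)
# |V| classes, bottleneck width d << |V|; all sizes are illustrative.
V, d, n_in, n, lr = 100, 8, 50, 2_000, 0.5

freq = 1.0 / np.arange(1, V + 1)   # Zipfian (skewed) label frequencies
freq /= freq.sum()
y = rng.choice(V, size=n, p=freq)
X = rng.normal(size=(n, n_in))

W = rng.normal(scale=0.01, size=(n_in, d))  # bottleneck encoder
U = rng.normal(scale=0.01, size=(d, V))     # unembedding

def effective_rank(H):
    s = np.linalg.svd(H - H.mean(0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 1e-12]
    return float(np.exp(-(p * np.log(p)).sum()))

ranks = []
for step in range(200):
    H = X @ W                                  # (n, d) representations
    logits = H @ U
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    G = (P - np.eye(V)[y]) / n    # softmax-CE gradient w.r.t. logits
    W -= lr * X.T @ (G @ U.T)
    U -= lr * H.T @ G
    ranks.append(effective_rank(X @ W))

print(round(ranks[0], 2), round(ranks[-1], 2))
```

The negative controls from the text map directly onto this sketch: replace `freq` with a uniform distribution, set `d = V` to remove the bottleneck, or swap the softmax cross-entropy gradient for an MSE one, and compare the resulting rank trajectories.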

Post-Training: Alignment and Exploration Trade-offs

Post-training strategies induce distinct geometric transformations:

  • Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO): Both drive entropy-seeking expansion, increasing RankMe and enhancing in-distribution fit, but at the cost of greater sensitivity to dataset idiosyncrasies and reduced out-of-distribution robustness.
  • Reinforcement Learning from Verifiable Rewards (RLVR): Induces compression-seeking contraction, consolidating reward-aligned behaviors and narrowing generative diversity, as evidenced by reduced pass@k performance at high k (Figure 6).

    Figure 6: Post-training induces distinct geometric transformations in model representations, with SFT/DPO expanding and RLVR contracting the representation manifold.

These mirrored spectral transformations have practical implications for model selection, checkpointing, and the design of training pipelines tailored to desired downstream outcomes.
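The pass@k numbers behind the RLVR diversity observation are conventionally computed with the unbiased combinatorial estimator of Chen et al. (2021), given n sampled generations of which c are correct. The sketch below shows the estimator itself; the paper's exact evaluation harness is not specified here:

```python
import math

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n generations (c of them correct)
    is correct, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0   # too few failures to fill k slots: success is certain
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With few correct samples per problem, pass@1 is low but pass@k at
# large k is high; collapsing generation diversity erodes this gap.
print(pass_at_k(n=256, c=8, k=1))    # 0.03125
print(pass_at_k(n=256, c=8, k=64))   # ~0.90
```

A post-RLVR model that concentrates probability mass on a narrow set of outputs can raise c on easy problems while driving c to zero on problems it previously solved only occasionally, which is exactly the degradation at high k described above.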

Implications and Future Directions

The identification of universal geometric phases provides a quantitative framework for understanding the emergence of memorization and generalization in LLMs. The necessity of full-spectrum information for downstream performance underscores the limitations of top-k proxies and motivates the use of comprehensive spectral metrics. The geometric perspective offers mechanistic explanations for phenomena such as grokking and staged learning, and informs the design of post-training interventions for alignment and exploration.

Limitations include computational constraints (analysis up to 12B parameters), reliance on linearized theoretical models, and a focus on English-language LLMs. Future work should extend these findings to larger scales, multilingual settings, and more complex architectures, and establish causal links between geometric dynamics and emergent capabilities.

Conclusion

This work demonstrates that LLMs undergo non-monotonic, multiphasic changes in representation geometry during both pretraining and post-training, often masked by monotonically decreasing loss. Spectral metrics (RankMe, α-ReQ) delineate three universal phases (warmup, entropy-seeking, and compression-seeking), each linked to distinct model capabilities. Post-training strategies induce mirrored geometric transformations, with practical trade-offs between alignment and exploration. These insights provide a principled foundation for guiding future LLM development, checkpoint selection, and training-strategy design.


Knowledge Gaps

Below is a single, concrete list of what remains missing, uncertain, or unexplored, framed to be actionable for future research.

  • Measurement scope: The analysis focuses on last-token, last-layer representations; the extent to which phase dynamics hold for mid-sequence tokens, multi-token aggregates, and alternative readouts (e.g., residual streams, attention outputs, logits) is not established.
  • Phase boundary detection: The paper qualitatively names Gray, Maroon, and BlueViolet phases, but lacks a reproducible, quantitative criterion (e.g., thresholding rules on RankMe/α, change-point detection) for automatically segmenting training into phases across runs.
  • Architectural dependence: How phase emergence and spectral metrics vary with architecture (depth/width scaling, attention head count, rotary vs absolute position encodings, normalization schemes, activation functions, mixture-of-experts, encoder-decoder vs decoder-only) is not systematically studied.
  • Optimizer and hyperparameters: The role of optimizer choice (AdamW vs SGD variants), weight decay, gradient clipping, batch size, learning-rate schedules (including constant LR and different warmups), dropout, and tokenization in shaping the phases is not disentangled.
  • Data distribution effects: Universality of phases across pretraining corpora (FineWeb vs Pile vs other mixtures), document deduplication, domain balance, multilingual settings, and curriculum/data-ordering is not tested; confounds due to skewed token frequencies are asserted but not exhaustively validated.
  • Scale limits: Results are capped at ~12B parameters; whether the same phase dynamics persist, change, or bifurcate for frontier-scale models (≥70B, ≥100B) remains unanswered.
  • Layerwise generality: While some layerwise results are shown, a systematic mapping of phase timing and magnitude across all layers and modules (MLP vs attention, early vs late layers, heads) is missing.
  • Alternative geometry measures: The study uses RankMe and power-law decay α; robustness to alternative similarity/geometry metrics (e.g., CKA, singular value distribution stability, participation ratio, mutual information, Fisher information, NTK spectra) is unexplored.
  • Covariance estimation: The effective-rank estimates rely on ~10k samples and quadratic scaling with hidden dimension; statistical efficiency, sample size sensitivity, subsampling strategies, and bias/variance trade-offs of spectral estimation are not quantified.
  • Centering and preprocessing: Choices around centering, whitening, sequence-length normalization, and feature preprocessing for covariance estimation are not ablated; their impact on RankMe/α trajectories is unclear.
  • Task coverage: Links between phases and capabilities are shown for SciQ and TriviaQA; generality to broader task families (reasoning, code, math beyond AMC-23, long-context retrieval, multilingual QA, safety/factuality) is not demonstrated.
  • Long-context rigor: Claims about BlueViolet aiding long-range dependencies lack controlled long-context benchmarks (e.g., needle-in-a-haystack, book-level coherence, retrieval with varying context windows) and ablations on context length.
  • Memorization metric validity: The infinity-gram alignment (Spearman correlation) primarily captures short/mid-context statistics; complementary memorization indicators (e.g., exact regurgitation rates, near-duplicate generation, suffix/prefix leakage) and their relation to phases remain unmeasured.
  • Causality vs correlation: The paper’s phase-capability relationships are correlational; causal tests (e.g., controlled interventions that manipulate spectral geometry to observe capability shifts) are not performed.
  • Toy-model realism: The mechanistic explanation uses a linear feature extractor and classifier; extension to non-linear transformers (attention, residual connections) and formal conditions guaranteeing phase transitions in such models is missing.
  • Bottleneck condition: The theoretical bottleneck assumption ($d \ll |V|$) is asserted as necessary for the compression-seeking phase; quantitative tests varying d (width scaling) in real LLMs to confirm necessity and sufficiency are absent.
  • Eigenvector reuse and anisotropy: Direct empirical evidence in LLMs for “selection bias” (Δσ_i ∝ σ_i) and eigenvector alignment over training is limited; tracking eigenvectors over time and verifying rotation/reuse dynamics would strengthen the mechanism.
  • Full-spectrum necessity: Eigenvector ablations show performance depends on the full spectrum, but do not test targeted removal strategies, layer-specific spectra, or interactions between subspaces (e.g., top-k vs mid-spectrum vs tail) across diverse tasks.
  • Phase cycling: Observations suggest possible repeated Maroon/BlueViolet cycles with extended pretraining; conditions under which cycles repeat, dampen, or change character are not characterized.
  • Post-training generality: The conclusion that SFT/DPO induce entropy-seeking dynamics and RLVR induces compression-seeking dynamics is drawn from Tülu-3.1 and OLMo-2-1B; generality across other post-training recipes (PAIR, RLAIF, iterative DPO, multi-objective RL), datasets, and base models is unknown.
  • RLVR diversity vs reward alignment: Declines in pass@256 after RLVR imply reduced exploration, but the specific mechanisms (entropy regularization, policy collapse, reward shaping) and trade-offs between diversity and correctness are not dissected; dependence on sampling parameters (temperature, top-p) is not ablated.
  • In-distribution vs OOD robustness: The observed ID/OOD trade-off under SFT is shown for AH vs AF; systematic OOD evaluations (distribution shifts in style, topic, instruction format, difficulty) and how geometry mediates robustness are not provided.
  • Safety and bias: How geometry phases affect safety (toxicity, jailbreak robustness), bias, and hallucinations is not addressed; whether compression-seeking consolidation aids or harms safety remains an open question.
  • Evaluation judges and win-rates: The AlpacaEval win-rate interpretation may be confounded by judge biases; cross-judge validation and consistency checks (e.g., different LLM judges, human evaluation) are not presented.
  • Training interventions: Concrete recipes to steer geometry (e.g., schedule designs to delay/advance BlueViolet, spectral regularizers, representation bottleneck tuning, controlled noise injection) and their downstream payoff remain to be developed and validated.
  • Generalization bounds: The link between RankMe/α and generalization is motivated by prior theory, but explicit predictive models (e.g., mapping spectral metrics to expected task accuracy or robustness bounds) are not instantiated.
  • Multilingual and modality extension: Whether similar phases occur in multilingual LLMs and multimodal transformers (text–image, text–code) is untested; cross-lingual and cross-modal geometry comparisons are missing.
  • Tokenization effects: Influence of tokenizer vocabulary, BPE merges, and subword segmentation on token frequency skew and phase dynamics is not explored.
  • Reproducibility across seeds: Sensitivity of phase detection and spectral trajectories to random seeds, data-order seeds, and initialization schemes is not reported.
  • Practical compute cost: The feasibility of tracking geometry online during large-scale training (compute/memory overhead, approximate estimators) and its utility for checkpoint selection or early stopping are not evaluated.
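As one concrete step toward the phase-boundary gap listed above, a RankMe trajectory could be segmented with a simple change-point criterion. The sketch below is hypothetical: a brute-force, piecewise-constant fit that picks two boundaries by minimizing within-segment variance:

```python
import numpy as np

def two_changepoints(y):
    """Split a 1-D trajectory into three segments by minimizing total
    within-segment squared deviation (brute-force over all boundary
    pairs; a minimal, hypothetical phase-segmentation criterion)."""
    y = np.asarray(y, dtype=float)
    T = len(y)

    def sse(a, b):
        seg = y[a:b]
        return ((seg - seg.mean()) ** 2).sum()

    best, cost = None, np.inf
    for i in range(2, T - 3):          # each segment has >= 2 points
        for j in range(i + 2, T - 1):
            c = sse(0, i) + sse(i, j) + sse(j, T)
            if c < cost:
                best, cost = (i, j), c
    return best

# Synthetic RankMe-like curve: collapse, expansion, consolidation.
y = np.concatenate([np.full(20, 30.0), np.full(20, 120.0), np.full(20, 70.0)])
print(two_changepoints(y))  # (20, 40)
```

Applied to real RankMe (or α) checkpoint trajectories, such a rule would make phase boundaries reproducible across runs, at the cost of committing to a fixed number of segments.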
