
Log-Linearity in LLMs

Updated 7 February 2026
  • Log-linearity in LLMs is the nearly linear evolution of model weights and logits, enabling efficient extrapolation and precise behavioral control.
  • Extrapolation techniques, including weight and logits prediction, allow skipping RL steps, achieving up to a 6.1× training speedup.
  • Logit-Linear Selection leverages subtle logit shifts to fine-tune models for consistent behavior, raising implications for model security and dataset auditing.

Log-linearity in LLMs refers to the empirical and theoretical observation that both the internal parameters (weights) and output statistics (log-probabilities or logits) of an LLM evolve in a strongly linear fashion under a variety of training protocols, particularly reinforcement learning with verifiable rewards (RLVR) and preference-based fine-tuning. This linearity manifests not only in the time evolution of model weights but also in the geometric structure of context-dependent log-probabilities, enabling novel algorithms for model extrapolation, subset selection, and behavioral control. Log-linearity has important implications for model efficiency, interpretability, dataset auditing, and both intentional and unintentional behavioral manipulation of LLMs (Wang et al., 8 Jan 2026, Aden-Ali et al., 4 Feb 2026).

1. Mathematical Characterization of Log-Linearity

Log-linearity in LLMs can be formalized in two principal senses: trajectory linearity during training and low-rank logit structure across contexts.

Timewise Linearity in RLVR Training

Let $y_i(t)$ denote a model quantity at training step $t$, for example a single model weight or token log-probability. Linearity is measured by fitting a regression $y_i(t) = a_i t + b_i$ and computing the coefficient of determination:

$$R^2_i = 1 - \frac{\sum_k \left(y_i(t_k) - \hat{y}_i(t_k)\right)^2}{\sum_k \left(y_i(t_k) - \bar{y}_i\right)^2}$$

with $\hat{y}_i(t) = a_i t + b_i$ and $\bar{y}_i$ the mean. For centered variables, $R^2_i$ coincides with the squared Pearson correlation between $t$ and $y_i$.

Empirically, over 80% of sampled weights and logits across LLMs exhibit $R^2 > 0.7$, with distributions tightly concentrated around 0.9, indicating nearly perfect linear evolution with training steps (Wang et al., 8 Jan 2026).
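The $R^2$ diagnostic above can be sketched in a few lines of NumPy; this is an illustrative check on a synthetic, nearly linear trajectory (the slope, intercept, and noise term are placeholders, not values from the papers):

```python
import numpy as np

def linearity_r2(t, y):
    """Fit y(t) = a*t + b by least squares and return the coefficient of determination."""
    a, b = np.polyfit(t, y, deg=1)          # slope a_i, intercept b_i
    y_hat = a * t + b
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# A nearly linear trajectory with a small nonlinear perturbation scores close to 1.
t = np.arange(0, 1000, 50, dtype=float)
y = 0.003 * t - 1.2 + 0.01 * np.sin(t / 100.0)
r2 = linearity_r2(t, y)
```

Applied per weight or per token log-probability across checkpoints, this yields the $R^2$ distributions reported in the experiments.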

Log-Linear Context Representation

For a model $M$, a context $(s, p, r)$, composed of system prompt $s$, user prompt $p$, and response $r$, admits an $\epsilon$-approximately log-linear representation if vector embeddings $\psi_M(s) \in \mathbb{R}^d$ and $\phi(p, r) \in \mathbb{R}^d$ exist such that

$$\left|\log \Pr_M[r \mid s, p] - \langle \psi_M(s), \phi(p, r)\rangle\right| \leq \epsilon,$$

uniformly over $(s, p, r)$ (Aden-Ali et al., 4 Feb 2026). In matrix notation, the output log-probability matrix $X_M$ satisfies $X_M \approx \Psi \Phi^\top$, i.e., it is approximately low-rank.

Concatenations of system prompts correspond to linear addition of their embeddings in predictor space:

$$\log \Pr_M[r \mid s_1 \| s_2, p] \approx \langle \psi_M(s_1) + \psi_M(s_2), \phi(p, r)\rangle.$$

2. Extrapolation Algorithms Leveraging Log-Linearity

The strict temporal linearity of weights and logits in RLVR post-training enables efficient step-skipping and lookahead through extrapolation:

Logits Extrapolation

Given output logits $l_{t_0}$, $l_{t_1}$ at steps $t_0 < t_1$, the logits at a future step $t'$ are estimated as

$$l_{t'} = l_{t_0} + \alpha \left(l_{t_1} - l_{t_0}\right), \quad \alpha = \frac{t' - t_0}{t_1 - t_0}.$$

Sampling from $\operatorname{softmax}(l_{t'})$ at inference time yields the desired extrapolated behavior.
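A minimal sketch of logits extrapolation, with toy three-token logit vectors standing in for real per-step checkpoint outputs:

```python
import numpy as np

def extrapolate_logits(l_t0, l_t1, t0, t1, t_prime):
    """Linearly extrapolate logits from checkpoints at steps t0 < t1 to step t_prime."""
    alpha = (t_prime - t0) / (t1 - t0)
    return l_t0 + alpha * (l_t1 - l_t0)

def softmax(l):
    z = l - l.max()          # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical next-token logits at steps 0 and 300, extrapolated to step 1200 (alpha = 4).
l0 = np.array([1.0, 0.5, -0.2])
l1 = np.array([1.6, 0.2, -0.5])
l_future = extrapolate_logits(l0, l1, t0=0, t1=300, t_prime=1200)
probs = softmax(l_future)    # sampling distribution for the extrapolated model
```

In practice the two logit vectors come from evaluating both saved checkpoints on the same context at decode time.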

Weight Extrapolation

Let $W_{t_0}, W_{t_1}$ be full model parameter tensors at steps $t_0, t_1$. Extrapolated parameters are computed via

$$W_{t'} = W_{t_0} + \beta \left(W_{t_1} - W_{t_0}\right), \quad \beta = \frac{t' - t_0}{t_1 - t_0}.$$

The resulting model $W_{t'}$ can be directly evaluated or used as an initial checkpoint for further fine-tuning (Wang et al., 8 Jan 2026).
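Applied to a checkpoint's state dict, the same linear rule runs tensor-by-tensor; a sketch with a toy two-tensor state dict (the tensor names and values are illustrative):

```python
import numpy as np

def extrapolate_weights(w_t0, w_t1, t0, t1, t_prime):
    """Apply W_{t'} = W_{t0} + beta * (W_{t1} - W_{t0}) to every tensor in a state dict."""
    beta = (t_prime - t0) / (t1 - t0)
    return {name: w_t0[name] + beta * (w_t1[name] - w_t0[name]) for name in w_t0}

# Toy state dicts standing in for full model checkpoints at steps 0 and 300.
w0 = {"layer.weight": np.zeros((2, 2)), "layer.bias": np.array([1.0, 1.0])}
w1 = {"layer.weight": np.full((2, 2), 0.3), "layer.bias": np.array([1.1, 0.9])}
w_900 = extrapolate_weights(w0, w1, t0=0, t1=300, t_prime=900)  # beta = 3
```

The extrapolated dict can then be loaded back into the model for evaluation or as a warm start.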

RL-Extra (Interleaved Extrapolation)

RL-Extra interleaves $m$ real RL steps with $n$ weight-extrapolation steps in cycles of length $C = m + n$: step $k$ is a gradient-descent update if $(k \bmod C) < m$, and an extrapolation update otherwise. This augments RLVR compute efficiency while mitigating long-range drift.
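The interleaving schedule is straightforward to express; a sketch of the step-type assignment (the step counts are the paper's $(m, n)$ setting, but the function itself is a schematic):

```python
def rl_extra_schedule(total_steps, m, n):
    """Label each step 'rl' or 'extrapolate' in repeating cycles of length C = m + n."""
    C = m + n
    return ["rl" if (k % C) < m else "extrapolate" for k in range(total_steps)]

# With m = 20 real RL steps per cycle and n = 100 extrapolated steps,
# only a small fraction of steps require gradient computation.
schedule = rl_extra_schedule(total_steps=240, m=20, n=100)
```

The extrapolated steps reuse the linear weight update above, so their cost is a fraction of a full RLVR step, which is where the reported speedup comes from.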

3. Subset Selection and Behavioral Control via Log-Linearity

Log-linearity extends beyond temporal trajectories to the interaction between dataset structure and context-elicited behavior. The Logit-Linear Selection (LLS) method operationalizes this:

LLS Algorithm

Given a teacher model $M_T$, a preference dataset $D = \{(p_i, r^+_i, r^-_i)\}$, and a target system prompt $s^*$, each example is scored by the logit shift:

$$w_i = \left[\log \Pr_{M_T}\!\left(r^+_i \mid s^*, p_i\right) - \log \Pr_{M_T}\!\left(r^-_i \mid s^*, p_i\right)\right] - \left[\log \Pr_{M_T}\!\left(r^+_i \mid p_i\right) - \log \Pr_{M_T}\!\left(r^-_i \mid p_i\right)\right].$$

The top $\gamma$ quantile of $w_i$ defines a subset $\widehat{D}$. Fine-tuning a student model $M_S$ on $\widehat{D}$ (using DPO or similar) causes $M_S$, even when queried without the system prompt, to mimic the behavior induced in $M_T$ by $s^*$.

A summary of LLS steps is provided below:

| Step | Operation | Notes |
|------|-----------|-------|
| 1 | Compute $w_i$ for all examples | Requires two forward passes per example |
| 2 | Normalize (optional) | Adjust for varying token lengths |
| 3 | Filter and sort | Keep $w_i > 0$, select top $\gamma$ fraction |
| 4 | Fine-tune on subset $\widehat{D}$ | Using standard preference alignment |
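The scoring and selection steps can be sketched as follows, assuming the four log-probabilities per example have already been collected from the teacher; the toy arrays and the $\gamma = 0.5$ cutoff are illustrative, not values from the paper:

```python
import numpy as np

def lls_select(logp_with_s, logp_without_s, gamma):
    """Score each preference pair by the system-prompt-induced logit shift w_i,
    then keep the positively shifted examples in the top-gamma fraction.
    Column 0 holds log P(r+ | .), column 1 holds log P(r- | .)."""
    w = (logp_with_s[:, 0] - logp_with_s[:, 1]) - (
        logp_without_s[:, 0] - logp_without_s[:, 1]
    )
    positive = np.flatnonzero(w > 0)                    # filter: keep w_i > 0
    k = max(1, int(gamma * len(w)))                     # top-gamma fraction
    top = positive[np.argsort(w[positive])[::-1]][:k]   # sort descending, truncate
    return w, top

# Toy scores for four (p, r+, r-) examples, with and without the target prompt s*.
logp_with_s = np.array([[-1.0, -3.0], [-2.0, -2.0], [-1.5, -1.0], [-0.5, -4.0]])
logp_without_s = np.array([[-1.5, -2.0], [-2.0, -2.0], [-1.0, -1.5], [-2.0, -2.5]])
w, selected = lls_select(logp_with_s, logp_without_s, gamma=0.5)
```

The returned indices define $\widehat{D}$; the student is then fine-tuned on those examples with a standard preference-alignment objective such as DPO.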

4. Experimental Verification and Quantitative Results

Linearity and its algorithmic consequences are validated across diverse LLMs, datasets, and tasks.

Trajectory Linearity in RLVR

  • For DeepSeek-R1-Distilled-Qwen-1.5B on four RLVR benchmarks (AIME '24, AIME '25, MATH-500, LiveCodeBench), $R^2$ for both weights and logits is sharply concentrated at $\sim 0.9$, with $>80\%$ of sampled tokens exhibiting $R^2 > 0.7$ (Wang et al., 8 Jan 2026).
  • Median $R^2 > 0.8$ generalizes across model sizes (1.5B–8B), architectures (Qwen, Llama), and RL algorithms (GRPO, GSPO, Reinforce++).

Efficiency Gains from Extrapolation

  • Weight extrapolation, projecting from $t_0 = 0$, $t_1 = 300$ to $t' = 900$, matches the baseline RL performance on AIME '24 (Avg@64 $\approx 0.36$) at a fraction of the compute.
  • Logits extrapolation, e.g. from steps $0 \to 300$ to $1200$, yields an absolute $\sim 3\%$ gain over direct RL at $1200$ steps.
  • RL-Extra with $(m = 20, n = 100)$ achieves a $6.1\times$ speedup with no loss in final accuracy.

Logit-Linear Selection Behavioral Experiments

  • Animal preference induction: models fine-tuned via LLS with the target prompt “You really love [animal]s” mention the [animal] at rates comparable to system-prompted baselines, despite zero [animal] mentions in the fine-tuning subset.
  • Instruction-following and translation: LLS fine-tuning for translation elicits pure target-language outputs (e.g., Spanish, Chinese), with no source-language leakage in $\widehat{D}$.
  • Persona shift: LLS using “You are an evil ruler…” induces “evil” completions at rates of 60–80%, equivalent to or exceeding system-prompted reference models. Random subsets confer no effect (Aden-Ali et al., 4 Feb 2026).

5. Implications for Model Understanding, Dataset Design, and Security

The emergence of log-linearity as a fundamental regularity in LLMs substantially alters the landscape of post-training, behavioral auditing, and safety.

  • In RLVR, most training steps merely amplify trajectories set early, with minimal qualitative novelty beyond initial trends.
  • Linear extrapolation—either in logits or weights—can reliably “skip” hundreds of RLVR steps, or extend to steps where direct RLVR becomes unstable due to entropy collapse.
  • LLS demonstrates that coherent behaviors (preferences, personas, instruction compliance) can be implanted through carefully chosen subsets lacking obvious semantic correlates—posing both opportunities (e.g., efficient watermarking, fine-grained control) and risks (e.g., stealth backdoors, dataset poisoning).
  • Standard heuristics like keyword removal may fail, as ensemble linear effects from innocuous-appearing records accumulate in model logits.

A plausible implication is the need for new linear-algebraic dataset auditing tools to detect and attribute subliminal logit-space behaviors, as well as the development of defenses such as null-space regularization or adversarial prompt probing targeting undesired linear shifts (Aden-Ali et al., 4 Feb 2026).

6. Universality and Theoretical Perspective

Log-linearity appears to be a universal phenomenon across architectures (Qwen, Llama), model sizes (1.5B–8B), and post-training RL algorithms, particularly under low-noise optimizers such as AdamW with small learning rates and large batch sizes. Both mechanistic interpretability and the literature on spurious correlation in deep models intersect with the observed low-rank, log-linear structure: small, distributed correlations in high-dimensional data can produce macroscopic behavioral changes under gradient-based fine-tuning. Exploiting log-linearity is poised to become standard practice in efficient, robust LLM development and curation (Wang et al., 8 Jan 2026, Aden-Ali et al., 4 Feb 2026).
