
P-ALIGN: Adaptive Prefix Distillation

Updated 22 January 2026
  • The paper introduces P-ALIGN, a framework that adaptively selects minimal reasoning prefixes to distill teacher LLM capabilities into smaller student models.
  • It employs a binary search mechanism with in-model sufficiency checks to identify optimal prefix lengths, reducing redundant and unlearnable content.
  • Experiments show that P-ALIGN boosts Pass@3 performance by up to 3.7% on math benchmarks, outperforming fixed-length truncation methods.

Prefix-ALIGNment Distillation (P-ALIGN) is a framework for distilling the mathematical reasoning abilities of LLMs into smaller student models by adaptively leveraging teacher-generated chain-of-thought (CoT) trajectories. P-ALIGN introduces an adaptive mechanism for prefix selection, aiming to maximize the informativeness and learnability of supervision signals, while mitigating the detrimental effect of lengthy, complex, or redundant reasoning paths often produced by high-capacity teacher models (Liu et al., 15 Jan 2026).

1. Motivation and Problem Setting

The main challenge addressed by P-ALIGN is the capacity mismatch between large teacher LLMs, which produce long and structurally complex CoTs, and smaller student models, which lack the parameters to effectively emulate these detailed trajectories. In the standard supervised-fine-tuning (SFT) paradigm, the objective is to minimize the negative log-likelihood loss over all reasoning tokens:

$$J = -\sum_{i=1}^{n} \sum_{t=1}^{|R_i|} \log P\big(R_{i,t} \mid R_{i,<t}, \mathrm{InstructQA}(q_i); M_\text{student}\big)$$

where $q_i$ is a mathematical problem and $R_i$ is the full teacher-generated CoT for $q_i$. Teacher CoTs can be hundreds of tokens long and often include exploratory, repetitive, or uncertain reasoning segments. Empirically, small models struggle to learn from the full trajectory, often failing to generalize or converging slowly due to the overabundance of unlearnable content and entangled logical structures in late-stage reasoning.
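The SFT objective above is a standard next-token cross-entropy summed over the reasoning tokens. A minimal NumPy sketch (the probabilities below are illustrative, not taken from the paper):

```python
import numpy as np

def sft_nll(token_probs: np.ndarray) -> float:
    """Negative log-likelihood of one reasoning trajectory.

    token_probs[t] is the student's probability of the gold token
    R_{i,t} given the instruction and the preceding tokens R_{i,<t}.
    """
    return float(-np.sum(np.log(token_probs)))

# Illustrative per-token probabilities assigned by the student
# to the gold tokens of a single teacher CoT.
probs = np.array([0.9, 0.8, 0.5, 0.95])
loss = sft_nll(probs)  # lower is better; 0 iff every token gets prob 1
```

Summing this quantity over all training instances $i$ gives the objective $J$ above.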

2. P-ALIGN Methodology

P-ALIGN reframes long-chain reasoning distillation by partitioning the teacher trajectory $R_i = (r_{i,1}, r_{i,2}, \ldots, r_{i,m})$ into sentence-level prefixes and suffixes. The core methodological contributions are:

  • Adaptive Prefix Selection: For each question $q_i$, P-ALIGN seeks the minimal prefix length $p^*$ such that the prefix $R_i[1..p^*]$ is sufficient for solution synthesis, as judged by the student model itself.
  • Self-Judging via InstructEval: The student model is prompted with “InstructEval” to assess whether a prefix contains enough information. For each candidate prefix, sufficiency is labeled as $L = M_\text{student}(\mathrm{InstructEval}(q_i, R_i[1..p])) \in \{\text{ENOUGH}, \text{NOT\_ENOUGH}\}$.
  • Binary Search Optimization: P-ALIGN locates $p^*$ with $O(\log m)$ inference calls, substantially reducing overhead relative to linear scans.
  • Alignment Dataset Construction: Given the selected prefix $P_i = R_i[1..p^*]$, the student is prompted to generate a full reasoning continuation $\hat{y}_i = M_\text{student}(\mathrm{InstructAlign}(q_i, P_i))$. Only completions yielding the correct final answer $a_i$ are retained, forming $D_\text{align} = \{(q_i, P_i \mathbin\Vert \hat{y}_i) \mid \mathrm{Ans}(\hat{y}_i) = a_i\}$.

Fine-tuning is then performed on $D_\text{align}$ using the same token-level cross-entropy objective, with the student required to reproduce the teacher’s prefix and then generate the remainder unaided.
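The prefix search can be sketched as a standard binary search over sentence-level prefix lengths. The `is_sufficient` callable below is a hypothetical stand-in for the student's InstructEval judgment; the sketch assumes sufficiency is monotone in prefix length (once a prefix is ENOUGH, every longer prefix is too):

```python
def minimal_sufficient_prefix(sentences, is_sufficient):
    """Binary search for the smallest p* such that sentences[:p*]
    is judged ENOUGH by the student model.

    Makes O(log m) sufficiency evaluations for m sentences.
    If no prefix is sufficient, returns m (the full trajectory).
    """
    lo, hi = 1, len(sentences)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_sufficient(sentences[:mid]):
            hi = mid          # a shorter sufficient prefix may exist
        else:
            lo = mid + 1      # need more of the teacher's reasoning
    return lo  # p*

# Toy stand-in judge: prefixes of length >= 5 are "ENOUGH".
p_star = minimal_sufficient_prefix(list(range(10)), lambda p: len(p) >= 5)
```

In the full pipeline, each call to `is_sufficient` would be one student inference with the InstructEval prompt, which is where the roughly halved evaluation count relative to linear scanning comes from.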

3. Implementation Specifics

P-ALIGN is implemented with Qwen2.5-7B-Instruct and Qwen3-8B as student models, using DeepSeek-R1 as the teacher. Fine-tuning is conducted for 3 epochs with a learning rate of $5 \times 10^{-5}$ using LoRA adapters for parameter-efficient adaptation. Binary search yields an average of six sufficiency evaluations per problem (in contrast to twelve for linear scanning). No fixed prefix length or truncation ratio is used; $p^*$ is determined adaptively for each training instance.

4. Experimental Evaluation

P-ALIGN is evaluated on four mathematical reasoning benchmarks of varying complexity: AIME25, AIME24, AMC12, and MATH500. Metrics are Pass@1 and Pass@3, measuring the proportion of problems solved with the correct answer among the top-1 and top-3 completions, respectively.

Key empirical findings:

  • On Qwen2.5-7B, P-ALIGN improves Pass@3 by up to 3.7% relative to standard SFT on full teacher CoTs, with similar gains observed on Qwen3-8B.
  • P-ALIGN surpasses UPFT (fixed-length prefix truncation + SFT) by over 2% on average.
  • On MATH500, Pass@3 increases from 47.68% (SFT(Long-CoT)) to 50.60% (P-ALIGN).
  • Strong results are obtained without hyperparameter tuning of prefix lengths, underscoring the effectiveness of adaptive truncation.

Baselines

| Method | Core Approach | Pass@3 (Qwen2.5-7B) |
|---|---|---|
| SFT(Label) | Supervised fine-tuning on ground-truth labels only | <50.60% |
| SFT(Long-CoT) | SFT on full teacher reasoning chains (Eq. 2) | 47.68% |
| UPFT | 32-token fixed prefix truncation + SFT | <48.60% |
| P-ALIGN | Adaptive prefix + alignment (in-model sufficiency check) | 50.60% |

A detailed breakdown confirms P-ALIGN’s consistent superiority over baselines across benchmarks (Liu et al., 15 Jan 2026).

5. Ablation and Analysis

Ablation studies highlight the importance of adaptive prefix selection and prefix-based alignment:

  • Teacher Prefix only: 45.09% Pass@3 (Qwen2.5-7B)
  • Student-generated CoT only: 44.28%
  • Entropy-based truncation (P-ALIGN(InfoGain)): 45.33%
  • Without Adaptive Read: 46.25%
  • Without Binary Search: 49.82%
  • Full P-ALIGN: 50.60%

These results indicate that neither teacher-only prefixes nor fixed-length truncations suffice. Adaptive reading and binary-search-driven selection are essential for maximal knowledge transfer.

Prefix quality analysis shows that fixed-ratio prefixes give mixed results: longer prefixes help on difficult tasks but induce overthinking on simpler ones. P-ALIGN adaptively balances this trade-off, yielding outputs that are shorter than SFT(Long-CoT) but longer than UPFT, and rated highest for reasoning quality by an external judge (GLM-4.5). Token-entropy analyses show rising prediction uncertainty toward the CoT suffix, suggesting that later reasoning steps are less learnable and less informative for distillation.
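The token-entropy observation can be illustrated with per-position Shannon entropy of the model's next-token distribution. A NumPy sketch with synthetic distributions (the actual distributions come from the student model, not from these toy values):

```python
import numpy as np

def token_entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    p = probs[probs > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log(p)))

# Synthetic distributions mimicking the reported pattern:
# sharp (confident) early in the CoT, flat (uncertain) in the suffix.
early = np.array([0.97, 0.01, 0.01, 0.01])
late = np.array([0.40, 0.30, 0.20, 0.10])
```

Under this measure, flatter suffix distributions score higher entropy, which is the signal behind the claim that late-stage reasoning tokens carry less learnable supervision.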

6. Insights and Practical Guidelines

P-ALIGN is effective because prefixes encapsulate the “core logical skeleton” of reasoning necessary for solution construction, while suffixes often contain noisy, speculative, or redundant information. Binary-search-driven self-judging by the student ensures that prefixes are minimal yet sufficient, aligning with the student's learning capacity.

Practical recommendations include:

  • Employing an in-model InstructEval sufficiency check for adaptive per-instance prefix selection.
  • Using binary search to constrain computational overhead to $O(\log m)$ evaluations per example.
  • Aligning via SFT only on examples where the student's continuation culminates in the correct answer.
  • Adopting parameter-efficient fine-tuning methods such as LoRA.
  • Avoiding manual tuning of truncation ratios, as P-ALIGN’s adaptive mechanism obviates this need.

A plausible implication is that adaptive prefix alignment may generalize to other domains where reasoning paths are long and heterogeneous, and that in-model evaluation of prefix sufficiency is a robust mechanism for extracting learnable supervision signals (Liu et al., 15 Jan 2026).

7. Broader Significance

P-ALIGN demonstrates that strategic truncation and selection of teacher-generated reasoning prefixes facilitate more effective transfer of reasoning skills from large to small LLMs. By discarding unlearnable CoT segments and focusing on the essential logical skeleton, P-ALIGN outperforms prior approaches—both fixed-ratio truncation and full-chain SFT—across several challenging mathematical benchmarks. These insights inform both the theory and practice of reasoning transfer in LLMs and support the broader application of adaptive alignment techniques in model distillation and beyond (Liu et al., 15 Jan 2026).
