
ConPress: Learning from Contextual Pressure

Updated 8 February 2026
  • The paper demonstrates that multi-question prompts induce chain-of-thought self-compression, reducing reasoning tokens by up to 70% with minimal accuracy loss.
  • It employs a self-supervised fine-tuning method that extracts and filters compressed reasoning traces using rejection sampling and token-level cross-entropy loss.
  • The study extends to LLM safety by using contextual extraction to reduce harmful outputs, achieving improvements in compliance metrics by up to 22.9%.

ConPress (Learning from Contextual Pressure) encompasses a family of approaches that exploit emergent behaviors in LLMs and large reasoning models (LRMs) related to how answer generation is shaped by context. Notably, this includes (1) the self-compression of chain-of-thought (CoT) reasoning traces under multi-question contextual pressure (Deng et al., 1 Feb 2026), and (2) the extraction of safety-relevant context for improving compliance and reducing harmful responses in LLMs (Kim et al., 12 Dec 2025). These methods leverage context-driven pressures, either at inference or via explicit architectural or training modifications, to induce more efficient or safer model behaviors.

1. Self-Compression Under Contextual Pressure

A reproducible inference-time phenomenon termed Self-Compression arises when an LRM is presented with multiple independent, simultaneously answerable questions in a single prompt. Instead of producing a verbose CoT trace per question, the model systematically compresses its reasoning for each. Formally, for a single question $q$ with model output $\langle\text{think}\rangle\, r\, \langle/\text{think}\rangle\, o$, the reasoning length is $L^{(1)} = |r|$. For $N$-question prompts $Q = (q_1, \ldots, q_N)$, per-question trace lengths $L_i^{(N)} = |r_i^{(N)}|$ decrease with $N$, producing an empirically measured compression rate
$$\rho_i^{(N)} = 1 - \frac{L_i^{(N)}}{L_i^{(1)}}.$$
For instance, on MATH500 with Qwen3-4B, increasing $N$ from 1 to 8 reduces $L$ per question from approximately 300 to 90 tokens (∼70% compression).

This self-compression effect generalizes across model scales and reasoning benchmarks, and does not require explicit constraints or additional supervision at inference (Deng et al., 1 Feb 2026). A plausible implication is that competitive context occupancy in the prompt incentivizes the model to internalize more succinct reasoning steps per item.
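The compression rate defined above is straightforward to compute from measured trace lengths. A minimal sketch, using the illustrative MATH500 / Qwen3-4B numbers from the text (~300 tokens at N=1, ~90 at N=8):

```python
def compression_rate(single_len: int, multi_len: int) -> float:
    """rho_i^(N) = 1 - L_i^(N) / L_i^(1): fraction of reasoning tokens saved
    when question i is answered inside an N-question prompt."""
    if single_len <= 0:
        raise ValueError("single-question trace length must be positive")
    return 1.0 - multi_len / single_len

# ~300 -> ~90 tokens per question when N goes from 1 to 8
rho = compression_rate(300, 90)
print(f"{rho:.2f}")  # 0.70, i.e. ~70% compression
```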

2. Methodology of ConPress: Learning Concise Reasoning

ConPress capitalizes on self-compression by constructing multi-question prompts to induce concise outputs, which are then parsed and filtered for correctness and directly used to fine-tune the model. For a pool $\mathcal{Q}$ of single-question items, prompts are sampled as
$$\{q_1,\dots,q_N\} \sim \mathcal{Q}, \qquad P = \mathrm{Pack}(q_1,\ldots,q_N).$$
Each generated CoT trace is extracted using explicit markers (e.g., $\langle\text{think}\rangle \ldots \langle/\text{think}\rangle$), split by question, and filtered through rejection sampling, retaining only those $(q_i, r_i^{(N)}, \hat{o}_i)$ where the model’s answer $\hat{o}_i$ matches the ground truth $o_i$.
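The pack–generate–filter loop can be sketched as follows. This is a simplified illustration: the `generate` callable, the numbered-prompt format, and the exact think-tag layout are assumptions standing in for the actual model interface, not the paper's implementation.

```python
import random
import re

# Assumed output layout: one <think>trace</think>answer block per question, in order.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*?)(?=<think>|$)", re.DOTALL)

def pack(questions):
    """Pack N independent questions into one numbered multi-question prompt."""
    return "\n".join(f"Question {i+1}: {q}" for i, q in enumerate(questions))

def build_dataset(pool, answers, generate, n=3, rounds=100):
    """Rejection sampling: keep only (question, concise trace, answer) triples
    whose extracted answer matches the ground truth."""
    kept = []
    for _ in range(rounds):
        qs = random.sample(pool, n)
        output = generate(pack(qs))            # model emits one block per question
        blocks = THINK_RE.findall(output)      # [(trace, answer), ...]
        for q, (trace, ans) in zip(qs, blocks):
            if ans.strip() == answers[q]:      # correctness filter
                kept.append((q, trace.strip(), ans.strip()))
    return kept
```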

The resulting dataset $\mathcal{D}_{\mathrm{CP}}$ of concise, accurate traces is used to fine-tune the LRM with token-level cross-entropy loss:
$$\mathcal{L}(\theta) = -\sum_{(q_i,\, r_i^{(N)}) \in \mathcal{D}_{\mathrm{CP}}} \;\sum_{t=1}^{|r_i^{(N)}|} \log p_{\theta}\!\left(r_{i,t}^{(N)} \mid q_i,\; r_{i,<t}^{(N)}\right).$$
This approach is self-supervised; no external teacher, manual editing, or RL is employed (Deng et al., 1 Feb 2026).
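The objective is ordinary next-token cross-entropy restricted to the concise traces. A framework-agnostic sketch, where `log_prob(context, token)` is a stand-in for a forward pass of the model being fine-tuned:

```python
import math

def trace_nll(trace_tokens, question_tokens, log_prob):
    """Per-pair term of L(theta): -sum_t log p_theta(r_t | q, r_<t).
    Teacher forcing: each token is conditioned on the gold prefix."""
    nll = 0.0
    context = list(question_tokens)
    for tok in trace_tokens:
        nll -= log_prob(context, tok)
        context.append(tok)
    return nll

def dataset_loss(pairs, log_prob):
    """L(theta): sum of per-trace NLLs over the filtered dataset D_CP,
    where pairs = [(q_i, r_i^(N)), ...]."""
    return sum(trace_nll(r, q, log_prob) for q, r in pairs)
```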

3. Experimental Results and Ablations

ConPress demonstrates substantial reductions in reasoning token usage while maintaining accuracy. Summarizing representative results with Qwen3-4B-Thinking:

| Dataset | Baseline Tokens | ConPress Tokens | Compression Ratio (CR) | Baseline Acc | ConPress Acc | ΔAcc |
|---------|-----------------|-----------------|------------------------|--------------|--------------|------|
| MATH500 | 6634            | 2661            | 0.60                   | 95.6%        | 96.0%        | +0.4 |
| AIME25  | 21442           | 14258           | 0.33                   | 72.5%        | 70.1%        | –2.4 |

Other distilled models (R1-7B, R1-1.5B) show consistent 34–40% token reductions with negligible (<0.3%) accuracy loss. Ablation over NN shows a trade-off between efficiency and performance; N=3N=3 is often optimal, while very large NN yields diminishing returns. Notably, compression improves most on easier problems, but remains stable at higher difficulty (Deng et al., 1 Feb 2026).
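The compression ratios in the table follow directly from the reported token counts (CR = 1 minus the ConPress-to-baseline token ratio); a quick arithmetic check:

```python
# Reported token counts for Qwen3-4B-Thinking: (baseline, ConPress)
rows = {"MATH500": (6634, 2661), "AIME25": (21442, 14258)}

ratios = {name: 1 - after / before for name, (before, after) in rows.items()}
for name, cr in ratios.items():
    print(f"{name}: CR ~= {cr:.3f}")  # ~0.60 and ~0.33, matching the table
```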

4. Context Extraction for LLM Safety: ContextLens

A related but distinct paradigm explores "learning from contextual pressure" in the domain of LLM safety. The ContextLens framework formalizes the extraction and utilization of implicit user context for risk-sensitive inference (Kim et al., 12 Dec 2025). It introduces a context generator policy $\pi_\theta$ that, given a prompt $x \in X$, autoregressively outputs a context snippet $c \in C$:
$$\pi_\theta(c \mid x) = \prod_{t=1}^{|c|} \pi_\theta(a_t \mid x, a_{<t}).$$
A frozen foundation LLM (parameters $\phi$) then conditions on both $x$ and the generated $c$ at inference:
$$p_\phi(y \mid x, c) = \prod_{t=1}^{|y|} p_\phi(y_t \mid x, c, y_{<t}).$$
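The two-stage factorization can be sketched as a pipeline: a trainable context generator draws $c$, and the frozen base model conditions on $(x, c)$. The token-level sampling helper below is a placeholder for real model calls, not the ContextLens implementation:

```python
import math
import random

def sample_sequence(step_log_probs, prefix, eos, max_len=32, rng=random):
    """Autoregressive sampling. step_log_probs(prefix) -> {token: log-prob};
    stands in for either pi_theta (context generator) or p_phi (frozen LLM)."""
    out = []
    for _ in range(max_len):
        dist = step_log_probs(prefix + out)
        toks = list(dist)
        tok = rng.choices(toks, weights=[math.exp(dist[t]) for t in toks])[0]
        if tok == eos:
            break
        out.append(tok)
    return out

def contextlens_inference(pi_theta, p_phi, x, eos="<eos>"):
    """Stage 1: draw context c ~ pi_theta(. | x).
       Stage 2: draw answer  y ~ p_phi(. | x, c), with p_phi kept frozen."""
    c = sample_sequence(pi_theta, list(x), eos)
    y = sample_sequence(p_phi, list(x) + c, eos)
    return c, y
```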

ContextLens is trained with a Generalized Reward Policy Optimization objective, combining an autoencoder reconstruction loss (recovering xx from corrupted xx' and generated cc) with LLM-judged similarity and safety compliance. The reward penalizes verbatim copying, enforcing the learning of nontrivial, intent-exposing context.
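The composite reward can be sketched as a weighted combination of the components named above. The specific scorers, the n-gram measure of verbatim copying, and the weighting are illustrative assumptions, not the paper's exact formulation:

```python
def ngram_overlap(context, prompt, n=4):
    """Fraction of context n-grams copied verbatim from the prompt
    (a simple proxy for the verbatim-copying penalty)."""
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    cg = grams(context)
    if not cg:
        return 0.0
    return len(cg & grams(prompt)) / len(cg)

def contextlens_reward(recon_score, judge_similarity, judge_safety,
                       context, prompt, copy_weight=1.0):
    """Reward = reconstruction score + LLM-judged similarity + safety compliance,
    minus a penalty for copying the prompt verbatim (all scorers in [0, 1])."""
    copy_penalty = copy_weight * ngram_overlap(context, prompt)
    return recon_score + judge_similarity + judge_safety - copy_penalty
```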

Empirical results indicate that inserting learned context reduces harmful outputs by 5.6% (SafetyInstruct, average across foundation models) and offers up to 22.9% improvement in harmonic mean on adversarial datasets. The method is modular and transferable across LLM architectures (Kim et al., 12 Dec 2025).

5. Connections, Limitations, and Future Directions

Both ConPress and ContextLens exploit contextual pressure but differ in objective: ConPress internalizes efficient, compressed reasoning, while ContextLens extracts signal for safer, policy-compliant inference. Current limitations include minor accuracy reductions on difficult tasks (ConPress, AIME), reliance upon external LLM-based judges for reward computation (ContextLens), and the extra inference cost for context generation.

Open research directions include theoretical accounts of why CoT self-compression arises under multi-question pressure, adaptive tuning of compression targets, and extending self-compression and learned context extraction beyond mathematics to domains such as code generation, planning, or multi-modal inference. For safety, incorporation of retrieval mechanisms, multi-module pipelines, and decode-time dynamic context adaptation are under exploration (Deng et al., 1 Feb 2026, Kim et al., 12 Dec 2025).
