ConPress: Learning from Contextual Pressure
- The paper demonstrates that multi-question prompts induce chain-of-thought self-compression, reducing reasoning tokens by up to 70% with minimal accuracy loss.
- It employs a self-supervised fine-tuning method that extracts and filters compressed reasoning traces using rejection sampling and token-level cross-entropy loss.
- The study extends to LLM safety by using contextual extraction to reduce harmful outputs, improving compliance metrics by up to 22.9%.
ConPress (Learning from Contextual Pressure) encompasses a family of approaches that exploit emergent behaviors in LLMs and large reasoning models (LRMs) related to how answer generation is shaped by context. Notably, this includes (1) the self-compression of chain-of-thought (CoT) reasoning traces under multi-question contextual pressure (Deng et al., 1 Feb 2026), and (2) the extraction of safety-relevant context for improving compliance and reducing harmful responses in LLMs (Kim et al., 12 Dec 2025). These methods leverage context-driven pressures, either at inference or via explicit architectural or training modifications, to induce more efficient or safer model behaviors.
1. Self-Compression Under Contextual Pressure
A reproducible inference-time phenomenon termed Self-Compression arises when an LRM is presented with multiple independent, simultaneously answerable questions in a single prompt. Instead of producing verbose CoT traces per question, the model systematically compresses its reasoning for each. Formally, for a single question $q$ with model output $y$, let $L(1)$ denote the reasoning-trace length in tokens. For an $n$-question prompt $Q_n = (q_1, \ldots, q_n)$, the average per-question trace length $\bar{L}(n)$ decreases with $n$, producing an empirically measured compression rate

$$\mathrm{CR}(n) = 1 - \frac{\bar{L}(n)}{L(1)}.$$

For instance, on MATH500 with Qwen3-4B, increasing $n$ from 1 to 8 reduces the per-question trace length from approximately 300 to 90 tokens (∼70% compression).
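The compression-rate measurement can be sketched directly from this definition; the token counts below are the approximate MATH500 / Qwen3-4B figures quoted above, not exact measurements:

```python
# Minimal sketch: fraction of per-question reasoning tokens removed
# relative to the single-question (n=1) baseline.

def compression_rate(single_q_tokens: float, multi_q_tokens: float) -> float:
    """CR(n) = 1 - L_bar(n) / L(1)."""
    return 1.0 - multi_q_tokens / single_q_tokens

# ~300 tokens/question at n=1 vs ~90 tokens/question at n=8:
print(f"CR(8) ≈ {compression_rate(300, 90):.0%}")  # prints "CR(8) ≈ 70%"
```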
This self-compression effect generalizes across model scales and reasoning benchmarks, and does not require explicit constraints or additional supervision at inference (Deng et al., 1 Feb 2026). A plausible implication is that competitive context occupancy in the prompt incentivizes the model to internalize more succinct reasoning steps per item.
2. Methodology of ConPress: Learning Concise Reasoning
ConPress capitalizes on self-compression by constructing multi-question prompts to induce concise outputs, which are then parsed, filtered for correctness, and used directly to fine-tune the model. From a pool of single-question items $\{(q_i, a_i)\}$, multi-question prompts are sampled as

$$Q_n = (q_{i_1}, \ldots, q_{i_n}).$$

Each generated CoT trace is extracted using explicit per-question delimiter markers, split by question, and filtered through rejection sampling, retaining only those items for which the model's answer matches the ground truth $a_i$.
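The pipeline can be sketched end-to-end; here `fake_generate` is a stand-in for the actual LRM call and marker-based parsing, which the text does not specify:

```python
# Hedged sketch of the ConPress data pipeline: build an n-question prompt,
# obtain per-question (trace, answer) pairs, and keep only pairs whose
# answer matches the ground truth (rejection sampling).

def build_prompt(questions):
    return "\n\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))

def fake_generate(questions):
    # Stand-in for the model: "answers" toy arithmetic questions.
    return [(f"trace for {q}", str(eval(q))) for q in questions]

def rejection_filter(pairs, gold):
    """Retain only (trace, answer) pairs that match the ground truth."""
    return [(t, a) for (t, a), g in zip(pairs, gold) if a == g]

# Last gold answer is deliberately wrong, so that item gets rejected.
pool = [("1+1", "2"), ("2*3", "6"), ("10-4", "5")]
questions = [q for q, _ in pool]
gold = [a for _, a in pool]
kept = rejection_filter(fake_generate(questions), gold)
print(len(kept))  # prints 2: the "10-4" trace is filtered out
```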
The resulting dataset of concise, accurate traces is used to fine-tune the LRM with a token-level cross-entropy loss

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(y_t \mid y_{<t}, q),$$

where $y$ is a retained concise trace for question $q$. This approach is self-supervised; no external teacher, manual editing, or RL is employed (Deng et al., 1 Feb 2026).
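As a toy numeric illustration of the objective (the per-token probabilities below are made up, not model outputs):

```python
import math

# Toy token-level cross-entropy over a retained concise trace:
# mean of -log p(y_t | y_<t, q) across tokens.

def token_level_ce(token_probs):
    """Mean negative log-likelihood of a token sequence."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

loss = token_level_ce([0.9, 0.8, 0.95])
print(round(loss, 4))  # prints 0.1266
```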
3. Experimental Results and Ablations
ConPress demonstrates substantial reductions in reasoning token usage while maintaining accuracy. Summarizing representative results with Qwen3-4B-Thinking:
| Dataset | Baseline Tokens | ConPress Tokens | Compression Ratio (CR) | Baseline Acc | ConPress Acc | ΔAcc |
|---|---|---|---|---|---|---|
| MATH500 | 6634 | 2661 | 0.60 | 95.6% | 96.0% | +0.4 |
| AIME25 | 21442 | 14258 | 0.33 | 72.5% | 70.1% | –2.4 |
Other distilled models (R1-7B, R1-1.5B) show consistent 34–40% token reductions with negligible (<0.3%) accuracy loss. Ablation over the number of questions $n$ shows a trade-off between efficiency and performance; a moderate $n$ is often optimal, while very large $n$ yields diminishing returns. Notably, compression improves most on easier problems but remains stable at higher difficulty (Deng et al., 1 Feb 2026).
4. Context Extraction for LLM Safety: ContextLens
A related but distinct paradigm explores "learning from contextual pressure" in the domain of LLM safety. The ContextLens framework formalizes the extraction and utilization of implicit user context for risk-sensitive inference (Kim et al., 12 Dec 2025). It introduces a context generator policy $\pi_\phi$ that, given a prompt $x$, autoregressively outputs a context snippet $c$:

$$c \sim \pi_\phi(\cdot \mid x).$$

A frozen foundation LLM (parameters $\theta$) then conditions on both $x$ and the generated $c$ at inference:

$$y \sim p_\theta(\cdot \mid x, c).$$
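The two-stage inference path can be sketched with toy stand-ins for both models (all function names here are hypothetical, not the paper's implementation):

```python
# Hedged sketch of ContextLens inference: a trainable context generator
# pi_phi maps prompt x to context c; a frozen LLM p_theta then conditions
# on (x, c). Its parameters theta are never updated.

def context_generator(x: str) -> str:
    # Stand-in for the autoregressive policy pi_phi(c | x).
    return f"[inferred user intent behind: {x}]"

def frozen_llm(x: str, c: str) -> str:
    # Stand-in for sampling y ~ p_theta(y | x, c).
    return f"response conditioned on {c!r} and {x!r}"

x = "How should I dispose of this solvent?"
y = frozen_llm(x, context_generator(x))
print(y)
```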
ContextLens is trained with a Generalized Reward Policy Optimization objective, combining an autoencoder reconstruction loss (recovering the prompt $x$ from a corrupted version together with the generated context $c$) with LLM-judged similarity and safety compliance. The reward penalizes verbatim copying, enforcing the learning of nontrivial, intent-exposing context.
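A crude sketch of that reward shape follows; the weights and the substring-based copy check are assumptions, not details given in the source:

```python
# Hedged sketch of the reward structure: judged similarity plus safety
# compliance, minus a penalty for verbatim copying of the prompt.
# alpha/beta/gamma and the substring check are illustrative assumptions.

def copy_penalty(prompt: str, context: str) -> float:
    """1.0 if the generated context is a verbatim substring of the prompt."""
    return 1.0 if context and context in prompt else 0.0

def reward(similarity: float, safety: float, prompt: str, context: str,
           alpha: float = 1.0, beta: float = 1.0, gamma: float = 2.0) -> float:
    return alpha * similarity + beta * safety - gamma * copy_penalty(prompt, context)

# Verbatim copying is penalized even when judged scores are identical:
print(round(reward(0.8, 1.0, "store chemicals safely", "user wants safe storage"), 2))  # 1.8
print(round(reward(0.8, 1.0, "store chemicals safely", "chemicals"), 2))                # -0.2
```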
Empirical results indicate that inserting learned context reduces harmful outputs by 5.6% (SafetyInstruct, average across foundation models) and offers up to 22.9% improvement in harmonic mean on adversarial datasets. The method is modular and transferable across LLM architectures (Kim et al., 12 Dec 2025).
5. Connections, Limitations, and Future Directions
Both ConPress and ContextLens exploit contextual pressure but differ in objective: ConPress internalizes efficient, compressed reasoning, while ContextLens extracts signal for safer, policy-compliant inference. Current limitations include minor accuracy reductions on difficult tasks (ConPress, AIME), reliance on external LLM-based judges for reward computation (ContextLens), and the extra inference cost of context generation.
Open research directions include theoretical accounts of why CoT self-compression arises under multi-question pressure, adaptive tuning of compression targets, and extending self-compression and learned context extraction beyond mathematics to domains such as code generation, planning, or multi-modal inference. For safety, incorporation of retrieval mechanisms, multi-module pipelines, and decode-time dynamic context adaptation are under exploration (Deng et al., 1 Feb 2026, Kim et al., 12 Dec 2025).