Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute

Published 18 Jun 2025 in cs.LG, cs.AI, cs.CL, and eess.SP | (2506.15882v1)

Abstract: Test-time compute has emerged as a powerful paradigm for improving the performance of LLMs, where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.


Summary

The paper introduces a novel inference framework that uses latent steering vectors to enable adaptive reasoning intensity in LLMs.
It employs a tunable scalar, α, to parameterize the shift in latent states, allowing for dynamically controlled reasoning depth.
Experimental evaluations on GSM8K, MATH500, and GPQA show consistent accuracy improvements over standard test-time scaling methods such as Best-of-N, majority voting, and self-reflection.


Introduction

The study introduces Fractional Reasoning, an inference-time framework that gives LLMs fine-grained control over reasoning intensity. The method uses latent steering vectors to match the model's reasoning depth to the complexity of each input query. Existing methods such as Best-of-N and majority voting lack this adaptability and apply the same amount of reasoning to every input.

Fractional Reasoning Framework

The core proposition of Fractional Reasoning lies in its ability to control reasoning intensity via latent steering vectors:

Latent State Shift: The framework interprets the addition of reasoning-inducing prompts as causing a directional shift in the model's latent state. This shift can be parameterized using a tunable scalar, $\alpha$, that defines the strength of the reasoning prompt.
Figure 1: Example illustrating how model behavior changes with the scale of instructional strength $\alpha$ controlling the "fraction" of reasoning, applied to both Chain-of-Thought and Reflection prompting.
Steering Vector Construction: Positive and negative prompt pairs are used to extract a latent steering vector, effectively capturing the model's shift in behavior due to reasoning prompts.
Application: The model's latent states are dynamically modulated at inference time by scaling the steering vector, allowing for adjustable depth in reasoning.
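
The three steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes latent states are plain lists of floats, it builds the steering vector as the normalized mean difference between states with and without the reasoning prompt (the paper may use a different estimator), and the optional norm-preserving rescale is likewise an assumption. The names `steering_vector` and `apply_steering` are hypothetical.

```python
import math


def steering_vector(pos_states, neg_states):
    """Unit-norm mean of (positive - negative) latent-state differences.

    pos_states[i] is the latent state for query i WITH the reasoning prompt,
    neg_states[i] the state for the same query WITHOUT it.
    Assumption: mean difference is used as the steering direction.
    """
    n = len(pos_states)
    dim = len(pos_states[0])
    mean_diff = [
        sum(p[i] - q[i] for p, q in zip(pos_states, neg_states)) / n
        for i in range(dim)
    ]
    norm = math.sqrt(sum(x * x for x in mean_diff))
    return [x / norm for x in mean_diff]


def apply_steering(hidden, steer, alpha, keep_norm=True):
    """Shift one latent state by alpha * steer.

    alpha = 0 leaves the state unchanged; larger alpha pushes it further
    along the reasoning direction. With keep_norm=True the shifted state is
    rescaled to the original norm (an assumption made for this sketch).
    """
    shifted = [h + alpha * s for h, s in zip(hidden, steer)]
    if keep_norm:
        old = math.sqrt(sum(h * h for h in hidden))
        new = math.sqrt(sum(x * x for x in shifted))
        if new > 0.0:
            shifted = [x * old / new for x in shifted]
    return shifted
```

In a real model the same shift would be applied to hidden states inside the network at generation time; the scalar `alpha` is the only quantity that changes between a "light" and a "heavy" reasoning run.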

Experimental Evaluation

Extensive experiments across GSM8K, MATH500, and GPQA validate the method's effectiveness:

Performance Gains: The framework consistently improves accuracy over traditional inference-time compute methods, as demonstrated in multiple benchmarks.
Figure 2: Averaged accuracy across MATH500, GSM8K, and GPQA. Blue bars represent standard test-time scaling methods, purple bars show these methods enhanced by our Fractional Reasoning.
Adaptable Depth Control: By varying $\alpha$, the model's behavior ranges from concise direct answers to detailed multi-step reasoning.
Scalability: The method remains robust across compute budgets, retaining its advantage as the number of sampled generations increases.
Figure 3: Accuracy on GSM8K and GPQA as a function of the number of generations.
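
One way to picture the breadth-based setting is majority voting over generations produced at several values of $\alpha$. The sketch below is an illustration under stated assumptions: `generate` is a hypothetical callable standing in for a steered model call that returns a final answer string, and the choice of which $\alpha$ values to sample is left to the caller.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def fractional_majority_vote(generate, question, alphas, samples_per_alpha=1):
    """Collect answers at each reasoning strength, then vote.

    generate(question, alpha) is a hypothetical stand-in for a model call
    whose latent states are steered with strength alpha.
    Returns the winning answer and the full list of sampled answers.
    """
    answers = []
    for alpha in alphas:
        for _ in range(samples_per_alpha):
            answers.append(generate(question, alpha))
    return majority_vote(answers), answers
```

The total number of generations is `len(alphas) * samples_per_alpha`, so the compute budget is comparable to a standard majority-voting run with the same sample count.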

Fractional Reasoning in Reflection

Beyond chain-of-thought prompting, Fractional Reasoning enhances reflection-based tasks:

Reflection Control: The framework regulates reflection by adjusting the influence of steering vectors, thus facilitating optimal correction levels.
Figure 4: Sentence-level control dynamically adjusts reflection strength $\alpha$ at each generation step, enabling correction of errors missed by instance-level control.
Error Mitigation: By dynamically steering reflection strength, the model avoids excessive reflections on correct outputs and insufficient analysis on errors.
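
The difference between instance-level and sentence-level control can be illustrated with a toy schedule for $\alpha$. The summary does not say which signal drives the per-step adjustment, so the sketch below assumes a hypothetical per-sentence confidence score in [0, 1] and a linear mapping in which low confidence yields stronger reflection. Both the signal and the mapping are assumptions made purely for illustration.

```python
def reflection_strength(confidence, alpha_min=0.0, alpha_max=1.0):
    """Map a confidence score to a reflection strength alpha.

    Assumption: confidence 1.0 gives alpha_min (little reflection) and
    confidence 0.0 gives alpha_max (strong reflection), linearly in between.
    Scores outside [0, 1] are clamped.
    """
    c = min(max(confidence, 0.0), 1.0)
    return alpha_max - c * (alpha_max - alpha_min)


def instance_level_schedule(num_sentences, alpha):
    """One fixed alpha for every sentence of a generation."""
    return [alpha] * num_sentences


def sentence_level_schedule(confidences, alpha_min=0.0, alpha_max=1.0):
    """A separate alpha per sentence, driven by a hypothetical confidence."""
    return [reflection_strength(c, alpha_min, alpha_max) for c in confidences]
```

With an instance-level schedule a single uncertain sentence receives the same reflection strength as the confident ones around it, whereas the sentence-level schedule raises $\alpha$ only where the score drops.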

Discussion and Conclusion

Fractional Reasoning provides a versatile, interpretable way to control reasoning in LLMs at inference time, addressing the limitations of uniform reasoning. The study points to adaptive policies as future work, in particular automating the selection of $\alpha$ for each input. Such policies could further improve efficiency and accuracy across both reasoning and reflection settings.

Endnote: Implementing these strategies allows for efficient, targeted reasoning scaling in LLMs, enhancing interpretability and execution at test time without additional training. The research supports ongoing developments in AI, potentially impacting diverse applications where reasoning precision is critical.
