
Fractional Reasoning in LLMs: A New Way to Control the Depth of Reasoning

What is included in this article:
Limitations of current test-time compute strategies for LLMs.
Introduction of Fractional Reasoning (FR) as a training-free, model-agnostic framework.
The latent-state steering technique, which uses reasoning prompts and a tunable scaling factor.
Breadth- and depth-based scaling gains demonstrated on GSM8K, MATH500, and GPQA.
Evaluation results showing that FR outperforms best-of-N and majority voting.
Analysis of FR's behavior across models, including DeepSeek-R1.

Introduction: The Challenge of Uniform Reasoning at Inference Time

LLMs have shown improvements across many domains, and test-time compute plays a crucial role in their performance. This approach strengthens reasoning at inference time by allocating additional computation, for example by generating multiple candidate responses and selecting the best one, or by iteratively refining answers through self-reflection. However, current test-time compute strategies treat all problems uniformly, applying the same depth of reasoning regardless of a query's difficulty or structure. In practice, reasoning needs vary widely: under-thinking can leave a problem unsolved, while over-thinking or excessive reflection can degrade answers and waste computation. LLMs therefore need to adjust their reasoning depth or level of reflection dynamically.
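To make the baseline concrete, here is a minimal sketch of breadth-based test-time scaling via majority voting. The `sample_answer` helper is a hypothetical stand-in for one sampled model completion plus answer parsing, simulated here so the snippet runs; note that every question receives the same fixed sampling budget and the same prompt, which is exactly the uniformity FR aims to relax.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one sampled LLM completion followed by
    answer extraction; simulated with a noisy solver so the sketch runs."""
    return random.choice(["42", "42", "42", "41"])

def majority_vote(question: str, n_samples: int = 16) -> str:
    """Breadth-based test-time scaling: sample N answers, return the mode.
    Every question gets the same budget regardless of its difficulty."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))
```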

Previous work: Latent steering and representation control

Existing research explores several ways to enhance LLM reasoning through inference-time scaling and latent-state control. Chain-of-thought (CoT) prompting guides the model to break complex problems into intermediate steps, improving reasoning performance. Outcome reward models (ORMs) and process reward models (PRMs) score responses based on final correctness or on the quality of the intermediate reasoning, respectively. In addition, representation-engineering methods use steering vectors in the LLM's latent space for controlled generation: approaches such as In-Context Vectors (ICV) extract latent vectors at inference time to steer the model's internal state, while Representation Finetuning (ReFT) learns task-specific low-rank interventions.

The proposed framework: Fractional Reasoning for adaptive reasoning control

Researchers at Stanford University propose Fractional Reasoning (FR), a framework for improving test-time compute through adaptive control of reasoning. FR adjusts reasoning behavior by directly modifying the model's internal representations: it extracts the latent shift induced by reasoning-promoting inputs such as CoT or reflection prompts, and re-applies this shift with a tunable scaling factor. This lets the model adjust its reasoning depth during inference without modifying the input text or requiring fine-tuning. FR supports and enhances two key forms of test-time scaling: (a) breadth-based scaling, such as best-of-N and majority voting, and (b) depth-based scaling, such as self-reflection.
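As a rough illustration of the mechanism described above (a minimal sketch based on this description, not the authors' released code), the core idea can be written as extracting a latent shift from a reasoning prompt and re-applying a scaled fraction of it to the hidden states; the choice of layers, any normalization, and the exact prompt pairs are assumptions here.

```python
import torch

def reasoning_shift(hidden_with_prompt: torch.Tensor,
                    hidden_without_prompt: torch.Tensor) -> torch.Tensor:
    """Estimate the latent shift induced by a reasoning prompt (e.g. a CoT or
    reflection instruction) as the difference of mean hidden states computed
    with and without that prompt. Assumed shapes: (tokens, hidden_dim)."""
    return hidden_with_prompt.mean(dim=0) - hidden_without_prompt.mean(dim=0)

def apply_fractional_shift(hidden: torch.Tensor,
                           shift: torch.Tensor,
                           alpha: float) -> torch.Tensor:
    """Re-apply the shift scaled by a tunable factor alpha.
    alpha = 0 recovers the unsteered model; larger alpha pushes the model
    toward deeper, more explicit reasoning, while the input text is untouched."""
    return hidden + alpha * shift
```

In practice such a function would be attached as a forward hook on selected transformer layers during generation, so the steering happens entirely in latent space while the prompt itself stays unchanged.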

Benchmarks: Performance gains on reasoning tasks

FR is evaluated on three benchmarks that require multi-step reasoning: GSM8K, MATH500, and GPQA, using the test sets of GSM8K and MATH500 and the diamond split of GPQA. The main experiments use two competitive open-source instruction-tuned models, Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct, both of which show strong reasoning ability and expose the latent-state representations the method requires. FR outperforms standard test-time compute methods across all benchmarks and models, showing that it robustly improves performance. By varying the strength of the reasoning prompt, FR enables broader exploration of the solution space and increases the efficiency of conventional test-time compute methods.
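One way to picture how FR augments breadth-based scaling is to diversify the reasoning intensity across samples before aggregating. In the sketch below, the alpha grid and the `generate_with_alpha` wrapper are hypothetical illustrations, not the paper's exact evaluation protocol.

```python
import random
from collections import Counter

def generate_with_alpha(question: str, alpha: float) -> str:
    """Hypothetical wrapper: run the model with the latent reasoning shift
    scaled by alpha and return the parsed final answer (simulated here)."""
    return random.choice(["27", "27", "26"])

def fr_majority_vote(question: str,
                     alphas=(0.0, 0.5, 1.0, 1.5),
                     per_alpha: int = 4) -> str:
    """Breadth-based scaling with FR: sample under several reasoning
    intensities instead of one fixed prompt, then take a majority vote."""
    answers = [generate_with_alpha(question, a)
               for a in alphas for _ in range(per_alpha)]
    return Counter(answers).most_common(1)[0][0]
```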

Behavior of Fractional Reasoning and generality across models

The researchers further analyze FR to understand its behavioral dynamics, its generality across models, and its performance under additional metrics. The analysis shows that increasing the scaling factor yields longer outputs with more detailed multi-step reasoning, confirming that the parameter controls model behavior in a predictable, continuous way. FR remains effective even when applied to reasoning-specialized models such as DeepSeek-R1-Distill-Qwen-7B, improving accuracy over baselines with standard prompts and demonstrating its versatility across general-purpose and specialized LLMs. Performance-scaling analysis shows that accuracy grows more stably with the number of generations, exceeding majority-voting baselines across most sampling budgets.

Conclusion: Toward more dynamic and efficient LLM inference

In summary, Stanford researchers introduce Fractional Reasoning (FR), a training-free and model-agnostic framework that improves test-time compute by adaptively controlling reasoning behavior in LLMs. It provides a general and interpretable way to allocate computational effort more precisely and efficiently during inference, overcoming the uniform-reasoning limitation of current test-time compute strategies. However, the framework currently relies on predefined reasoning directions and lacks automatic selection of the scaling factor, pointing to fully dynamic, adaptive inference strategies as a direction for future research.

Check out the paper. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a technology enthusiast, he explores practical applications of AI, focusing on understanding AI technologies and their real-world impact. He aims to explain complex AI concepts in a clear and accessible way.