Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework

Published 9 Jul 2025 in cs.CL | (2507.06829v1)

Abstract: Recent advances in LLMs have accelerated progress toward artificial general intelligence, with inference-time scaling emerging as a key technique. Contemporary approaches leverage either sequential reasoning (iteratively extending chains of thought) or parallel reasoning (generating multiple solutions simultaneously) to scale inference. However, both paradigms face fundamental limitations: sequential scaling typically relies on arbitrary token budgets for termination, leading to inefficiency or premature cutoff; while parallel scaling often lacks coordination among parallel branches and requires intrusive fine-tuning to perform effectively. In light of these challenges, we aim to design a flexible test-time collaborative inference framework that exploits the complementary strengths of both sequential and parallel reasoning paradigms. Towards this goal, the core challenge lies in developing an efficient and accurate intrinsic quality metric to assess model responses during collaborative inference, enabling dynamic control and early termination of the reasoning trace. To address this challenge, we introduce semantic entropy (SE), which quantifies the semantic diversity of parallel model responses and serves as a robust indicator of reasoning quality due to its strong negative correlation with accuracy...

Summary

  • The paper introduces SEAT, a framework that leverages semantic entropy to dynamically terminate multi-round parallel reasoning for efficient performance.
  • It employs Monte Carlo approximation and optimal stopping theory to quantify semantic diversity and guide early stopping decisions.
  • Extensive experiments across benchmarks show significant accuracy improvements (up to 85.67%) and better resource allocation without extra fine-tuning.

Adaptive Termination for Multi-round Parallel Reasoning

Introduction

The paper "Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework" (2507.06829) addresses the limitations of current LLMs in both sequential and parallel reasoning. While LLMs have advanced artificial general intelligence, they face constraints such as inefficient termination and a lack of coordination among parallel branches. The paper proposes a framework named SEAT (Semantic Entropy-Adaptive Termination), which aims to mitigate these issues by integrating the strengths of both reasoning paradigms through a dynamic quality metric: semantic entropy (SE).

Semantic Entropy as a Quality Metric

A fundamental insight of the paper is the introduction of semantic entropy (SE) as a robust metric for assessing reasoning quality. SE quantifies the semantic diversity of model responses and shows a strong negative correlation with response accuracy: higher SE typically indicates lower accuracy. This relationship serves as the cornerstone of the proposed framework, enabling adaptive termination by dynamically adjusting the reasoning process based on SE.

Figure 1: Strong negative correlation between semantic entropy and model accuracy on Math-500 benchmark.
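To make the metric concrete, here is a minimal sketch of semantic entropy estimated by Monte Carlo sampling, in the spirit of the paper's description. The `equivalent` predicate is a hypothetical stand-in for a semantic-equivalence check (e.g., a bidirectional-entailment test); the clustering and frequency-based probabilities are an illustrative simplification, not the authors' exact implementation.

```python
import math

def semantic_entropy(responses, equivalent):
    """Monte Carlo estimate of semantic entropy over sampled responses.

    `responses` is a list of model answers; `equivalent(a, b)` is a
    hypothetical predicate deciding whether two answers are semantically
    the same (e.g., an NLI-based bidirectional-entailment check).
    """
    # Greedily cluster responses into semantic-equivalence classes.
    clusters = []
    for r in responses:
        for c in clusters:
            if equivalent(r, c[0]):
                c.append(r)
                break
        else:
            clusters.append([r])
    # Approximate each cluster's probability by its sample frequency,
    # then compute the entropy of that distribution.
    n = len(responses)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)
```

For example, with exact string match as the equivalence test, four parallel answers `["4", "4", "5", "4"]` yield two clusters with probabilities 3/4 and 1/4, giving a low SE; unanimous answers give SE of zero, signaling high confidence.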

SEAT Framework: Design and Implementation

SEAT is a plug-and-play framework that combines parallel and sequential reasoning. It dynamically adjusts the degree of parallelization and employs SE-based stopping criteria, which can either be statistical or inspired by optimal stopping theory. This versatility maximizes efficiency without sacrificing performance. The framework's architecture is illustrated in Figure 2.

Figure 2: Overview of the proposed SEAT framework.

Key Components

  1. Multi-round Parallel (MRP) Inference: SEAT constructs an N × M reasoning structure, combining diverse parallel explorations with sequential refinements, thereby facilitating error correction and robust performance without additional fine-tuning.
  2. Semantic Entropy Calculation: By employing a Monte Carlo approximation, SE quantifies the uncertainty in responses, providing a signal for potential early stopping to avoid unnecessary computational expenses if the model output is unlikely to improve.
  3. Adaptive Termination Mechanism: This is implemented in two forms:
    • Pre-defined SE Threshold: Utilizes prior empirical SE distributions to establish stopping thresholds. SE below the 20th percentile of the distribution prompts termination.
    • Adaptive Threshold-free Mechanism: Inspired by optimal stopping theory, this approach continually evaluates SE against a dynamic baseline, terminating when SE falls below it, thereby eliminating the need for pre-sampling.

      Figure 3: SE and accuracy evolution across inference rounds in the R7B model.
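The multi-round loop with the two termination rules above can be sketched as follows. `generate_parallel` and `se_fn` are hypothetical callables (a batch sampler and the SE scorer); the threshold-free rule here uses a simple running-mean baseline as an illustrative proxy for the paper's optimal-stopping-inspired criterion, not the authors' exact formulation.

```python
def seat_inference(generate_parallel, se_fn, max_rounds=8, threshold=None):
    """Sketch of SEAT's multi-round parallel loop with adaptive termination.

    `generate_parallel(t)` returns a list of candidate answers for round t;
    `se_fn` scores their semantic entropy. If `threshold` is given (e.g., a
    pre-computed 20th-percentile SE value), stop when SE drops below it.
    Otherwise, stop when SE falls below the running mean of past SE values,
    a simple dynamic baseline standing in for the optimal-stopping rule.
    """
    history = []
    for t in range(max_rounds):
        answers = generate_parallel(t)
        se = se_fn(answers)
        if threshold is not None:
            if se < threshold:           # pre-defined SE threshold rule
                return answers, t + 1
        elif history and se < sum(history) / len(history):
            return answers, t + 1        # threshold-free dynamic baseline
        history.append(se)
    return answers, max_rounds           # budget exhausted
```

Returning the round count alongside the answers makes it easy to measure how much compute the early-stopping rule actually saves per problem.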

Experimental Results

Comprehensive evaluations were conducted across multiple benchmarks (AIME-2024, AIME-2025, MATH-500, MINERVA, and GPQA) using 7B and 32B models. The experiments demonstrated substantial accuracy improvements with SEAT. For instance, the R32B model's accuracy increased from 70.83% to 85.67% on AIME-2024, with similar gains for the R7B model. Notably, SEAT effectively mitigated performance degradation in smaller models by preventing "semantic entropy collapse," a situation in which smaller models output overly confident, incorrect answers (Figure 4).

Figure 4: Semantic entropy distribution highlighting correct answer proportion within low entropy regions.

Future Implications

This research opens paths for LLMs to manage computational resources more effectively through test-time scaling. By employing semantic entropy as an intrinsic reasoning quality indicator, LLMs can more adaptively allocate effort to difficult problems. Future work may expand the SEAT framework to accommodate other unsupervised indicators and integrate complementary techniques like majority voting to further augment reasoning performance.

Conclusion

The SEAT framework represents a significant advance in test-time scaling strategies by integrating semantic entropy as a metric for adaptive termination in reasoning tasks. Through its innovative merge of parallel and sequential strategies, SEAT enables more efficient and effective LLM reasoning. This approach mitigates the risk of semantic entropy collapse, particularly vital for maintaining performance in smaller models, thus broadening the scope of practical applications in AI.
