Learning Adaptive Parallel Execution for Efficient Code Localization

Published 27 Jan 2026 in cs.AI and cs.SE | (2601.19568v1)

Abstract: Code localization constitutes a key bottleneck in automated software development pipelines. While concurrent tool execution can enhance discovery speed, current agents demonstrate a 34.9% redundant invocation rate, which negates parallelism benefits. We propose FuseSearch, reformulating parallel code localization as a joint quality-efficiency optimization task. Through defining tool efficiency -- the ratio of unique information gain to invocation count -- we utilize a two-phase SFT and RL training approach for learning adaptive parallel strategies. Different from fixed-breadth approaches, FuseSearch dynamically modulates search breadth according to task context, evolving from exploration phases to refinement stages. Evaluated on SWE-bench Verified, FuseSearch-4B achieves SOTA-level performance (84.7% file-level and 56.4% function-level $F_1$ scores) with 93.6% speedup, utilizing 67.7% fewer turns and 68.9% fewer tokens. Results indicate that efficiency-aware training naturally improves quality through eliminating noisy redundant signals, enabling high-performance cost-effective localization agents.


Summary

The paper introduces FuseSearch, a reinforcement learning-based framework for adaptive parallel tool execution that improves code localization accuracy while reducing redundant tool usage.
The paper demonstrates that dynamic parallel execution with dual-metric optimization boosts file-level and function-level F1 scores and achieves significant speedups.
The paper shows that efficiency-driven training and adaptive tool invocation reduce inference turns and token usage, enhancing automated software engineering pipelines.

Introduction

Code localization remains a primary bottleneck in automated software engineering pipelines, because agents must identify the relevant source code entities before an issue can be resolved. Existing LLM-based localization agents predominantly rely on sequential tool execution, which, under constrained interaction budgets, leads to either incomplete context acquisition (information starvation) or significant computational redundancy. The study introduces FuseSearch, a code localization framework that shifts the quality-efficiency trade-off by learning adaptive parallel execution via reinforcement learning (RL), explicitly balancing localization accuracy and tool usage efficiency. Unlike prior approaches that enforce fixed parallelism or sequential search strategies, FuseSearch dynamically modulates the degree of parallel tool invocation per turn based on the evolving context.

Figure 1: Parallel execution mitigates information starvation, but naive parallelization yields a 34.9% redundancy rate across tools.

FuseSearch Framework

Minimalist Toolset and Parallelization

FuseSearch adopts three language-agnostic, read-only tools (grep, glob, read_file), eschewing code graph construction or language-dependent parsing. This minimalism ensures immediate applicability across diverse codebases and focuses the agent policy on execution strategy rather than machinery.

Parallel execution is enabled by allowing multiple tool invocations per turn, all executed concurrently and aggregated before the next action. In prevailing sequential paradigms, tight turn limits lead to poor context gathering, while enforcing a fixed number of parallel tool calls yields redundant, uninformative invocations (a redundancy rate as high as 34.9%).
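The per-turn behavior described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the tool names grep, glob, and read_file come from the text, but the function signatures, the `TOOLS` registry, the `run_turn` helper, and the use of a thread pool are all assumptions made for the example.

```python
import fnmatch
import os
import re
from concurrent.futures import ThreadPoolExecutor


def grep(root, pattern):
    """Return sorted 'relpath:lineno' hits for a regex under root (read-only)."""
    rx = re.compile(pattern)
    hits = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if rx.search(line):
                            hits.append(f"{os.path.relpath(path, root)}:{lineno}")
            except OSError:
                continue  # unreadable files are skipped, not fatal
    return sorted(hits)


def glob_files(root, pattern):
    """Return sorted relative paths under root that match a shell-style pattern."""
    matches = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if fnmatch.fnmatch(rel, pattern):
                matches.append(rel)
    return sorted(matches)


def read_file(root, relpath):
    """Return the text content of one file under root."""
    with open(os.path.join(root, relpath), encoding="utf-8", errors="ignore") as fh:
        return fh.read()


# Hypothetical registry mapping tool names to implementations.
TOOLS = {"grep": grep, "glob": glob_files, "read_file": read_file}


def run_turn(root, calls):
    """Run all (tool_name, argument) calls of one turn concurrently.

    Results are returned in the same order as the calls, so they can be
    aggregated into a single observation before the next action.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(TOOLS[name], root, arg) for name, arg in calls]
        return [future.result() for future in futures]
```

A turn such as `run_turn(repo_dir, [("grep", r"def parse"), ("glob", "*.py")])` would return two results at once; how many calls to issue in a turn is what the learned policy decides.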

Dual-Metric Optimization: Quality and Efficiency

A central innovation is the tool efficiency metric ($e$), defined as the ratio of unique information gain to the number of tool invocations. Agents are jointly optimized for file-level and function-level $F_1$ (reflecting localization quality) and $e$, which quantifies information novelty. The dual-objective RL reward function is:

$R(\tau) = \alpha \cdot F_1(\tau) + \gamma \cdot (F_1(\tau) \cdot e(\tau))$

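Here $\alpha$ and $\gamma$ weight the accuracy term and the efficiency-scaled term. Because $e$ enters only through the product $F_1(\tau) \cdot e(\tau)$, efficiency earns reward only when localization is also accurate. The objective thus captures both the need for high accuracy and the reduction of redundant exploration, penalizing empty or duplicate tool invocations.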
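As a rough illustration of how $e$ and $R(\tau)$ might be computed, the sketch below treats each tool call's output as a set of information items (for example, file or line identifiers) and scores a call by the fraction of its items not seen earlier in the trajectory. That per-call novelty definition, the function names, and the default weights `alpha=1.0` and `gamma=1.0` are assumptions; the summary does not specify them.

```python
def tool_efficiency(call_results):
    """Mean per-call novelty over one trajectory.

    call_results: list of iterables, one per tool call, each holding the
    information items that call returned. A call scores the fraction of its
    items that no earlier call returned; an empty call scores 0.
    (Assumed definition, for illustration only.)
    """
    seen = set()
    ratios = []
    for items in call_results:
        items = set(items)
        novel = items - seen
        ratios.append(len(novel) / len(items) if items else 0.0)
        seen |= items
    return sum(ratios) / len(ratios) if ratios else 0.0


def reward(f1, e, alpha=1.0, gamma=1.0):
    """R = alpha * F1 + gamma * (F1 * e); alpha and gamma are placeholder weights."""
    return alpha * f1 + gamma * f1 * e


# Four calls: one fully novel, one exact duplicate, one empty, one fully novel.
calls = [{"a.py:10", "b.py:3"}, {"a.py:10", "b.py:3"}, set(), {"c.py:7"}]
e = tool_efficiency(calls)
print(e, reward(0.5, e))
```

Under this definition the duplicate and empty calls each contribute zero, which pulls the trajectory's efficiency down and, through the product term, its reward.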

Training Pipeline

FuseSearch is trained through a two-stage approach:

Supervised Fine-Tuning (SFT): Trajectories for SFT are generated with explicit parallel tool usage (2-8 tools/turn) and filtered to ensure both high $F_1$ and high $e$ (joint metric filtering).
Reinforcement Learning (RL): Group Relative Policy Optimization (GRPO) further refines the policy, using the dual-objective reward to encourage strategic adaptive parallelism.

This pipeline yields compact Qwen3 models (4B and 30B), which, after RL, demonstrate both strong localization and efficiency.
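The two stages can be sketched as follows. The filtering thresholds `min_f1` and `min_e`, the trajectory dictionary layout, and both function names are hypothetical. The advantage computation follows the commonly used GRPO formulation (rewards normalized by the mean and standard deviation of a group of rollouts for the same task); the paper's exact variant is not given in this summary.

```python
from statistics import mean, pstdev


def filter_sft_trajectories(trajectories, min_f1=0.8, min_e=0.5):
    """Joint metric filtering: keep trajectories that clear both thresholds.

    Each trajectory is a dict with at least 'f1' and 'e' keys.
    The threshold values are placeholders, not values from the paper.
    """
    return [t for t in trajectories if t["f1"] >= min_f1 and t["e"] >= min_e]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group.

    rewards: rewards of several rollouts sampled for the same task.
    eps guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


candidates = [
    {"id": "t1", "f1": 0.9, "e": 0.7},  # accurate and efficient: kept
    {"id": "t2", "f1": 0.9, "e": 0.1},  # accurate but redundant: dropped
    {"id": "t3", "f1": 0.3, "e": 0.9},  # efficient but inaccurate: dropped
]
print([t["id"] for t in filter_sft_trajectories(candidates)])
print(group_relative_advantages([0.0, 1.0, 2.0]))
```

Rollouts whose reward exceeds the group mean receive a positive advantage, so with the joint reward, trajectories that are both accurate and efficient are the ones reinforced.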

Figure 2: FuseSearch system: parallel tool execution at inference, dual-metric trajectory filtering for SFT, and RL optimization with the joint reward.

Experimental Results

Benchmark and Baseline Comparison

Evaluation on SWE-bench Verified demonstrates that FuseSearch-4B achieves 84.7% file-level $F_1$ and 56.4% function-level $F_1$, on par with much larger proprietary models and specialized agent systems. Notably, it realizes a 93.6% speedup, requiring 67.7% fewer turns and 68.9% fewer tokens than sequential LLM agents. Minimalist sequential agents are a sound baseline, but adaptive parallelism learned through efficiency-aware RL is necessary for the best cost-accuracy balance.
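For readers unfamiliar with how localization $F_1$ is typically scored, the sketch below computes set-based $F_1$ between predicted and gold locations for a single instance. The function name, the treatment of empty sets as a score of 0, and the example paths are assumptions; the paper's exact evaluation protocol (for instance, how scores are averaged across instances) is not described in this summary.

```python
def localization_f1(predicted, gold):
    """Set-based F1 between predicted and gold locations (files or functions)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0  # assumed convention for empty predictions or empty gold sets
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical instance: two files predicted, one of which is in the gold patch.
print(localization_f1({"pkg/parser.py", "pkg/utils.py"}, {"pkg/parser.py"}))
```

The same function applies at function level if the sets hold identifiers such as `"pkg/parser.py::parse"`; over-predicting locations lowers precision, which is one way redundant context can hurt the score.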

Strong claims validated by the results include:

Adaptive parallelization, not fixed parallel breadth, is critical for achieving both high $F_1$ and minimal redundancy.
Efficiency-driven training not only cuts cost but also improves quality by eliminating redundant or noisy context. This is counterintuitive, since parallelization and cost reduction typically risk degrading precision.

Figure 3: Number of tools per turn across training, with RL learning initial broad exploration followed by focused refinement.

Training Analysis

SFT bootstraps effective parallel tool usage, enhancing recall but with increased redundant accesses. RL using the joint reward transitions the agent from overly aggressive parallelism toward context-sensitive, adaptive parallelism: broad initial exploration yielding to single-stream, high-precision refinement in later turns. Only the joint objective reliably optimizes both $F_1$ and $e$; single-metric optimization improves one at the expense of the other. The effect generalizes across models (4B and 30B) and benchmarks (LocBench).

Downstream and Practical Implications

Incorporating FuseSearch as a preprocessing step or sub-agent in end-to-end code repair pipelines further reduces inference time and token usage for downstream agents (e.g., Kimi-K2-Instruct), cutting task completion time by up to 28.5% without harming final pass rates. The framework thus acts as an accelerator for production systems, reducing both turns and tokens, with high utility for agent orchestration in automated software engineering.

Figure 4: System prompt structure guiding the agent’s two-part output, separating must-modify locations from auxiliary context.

Ablation and Analysis

Parallel vs. Sequential: Parallel agents with adaptive breadth outperform sequential ones in both cost and $F_1$, even when controlling for total tool use.
Joint SFT Filtering: Demonstration filtering on both $F_1$ and $e$ produces data distributions essential for robust parallel policy learning.
Reward Design: Only a multiplicative $F_1 \cdot e$ term reliably produces models displaying both high quality and high efficiency.
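One way to see why the multiplicative term could matter is to compare it with a purely additive combination on a few hand-picked cases. The additive variant, the weights, and the example values below are illustrative assumptions, not the ablation settings from the paper.

```python
def multiplicative_reward(f1, e, alpha=1.0, gamma=1.0):
    """Reward with the F1 * e term, matching the form of the stated objective."""
    return alpha * f1 + gamma * f1 * e


def additive_reward(f1, e, alpha=1.0, gamma=1.0):
    """Hypothetical alternative that rewards efficiency independently of accuracy."""
    return alpha * f1 + gamma * e


# (F1, e) pairs chosen by hand to contrast the two reward shapes.
cases = {
    "accurate_and_lean": (0.9, 0.9),
    "accurate_but_redundant": (0.9, 0.2),
    "lean_but_wrong": (0.0, 1.0),
}
for name, (f1, e) in cases.items():
    print(name, multiplicative_reward(f1, e), additive_reward(f1, e))
```

In the "lean_but_wrong" case the additive form still pays out for efficiency alone, whereas the multiplicative form pays nothing unless localization is also correct, so a policy cannot profit by making a few novel but irrelevant calls.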

Limitations

Current evaluations rely on gold patch ground truths, limiting the assessment of alternative correct localizations. The experiments focus on Python-centric benchmarks; future benchmarks covering statically typed or non-Python repositories are needed. Generalization to non-issue-driven code search (e.g., QA, comprehension, documentation generation) is not explored.

Conclusion

FuseSearch establishes a minimalist yet robust paradigm for code localization via RL-driven adaptive parallel execution, achieving SOTA-level quality with substantial search acceleration and computational savings. The explicit integration of tool efficiency into the reward function enables strong localization performance, minimal redundancy, and practical deployment on modest model scales. This approach provides theoretical and pragmatic foundations for multi-tool agent orchestration in automated software engineering, and opens avenues for efficiency-aware RL in other large search-space reasoning tasks.

Reference: "Learning Adaptive Parallel Execution for Efficient Code Localization" (2601.19568)
