Effective Strategies for Asynchronous Software Engineering Agents

Published 23 Mar 2026 in cs.CL and cs.AI | (2603.21489v1)

Abstract: AI agents have become increasingly capable at isolated software engineering (SWE) tasks such as resolving issues on GitHub. Yet long-horizon tasks involving multiple interdependent subtasks still pose challenges with respect to both accuracy and timely completion. A natural approach to solving these long-horizon tasks in a timely manner is asynchronous multi-agent collaboration, where multiple agents work on different parts of the task at the same time. But effective application of multi-agent systems has proven surprisingly difficult: concurrent edits by multiple agents interfere with each other, dependencies are difficult to synchronize, and combining partial progress into a coherent whole is challenging. On the other hand, human developers have long relied on mature collaboration infrastructure to manage these challenges in large software projects. Inspired by these collaboration primitives, we introduce Centralized Asynchronous Isolated Delegation (CAID), a structured multi-agent coordination paradigm grounded in three core SWE primitives: centralized task delegation, asynchronous execution, and isolated workspaces. CAID constructs dependency-aware task plans through a central manager, executes subtasks concurrently in isolated workspaces, and consolidates progress via structured integration with executable test-based verification. In empirical evaluation, we find that CAID improves accuracy over single-agent baselines by 26.7% absolute on paper reproduction tasks (PaperBench) and 14.3% on Python library development tasks (Commit0). Through systematic analysis, we find that branch-and-merge is a central coordination mechanism for multi-agent collaboration, and that SWE primitives such as git worktree, git commit, and git merge enable it to be realized in a reliable and executable manner.


Authors (2)

Summary

The paper demonstrates that CAID achieves substantial accuracy gains over single-agent baselines using branch-and-merge coordination.
It shows that isolated workspaces with git worktrees effectively mitigate conflicts, ensuring robust parallel agent execution.
Empirical results on the Commit0 and PaperBench benchmarks highlight CAID's effectiveness on long-horizon software engineering tasks.

Effective Strategies for Asynchronous Software Engineering Agents: The CAID Paradigm

Introduction

The paper "Effective Strategies for Asynchronous Software Engineering Agents" (2603.21489) presents CAID (Centralized Asynchronous Isolated Delegation), a multi-agent coordination architecture designed for long-horizon software engineering (SWE) tasks. The methodology leverages established SWE primitives (centralized delegation, asynchronous agent execution, and isolated workspaces via git worktrees) to let multiple LLM-based agents operate concurrently on shared codebases while mitigating interference, ensuring robust integration, and maximizing progress.

CAID is motivated by persistent challenges in multi-agent SWE: concurrent edits often lead to silent conflicts, inconsistent repository states, and integration failures. Unlike prior research, which predominantly focuses on role-based or conversational decomposition, CAID systematically maps human developer workflows (branch-and-merge, dependency management, test-centric validation) onto agent coordination. The empirical evaluation spans two benchmarks, Commit0 (Python library generation) and PaperBench (research paper reproduction), and demonstrates substantial accuracy gains over single-agent baselines.

CAID Architecture and Methodology

SWE Primitives as Coordination Mechanisms

CAID operationalizes SWE primitives directly as agent coordination constructs:

Dependency Graph Modeling: Task decomposition is formalized as a dependency graph, so a subtask is run in parallel only once the files or functions it depends on are in place.
Workspace Isolation (Git Worktree): Each engineer agent is assigned an isolated git worktree, preventing cross-agent overwrites and guaranteeing physical separation of concurrent edits.
Structured Communication (JSON + Git Commit): Manager-engineer interactions avoid natural-language ambiguity by using machine-parsable JSON instructions and explicit commit signals.
Branch-and-Merge Integration: Progress from individual engineers is merged into the main branch through standard git operations, surfacing conflicts for resolution by the responsible agent.
Figure 1: Overview of CAID Workflow illustrating task decomposition, workspace allocation, asynchronous engineer execution, and branch-based integration.
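To make dependency-aware parallelization concrete, the sketch below groups tasks into "waves" in which every task depends only on tasks from earlier waves, so all tasks within one wave could be handed to different engineers at once. It is a minimal illustration, not CAID's planner: the function name `parallel_waves`, the file names, and the wave abstraction itself are assumptions. A truly asynchronous scheduler would dispatch each task as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from collections import defaultdict


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: every task in a wave depends only on tasks in earlier waves.

    `deps` maps each task (e.g. a file) to the set of tasks it depends on;
    every dependency must itself appear as a key.
    """
    unknown = set().union(*deps.values()) - set(deps)
    if unknown:
        raise ValueError(f"undeclared dependencies: {sorted(unknown)}")

    # Count unmet dependencies per task and record which tasks wait on which.
    unmet = {task: len(reqs) for task, reqs in deps.items()}
    dependents = defaultdict(list)
    for task, reqs in deps.items():
        for req in reqs:
            dependents[req].append(task)

    waves = []
    ready = sorted(task for task, n in unmet.items() if n == 0)
    while ready:
        waves.append(ready)
        next_ready = []
        for done in ready:
            for child in dependents[done]:
                unmet[child] -= 1
                if unmet[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Tasks on a cycle never reach zero unmet dependencies.
    if sum(len(wave) for wave in waves) != len(deps):
        raise ValueError("dependency cycle: some tasks can never become ready")
    return waves


deps = {
    "utils.py": set(),
    "models.py": {"utils.py"},
    "io.py": {"utils.py"},
    "train.py": {"models.py", "io.py"},
}
print(parallel_waves(deps))  # [['utils.py'], ['io.py', 'models.py'], ['train.py']]
```

Here `io.py` and `models.py` can go to two engineers concurrently, while `train.py` must wait for both.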

Task Delegation and Execution

The central manager exploits repository-level and paper-level dependency analyses to partition the implementation into discrete, parallelizable units. Delegation decisions prioritize test-executable, high-impact tasks and adapt dynamically as engineers complete subtasks.
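One way to express the delegation priority described above (test-executable first, then highest impact) together with the JSON instructions the manager sends is sketched below. The `Subtask` fields, the `tests_unblocked` impact score, and the message schema are all hypothetical; the source does not specify CAID's actual format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Subtask:
    task_id: str          # hypothetical identifier, e.g. the module to implement
    files: list[str]      # files the engineer may edit in its worktree
    has_tests: bool       # can the result be checked by executable tests?
    tests_unblocked: int  # hypothetical impact score: tests this subtask should make pass


def pick_next(ready: list[Subtask]) -> Optional[Subtask]:
    """Prefer test-executable subtasks, then higher impact; break ties by id."""
    if not ready:
        return None
    return min(ready, key=lambda t: (not t.has_tests, -t.tests_unblocked, t.task_id))


def assignment_message(task: Subtask, engineer: str, branch: str) -> str:
    """Serialize the delegation decision as machine-parsable JSON (hypothetical schema)."""
    return json.dumps(
        {"engineer": engineer, "branch": branch, "task": asdict(task)},
        sort_keys=True,
    )


ready = [
    Subtask("docs", ["README.md"], has_tests=False, tests_unblocked=0),
    Subtask("cli", ["src/cli.py"], has_tests=True, tests_unblocked=5),
    Subtask("parser", ["src/parser.py"], has_tests=True, tests_unblocked=12),
]
chosen = pick_next(ready)
print(assignment_message(chosen, engineer="eng-1", branch="caid/eng-1"))
```

Because the message is plain JSON, the engineer side can validate it with `json.loads` instead of interpreting free-form instructions.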

Engineers self-verify implementations locally, running isolated test suites and resolving errors before upstream integration.
Merge conflicts, surfaced during integration, are resolved by the initiating engineer, maintaining main branch consistency.
The asynchronous event loop enables manager reactivity: task reassignment occurs as soon as any engineer completes their current unit, avoiding idle time.
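The reactive loop above can be sketched with Python's `asyncio`: the manager waits with `FIRST_COMPLETED`, so it wakes as soon as any single engineer finishes and immediately hands that engineer the next queued subtask. This is an illustration under stated assumptions: `engineer` is a hypothetical stand-in whose random sleep replaces real agent work, and the queue is simple FIFO rather than CAID's dependency-aware plan.

```python
import asyncio
import random


async def engineer(name: str, task: str) -> tuple[str, str]:
    """Stand-in for an engineer agent; the sleep simulates variable work time."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return name, task


async def manager(tasks: list[str], engineers: list[str]) -> list[tuple[str, str]]:
    """Keep every engineer busy: reassign as soon as any one of them finishes."""
    queue = list(tasks)
    running = {}  # asyncio.Task -> engineer name
    finished = []

    # Initial fan-out: one subtask per engineer.
    for name in engineers:
        if queue:
            running[asyncio.create_task(engineer(name, queue.pop(0)))] = name

    while running:
        # Wake on the first completion rather than waiting for all engineers.
        done, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for job in done:
            name = running.pop(job)
            finished.append(job.result())
            if queue:  # hand the freed engineer its next subtask immediately
                running[asyncio.create_task(engineer(name, queue.pop(0)))] = name
    return finished


results = asyncio.run(manager(["a", "b", "c", "d", "e"], ["eng-1", "eng-2"]))
print(results)
```

Waiting on all engineers instead (for example with `asyncio.gather` per round) would leave fast engineers idle until the slowest one finishes, which is the idle time the event loop is meant to avoid.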

Empirical Results and Findings

Baseline Comparisons and Branch-and-Merge Impact

CAID delivers robust accuracy improvements relative to single-agent baselines:

PaperBench: CAID yields a 26.7% absolute improvement, with weaker models (e.g., MiniMax 2.5) jumping from 10.4% to 36.7% pass rate under multi-agent execution.
Commit0: Gains are similarly pronounced (14.3% absolute for Python library tasks), with both strong (Claude 4.5 Sonnet) and weak models benefiting from explicit coordination.
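"Absolute" here means percentage points, not a relative ratio. Using only the MiniMax 2.5 PaperBench scores quoted above:

```latex
\Delta_{\text{abs}} = 36.7 - 10.4 = 26.3 \ \text{points}, \qquad
\frac{36.7}{10.4} \approx 3.53 \ \text{(relative)}
```

This per-model gap of 26.3 points is not the same as the 26.7-point headline figure; the summary does not state which model or setting that headline refers to.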

CAID's advantage is not attributable to larger iteration budgets: simply extending single-agent runtime yields negligible improvements and can even regress performance. Instead, explicit parallelism, isolation, and integration are critical, as the paper's tabulated results show.

Figure 2: CAID's iteration utilization and final score outpace single-agent systems across varying iteration budgets.

Scalability and Coordination Tradeoffs

Adding more engineers does not yield monotonic performance gains. Parallelizing beyond the task's intrinsic modularity and the manager's delegation capacity introduces integration instability and overhead.

Figure 3: Performance versus number of engineers; excessive parallelism leads to diminished returns and increased cost.

Ablation shows that physical workspace isolation (via git worktree) is superior to context-level, instruction-based isolation. When repository structure is implicit or delegation is coarse-grained, shared workspaces exacerbate miscoordination.
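Physical isolation and test-gated integration map onto ordinary git commands. The sketch below only builds the command lines (so it can be inspected without a repository); `run` executes them. The git forms used (`git -C`, `git worktree add -b <branch> <path> <base>`, `git merge --no-ff`, `git merge --abort`) are standard git CLI; the `caid/<engineer>` branch names, the `wt-<engineer>` directory layout, and the assumption that `main` is checked out in the primary worktree are hypothetical choices, not CAID's documented setup.

```python
import subprocess
from pathlib import PurePosixPath


def branch_for(engineer: str) -> str:
    return f"caid/{engineer}"  # hypothetical branch naming scheme


def worktree_setup(repo: str, engineer: str, base: str = "main") -> list[list[str]]:
    """Give the engineer its own branch, checked out in a separate directory."""
    path = str(PurePosixPath(repo).parent / f"wt-{engineer}")  # hypothetical location
    return [["git", "-C", repo, "worktree", "add", "-b", branch_for(engineer), path, base]]


def integration_plan(repo: str, engineer: str, tests_passed: bool) -> list[list[str]]:
    """Merge into main (assumed checked out in `repo`) only after the engineer's tests pass."""
    if not tests_passed:
        return []  # nothing reaches main; the engineer keeps iterating in its worktree
    return [["git", "-C", repo, "merge", "--no-ff", branch_for(engineer)]]


def abort_merge(repo: str) -> list[list[str]]:
    """On a conflict, abort so main stays clean; resolution goes back to the engineer."""
    return [["git", "-C", repo, "merge", "--abort"]]


def run(commands: list[list[str]]) -> None:
    """Execute the planned commands; raises CalledProcessError on failure (e.g. a conflict)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


print(worktree_setup("/work/repo", "eng-1"))
print(integration_plan("/work/repo", "eng-1", tests_passed=True))
```

Because each engineer edits files in its own directory, two engineers can never overwrite each other's working copies; overlap only becomes visible, as a merge conflict, at integration time.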

Execution Trajectories and Manager Delegation

CAID's manager-driven delegation determines execution outcomes: targeting critical dependencies (high test-impact files) maximizes pass rates and overall integration quality. Failure modes arise when manager assignments neglect key modules, regardless of agent activity levels.

Figure 4: Divergent execution trajectories (Gantt plots) underline the impact of manager delegation decisions on outcome stability.

Parallelism Limits and Delegation Quality

Scaling up engineers increases theoretical parallelism, but practical progress depends on disciplined task partitioning and manager capacity. Delegation that neglects workspace ownership boundaries creates merge conflicts or fragmented states.
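A simple guard against the overlapping assignments described here is a pre-dispatch check that no file is owned by more than one engineer. The sketch below is hypothetical (the source does not describe such a check) and catches only direct file overlap; disjoint file sets can still conflict semantically, for example when one engineer changes an interface another relies on.

```python
from collections import defaultdict


def ownership_conflicts(assignments: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return each file assigned to more than one engineer, mapped to those engineers."""
    owners = defaultdict(list)
    for engineer, files in sorted(assignments.items()):
        for path in files:
            owners[path].append(engineer)
    return {path: engs for path, engs in sorted(owners.items()) if len(engs) > 1}


assignments = {
    "eng-1": {"core.py", "utils.py"},
    "eng-2": {"utils.py", "cli.py"},
    "eng-3": {"io.py"},
}
print(ownership_conflicts(assignments))  # {'utils.py': ['eng-1', 'eng-2']}
```

A manager could run this before dispatching and either reassign the shared file to a single owner or serialize the two subtasks.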

Figure 5: Overlapping engineer assignments on shared files (N=8) induce integration risk, highlighting delegation's centrality.

Practical and Theoretical Implications

CAID's results indicate that SWE primitives (branch-and-merge, workspace isolation, structured delegation) provide an effective foundation for reliable multi-agent collaboration on long-horizon codebases. The approach outperforms naive sequential execution and ad hoc fallback strategies, which add runtime and cost for negligible gains. Coordination overhead (API cost, wall-clock time) is an acknowledged trade-off; however, for tasks with explicit dependency and integration requirements, that overhead is justified by the gains in correctness.

Generalizing beyond SWE, CAID's architectural principles can extend to any artifact-oriented, dependency-rich domain (document synthesis, research planning), contingent on the availability of isolation and structured integration mechanisms.

Limitations and Future Directions

CAID's effectiveness is bounded by the delegation proficiency of the central manager and intrinsic task modularity. Scaling agent populations requires advances in adaptive, learned task assignment and dependency analysis, potentially via RL-based planning modules. Non-SWE domains lacking explicit version control or test-based validation will demand alternative forms of integration and workspace isolation.

For SWE, optimizing the cost-performance frontier (minimizing redundant verification, streamlining merge decision boundaries) is a promising avenue, as is integrating architectural insights from production-scale agent orchestration frameworks.

Conclusion

CAID establishes branch-and-merge coordination, grounded in SWE primitives, as a default paradigm for multi-agent SWE systems on complex, long-horizon tasks. Explicit task decomposition, workspace isolation, and test-gated integration enable parallel, coordinated agent execution, yielding substantially higher accuracy and robustness than single-agent or loosely coordinated baselines. The methodology addresses fundamental coordination failures in shared-artifact environments and informs the design of future AI workflows requiring scalable, disciplined collaboration.
