- The paper introduces a novel dynamic framework that adaptively routes communications via semantic matching, enhancing multi-agent reasoning.
- It employs lightweight need and offer descriptors with a fixed sentence encoder to construct sparse, goal-aligned communication graphs per round.
- Empirical results show up to +17.14 point accuracy gains on challenging benchmarks while significantly reducing token usage and latency.
DyTopo: Dynamic Topology Routing for Multi-Agent Reasoning via Semantic Matching
Motivation and Background
LLM-based multi-agent systems hold promise for multi-round reasoning tasks, but most existing frameworks rely on fixed or scripted communication topologies that stay the same across the whole trajectory and therefore fail to reflect the stage-dependent requirements of iterative problem solving. Existing approaches in both multi-agent reinforcement learning (MARL) and LLM agent orchestration often ignore the need for dynamically adaptive communication pathways, especially as task context and subgoals evolve interactively. This limitation induces inefficient exploration in early rounds and burdens later stages with unnecessary message traffic or context overload.
DyTopo Framework and Methodology
DyTopo addresses these deficiencies by introducing a manager-guided, round-adaptive dynamic topology for multi-agent LLM-based teams. At each communication round, the manager issues a round-level goal, which directly conditions the reasoning trajectory. Each agent autonomously generates lightweight natural-language query ("need") and key ("offer") descriptors describing its information requirements and offered capabilities, respectively. These descriptors are then mapped into a semantic embedding space using a fixed sentence encoder.
A per-round directed agent communication graph is induced via hard-thresholded cosine similarity between queries and keys, yielding a sparse, goal-conditioned topology. Only those agents whose offered key semantically aligns with another’s query will be permitted to route private messages. This approach decouples the agents’ generative outputs from message routing, efficiently aligning information flow with the collaboration’s current subgoal. The full manager–agents workflow incorporates a strict synchronization barrier, ensuring single-pass inference per agent and consistent context/buffer updates.
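The thresholding step described above can be illustrated with a minimal sketch. It assumes the need and offer descriptors have already been embedded as plain float vectors, and that an edge runs from the agent offering information to the agent needing it. The function names, the toy 2-D vectors, and the agent names are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def induce_graph(
    queries: Dict[str, Vector],  # agent -> embedded "need" descriptor
    keys: Dict[str, Vector],     # agent -> embedded "offer" descriptor
    tau: float,                  # hard similarity threshold
) -> Set[Tuple[str, str]]:
    """Return directed edges (sender, receiver) for one round.

    Assumption: an edge sender -> receiver exists when the sender's offer
    matches the receiver's need with cosine similarity >= tau. Self-edges
    are excluded.
    """
    edges: Set[Tuple[str, str]] = set()
    for receiver, q in queries.items():
        for sender, k in keys.items():
            if sender != receiver and cosine(q, k) >= tau:
                edges.add((sender, receiver))
    return edges


# Toy 2-D embeddings standing in for real sentence-encoder outputs.
queries = {"coder": [1.0, 0.0], "tester": [0.0, 1.0], "planner": [1.0, 1.0]}
keys = {"coder": [0.0, 1.0], "tester": [1.0, 0.1], "planner": [-1.0, 0.0]}

for sender, receiver in sorted(induce_graph(queries, keys, tau=0.35)):
    print(f"{sender} -> {receiver}")
```

Because routing depends only on the descriptor embeddings and the threshold, the graph can be recomputed every round without touching the agents' generated content.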
The DyTopo protocol operates under strict efficiency constraints: each agent is limited to a single forward pass per round, and its context is capped by the messages routed to it through the round's topology. The manager orchestrates rounds, updating the context and deciding on early halting. The order of memory updates is determined by a topological sort when the round's graph is acyclic, or by a greedy in-degree heuristic when it contains cycles, which makes the update order deterministic and the trace reproducible.
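One way to obtain a deterministic update order of this kind is sketched below. The exact heuristic is not given here, so the code makes an assumption: it repeatedly picks the remaining agent with the smallest remaining in-degree, breaking ties by name. On an acyclic graph this reduces to Kahn's topological sort; on a cyclic graph it still terminates with a reproducible order. The function name `update_order` and the agent names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def update_order(agents: List[str], edges: Set[Tuple[str, str]]) -> List[str]:
    """Order agents so that senders tend to precede their receivers.

    edges holds (sender, receiver) pairs with no self-edges.
    Assumed heuristic: always take the remaining agent with the lowest
    remaining in-degree, ties broken alphabetically.
    """
    in_degree: Dict[str, int] = {a: 0 for a in agents}
    successors: Dict[str, List[str]] = {a: [] for a in agents}
    for sender, receiver in edges:
        successors[sender].append(receiver)
        in_degree[receiver] += 1

    remaining = set(agents)
    order: List[str] = []
    while remaining:
        # In an acyclic graph some remaining node always has in-degree 0,
        # so this choice coincides with Kahn's algorithm.
        nxt = min(remaining, key=lambda a: (in_degree[a], a))
        order.append(nxt)
        remaining.remove(nxt)
        for receiver in successors[nxt]:
            if receiver in remaining:
                in_degree[receiver] -= 1
    return order


# Acyclic example: c feeds b, b feeds a.
print(update_order(["a", "b", "c"], {("c", "b"), ("b", "a")}))
# Cyclic example: a and b feed each other, a also feeds c.
print(update_order(["a", "b", "c"], {("a", "b"), ("b", "a"), ("a", "c")}))
```

The tie-break on agent name is what makes the result independent of set iteration order, which matters if the trace is to be reproduced exactly.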
Empirical Evaluation and Results
DyTopo was evaluated on a battery of code generation and mathematical reasoning benchmarks (HumanEval, APPS-Competition, MATH-500, Omni-MATH) using multiple open and proprietary LLMs, including MiMo-V2-Flash, GPT-oss-120B, Llama-3-8B, and Qwen3-8B. Across all model–task pairs (16 in total), DyTopo yielded the highest accuracy, exceeding the best non-DyTopo baseline by an average of +6.09 points and up to +17.14 points on the more challenging mathematical datasets. Notable improvements on high-difficulty tasks corroborate the hypothesis that dynamic, content-driven routing is crucial for multi-hop, error-prone reasoning (2602.06039).
Qualitative analyses demonstrate that DyTopo’s evolving communication graphs are strongly goal-aligned: early rounds yield denser, highly interconnected graphs supporting exploration and decomposition, while later rounds collapse to sparse, selective routing for targeted verification and solution assembly. Importantly, round-level traceability enables fine-grained inspection of information flow, error localization, and critical agent–agent dependencies at each reasoning stage.
Ablation studies on the semantic similarity threshold reveal task sensitivity: thresholds that are too low risk over-communication and context bloat, while thresholds that are too high make the graph so sparse that coordination suffers. The empirically optimal threshold varies between tasks (0.3–0.4), reinforcing the need for fine-grained, task-adaptive configuration.
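The density side of this trade-off can be seen by counting how many edges survive at each threshold. The sketch below is illustrative only: the embeddings are toy 2-D vectors, the helper names are hypothetical, and it measures graph density rather than task accuracy, which would require running the full system.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def edge_count(needs: Dict[str, Vector], offers: Dict[str, Vector], tau: float) -> int:
    """Number of (sender, receiver) pairs whose offer/need similarity is >= tau."""
    return sum(
        1
        for receiver, q in needs.items()
        for sender, k in offers.items()
        if sender != receiver and cosine(q, k) >= tau
    )


def sweep(needs: Dict[str, Vector], offers: Dict[str, Vector], taus: List[float]) -> Dict[float, int]:
    """Edge count for each candidate threshold."""
    return {tau: edge_count(needs, offers, tau) for tau in taus}


# Toy descriptors; real ones would come from the fixed sentence encoder.
needs = {"coder": [1.0, 0.0], "tester": [0.0, 1.0], "planner": [1.0, 1.0]}
offers = {"coder": [0.0, 1.0], "tester": [1.0, 0.1], "planner": [-1.0, 0.0]}

for tau, count in sweep(needs, offers, [0.3, 0.75, 0.9]).items():
    print(f"tau={tau}: {count} edges")
```

Raising the threshold can only remove edges, so the count never increases as tau grows. A sweep like this is a cheap first check on whether a candidate threshold leaves any agent isolated before running a full ablation.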
DyTopo's manager-controlled halting mechanism enables rapid convergence: on HumanEval, DyTopo converges in an average of 2.6 rounds and uses 48% of the tokens consumed by the AgentScope baseline while achieving higher accuracy. Latency is also sharply reduced by minimizing irrelevant context and terminating early.
Theoretical and Practical Implications
The structure of the DyTopo framework offers several theoretical and practical advances:
- Separation of content generation and communication routing: Agents describe their needs and offerings, while topology is induced externally and adaptively, providing modularity, interpretability, and scalability for larger agent teams.
- Stage-aware communication structure: Task- and round-conditioned topology adapts seamlessly to the differing informational requirements of exploration, decomposition, verification, and assembly phases within multi-step reasoning.
- Explicit coordination trace: By making the communication graph a first-class, externally observable object, DyTopo facilitates debugging, error analysis, and topology-aware performance optimization.
- Computational efficiency: Token usage and wall-clock time are significantly reduced due to sparse, need-aligned routing and manager-mediated early halting, critical for practical deployment in resource-constrained environments.
Limitations and Failure Modes
While DyTopo demonstrates strong and robust improvements, several caveats merit mention. Performance depends on the quality, relevance, and honesty of the natural-language need/offer descriptors: misleading or underspecified descriptors can result in routing failures and error propagation. Security and privacy concerns may arise if coordination traces and agent outputs contain sensitive information. Dynamic topology introduces hyperparameters (thresholds, max in-degree) that require tuning for optimal performance.
Outlook and Future Directions
DyTopo establishes a new paradigm in multi-agent LLM reasoning systems, shifting emphasis from fixed infrastructure to round-level, task-adaptive, semantically grounded communication. Future research may explore: learned, context-adaptive descriptor generation; tighter integration of the topology induction process with end-task reward signals; scaling to larger, heterogeneous agent pools; and combinatorial optimization of both agent skills and topology formation. An important avenue is extending DyTopo to agent–tool–human collaboration, where agents dynamically invoke tools or seek clarification from a human in the loop, coordinated by adaptive routing.
Conclusion
DyTopo introduces a robust, interpretable, and efficient approach for orchestrating multi-round LLM-based agent collaboration. By leveraging semantic matching of lightweight agent descriptors, it aligns information flow with fine-grained task and reasoning stage requirements, demonstrating superior empirical results across diverse problem domains and LLM backbones. The framework’s capacity for transparent, analyzable coordination offers new possibilities for debugging, optimization, and scaling of agentic reasoning systems in AI.