
MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch

Published 1 Jun 2025 in cs.AI and cs.SE | (2506.01056v1)

Abstract: Function-calling has enabled LLMs to act as tool-using agents, but injecting thousands of tool schemas into the prompt is costly and error-prone. We introduce MCP-Zero, a proactive agent framework that lets the LLM itself decide when and which external tools to retrieve, thereby assembling a task-specific toolchain from scratch. The framework is built upon three components: (1) Proactive Tool Request, where the model emits a structured <tool_assistant> block that explicitly specifies the desired server and task; (2) Hierarchical Vector Routing, a coarse-to-fine retrieval algorithm that first selects candidate servers and then ranks tools within each server based on semantic similarity; (3) Iterative Proactive Invocation, enabling multi-round, cross-domain toolchain construction with minimal context overhead, and allowing the model to iteratively revise its request when the returned tools are insufficient. To evaluate our approach we also compile MCP-tools, a retrieval dataset comprising 308 MCP servers and 2,797 tools extracted from the official Model-Context-Protocol repository and normalized into a unified JSON schema. Experiments show that MCP-Zero (i) effectively addresses the context overhead problem of existing methods and accurately selects the correct tool from a pool of nearly 3,000 candidates (248.1k tokens); (ii) reduces token consumption by 98% on the APIBank benchmark while maintaining high accuracy; and (iii) supports multi-turn tool invocation with consistent accuracy across rounds. The code and dataset will be released soon.

Summary

  • The paper presents an active tool discovery framework that allows LLM agents to autonomously request and retrieve contextually relevant tools.
  • It employs hierarchical semantic routing with OpenAI text embeddings to efficiently rank candidates, significantly reducing search complexity.
  • The framework achieves remarkable token efficiency and robust scalability, demonstrated by a 98% reduction in token usage and high accuracy in extreme scenarios.

MCP-Zero: Active Tool Discovery for Autonomous LLM Agents

Introduction and Motivation

The MCP-Zero framework addresses a critical bottleneck in the design of autonomous LLM agents: the inefficiency and lack of autonomy in current tool-calling architectures. Existing paradigms either inject comprehensive tool schemas into the prompt—leading to excessive context length and cognitive overload—or rely on static retrieval based on initial user queries, which fails to adapt to evolving task requirements. MCP-Zero proposes a shift from passive tool selection to active tool discovery, restoring decision authority to the agent and enabling dynamic, on-demand capability acquisition.

Framework Architecture

MCP-Zero is built on three synergistic mechanisms:

  1. Active Tool Request: The agent autonomously generates structured requests specifying the required server domain and tool operation, rather than relying on user queries or pre-selected tool sets. This ensures semantic alignment between agent needs and tool documentation.
  2. Hierarchical Semantic Routing: A two-stage retrieval algorithm first filters candidate servers by semantic similarity, then ranks tools within selected servers. Embeddings are computed using OpenAI text-embedding-3-large, and matching leverages both original and LLM-generated server summaries for improved precision.
  3. Iterative Capability Extension: The agent can iteratively refine its tool requests throughout task execution, enabling dynamic construction of cross-domain toolchains and natural fault tolerance. This process continues until suitable tools are found or the agent determines that no external assistance is required.

    Figure 1: MCP-Zero's iterative invocation enables the agent to identify capability gaps and request tools across multiple domains as the task evolves.

This architecture fundamentally reduces context overhead and search complexity, shifting from O(n) exhaustive search over all tools to O(m + k), where m is the number of servers and k is the number of tools per server.
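The coarse-to-fine routing described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy 2-d vectors stand in for text-embedding-3-large embeddings, and the data-structure layout is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(request_vec, servers, top_servers=2, top_tools=3):
    """Coarse-to-fine retrieval: rank servers first, then rank tools only
    within the selected servers, giving O(m + k) instead of O(n)."""
    ranked_servers = sorted(
        servers, key=lambda s: cosine(request_vec, s["embedding"]), reverse=True
    )[:top_servers]
    candidates = []
    for server in ranked_servers:
        for tool in server["tools"]:
            score = cosine(request_vec, tool["embedding"])
            candidates.append((score, server["name"], tool["name"]))
    candidates.sort(reverse=True)
    return candidates[:top_tools]

# Toy embeddings standing in for real text-embedding vectors.
servers = [
    {"name": "filesystem", "embedding": [1.0, 0.1],
     "tools": [{"name": "read_file", "embedding": [1.0, 0.0]},
               {"name": "write_file", "embedding": [0.9, 0.2]}]},
    {"name": "web", "embedding": [0.1, 1.0],
     "tools": [{"name": "fetch_url", "embedding": [0.0, 1.0]}]},
]
print(route([1.0, 0.05], servers, top_servers=1))
```

With a request vector close to the "filesystem" direction, only that server's tools are scored, so the web server's tools never enter the ranking.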

Theoretical Analysis

MCP-Zero's active paradigm is formalized as an information acquisition process, where the agent generates requests to maximize information gain about task completion. The mutual information between the optimal tool set and the request is maximized, reducing uncertainty and focusing attention on relevant subsets of the tool ecosystem. Semantic alignment is improved by operating in the same embedding space as tool documentation, yielding higher retrieval precision compared to user-query-based matching.
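One plausible way to write this objective down (the notation here is assumed; the paper's exact formalization may differ) is to pick the request that maximizes mutual information with the optimal tool set, conditioned on the current task state:

```latex
% q is the agent's request, T^{*} the optimal tool set for the task,
% s the current task state, and I(\cdot;\cdot \mid \cdot) conditional
% mutual information. Notation is illustrative.
q^{*} = \arg\max_{q} \; I\!\left(T^{*};\, q \mid s\right)
```

Intuitively, a well-chosen request reduces the agent's uncertainty about which tools it needs, which is why agent-authored requests outperform matching against the raw user query.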

Key theoretical advantages include:

  • Complexity Reduction: Hierarchical routing reduces the search space from all tools to relevant servers and tools.
  • Semantic Consistency: Agent-generated requests are more closely aligned with tool documentation.
  • Adaptive Capability: The agent's tool discovery evolves with its understanding of the task, supporting progressive breakdown and cross-domain coordination.

MCP-Tools Dataset

To support evaluation, the authors introduce MCP-tools, a retrieval-oriented dataset comprising 308 servers and 2,797 tools from the official Model Context Protocol repository. The dataset includes both original and LLM-generated summaries, pre-computed embeddings, and a standardized schema for semantic matching. This infrastructure enables reproducible evaluation of tool discovery systems and complements performance-focused frameworks such as MCPBench.
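To make the dataset's structure concrete, an entry in a unified schema of this kind might look as follows. The field names below are illustrative assumptions, not the dataset's actual keys:

```python
import json

# Hypothetical MCP-tools entry: one server with its original and
# LLM-generated summaries, plus its tools in a normalized layout.
entry = {
    "server": {
        "name": "filesystem",
        "summary_original": "Secure file operations with configurable access.",
        "summary_llm": "Read, write, and search files on the local disk.",
    },
    "tools": [
        {
            "name": "read_file",
            "description": "Read the complete contents of a file from disk.",
            "parameters": {"path": {"type": "string"}},
        }
    ],
}
print(json.dumps(entry, indent=2))
```

Pre-computed embeddings for the summaries and tool descriptions would be stored alongside entries like this so retrieval never re-embeds the corpus at query time.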

Experimental Evaluation

Needle-in-a-Haystack Experiments

The framework is evaluated under extreme scale conditions, where the agent must retrieve relevant tools from collections ranging from 1 to 2,797 candidates. MCP-Zero demonstrates substantial gains in both accuracy and token efficiency compared to baseline methods.

Figure 2: MCP-Zero achieves strong performance in needle-in-a-haystack tests, outperforming baselines on Claude-3.5-Sonnet and Gemini-2.5-Flash, with GPT-4.1 maintaining high baseline accuracy.

Figure 3: MCP-Zero maintains low average token cost per successful retrieval, even as tool collection size increases.

APIBank Evaluation

On the APIBank dataset, MCP-Zero achieves a 98% reduction in token consumption while maintaining high accuracy across both single-turn and multi-turn scenarios. Notably, MCP-Zero remains robust as the tool pool scales, with accuracy dropping only marginally compared to severe degradation in baseline methods. Query-retrieval baselines stall at 65–72% accuracy, confirming the necessity of agent-authored, semantically aligned requests.

Implementation Guidance

The framework is straightforward to integrate into existing agent systems:

  1. Prompt the LLM to actively request tools using a structured output block.
  2. Curate a lightweight tool index with semantic descriptions and pre-computed embeddings.
  3. Match agent requests to tool documentation via hierarchical semantic routing, feeding top candidates back to the agent for invocation.

A single in-context learning example further improves semantic alignment and retrieval precision, acting as a stylistic and semantic anchor for the agent's requests.
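The three steps above can be tied together in a small control loop. This is a sketch under assumptions: the `llm_step` and `retrieve` callables are hypothetical stand-ins for a real model call and the hierarchical router, and the block format follows the paper's `<tool_assistant>` convention.

```python
import re

# Matches the structured request block the agent is prompted to emit.
TOOL_REQUEST = re.compile(
    r"<tool_assistant>\s*server:\s*(?P<server>\S+)\s*tool:\s*(?P<tool>.+?)\s*</tool_assistant>",
    re.DOTALL,
)

def run_agent(llm_step, retrieve, max_rounds=5):
    """Minimal loop: call the model, detect a <tool_assistant> request,
    retrieve matching tools, and feed them back until the model stops
    asking for capabilities (helper signatures are hypothetical)."""
    context = []
    output = ""
    for _ in range(max_rounds):
        output = llm_step(context)
        match = TOOL_REQUEST.search(output)
        if match is None:
            return output  # no further capability needed
        tools = retrieve(match["server"], match["tool"])
        context.append({"request": match.groupdict(), "tools": tools})
    return output

# Stub model: requests one tool, then finishes.
def fake_llm(context):
    if not context:
        return ("<tool_assistant>\nserver: filesystem\n"
                "tool: read a source file\n</tool_assistant>")
    return "done"

def fake_retrieve(server, tool):
    return [f"{server}.read_file"]

print(run_agent(fake_llm, fake_retrieve))  # prints "done"
```

In a real integration, `llm_step` would append the retrieved tool schemas to the conversation so only the requested tools ever occupy context.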

Implications and Future Directions

MCP-Zero establishes active tool discovery as a fundamental design pattern for scalable, autonomous agent systems. By restoring decision autonomy and enabling dynamic capability acquisition, the framework addresses both practical and theoretical limitations of existing architectures. The synergy with systems such as Alita—where agents can not only discover but also synthesize new tools—points toward self-evolving, cost-aware agentic AI.

Future work should explore:

  • Enhanced matching algorithms incorporating multi-modal descriptions and usage patterns.
  • Packaging MCP-Zero as a dedicated MCP server for meta-discovery.
  • Multi-agent orchestration for collaborative tool sharing and dynamic capability extension.

Conclusion

MCP-Zero represents a significant advancement in autonomous agent design, demonstrating that active, iterative tool discovery yields substantial efficiency gains and robust scalability. The framework's theoretical foundations, empirical validation, and supporting dataset provide a comprehensive infrastructure for future research in agentic AI and tool-augmented reasoning.


Explain it Like I'm 14

What is this paper about?

This paper is about teaching AI assistants (LLMs, or large language models) to be more independent when using tools. Instead of dumping a huge list of possible tools into the AI’s prompt and asking it to pick one, the authors propose “MCP‑Zero,” a system that lets the AI actively ask for the exact tool it needs, when it needs it. This makes the AI faster, more accurate, and better at handling complex, multi-step tasks.

“MCP” stands for Model Context Protocol, a standard way for AIs to connect to many external tools (like file systems, web services, databases). The problem is: the more tools you add, the longer and heavier the AI’s prompt becomes. MCP‑Zero fixes this by letting the AI discover tools on demand, instead of loading everything up front.

What questions were the researchers trying to answer?

In simple terms, the paper asks:

  • Can an AI figure out what it’s missing and ask for the right tools by itself, instead of being given a giant, confusing toolkit?
  • Can it quickly find the best tool out of thousands by searching smartly, not just once at the start?
  • Will this approach still work well in long, multi-turn conversations as tasks change and grow?

How did they do it?

Think of a big “tool mall” with 2,797 tools across 308 “stores” (servers). You don’t want to carry every instruction manual around. Instead, you ask for what you need when you hit a roadblock, and use a smart directory to find the right store and tool. MCP‑Zero does exactly that, using three main ideas:

1) Active Tool Requests

When the AI realizes it needs help (like “I need to read a file” or “I need to run a command”), it writes a short, structured request that says:

  • which “store” (server) it wants
  • what kind of operation (tool) it needs

This is like the AI saying: “I need access to the File System server, and a tool to read files.” Because the AI describes its need clearly, it’s easier to match that request to the right tool documentation.

Example:

<tool_assistant>
server: filesystem
tool: read_file at path src/train.py
</tool_assistant>
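In code, spotting and reading a request like the one above could look like this simple sketch (the exact block format is illustrative, based on the example):

```python
def parse_tool_request(text):
    """Pull the server and tool lines out of a <tool_assistant> block.
    Returns None if the message contains no request."""
    if "<tool_assistant>" not in text:
        return None
    body = text.split("<tool_assistant>")[1].split("</tool_assistant>")[0]
    request = {}
    for line in body.strip().splitlines():
        key, _, value = line.partition(":")
        request[key.strip()] = value.strip()
    return request

message = """<tool_assistant>
server: filesystem
tool: read_file at path src/train.py
</tool_assistant>"""
print(parse_tool_request(message))
# {'server': 'filesystem', 'tool': 'read_file at path src/train.py'}
```

The system then uses these two fields as the query for the tool search described next.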

2) Hierarchical Semantic Routing

Instead of searching all 2,797 tools at once, MCP‑Zero first:

  • narrows down the right servers (like picking the right store in the mall)
  • then ranks tools within those servers to find the best match

“Semantic” means it uses meaning, not exact words, so “read code” can match “open file” if they’re related. This two-step search is faster and more precise, like checking the floor directory to pick the right shop, then finding the right shelf in that shop.

3) Iterative Capability Extension

The AI doesn’t have to decide everything upfront. As the task unfolds, it can request new tools when needed, and build a multi-step chain:

  • read a file → edit code → run a command to test → maybe search the web, etc.

If a tool turns out not to be enough, the AI refines its request and tries again. This makes the system naturally fault-tolerant and adaptable.

Supporting dataset: MCP‑tools

To test MCP‑Zero properly, the authors built a dataset called MCP‑tools:

  • 308 MCP servers and 2,797 tools collected from the official MCP repository
  • Cleaned and summarized to improve matching
  • Pre-computed “embeddings” (numerical meaning representations) so searches are fast

Experiments they ran

  • Needle-in-a-haystack: Can the AI find the right tool when it’s hidden among thousands?
  • APIBank: A standard benchmark for tool calling, with single-turn and multi-turn conversations.

What did they find, and why does it matter?

Here are the main results:

  • Accurate selection at scale: MCP‑Zero can pick the right tool from nearly 3,000 choices while keeping prompts short.
  • Huge cost savings: Up to 98% fewer “tokens” (the tiny chunks of text AIs read), especially on APIBank tests, while keeping high accuracy. Fewer tokens means lower compute cost and faster responses.
  • Strong performance in multi-turn chats: It stays accurate even when the conversation gets longer and the tool ecosystem gets larger.
  • Better than “retrieve once” methods: Systems that pick tools only from the initial user query often miss later needs. MCP‑Zero’s iterative requests fix that.

In short: letting the AI actively ask for tools makes it smarter, cheaper, and more reliable.

What’s the impact of this research?

If we want truly autonomous AI agents, they need to do more than passively choose from a pre-loaded menu. They should notice their limits and actively get the tools they need—just like a good problem-solver does.

MCP‑Zero shows a practical way to:

  • Reduce prompt bloat in large tool ecosystems
  • Scale to thousands of tools without slowing down
  • Build flexible, cross-domain toolchains on the fly
  • Cut costs while keeping or improving accuracy

This could help developers build more capable AI assistants for coding, data work, research, and beyond. Future directions include combining MCP‑Zero with systems that can create brand-new tools when none exist, and turning MCP‑Zero into a “meta-server” that other agents can call to discover tools efficiently.

