Numina-Lean-MCP: Multi-Chain Planner in Lean
- Numina-Lean-MCP is a multi-chain planner that manages asynchronous proof chains for formal and informal reasoning in Lean.
- It integrates specialized tools like Lean-LSP, semantic retrieval, and informal proof sketch generators to enhance automated theorem proving.
- The system employs a scoring model for dynamic chain selection and tool invocation, enabling flexible, modular agentic reasoning workflows.
Numina-Lean-MCP is the multi-chain planner and controller at the center of the Numina-Lean-Agent system. It supports multi-paradigm, agentic reasoning workflows for formal mathematics in the Lean proof assistant. Acting as a dispatcher, it orchestrates parallel chains of mathematical reasoning, dynamically schedules tool invocations, and maintains a unified context for tool outputs. This lets the system combine formal and informal proof strategies, semantic retrieval, and collaborative agent interaction without relying on task-specific pipelines or retrained provers (Liu et al., 20 Jan 2026).
1. Role and Architectural Positioning
Numina-Lean-MCP ("Model-Context Protocol") functions as the conductor within the Numina-Lean-Agent architecture. Its primary responsibility is to maintain and orchestrate multiple active reasoning chains, where each chain can correspond to a subgoal, auxiliary lemma, proof strategy, or exploration avenue. Chains progress asynchronously and are dynamically reprioritized based on their states and urgency. MCP determines, at every iteration, which specialized tool to invoke—including Lean-LSP (Lean’s Language Server Protocol), the LeanDex semantic retrieval engine, an Informal Prover (generator-verifier loop), or an external Discussion Partner (an LLM interface for collaborative brainstorming). This orchestration supports:
- Autonomous, context-driven selection of specialized tools without hard-coded per-theorem scripts.
- Interleaving high-level, informal reasoning—including human-style sketching and outline verification—with formal Lean execution.
- Flexible extensibility: new tools and reasoning capabilities can be introduced under a coherent, unified planning interface.
By handling these tasks generically, Numina-Lean-MCP obviates the need for specialized, pipeline-centric frameworks and supports direct interaction between a general coding agent (such as Claude Code) and the Lean formal system (Liu et al., 20 Jan 2026).
2. Design Principles and Workflow
At its core, Numina-Lean-MCP maintains:
- A queue of active reasoning chains, each encoding its partial proof state, outstanding subgoals, and a "TODO" action specifying its immediate next requirement.
- A registry of tool interfaces:
  - Lean-LSP-MCP: Functions for file outlining, goal state querying, code execution, parallel tactic attempts, local search, and Lean’s semantic search tool (lean_loogle).
  - LeanDex: Cross-package, semantic retrieval of relevant theorems or definitions.
  - Informal Prover: A generator-verifier LLM loop for producing and critically assessing informal human-style proof sketches.
  - Discussion Partner: Access to external LLMs for overcoming strategic impasses or brainstorming.
- A scoring/prioritization module to determine which chain to advance and which tool to utilize based on the current "TODO".
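The three components above can be pictured with a small data-structure sketch. This is only an illustration under assumptions: the names `TodoKind`, `Chain`, and `registry` are hypothetical, and the lambdas stand in for real tool calls, whose actual signatures the source does not give.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class TodoKind(Enum):
    """Kinds of next action a chain can request (hypothetical labels)."""
    LEAN = "lean"
    RETRIEVE = "retrieve"
    INFORMAL = "informal"
    DISCUSS = "discuss"


@dataclass
class Chain:
    """One reasoning chain: partial proof state, open subgoals, next TODO."""
    name: str
    proof_state: str = ""
    subgoals: List[str] = field(default_factory=list)
    todo: Optional[TodoKind] = None
    parent: Optional[str] = None  # name of the parent chain, if any

    @property
    def complete(self) -> bool:
        # A chain counts as complete once no subgoals remain.
        return not self.subgoals


# Registry mapping each TODO kind to a callable tool interface.
# These lambdas are stand-ins, not the real Lean-LSP-MCP or LeanDex calls.
ToolFn = Callable[[Chain], str]
registry: Dict[TodoKind, ToolFn] = {
    TodoKind.LEAN: lambda c: f"lean:{c.name}",
    TodoKind.RETRIEVE: lambda c: f"retrieve:{c.name}",
    TodoKind.INFORMAL: lambda c: f"informal:{c.name}",
    TodoKind.DISCUSS: lambda c: f"discuss:{c.name}",
}

# A queue with one main chain and one auxiliary lemma chain.
queue: List[Chain] = [
    Chain("main", subgoals=["g1", "g2"], todo=TodoKind.LEAN),
    Chain("lemma_a", subgoals=["g3"], todo=TodoKind.RETRIEVE, parent="main"),
]

head = queue[0]
result = registry[head.todo](head)  # dispatch on the head chain's TODO
```

Keeping the TODO as an explicit field is what lets the planner choose a tool without per-theorem scripts: the registry lookup is the only place where tools are named.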
The iterative workflow is as follows:
- Inspect the leading chain in the queue and extract its current TODO, which may target:
  - Dispatching a tactic or proof step to Lean.
  - Requesting retrieval of missing lemmas/theorems.
  - Generating informal (human-like) sketch arguments.
  - Consulting a discussion agent when blocked.
- Select and invoke the appropriate tool interface according to the type of TODO.
- Update the chain’s proof state and feed the results (formal tactic outcomes, semantic retrievals, or informal sketches) back into the agentic context.
- Reprioritize chains based on updated progress and urgency.
- Mark the chain as complete once all subgoals are discharged, merging results into any parent chain if relevant.
- Iterate until all chains are completed and a unified proof script is assembled (Liu et al., 20 Jan 2026).
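The last two steps, merging completed chains into their parents and assembling one script, can be sketched as a depth-first traversal. The source does not specify the merge order, so placing each child's proof before its parent's is an assumption here, as are the names `ChainResult` and `assemble` and the placeholder proof strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChainResult:
    """Output of one completed chain (hypothetical structure)."""
    name: str
    parent: Optional[str]
    proof_lines: List[str] = field(default_factory=list)


def assemble(results: Dict[str, ChainResult], root: str) -> str:
    """Emit child proofs first, then the root's own lines (assumed order)."""
    children = [r for r in results.values() if r.parent == root]
    parts = [assemble(results, child.name) for child in children]
    parts.append("\n".join(results[root].proof_lines))
    return "\n".join(p for p in parts if p)


# Placeholder proof text; real chains would carry Lean source.
results = {
    "main": ChainResult("main", None, ["main: uses lemma_a and lemma_b"]),
    "lemma_a": ChainResult("lemma_a", "main", ["lemma_a: proof text"]),
    "lemma_b": ChainResult("lemma_b", "main", ["lemma_b: proof text"]),
}

script = assemble(results, "main")
```

Emitting auxiliary lemmas before the statement that uses them matches how a Lean file is normally ordered, which is why this order was chosen for the sketch.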
3. Formal Definitions and Planning/Scoring Model
The planning and scoring regime for chain selection and tool invocation is formally defined as follows:
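Let $\mathcal{C} = \{c_1, \dots, c_m\}$ denote the set of active chains.
The score of each chain is given by a weighted combination of three features, with real-valued coefficients $\alpha$, $\beta$, $\gamma$:
$$\mathrm{Score}(c_i) = \alpha\, U(c_i) + \beta\, N(c_i) + \gamma\, P(c_i)$$
where:
- $U(c_i)$ measures urgency or recency for the TODO,
- $N(c_i)$ is the number of remaining subgoals,
- $P(c_i)$ quantifies task completion proximity (e.g., fraction of Lean goals resolved).
Chain selection:
$$c^{*} = \arg\max_{c_i \in \mathcal{C}} \mathrm{Score}(c_i)$$
Tool selection policy, where $\tau$ is the next action type of $c^{*}$:
$$\mathrm{SelectTool}(c^{*}) = \begin{cases} \text{Lean-LSP} & \text{if } \tau = \text{Lean} \\ \text{LeanDex} & \text{if } \tau = \text{Retrieve} \\ \text{Informal Prover} & \text{if } \tau = \text{Informal} \\ \text{Discussion Partner} & \text{if } \tau = \text{Discuss} \end{cases}$$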
The coefficients can be hand-designed or, in principle, learned from prior proof runs (Liu et al., 20 Jan 2026).
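A small numeric sketch may help make the scoring and selection rules concrete. It assumes a linear combination of the three features described above; the weight values, the feature values, and the names `ChainFeatures`, `score`, and `TOOL_FOR_TODO` are invented for illustration and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class ChainFeatures:
    name: str
    urgency: float   # urgency or recency of the TODO
    remaining: int   # number of remaining subgoals
    progress: float  # fraction of Lean goals resolved, in [0, 1]
    todo: str        # "Lean", "Retrieve", "Informal", or "Discuss"


# Hypothetical hand-set weights; the negative weight on `remaining`
# penalizes chains with many open subgoals (an assumption).
ALPHA, BETA, GAMMA = 1.0, -0.5, 2.0


def score(c: ChainFeatures) -> float:
    """Weighted sum of the three features."""
    return ALPHA * c.urgency + BETA * c.remaining + GAMMA * c.progress


# Tool choice keyed on the TODO type of the selected chain.
TOOL_FOR_TODO = {
    "Lean": "Lean-LSP-MCP",
    "Retrieve": "LeanDex",
    "Informal": "Informal Prover",
    "Discuss": "Discussion Partner",
}

chains = [
    ChainFeatures("main", urgency=0.2, remaining=4, progress=0.25, todo="Lean"),
    ChainFeatures("lemma_a", urgency=0.9, remaining=1, progress=0.5, todo="Retrieve"),
]

best = max(chains, key=score)    # chain selection by highest score
tool = TOOL_FOR_TODO[best.todo]  # tool selection by TODO type
```

With these numbers, `main` scores 0.2 - 2.0 + 0.5 = -1.3 and `lemma_a` scores 0.9 - 0.5 + 1.0 = 1.4, so the lemma chain is advanced next and routed to retrieval.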
4. Main Execution Loop
A stylized pseudocode for the MCP’s execution follows:
```
Initialize 𝒞 ← {initial goal chain}; register all tool interfaces
while some c ∈ 𝒞 is not complete:
    For each chain c_i ∈ 𝒞, compute Score(c_i)
    c* ← argmax_{c_i} Score(c_i)
    τ ← next action type of c*   # Lean / Retrieve / Informal / Discuss
    tool ← SelectTool(c*)
    o ← tool.Call(state(c*))
    state(c*) ← UpdateChain(c*, o)
    if new subchains were spawned (e.g., from blueprint decomposition):
        add them to 𝒞
    if chain c* is complete:
        remove c* from 𝒞; merge into parent if needed
return concatenation of all completed chain proofs
```
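For readers who want to step through the loop, the following is a toy, runnable rendering of the pseudocode. Everything in it is a stand-in: the tools are stubs that never contact Lean, the score is a made-up heuristic, subchain spawning is omitted, and the `max_iters` guard is an added safety assumption rather than part of the described system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Chain:
    name: str
    subgoals: List[str]
    todo: str = "Lean"  # next action type
    urgency: float = 0.0
    proof: List[str] = field(default_factory=list)

    def complete(self) -> bool:
        return not self.subgoals


def score(c: Chain) -> float:
    # Toy score: prefer urgent chains with few open subgoals.
    return c.urgency - len(c.subgoals)


def lean_tool(c: Chain) -> str:
    # Stub: pretend Lean closes the first open subgoal.
    return f"closed {c.subgoals[0]}"


def retrieve_tool(c: Chain) -> str:
    # Stub: pretend retrieval found a useful lemma.
    return "found lemma"


TOOLS: Dict[str, Callable[[Chain], str]] = {
    "Lean": lean_tool,
    "Retrieve": retrieve_tool,
}


def update(c: Chain, out: str) -> None:
    """Fold a tool output back into the chain state."""
    if out.startswith("closed "):
        goal = c.subgoals.pop(0)
        c.proof.append(f"-- {goal} discharged")
    else:
        c.proof.append(f"-- note: {out}")
        c.todo = "Lean"  # after retrieval, hand the chain back to Lean


def run(chains: List[Chain], max_iters: int = 100) -> List[str]:
    active, finished = list(chains), []
    for _ in range(max_iters):
        if not active:
            break
        best = max(active, key=score)  # chain selection
        out = TOOLS[best.todo](best)   # tool selection and call
        update(best, out)
        if best.complete():
            active.remove(best)
            finished.append(best)
    # Concatenate proofs in order of completion.
    return [line for c in finished for line in c.proof]


script = run([
    Chain("main", ["g1", "g2"]),
    Chain("lemma_a", ["g3"], todo="Retrieve", urgency=1.0),
])
```

In this run the lemma chain is served first because of its higher urgency and smaller subgoal count, and the main chain is resumed only after the lemma is discharged.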
5. Tool-Orchestrated Reasoning: Case Examples
Numina-Lean-MCP's flexible orchestration is evidenced in several scenarios:
a) Theorem Retrieval
A subgoal requiring a sum-of-squares simplification triggers MCP to invoke lean_loogle("sum_of_squares formula") and lean_local_search, yielding library facts such as Finset.sum_range_succ and the closed formula for $\sum_{i=1}^{n} i^2$, which are then injected into the formal context.
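To illustrate how a retrieved fact such as Finset.sum_range_succ is typically consumed, here is a short Lean 4 sketch of the sum-of-squares identity in a division-free form. It is an illustration written for this article, not a proof from the source; it assumes a recent Mathlib (with the `∑ i ∈ s, f i` notation) and has not been compiled here.

```lean
import Mathlib

open Finset

-- Division-free form of the identity: 6 * (0² + 1² + ... + n²) = n(n+1)(2n+1).
example (n : ℕ) :
    6 * (∑ i ∈ range (n + 1), i ^ 2) = n * (n + 1) * (2 * n + 1) := by
  induction n with
  | zero => simp
  | succ k ih =>
    -- Peel off the last summand, distribute, then apply the hypothesis.
    rw [sum_range_succ, mul_add, ih]
    ring
```

The retrieved lemma does the structural work of splitting off the last term, after which `ring` closes the remaining polynomial identity.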
b) Informal Proof Sketching
Encountering a gap in the induction strategy (as in Putnam B4), MCP invokes the Informal Prover for outline generation:
- The Generator suggests “Use strong induction and rewrite .”
- The Verifier detects a mis-index, prompting refinement.
- After several generator-verifier loops, a correct sketch emerges. Claude Code then translates this outline into formal Lean tactics.
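The generator-verifier interaction described in these steps can be sketched as a bounded refinement loop. This is a minimal illustration, assuming both roles are plain callables; `refine_sketch`, `toy_generate`, and `toy_verify` are hypothetical names, and the toy functions only mimic the mis-index scenario with fixed strings rather than calling any model.

```python
from typing import Callable, Optional, Tuple

# (goal, feedback) -> sketch
Generator = Callable[[str, Optional[str]], str]
# (goal, sketch) -> (accepted, critique)
Verifier = Callable[[str, str], Tuple[bool, str]]


def refine_sketch(
    goal: str,
    generate: Generator,
    verify: Verifier,
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Alternate generation and verification until a sketch is accepted."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        sketch = generate(goal, feedback)
        accepted, critique = verify(goal, sketch)
        if accepted:
            return sketch, round_no
        feedback = critique  # the critique drives the next attempt
    return None, max_rounds  # no accepted sketch within the budget


def toy_generate(goal: str, feedback: Optional[str]) -> str:
    if feedback is None:
        # First attempt is deliberately mis-indexed.
        return "induct on n, using the hypothesis at n+1"
    return "strong induction on n, using the hypothesis at all k < n"


def toy_verify(goal: str, sketch: str) -> Tuple[bool, str]:
    if "n+1" in sketch:
        return False, "index error: the hypothesis is only available below n"
    return True, ""


sketch, rounds = refine_sketch("P(n) for all n", toy_generate, toy_verify)
```

Bounding the number of rounds keeps a stuck sketch from blocking the chain indefinitely; on exhaustion the planner could, for example, switch the chain's TODO to a discussion request.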
c) Formal Proof Execution
Given all required lemmas, MCP calls lean_run_code("by induction n; simp; rw …; field_simp; ring"). Upon receipt of a diagnostic “🏷 goal closed” from Lean-LSP-MCP, the corresponding chain is marked as discharged, and remaining subgoals are reprioritized (Liu et al., 20 Jan 2026).
6. Significance for Agentic Formal Mathematics
Numina-Lean-MCP enables agentic mathematical reasoning that is modular, tool-agnostic, and responsive to the idiosyncrasies of mathematical discovery—including the alternation between informal human-inspired sketches and machine-verifiable Lean proofs. By supporting queues of asynchronous proof chains, flexible tool invocation, and context-aware scheduling, MCP allows the agent to:
- Seamlessly bridge formal and informal reasoning without manual intervention.
- Integrate evolving reasoning tools or LLM backends via a unified control plane.
- Scale from single-proof strategies (e.g., Putnam 2025) to interactive projects (e.g., formalizing the Brascamp–Lieb theorem).
This approach sidesteps the rigidity of pre-baked pipelines and enables open-ended exploration, robust to the unpredictability of mathematical problem solving. A plausible implication is that such a multi-chain, tool-orchestrating dispatcher could become a standard interface pattern for future collaborative theorem proving agents and complex mathematical assistants (Liu et al., 20 Jan 2026).
7. Relation to Broader Formalization Efforts
While Numina-Lean-MCP centers on Lean and formal mathematics, its architectural paradigm—multi-chain planning, dynamic tool orchestration, and autonomous agent-tool interface—addresses general challenges found in symbolic reasoning across domains. In geometry, for instance, systems such as LeanGeo (Song et al., 20 Aug 2025) address integrative workflows by merging analytic and synthetic reasoning. Numina-Lean-MCP’s design is extensible to such cases, allowing the interleaving of synthetic tactics, external SMT solvers, and analytic libraries under a single agentic controller—a capability essential for tackling diverse mathematical domains and enhancing LLM-based automated reasoning (Liu et al., 20 Jan 2026, Song et al., 20 Aug 2025).