LEO-RobotAgent Framework
- LEO-RobotAgent is a language-driven robotic framework that integrates LLMs for automated task planning, modular tool invocation, and high-level reasoning.
- The architecture uses a self-cycling agent model with dynamic history updates and tool evaluations to achieve efficient and adaptive operation.
- Experimental evaluations demonstrate significant improvements in success rate, sim-to-real transfer, and efficiency across platforms like UAVs and wheeled robots.
LEO-RobotAgent is a general-purpose language-driven robotic agent framework that enables LLMs to perform automated operation, task planning, and high-level reasoning across a diverse range of robot types and environments, with a focus on modularity, generalization, efficiency, and robust human-robot interaction (Chen et al., 11 Dec 2025).
1. Formal Structure and Operational Cycle
LEO-RobotAgent is realized as a self-cycling agent, defined as the tuple $\langle \mathcal{M}, \mathcal{T}, H_t, \pi \rangle$:
- $\mathcal{M}$: The LLM, prompted to produce structured JSON outputs and equipped to handle both planning and reasoning.
- $\mathcal{T}$: The set of registered tools, where each tool is a quadruple $(\text{name}, \text{fn}, \text{description}, \text{active})$.
- $H_t$: The dynamic history at iteration $t$, containing the user's task description, all agent messages, tool observations, and human feedback.
- $\pi$: The implicit agent policy, realized through the LLM and driven by the current history $H_t$ and tool set $\mathcal{T}$.
At each step, the system operates as follows:
- The LLM $\mathcal{M}$ processes the system prompt $P_{\text{sys}}$ and $H_t$, outputting a structured object $a_t = (\text{Message}, \text{Action}, \text{Action\_Input})$.
- The Executor dispatches $\text{Action}$ with parameters $\text{Action\_Input}$, returning an observation $o_t$.
- The history buffer is updated: $H_{t+1} = H_t \cup \{a_t, o_t\}$.
- The loop halts if the agent explicitly declares either "Task Completed" or "Cannot Proceed".
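The four-step cycle above can be sketched in a few lines of Python. The `llm` callable, message schema, and tool-table layout are illustrative assumptions; only the loop structure, the history update $H_{t+1} = H_t \cup \{a_t, o_t\}$, and the two halting strings follow the description above.

```python
# Minimal sketch of the self-cycling agent loop (assumed interfaces, not
# the framework's actual API).

def run_agent(llm, tools, task, max_steps=20):
    history = [{"role": "user", "content": task}]              # H_0: task description
    for _ in range(max_steps):
        reply = llm(history)                                   # a_t: structured JSON output
        history.append({"role": "agent", "content": reply})
        if reply.get("Message") in ("Task Completed", "Cannot Proceed"):
            return reply["Message"]                            # explicit halt
        tool = tools[reply["Action"]]                          # Executor dispatch
        observation = tool["fn"](**reply["Action_Input"])      # o_t: tool observation
        history.append({"role": "tool", "content": observation})  # H_{t+1} = H_t ∪ {a_t, o_t}
    return "Step limit reached"
```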
The agent's planning objective is to generate action-parameter sequences that maximize the expected cumulative reward
$$\max_{\pi} \; \mathbb{E}\!\left[\sum_{t} \gamma^{t}\, r(s_t, u_t)\right]$$
subject to system dynamics $s_{t+1} = f(s_t, u_t)$ and state/action constraints $(s_t, u_t) \in \mathcal{C}$, where $r$ is a sparse success reward, $u_t$ are low-level tool parameters, and $s_t$ denotes the current (implicitly represented) world state (Chen et al., 11 Dec 2025).
2. Modular Toolset and Invocation Mechanics
LEO-RobotAgent's extensible tool system decomposes perception, manipulation, planning, and communication actions into callable modules. Each tool is defined as the quadruple $(\text{name}, \text{fn}, \text{description}, \text{active})$:
Registration employs the following API:
```python
def register_tool(name, fn, description, active=True):
    Tools[name] = {"fn": fn, "desc": description, "active": active}
```
The LLM invokes a registered tool by emitting JSON of the form:

```json
{ "Action": "<tool_name>", "Action_Input": { ... parameters ... } }
```
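The registration API and invocation format above can be exercised together. In the following sketch, the tool name `get_pose` and the Executor logic are illustrative assumptions, not the framework's actual code:

```python
# Self-contained sketch: register_tool repeats the registration API shown
# above; the dispatch logic is an assumed Executor for illustration.
Tools = {}

def register_tool(name, fn, description, active=True):
    Tools[name] = {"fn": fn, "desc": description, "active": active}

# Hypothetical perception tool.
register_tool("get_pose", lambda: {"x": 0.0, "y": 0.0},
              "Return the robot's current 2D pose")

def dispatch(action_json):
    """Executor: route one LLM-emitted action object to its registered tool."""
    entry = Tools[action_json["Action"]]
    if not entry["active"]:
        raise RuntimeError(f"tool {action_json['Action']!r} is disabled")
    return entry["fn"](**action_json.get("Action_Input", {}))
```

Keeping the `active` flag in the registry lets tools be toggled at runtime without re-registration.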
3. Bidirectional Human–Robot Interaction
The framework integrates a real-time human–robot interaction loop supporting:
- History-mediated intent grounding: User utterances are appended directly to $H_t$, with the LLM framed as a "partner," explicitly instructed to monitor, interpret, and immediately react to human input.
- Interruptibility and clarification: The policy is modified on the fly by user corrections, formally expressed as $\pi(a_t \mid H_t \cup \{c_t\})$, where $c_t$ is the latest embedded human correction. If ambiguity is detected, the agent may issue an "ask_clarification" meta-action, explicitly routing control to the human for further direction.
- Interleaved pipeline: Human input can preempt LLM-initiated actions, and the LLM will re-plan in context.
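The correction and clarification mechanics above admit a minimal sketch; the message schema and field names here are assumptions, not the framework's actual types:

```python
# Hedged sketch of history-mediated corrections and the ask_clarification
# meta-action (assumed message schema).

def inject_correction(history, correction):
    """Append a human correction c_t so the next policy call sees H_t ∪ {c_t}."""
    history.append({"role": "human", "content": correction})
    return history

def maybe_ask_clarification(reply):
    """If the LLM flagged ambiguity, route control back to the human."""
    if reply.get("Action") == "ask_clarification":
        return reply["Action_Input"].get("question", "Could you clarify?")
    return None  # no ambiguity: proceed with normal dispatch
```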
This architectural strategy systematically lowers barriers for non-expert users in embodied contexts and enables robust mixed-initiative collaboration (Chen et al., 11 Dec 2025).
4. Task Planning and Execution Pipeline
The end-to-end cycle progresses as:
- Task Input: Free-form, natural language instruction from the user.
- LLM Reasoning: Using the fixed system prompt $P_{\text{sys}}$, the LLM generates both explanatory "Message" fields and prescriptive "Action"/"Action_Input" pairs as JSON.
- Execution: Action forwarded to the relevant tool, returning an observation.
- History Update: Augmentation of with all new reasoning, actions, parameters, and observations.
- Iteration: Looping continues until “Task Completed” or “Cannot Proceed”.
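The LLM Reasoning and Execution steps imply that each turn is parsed and validated before dispatch. A hedged sketch, assuming the JSON field names above (the validation policy itself is an illustration, not the framework's actual check):

```python
import json

# Fields a normal (non-terminal) turn is assumed to carry.
REQUIRED = {"Message", "Action", "Action_Input"}
TERMINAL = ("Task Completed", "Cannot Proceed")

def parse_turn(raw):
    """Parse one raw LLM reply and reject malformed non-terminal turns."""
    turn = json.loads(raw)
    missing = REQUIRED - turn.keys()
    if missing and turn.get("Message") not in TERMINAL:
        raise ValueError(f"malformed turn, missing {sorted(missing)}")
    return turn
```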
For motion tools, path generation solves an optimization of the form
$$\min_{\tau,\,u} \int_{0}^{T} \left( w_1\, \lVert \dot{x}(t) \rVert + w_2\, \lVert u(t) \rVert^{2} \right) dt,$$
with $w_1$ and $w_2$ typically weighting travel and effort, respectively (Chen et al., 11 Dec 2025).
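In discrete time, a travel-plus-effort cost of this kind can be evaluated over candidate paths as follows; the weights, waypoint representation, and scalar controls are illustrative assumptions:

```python
import math

def path_cost(waypoints, controls, w1=1.0, w2=0.1):
    """Discretized travel-plus-effort cost: w1 weights total path length,
    w2 weights squared control effort. Weights are illustrative defaults."""
    travel = sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))
    effort = sum(u ** 2 for u in controls)
    return w1 * travel + w2 * effort
```

A planner would evaluate this cost over candidate trajectories and keep the minimizer.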
5. Experimental Evaluation and Comparative Performance
LEO-RobotAgent is validated across UAVs (simulation and real flights), wheeled mobile robots equipped with articulated arms, and complex maps (café-style, urban). The evaluation suite includes delivery, search, and handover tasks, with metrics:
- Success Rate (%)
- Average Time to Completion (s)
- Token Usage (number of LLM tokens)
- Task-specific Score (out of 10)
- Perfect Rate (% fully completed tasks)
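These metrics can be computed from per-trial logs along the following lines. The field names are assumptions, and equating "perfect" with a full score of 10 is an interpretation of the metric definitions above, not the paper's stated rubric:

```python
def summarize(trials):
    """Aggregate per-trial logs into the evaluation metrics listed above.
    Each trial dict is assumed to carry success, time_s, tokens, and score."""
    n = len(trials)
    return {
        "success_rate_pct": 100.0 * sum(t["success"] for t in trials) / n,
        "avg_time_s": sum(t["time_s"] for t in trials) / n,
        "avg_tokens": sum(t["tokens"] for t in trials) / n,
        # Assumption: a "perfect" trial is one scoring the full 10 points.
        "perfect_rate_pct": 100.0 * sum(t["score"] == 10 for t in trials) / n,
    }
```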
Results for prompt engineering configurations are summarized below:
| Method | Success Rate | Token Usage | Time (s) | Time/item (s) |
|---|---|---|---|---|
| Zero-shot | 20% | 32,656 | 175.2 | 183.8 |
| One-shot | 50% | 32,048 | 156.7 | 123.4 |
| CoT | 60% | 37,791 | 155.0 | 126.9 |
| One-shot+CoT | 70% | 44,985 | 180.2 | 172.8 |
Agent architecture comparisons for key tasks:
| Task / Agent | DAS | CGE | DLLMs | TLLMs | LEO-Agent |
|---|---|---|---|---|---|
| Delivery (Score) | 9.16 | 9.34 | 7.94 | 8.38 | 9.16 |
| Searching (Score) | – | 5.38 | 3.13 | 5.88 | 7.88 |
| Handover (Score) | – | – | 4.93 | 4.87 | 7.87 |
| Perfect Rate (Handover) | – | – | 13.3% | 13.3% | 46.7% |
Ablation shows that combining chain-of-thought (CoT) and one-shot prompting lifts the success rate from 20% (zero-shot) to 70%, a 50-percentage-point gain. The single-LLM LEO-Agent structure is empirically more robust than the decoupled (DLLMs/TLLMs) designs (Chen et al., 11 Dec 2025).
6. Generalization, Robustness, and Efficiency
The framework demonstrates substantial sim-to-real transfer: UAV search-and-drop achieves 90% success in simulation and 70% in the real world, with principal failures attributed to low-level tool or control errors rather than LLM planning. Performance plateaus beyond a certain LLM parameter scale, with bounded inference latency in ROS deployments, indicating diminishing returns for further scaling (Chen et al., 11 Dec 2025).
7. Software Stack, Codebase, and Reproducibility
The implementation leverages ROS Noetic for agent-to-tool orchestration, with a modular structure:
- agent_node.py: wraps the LLM, maintains history, dispatches tool calls.
- tools/: each Python file implements a ROS node for perception, control, or summarization.
- web_ui/: React JS front-end for registering tools, monitoring status, inputting tasks, and real-time video overlay.
Launch and reproduction:
- Clone the repository: `github.com/LegendLeoChen/LEO-RobotAgent`
- Install dependencies: `pip install -r requirements.txt; sudo apt install ros-noetic-rosbridge-server`
- Launch the system: `roslaunch leo_agent agent_system.launch`
- Register tools and interact via the web UI on `localhost:8080` (Chen et al., 11 Dec 2025).
This architecture supports direct cross-platform application on UAVs, manipulators, and mobile platforms while maintaining extensibility, interpretability, and integration with modern LLMs.