
LEO-RobotAgent Framework

Updated 3 February 2026
  • LEO-RobotAgent is a language-driven robotic framework that integrates LLMs for automated task planning, modular tool invocation, and high-level reasoning.
  • The architecture uses a self-cycling agent model with dynamic history updates and tool evaluations to achieve efficient and adaptive operation.
  • Experimental evaluations demonstrate significant improvements in success rate, sim-to-real transfer, and efficiency across platforms like UAVs and wheeled robots.

LEO-RobotAgent is a general-purpose language-driven robotic agent framework that enables LLMs to perform automated operation, task planning, and high-level reasoning across a diverse range of robot types and environments, with a focus on modularity, generalization, efficiency, and robust human-robot interaction (Chen et al., 11 Dec 2025).

1. Formal Structure and Operational Cycle

LEO-RobotAgent is realized as a self-cycling agent, defined as the tuple $A = \langle M, T, H, \pi \rangle$:

  • $M$: the LLM, prompted to produce structured JSON outputs and equipped to handle both planning and reasoning.
  • $T = \{\tau_1, \ldots, \tau_K\}$: the set of registered tools, where each tool $\tau_i$ is a quadruple $\langle \text{name}_i, \text{fn}_i, \text{desc}_i, \text{status}_i \rangle$.
  • $H_t$: the dynamic history at iteration $t$, containing the user’s task description, all agent messages, tool observations, and human feedback.
  • $\pi$: the implicit agent policy, realized through the LLM and driven by the current history $H_t$ and tool set $T$.

At each step, the system operates as follows:

  1. The LLM processes the system prompt $P_\text{sys}$ and $H_t$, outputting an object $o_t = \{\text{“Message”}: m_t,\ \text{“Action”}: a_t,\ \text{“Action\_Input”}: p_t\}$.
  2. The Executor dispatches $\tau_{a_t}.\text{fn}(p_t) \rightarrow \text{obs}_t$.
  3. The history buffer is updated: $H_{t+1} = H_t \cup \{o_t, \text{obs}_t\}$.
  4. The loop halts if the agent explicitly declares either “Task Completed” or “Cannot Proceed”.
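
The four steps above can be sketched as a minimal Python loop. This is an illustrative reconstruction, not the authors' implementation: the `llm`, `tools`, and message layout are assumptions, with the halt conditions and history update matching the cycle described.

```python
import json

def run_agent(llm, tools, task, max_iters=20):
    """Minimal sketch of the self-cycling agent loop.

    `llm` is any callable returning the structured JSON string o_t;
    `tools` maps tool names to {"fn": callable}. Both are assumptions
    for illustration, not the framework's actual interfaces.
    """
    history = [{"role": "user", "content": task}]  # H_0: task description
    for _ in range(max_iters):
        # Step 1: LLM emits o_t = {"Message", "Action", "Action_Input"}
        o_t = json.loads(llm(history))
        history.append({"role": "assistant", "content": o_t})
        action = o_t["Action"]
        # Step 4: explicit halt declarations end the loop
        if action in ("Task Completed", "Cannot Proceed"):
            return o_t
        # Step 2: Executor dispatches tau_{a_t}.fn(p_t) -> obs_t
        obs_t = tools[action]["fn"](o_t["Action_Input"])
        # Step 3: H_{t+1} = H_t ∪ {o_t, obs_t}
        history.append({"role": "tool", "content": obs_t})
    return {"Action": "Cannot Proceed", "Message": "iteration budget exhausted"}
```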

The agent’s planning objective is to generate action-parameter sequences $\{a_1, p_1; \ldots; a_N, p_N\}$ that maximize the expected cumulative reward:

$$J = \mathbb{E}_{\pi}\left[\sum_{t=1}^{N} r(s_t, a_t)\right]$$

subject to system dynamics $f(x_t, u_t) = 0$ and state/action constraints $g(x_t, u_t) \leq 0$, where $r(\cdot)$ is a sparse success reward, $u_t$ are low-level tool parameters, and $s_t$ denotes the current (implicitly represented) world state (Chen et al., 11 Dec 2025).

2. Modular Toolset and Invocation Mechanics

LEO-RobotAgent’s extensible tool system decomposes perception, manipulation, planning, and communication actions into callable modules. Each tool is defined as:

$$\tau = \left\lbrace \text{name},\ \text{fn}: \text{Callable},\ \text{desc}: \text{InputSchema} \rightarrow \text{OutputSchema},\ \text{status}: \text{Boolean} \right\rbrace$$

Registration employs the following API:

def register_tool(name, fn, description, active=True):
    Tools[name] = {"fn": fn, "desc": description, "active": active}
Tool invocation is determined by the LLM through structured JSON:

{ "Action": "<tool_name>", "Action_Input": { ... parameters ... } }
The Executor inspects tool status: if the selected action is available and active, the tool function is executed; otherwise, a standardized error observation is returned. This design allows straightforward extension and dynamic adaptation of the agent’s capabilities across hardware and software stacks (Chen et al., 11 Dec 2025).
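
The Executor's dispatch rule can be sketched as follows; the function name and observation format are illustrative assumptions, but the availability check and standardized error observation follow the behavior described above.

```python
def execute(tools, action, params):
    """Sketch of the Executor: run a tool only if it is registered and
    active; otherwise return a standardized error observation.
    `tools` entries follow the register_tool layout shown above."""
    tool = tools.get(action)
    if tool is None or not tool.get("active", False):
        # Standardized error observation for unavailable/inactive tools
        return {"error": f"tool '{action}' unavailable or inactive"}
    try:
        return {"observation": tool["fn"](**params)}
    except Exception as exc:
        # Tool failures surface as observations, not agent crashes
        return {"error": f"tool '{action}' failed: {exc}"}
```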

3. Bidirectional Human–Robot Interaction

The framework integrates a real-time human–robot interaction loop supporting:

  • History-mediated intent grounding: user utterances $U_t$ are appended directly to $H_t$, with the LLM framed as a “partner,” explicitly instructed to monitor, interpret, and immediately react to human input.
  • Interruptibility and clarification: the policy $\pi$ is modified on the fly by user corrections, formally expressed as:

$$\pi_{\text{eff}}(a_t \mid H_t) \propto \exp\left(\text{LLM}\left(\operatorname{softmax}(H_t \cup \delta U)\right)\right)$$

where $\delta U$ is the embedded latest human correction. If ambiguity is detected, the agent may issue an “ask_clarification” meta-action, explicitly routing control to the human for further direction.

  • Interleaved pipeline: Human input can preempt LLM-initiated actions, and the LLM will re-plan in context.

This architectural strategy systematically lowers barriers for non-expert users in embodied contexts and enables robust mixed-initiative collaboration (Chen et al., 11 Dec 2025).
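
The preemption mechanism can be sketched with a simple input queue drained into the history before each LLM step; the queue-based interface is an assumption for illustration, not the framework's actual interaction channel.

```python
from queue import Queue, Empty

def drain_user_input(history, user_queue):
    """Sketch of history-mediated intent grounding: pending user
    utterances U_t are appended to H_t before the next LLM step, so a
    correction delta_U preempts the previously planned action.
    Returns True when the caller should re-plan in context."""
    preempted = False
    while True:
        try:
            utterance = user_queue.get_nowait()
        except Empty:
            break
        history.append({"role": "user", "content": utterance})
        preempted = True
    return preempted
```

Calling this at the top of every agent iteration makes human input interleave naturally with the self-cycling loop: the LLM sees the correction in $H_t$ on its very next step.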

4. Task Planning and Execution Pipeline

The end-to-end cycle progresses as:

  1. Task Input: a free-form, natural-language instruction $d$ from the user.
  2. LLM Reasoning: using the fixed system prompt $P_\text{sys}$, the LLM generates both explanatory “Message” fields and prescriptive “Action”/“Action_Input” pairs as JSON.
  3. Execution: Action forwarded to the relevant tool, returning an observation.
  4. History Update: Augmentation of HH with all new reasoning, actions, parameters, and observations.
  5. Iteration: Looping continues until “Task Completed” or “Cannot Proceed”.

For motion tools, path generation solves the optimization:

$$\min_{u_1 \ldots u_M} \sum_{i=1}^{M} c(u_i) \quad \text{s.t.} \quad x_{i+1} = f(x_i, u_i),\ h(x_i) \in \text{Safe},\ x_0 = \text{current},\ x_M = \text{goal}$$

with $c(u) = \|\Delta \text{pos}\|_2 + \lambda \|\Delta \text{yaw}\|_1$ typically weighting travel and effort (Chen et al., 11 Dec 2025).
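
The per-step cost and the summed objective are straightforward to evaluate; a minimal sketch, where the λ value and the 2-D position tuples are illustrative assumptions:

```python
import math

def step_cost(dpos, dyaw, lam=0.5):
    """c(u) = ||Δpos||_2 + λ|Δyaw|_1 from the motion objective above.
    λ trades translation against rotation effort (0.5 is illustrative)."""
    return math.sqrt(sum(d * d for d in dpos)) + lam * abs(dyaw)

def path_cost(steps, lam=0.5):
    """Total objective Σ c(u_i) over a candidate action sequence,
    given as (Δpos, Δyaw) pairs."""
    return sum(step_cost(dp, dy, lam) for dp, dy in steps)
```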

5. Experimental Evaluation and Comparative Performance

LEO-RobotAgent is validated across UAVs (simulation and real flights), wheeled mobile robots equipped with articulated arms, and complex maps (café-style, urban). The evaluation suite includes delivery, search, and handover tasks, with metrics:

  • Success Rate (%)
  • Average Time to Completion (s)
  • Token Usage (number of LLM tokens)
  • Task-specific Score (out of 10)
  • Perfect Rate (% fully completed tasks)

Results for prompt engineering configurations are summarized below:

| Method       | Success Rate | Token Usage | Time (s) | Time/item (s) |
|--------------|--------------|-------------|----------|---------------|
| Zero-shot    | 20%          | 32,656      | 175.2    | 183.8         |
| One-shot     | 50%          | 32,048      | 156.7    | 123.4         |
| CoT          | 60%          | 37,791      | 155.0    | 126.9         |
| One-shot+CoT | 70%          | 44,985      | 180.2    | 172.8         |

Agent architecture comparisons for key tasks:

| Task / Agent             | DAS  | CGE  | DLLMs | TLLMs | LEO-Agent |
|--------------------------|------|------|-------|-------|-----------|
| Delivery (Score)         | 9.16 | 9.34 | 7.94  | 8.38  | 9.16      |
| Searching (Score)        | 5.38 | 3.13 | 5.88  | –     | 7.88      |
| Handover (Score)         | –    | –    | 4.93  | 4.87  | 7.87      |
| Perfect Rate (Handover)  | –    | –    | 13.3% | 13.3% | 46.7%     |

Ablation reveals that combining one-shot prompting with chain-of-thought (CoT) drives a 50-percentage-point increase in success rate (20% zero-shot → 70% one-shot+CoT). The single-LLM LEO-Agent structure is empirically more robust than decoupled (DLLMs/TLLMs) designs (Chen et al., 11 Dec 2025).

6. Generalization, Robustness, and Efficiency

The framework demonstrates substantial sim-to-real transfer: UAV search-and-drop achieves 90% success in simulation and 70% in the real world, with principal failures attributed to low-level tool or control errors, not to LLM planning. Performance plateaus at LLM parameter scales $P > 20$B, with inference latency $L \approx \alpha P^{0.6}$ in ROS deployments, indicating diminishing returns for further scaling (Chen et al., 11 Dec 2025).
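
Under the reported latency scaling, the constant $\alpha$ cancels in a ratio, so relative latency between two model sizes depends only on parameter counts; a small sketch (the exponent 0.6 is from the scaling above, the model sizes below are illustrative):

```python
def relative_latency(p_large, p_small, exponent=0.6):
    """Relative inference latency under L ≈ αP^0.6: the ratio
    (p_large / p_small)^0.6, independent of α."""
    return (p_large / p_small) ** exponent
```

For example, a 70B model would incur roughly $(70/20)^{0.6} \approx 2.1\times$ the latency of a 20B model, which illustrates why scaling past the observed 20B performance plateau is unattractive.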

7. Software Stack, Codebase, and Reproducibility

The implementation leverages ROS Noetic for agent-to-tool orchestration, with a modular structure:

  • agent_node.py: wraps LLM, maintains history, dispatches tool calls.
  • tools/: each Python file implements a ROS node for perception, control, or summarization.
  • web_ui/: front-end in React JS for registering tools, monitoring status, inputting tasks, and real-time video overlay.

Launch and reproduction:

  1. Clone github.com/LegendLeoChen/LEO-RobotAgent
  2. Install dependencies: pip install -r requirements.txt; sudo apt install ros-noetic-rosbridge-server
  3. Launch system: roslaunch leo_agent agent_system.launch
  4. Register tools and interact via web UI on localhost:8080 (Chen et al., 11 Dec 2025).

This architecture supports direct cross-platform application on UAVs, manipulators, and mobile platforms while maintaining extensibility, interpretability, and integration with modern LLMs.
