
LEO-RobotAgent Framework

Updated 3 February 2026
  • LEO-RobotAgent is a language-driven robotic framework that integrates LLMs for automated task planning, modular tool invocation, and high-level reasoning.
  • The architecture uses a self-cycling agent model with dynamic history updates and tool evaluations to achieve efficient and adaptive operation.
  • Experimental evaluations demonstrate significant improvements in success rate, sim-to-real transfer, and efficiency across platforms like UAVs and wheeled robots.

LEO-RobotAgent is a general-purpose language-driven robotic agent framework that enables LLMs to perform automated operation, task planning, and high-level reasoning across a diverse range of robot types and environments, with a focus on modularity, generalization, efficiency, and robust human-robot interaction (Chen et al., 11 Dec 2025).

1. Formal Structure and Operational Cycle

LEO-RobotAgent is realized as a self-cycling agent, defined as the tuple $A = \langle M, T, H, \pi \rangle$:

  • $M$: the LLM, prompted to produce structured JSON outputs and equipped to handle both planning and reasoning.
  • $T = \{\tau_1, \ldots, \tau_K\}$: the set of registered tools, where each tool $\tau_i$ is a quadruple $\langle \text{name}_i, \text{fn}_i, \text{desc}_i, \text{status}_i \rangle$.
  • $H_t$: the dynamic history at iteration $t$, containing the user’s task description, all agent messages, tool observations, and human feedback.
  • $\pi$: the implicit agent policy, realized through the LLM and driven by the current history $H_t$ and tool set $T$.

At each step, the system operates as follows:

  1. The LLM processes the system prompt $P_\text{sys}$ and $H_t$, outputting an object $o_t = \{\text{“Message”}: m_t,\ \text{“Action”}: a_t,\ \text{“Action\_Input”}: p_t\}$.
  2. The Executor dispatches $\tau_{a_t}.\text{fn}(p_t) \rightarrow \text{obs}_t$.
  3. The history buffer is updated: $H_{t+1} = H_t \cup \{o_t, \text{obs}_t\}$.
  4. The loop halts if the agent explicitly declares either “Task Completed” or “Cannot Proceed”.
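
The four steps above can be sketched as a minimal Python loop. This is an illustrative reconstruction, not the authors' implementation: the `llm`, `tools`, and message layout are assumptions, with the halt conditions and history update matching the cycle described.

```python
import json

def run_agent(llm, tools, task, max_iters=20):
    """Minimal sketch of the self-cycling agent loop.

    `llm` is any callable returning the structured JSON string o_t;
    `tools` maps tool names to {"fn": callable}. Both are assumptions
    for illustration, not the framework's actual interfaces.
    """
    history = [{"role": "user", "content": task}]  # H_0: task description
    for _ in range(max_iters):
        # Step 1: LLM emits o_t = {"Message", "Action", "Action_Input"}
        o_t = json.loads(llm(history))
        history.append({"role": "assistant", "content": o_t})
        action = o_t["Action"]
        # Step 4: explicit halt declarations end the loop
        if action in ("Task Completed", "Cannot Proceed"):
            return o_t
        # Step 2: Executor dispatches tau_{a_t}.fn(p_t) -> obs_t
        obs_t = tools[action]["fn"](o_t["Action_Input"])
        # Step 3: H_{t+1} = H_t ∪ {o_t, obs_t}
        history.append({"role": "tool", "content": obs_t})
    return {"Action": "Cannot Proceed", "Message": "iteration budget exhausted"}
```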

The agent’s planning objective is to generate action-parameter sequences $\{a_1, p_1; \ldots; a_N, p_N\}$ that maximize the expected cumulative reward:

$$J = \mathbb{E}_{\pi}\left[\sum_{t=1}^{N} r(s_t, a_t)\right]$$

subject to system dynamics $f(x_t, u_t) = 0$ and state/action constraints $g(x_t, u_t) \leq 0$, where $r(\cdot)$ is a sparse success reward, $u_t$ are low-level tool parameters, and $s_t$ denotes the current (implicitly represented) world state (Chen et al., 11 Dec 2025).

2. Modular Toolset and Invocation Mechanics

LEO-RobotAgent’s extensible tool system decomposes perception, manipulation, planning, and communication actions into callable modules. Each tool is defined as:

$$\tau = \left\lbrace \text{name},\ \text{fn}: \text{Callable},\ \text{desc}: \text{InputSchema} \rightarrow \text{OutputSchema},\ \text{status}: \text{Boolean} \right\rbrace$$

Registration employs the following API:

def register_tool(name, fn, description, active=True):
    Tools[name] = {"fn": fn, "desc": description, "active": active}
Tool invocation is determined by the LLM through structured JSON:

{ "Action": "<tool_name>", "Action_Input": { ... parameters ... } }
The Executor inspects tool status: if the selected action is available and active, the tool function is executed; otherwise, a standardized error observation is returned. This design allows straightforward extension and dynamic adaptation of the agent’s capabilities across hardware and software stacks (Chen et al., 11 Dec 2025).
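
The Executor's dispatch rule can be sketched as follows; the function name and observation format are illustrative assumptions, but the availability check and standardized error observation follow the behavior described above.

```python
def execute(tools, action, params):
    """Sketch of the Executor: run a tool only if it is registered and
    active; otherwise return a standardized error observation.
    `tools` entries follow the register_tool layout shown above."""
    tool = tools.get(action)
    if tool is None or not tool.get("active", False):
        # Standardized error observation for unavailable/inactive tools
        return {"error": f"tool '{action}' unavailable or inactive"}
    try:
        return {"observation": tool["fn"](**params)}
    except Exception as exc:
        # Tool failures surface as observations, not agent crashes
        return {"error": f"tool '{action}' failed: {exc}"}
```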

3. Bidirectional Human–Robot Interaction

The framework integrates a real-time human–robot interaction loop supporting:

  • History-mediated intent grounding: user utterances $U_t$ are appended directly to $H_t$, with the LLM framed as a “partner,” explicitly instructed to monitor, interpret, and immediately react to human input.
  • Interruptibility and clarification: the policy $\pi$ is modified on the fly by user corrections, formally expressed as:

$$\pi_{\text{eff}}(a_t \mid H_t) \propto \exp\left(\text{LLM}\left(\operatorname{softmax}(H_t \cup \delta U)\right)\right)$$

where $\delta U$ is the embedded latest human correction. If ambiguity is detected, the agent may issue an “ask_clarification” meta-action, explicitly routing control to the human for further direction.

  • Interleaved pipeline: Human input can preempt LLM-initiated actions, and the LLM will re-plan in context.

This architectural strategy systematically lowers barriers for non-expert users in embodied contexts and enables robust mixed-initiative collaboration (Chen et al., 11 Dec 2025).
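
The preemption mechanism can be sketched with a simple input queue drained into the history before each LLM step; the queue-based interface is an assumption for illustration, not the framework's actual interaction channel.

```python
from queue import Queue, Empty

def drain_user_input(history, user_queue):
    """Sketch of history-mediated intent grounding: pending user
    utterances U_t are appended to H_t before the next LLM step, so a
    correction delta_U preempts the previously planned action.
    Returns True when the caller should re-plan in context."""
    preempted = False
    while True:
        try:
            utterance = user_queue.get_nowait()
        except Empty:
            break
        history.append({"role": "user", "content": utterance})
        preempted = True
    return preempted
```

Calling this at the top of every agent iteration makes human input interleave naturally with the self-cycling loop: the LLM sees the correction in $H_t$ on its very next step.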

4. Task Planning and Execution Pipeline

The end-to-end cycle progresses as:

  1. Task Input: a free-form, natural-language instruction $d$ from the user.
  2. LLM Reasoning: using the fixed system prompt $P_\text{sys}$, the LLM generates both explanatory “Message” fields and prescriptive “Action”/“Action_Input” pairs as JSON.
  3. Execution: Action forwarded to the relevant tool, returning an observation.
  4. History Update: Augmentation of HH with all new reasoning, actions, parameters, and observations.
  5. Iteration: Looping continues until “Task Completed” or “Cannot Proceed”.

For motion tools, path generation solves the optimization:

$$\min_{u_1 \ldots u_M} \sum_{i=1}^{M} c(u_i) \quad \text{s.t.} \quad x_{i+1} = f(x_i, u_i),\ h(x_i) \in \text{Safe},\ x_0 = \text{current},\ x_M = \text{goal}$$

with $c(u) = \|\Delta \text{pos}\|_2 + \lambda \|\Delta \text{yaw}\|_1$ typically weighting travel and effort (Chen et al., 11 Dec 2025).
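
The per-step cost and the summed objective are straightforward to evaluate; a minimal sketch, where the λ value and the 2-D position tuples are illustrative assumptions:

```python
import math

def step_cost(dpos, dyaw, lam=0.5):
    """c(u) = ||Δpos||_2 + λ|Δyaw|_1 from the motion objective above.
    λ trades translation against rotation effort (0.5 is illustrative)."""
    return math.sqrt(sum(d * d for d in dpos)) + lam * abs(dyaw)

def path_cost(steps, lam=0.5):
    """Total objective Σ c(u_i) over a candidate action sequence,
    given as (Δpos, Δyaw) pairs."""
    return sum(step_cost(dp, dy, lam) for dp, dy in steps)
```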

5. Experimental Evaluation and Comparative Performance

LEO-RobotAgent is validated across UAVs (simulation and real flights), wheeled mobile robots equipped with articulated arms, and complex maps (café-style, urban). The evaluation suite includes delivery, search, and handover tasks, with metrics:

  • Success Rate (%)
  • Average Time to Completion (s)
  • Token Usage (number of LLM tokens)
  • Task-specific Score (out of 10)
  • Perfect Rate (% fully completed tasks)

Results for prompt engineering configurations are summarized below:

| Method       | Success Rate | Token Usage | Time (s) | Time/item (s) |
|--------------|--------------|-------------|----------|---------------|
| Zero-shot    | 20%          | 32,656      | 175.2    | 183.8         |
| One-shot     | 50%          | 32,048      | 156.7    | 123.4         |
| CoT          | 60%          | 37,791      | 155.0    | 126.9         |
| One-shot+CoT | 70%          | 44,985      | 180.2    | 172.8         |

Agent architecture comparisons for key tasks:

| Task / Agent             | DAS  | CGE  | DLLMs | TLLMs | LEO-Agent |
|--------------------------|------|------|-------|-------|-----------|
| Delivery (Score)         | 9.16 | 9.34 | 7.94  | 8.38  | 9.16      |
| Searching (Score)        | 5.38 | 3.13 | 5.88  | –     | 7.88      |
| Handover (Score)         | –    | –    | 4.93  | 4.87  | 7.87      |
| Perfect Rate (Handover)  | –    | –    | 13.3% | 13.3% | 46.7%     |

Ablation reveals that combining one-shot prompting with chain-of-thought (CoT) drives a 50-percentage-point increase in success rate (20% zero-shot → 70% one-shot+CoT). The single-LLM LEO-Agent structure is empirically more robust than decoupled (DLLMs/TLLMs) designs (Chen et al., 11 Dec 2025).

6. Generalization, Robustness, and Efficiency

The framework demonstrates substantial sim-to-real transfer: UAV search-and-drop achieves 90% success in simulation and 70% in the real world, with principal failures attributed to low-level tool or control errors, not to LLM planning. Performance plateaus at LLM parameter scales $P > 20$B, with inference latency $L \approx \alpha P^{0.6}$ in ROS deployments, indicating diminishing returns for further scaling (Chen et al., 11 Dec 2025).
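
Under the reported latency scaling, the constant $\alpha$ cancels in a ratio, so relative latency between two model sizes depends only on parameter counts; a small sketch (the exponent 0.6 is from the scaling above, the model sizes below are illustrative):

```python
def relative_latency(p_large, p_small, exponent=0.6):
    """Relative inference latency under L ≈ αP^0.6: the ratio
    (p_large / p_small)^0.6, independent of α."""
    return (p_large / p_small) ** exponent
```

For example, a 70B model would incur roughly $(70/20)^{0.6} \approx 2.1\times$ the latency of a 20B model, which illustrates why scaling past the observed 20B performance plateau is unattractive.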

7. Software Stack, Codebase, and Reproducibility

The implementation leverages ROS Noetic for agent-to-tool orchestration, with a modular structure:

  • agent_node.py: wraps LLM, maintains history, dispatches tool calls.
  • tools/: each Python file implements a ROS node for perception, control, or summarization.
  • web_ui/: front-end in React JS for registering tools, monitoring status, inputting tasks, and real-time video overlay.

Launch and reproduction:

  1. Clone github.com/LegendLeoChen/LEO-RobotAgent
  2. Install dependencies: pip install -r requirements.txt; sudo apt install ros-noetic-rosbridge-server
  3. Launch system: roslaunch leo_agent agent_system.launch
  4. Register tools and interact via web UI on localhost:8080 (Chen et al., 11 Dec 2025).

This architecture supports direct cross-platform application on UAVs, manipulators, and mobile platforms while maintaining extensibility, interpretability, and integration with modern LLMs.
