Five-Layer Agentic UAV Architecture
- The paper introduces a five-layer UAV system with Perception, Reasoning, Action, Integration, and Learning layers that enable decision-theoretic autonomy.
- It details techniques like sensor fusion, LLM-driven planning, and MPC-guided motion planning to convert high-level goals into executable actions.
- Quantitative metrics demonstrate improved detection confidence, action recommendation rates, and overall performance in complex autonomous missions.
Agentic Unmanned Aerial Vehicles (UAVs) are defined by their capacity for autonomous reasoning, real-time interaction with digital and physical environments, and continuous self-improvement. The Five-Layer Agentic UAV Architecture operationalizes these capabilities through rigorously modular abstraction, loosely coupled interfaces, and deep integration of LLMs and modern control systems, transforming conventional UAV systems limited to rule-based automation into decision-theoretic cognitive agents (Koubaa et al., 14 Sep 2025). This article details the layered structure—Perception, Reasoning, Action, Integration, and Learning—summarizing core functionalities, quantitative workflows, interconnections, and implementation methodologies.
1. Perception Layer
The Perception Layer fuses heterogeneous sensor inputs into a compact, probabilistic world model suitable for downstream reasoning and control. Its primary inputs are high-frequency RGB video (e.g., RealSense D455 at 30 Hz), thermal imagery, 3D LiDAR point clouds, and IMU telemetry (acceleration, gyroscope readings). Sensor streams are processed through:
- Object detection: A YOLOv11 node (ROS 2 package yolo_detector) infers bounding boxes, classes, and detection confidences at 30 Hz.
- Semantic segmentation: Models such as Mask R-CNN extract contextual scene information.
- Sensor fusion: Extended Kalman Filters (EKF) or factor-graph node fusion integrate multimodal detections and inertial measurements to produce a 3D semantic scene graph.
The output is encapsulated in /perception/world_model using a structured message schema (e.g., WorldModel.msg), which includes timestamped UAV pose, probabilistic object lists (including pose, velocity, confidence, and covariance matrices), inter-object semantic relationships, and sensor health status.
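The fields listed above can be mirrored in code for illustration. The following Python dataclasses are a hypothetical sketch of the WorldModel.msg payload, not the paper's exact ROS 2 message definition:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TrackedObject:
    label: str               # semantic class from the detector
    confidence: float        # detection confidence in [0, 1]
    pose: List[float]        # [x, y, z, qx, qy, qz, qw]
    velocity: List[float]    # [vx, vy, vz]
    covariance: List[float]  # row-major 6x6 pose covariance (36 floats)

@dataclass
class WorldModel:
    stamp_ns: int                                        # timestamp of the fused estimate
    uav_pose: List[float]                                # timestamped UAV pose
    objects: List[TrackedObject] = field(default_factory=list)
    relations: List[str] = field(default_factory=list)   # inter-object semantic relationships
    sensor_health: Dict[str, str] = field(default_factory=dict)  # per-sensor status
```

A real deployment would generate the equivalent structure from a .msg file; the dataclass form simply makes the schema's contents explicit.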
Quantitative state propagation follows linearized or nonlinear state-update formulations of the form

x_{k+1} = f(x_k, u_k) + w_k,

where w_k is the system process noise. Detection confidences and pose uncertainties propagate through the data flow (Koubaa et al., 14 Sep 2025). All components execute within a ROS 2 Humble workspace and integrate with simulation environments (e.g., Gazebo) via dedicated plugins, with sensor topics consistently abstracted for plug-and-play module replacement (Tian et al., 4 Jan 2025).
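For the linear special case of this update, an EKF-style prediction step can be sketched as follows. The constant-velocity state layout and noise model here are illustrative assumptions, not the paper's exact process model:

```python
import numpy as np

def ekf_predict(x, P, Q, dt):
    """One prediction step of x_{k+1} = f(x_k, u_k) + w_k for a
    constant-velocity model.

    x: state [px, py, pz, vx, vy, vz]
    P: 6x6 state covariance
    Q: 6x6 process-noise covariance (covariance of w_k)
    """
    F = np.eye(6)
    F[0:3, 3:6] = dt * np.eye(3)   # position integrates velocity over dt
    x_pred = F @ x                 # propagate the mean
    P_pred = F @ P @ F.T + Q       # uncertainty grows by the process noise
    return x_pred, P_pred
```

The propagated covariance P is what downstream layers consume as the pose-uncertainty part of the world model.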
2. Reasoning Layer
The Reasoning Layer is the cognitive and decision-theoretic core, transforming high-level operator goals or mission objectives and perceptual state into explicit, tool-callable action plans. It employs both cloud-based (GPT-4) and locally deployable (Gemma-3 4B) LLMs orchestrated via a ReAct (Reason + Act) workflow on stateful computation graphs (LangGraph). The Reasoning Layer’s responsibilities include:
- Hierarchical plan decomposition: Goals are recursively decomposed into actionable steps, each with preconditions, tool-association, and fallback/reflection logic for handling execution failures.
- LLM-driven action selection: Plan steps are generated from the LLM's softmax probabilities over candidate actions.
- Reflection and replanning: Failure feedback triggers prompt augmentation and adaptive replanning, enabling robust operation under uncertainty.
Action plans are published in structured form (e.g., PlanGraph.msg) to downstream components and include dependencies and explicit “on_fail” triggers. Algorithmic metrics include plan-level success estimates, the Action Recommendation Rate (ARR), and the Contextual Analysis Rate (CAR). ReAct pseudocode implements bidirectional feedback, supporting continual reflection (Koubaa et al., 14 Sep 2025).
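The ReAct workflow with reflection-on-failure can be sketched as a simple loop. The `llm_propose_step` callable below stands in for the LLM invocation and is a hypothetical interface, not an API from the paper:

```python
def react_loop(goal, tools, llm_propose_step, max_steps=10):
    """Minimal Reason + Act loop: the LLM proposes a step, a tool executes
    it, and the observation (including failures) is fed back for replanning."""
    trace = []  # interleaved thoughts, actions, and observations
    for _ in range(max_steps):
        step = llm_propose_step(goal, trace)      # reason over goal + history
        if step["action"] == "finish":
            return {"status": "success", "trace": trace}
        tool = tools[step["action"]]
        try:
            obs = tool(**step.get("args", {}))    # act: call the selected tool
        except Exception as err:
            obs = f"FAILURE: {err}"               # failure feedback triggers replanning
        trace.append({"thought": step.get("thought", ""),
                      "action": step["action"],
                      "observation": obs})
    return {"status": "max_steps_exceeded", "trace": trace}
```

In the described architecture this role is played by a LangGraph stateful computation graph; the loop above only illustrates the control flow.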
3. Action Layer
The Action Layer operationalizes action plans by converting symbolic task steps into executable commands targeting both the UAV flight stack and digital/datalink subsystems:
- Motion planning: Model Predictive Control (MPC) and Rapidly-Exploring Random Tree (RRT*) planners generate safe, dynamically feasible trajectories.
- Flight execution: The PX4 autopilot is interfaced via the mavros bridge, delivering real-time setpoint arrays.
- Safety & monitoring: Collision avoidance modules (e.g., avoidance_node) provide envelope protection.
The Action Layer receives PlanGraph.msg from the Reasoning Layer, processes sequential plan steps (physical or digital), and returns status and diagnostic feedback on /action/status. If an action requires interaction with external systems, a digital tool request is published to /integration/tool_request (Koubaa et al., 14 Sep 2025).
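The physical/digital branching described above can be sketched as a dispatcher. The step schema and the two callback names are illustrative assumptions; only the topic roles come from the text:

```python
def dispatch_plan(plan_steps, execute_setpoint, publish_tool_request):
    """Process sequential plan steps: physical steps go to the flight stack
    (e.g., via the mavros bridge), digital steps become Integration Layer
    tool requests on /integration/tool_request."""
    statuses = []
    for step in plan_steps:
        if step["kind"] == "physical":
            ok = execute_setpoint(step["setpoint"])
        elif step["kind"] == "digital":
            ok = publish_tool_request(step["tool"], step.get("args", {}))
        else:
            ok = False  # unknown step kind
        # feedback that would be reported on /action/status
        statuses.append({"step": step["id"], "ok": bool(ok)})
        if not ok and step.get("on_fail") == "abort":
            break
    return statuses
```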
4. Integration Layer
The Integration Layer performs secure, protocol-governed execution of external digital actions on behalf of the Reasoning and Action layers, enforcing interface compliance and auditability. Key functions and workflows:
- Tool-calling and API interaction: Requests to external APIs (weather, navigation databases), multi-UAV message brokers, or digital twin simulators are managed and routed via defined nodes (e.g., integration_node).
- Protocol mediation: The layer mediates exchanges using the Model Context Protocol (MCP), Agent Communication Protocol (ACP), and agent-to-agent (A2A) schemas for cloud, operator, or distributed swarm interaction.
- Result feedback: API or remote-system responses are wrapped, logged, and returned over /integration/tool_response.
Integration Layer modularity supports rapid adaptation to new datasources, decentralized peer-to-peer operations, and heterogeneous ecosystem integration (Koubaa et al., 14 Sep 2025, Tian et al., 4 Jan 2025).
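The routing, wrapping, and logging duties of this layer can be sketched as a single request handler. The request/response dictionary shapes and the registry are hypothetical; only the topic name and audit requirement come from the text:

```python
import json
import time

def handle_tool_request(request, registry, audit_log):
    """Route a tool request to a registered external client, then wrap and
    log the result in the shape published on /integration/tool_response."""
    name = request["tool"]
    if name not in registry:
        response = {"tool": name, "ok": False, "error": "unknown tool"}
    else:
        try:
            result = registry[name](**request.get("args", {}))
            response = {"tool": name, "ok": True, "result": result}
        except Exception as err:
            response = {"tool": name, "ok": False, "error": str(err)}
    response["stamp"] = time.time()
    audit_log.append(json.dumps(response, default=str))  # auditability
    return response
```

Because tools are looked up in a registry, adding a new data source is a one-line registration rather than a change to the handler — the modularity property the text describes.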
5. Learning Layer
The Learning Layer implements continuous self-calibration and knowledge expansion using mission execution traces, human feedback, and external document corpora:
- Online reinforcement learning: Updates low-level control policies (e.g., MPC cost-function weights) via policy-gradient descent during mission execution.
- LLM prompt/policy refinement: Operator feedback (RLHF) dynamically tunes LLM heuristics and model weights.
- Retrieval-Augmented Generation (RAG): New mission-relevant documents or datasets are indexed for in-context augmentation.
- Cross-mission memory store: The memory_db archives scene graphs and decision logs for recurrent use.
Refined models and prompt templates are deployed to both the PX4 autopilot and local LLM inference engines, closing the adaptive learning loop (Koubaa et al., 14 Sep 2025).
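The online policy-gradient update mentioned above can be illustrated with a toy REINFORCE step over MPC cost weights. The parameterization and episode format are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def reinforce_update(theta, episodes, lr=0.01):
    """One policy-gradient step on cost weights theta.

    episodes: list of (grad_log_prob, episode_return) pairs from rollouts,
    where grad_log_prob is the gradient of the log policy probability
    with respect to theta for that rollout.
    """
    baseline = np.mean([G for _, G in episodes])  # variance-reduction baseline
    grad = np.zeros_like(theta)
    for g, G in episodes:
        grad += g * (G - baseline)                # REINFORCE estimator
    return theta + lr * grad / max(len(episodes), 1)
```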
6. Inter-Layer Interfaces and System Integration
All layers are implemented as loosely coupled ROS 2 nodes with narrowly-scoped topic and service interfaces, promoting independent scalability, testability, and module replacement (e.g., swapping GPT-4 for Gemma-3). The explicit message-passing scheme facilitates both in-silico simulation and real-world deployment, with Gazebo plugins providing sensor emulation for the Perception Layer and mavros bridging to physical autopilots in the Action Layer (Koubaa et al., 14 Sep 2025).
Bidirectional feedback links (Action→Reasoning, Integration→Reasoning) operationalize the ReAct paradigm, while the Learning Layer supervises both low-level control and high-level policy heuristics based on accumulated mission performance metrics. Table-based message schemas and pseudocode enforce reproducibility and ease of integration. The architecture thus supports SAE Levels 4–5 autonomy, transitioning UAVs from semi-automated to agentic, context-adaptive operation.
7. Quantitative Performance and Representative Scenarios
Agentic UAV frameworks, such as the prototype described by Koubaa & Gabr (2025), yield measurable improvements in complex search-and-rescue simulations: detection confidence increases from 0.72 to 0.79, person detection rates rise from 75% to 91%, and Action Recommendation Rates jump from 4.5% to 92%, confirming that integrated LLM reasoning delivers qualitatively new autonomy at modest computational cost (Koubaa et al., 14 Sep 2025).
Exemplar deployments leverage the five-layer pipeline for real-time coordination in swarms, robust navigation in dynamic disaster zones, and digital ecosystem integration, demonstrating principal advances over rule-based and narrowly specialized UAV autonomy (Sapkota et al., 8 Jun 2025, Tian et al., 4 Jan 2025).
References:
- "Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning" (Koubaa et al., 14 Sep 2025)
- "UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility" (Tian et al., 4 Jan 2025)
- "UAVs Meet Agentic AI: A Multidomain Survey of Autonomous Aerial Intelligence and Agentic UAVs" (Sapkota et al., 8 Jun 2025)