IoTGPT Architectures
- IoTGPT architectures are LLM-driven frameworks that decompose natural language instructions into precise, device-specific commands using hierarchical memory and personalization.
- They employ a three-stage LLM inference engine and memory-assisted subtask reuse to significantly reduce latency and improve command accuracy.
- Empirical evaluations show superior performance in speed, cost, and user satisfaction compared to traditional smart home agents.
IoTGPT architectures are LLM-driven agent frameworks optimized for reliable, efficient, and personalized control of Internet of Things (IoT) devices. The architectural design addresses foundational limitations of previous LLM-based smart home agents, notably high inference latency, non-determinism in device control, and poor personalization, by introducing hierarchical decomposition, memory-assisted reuse, and ontology-informed adaptation to user preferences (Yu et al., 8 Jan 2026).
1. System Overview and Data Flow
An IoTGPT system integrates client-side interfaces with a backend orchestration pipeline that processes natural-language (NL) instructions for IoT environments. The data flow is as follows:
- User Interaction: A mobile application collects a spoken or typed NL command.
- Backend Orchestration: The backend server sequentially processes the command through modules: Instruction Parser, Subtask Decomposer, Memory Module, LLM Inference Engine (Decompose/Derive/Refine), Personalization Module, Two-Step Correction, and Device Controller.
- Device Actuation: Finalized low-level JSON commands are dispatched to IoT devices via REST or WebSockets, enabling control over diverse hardware (e.g., lights, air conditioners, sensors).
The architecture emphasizes subtask decomposition of NL instructions, hierarchical task memory for reuse, and context-aware personalization, as described in detail below.
2. Core Architectural Modules
2.1 Instruction Parser
The Instruction Parser classifies incoming natural-language commands using a prompt to the LLM, distinguishing between Direct Control, Trigger-Action Rule, and Device Query classes. It then invokes the IoT platform API to obtain the current inventory of device names, capabilities, and specifications. The output is a structured data object containing the parsed command type, the original instruction, and the enumerated device list.
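The parser's output can be pictured as a small structured record. The sketch below is illustrative only: `ParsedInstruction`, `parse`, and the stub `classify` callable stand in for the paper's actual data types and LLM prompt, which are not specified here.

```python
from dataclasses import dataclass, field

@dataclass
class ParsedInstruction:
    command_type: str        # "direct_control" | "trigger_action" | "device_query"
    instruction: str         # original NL command
    devices: list = field(default_factory=list)  # inventory from the IoT platform API

def parse(instruction: str, device_inventory: list, classify) -> ParsedInstruction:
    # `classify` stands in for the LLM prompt that picks one of the three classes.
    return ParsedInstruction(
        command_type=classify(instruction),
        instruction=instruction,
        devices=device_inventory,
    )

parsed = parse(
    "Turn on the sleep light",
    [{"name": "sleep light", "capabilities": ["switch"]}],
    classify=lambda _: "direct_control",  # stub classifier in place of the LLM
)
```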
2.2 Subtask Decomposer
The Subtask Decomposer maps high-level or ambiguous instructions into granular, device-specific subtasks. The algorithm constructs a prompt encapsulating the user command and the device list, requesting the LLM to emit a JSON-formatted list of subtasks. A typical output would be:
```json
[
  {"subtask": "Adjust air conditioner temperature", "device": "air conditioner"},
  {"subtask": "Set humidifier level", "device": "humidifier"},
  {"subtask": "Dim the sleep light", "device": "sleep light"}
]
```
This stage is invoked only if no matching task structure exists in memory.
2.3 LLM Inference Engine (“Decompose–Derive–Refine” Pipeline)
The LLM Inference Engine orchestrates three sequential stages, each defined by specialized prompts:
- Decompose: Performed only for new instructions, generating subtasks as above.
- Derive: For each subtask, and if not reusable from memory, the LLM translates the subtask into a device-compatible JSON command template, referencing IoT API documentation fetched via retrieval-augmented generation (RAG). Example template:
```json
{
  "desc": "Set temperature to [temperature_value]",
  "device": {"name": "air conditioner"},
  "capability": {"command": "setCoolingSetpoint"},
  "value": {"decimal": "[temperature_value]"}
}
```
- Refine: All pipelines invoke this stage to concretize parameters, filling placeholders with values drawn from user preference tables or defaults.
Memory lookup precedes each stage to maximize reuse, minimizing LLM calls and latency.
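A minimal executable sketch of this memory-gated pipeline follows. `Memory`, `StubLLM`, and the template fields are illustrative stand-ins, not the paper's API; the call counter makes the reuse saving visible.

```python
class Memory:
    """Caches subtask lists per instruction and command templates per subtask."""
    def __init__(self):
        self.tasks, self.templates = {}, {}

class StubLLM:
    """Stands in for the real LLM; counts calls to expose memory-based reuse."""
    def __init__(self):
        self.calls = 0
    def decompose(self, instruction):
        self.calls += 1
        return [{"subtask": "Adjust air conditioner temperature",
                 "device": "air conditioner"}]
    def derive(self, subtask, api_doc):
        self.calls += 1
        return {"desc": "Set temperature to [temperature_value]",
                "capability": {"command": "setCoolingSetpoint"},
                "value": {"decimal": "[temperature_value]"}}
    def refine(self, template):
        self.calls += 1
        command = dict(template)
        command["value"] = {"decimal": 18}  # concretized from preferences/defaults
        return command

def run_pipeline(instruction, memory, llm, api_doc=None):
    subtasks = memory.tasks.get(instruction)
    if subtasks is None:                       # Decompose only for new instructions
        subtasks = llm.decompose(instruction)
        memory.tasks[instruction] = subtasks
    commands = []
    for st in subtasks:
        template = memory.templates.get(st["subtask"])
        if template is None:                   # Derive only on a memory miss
            template = llm.derive(st, api_doc)
            memory.templates[st["subtask"]] = template
        commands.append(llm.refine(template))  # Refine always runs
    return commands

memory, llm = Memory(), StubLLM()
first = run_pipeline("cool the bedroom for sleep", memory, llm)   # 3 LLM calls
second = run_pipeline("cool the bedroom for sleep", memory, llm)  # 1 LLM call (Refine only)
```

The second invocation hits memory at both the task and subtask levels, so only the Refine call reaches the LLM.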
2.4 Hierarchical Memory Module
The Memory Module implements a hierarchical directed acyclic graph (DAG) of three node types:
- TaskNode: Encodes full instructions, storing text and embedding vectors.
- SubtaskNode: Stores subtask names and their command templates.
- ContextNode: Associates context keywords (e.g., “sleeping”) with specific parameter bindings.
Edges connect TaskNode → SubtaskNode → ContextNode. Retrieval employs cosine similarity thresholding (on instruction and subtask names) and context keyword string equality, enabling efficient reuse at multiple abstraction levels. Memory is continually updated after successful execution or human correction.
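TaskNode retrieval can be sketched as a cosine-similarity threshold search. The 0.85 threshold and the toy 3-d embeddings below are assumptions for illustration; real deployments would use text-embedding vectors and a tuned threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def find_task(query_embedding, task_nodes, threshold=0.85):
    """Return the most similar stored TaskNode, or None when nothing clears
    the threshold (which forces a fresh Decompose stage)."""
    best, best_sim = None, threshold
    for node in task_nodes:
        sim = cosine(query_embedding, node["embedding"])
        if sim >= best_sim:
            best, best_sim = node, sim
    return best

# Toy embeddings standing in for real instruction vectors.
nodes = [
    {"text": "prepare the room for sleep", "embedding": [1.0, 0.0, 0.0]},
    {"text": "leave home",                 "embedding": [0.0, 1.0, 0.0]},
]
```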
2.5 Personalization Module
Personalization unfolds in two phases:
- A. Preference Extraction: Offline or periodically, the LLM processes user device interaction logs, aided by the EUPont ontology, to map commands onto environmental properties (e.g., temperature, humidity). The LLM partitions each property's numeric range into discrete levels (e.g., “low”, “medium”, “high”), generating context-specific tables of preferred device settings.
- B. Preference Reflection: At runtime, context keywords select the relevant preference table. Placeholders in command templates are filled by mapping discrete preference levels to concrete values, e.g., mapping “low” AC temperature preference to 18°C within a permitted range. Subtasks may be dynamically injected based on inferred user priorities (e.g., increased security during absence).
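One plausible binding scheme, evenly partitioning the permitted range across the discrete levels, can be sketched as follows. The paper leaves the concrete mapping to the LLM-built preference tables, so `bind_preference` and its even partition are an assumption for illustration.

```python
def bind_preference(level, allowed_min, allowed_max,
                    levels=("low", "medium", "high")):
    """Map a discrete preference level onto the device's permitted numeric
    range by even partition (one plausible scheme, not the paper's exact one)."""
    step = (allowed_max - allowed_min) / (len(levels) - 1)
    return allowed_min + levels.index(level) * step

# "low" AC temperature preference within a permitted 18-26 degC range -> 18 degC
setpoint = bind_preference("low", 18, 26)
```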
2.6 Two-Step Correction and Device Controller
The correction mechanism involves simulated execution of commands (virtual simulation) followed by LLM-driven self-correction based on error logs. Persisting errors trigger optional human-in-the-loop review, after which the memory is updated. The Device Controller formats final JSON for execution against frameworks like Samsung SmartThings via REST.
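The simulate-then-self-correct loop can be sketched as below. The `simulate` and `self_correct` stubs are hypothetical stand-ins for the virtual simulator and the LLM repair prompt; here the simulator rejects commands missing a "capability" field and "self-correction" patches it in.

```python
def correct_and_execute(commands, simulate, self_correct, max_retries=3):
    """Virtually simulate commands, then repair them from the error log;
    after max_retries the caller escalates to human-in-the-loop review."""
    ok, errors = False, []
    for _ in range(max_retries):
        ok, errors = simulate(commands)
        if ok:
            break
        commands = self_correct(commands, errors)
    return commands, ok

attempts = {"n": 0}

def simulate(cmds):
    attempts["n"] += 1
    valid = all("capability" in c for c in cmds)
    return valid, ([] if valid else ["missing capability field"])

def self_correct(cmds, errors):
    # Stub "LLM repair": add the field the simulator complained about.
    return [{**c, "capability": {"command": "setCoolingSetpoint"}} for c in cmds]

fixed, ok = correct_and_execute([{"device": {"name": "air conditioner"}}],
                                simulate, self_correct)
```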
3. End-to-End Workflow
The following pseudocode defines the primary workflow:
```
function HandleInstruction(instruction):
    parsed = InstructionParser.parse(instruction)
    if match = Memory.findTask(parsed.instructionText):
        subtasks = match.subtasks
    else:
        subtasks = Decompose(parsed.instructionText, parsed.deviceList)
        Memory.storeNewTask(instruction, subtasks)
    finalCommands = []
    for subtask in subtasks:
        if stMatch = Memory.findSubtask(subtask.name):
            template = stMatch.commandTemplate
        else:
            template = Derive(instruction, subtask, API_DOC)
            Memory.storeNewSubtask(subtask.name, template)
        contextKey = extractContext(instruction)
        bindings = Memory.findContextBindings(subtask.name, contextKey)
        if not bindings:
            bindings = PersonalizationModule.getBindings(contextKey)
            Memory.storeContext(contextKey, bindings)
        command = applyBindings(template, bindings)
        finalCommands.append(command)
    for attempt in 1..MaxRetries:
        result = VirtualSimulator.execute(finalCommands)
        if result.success:
            break
        errors = result.errors
        finalCommands = LLM.selfCorrect(finalCommands, errors)
    if not result.success or userWantsReview:
        finalCommands = HumanInLoop.reviewAndEdit(finalCommands)
        Memory.updateAfterHumanFeedback(finalCommands)
    DeviceController.execute(finalCommands)
    return success
```
4. Mathematical Performance Models
Inference latency for IoTGPT is modeled as:

$$T_{\text{IoTGPT}} = n_{\text{dec}}\,\ell_{\text{Dec}} + n_{\text{der}}\,\ell_{\text{Der}} + n_{\text{ref}}\,\ell_{\text{Ref}},$$

where $n_{\text{dec}}$ and $n_{\text{der}}$ are counts of new decompositions and derivations, $n_{\text{ref}}$ is the number of Refine calls (one per subtask), and $\ell_{\text{Dec}}$, $\ell_{\text{Der}}$, and $\ell_{\text{Ref}}$ are average latencies for the Decompose, Derive, and Refine LLM calls, respectively.

Compared to a baseline monolithic agent with $n_{\text{base}}$ calls at average latency $\ell_{\text{base}}$:

$$T_{\text{base}} = n_{\text{base}}\,\ell_{\text{base}}.$$

Because $n_{\text{dec}} + n_{\text{der}} \ll n_{\text{base}}$ due to memory reuse, IoTGPT achieves lower runtime.

The cost model, assuming per-call cost $c$:

$$C_{\text{IoTGPT}} = c\,(n_{\text{dec}} + n_{\text{der}} + n_{\text{ref}}), \qquad C_{\text{base}} = c\,n_{\text{base}}.$$

IoTGPT with memory yields $C_{\text{IoTGPT}} < C_{\text{base}}$.

Reliability, measured as the fully correct task completion rate $R$, achieves a marginal improvement $\delta > 0$ per reused subtask, formalized as:

$$R = R_0 + \delta\,n_{\text{reuse}},$$

where $R_0$ is the completion rate without reuse and $n_{\text{reuse}}$ is the number of subtasks served from memory.
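Plugging illustrative numbers into the latency and cost models makes the saving concrete. All values below are assumed for the sake of the example, not reported in the paper.

```python
# Assumed: a batch of 10 instructions where memory reuse leaves only
# 2 fresh decompositions and 5 fresh derivations; Refine runs once each.
l_dec, l_der, l_ref = 1.2, 0.9, 0.6   # average per-call latencies (s), assumed
n_dec, n_der, n_ref = 2, 5, 10

t_iotgpt = n_dec * l_dec + n_der * l_der + n_ref * l_ref   # ~12.9 s

# Monolithic baseline: three LLM calls for every instruction, no reuse.
n_base, l_base = 30, 0.9
t_base = n_base * l_base                                    # ~27.0 s

# Cost follows the same call counts at a per-call price c (assumed).
c = 0.002
c_iotgpt = c * (n_dec + n_der + n_ref)   # ~0.034
c_base = c * n_base                      # ~0.060
```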
5. Empirical Evaluation
5.1 Command Accuracy, Latency, and Cost
IoTGPT demonstrates statistically significant improvements over state-of-the-art LLM-driven baselines such as Sasha and SAGE:
[Table: IoTGPT vs. baseline systems on STR (%)↑, ICR (%)↓, SER (%)↓, ECR (%), Latency (sec)↓, and Cost↓; numeric entries not preserved.] IoTGPT's advantage is statistically significant (p < 0.05), demonstrating improvements in delivering all necessary commands, appropriateness of device selection, and parameter adjustment accuracy.
All pairwise differences favor IoTGPT.

6. Design Implications and Impact

The compositional Decompose–Derive–Refine pipeline, in conjunction with a hierarchical DAG-based task memory and device-agnostic personalization, underpins substantial advancements: +35% strict command accuracy (STR) over prior LLM-only agents, –34% latency and –25% inference cost versus the strongest baseline, and +2 points in user-perceived personalization (on a 7-point scale). These results demonstrate that structured subtasking, fine-grained reuse, and adaptive preference modeling are key enablers for reliable, cost-effective, and user-centered automation in smart environments (Yu et al., 8 Jan 2026). A plausible implication is that this architectural paradigm is transferable to other IoT and multi-step instruction domains facing similar compositionality, efficiency, and personalization challenges.