IoTGPT Architectures

Updated 21 February 2026

IoTGPT architectures are LLM-driven frameworks that decompose natural language instructions into precise, device-specific commands using hierarchical memory and personalization.
They employ a three-stage LLM inference engine and memory-assisted subtask reuse to significantly reduce latency and improve command accuracy.
Empirical evaluations show superior performance in speed, cost, and user satisfaction compared to traditional smart home agents.

IoTGPT architectures are LLM-driven agent frameworks optimized for reliable, efficient, and personalized control of Internet of Things (IoT) devices. The architectural design addresses foundational limitations of previous LLM-based smart home agents, notably the high inference latency, non-determinism in device control, and poor personalization, by introducing hierarchical decomposition, memory-assisted reuse, and ontology-informed adaptation to user preferences (Yu et al., 8 Jan 2026).

1. System Overview and Data Flow

An IoTGPT system integrates client-side interfaces with a backend orchestration pipeline that processes natural-language (NL) instructions for IoT environments. The data flow is as follows:

User Interaction: A mobile application collects a spoken or typed NL command.
Backend Orchestration: The backend server sequentially processes the command through modules: Instruction Parser, Subtask Decomposer, Memory Module, LLM Inference Engine (Decompose/Derive/Refine), Personalization Module, Two-Step Correction, and Device Controller.
Device Actuation: Finalized low-level JSON commands are dispatched to IoT devices via REST or WebSockets, enabling control over diverse hardware (e.g., lights, air conditioners, sensors).

The architecture emphasizes subtask decomposition of NL instructions, hierarchical task memory for reuse, and context-aware personalization, as described in detail below.

2. Core Architectural Modules

2.1 Instruction Parser

The Instruction Parser classifies incoming natural-language commands using a prompt to the LLM, distinguishing between Direct Control, Trigger-Action Rule, and Device Query classes. It then invokes the IoT platform API to obtain the current inventory of device names, capabilities, and specifications. The output is a structured data object containing the parsed command type, the original instruction, and the enumerated device list.

2.2 Subtask Decomposer

The Subtask Decomposer maps high-level or ambiguous instructions into granular, device-specific subtasks. The algorithm constructs a prompt encapsulating the user command and the device list, requesting the LLM to emit a JSON-formatted list of subtasks. A typical output would be:

[
  {"subtask": "Adjust air conditioner temperature", "device": "air conditioner"},
  {"subtask": "Set humidifier level", "device": "humidifier"},
  {"subtask": "Dim the sleep light", "device": "sleep light"}
]

This stage is invoked only if no matching task structure exists in memory.

2.3 LLM Inference Engine (“Decompose–Derive–Refine” Pipeline)

The LLM Inference Engine orchestrates three sequential stages, each defined by specialized prompts:

Decompose: Performed only for new instructions, generating subtasks as above.

Derive: For each subtask, and if not reusable from memory, the LLM translates the subtask into a device-compatible JSON command template, referencing IoT API documentation fetched via retrieval-augmented generation (RAG). Example template:

{
  "desc": "Set temperature to [temperature_value]",
  "device": {"name": "air conditioner"},
  "capability": {"command": "setCoolingSetpoint"},
  "value": {"decimal": "[temperature_value]"}
}

Refine: All pipelines invoke this stage to concretize parameters, filling placeholders with values drawn from user preference tables or defaults.

Memory lookup precedes each stage to maximize reuse, minimizing LLM calls and latency.

2.4 Hierarchical Memory Module

The Memory Module implements a hierarchical directed acyclic graph (DAG) of three node types:

TaskNode: Encodes full instructions, storing text and embedding vectors.
SubtaskNode: Stores subtask names and their command templates.
ContextNode: Associates context keywords (e.g., “sleeping”) with specific parameter bindings.

Edges connect TaskNode → SubtaskNode → ContextNode. Retrieval employs cosine similarity thresholding (on instruction and subtask names) and context keyword string equality, enabling efficient reuse at multiple abstraction levels. Memory is continually updated after successful execution or human correction.

2.5 Personalization Module

Personalization unfolds in two phases:

A. Preference Extraction: Offline or periodically, the LLM processes user device interaction logs, aided by the EUPont ontology, to map commands onto environmental properties (e.g., temperature, humidity). The LLM partitions each property's numeric range into discrete levels (e.g., “low”, “medium”, “high”), generating context-specific tables of preferred device settings.
B. Preference Reflection: At runtime, context keywords select the relevant preference table. Placeholders in command templates are filled by mapping discrete preference levels to concrete values, e.g., mapping “low” AC temperature preference to 18°C within a permitted range. Subtasks may be dynamically injected based on inferred user priorities (e.g., increased security during absence).

2.6 Two-Step Correction and Device Controller

The correction mechanism involves simulated execution of commands (virtual simulation) followed by LLM-driven self-correction based on error logs. Persisting errors trigger optional human-in-the-loop review, after which the memory is updated. The Device Controller formats final JSON for execution against frameworks like Samsung SmartThings via REST.

3. End-to-End Workflow

The following pseudocode defines the primary workflow:

function HandleInstruction(instruction):
  parsed = InstructionParser.parse(instruction)

  if match = Memory.findTask(parsed.instructionText):
    subtasks = match.subtasks
  else:
    subtasks = Decompose(parsed.instructionText, parsed.deviceList)
    Memory.storeNewTask(instruction, subtasks)

  finalCommands = []
  for subtask in subtasks:
    if stMatch = Memory.findSubtask(subtask.name):
      template = stMatch.commandTemplate
    else:
      template = Derive(instruction, subtask, API_DOC)
      Memory.storeNewSubtask(subtask.name, template)

    contextKey = extractContext(instruction)
    bindings = Memory.findContextBindings(subtask.name, contextKey)
    if not bindings:
      bindings = PersonalizationModule.getBindings(contextKey)
      Memory.storeContext(contextKey, bindings)

    command = applyBindings(template, bindings)
    finalCommands.append(command)

  for attempt in 1..MaxRetries:
    result = VirtualSimulator.execute(finalCommands)
    if result.success: break
    errors = result.errors
    finalCommands = LLM.selfCorrect(finalCommands, errors)

  if not result.success or userWantsReview:
    finalCommands = HumanInLoop.reviewAndEdit(finalCommands)
    Memory.updateAfterHumanFeedback(finalCommands)

  DeviceController.execute(finalCommands)
  return success

Task and subtask reuse occur at the earliest opportunity, minimizing redundant reasoning and maximizing inference efficiency.

4. Mathematical Performance Models

Inference latency for IoTGPT is modeled as:

$T_{\text{IoTGPT}} = N_{\mathrm{decomp}} T_d + N_{\mathrm{derive}} T_r + 1 \cdot T_f$

where $N_{\mathrm{decomp}}$ and $N_{\mathrm{derive}}$ are counts of new decompositions and derivations, and $T_d, T_r, T_f$ are average latencies for the Decompose, Derive, and Refine LLM calls, respectively.

Compared to a baseline monolithic agent with $N_{\mathrm{base}}$ calls at average latency $T_b$ :

$T_{\text{base}} = N_{\mathrm{base}} T_b$

Because $N_{\mathrm{decomp}} + N_{\mathrm{derive}} < N_{\mathrm{base}}$ due to memory reuse, IoTGPT achieves lower runtime.

The cost model, assuming per-call cost $\alpha$ :

$C = \alpha \cdot N_{\mathrm{calls}}$

IoTGPT with memory yields $N_{\mathrm{calls,IoTGPT}} = N_d + N_r + 1$ , lower than $N_{\mathrm{calls,base}} = N_{\mathrm{base}}$ for baselines.

Reliability, measured as fully correct task completion rate $R$ , achieves marginal improvement per reused subtask ( $m_{\mathrm{reuse}}$ ), formalized as:

$\Delta R = R_{\mathrm{IoTGPT}} - R_{\mathrm{base}} = (1 - R_{\mathrm{base}}) m_{\mathrm{reuse}}$

5. Empirical Evaluation

5.1 Command Accuracy, Latency, and Cost

IoTGPT demonstrates statistically significant improvements over state-of-the-art LLM-driven baselines such as Sasha and SAGE:

System

STR (%)↑

ICR (%)↓

SER (%)↓

ECR (%)

Latency (sec)↓

Cost (

)↓</th> </tr> </thead><tbody><tr> <td>Sasha</td> <td>49.5</td> <td>38.2</td> <td>12.6</td> <td>28.4</td> <td>80.3</td> <td>3.70</td> </tr> <tr> <td>SAGE</td> <td>64.1</td> <td>24.7</td> <td>9.4</td> <td>18.5</td> <td>48.5</td> <td>2.10</td> </tr> <tr> <td>IoTGPT</td> <td>83.5</td> <td>10.2</td> <td>4.3</td> <td>12.6</td> <td>32.0</td> <td>1.56</td> </tr> <tr> <td>IoTGPT (+mem)</td> <td>101.8¹</td> <td>0.0</td> <td>0.0</td> <td>0.0</td> <td>22.0</td> <td>1.10</td> </tr> </tbody></table></div> <p>¹STR can exceed 100% relative to ground-truth strictness in &quot;warm-start&quot; rephrasing phase.</p> <h3 class='paper-heading' id='personalization-effectiveness'>5.2 Personalization Effectiveness</h3> <p>A user study (7-point scale, Wilcoxon

p<0.05$) demonstrates significant improvements in delivering all necessary commands, appropriateness of device selection, and parameter adjustment accuracy:

Question	Sasha	SAGE	IoTGPT
All necessary commands?	2.47	3.68	5.95
Devices appropriate?	2.44	3.52	6.28
Parameter adjustments right?	2.20	3.04	6.02

All pairwise differences favor IoTGPT.

6. Design Implications and Impact

The compositional Decompose–Derive–Refine pipeline, in conjunction with a hierarchical DAG-based task memory and device-agnostic personalization, underpins substantial advancements: +35% strict command accuracy (STR) over prior LLM-only agents, –34% latency and –25% inference cost versus the strongest baseline, and +2 points in user-perceived personalization (on a 7-point scale). These results demonstrate that structured subtasking, fine-grained reuse, and adaptive preference modeling are key enablers for reliable, cost-effective, and user-centered automation in smart environments (Yu et al., 8 Jan 2026).

A plausible implication is that this architecture paradigm is transferable to other IoT and multi-step instruction domains suffering from similar compositionality, efficiency, and personalization challenges.

Markdown Report Issue Upgrade to Chat

References (1)

Leveraging LLMs for Efficient and Personalized Smart Home Automation (2026)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to IoTGPT Architectures.