PROTEA Defense Architecture
- PROTEA Defense Architecture is a model-agnostic, runtime safety framework for robotic task planning that mitigates adversarial manipulations.
- It employs object filtering and external memory mechanisms to reduce dimensionality and capture history-dependent vulnerabilities in plan execution.
- Evaluated on the HarmPlan dataset, the architecture uses stepwise LLM judgments to achieve high recall in detecting malicious steps while maintaining low latency.
PROTEA Defense Architecture is a model-agnostic, runtime safety framework for robot task planning systems. Designed to counter adversarial manipulations in planners—particularly those utilizing LLMs—PROTEA employs a multi-component approach that addresses both dimensionality and history-related vulnerabilities in plan validation. The architecture operates as middleware, interfacing between symbolic planners and robotic execution modules, with its hallmark feature being the use of instruction-tuned LLMs as stepwise plan judges. The system is evaluated extensively on the HarmPlan dataset, which comprises benign and carefully designed malicious plans stratified by attack difficulty, with diverse foundation models used to instantiate the judge (Altaweel et al., 12 Jan 2026).
1. System Overview and Data Flow
PROTEA is intended for deployment in robotic stacks, with a high-level instruction $g$ and an initial world state $E_0$ passed to a potentially compromised planner. The planner outputs a symbolic plan $\pi = (a_1, \dots, a_n)$, into which adversarial attacks may have injected harmful actions. PROTEA interposes itself before execution, operating through the following pipeline:
- Object Filtering: reduces the input state $E_0$ to the objects relevant to $\pi$, increasing reasoning tractability.
- External Memory Initialization: $E \leftarrow \mathrm{FilterObjects}(E_0, \pi)$, $H \leftarrow \emptyset$; maintains a per-object state graph to capture evolving environmental dynamics across plan steps.
- Stepwise LLM Judgment: For each step $a_t$, $t = 1, \dots, n$:
  - $s_t = \Pr_{\mathrm{LLM}}(\text{safe} \mid a_t, E, H, V)$, where $V$ is the action vocabulary.
  - If $s_t < \theta_{\text{step}}$, halt execution and return an explanation.
  - Otherwise, $E \leftarrow \mathrm{SimUpdate}(E, a_t)$ and $H \leftarrow H \cup \{a_t\}$.
- Plan Verdict: If all $n$ steps are judged safe, $\pi$ is executed; otherwise, execution is prevented and an explanation is logged.
PROTEA delivers a binary verdict per plan: either the plan is allowed for execution, or halted as potentially harmful, accompanied by a reasoned explanation (Altaweel et al., 12 Jan 2026).
2. Threat Model and Formal Safety Metrics
PROTEA assumes an adversary with the following capabilities:
- Insertion or replacement of up to $k$ actions within a plan $\pi$, either in contiguous or scattered positions.
- Manipulation of planner prompts or weights, biasing the planner to produce harmful subplans.
- If the original plan $\pi$ is benign, the adversary may compute an altered plan $\pi'$ such that there exist indices $i_1, \dots, i_k$ where each $a_{i_j}$ is malicious.
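The insertion capability above can be illustrated with a minimal sketch; `inject_malicious` and its argument names are hypothetical, not from the paper:

```python
import random

def inject_malicious(plan, malicious_actions, k, seed=0):
    # Hypothetical adversary: insert up to k malicious actions at
    # scattered (random) positions within an otherwise benign plan,
    # preserving the relative order of the original benign steps.
    rng = random.Random(seed)
    adv = list(plan)
    for action in malicious_actions[:k]:
        adv.insert(rng.randrange(len(adv) + 1), action)
    return adv
```

A contiguous variant would instead insert the malicious actions as a single block at one position.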
Plan Safety Assessment Functions:
- Naïve Judge: computes a single plan-level score $s = \Pr_{\mathrm{LLM}}(\text{safe} \mid \pi, E_0)$ over the entire plan. Accepts $\pi$ if $s \ge \theta$.
- PROTEA Stepwise Scheme: computes $s_t = \Pr_{\mathrm{LLM}}(\text{safe} \mid a_t, E, H, V)$ at each step, with $E \leftarrow \mathrm{SimUpdate}(E, a_t)$ and $H \leftarrow H \cup \{a_t\}$ after each accepted action. The plan is deemed malicious if $s_t < \theta_{\text{step}}$ at any step.
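The contrast between the two schemes can be sketched as follows, with the LLM scorers abstracted as plain callables (a minimal sketch, not the paper's implementation):

```python
def naive_judge(plan, env0, score_plan, theta=0.5):
    # Single-shot judgment: one plan-level safety score for the whole plan.
    return score_plan(plan, env0) >= theta

def stepwise_judge(plan, env0, score_step, sim_update, theta_step=0.5):
    # PROTEA-style judgment: score each action against the evolving
    # state E and accumulated history H; reject at the first step
    # falling below theta_step.
    env, history = env0, []
    for action in plan:
        if score_step(action, env, history) < theta_step:
            return False  # plan deemed malicious at this step
        env = sim_update(env, action)
        history.append(action)
    return True
```

The stepwise scheme also yields the index of the offending step for free, which feeds the explanation PROTEA returns on a halt.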
Detection Metrics:
Standard classification metrics computed over benign and malicious plans—True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN):
| Metric | Formula |
|---|---|
| Precision | $\mathrm{TP} / (\mathrm{TP} + \mathrm{FP})$ |
| Recall (TPR) | $\mathrm{TP} / (\mathrm{TP} + \mathrm{FN})$ |
| F1 Score | $2 \cdot \mathrm{Precision} \cdot \mathrm{Recall} / (\mathrm{Precision} + \mathrm{Recall})$ |
| FPR | $\mathrm{FP} / (\mathrm{FP} + \mathrm{TN})$ |
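These standard formulas translate directly to code; a small helper (illustrative, not from the paper):

```python
def detection_metrics(tp, fp, tn, fn):
    # Standard classification metrics over benign/malicious plans,
    # treating "malicious" as the positive class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # true-positive rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)     # false-positive rate on benign plans
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```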
3. Architectural Modules: Dimensionality and History Handling
PROTEA’s robustness derives from two core mechanisms:
- Dimensionality Mitigation: By object filtering, the architecture reduces the size of the symbolic world-state graph from the full state $E_0$ to the plan-relevant subset $E$, with $|E| \ll |E_0|$. This keeps LLM prompt lengths tractable and tightly focused on plan-relevant entities.
- History Challenge Mitigation: For long-horizon and stealthy adversarial plans (e.g., “consequential” attacks), external memory continually updates the environment state and records past actions; this ensures that the LLM judge’s decision on each action $a_t$ is informed by complete causal context, enabling the detection of multi-step attack chains.
This dual approach directly addresses the curse of dimensionality and history that has previously limited the efficacy of automated plan safety assessment (Altaweel et al., 12 Jan 2026).
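The filtering step can be sketched as a graph query over a symbolic world state, assuming a simple adjacency-set representation (all names here are illustrative):

```python
def filter_objects(world_state, plan):
    # Dimensionality mitigation sketch: keep only objects mentioned in
    # the plan text, plus their one-hop graph neighbours so spatial and
    # containment relations survive filtering.
    # world_state: dict mapping object name -> set of adjacent objects.
    mentioned = {obj for action in plan for obj in world_state if obj in action}
    relevant = set(mentioned)
    for obj in mentioned:
        relevant |= world_state[obj]
    # Restrict edges of the surviving graph to the relevant subset.
    return {o: world_state[o] & relevant for o in relevant if o in world_state}
```

A real deployment would match objects against grounded action arguments rather than raw substrings, but the effect is the same: the judge's prompt only ever sees the plan-relevant slice of the scene.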
4. Core Algorithm and Implementation Details
The principal pseudocode for PROTEA is as follows:
```
E = FilterObjects(E0, pi)
H = set()
for t in range(1, n+1):
    s_t = Pr_LLM("safe" | a_t, E, H, V)
    if s_t < theta_step:
        return "malicious", explanation
    E = SimUpdate(E, a_t)
    H.add(a_t)
return "safe"
```
In practice, the judgment and simulation LLM calls can be executed separately or batched to optimize latency. PROTEA operates in a planner-agnostic fashion and does not require retraining or modification of underlying planning modules (Altaweel et al., 12 Jan 2026).
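A directly runnable rendering of the same loop, with the LLM judge and simulator stubbed out as callables (a sketch under those assumptions; `protea_check` and its parameter names are illustrative):

```python
def protea_check(plan, env0, judge, sim_update, theta_step=0.5, vocab=None):
    # judge(a_t, E, H, V) stands in for the Pr_LLM stepwise call;
    # sim_update(E, a_t) stands in for the simulator update.
    env, history = dict(env0), []
    for t, action in enumerate(plan, start=1):
        s_t = judge(action, env, history, vocab)
        if s_t < theta_step:
            # Halt with a reasoned explanation identifying the step.
            return "malicious", f"step {t} ({action}) scored {s_t:.2f}"
        env = sim_update(env, action)
        history.append(action)
    return "safe", None
```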
5. Evaluation: HarmPlan Dataset and Experimental Results
HarmPlan Dataset:
- Composed of 1,500 plans (≈50% malicious) from VirtualHome, including 750 benign plans covering domains such as food preparation, cleaning, and leisure.
- 91 malicious behaviors across six categories: Fire Hazards, Electrical Hazards, Property Damage, Animal Harm, Item Loss, Poisoning/Contamination.
- Attack plans comprise direct (4–6 consecutive malicious steps) and consequential (6–13 steps with varying degrees of benign interleaving) variants, stratified by difficulty:
- Medium: 2–4 benign separators
- Hard: ≥5 benign separators
- Automated validation ensures the feasibility and semantic correctness of each plan, filling structural gaps with steps such as “open microwave” or “walk to room”.
Detection Performance:
| Model | Precision | Recall |
|---|---|---|
| GPT-4o-mini | 0.902 | 0.940 |
| LLaMA3.3-70B | 0.924 | 0.787 |
- Recall degrades as attack difficulty increases in naïve, single-shot judgment.
- Object Filtering recaptures up to 30% recall for medium and hard plans, especially in models lacking chain-of-thought reasoning.
- PROTEA yields state-of-the-art recall on hard plans for LLaMA3.3-70B, Phi-4, and Mixtral.
- Category-wise analysis: recall above 90% for Fire and Poisoning hazards; Item Loss (keys/phones) remains challenging (≈60% naïve, ≈75% with PROTEA).
- PROTEA exhibits slightly lower precision (over-cautious per-step flags) but maintains high recall.
- Latency per plan: ≈1.2 s (naïve), rising to 2–4 s per plan when stepwise simulation is enabled; memory overhead is on the order of 10 KB.
6. Integration, Deployment, and Practical Considerations
PROTEA is deployed as middleware immediately after the planner and before the execution modules (e.g., trajectory servers in ROS-based stacks):
- The object filtering module is directly realizable using graph queries over existing semantic maps.
- LLM-based Judge and Simulator components can operate on local/edge servers to manage real-time constraints. Lightweight models (e.g., Grok-3-mini, Phi-4) provide feasible local inference for latency-critical applications.
- On detection of a malicious step, PROTEA halts the plan and triggers safe-fallback planners subject to hard-coded safety constraints.
- No changes to planning algorithms or retraining are required; system integration is thus minimally invasive and universally applicable across planning paradigms.
- The architecture preserves plan execution safety while maintaining flexibility and scalability in large-scale or complex environments through dimension reduction and persistent memory.
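The halt-and-fallback behavior described above can be sketched as a thin wrapper between planner output and the executor; every callable here is an illustrative stand-in, not the paper's API:

```python
def execute_with_protea(plan, env0, check, execute, fallback):
    # Deployment sketch: run the safety check between the planner and
    # the execution module. On detection, halt the plan, log the
    # explanation, and hand off to a constrained safe-fallback planner.
    verdict, explanation = check(plan, env0)
    if verdict == "safe":
        return execute(plan)
    fallback(env0)  # safe-fallback planner under hard-coded constraints
    return f"halted: {explanation}"
```

Because the wrapper only consumes the planner's symbolic output, it requires no change to the planning algorithm itself, which is what makes the integration minimally invasive.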
7. Contextual Significance and Implications
PROTEA’s LLM-as-a-Judge paradigm introduces a general-purpose, planner-agnostic runtime defense for robot task planning environments. It is the first architecture to systematically address both high-dimensional reasoning and history-dependent attack detection at execution time. Its ability to halt stealthy multi-step adversarial plans prior to harm, while incurring low computational overhead and memory footprint, provides a practical route toward robust, explainable robot safety validation. A plausible implication is that similar runtime judgment architectures may extend to autonomous planning domains beyond robotics, wherever complex symbolic plans are subject to adversarial manipulation (Altaweel et al., 12 Jan 2026).