- The paper introduces a Cognitive Control Architecture that employs a dual-layer approach combining an Intent Graph and a Tiered Adjudicator to defend against indirect prompt injection attacks.
- The methodology leverages proactive planning and reactive deep reasoning, achieving a 97% reduction in attack success on the AgentDojo benchmark.
- The results imply that robust lifecycle supervision can harmonize security, functionality, and efficiency for deploying autonomous AI agents in high-stakes environments.
Cognitive Control Architecture for Robustly Aligned AI Agents
Introduction
LLM agents have demonstrated considerable potential in autonomously completing sophisticated tasks across diverse domains. However, this autonomy brings with it significant security challenges. One of the foremost threats these agents face is Indirect Prompt Injection (IPI) attacks, where external information contaminated with malicious instructions can hijack agent behavior, leading to unauthorized tool usage and objective deviations. The paper "Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents" (2512.06716) introduces a novel architectural framework to robustly align AI agents against such vulnerabilities, thereby addressing the systemic fragility inherent in existing defenses.
Figure 1: An illustrative example of a multi-step Indirect Prompt Injection (IPI) attack.
Architectural Design
The proposed Cognitive Control Architecture (CCA) is constructed on two synergistic pillars for achieving full-lifecycle cognitive supervision: the Intent Graph and the Tiered Adjudicator. The intent behind these pillars is to ensure both control-flow and data-flow integrity throughout the agent's operation.
The Intent Graph serves as the proactive layer, defining a legitimate sequence of tool calls predicated on the user's goal. This graph acts as a template against which all actions proposed by the agent are verified to ensure they comply with predetermined sequences and data-origin rules (Figure 2).
Figure 2: The Cognitive Control Architecture (CCA) operates in two layers, providing structured supervision through Intent Graphs and deep reasoning via Tiered Adjudicators.
The Tiered Adjudicator operates as the reactive layer, initiating deep reasoning upon detection of deviations from the Intent Graph. It utilizes a multi-faceted Intent Alignment Score, composed of semantic alignment, causal contribution, source provenance, and inherent action risk assessments, to adjudicate actions that deviate from expected trajectories. This mechanism is vital for countering sophisticated attacks by evaluating the depth of alignment between current actions and user goals.
Evaluation and Results
The effectiveness of CCA was rigorously validated on the AgentDojo benchmark, a testbed comprising 97 multi-step tasks emulating real-world operational complexity. Across these experiments, CCA demonstrated superior resilience against sophisticated IPI attacks, reducing the Attack Success Rate (ASR) by over 97% compared to an undefended baseline. Critically, it maintained high levels of Benign Utility (BU) and Utility Under Attack (UA), showcasing its ability to preserve functionality while providing robust security.



Figure 3: Trade-off analysis between DSR (defense success rate) and TSR (task success rate) in CCA.
Implications and Future Developments
The implications of CCA are multifaceted. Practically, it offers a blueprint for deploying AI agents in high-stakes environments, where security and task completion reliability are paramount. Theoretically, it challenges existing paradigms that treat security, functionality, and efficiency as mutually exclusive trade-offs, demonstrating that a well-architected supervision system can reconcile these dimensions. Future developments could explore dynamic graph refinement and context-aware risk modeling to enhance adaptability in scenarios where plans cannot be fully anticipated upfront.
Conclusion
The Cognitive Control Architecture provides a robust and scalable solution for aligning autonomous AI agents, effectively addressing security vulnerabilities while minimizing functional compromises. Its layered approach, synergizing proactive integrity assurance with reactive alignment adjudication, sets a new standard for agent security frameworks. As AI agents continue to evolve, CCA will likely play a pivotal role in their safe and efficient deployment across increasingly complex applications.