Undefined security model for handling malicious third‑party content in LLM-based agents
Develop a concrete security model and policy for large language model–based agents operating within multi-agent systems. Such a model would specify how agents should handle malicious third-party content, including trust boundaries and semantics for inputs, actions, data, and metadata, so that agents can distinguish benign from adversarial inputs and avoid harming users.
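One ingredient of such a policy would be explicit trust labels on every input an agent consumes, with sensitive actions gated on provenance. The sketch below illustrates the idea under assumptions not taken from the source: the `Trust` levels, the `TaggedInput` wrapper, and the `SENSITIVE_ACTIONS` set are all hypothetical names, and a real policy would need far richer semantics.

```python
from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    """Hypothetical trust levels for content entering an agent."""
    USER = "user"                 # direct instruction from the principal
    AGENT = "agent"               # output of a peer agent in the system
    THIRD_PARTY = "third_party"   # fetched web pages, emails, tool results

@dataclass(frozen=True)
class TaggedInput:
    """Content paired with a provenance-derived trust label."""
    content: str
    trust: Trust

# Illustrative set of actions that can harm the user if triggered adversarially.
SENSITIVE_ACTIONS = {"execute_code", "send_email", "write_file"}

def may_trigger(action: str, source: TaggedInput) -> bool:
    """Example policy: only USER-trusted input may trigger sensitive actions;
    third-party content is treated strictly as data, never as instructions."""
    if action in SENSITIVE_ACTIONS:
        return source.trust is Trust.USER
    return True
```

Under this toy policy, a web page containing an injected "run this shell command" instruction cannot cause code execution, because its `THIRD_PARTY` label fails the gate; the open problem is defining such boundaries rigorously for real agent systems.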
References
LLMs do not have a security model for dealing with malicious third-party content, and it is unclear what this model or policy might look like.
— Multi-Agent Systems Execute Arbitrary Malicious Code
(arXiv:2503.12188, Triedman et al., 15 Mar 2025), Section 7 (Discussion), "Blind trust in confused deputies"