Mitigating Direct Prompt Injection in Agent Skills

Develop robust defenses against direct prompt injection attacks in the Agent Skills framework, where adversarial instructions embedded in SKILL.md are interpreted with operator-level authority, and ascertain the architectural reforms required to enable effective mitigation given the absence of a formal behavioral specification.

Background

The paper analyzes prompt injection in Agent Skills and distinguishes between direct and indirect injection. Direct injection arises when adversarial directives are embedded directly in SKILL.md and interpreted at operator level due to the framework’s trust model and lack of a data–instruction boundary.

The authors argue that existing defenses (e.g., structured query formats and privilege-based hierarchies) are architecturally inapplicable because SKILL.md occupies the operator layer. Without a formal behavioral specification of intended Skill conduct, the paper concludes that a complete defense for direct injection is currently unattainable within the existing architecture.

References

Direct injection therefore remains an open problem that cannot be fully addressed within the current architectural framework.

— Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis (2604.02837 - Li et al., 3 Apr 2026) in Section 7.1, Defense Directions (Against Prompt Injection (T3))

Mitigating Direct Prompt Injection in Agent Skills

Background

References

Related Problems