Runtime Behavioral Monitoring for Agent Skills

Develop runtime monitoring approaches that can reliably distinguish malicious agent actions from legitimate ones in deployments of the Agent Skills framework without relying on a formal behavioral specification and while maintaining low false positive rates.

Background

Because Agent Skills specify behavior in natural language and operate over an unbounded action space, there is no formal behavioral specification to anchor runtime checks.

The authors note that candidate approaches such as anomaly detection, behavioral fingerprinting, and LLM-based intent classification remain unvalidated at scale, and that excessive false positives could render monitoring impractical in deployment.
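To make the false-positive tension concrete, the following is a minimal sketch of one of the mentioned directions, anomaly detection over agent actions. It is not from the paper: the class name, the frequency-based scoring, and the calibration scheme are all illustrative assumptions. It scores each action by its empirical frequency in a baseline trace and calibrates the flagging threshold so that the fraction of baseline actions flagged stays at or below a target false positive rate.

```python
from collections import Counter

class ActionAnomalyMonitor:
    """Toy anomaly detector for agent actions (hypothetical design).

    Scores an action by its empirical frequency in a benign baseline
    trace; actions never seen in the baseline score 0 (maximally
    anomalous). Real monitors would score richer features such as
    action arguments and sequences, not just action names.
    """

    def __init__(self, baseline_actions, target_fpr=0.01):
        counts = Counter(baseline_actions)
        total = sum(counts.values())
        self.freq = {a: c / total for a, c in counts.items()}
        # Calibrate the threshold on the baseline itself: pick the
        # k-th smallest score so that at most a target_fpr fraction
        # of baseline actions would be flagged.
        scores = sorted(self.freq[a] for a in baseline_actions)
        k = int(target_fpr * len(scores))
        self.threshold = scores[k - 1] if k > 0 else 0.0

    def score(self, action):
        # Unseen actions get score 0.0 (most anomalous).
        return self.freq.get(action, 0.0)

    def is_anomalous(self, action):
        # Strict inequality: baseline actions at the threshold are
        # not flagged, keeping the baseline flag rate <= target_fpr.
        return self.score(action) < self.threshold
```

Even in this toy setting the core trade-off is visible: lowering `target_fpr` shrinks the set of flagged actions and lets rare-but-legitimate actions through, while raising it catches more novel actions at the cost of flagging legitimate ones.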

References

Developing runtime monitoring approaches that can distinguish malicious agent actions from legitimate ones---without a formal behavioral specification and without generating prohibitive false positive rates---is an open challenge.

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis  (2604.02837 - Li et al., 3 Apr 2026) in Section 7.2, Open Challenges (C3: Runtime Behavioral Monitoring)