Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Published 3 Apr 2026 in cs.CR, cs.AI, and cs.CL | (2604.03081v1)

Abstract: LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATTACK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.

Abstract PDF Upgrade to Chat

Authors (8)

Summary

The paper introduces Document-Driven Implicit Payload Execution (DDIPE) as a novel supply-chain poisoning method for LLM coding agent ecosystems.
The methodology leverages camouflaged code and configuration poisoning to bypass alignment and architectural defenses, achieving up to 33.5% direct execution rates.
The findings advocate for heterogeneous model ensembles and refined audit practices as critical defenses against evolving supply-chain risks.

Supply-Chain Poisoning in LLM Coding Agent Skill Ecosystems: Document-Driven Implicit Payload Execution

Introduction and Motivation

The proliferation of LLM-based coding agents has led to diverse ecosystems leveraging modular “agent skills” acquired from open marketplaces. These skills bundle executable logic and structured documentation (SKILL.md), which agents parse and execute, often with system-level permissions. Critically, these marketplaces lack mandatory security vetting, creating a channel for supply-chain risk comparable to, but more acute than, conventional software package attacks. This paper analyzes whether and how supply-chain poisoning can subvert agent action space—inducing agents to silently execute malicious code (e.g., file writes, shell commands) even when prevailing alignment and architectural safeguards are in place (2604.03081).

Threat Model and End-to-End Attack Pipeline

The authors formalize a post-loading threat model: assuming an attacker successfully submits a malicious skill to a public skill market, the attack's success depends on inducing the agent to both generate and execute embedded payloads. The adversarial skill's documentation provides auto-executed operational guidance for the agent, blurring the boundary between passive documentation and imperative instruction. The attacker’s primary objective is to achieve either system-level compromise (e.g., credential or asset exfiltration) or infrastructure poisoning (e.g., tampering IaC, deployment configs).

Figure 2: End-to-end threat scenario for PoisonedSkills, where a disguised malicious skill triggers harmful execution on the victim agent after retrieval and loading.

Document-Driven Implicit Payload Execution (DDIPE)

Central to this paper’s contribution is the Document-Driven Implicit Payload Execution (DDIPE) attack paradigm. DDIPE exploits the agent’s default practice of treating documentation code examples and configuration snippets as trustworthy references, leading to their reproduction and execution during regular workflows. Rather than relying on direct imperative (and easily filtered) prompt injection, DDIPE payloads are camouflaged within legitimate-looking documentation constructs—such as code blocks or config files. This passive embedding achieves a dual bypass: model-level alignment (which is less sensitive to code than imperatives) and framework-level architectural defenses (which inadequately analyze in-doc code semantics).

The attack deploys the following embedding strategies:

Code Example Poisoning: Embedding exploit patterns into canonical code blocks, ensuring their propagation upon task fulfillment.
Configuration Template Poisoning: Backdooring YAML, JSON, or Makefile templates so agent-generated configs facilitate privilege escalation, persistence, or exfiltration.

Three major camouflage tactics enhance stealth: compliance disguise (e.g., presenting theft as telemetry), silent exception handling, and the use of authority-suggestive language and endpoints.

Automated Adversarial Skill Generation Framework

The adversarial skill generation employs a scalable, LLM-driven seed–mutation–validation pipeline. From 81 manually engineered seeds (covering 15 MITRE ATT&CK-aligned attack vectors), the pipeline uses a two-stage heuristic process—first substituting attack techniques and then applying camouflage and compositional mutations via automated LLM-based operators. Deduplication (Jaccard threshold), inverse-coverage sampling, and syntax validation guarantee both diversity and realism. The approach achieves a 13.2× expansion, producing 1,070 structurally diverse adversarial skills across the taxonomy.

Empirical Evaluation: Attack Efficacy and Defense Breakdown

Experimental Matrix

The empirical evaluation spans four popular agent frameworks (Claude Code, OpenHands, Codex, Gemini CLI) and five models (Claude Sonnet 4.6, GLM-4.7, MiniMax-M2.5, GPT-5.4, Gemini 2.5 Pro), collectively forming a robust cross-section of alignment strength and architectural defense. Model–framework axes are varied to differentiate the protective effect of alignment versus architectural isolation.

Execution Outcomes and Bypass Results

DDIPE achieves direct execution rates (DER) between 2.3% and 33.5%, depending on architecture/model pairing. Notably, explicit instruction baseline attacks are universally nullified (DER 0%) under strong defenses, reinforcing the stealth advantage of DDIPE. High DERs (e.g., 27.1% on OpenHands + GLM-4.7) are observed when frameworks rely solely on model alignment, indicating that architectural guardrails are critical compensators against weak alignment.

Under the most fortified regime (Claude Code + Sonnet 4.6), even with multi-layer checks, 2.3% of payloads are executed, with an overall bypass rate (including code generation) of 13.5%. Removing architectural guardrails amplifies execution (Sonnet 4.6, OpenHands: DER 10.7%), whereas strong alignment (Sonnet) constrains this effect far more than weaker models (GLM, MiniMax).

A single minimalistic pip.conf poisoning payload was universally executed by all tested models on Claude Code, underlining systemic blind spots when malicious intent is camouflaged as standard workspace configuration activity. DERs are higher for configuration-style attacks; verbose or overtly suspicious payloads are filtered more effectively.

Models differ sharply in their vulnerability profiles, exhibiting only 62.1% verdict agreement for the same set of payloads under identical framework controls, and just 1.6% of payloads bypass all three Claude Code models. Weak alignment leads to hundreds of “sleeper” payloads (blocked by architecture, triggered when only model-level alignment is present), while strong alignment makes architecture nearly redundant.

Cross-category results underscore preferential exploitation of supply-chain poisoning, environment/credential theft, and configuration tampering—tying bypass frequency to workflows present in normal development operations.

Validation in Production and Response

Responsible disclosure yielded real-world confirmation: four distinct zero-days, two already fixed. Root-cause analysis identifies framework privilege grants based on syntactic context rather than semantic analysis, and shows that even sandboxed or isolated agents (e.g., bubblewrap environments) remain vulnerable to subtle config- or metadata-level poisoning.

Practical and Theoretical Implications

This work demonstrates that LLM-driven agent skill ecosystems are exposed to a novel, high-severity supply-chain vector, capable of action-space hijacking at scale. Architectural and alignment defenses provide non-additive, asymmetric protection: defense diversity and intent-level reasoning are required for robust hazard reduction. The evidence suggests:

Multi-model/ensemble agent architectures drastically shrink the effective attack surface below any single-model's threshold, due to insufficiently overlapping bypass profiles.
Static, rule-based audits intercept a large fraction of attacks (90.7%) but are evaded by benign-disguised, low-lexical-indicator payloads.
Intent recognition and fine-grained, per-skill permissioning emerge as requisite next-generation defenses for open skill ecosystems.

Furthermore, the scalable attack generation pipeline exposes a future AI threat landscape where attackers can programmatically adapt and proliferate novel supply-chain attacks, outpacing manual curation or static review.

Conclusion

This research establishes that LLM coding agent skill ecosystems, as currently architected, are susceptible to document-driven, implicit supply-chain poisoning attacks even under state-of-the-art alignment and architectural defenses. The documented attacks traverse both cognitive (model output) and operational (system-level execution) domains, with empirical validation against real production frameworks and models. Effective mitigation necessitates a combination of heterogeneous model ensembles, semantic intent auditing, and per-skill execution scoping. The DDIPE paradigm and scalable adversarial skill generation pipeline highlighted in this study provide both a diagnostic and stress-testing methodology for evaluating agent supply-chain hardening strategies and underline the urgency of defense evolution as agent extensibility matures.

Markdown Report Issue