
From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching

Published 30 Jan 2026 in cs.CR and cs.AI | (2601.23088v1)

Abstract: Semantic caching has emerged as a pivotal technique for scaling LLM applications, widely adopted by major providers including AWS and Microsoft. By utilizing semantic embedding vectors as cache keys, this mechanism effectively minimizes latency and redundant computation for semantically similar queries. In this work, we conceptualize semantic cache keys as a form of fuzzy hashes. We demonstrate that the locality required to maximize cache hit rates fundamentally conflicts with the cryptographic avalanche effect necessary for collision resistance. Our conceptual analysis formalizes this inherent trade-off between performance (locality) and security (collision resilience), revealing that semantic caching is naturally vulnerable to key collision attacks. While prior research has focused on side-channel and privacy risks, we present the first systematic study of integrity risks arising from cache collisions. We introduce CacheAttack, an automated framework for launching black-box collision attacks. We evaluate CacheAttack in security-critical tasks and agentic workflows. It achieves a hit rate of 86% in LLM response hijacking and can induce malicious behaviors in LLM agents, while preserving strong transferability across different embedding models. A case study on a financial agent further illustrates the real-world impact of these vulnerabilities. Finally, we discuss mitigation strategies.

Summary

  • The paper introduces CacheAttack, a black-box framework that exploits fuzzy hash properties in semantic caches to hijack LLM responses.
  • It shows that the locality of embedding-based cache keys lets attacks reach hit rates of up to 86%, enabling adversaries to redirect and manipulate responses.
  • Empirical results highlight transferability across models and emphasize the trade-off between cache efficiency and robust security measures.

Key Collision Attacks on Semantic Caching for LLMs

Introduction and Problem Motivation

Semantic caching has become a core optimization for modern LLM deployments, significantly improving inference efficiency and reducing redundant computation by reusing previously computed responses for semantically similar queries. Leading cloud providers, including AWS and Microsoft, have integrated semantic caches into production LLM stacks. These systems utilize embedding-derived keys (semantic vectors) instead of exact prompt matches, enabling "fuzzy" cache hit criteria that maximize reuse rates.

Despite performance gains, this paper establishes and formalizes an intrinsic integrity vulnerability in such designs: the very locality that enables semantic reuse fundamentally contradicts the cryptographic security properties (such as the avalanche effect) that prevent targeted collision attacks. The authors show that semantic cache keys act as fuzzy hash functions, lacking collision resistance, which in turn makes them susceptible to adversarial examples constructed specifically to collide with legitimate semantic keys.

The work introduces CacheAttack, an automated black-box attack framework that systematically demonstrates and quantifies this vulnerability. The attack achieves hit rates of up to 86% in hijacking LLM responses and induces targeted malicious agent behaviors, even under strong black-box constraints and in multi-tenant settings.

Figure 1: Overview of key collision in semantic caching, showing attacker and victim mapping to the same semantic key and resultant response hijack.

Conceptual Analysis: Fuzzy Hashes vs. Avalanche Effect

The core technical argument models semantic cache keys as locality-preserving fuzzy hashes. Formally, prompts are encoded by an embedding model $f : p \to \mathbb{R}^d$, and cache hits are determined either by embedding similarity (semantic cache) or by a hash-bucket match (semantic KV cache with LSH-based partitioning). This fuzziness yields high hit rates for similar queries but creates a coarse collision boundary in the vector space.
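Written out, the two matching rules look roughly as follows. This is a sketch under stated assumptions: cosine similarity with a hit threshold $\tau$ for the semantic cache, and sign-random-projection LSH with $k$ random hyperplanes $r_1, \dots, r_k$ for the bucketed KV cache. The paper's exact similarity measure and hash family may differ.

```latex
\mathrm{hit}_{\mathrm{sem}}(p, p') \iff
  \frac{f(p)^{\top} f(p')}{\lVert f(p) \rVert \, \lVert f(p') \rVert} \ge \tau,
\qquad
h(p) = \bigl(\operatorname{sign}(r_1^{\top} f(p)), \dots, \operatorname{sign}(r_k^{\top} f(p))\bigr),
\qquad
\mathrm{hit}_{\mathrm{LSH}}(p, p') \iff h(p) = h(p').
```

Under either rule, a key collision is simply a pair $(p_{\mathrm{adv}}, p_{\mathrm{victim}})$ with different meanings that nonetheless satisfies the hit condition.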

Whereas cryptographically secure hashes must maximize output distance for even minor input changes (the avalanche effect), semantic caches intentionally collapse embedding neighborhoods to the same key to maximize reuse. This trade-off between locality (performance) and collision resistance (security) is inescapable in current architectures.
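A small Python sketch makes the contrast concrete. It compares SHA-256, which exhibits the avalanche effect, with a sign-random-projection (SimHash-style) key computed over a vector. The random vectors stand in for prompt embeddings and are assumptions, not outputs of any real embedding model.

```python
import hashlib
import random

def sha256_bits(text: str) -> str:
    """SHA-256 digest of text as a 256-character bit string."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def simhash_key(vec, planes) -> str:
    """Sign-random-projection key: one bit per hyperplane, set when the vector is on its positive side."""
    return "".join(
        "1" if sum(v * r for v, r in zip(vec, plane)) >= 0 else "0" for plane in planes
    )

rng = random.Random(0)
DIM, N_BITS = 64, 16
planes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]

# Avalanche effect: a one-character edit flips about half of the 256 output bits.
sha_diff = hamming(sha256_bits("transfer funds"), sha256_bits("transfer fundz"))

# Locality: a slightly perturbed vector (stand-in for a paraphrase's embedding)
# almost always stays on the same side of every hyperplane, so the key is unchanged.
base = [rng.gauss(0, 1) for _ in range(DIM)]
near = [x + rng.gauss(0, 0.005) for x in base]
lsh_diff = hamming(simhash_key(base, planes), simhash_key(near, planes))

print(f"SHA-256 bits changed: {sha_diff}/256, LSH bits changed: {lsh_diff}/{N_BITS}")
```

The SHA-256 distance typically lands near half of the 256 bits, while the LSH key of the perturbed vector usually matches the original exactly. That stability is what lets paraphrases hit the cache, and it is the same property an attacker exploits.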

Adversaries can exploit this by crafting semantically distinct prompts whose embeddings collide with benign user queries under the cache's matching rule, producing a false-positive cache hit and thus unintended response reuse.
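The false-positive hit can be illustrated with a toy shared cache. The `SemanticCache` class, the threshold 0.9, and the three-dimensional vectors are all hypothetical. In a real system the keys would come from an embedding model, and the attacker would reach the victim's neighborhood through an optimized prompt rather than by choosing a vector directly.

```python
import math
from typing import List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Hypothetical shared semantic cache: returns the most similar stored response that clears tau."""

    def __init__(self, tau: float):
        self.tau = tau
        self.entries: List[Tuple[List[float], str]] = []

    def insert(self, key: List[float], response: str) -> None:
        self.entries.append((key, response))

    def lookup(self, key: List[float]) -> Optional[str]:
        best, best_sim = None, self.tau
        for stored_key, response in self.entries:
            sim = cosine(key, stored_key)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best

cache = SemanticCache(tau=0.9)
# Hypothetical embeddings: the attacker's prompt text differs from the victim's,
# but its vector has been pushed into the victim's neighborhood.
attacker_key = [0.70, 0.69, 0.18]
cache.insert(attacker_key, "ATTACKER-CONTROLLED RESPONSE")
victim_key = [0.72, 0.66, 0.20]
print(cache.lookup(victim_key))  # → ATTACKER-CONTROLLED RESPONSE
```

Nothing in `lookup` can tell the two prompts apart: once the vectors are close enough, the cache behaves exactly as designed, and the victim is served the attacker's entry.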

Threat Model and Attack Formalization

The attack assumes the adversary can submit arbitrary prompts and observe the system's outputs and latency, but cannot access embedding vectors, cache internals, or similarity thresholds: a stringent black-box setting. Attackers optimize adversarial prompts, typically by appending algorithmically generated suffixes, to induce embedding collisions with target (victim) queries, and validate success through observable cache-hit behavior, notably response latency.
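One plausible way to turn latency into a hit/miss signal is sketched below. The calibration rule (midpoint between the median miss latency and the median hit latency) and the timings are assumptions for illustration; the paper's validator uses its own statistical model, which this does not reproduce.

```python
import statistics
from typing import List

def calibrate_threshold(miss_latencies: List[float], hit_latencies: List[float]) -> float:
    """Place the decision boundary halfway between the median miss and median hit latency."""
    miss_med = statistics.median(miss_latencies)
    hit_med = statistics.median(hit_latencies)
    if hit_med >= miss_med:
        raise ValueError("hits are not faster than misses; latency side channel unusable")
    return (miss_med + hit_med) / 2

def is_cache_hit(probe_latencies: List[float], threshold: float) -> bool:
    """Classify a probe from several repeated timings; the median damps network jitter."""
    return statistics.median(probe_latencies) < threshold

# Hypothetical timings in seconds: fresh random prompts (guaranteed misses) vs.
# exact repeats of a prompt the attacker has just cached (guaranteed hits).
misses = [1.82, 2.10, 1.95, 2.40, 1.77]
hits = [0.21, 0.18, 0.25, 0.19, 0.30]
tau_latency = calibrate_threshold(misses, hits)
print(round(tau_latency, 3), is_cache_hit([0.22, 0.35, 0.20], tau_latency))  # → 1.08 True
```

Recalibrating periodically with fresh miss and hit samples is one simple way to track drifting network and load conditions.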

Figure 2: Case study of a financial agent under a cache collision, illustrating real-world financial harm resulting from response hijacking.

The ultimate goal is to hijack responses and redirect agent flows, resulting in semantic misalignment, misinformation, policy violations, or financial loss via compromised tool invocation.

CacheAttack Framework

CacheAttack employs a generator-validator pipeline:

  • Generator: Uses a GCG-based discrete search to optimize adversarial suffixes so that the resulting prompts maximally align with the victim's embedding (or hash bucket) under a surrogate model. An explicit loss term balances collision strength against prompt fluency (perplexity) for stealth.
  • Validator: Since system internals are opaque, cache hit status is inferred by statistical modeling of response latency, dynamically calibrated to distinguish cache hits from misses in noisy conditions.
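The generator's core idea can be sketched with a gradient-free stand-in. Real GCG ranks token substitutions using gradients through a differentiable surrogate; the version below instead tries every token of a small vocabulary at each suffix position (plain greedy coordinate search), uses a hypothetical feature-hashing `surrogate_embed` in place of a neural embedding model, and omits the fluency/perplexity term.

```python
import hashlib
import math
from typing import List

DIM = 64  # hypothetical surrogate embedding size

def surrogate_embed(text: str) -> List[float]:
    """Toy surrogate (hypothetical): signed feature hashing of lowercase whitespace tokens."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        h = int(hashlib.sha256(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1.0 if (h >> 8) & 1 else -1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def suffix_score(base: str, suffix: List[str], target_vec: List[float]) -> float:
    """Collision objective: surrogate similarity of base + suffix to the victim's vector."""
    return cosine(surrogate_embed(base + " " + " ".join(suffix)), target_vec)

def optimize_suffix(base: str, victim: str, vocab: List[str], length: int = 4, rounds: int = 3) -> List[str]:
    """Greedy coordinate search: at each suffix slot, keep the vocabulary token with the
    highest objective. The current token is always a candidate, so the score never drops."""
    target_vec = surrogate_embed(victim)
    suffix = [vocab[0]] * length
    for _ in range(rounds):
        for i in range(length):
            suffix[i] = max(
                vocab,
                key=lambda tok: suffix_score(base, suffix[:i] + [tok] + suffix[i + 1:], target_vec),
            )
    return suffix

vocab = ["buy", "stock", "tesla", "now", "should", "weather", "today", "price"]
base = "what is the weather today"
victim = "should i buy tesla stock now"
suffix = optimize_suffix(base, victim, vocab)
print(suffix, round(suffix_score(base, suffix, surrogate_embed(victim)), 3))
```

Swapping the exhaustive vocabulary scan for gradient-ranked candidates, and adding a weighted perplexity penalty to `suffix_score`, would move this sketch closer to the generator described above.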

Two variants are proposed:

  • CacheAttack-1: Validates every candidate directly with hit/miss probes against the black-box system; this repeated interaction incurs high time and efficiency costs because of the cache TTL.
  • CacheAttack-2: Uses a surrogate embedding model to pre-filter candidates, querying the target system only for final verification—thus significantly increasing attack scalability and stealth.
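The CacheAttack-2 control flow can be sketched as a two-stage filter. `surrogate_sim` and `query_target_is_hit` are hypothetical callables: the first scores a candidate against the victim query offline, and the second stands for one black-box probe plus latency-based hit detection.

```python
from typing import Callable, List, Optional

def two_stage_attack(
    candidates: List[str],
    surrogate_sim: Callable[[str], float],
    query_target_is_hit: Callable[[str], bool],
    top_k: int = 3,
) -> Optional[str]:
    """Rank candidates offline by surrogate similarity, then spend black-box queries
    only on the top_k; return the first candidate the target system confirms."""
    ranked = sorted(candidates, key=surrogate_sim, reverse=True)
    for cand in ranked[:top_k]:
        if query_target_is_hit(cand):
            return cand
    return None

# Hypothetical surrogate scores and a fake oracle that records every probe.
scores = {"cand-a": 0.62, "cand-b": 0.91, "cand-c": 0.88, "cand-d": 0.40}
confirmed = {"cand-c"}
calls: List[str] = []

def oracle(cand: str) -> bool:
    calls.append(cand)
    return cand in confirmed

result = two_stage_attack(list(scores), lambda c: scores[c], oracle, top_k=2)
print(result, len(calls))  # → cand-c 2
```

Only two of the four candidates ever reach the target system, which is where the scalability and stealth gains come from.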

Empirical Evaluation

LLM Response Hijacking

On a curated adversarial dataset (SC-IPI) of security-critical indirect prompts, CacheAttack-2 achieves hit rates (HR) and injection success rates (ISR) above 80% for both semantic and semantic KV cache types. These results strongly indicate a systematic vulnerability rather than accidental collisions.

Figure 3: Perplexity comparison on Natural Questions, showing cache-inserted prompt PPL distributions for normal queries and adversarial triggers.

Agentic Tool Invocation Hijacking

CacheAttack is demonstrated to compromise LLM-powered agents by hijacking their tool invocation paths. When attacks succeed, tool selection accuracy and final answer accuracy drop by over 80%, showing that the damage cascades downstream rather than staying confined to the immediate LLM response.

Cross-model and Backend Generalization

The attacks transfer across embedding models and backend LLMs. A suffix optimized on one embedding model can readily trigger collisions in another, provided the two models are architecturally similar. Across backend LLMs (Qwen, DeepSeek, Llama, Mistral), hit rates remain stable, indicating that the vulnerability lies in the embedding space rather than in the LLM architecture.

Real-world Case Study

In a financial agent scenario, an attacker hijacks a benign investment advisory query, causing the agent to execute a malicious trade order. The attacker needs no ability to overwrite cache entries: it simply submits an adversarial prompt, which the cache stores like any other query, and whose key collides with the victim's query. The victim's request is then served the attacker-planted entry, fully bypassing downstream security alignment by exploiting the locality-preserving semantics of the cache itself.

Defense Strategies and Trade-offs

Three defense mechanisms are evaluated:

  • Key Salting: Augmenting semantic keys with a cache-local secret (prefix, suffix, or template) reduces attack hit rates by up to 25 percentage points, effectively blocking attacks that rely on surrogate model transferability. However, it does not eliminate attacks for insiders or in weakly isolated settings.
  • Perplexity Filtering: Screening prompts at cache insertion using LLM perplexity effectively detects most adversarial triggers, as adversarial suffixes often induce abnormally high PPL scores compared to natural queries.
  • Per-user Cache Isolation: Namespacing cache keys to users eliminates cross-user collisions but negates the performance and storage benefits of shared semantic caching, introducing operational overhead and increased latency.

Figure 4: Sensitivity analysis of CacheAttack success rate as similarity threshold varies, illustrating the trade-off between cache efficiency and adversarial robustness.
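The perplexity filter can be sketched as an insertion-time gate. The source describes scoring with an LLM's perplexity; the add-one-smoothed unigram model here is a deliberately simple stand-in, and the reference corpus, the adversarial-looking string, and the cutoff `max_ppl=20.0` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

class UnigramPerplexity:
    """Stand-in for LLM perplexity (assumption): add-one-smoothed unigram model over a reference corpus."""

    def __init__(self, corpus: Iterable[str]):
        self.counts = Counter(tok for line in corpus for tok in line.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # +1 reserves probability mass for unseen tokens

    def perplexity(self, text: str) -> float:
        toks = text.lower().split()
        if not toks:
            return float("inf")
        log_prob = sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab_size)) for t in toks
        )
        return math.exp(-log_prob / len(toks))

def admit_to_cache(prompt: str, lm: UnigramPerplexity, max_ppl: float) -> bool:
    """Insertion-time gate: refuse to cache prompts whose perplexity exceeds max_ppl."""
    return lm.perplexity(prompt) <= max_ppl

corpus = [
    "what is the capital of france",
    "what is the weather today",
    "how do i open a bank account",
    "what is the best way to save money",
]
lm = UnigramPerplexity(corpus)
natural = "what is the weather today"
adversarial = "what is the weather zx!! describing similarlynow qq"
print(admit_to_cache(natural, lm, 20.0), admit_to_cache(adversarial, lm, 20.0))  # → True False
```

Gating at insertion rather than at lookup means a rejected trigger never becomes a key that benign queries could collide with; the cutoff itself would need tuning on real traffic.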

Fundamental Trade-off: Security mechanisms that block collisions typically require tightening (raising) the similarity threshold or isolating namespaces, which in turn degrades cache efficiency and increases inference costs, undermining the very benefit that semantic caches are designed to provide.
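The trade-off can be illustrated with a threshold sweep. The similarity scores below are hypothetical, chosen only to show the shape of the problem, and are not measurements from the paper: when adversarial triggers sit as close to a victim's key as genuine paraphrases do, raising the threshold `tau` removes attack hits and benign reuse at the same rate.

```python
from typing import Dict, List, Tuple

def hit_rate(similarities: List[float], tau: float) -> float:
    """Fraction of lookups whose best similarity clears the threshold tau."""
    return sum(s >= tau for s in similarities) / len(similarities)

def sweep(benign: List[float], adversarial: List[float], taus: List[float]) -> Dict[float, Tuple[float, float]]:
    """Map each threshold to (benign reuse rate, attack hit rate)."""
    return {tau: (hit_rate(benign, tau), hit_rate(adversarial, tau)) for tau in taus}

# Hypothetical similarity scores (illustrative only, not results from the paper).
benign_paraphrases = [0.97, 0.93, 0.91, 0.88, 0.86, 0.84]
adversarial_triggers = [0.95, 0.92, 0.90, 0.87, 0.85, 0.80]

results = sweep(benign_paraphrases, adversarial_triggers, [0.85, 0.90, 0.95])
for tau, (benign_hr, attack_hr) in results.items():
    print(f"tau={tau:.2f}  benign reuse={benign_hr:.2f}  attack hits={attack_hr:.2f}")
```

No single threshold in this toy setting separates the two populations, which mirrors the argument that threshold tuning alone cannot buy security without giving up cache efficiency.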

Conclusions and Future Directions

This work rigorously establishes that semantic caching for LLMs, in both semantic response and KV cache forms, is inherently vulnerable to adversarial key collisions due to the locality/collision resistance trade-off in embedding-based keys. CacheAttack demonstrates practical exploits with high hit rates even in black-box, cross-model contexts, and real-world agentic settings. Defensive strategies can mitigate but not eliminate the attack surface without compromising cache efficiency.

The results suggest that current semantic caching architectures are unsuitable for untrusted, multi-tenant, or security-sensitive AI deployments without careful redesign. Future work should focus on developing collision-resistant cache mechanisms and domain-specific adversarial detection, particularly for agentic workflows where cascading errors propagate rapidly. Design of application-layer defenses must explicitly account for the semantic and cryptographic properties of embedding-based keys.
