
SLImE: Semantic Leakage from Image Embeddings

Updated 6 February 2026
  • SLImE is a phenomenon where compressed image embeddings inadvertently retain rich semantic structure, allowing inference of tags, captions, and scene graphs without full image recovery.
  • It utilizes a two-stage pipeline that aligns victim embeddings to an attack space via a linear mapping, enabling effective retrieval of semantic neighborhoods even with minimal alignment samples.
  • Empirical evaluations reveal high semantic recovery metrics across multiple models, prompting research into mitigation strategies like differential privacy, watermarking, and embedding sanitization.

Semantic Leakage from Image Embeddings (SLImE) designates the phenomenon whereby compressed image embeddings, absent access to the original images or encoder, still expose substantial semantic structure identifiable via standalone analysis. SLImE formalizes an attack scenario in which alignment and retrieval operations on image embeddings enable the inference of objects, relationships, and even grammatically coherent descriptions. The critical vulnerability lies in the preservation of semantic neighborhoods under linear or nonlinear mappings, facilitating the propagation of semantic content through sequences of lossy transformations. This mechanism renders image embeddings intrinsically susceptible to privacy risks regardless of pixel-level invertibility or downstream task specialization (Chen et al., 30 Jan 2026).

1. Formalization of Semantic Leakage

Consider an image encoding scheme $f_V: X \to \mathbb{R}^m$, mapping images $x \in X$ to $m$-dimensional, L2-normalized embeddings in a "victim" space $V$. Let $f_A$ denote an "attack" encoder producing $n$-dimensional attack-space embeddings. Semantic leakage is defined as the ability to reconstruct semantic content (e.g., tags or captions) from $f_V(x)$ by mapping into $A$ and using retrieval or generation methods, without inverting the embedding to recover $x$ itself.

Core Definitions

  • Linear Alignment: The attacker fits a linear mapping $W \in \mathbb{R}^{m \times n}$ such that, for any $v_V \in V$,

$$v_{V \to A} = v_V W \approx f_A(x),$$

with $W$ given via the Moore–Penrose pseudoinverse:

$$W = (V^\top V)^{-1} V^\top A,$$

over a small set of aligned pairs.
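As a minimal NumPy sketch (illustrative only; array shapes and function names are our own, not the paper's code), the alignment matrix can be fit by least squares over the aligned pairs:

```python
import numpy as np

def fit_alignment(V, A):
    """Fit W minimizing ||V W - A||_F over aligned pairs.

    V: (k, m) victim embeddings for k aligned images
    A: (k, n) attack-space embeddings for the same images
    Returns W: (m, n), the least-squares solution
    W = (V^T V)^{-1} V^T A, computed stably via lstsq.
    """
    W, *_ = np.linalg.lstsq(V, A, rcond=None)
    return W

# Toy check: recover a known map from 100 aligned pairs (m=32, n=16).
rng = np.random.default_rng(0)
V = rng.standard_normal((100, 32))
W_true = rng.standard_normal((32, 16))
A = V @ W_true
W = fit_alignment(V, A)
```

With noiseless pairs and full column rank, `lstsq` recovers the map exactly; in the attack setting only a small set of aligned pairs is assumed.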

  • Semantic Neighborhoods: For a tag vocabulary $\mathcal{T}$ with embeddings $\{e_t\}$, the $m$-neighborhood of a tag $t$ is

$$\mathcal{N}_m(t) := \text{Top-}m_{u \in \mathcal{T}} \langle e_t, e_u \rangle,$$

where $\langle \cdot, \cdot \rangle$ is cosine similarity.

  • Semantic Neighborhood Preservation: After alignment, for each image $i$, the set of Top-$K$ tags retrieved from $v_{V \to A}$ (denoted $P_i$) is said to preserve neighborhoods at scale $(m, K)$ if every tag in $P_i$ falls within the $m$-neighborhood of a reference tag $g \in G_i$ (the Top-$K$ tags from $f_A(x_i)$).
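These two definitions combine into a short executable check (a sketch with our own function names, assuming row-normalized tag embeddings so that the dot product equals cosine similarity):

```python
import numpy as np

def top_m_neighbors(tag_embs, t_idx, m):
    """Indices of the m tags most cosine-similar to tag t_idx.

    tag_embs: (T, d), rows L2-normalized; t_idx's own row is included,
    so every tag belongs to its own neighborhood.
    """
    sims = tag_embs @ tag_embs[t_idx]
    return set(np.argsort(-sims)[:m])

def preserves_neighborhoods(P_i, G_i, tag_embs, m):
    """True iff every retrieved tag in P_i lies in the m-neighborhood
    of at least one reference tag g in G_i."""
    union = set().union(*(top_m_neighbors(tag_embs, g, m) for g in G_i))
    return set(P_i) <= union

# Sanity check: each tag is its own 1-neighborhood, so P_i = G_i
# trivially preserves neighborhoods at m = 1.
rng = np.random.default_rng(1)
E = rng.standard_normal((50, 8))
E /= np.linalg.norm(E, axis=1, keepdims=True)
ok = preserves_neighborhoods([3, 7], [3, 7], E, m=1)
```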

Leakage Proposition

The intrinsic vulnerability arises when local semantic neighborhood structure is preserved under $W$; this alone suffices to reconstruct meaningful high-level semantics even when exact image or label recovery is impossible (Chen et al., 30 Jan 2026).

2. The Few-TEI Inference Framework

The Few-TEI framework operationalizes semantic leakage via a two-stage pipeline:

Stage 1 – Training a Local Retriever:

  1. Parse captions into structured tags (relational and attribute tuples) using a public (image, caption) corpus.
  2. Contrastively align images and tag embeddings using a loss of the form:

$$\mathcal{L}_{i\to t} = -\frac{1}{B} \sum_{i=1}^{B} \frac{1}{|\mathcal{P}_i|} \sum_{j \in \mathcal{P}_i} \log \frac{e^{s_{ij}}}{\sum_{k=1}^{N} e^{s_{ik}}},$$

where $s_{ij} = \alpha \langle e_i, e_{t_j} \rangle$.

  3. Train a ranking module (DCN v2) on interaction features to promote hard-negative discrimination.
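The Stage 1 contrastive loss can be sketched in NumPy (a didactic re-implementation of the formula above, not the paper's training code; `alpha` plays the role of the temperature scale):

```python
import numpy as np

def image_to_tag_loss(img_embs, tag_embs, pos_sets, alpha=10.0):
    """Multi-positive contrastive loss L_{i->t}.

    img_embs: (B, d) L2-normalized image embeddings
    tag_embs: (N, d) L2-normalized candidate tag embeddings
    pos_sets: list of B index sets P_i of positive tags per image
    """
    B = img_embs.shape[0]
    S = alpha * (img_embs @ tag_embs.T)     # s_ij = alpha * <e_i, e_tj>
    log_Z = np.log(np.exp(S).sum(axis=1))   # log sum_k exp(s_ik)
    total = 0.0
    for i, P in enumerate(pos_sets):
        # mean over positives of the log-softmax term s_ij - log Z_i
        total += np.mean([S[i, j] - log_Z[i] for j in P])
    return -total / B

# Toy batch: 4 images, 12 candidate tags, 8-dim embeddings.
rng = np.random.default_rng(0)
imgs = rng.standard_normal((4, 8))
imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)
tags = rng.standard_normal((12, 8))
tags /= np.linalg.norm(tags, axis=1, keepdims=True)
loss = image_to_tag_loss(imgs, tags, [{0, 1}, {2}, {3, 4}, {5}])
```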

Stage 2 – Inference and Attacks:

  • Align victim embeddings to attack space via $W$.
  • For each $v_V$, compute $v_{V \to A} = v_V W$.
  • Retrieve Top-$K$ tags $P = \text{local\_retriever}(v_{V \to A})$.
  • Feed tags to an LLM or VLM to generate grammatical captions or structured scene graphs.
  • Optionally, pass $v_{V \to A}$ to a diffusion model to synthesize a low-fidelity image.
  • Apply adaptive vision-language attacks by extracting detected objects, relations, and scene graphs from LLM/VLM outputs.

All steps operate solely on the standalone embeddings, without task-specific decoders or direct access to original pixels.
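Stripped of the LLM/VLM and diffusion stages, the retrieval core of Stage 2 is an alignment product followed by a Top-K lookup; a minimal sketch (our own names; the trained local retriever with DCN v2 re-ranking is replaced here by raw cosine ranking):

```python
import numpy as np

def attack_retrieve_tags(v_V, W, tag_embs, K=10):
    """Map a victim embedding into attack space and retrieve Top-K tags.

    v_V: (m,) victim embedding
    W: (m, n) fitted alignment matrix
    tag_embs: (T, n) L2-normalized tag embeddings in attack space
    Returns indices of the K tags with highest cosine similarity
    (a stand-in for the trained local retriever).
    """
    v = v_V @ W
    v = v / np.linalg.norm(v)
    return np.argsort(-(tag_embs @ v))[:K]

# Toy check: with an identity map (m = n), a tag's own embedding
# retrieves that tag first.
rng = np.random.default_rng(2)
tag_embs = rng.standard_normal((20, 16))
tag_embs /= np.linalg.norm(tag_embs, axis=1, keepdims=True)
top = attack_retrieve_tags(tag_embs[5], np.eye(16), tag_embs, K=3)
```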

3. Empirical Evaluation and Observed Leakage

SLImE has been validated across multiple widely used embedding models—proprietary (GEMINI, Cohere) and open-source (Nomic, CLIP)—and diverse data domains (COCO, nocaps).

Key Result Metrics

  • Tag Retrieval: Exact-match F1 scores are typically $<0.25$, but semantic neighborhood F1 rises to $0.8$ at $m \approx 50$ with 10k alignment samples; even a single alignment sample produces nontrivial (ROUGE-L $>20$) leakage.
  • Text Reconstruction: With $K=10$ and 10k alignment samples, ROUGE-L reaches $\sim 50$ against LLM captions generated from reference tags, and $\sim$10–30 against human captions.
  • Adaptive Attacks: Scene graph F1 of 0.75–0.88 and object/relation F1 of 0.5–0.7 against LLM/VLM extraction outputs when using low-fidelity reconstructed images and tag sets.
  • Cross-Domain: BLEU-4 and ROUGE-L degrade moderately in the out-of-domain setting but remain significantly above trivial baselines.

Notably, increasing the number of alignment samples raises the cosine similarity between mapped victim embeddings and true attack-space embeddings, and all downstream semantic recovery metrics improve smoothly with it.

4. Theoretical Insights, Security Implications, and Countermeasures

Semantic leakage persists under severe compression, alignment using as few as one seed sample, and in the absence of decoder access. The core risk emerges from the deliberate optimization of image embeddings for retrieval, which enforces local neighborhood preservation by design. This property enables the recovery of semantic content by "neighborhood hopping" in the aligned space.

Mitigation Directions

  • Semantic-level Differential Privacy: Modifying the embedding distribution to disrupt the correspondence of neighborhoods and private tags, thereby degrading inference without total utility loss.
  • Watermarking/Adversarial Perturbation: Introducing structured, targeted distortions that selectively impair alignment while (ideally) preserving task-relevant semantics.
  • Embedding-space Sanitization: Analogous to recent developments in text embedding privacy, the challenge is to adjust visual embeddings post-hoc or at training time to counter attacks, without erasing all downstream value.
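As a toy illustration of the perturbation idea (our own sketch, not a mechanism from the paper; a formal semantic-level DP guarantee would require a calibrated mechanism and a sensitivity analysis), one can add Gaussian noise to an embedding before release and re-normalize:

```python
import numpy as np

def sanitize_embedding(v, sigma, rng=None):
    """Perturb an L2-normalized embedding with isotropic Gaussian noise
    and re-normalize. Larger sigma disrupts semantic neighborhoods more
    aggressively, at the cost of retrieval utility."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = v + rng.normal(0.0, sigma, size=v.shape)
    return noisy / np.linalg.norm(noisy)

# sigma = 0 leaves a unit vector unchanged; sigma > 0 keeps unit norm.
v = np.zeros(8)
v[0] = 1.0
clean = sanitize_embedding(v, sigma=0.0)
noisy = sanitize_embedding(v, sigma=1.0, rng=np.random.default_rng(0))
```

Tuning `sigma` is exactly the open utility/privacy trade-off the surrounding text describes: too small and neighborhoods survive, too large and retrieval breaks.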

An open problem is formulating a quantitative trade-off between semantic utility and privacy leakage in retrieval-oriented representations.

5. Relation to Broader Semantic Leakage and Dense Representation Risks

The SLImE framework generalizes prior concerns about semantic leakage in visual semantic embedding models for zero-shot learning (ZSL). In that context, semantic leakage referred to label/word-embedding information being inadvertently "baked in" during encoder training, as measured by the mutual information $I(f(X); Y)$ exceeding zero (Jiao et al., 2021). Recent work has shown that this risk persists even when supervision is ostensibly absent, owing to information-rich geometric alignment between features and distributed word spaces.

Further distinctions appear in settings such as attribute leakage in text-to-image editing, where cross-object correlations and attention bleed-through have prompted sophisticated architectural interventions (e.g., ORE, RGB-CAM, BB) to spatially disentangle semantics (Mun et al., 2024). However, SLImE demonstrates that manifold preservation in compressed embedding spaces—absent explicit label or token access—remains a core privacy vulnerability distinct from instance-level pixel recovery or attribute drift.

6. Open Problems and Future Directions

Current research underscores the challenge of defending against semantic-level inference attacks given the default emphasis on retrieval efficacy and neighborhood geometry. There is no evidence that restricting API access or limiting pixelwise reconstruction is sufficient to impede SLImE-type attacks. The practical and theoretical boundaries of utility-preserving privacy in multimodal embedding spaces remain unresolved. Achieving robust semantic privacy likely requires fundamentally new representation learning paradigms capable of blunting neighborhood preservation relative to sensitive semantic attributes, without compromising retrieval or transfer for permissible tasks (Chen et al., 30 Jan 2026).
