Papers
Topics
Authors
Recent
Search
2000 character limit reached

The Information Theory of Similarity

Published 29 Nov 2025 in cs.IT, cs.DS, cs.IR, and cs.LG | (2512.00378v1)

Abstract: We establish a precise mathematical equivalence between witness-based similarity systems (REWA) and Shannon's information theory. We prove that witness overlap is mutual information, that REWA bit complexity bounds arise from channel capacity limitations, and that ranking-preserving encodings obey rate-distortion constraints. This unification reveals that fifty years of similarity search research -- from Bloom filters to locality-sensitive hashing to neural retrieval -- implicitly developed information theory for relational data. We derive fundamental lower bounds showing that REWA's $O(Δ{-2} \log N)$ complexity is optimal: no encoding scheme can preserve similarity rankings with fewer bits. The framework establishes that semantic similarity has physical units (bits of mutual information), search is communication (query transmission over a noisy channel), and retrieval systems face fundamental capacity limits analogous to Shannon's channel coding theorem.

Authors (1)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.