Papers
Topics
Authors
Recent
Search
2000 character limit reached

Dynamic Enumeration of Similarity Joins

Published 5 May 2021 in cs.DS and cs.DB | (2105.01818v1)

Abstract: This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of $n$ points $A,B$ in $\mathbb{R}d$, a metric $\phi(\cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) \in A \times B$ with $\phi(a,b) \le r$. Our goal is to store $A,B$ into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for $\ell_1, \ell_\infty$ metrics with $\log{O(1)} n$ update time and delay. We show that such a data structure is not feasible for the $\ell_2$ metric for $d \ge 4$. For approximate enumeration of similarity join, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for $\ell_p$ metric, with $\log{O(1)} n$ delay and update time. In high dimensions, we present an efficient data structure with worst-case delay-guarantee using locality sensitive hashing (LSH).

Citations (4)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.