Papers
Topics
Authors
Recent
Search
2000 character limit reached

Computing MEMs and Relatives on Repetitive Text Collections

Published 18 Oct 2022 in cs.DS | (2210.09914v5)

Abstract: We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1 .. m]$ on a large repetitive text collection $T[1 .. n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m2 \log\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta\log\frac{n}{\delta})$, the time decreases to $O(m\log m(\log m + \log\epsilon n))$. The value $\delta$ is a function of the substring complexity of $T$ and $\Omega(\delta\log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$. We extend our results to several related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications.

Citations (3)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.