Time-Space Tradeoffs for Finding a Long Common Substring

Published 4 Mar 2020 in cs.DS | (2003.02016v2)

Abstract: We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \leq s \leq n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \leq s \leq n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
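
To make the problem statement concrete, the following is a minimal Python sketch of a textbook baseline for the Longest Common Substring problem. It is not the algorithm of this paper and uses none of its techniques (no suffix trees, locally consistent parsing, or runs). It is a simple rolling-row dynamic program that runs in time proportional to the product of the two document lengths and keeps extra space proportional to the shorter document. The function name `longest_common_substring` is an illustrative choice, not something taken from the paper.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest string that occurs as a substring of both a and b.

    Naive baseline (assumption: illustration only, not the paper's method):
    O(len(a) * len(b)) time, O(min(len(a), len(b))) extra space.
    """
    # Keep b as the shorter string so the DP rows are as short as possible.
    if len(b) > len(a):
        a, b = b, a

    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    best_len = 0  # length of the best match found so far
    best_end = 0  # end index (exclusive) of that match inside b

    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending just before these characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr

    return b[best_end - best_len:best_end]
```

For example, calling `longest_common_substring("xabcdy", "zzabcdq")` yields the shared substring `abcd`. The quadratic running time of this baseline is the kind of cost that the time-space tradeoffs discussed in the abstract aim to reduce when only limited working space is available.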