On the Decision Tree Complexity of String Matching

Published 28 Dec 2017 in cs.CC, cs.DS, and math.CO | (1712.09738v2)

Abstract: String matching is one of the most fundamental problems in computer science. A natural question is how many characters of a string must be queried (i.e., the decision tree complexity) to decide whether the string contains a given pattern. Rivest showed that for every pattern $p$, any deterministic algorithm must, in the worst case, query at least $n-|p|+1$ characters, where $n$ is the length of the string and $|p|$ is the length of the pattern. He further conjectured that this bound is tight. Using the adversary method, Tuza disproved this conjecture and showed that more than half of all binary patterns are \emph{evasive}, i.e., any algorithm must query all the characters (see Section 1.1 for more details). In this paper, we give a query algorithm that settles the decision tree complexity of string matching for all but a negligible fraction of patterns. Our algorithm shows that Tuza's criteria for evasive patterns are almost complete. Using the algebraic approach of Rivest and Vuillemin, we also give a new sufficient condition for the evasiveness of patterns that goes beyond Tuza's criteria. In addition, our result reveals an interesting connection to \emph{Skolem's Problem} in mathematics.
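To make the quantities in the abstract concrete, the following Python sketch computes the deterministic decision tree complexity of very small instances by exhaustive minimax. At each step the algorithm chooses which unqueried position to read, and an adversary chooses the answer that forces the most further queries. This is not the query algorithm of the paper; it is a brute-force illustration that assumes a binary alphabet and runs in time exponential in $n$. The function `decision_tree_complexity` and its inner helpers are hypothetical names introduced only for this example.

```python
from functools import lru_cache
from itertools import product
from typing import Optional, Tuple

# '0'/'1' for positions already queried, None for positions not yet queried.
Partial = Tuple[Optional[str], ...]


def decision_tree_complexity(pattern: str, n: int) -> int:
    """Worst-case number of queries an optimal deterministic algorithm needs to
    decide whether a binary string of length n contains `pattern`.

    Hypothetical helper: exhaustive minimax over adaptive query strategies,
    assuming the binary alphabet {'0', '1'}; exponential in n, tiny n only.
    """

    def fix(partial: Partial, i: int, bit: str) -> Partial:
        # Return a copy of `partial` with position i set to `bit`.
        s = list(partial)
        s[i] = bit
        return tuple(s)

    @lru_cache(maxsize=None)
    def cost(partial: Partial) -> int:
        free = [i for i, c in enumerate(partial) if c is None]
        # Collect the answers ("contains pattern?") over all completions.
        answers = set()
        for bits in product("01", repeat=len(free)):
            s = list(partial)
            for i, b in zip(free, bits):
                s[i] = b
            answers.add(pattern in "".join(s))
            if len(answers) == 2:
                break  # answer not yet determined; no need to look further
        if len(answers) == 1:
            return 0  # the answer is already forced: no more queries needed
        # The algorithm picks the best position; the adversary picks the worse bit.
        return min(
            1 + max(cost(fix(partial, i, "0")), cost(fix(partial, i, "1")))
            for i in free
        )

    return cost((None,) * n)


if __name__ == "__main__":
    for p, n in [("1", 4), ("11", 3), ("01", 3)]:
        print(p, n, decision_tree_complexity(p, n))
```

Under these assumptions, the pattern `11` is evasive for $n=3$ (all three characters must be read), whereas for `01` with $n=3$ querying the middle character first settles the answer after two queries, which matches Rivest's lower bound $n-|p|+1 = 2$.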