Approximating LCS and Alignment Distance over Multiple Sequences

Published 24 Oct 2021 in cs.DS | (2110.12402v1)

Abstract: We study the problem of aligning multiple sequences with the goal of finding an alignment that either maximizes the number of aligned symbols (the longest common subsequence (LCS)), or minimizes the number of unaligned symbols (the alignment distance (AD)). Multiple sequence alignment is a well-studied problem in bioinformatics and is used to identify regions of similarity among DNA, RNA, or protein sequences to detect functional, structural, or evolutionary relationships among them. It is known that exact computation of LCS or AD of $m$ sequences each of length $n$ requires $\Theta(n^m)$ time unless the Strong Exponential Time Hypothesis is false. In this paper, we provide several results to approximate LCS and AD of multiple sequences. If the LCS of $m$ sequences each of length $n$ is $\lambda n$ for some $\lambda \in [0,1]$, then in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+1})$ time, we can return a common subsequence of length at least $\frac{\lambda^2 n}{2+\epsilon}$ for any arbitrary constant $\epsilon > 0$. It is possible to approximate the AD within a factor of two in time $\tilde{O}_m(n^{\lceil\frac{m}{2}\rceil})$. However, going below a 2-approximation requires breaking the triangle inequality barrier, which is a major challenge in this area. No such algorithm with a running time of $O(n^{\alpha m})$ for any $\alpha < 1$ is known. If the AD is $\theta n$, then we design an algorithm that approximates the AD within an approximation factor of $\left(2-\frac{3\theta}{16}+\epsilon\right)$ in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+2})$ time. Thus, if $\theta$ is a constant, we get a below-2 approximation in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+2})$ time. Moreover, we show that if just one of the $m$ sequences is $(p,B)$-pseudorandom, then we get a below-2 approximation in $\tilde{O}_m(nB^{m-1}+n^{\lfloor \frac{m}{2}\rfloor+3})$ time irrespective of $\theta$.
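
To make the two objectives concrete, here is a minimal Python sketch of the exact baseline that the abstract contrasts against: a brute-force dynamic program over all tuples of prefix lengths, which takes on the order of $n^m$ states. It is not one of the paper's approximation algorithms. The function names `multi_lcs_length` and `alignment_distance` are illustrative, and the formula AD = (total length) - m * LCS is an assumption read off the abstract's definition of AD as the number of unaligned symbols.

```python
from functools import lru_cache
from typing import Sequence


def multi_lcs_length(seqs: Sequence[str]) -> int:
    """Exact LCS length of several sequences via brute-force DP.

    The state is a tuple of prefix lengths, one per sequence, so there are
    up to (n+1)^m states. Intended for tiny inputs only: the recursion
    depth grows with the total input length.
    """
    m = len(seqs)
    if m == 0:
        return 0

    @lru_cache(maxsize=None)
    def solve(prefix_lens: tuple) -> int:
        # An empty prefix shares no symbols with anything.
        if any(p == 0 for p in prefix_lens):
            return 0
        last_symbols = {seqs[i][prefix_lens[i] - 1] for i in range(m)}
        if len(last_symbols) == 1:
            # All prefixes end in the same symbol: align it.
            return 1 + solve(tuple(p - 1 for p in prefix_lens))
        # Otherwise at least one last symbol is unaligned; try dropping each.
        best = 0
        for i in range(m):
            shorter = prefix_lens[:i] + (prefix_lens[i] - 1,) + prefix_lens[i + 1:]
            best = max(best, solve(shorter))
        return best

    return solve(tuple(len(s) for s in seqs))


def alignment_distance(seqs: Sequence[str]) -> int:
    """Number of unaligned symbols in an optimal alignment.

    Assumption: each aligned column uses one symbol from every sequence,
    so AD = (sum of lengths) - m * LCS.
    """
    total = sum(len(s) for s in seqs)
    return total - len(seqs) * multi_lcs_length(seqs)


if __name__ == "__main__":
    example = ["ACGT", "AGT", "CGT"]
    print(multi_lcs_length(example))    # 2, e.g. the common subsequence "GT"
    print(alignment_distance(example))  # 10 - 3 * 2 = 4
```

Under this reading, maximizing the LCS and minimizing the AD select the same alignments, yet an approximation guarantee for one does not automatically transfer to the other, which is consistent with the abstract stating separate results for each.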

Authors (2)
