Minimap2: pairwise alignment for nucleotide sequences

Published 4 Aug 2017 in q-bio.GN | (1708.01492v5)

Abstract: Motivation: Recent advances in sequencing technologies promise ultra-long reads of $\sim$100 kilobases (kb) on average, full-length mRNA or cDNA reads at high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms. Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads of $\ge$100 bp in length, $\ge$1 kb genomic reads at an error rate of $\sim$15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions (INDELs), and introduces new heuristics to reduce spurious alignments. It is 3-4 times faster than mainstream short-read mappers at comparable accuracy and $\ge$30 times faster at higher accuracy for both genomic and mRNA reads, surpassing most aligners specialized in one type of alignment. Availability and implementation: https://github.com/lh3/minimap2 Contact: [email protected]
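
The abstract mentions a concave gap cost for long INDELs. As a rough illustration only, the sketch below uses a two-piece affine function, one common way to obtain a concave gap cost: the cost of a gap is the cheaper of two affine penalties, so long gaps are charged at a lower per-base rate than short ones. The function name `two_piece_gap_cost` and the parameter values are assumptions made for this example and are not taken from the abstract.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The cost is the minimum of two affine functions. The first
    (cheap to open, expensive to extend) wins for short gaps; the
    second (expensive to open, cheap to extend) wins for long gaps.
    All parameter values here are illustrative assumptions.
    """
    if length <= 0:
        return 0  # no gap, no penalty
    short_gap_cost = open1 + length * ext1
    long_gap_cost = open2 + length * ext2
    return min(short_gap_cost, long_gap_cost)


if __name__ == "__main__":
    # Per-base cost falls as the gap grows, which is what makes
    # long insertions and deletions affordable in the alignment.
    for gap_length in (1, 10, 20, 100):
        print(gap_length, two_piece_gap_cost(gap_length))
```

With these illustrative values the two affine pieces cross at a gap length of 20 bases, beyond which each additional gap base costs 1 instead of 2.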

Authors (1)
