SPAdes genome assembler and its applications to emerging NGS technologies

Pavel Pevzner, University of California, San Diego
Host: Ben Langmead

The lion’s share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A goal of single-cell genomics (SCG) is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of SCG data is challenging because of highly non-uniform read coverage and elevated levels of chimeric reads and read-pairs. We describe SPAdes, an assembler for both SCG and standard (multicell) assembly that incorporates a number of new algorithmic ideas. We demonstrate that recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. We further describe (i) TrueSPAdes, which assembles the accurate and long (10 kb) reads generated by the recently released Illumina TruSeq technology, (ii) transSPAdes for transcriptome assembly, and (iii) dipSPAdes for assembling highly polymorphic diploid genomes. Finally, we show that the de Bruijn graph assembly approach is well suited to assembling the long and highly inaccurate SMRT reads generated by Pacific Biosciences.

Speaker Biography

Dr. Pevzner is the Ronald R. Taylor Distinguished Professor of Computer Science and Director of the National Technology Center for Computational Mass Spectrometry at the University of California, San Diego. He holds a Ph.D. (1988) from the Moscow Institute of Physics and Technology, Russia. He was named a Howard Hughes Medical Institute Professor in 2006. He was elected a Fellow of the Association for Computing Machinery (2010) for “contribution to algorithms for genome rearrangements, DNA sequencing, and proteomics” and a Fellow of the International Society for Computational Biology (2012). He was awarded an honorary degree (Honoris Causa, 2011) by Simon Fraser University in Vancouver. Dr. Pevzner has authored the textbooks “Computational Molecular Biology: An Algorithmic Approach” (2000), “An Introduction to Bioinformatics Algorithms” (2004, with Neil Jones), and “Bioinformatics Algorithms: An Active Learning Approach” (2014, with Phillip Compeau).