Computational Challenges of Next-Generation Genomics

Steven Salzberg, Johns Hopkins University School of Medicine

Next-generation sequencing technology allows us to peer inside the cell in exquisite detail, revealing new insights into biology, evolution, and disease that would have been impossibly expensive to find just a few years ago. The shorter read lengths and enormous volumes of data produced by NGS experiments present many computational challenges that my group is working to address. This talk will discuss three problems: (1) mapping next-gen sequences onto the human genome and other large genomes at very high speed; (2) spliced alignment of RNA transcripts to the genome, including fusion transcripts; and (3) transcript assembly and quantitation from RNA-Seq experiments, including the discovery of alternative splice variants. We are developing new computational algorithms to solve each of these problems. For alignment of short reads to a reference genome, our Bowtie program, using the Burrows-Wheeler transform, aligns short reads many times faster than competing systems, with very modest memory requirements [1]. To align RNA-Seq reads (transcripts) to a genome, we have developed a suite of tools including TopHat and Cufflinks [2,3], which can align across splice junctions and reconstruct full-length transcripts from short reads.
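As a rough illustration of why the Burrows-Wheeler transform suits short-read alignment, the sketch below builds a toy index over a reference string and finds exact occurrences of a read by backward search. It is not Bowtie's implementation: the function names (bwt_via_suffix_array, build_fm_index, find_exact) are hypothetical, the suffix array is built by naive sorting, the occurrence table is stored in full, and mismatches and quality values are ignored. A practical aligner would compress these structures and tolerate sequencing errors.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (suffix_array, bwt) for text with a '$' sentinel appended.

    Naive construction by sorting suffixes; fine for a toy example only.
    """
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # The BWT character for each sorted suffix is the one preceding it
    # (t[-1] is the sentinel, which precedes the suffix starting at 0).
    bwt = "".join(t[i - 1] for i in sa)
    return sa, bwt


def build_fm_index(bwt):
    """Return (C, occ) tables used by backward search.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def find_exact(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):  # extend the match one base to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


reference = "GATTACAGATTACA"  # made-up reference sequence
sa, bwt = bwt_via_suffix_array(reference)
C, occ = build_fm_index(bwt)
print(find_exact("TTAC", sa, C, occ))  # → [2, 9]
print(find_exact("GGG", sa, C, occ))   # → []
```

Each base of the read narrows the range of matching rows with two table lookups, so the search cost in this sketch grows with the read length rather than with the size of the reference.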

This talk will describe joint work with current and former members of my group including Ben Langmead, Cole Trapnell, Mike Schatz, Daehwan Kim, Geo Pertea, Daniela Puiu, and Ela Pertea; and with collaborators including Mihai Pop and Lior Pachter.

  1. B. Langmead et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10:R25.
  2. C. Trapnell, L. Pachter, and S.L. Salzberg. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25(9):1105-1111.
  3. C. Trapnell et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 2010, 28:511-515.

Speaker Biography

Dr. Steven Salzberg is a Professor of Medicine in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University, where he holds joint appointments in the Departments of Biostatistics and Computer Science. From 2005 to 2011, he was the Director of the Center for Bioinformatics and Computational Biology (CBCB) and the Horvitz Professor of Computer Science at the University of Maryland, College Park. From 1997 to 2005, he was Senior Director of Bioinformatics at The Institute for Genomic Research (TIGR) in Rockville, Maryland, one of the world’s leading DNA sequencing centers at the time. Dr. Salzberg has authored or co-authored two books and over 200 publications in leading scientific journals, and his h-index is 83. He is a Fellow of the American Association for the Advancement of Science (AAAS) and the Institute for Science in Medicine, and a former member of the Board of Scientific Counselors of the National Center for Biotechnology Information at NIH. He currently serves on the Editorial Boards of the journals Genome Research, Genome Biology, BMC Biology, Journal of Computational Biology, PLoS ONE, BMC Genomics, BMC Bioinformatics, and Biology Direct, and he is a member of the Faculty of 1000. He co-chaired the Third (1999) through the Eighth (2005) Conferences on Computational Genomics, the 2007 and 2009 International Conferences on Microbial Genomics, and the 2009 Workshop on Algorithms in Bioinformatics.