

Title: Computational Challenges of Next-Generation Genomics
Abstract:
Next-generation sequencing (NGS) technology allows us to peer inside the cell in exquisite detail, revealing new insights into biology, evolution, and disease that would have been prohibitively expensive to find just a few years ago. The shorter read lengths and enormous volumes of data produced by NGS experiments present many computational challenges that my group is working to address. This talk will discuss three problems:
(1) mapping next-gen sequences onto the human genome and other large genomes at very high speed; (2) spliced alignment of RNA transcripts to the genome, including fusion transcripts; and (3) transcript assembly and quantitation from RNA-Seq experiments, including the discovery of alternative splice variants.
We are developing new computational algorithms to solve each of these problems. For alignment of short reads to a reference genome, our Bowtie program, using the Burrows-Wheeler transform, aligns short reads many times faster than competing systems, with very modest memory requirements [1]. To align RNA-Seq reads (transcripts) to a genome, we have developed a suite of tools including TopHat and Cufflinks [2,3], which can align across splice junctions and reconstruct full-length transcripts from short reads.
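As a rough illustration of why a Burrows-Wheeler index suits short-read alignment, the toy sketch below builds an uncompressed index over a small reference string and finds exact matches by backward search. It is not Bowtie's implementation: the function names are hypothetical, the suffix array is built naively, and there is no handling of mismatches, quality values, or memory compression.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy, uncompressed BWT index for `text` (hypothetical helper)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII order
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in the text strictly smaller than c.
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] for c in counts}
    for ch in bwt:
        for c in occ:
            occ[c].append(occ[c][-1] + (ch == c))
    return sa, first, occ


def find_exact(index, pattern):
    """Return sorted start positions of exact matches, via backward search."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # current range of matching suffix-array rows
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_fm_index("GATTACAGATTACA")
    print(find_exact(index, "ATTA"))
```

Each pattern character narrows the row range with two table lookups, so the search cost in this sketch depends on the read length rather than on the size of the reference.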
This talk will describe joint work with current and former members of my group including Ben Langmead, Cole Trapnell, Mike Schatz, Daehwan Kim, Geo Pertea, Daniela Puiu, and Ela Pertea; and with collaborators including Mihai Pop and Lior Pachter.
1. B. Langmead et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10:R25.
2. C. Trapnell, L. Pachter, and S.L. Salzberg. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25(9):1105-1111.
3. C. Trapnell et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 2010, 28:511-515.