StringTie: a new approach to transcriptome assembly from RNA-Seq data

Mihaela Pertea, Johns Hopkins University School of Medicine
Host: Steven Salzberg

Sequencing of mRNA through RNA-seq has transformed our ability to identify the genes responsible for adaptive evolution, a fundamental topic in modern evolutionary biology. Using RNA-seq, scientists can now generate extensive transcriptome data from diverse eukaryotes in a timely and cost-effective manner, and simultaneously characterize transcribed genes in multiple cell types and changing environments. The enormous amounts of data generated by these sequencing projects require sophisticated, efficient new algorithms for analysis. Previous efforts to model genes de novo, via recognition of splice sites, coding regions, and other signals, have been superseded by more accurate methods based on RNA-seq data. Here we introduce a new transcript assembly algorithm, StringTie, which combines ideas from de novo assembly with a novel application of a network flow algorithm, a method imported from other areas of computer science research.

Speaker Biography

Mihaela Pertea is a computer scientist who has been an Assistant Professor in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University since 2011. She received her B.S. and M.S. degrees in Computer Science from the University of Bucharest in Romania, and her M.S.E. and Ph.D. in Computer Science from Johns Hopkins University. In 2001 she joined The Institute for Genomic Research (TIGR) in Rockville, Maryland, one of the world’s leading DNA sequencing centers at the time, where she was a Bioinformatics Scientist until 2005. From 2005 to 2011 she was an Assistant Research Scientist in the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park. Dr. Pertea’s major area of research is computational biology, an interdisciplinary field at the intersection of several scientific disciplines, including molecular biology, computer science, and statistical mathematics.