Genome Sequencing

Matching reads to reference using FFTs

Aligning reads from a sequenced genome to a reference sequence is an important step in processing genome sequencing data. We treat the reads and the reference as signals and use signal processing techniques such as fast Fourier transforms (FFTs) to find matches between them. Once matches are found, we use GraphLab to assemble the matched reads into a single sequence [SNH12]. This work was done as the final project for a big data course offered at The Johns Hopkins University.
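To make the signal-processing idea concrete, here is a minimal sketch of FFT-based read matching. It is an illustration under stated assumptions, not the pipeline from the project report: the one-hot encoding of bases, the use of NumPy, and the function names `one_hot`, `match_counts`, and `best_offset` are all hypothetical choices made for this example. Each base becomes an indicator signal, and the cross-correlation of the read with the reference (computed via FFT) gives the number of matching bases at every ungapped offset.

```python
import numpy as np

BASES = "ACGT"  # assumed alphabet; ambiguous bases such as N are not handled


def one_hot(seq):
    """Encode a DNA string as a (4, len(seq)) array of 0/1 indicator signals."""
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    return np.stack([(codes == ord(b)).astype(float) for b in BASES])


def match_counts(reference, read):
    """Count matching bases for every ungapped placement of read on reference.

    Entry k of the result is the number of positions i where
    reference[k + i] == read[i].
    """
    n, m = len(reference), len(read)
    size = n + m - 1  # pad to this length so the correlation is linear, not circular
    ref_sig = one_hot(reference)
    read_sig = one_hot(read)[:, ::-1]  # reversing turns convolution into correlation
    ref_f = np.fft.rfft(ref_sig, size, axis=1)
    read_f = np.fft.rfft(read_sig, size, axis=1)
    # Multiply spectra per base channel, invert, then sum the four channels.
    corr = np.fft.irfft(ref_f * read_f, size, axis=1).sum(axis=0)
    # Index m - 1 + k of the full correlation corresponds to offset k.
    return np.rint(corr[m - 1:n]).astype(int)


def best_offset(reference, read):
    """Return (offset, matches) for the placement with the most matching bases."""
    counts = match_counts(reference, read)
    k = int(np.argmax(counts))
    return k, int(counts[k])


if __name__ == "__main__":
    reference = "ACGTTGCAAGCTTAGC"
    read = "GCAAGC"
    print(best_offset(reference, read))
```

This sketch only scores ungapped placements on the forward strand; handling insertions, deletions, reverse complements, or sequencing-quality weights would need additional steps that are not shown here.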

Related publications

[SNH12]   A Sinha, Shuya Chu, Yuge Gong. “FFTLab: Genome Resequencing Pipeline using Signal Processing for Alignment & GraphLab for Assembly.” Submitted as the final project report for EN.600.615(01): Big Data, Small Languages, Scalable Systems, The Johns Hopkins University, Baltimore, MD, December 10, 2012.