In the fall of 2001, a series of biowarfare attacks was carried out through the U.S. mail using letters that contained a powdered form of the anthrax bacterium, Bacillus anthracis. After these attacks, our group was asked to conduct a rapid sequencing project to decode the genome of the strain used. We compared this sequence to a second, reference anthrax genome, which we were sequencing at the same time. I will discuss the challenges of comparing two incomplete genome sequences, for which the sequencing error rate is relatively high, and of identifying differences that would withstand the scrutiny of forensic investigators. Using state-of-the-art computational and statistical methods, our group identified 60 novel, high-quality genetic markers distinguishing the attack strain from other strains (1). We also concluded that, to facilitate future comparisons at this level of detail, genome sequencing centers need to release not only genome sequences but also detailed information about the accuracy of every nucleotide in those sequences.
(1) T.D. Read, S.L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp, D. Solomon, P. Keim, and C.M. Fraser. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296 (2002), 2028-2033.