Assignment 4.

Due date: Thursday, November 7.

Note: we have a guest speaker on November 7, Chris Overton. Please turn in the assignment by email, not at the talk. If you need to give me hardcopy, put it in my box in 224 NEB. And don't miss the talk! The assignment is due by midnight on Thursday.

Implement the Smith-Waterman dynamic programming algorithm for local sequence alignment. Assume a similarity metric is being used to compare two protein (amino acid) sequences. The similarity matrix you should use is the PAM-120 matrix, here. It is organized alphabetically by the 20 one-letter codes of the amino acids, with zeros in positions of unused letters. Note that this is already in log-odds form, so you simply add up the scores across an alignment to get a total score.

Your algorithm should take two sequences and produce the optimal alignment of a segment of one sequence with a segment of another. Test your algorithm on at least two different pairs of sequences and include the output with your code. The output must be easily readable; for example, if the sequences are:

WERPKLAQCWTUPDGPRTMNBFCD
QCURGUPELRTT

then the output should be something like:

WERPKLAQC-WTUPDGPRTMNBFCD
      -QCURGUPELRTT-
Similarity score=3.754

You can generate artificial sequences to test your algorithm, or if you prefer, use real data from one of the protein or DNA databases. If the proteins are more than 80 amino acids long, you should add line breaks to make the output readable. Note that although you are permitted to discuss the assignment, each of you should produce your own separate implementation. You should turn in source code plus the two sample inputs and outputs.

Back to class Web page