Homework 2.  This assignment on the use of FASTA and BLAST is borrowed from Prof. William Pearson of the University of Virginia.  The exercises are copyright 1999 by William Pearson (with slight revisions by Steven Salzberg).


These exercises use programs on the FASTA WWW Search page.


1. Use the FASTA search page to compare Drosophila glutathione transferase gtt1_drome (accession number P20432 in GenPept) to the Annotated PIR protein sequence database.

A. What is the highest scoring non-homolog?

B. Examine how the expectation value changes with different scoring matrices (BLOSUM62, PAM250) and different gap penalties. What happens to the E()-value for the highest scoring unrelated sequence? Look at the distribution of scores and the E()-value of the highest scoring unrelated sequence when the gap-open/gap-ext penalties are small (-8/-1).

C. Try the search with ktup=1. Again, look at the E()-values for distant homologs and the highest scoring unrelated sequence.  What is the highest scoring unrelated sequence now?

2. Do the same search against the NCBI/BLAST SwissProt database.

3. Do the same search using BLAST. Search the SwissProt database (use accession number 121694 or sp|P20432).

Compare the scores and alignments of gtt1_drome vs gth4_arath produced by fasta and blast.

4. Compare the scores and alignments of gtt1_drome vs gth4_arath with fasta if the BLOSUM62 scoring matrix is used with the default gap penalties.

Compare the scores and alignments of gtt1_drome vs gth4_arath with fasta if the BLOSUM62 scoring matrix is used with gap penalties of -8/-2.

5. Use TFASTX to compare gtt1_drome to the Invertebrate and Plant sections of GenBank. What is the highest scoring plant homolog?

6. Use prss to evaluate the statistical significance of the match between gtt1_drome and gth4_arath. Note the difference between the p()-value and the E()-value. What would be the conversion between p() and E() for a search of SwissProt?
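As a starting point for the arithmetic in exercise 6, the sketch below assumes the usual relationship in which a pairwise p()-value is scaled by the number of database sequences to give an expected count, E() = p() x D. The function names and the example numbers are hypothetical and chosen only for illustration; substitute the p()-value reported by prss and the actual number of SwissProt entries in the release you search.

```python
import math


def expectation_from_p(p_value: float, db_size: int) -> float:
    """Scale a pairwise p()-value to an expected count for db_size comparisons.

    Assumes E() = p() * D, where D is the number of sequences searched.
    """
    return p_value * db_size


def prob_at_least_one(e_value: float) -> float:
    """Probability of one or more chance hits, given an expected count.

    Assumes chance hits are Poisson distributed: P = 1 - exp(-E).
    """
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    pairwise_p = 1e-6      # p() as reported for one pairwise comparison
    database_size = 80000  # assumed number of sequences in the database

    e_value = expectation_from_p(pairwise_p, database_size)
    print("E() for the whole database:", e_value)
    print("P(at least one chance hit):", prob_at_least_one(e_value))
```

For small E() the two quantities are nearly equal, which is why a p()-value that looks convincing for one pairwise comparison can be much less convincing after a full database search.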

7. Low-complexity regions: use the fasta program to search the PIR1 database with grou_drome. Do the same search against the PIR1 (seg) database. Compare the E()-values of the highest scoring unrelated sequence and of the GTP-binding regulatory protein RGFFBH between the two searches.