1. Use the FASTA search page to compare Drosophila glutathione transferase gtt1_drome (accession number P20432 in GenPept) to the Annotated PIR protein sequence database.
A. What is the highest scoring non-homolog?
B. Examine how the expectation value changes with different scoring matrices (BLOSUM62, PAM250) and different gap penalties. What happens to the E()-value for the highest scoring unrelated sequence? Look at the distribution of scores and the E()-value of the highest scoring unrelated sequence when the gap-open/gap-ext penalties are small (-8/-1).
C. Try the search with ktup=1. Again, look at the E()-values for distant homologs and the highest scoring unrelated sequence. What is the highest scoring unrelated sequence now?
2. Do the same search against the NCBI/BLAST SwissProt database.
3. Do the same search using BLAST. Search the SwissProt database (use accession number 121694 or sp|P20432).
Compare the scores and alignments of gtt1_drome vs gth4_arath produced by fasta and blast.
4. Compare the scores and alignments of gtt1_drome vs gth4_arath with fasta if the BLOSUM62 scoring matrix is used with the default gap penalties.
Compare the scores and alignments of gtt1_drome vs gth4_arath with fasta if the BLOSUM62 scoring matrix is used with gap penalties of -8/-2.
5. Use TFASTX to compare gtt1_drome to the Invertebrate and Plant sections of GenBank. What is the highest scoring Plant homolog?
6. Use prss to evaluate the statistical significance of the match between gtt1_drome and gth4_arath. Note the difference between the p()-value and the E()-value. What would be the conversion between p() and E() for a search of SwissProt?
7. Low-complexity regions - use the fasta program to search the PIR1 database with grou_drome. Do the same search against the PIR1 (seg) database. Compare the E()-value of the highest scoring unrelated sequence and the GTP-binding regulatory protein RGFFBH.
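
For the conversion asked about in exercise 6, the following is a minimal sketch, not part of the FASTA package. It assumes the p()-value is the chance probability for a single pairwise comparison, so the expected number of chance hits in a database search is that probability multiplied by the number of database entries. The function names and the database size of 100,000 entries are hypothetical placeholders; substitute the entry count reported by your own SwissProt search.

```python
import math


def expect_from_pvalue(p, db_entries):
    """Convert a pairwise p()-value to an E()-value for a database search.

    Assumes each database entry is an independent comparison, so the
    expected number of chance matches is simply p * db_entries.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if db_entries <= 0:
        raise ValueError("db_entries must be positive")
    return p * db_entries


def search_pvalue_from_expect(e):
    """Probability of seeing at least one chance match, given E().

    Uses the Poisson relation P = 1 - exp(-E); expm1 keeps precision
    when E is very small.
    """
    if e < 0:
        raise ValueError("e must be non-negative")
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical values: replace with the p() reported by prss and the
    # number of entries in the database you actually searched.
    pairwise_p = 1e-6
    db_entries = 100_000  # placeholder, NOT the real SwissProt size
    e_value = expect_from_pvalue(pairwise_p, db_entries)
    print(f"E() = {e_value:.3g}")
    print(f"P(at least one chance hit) = {search_pvalue_from_expect(e_value):.3g}")
```

A match that looks significant in a single pairwise comparison becomes less so as the database grows, because E() scales linearly with the number of entries searched.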