This Perl script will run BLAST and/or FASTA searches on all the predictions
made by GLIMMER 2.0 for a genome sequence. You need to have BLAST (either
wu-blast or ncbi-blast) and FASTA (fasta3 from Bill Pearson's lab at UVA)
installed locally.

This is UNSUPPORTED code! We hope you find it useful, but if you can't get
it to run, you'll have to find someone local to help you. I'm afraid we
don't have time to respond to help inquiries.

See below for CHANGES (Aug 2001).

HOW TO RUN BLASTFASTA.PL

1-) Set the directory names properly so that
      $FASTA_DIRECTORY points to the directory where fasta is installed
      $BLAST_DIRECTORY points to the directory where blast is installed
      $DBASE_DIRECTORY points to the database against which the searches are run

2-) Set the environment variable BLASTMAT to $BLAST_DIRECTORY/matrix.
    Hence if BLAST is in the directory /usr/local/bin/blast, then type:

      setenv BLASTMAT /usr/local/bin/blast/matrix

    at the Unix (csh) command prompt.

3-) To run the program:

      blastfasta.pl <seqfile> <coord file> <output file> <number of hits> -f -b

    "seqfile" should be a nucleotide sequence in FASTA format.
    "coord file" is the file of GLIMMER gene coordinates for that sequence
      (aeolicus_g2.coord in the examples below).
    "Output file" is the file where both FASTA and BLAST results for the
      sequence will be written.
    "Number of hits" indicates how many of the top hits (those with the
      best, i.e. lowest, E-values) should be output.
    "-b" lets you omit blast runs, while "-f" lets you include fasta runs.

    Examples:

      blastfasta3.pl aquae.fna aeolicus_g2.coord output 3 -f -b

    This will run only FASTA on the file aquae.fna and write the top 3
    hits to the file output.

      blastfasta3.pl aquae.fna aeolicus_g2.coord output 3 -f

    This will run both FASTA and BLAST and write the top 3 hits of each to
    the file output.

4-) Be patient! The searches may take a long time for a large genome,
    depending on what CPU you're using. Check the output file from time to
    time to ensure it is running the way you want.

CHANGES August 2001

Thanks to Melanie Duffield and Carl Mayers for fixing a major bug: the
script was not properly searching genes on the complementary strand.
Here are some additional comments about their changes/bugfix:

1. As well as the codons and corresponding amino acids for translating DNA
   into proteins, the script now includes the complement of each base, used
   for complementing the DNA sequences.

2. Depending on whether the orf is on the forward or reverse strand, it is
   now treated slightly differently.
     Forward - These sequences are converted into protein as before.
     Reverse - These sequences are first reversed and then complemented
               (using the sub-routine complement_strand). They are then
               converted into protein as for the forward sequences.

3. All sequences are now passed on to FASTA or BLAST for database searching.

4. There is also some new code to write a FASTA format file containing all
   the predicted protein sequences. These lines are commented out, but if
   you remove the comments on the line

     # open E ,">> $BLAST_DIRECTORY/db/fasta";

   and on those beginning "print E ..." and "close E ...", then you will
   get this additional output file.
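
The strand handling described in point 2 can be sketched roughly as follows.
This is a minimal illustration, not the code from blastfasta.pl: the
sub-routine names reverse_complement and translate_orf, the '+'/'-' strand
flag, and the use of 'X' for unrecognised codons are all assumptions made
for this example. It uses the standard genetic code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code, built from the conventional T, C, A, G ordering.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Hypothetical helper: reverse the sequence, then complement each base.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse(uc($dna));
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Hypothetical helper: translate an orf into protein.
# $strand is '+' for forward and '-' for reverse (an assumed convention).
sub translate_orf {
    my ($dna, $strand) = @_;
    $dna = uc($dna);
    $dna = reverse_complement($dna) if $strand eq '-';
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        # Codons containing anything other than A, C, G, T become 'X'.
        my $aa = $codon_table{ substr($dna, $pos, 3) } // 'X';
        $protein .= $aa;
    }
    return $protein;
}

# The same gene given on the forward strand and as it would appear on
# the reverse strand; both calls should yield the same protein.
print translate_orf('ATGGCCTAA', '+'), "\n";
print translate_orf('TTAGGCCAT', '-'), "\n";
```

Translating the reverse-strand orf only after it has been reversed and
complemented is what lets forward and reverse genes share one translation
routine, which is the behaviour the bugfix describes.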