Computational Genomics: Sequence Modeling (600.403, Fall 2004)

Course description:

This short course will cover probabilistic methods for modeling biological sequences (e.g., DNA and protein sequences). Topics include inferring relationships between pairs of sequences and among groups of sequences, and inferring evolutionary trees over sequences. Prerequisites: knowledge of algorithms, probability, and programming.

Dates/times:

This short course meets from 1-2 pm on Mondays, Tuesdays, and Wednesdays, from October 4 through November 1. The tentative location is Shaffer 2.

Instructors:

Noah Smith and Roy Tromble; please contact us by email if you have any questions about the course.

Syllabus (PDF)

Textbook:

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.

Errata for the Durbin et al. book.

This book may also be useful but is not required:

D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.

Additional notes on HMMs, following the lectures on 10/13 and 10/18, are available in PDF.

Topics (subject to change):

Lecture Slides

Assignments

  • Class project (PDF)

    You will need dataset A. You will also need dataset B; to get it, come to class on November 1 with your partner (or wait until after that class and email us begging forgiveness!).

    Updated pbw and pvit (not completely tested, but they appear to work). These may run faster than the old versions: tools.v2.linux.tar.gz, tools.v2.sun.tar.gz


    Noah A. Smith and Roy Tromble