Empirical Research Methods in Computer Science (600.408)

Fall, 2005

This course is designed to teach computer scientists to be good, well, scientists. Most of the computer science curriculum teaches you how to be a good engineer (e.g., computer programming, software engineering, and object-oriented design), how to think about problems abstractly (e.g., algorithms, data structures, theory of network communication, and automata theory), how computing can be applied to solve various problems (e.g., vision, speech and language processing, and transaction processing systems), or how computer systems work (e.g., operating systems). This course takes the view that computational objects (like computer programs) are complex systems that sometimes need to be studied as one would study a biological or physical system: by rolling up your sleeves and running controlled experiments.

Experimentation in computer science aims to answer questions like these:

As computer scientists, we have an advantage that other scientists don't have: we can control almost all experimental conditions, because we made the system we're studying. Running another experiment, for us, is relatively cheap: we just need cycles and time. Once you realize this, your imagination may run wild as you consider the many, many, many experiments it would be possible to run (far more than you've got time for!). This so-called advantage can feel like a bit of a curse, especially if you're a perfectionist.

This course will teach you how to be rigorous about experimentation, from the design of controlled experiments to statistical analysis. We'll cover areas that are not emphasized in most statistics classes but are highly relevant to computer scientists, like computationally intensive nonparametric statistical tests. We'll also cover probabilistic modeling, parameter tuning, and exploratory data analysis.
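As one illustration of a computationally intensive nonparametric test (this page does not say which specific tests the course covers), here is a minimal sketch of a paired approximate randomization (permutation) test in Python. It assumes two systems, A and B, have each been scored on the same test items. The function name `paired_permutation_test` and its parameters are hypothetical, chosen only for this example. Instead of relying on a distributional assumption, the test uses cheap computation: it repeatedly swaps A's and B's scores on randomly chosen items and counts how often the shuffled difference is at least as large as the observed one.

```python
import random


def paired_permutation_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided Monte Carlo paired permutation test (illustrative sketch).

    Null hypothesis: systems A and B are interchangeable, so the sign of
    each per-item score difference is equally likely to be + or -.
    Returns an estimated p-value.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n  # observed mean difference (absolute)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Swapping A and B on an item flips the sign of that item's difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing: the observed labeling counts as one of the
    # permutations, so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    # Hypothetical per-item scores (1 = correct, 0 = wrong) for two systems.
    a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    b = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
    print(paired_permutation_test(a, b))
```

A small p-value suggests the difference between the systems is unlikely to be due to chance alone. Raising `trials` costs only more cycles and time, which is exactly the kind of trade-off described above.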

More information will be posted here soon!

Instructors: Noah Smith and David Smith (no, we're not related), two PhD students in the CS department. We run experiments every day. We decided to teach this class because we had to learn this stuff mostly on our own.

Format: This is a one-credit short course. We'll meet for 75 minutes once a week, for eight weeks.

Date/Time: Wednesdays 4-5:15pm, from October 12 through December 7 (no class on Nov. 23), 2005.

Location: Shaffer 300

Prerequisites: You should have taken intermediate programming and data structures, or their equivalents.

Grades: Probably 4 assignments and a final exam.

Textbook(s): None required.


Homework 1, portions due 10/18 and 10/25

Homework 2, due 11/11

Homework 3, due 12/12

Lecture Slides
Lecture 1, 10/12

Lecture 2, 10/19

Lecture 3, 10/26

Lecture 4, 11/2

Lecture 5, 11/9

Lecture 6, 11/16

Lecture 7, 11/30

Final Exam, due 12/17

Noah A. Smith
Last modified: Tue Feb 22 14:33:44 EST 2005