Spring Semester 2008

January 28, 2008 – May 2, 2008

Assignment 7: Taming some Pythons

Out on: March 25, 2008
Due by: April 1, 2008, noon (before lecture)
Collaboration: None
Grading: Packaging 10%, Style 10%, Performance 10%, Functionality 70%

Overview

The seventh assignment concentrates on Python programming, mostly material we covered in the seventh and ninth weeks. Be sure to read the Python tutorials we've provided for you! We certainly have not covered every little bit you need to know in lecture or lab; some stuff remains for you to learn by yourself. :-)

Problem 1: Pythonized Guessing

Remember the "Guess the Number" game from the bonus problem of Assignment 4? You are going to write that game in Python now! Here is the description (again):

Write a Python program guess.py that plays the (infamous) "Guess the Number" game. The "player" thinks of a number from 1 to 100, and your program (started using python guess.py) "guesses" that number by asking a few questions:

Think of a number between 1 and 100.
Hit return when you're ready.

Is your number between 1 and 50? n
Is your number between 51 and 75? y
Is your number between 51 and 62? n
Is your number between 63 and 69?
...

And so on; eventually you print the number the "player" thought of (assuming he or she answers consistently, of course). Actually, there are not too many questions left, but you're writing this, not me.

Small hint: You don't need functions for this. Of course you're free to use them anyway, but you're probably making things too complicated.
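To see why only a handful of questions are ever needed, here is a minimal sketch of the halving idea. It is not the interactive program: the "player" is simulated by a hard-coded secret number, and the names secret, low, high and questions are made up for this illustration. It is written for Python 3.

```python
# Simulated player: thinks of 66 and always answers truthfully.
# (Hypothetical stand-in for reading y/n from the keyboard.)
secret = 66

low, high = 1, 100   # the number must lie in low..high
questions = 0

while low < high:
    mid = (low + high) // 2
    # "Is your number between low and mid?"
    answer_is_yes = low <= secret <= mid
    if answer_is_yes:
        high = mid        # keep the lower half
    else:
        low = mid + 1     # keep the upper half
    questions += 1

print(low, questions)
```

Each answer cuts the remaining range roughly in half, so seven questions are always enough for 1 to 100.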

Problem 2: Random Python

Ready for another trip down memory lane? For Assignment 4 we provided this MIPS code to generate random numbers. Translate the algorithm we used there from MIPS into Python. When run, your program random.py should print the exact same numbers as random.s when run on the SPIM simulator.

Small hint: You should have a separate function here, just like the MIPS code did, that returns the next random number. Your main program just calls that function over and over again, printing the number it returns.
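The overall shape of such a program could look like the sketch below. Careful: the update formula and the constants MULTIPLIER, INCREMENT and the starting seed are placeholders invented for this illustration; they are NOT taken from random.s, so you have to substitute whatever the MIPS code actually computes. The one thing worth noticing is the mask: Python integers never overflow, while a MIPS register holds only 32 bits, so the sketch assumes results must be cut back to 32 bits by hand.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a MIPS register

# Placeholder values for illustration only; NOT the ones from random.s.
MULTIPLIER = 1103515245
INCREMENT = 12345

seed = 1  # placeholder starting value, kept in a global like in the MIPS code


def next_random():
    """Advance the global seed and return the new value."""
    global seed
    seed = (seed * MULTIPLIER + INCREMENT) & MASK32
    return seed


# Main program: call the function over and over, printing each result.
for _ in range(5):
    print(next_random())
```

If random.s prints its values as signed integers, an extra step to convert the unsigned 32-bit result would be needed as well; check what the MIPS code does.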

Problem 3: Counting Words

The final problem should really convince you that it is much easier to program in Python than MIPS. Your mission is to develop a program wc.py to count the number of distinct words that appear in a given text file and then print that information in a table.

For our purposes, a "word" is any sequence of characters that doesn't contain whitespace (that should remind you of the split() method for strings). We don't want to count "The" and "the" as two different words, so we convert words to lower case for the purpose of counting (that should remind you of the lower() method for strings). Finally, we don't worry about punctuation and noise words; for a really good word counting program we'd have to, but it's too much extra work for this problem, so you're not required to do it.
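As a quick illustration of split() and lower() on a single line of text, consider the following sketch (Python 3; the variable names and the use of a dictionary for counting are just one possible choice, not a requirement):

```python
line = "These are the voyages of the starship Enterprise."

# split() with no arguments breaks the string at any run of whitespace.
words = [word.lower() for word in line.split()]
print(words)

# Count how often each lower-cased word occurs.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts["the"])
```

Note that "enterprise." keeps its trailing period, which is exactly the behavior described above.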

Here is an example of what the program does. Suppose you have the following text in a file trek.txt.

Space, the final frontier.
These are the voyages of the starship Enterprise.
Its continuing mission:
To explore strange new worlds.
To seek out new life and new civilizations.
To boldly go where no one has gone before.

Your program would be started using the command python wc.py trek.txt and it would produce the following (rather boring) output:

to      3
the     3
new     3
worlds. 1
where   1
voyages 1
these   1
strange 1
starship        1
space,  1
seek    1
out     1
one     1
of      1
no      1
mission:        1
life    1
its     1
has     1
gone    1
go      1
frontier.       1
final   1
explore 1
enterprise.     1
continuing      1
civilizations.  1
boldly  1
before. 1
are     1
and     1

Note that after each word, we print a tab character "\t" before we print the number of occurrences. If you run the program on Marx's old rant (available here) you get slightly more interesting results. Here are just the first few lines of output:

the     1316
of      897
and     407
in      343
to      335
a       218
is      157
with    146
that    136
by      134
this    109
all     107
it      105
for     104
or      98
as      88
you     86
its     84
they    82
project 82
bourgeois       80

In fact, "project" is only there because of the numerous mentions of "Project Gutenberg" whose job it is to publish old (and new) rants by lots of people. So after lots of "noise words" the most frequently occurring word is "bourgeois" which makes sense...
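Judging from the two sample outputs, the table appears to be ordered by count, largest first, with ties broken in reverse alphabetical order. That ordering is inferred from the examples, not stated anywhere, so treat it as an assumption. A minimal sketch of producing such a table from a small hand-made dictionary (Python 3, hypothetical variable names):

```python
# A few hand-picked counts, just for illustration.
counts = {"where": 1, "new": 3, "to": 3, "worlds.": 1, "the": 3}

# Sort by (count, word) and reverse, so that larger counts come first
# and equal counts are listed in reverse alphabetical order.
ordered = sorted(counts.items(),
                 key=lambda item: (item[1], item[0]),
                 reverse=True)

for word, count in ordered:
    # A tab character separates the word from its count.
    print(word + "\t" + str(count))
```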

Deliverables

Please turn in a gzip compressed tarball of your assignment; the filename should be cs102-assign-7-login.tar.gz with login replaced by your Unix login name on ugradx.cs.jhu.edu (so I would use cs120-assign-7-phf.tar.gz). The tarball should contain no derived files whatsoever (i.e. no executable files), but allow building all derived files. Include a README file (a plain Unix text file) that contains your answers to written problems, briefly explains what your programs do, and has any other notes you want us to check out before grading.

Grading

For reference, here is a short explanation of the grading criteria. Packaging refers to the proper organization of the stuff you hand in, following the guidelines for Deliverables above. Style refers to programming style, including things like consistent indentation, appropriate identifiers, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for. Performance refers to how fast your program can produce the required results compared to other submissions. Design refers to proper modularization and the proper choice of algorithms and data structures. Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.

If your programs cannot be built/run you will get no points whatsoever. If your programs fail miserably even once, i.e. terminate with an exception of any kind or dump core, we will take off 10%. Finally, make sure to include your name and email address in every file you turn in!

Bonus Problem

If you really want to impress us, try the following. Extend the word counting program to deal with punctuation and noise words. For the latter, you can simply include a list of common noise words in the program, and then check each word from the file against the list, skipping it if it occurs. For the former, you have to be careful: You should not just remove all punctuation; for example, "pre-release" makes sense as one word even though it contains punctuation. It's a little tougher... Note that we won't give you extra points for this, but we'll give you extra kudos. :-)
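One possible starting point, sketched under a simplifying assumption: punctuation is only removed from the beginning and end of each word, so anything inside a word (like the hyphen in "pre-release") survives. The tiny NOISE_WORDS set and the helper name clean are made up for this illustration; a real list would be much longer, and this approach certainly does not handle every case.

```python
import string

# Hypothetical, very short noise word list for illustration.
NOISE_WORDS = {"the", "of", "and", "to", "a"}


def clean(word):
    """Strip punctuation from both ends of word and lower-case it."""
    return word.strip(string.punctuation).lower()


text = "Space, the final frontier. A pre-release of the voyages!"

kept = []
for raw in text.split():
    word = clean(raw)
    # Skip words that were pure punctuation as well as noise words.
    if word and word not in NOISE_WORDS:
        kept.append(word)

print(kept)
```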