Assignment 7: Taming some Pythons
Out on:
March 25, 2008
Due by:
April 1, 2008, noon (before lecture)
Collaboration:
None
Grading:
Packaging 10%, Style 10%, Performance 10%, Functionality 70%
Overview
The seventh assignment concentrates on Python programming, mostly material we covered in the seventh and ninth weeks. Be sure to read the Python tutorials we've provided for you! We certainly have not covered every little bit you need to know in lecture or lab; some stuff remains for you to learn by yourself. :-)
Problem 1: Pythonized Guessing
Remember the "Guess the Number" game from the bonus problem of Assignment 4? You are going to write that game in Python now! Here is the description (again):
Write a Python program guess.py that plays the
(infamous) "Guess the Number" game.
The "player" thinks of a number from 1 to 100, and your program
(started using python guess.py)
"guesses" that number by asking a few questions:
Think of a number between 1 and 100. Hit return when you're ready.
Is your number between 1 and 50? n
Is your number between 50 and 75? y
Is your number between 50 and 62? n
Is your number between 63 and 69? ...
And so on; eventually you print the number the "player" thought of (assuming he or she answers consistently, of course). Actually, there are not too many questions left, but you're writing this, not me.
Small hint: You don't need functions for this. Of course you're free to use them anyway, but you're probably making things too complicated.
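To illustrate the idea behind the questions, here is a minimal sketch of the halving strategy, not a reference solution. It is written for Python 3 (in Python 2, input would be raw_input). The names guess_number, ask and ask_player are made up for this sketch, and the search is wrapped in a function only so that it can be tried without a keyboard; the exact ranges it asks about may differ from the sample session above.

```python
def guess_number(ask, low=1, high=100):
    """Halve the candidate range [low, high] until one number is left.

    `ask` is a hypothetical callable taking (low, mid) and returning True
    when the player's number lies in low..mid (inclusive).
    """
    while low < high:
        mid = (low + high) // 2
        if ask(low, mid):
            high = mid        # the number is in the lower half
        else:
            low = mid + 1     # the number is in the upper half
    return low


def ask_player(low, high):
    """Ask one yes/no question on the terminal."""
    answer = input("Is your number between %d and %d? " % (low, high))
    # Any answer starting with "y" counts as yes; everything else as no.
    return answer.strip().lower().startswith("y")
```

Calling guess_number(ask_player) after printing the opening prompt plays one round. Because the range shrinks by about half with each answer, at most seven questions are needed for numbers from 1 to 100.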
Problem 2: Random Python
Ready for another trip down memory lane?
For Assignment 4 we provided
this MIPS code
to generate random numbers.
Translate the algorithm we used there from MIPS
into Python.
When run, your program random.py should print
the exact same numbers as random.s
when run on the SPIM simulator.
Small hint: You should have a separate function here, just like the MIPS code did, that returns the next random number. Your main program just calls that function over and over again, printing the number it returns.
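One thing to watch out for when translating: MIPS registers hold 32 bits and arithmetic silently wraps around, while Python integers grow without bound. The sketch below shows one way to structure the program and to mimic the wrap-around with a bit mask. It assumes a linear congruential update purely for illustration; the constants and the seed are placeholders and are NOT taken from random.s, so substitute whatever the MIPS code actually does.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a MIPS register

# Placeholder values for illustration only; they do not come from random.s.
MULTIPLIER = 1103515245
INCREMENT = 12345

state = 1  # placeholder seed


def next_random():
    """Advance the global state and return the next number."""
    global state
    # Without the mask the value would keep growing in Python.
    state = (state * MULTIPLIER + INCREMENT) & MASK32
    return state
```

The main program would then call next_random() in a loop and print each result. Whether a result needs to be reinterpreted as a signed number before printing depends on how random.s prints its values.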
Problem 3: Counting Words
The final problem should really convince you that it is
much easier to program in Python than
MIPS.
Your mission is to develop a program wc.py
to count the number of distinct words that appear in a
given text file and then print that information in a
table.
For our purposes, a "word" is any sequence of letters
that doesn't contain whitespace (that should remind
you of the split() method for strings).
We don't want to count "The" and "the" as two different
words, so we convert words to lower case for the purpose
of counting (that should remind you of the
lower() method for strings).
Finally, we don't worry about punctuation
and noise words; for a really good word counting program we'd
have to, but it's too much extra work for this problem, so
you're not required to do it.
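As a small sketch of how split() and lower() work together with a dictionary, consider the following. The function name count_words is just chosen for this example, and reading the text from the file named on the command line is left out.

```python
def count_words(text):
    """Return a dictionary mapping each lower-cased word to its count."""
    counts = {}
    # lower() folds "The" and "the" together; split() with no argument
    # breaks the text at any run of whitespace.
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Note that punctuation stays attached to the words, so "frontier." and "frontier" would be counted separately, which is acceptable for this problem.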
Here is an example of what the program does. Suppose you
have the following text in a file trek.txt.
Space, the final frontier. These are the voyages of the starship Enterprise. Its continuing mission: To explore strange new worlds. To seek out new life and new civilizations. To boldly go where no one has gone before.
Your program would be started using the command
python wc.py trek.txt and it would
produce the following (rather boring) output:
to	3
the	3
new	3
worlds.	1
where	1
voyages	1
these	1
strange	1
starship	1
space,	1
seek	1
out	1
one	1
of	1
no	1
mission:	1
life	1
its	1
has	1
gone	1
go	1
frontier.	1
final	1
explore	1
enterprise.	1
continuing	1
civilizations.	1
boldly	1
before.	1
are	1
and	1
Note that after each word, we print a tab character "\t" before we print the number of occurrences. If you run the program on Marx's old rant (available here) you get slightly more interesting results. Here are just the first few lines of output:
the	1316
of	897
and	407
in	343
to	335
a	218
is	157
with	146
that	136
by	134
this	109
all	107
it	105
for	104
or	98
as	88
you	86
its	84
they	82
project	82
bourgeois	80
In fact, "project" is only there because of the numerous mentions of "Project Gutenberg" whose job it is to publish old (and new) rants by lots of people. So after lots of "noise words" the most frequently occurring word is "bourgeois" which makes sense...
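The ordering is not spelled out above, but both samples appear consistent with sorting by count first and by word second, each in descending order. Under that assumption (inferred from the samples, not stated in the specification), the table could be produced roughly as follows; format_table is a name made up for this sketch and it expects a dictionary mapping words to counts.

```python
def format_table(counts):
    """Return output lines of the form 'word<TAB>count', largest first."""
    # Sorting (count, word) pairs in reverse puts the highest counts first
    # and breaks ties in reverse alphabetical order.
    pairs = sorted(((n, w) for w, n in counts.items()), reverse=True)
    return ["%s\t%d" % (w, n) for n, w in pairs]
```

Printing each returned line in turn yields the table.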
Deliverables
Please turn in a gzip compressed tarball of your assignment; the filename should be cs102-assign-7-login.tar.gz with login replaced by your Unix login name on ugradx.cs.jhu.edu
(so I would use cs102-assign-7-phf.tar.gz).
The tarball should contain no derived files whatsoever
(i.e. no executable files),
but allow building all derived files.
Include a README file (a plain Unix text file) that
contains your answers to written problems, briefly explains what
your programs do, and has any other notes you want us to check
out before grading.
Grading
For reference, here is a short explanation of the grading criteria.
Packaging refers to the proper organization of the
stuff you hand in, following the guidelines for Deliverables above.
Style refers to programming style, including
things like consistent indentation, appropriate identifiers,
useful comments, suitable documentation, etc.
Simple, clean, readable code is what you should be aiming for.
Performance refers to how fast your program can
produce the required results compared to other submissions.
Design refers to proper modularization and the
proper choice of algorithms and data structures.
Functionality refers to your programs being
able to do what they should according to the specification
given above; if the specification is ambiguous and you had
to make a certain choice, defend that choice in your
README file.
If your programs cannot be built/run you will get no points whatsoever. If your programs fail miserably even once, i.e. terminate with an exception of any kind or dump core, we will take off 10%. Finally, make sure to include your name and email address in every file you turn in!
Bonus Problem
If you really want to impress us, try the following. Extend the word counting program to deal with punctuation and noise words. For the latter, you can simply include a list of common noise words in the program, and then check each word from the file against the list, skipping it if it occurs. For the former, you have to be careful: You should not just remove all punctuation, for example "pre-release" makes sense as one word even though it contains punctuation. It's a little tougher... Note that we won't give you extra points for this, but we'll give you extra kudos. :-)
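As a starting point, here is one possible sketch, certainly not the only approach. It strips punctuation only from the two ends of each word, so "pre-release" survives intact, and it skips anything found in a small set of noise words. The name clean_words and the contents of NOISE_WORDS are made up for this example; a real list would be much longer.

```python
import string

# Hypothetical sample list; extend it with whatever noise words you like.
NOISE_WORDS = {"the", "a", "of", "and", "to"}


def clean_words(text, noise=NOISE_WORDS):
    """Return the lower-cased words of text without edge punctuation or noise."""
    words = []
    for raw in text.lower().split():
        # strip() only removes characters at the ends, never in the middle.
        word = raw.strip(string.punctuation)
        if word and word not in noise:
            words.append(word)
    return words
```

A token made up entirely of punctuation, such as "--", becomes empty after stripping and is dropped. Cases like apostrophes at the end of a word ("students'") are handled crudely by this approach and may deserve more thought.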