Projects are designed to test your mastery of course material as well as your programming skills; think of them as “take-home exams” and don’t communicate with anyone about possible solutions. This project focuses on the implementation, testing, benchmarking, and application of hash tables.
This project is a little more open-ended than what you’re probably used to for this course. We no longer break things down into separate problems below, the entire project is the problem now. This also means that the “rubric” for the project is the grading breakdown above, and it all applies.
Your “one” task for this project is to take the simple spell checker we
give you and to turn it into the fastest, most memory-efficient spell
checker in the course, subject to the constraints detailed below.
You are expected to do this by (once again) implementing the
this time using one of several hash table techniques (your choice, see
Important: The 20% for Performance will be awarded by ranking your submissions according to both time and space required by your spell checker. Less time is better, less space is better, and both are considered separately for 10% each. So one submission might win in both categories, or different submissions might win in each, it all depends. Submissions that are tied (perform almost the same in our benchmarks) will receive the same rank (in the respective dimension); it’s theoretically possible (although highly unlikely) for all submissions to be tied.
Note that your spell checker must produce correct results in order to take part in the performance tournament! If that wasn’t the case, winning in the tournament would be trivial: Just submit a solution that does nothing. Obviously that’s not okay…
The good news is that you don’t actually have to write the spell checker:
We hand you
Spell.java for free!
Here’s a quick example for how it works:
$ java Spell Error: Need path to dictionary file! $ java Spell ogden.dict I know how to program. know program
As you can see
Spell expects the name of a dictionary file as the sole
command line argument.
It reads that dictionary file into a
Map and then proceeds to read words from
It tries to find each word in the
Map it generated and it prints out the
words it doesn’t know, assuming they are spelling errors.
(So input without spelling errors results in no output whatsoever.)
ogden.dict dictionary is not very complete (indeed it only
contains 852 words, not nearly enough for a decent spell checker) so it thinks
program are spelling errors.
You can find
Spell.java on Piazza.
We have also posted a tarball
spell.tar.gz that contains three different
dictionaries, three different example texts to check, and the expected errors
that should be output for some of those combinations.
You can determine how well
Spell is doing using the familiar
(which we’ll also use for ranking, albeit in a slightly modified form).
Here are two examples for benchmarking:
$ ./xtime java Spell ogden.dict <democracy.txt >/dev/null 3.08 seconds 190416 kilobytes java Spell ogden.dict $ ./xtime java Spell scowl-small.dict <sanity.txt >/dev/null 67.45 seconds 211520 kilobytes java Spell scowl-small.dict
All these files are available to you, and you should probably start by trying
to figure out why
Spell.java (before you add your new, improved
implementation) is only useful for tiny dictionary files.
Note that you must use the
Spell.java program as provided on Piazza.
The only change you’re allowed to make is to replace
new SimpleMap with
new HashMap for your new
(We will use our own
Spell.java for grading, so if you made changes to yours,
those changes will “disappear” for grading.)
You are supposed to improve the performance of
Spell.java by replacing its
SimpleMap with a new
HashMap class you write.
HashMap.java has to implement the
Map interface, but beyond
that you have quite a few options for how to proceed:
We’re actually happy to help those adventurous folks who stroll all the way
into universal hashing territory:
You can find
UniversalHashes.java on Piazza, a class that implements two
common constructions for universal hash functions.
Just be warned that the code is pretty new and might contain a bug or
(You can’t blame us later if your stuff doesn’t work:
It’s “free code” so it comes with “no warranties” of any kind.
Don’t use it if you’re worried about it breaking.)
HashMap.java is the “big piece of code” you have to write and submit this
week, you obviously need “all the usual pieces” for it.
toString()methods) and exception conditions. (Also, if you’re still not using base classes for testing against an interface, make sure you finally start doing so.)
And that’s it. Yes, you’re really on your own for figuring out what kind of hash table you should implement. Just a word of caution: The code in Peter’s lecture notes is not exactly great. You should probably only consider it as a starting point if you really have no idea what else to do.
All critical map operations, except
insert, must run in O(1) expected
time (or better);
insert can run in O(1) amortized time in case you have to
grow the bucket array table to keep the load factor down.
Depending on just how serious you are about those Performance points, you may
well end up writing several different hash tables over the course of this
However, you have to pick one of those as your
HashMap.java and submit it
under that name.
Presumably you’ll pick the best one, but if you want to submit any of the
others you wrote (not required, strictly optional) make sure you give them
different names that express what kind of hash table you used.
(So, for example, if in addition to your “winning”
HashMap.java you also
implemented a hash table using separate chaining, call that one
ChainingHashMap.java or something similar.
Whichever one you call
HashMap.java is the one we’ll run against everybody
You cannot know how fast your
HashMap is until it’s actually written.
You cannot improve your
HashMap until you can tell how fast it is.
So the worst mistake you can make is to “think about it” for days without
writing any code.
(Thinking ahead is good in principle, thinking ahead for too long is the
We recommend you start right now by writing the simplest
HashMap you can
think of and making that work.
For example you could write one based on separate chaining but with a fixed
Are you going to win with that? No!
But at least now you’ve started playing the game.
You want your test cases and benchmarks in place before you keep going.
Make sure that your test cases are complete and that your benchmarks tell you
how well the various
Map operations work for that first version of
You should probably save a backup as soon as you get done with the first round.
From then on, it’s “try to improve things” followed by “see if the tests still pass” followed by “benchmark to see if things actually got better” followed by either “Whoops, that was a bad idea, let’s undo that” or “Yay, I made progress, let’s save a backup of the new version” and so on and so forth. We predict that there will be a correlation between how well you do and how often you “went around” this iterative development cycle.
You may use
java.util.LinkedList if you wish.
You may not use fancier Java collection classes, in particular you may
java.util.HashMap to implement your
You may use the
NodeList classes we handed out earlier in
the semester if you wish.
You may not use other
Map implementations to hack your
If you want to use some other data structure class, better ask on Piazza first! You don’t want to find out minutes before the deadline that you used something that’s not okay…
You should use your
README file to explain how you approached this
project, what you did in what order, how it worked out, how your plans changed,
Try to summarize all the different ways you developed, evaluated, and improved
HashMap over time.
If you don’t have a story to tell here, you probably didn’t do enough…
Mapimplementations from the last project. Use those to set some goals for how fast your new hash table should be. (You should certainly be able to beat
TreapMapis harder, but not impossible.)
HashMapimplementation. If you don’t do that, you may end up with changes that made things worse but you never noticed since you didn’t have the data to check.
You must turn in a gzip-compressed
tarball of your project;
the filename should be
replaced by the number of this project (see above) and
jhed replaced by
your JHED identifier.
(For example, Peter would use
cs226-assignment-3-pfroehl2.tar.gz for his
submission of Project 3.)
The tarball should contain no derived files whatsoever (i.e. no
.html files, etc.), but should allow building all derived
Include a plain text
README.docx or whatnot) that briefly explains what
your programs do and contains any other notes you want us to check out before
grading; your answers to written problems should be in this file as well.
Finally, make sure to include your name and email address in every file you
turn in (well, in every file for which it makes sense to do so anyway)!
For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all projects.
Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for projects.
Style refers to Java programming style, including things like consistent
indentation, appropriate identifier names, useful comments, suitable
Many aspects of this are enforced automatically by
Checkstyle when run with the configuration
file available on Piazza.
Style also includes proper modularization of your code (into interfaces,
classes, methods, using
Simple, clean, readable code is what you should be aiming for.
Testing refers to proper unit tests for all of the data structure classes you developed for this project, using the JUnit 4 framework as introduced in lecture. Make sure you test all (implied) axioms that you can think of and all exception conditions that are relevant.
Performance refers to how fast/with how little memory your program can produce the required results compared to other submissions.
Functionality refers to your programs being able to do what they should
according to the specification given above; if the specification is ambiguous
and you had to make a certain choice, defend that choice in your
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using
we will take off 10% (except if you document a very good reason; no, you
cannot use the
@SuppressWarnings annotation either).
If your programs fail miserably even once, i.e. terminate with an exception of
any kind, we will take off 10% (however we’ll also take those 10% off if you’re
trying to be “excessively smart” by wrapping your whole program into a