The sixth assignment is all about ordered maps, specifically fast ordered maps in the form of balanced binary search trees. You’ll work with a little program called
Words that reads text from standard input and uses an (ordered) map to count how often different words appear.
We’re giving you a very basic (unbalanced) binary search tree implementation of
OrderedMap that you can use to play around with the
Words program and use as starter code for your own development. We have also included a
SimpleMap implementation of an unordered map for comparisons.
Your main goal for this assignment is to write two new classes,
AvlTreeMap and TreapMap. These must provide balanced binary search trees with either worst-case (AVL tree) or expected (treap) logarithmic time performance for all critical map operations. You’ll also compare the actual performance of all four
Map implementations using the provided
Words client program.
It is important that you run your benchmarks using XTime (performance analysis) under “identical” conditions to make them comparable. They should be run on the same (virtual) machine, using the same Java version, and with as little “load” (unrelated programs running, other users logged in, etc.) as possible. Also you should repeat your measurements at least a few times to rule out embarrassing outliers.
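To illustrate why repeated measurements matter, here is a minimal sketch of timing a task several times and summarizing the samples with a median, which a single slow run cannot distort much. This is not XTime and does not replace it for this assignment; the class name RepeatTimingSketch and both method names are made up for illustration.

```java
import java.util.Arrays;

final class RepeatTimingSketch {
    // Median of the samples; far less sensitive to one outlier than the mean.
    static double median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Runs the task `runs` times and records each wall-clock duration in nanoseconds.
    static long[] time(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute whatever you actually want to measure.
        long[] samples = time(() -> Arrays.sort(new int[100_000]), 5);
        System.out.println("median ns: " + median(samples));
    }
}
```

Whatever tool you use, report how many runs you made and which summary (minimum, median, mean) you are quoting.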
Unlike for prior assignments, we are not giving you all the files you need to edit for this assignment. Rather, it’s your turn to create the new test and implementation files from scratch. Show us that you learned something about packages, imports, etc. from all the prior assignments. Below is the expected hierarchy and files as the autograder will use them. Files you will be creating and editing are marked with an asterisk (*).
hw6-student.zip
---
hw6/
    AvlTreeMap.java *
    BinarySearchTreeMap.java
    Map.java
    OrderedMap.java
    SimpleMap.java
    TreapMap.java *
    Words.java
    tests/
        AvlTreeMapTest.java *
        BinarySearchTreeMapTest.java *
        MapTest.java * (abstract)
        OrderedMapTest.java * (abstract)
        SimpleMapTest.java *
        TreapMapTest.java *
We have provided the interface
Map and its extension
OrderedMap. Your first task is to create a full test suite for each of these. The tests for regular map operations should go into a
MapTest JUnit test suite. Additional tests that only apply to
OrderedMaps should go into an
OrderedMapTest class. Both of these should be abstract with a
createUnit method similar to prior assignments. You should then create two concrete test suites for the basic implementations we’ve provided,
SimpleMapTest and BinarySearchTreeMapTest, that depend on the abstract test classes you have written.
You will also need to create concrete test classes for
AvlTreeMap and TreapMap to test the code you’ll be writing in later parts of the assignment. Even though rotations are not part of the interface being tested, thoroughly testing all the possible rotation types in
AvlTreeMapTest is important. For an AVL tree, a helper method that returns the height or balance factor of a tree could be useful.
It is particularly challenging to test probabilistic data structures such as treaps, but again try to use carefully contrived data in
TreapMapTest in order to test expected rotations. One option would be to put helper methods in your
TreapMap implementation that allow you to make up priorities instead of them being randomly generated.
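One way to make priorities controllable is sketched below. It is only an illustration under several assumptions: the class TreapPrioritySketch, its int keys, the fixed helper, and the min-heap ordering on priorities are all made up here, and the convention from lecture may differ. The idea is that the tree asks a supplier for each new priority, so a test can hand in a scripted sequence while normal use hands in a random source.

```java
import java.util.Random;
import java.util.function.IntSupplier;

final class TreapPrioritySketch {
    static final class Node {
        final int key;
        final int priority;
        Node left;
        Node right;

        Node(int key, int priority) {
            this.key = key;
            this.priority = priority;
        }
    }

    private final IntSupplier priorities; // where new priorities come from
    Node root;

    // Normal use: random priorities.
    TreapPrioritySketch() {
        this(new Random()::nextInt);
    }

    // Test use: the caller decides every priority.
    TreapPrioritySketch(IntSupplier priorities) {
        this.priorities = priorities;
    }

    // Supplier that hands out the given values in order.
    static IntSupplier fixed(int... values) {
        int[] next = {0};
        return () -> values[next[0]++];
    }

    void insert(int key) {
        root = insert(root, key);
    }

    // Ordinary BST insert, then rotate the child up if it has a smaller priority.
    private Node insert(Node n, int key) {
        if (n == null) {
            return new Node(key, priorities.getAsInt());
        }
        if (key < n.key) {
            n.left = insert(n.left, key);
            if (n.left.priority < n.priority) {
                n = rotateRight(n);
            }
        } else if (key > n.key) {
            n.right = insert(n.right, key);
            if (n.right.priority < n.priority) {
                n = rotateLeft(n);
            }
        }
        return n; // duplicate keys are ignored in this sketch
    }

    private static Node rotateRight(Node n) {
        Node pivot = n.left;
        n.left = pivot.right;
        pivot.right = n;
        return pivot;
    }

    private static Node rotateLeft(Node n) {
        Node pivot = n.right;
        n.right = pivot.left;
        pivot.left = n;
        return pivot;
    }

    public static void main(String[] args) {
        // Decreasing priorities force a left rotation on every insert after the first.
        TreapPrioritySketch t = new TreapPrioritySketch(fixed(30, 20, 10));
        t.insert(1);
        t.insert(2);
        t.insert(3);
        System.out.println("root key: " + t.root.key);
    }
}
```

With scripted priorities, each test can state exactly which rotation it expects and then check the resulting shape.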
In your README, discuss the difficulties you encountered in testing rotations for the AVL tree and treap map implementations, what test cases you used, and why you chose those particular examples. You are encouraged to draw little ASCII trees to illustrate what situations the test cases cover.
Your second task for this assignment is to develop an
OrderedMap<K, V> implementation called
AvlTreeMap that provides a balanced binary search tree by enforcing the AVL properties we discussed in lecture. You are strongly encouraged to use the provided
BinarySearchTreeMap as a starting point. You’ll need to decide whether to extend the BinarySearchTreeMap or whether to simply copy and edit the code. Hint: think carefully about heights and balance factors (see below) before deciding!
All critical map operations must run in O(log n) worst case time! Keep this in mind as you write your code, for example when you think about how to track the height of your subtrees. It’s not okay to use the obvious O(n) algorithm to compute heights, that would ruin the whole data structure!
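To make the point about heights concrete, here is a minimal sketch of caching the height in each node so that reading or refreshing it costs O(1). Everything in it is an assumption for illustration: the Node type, the int keys, the convention that an empty subtree has height -1, and the sign of the balance factor may all differ from the provided BinarySearchTreeMap and from lecture.

```java
final class AvlHeightSketch {
    // Hypothetical node type with a cached height field.
    static final class Node {
        int key;
        Node left;
        Node right;
        int height; // height of the subtree rooted here; a leaf has height 0

        Node(int key) {
            this.key = key;
        }
    }

    // O(1): reads the cached value; an empty subtree counts as height -1.
    static int height(Node n) {
        return n == null ? -1 : n.height;
    }

    // O(1): recomputes one node's height from its children's cached heights.
    static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Positive means left-heavy, negative means right-heavy (in this sketch).
    static int balanceFactor(Node n) {
        return height(n.left) - height(n.right);
    }

    // Single right rotation; returns the new root of this subtree.
    static Node rotateRight(Node n) {
        Node pivot = n.left;
        n.left = pivot.right;
        pivot.right = n;
        update(n);     // n is now the lower node, so refresh it first
        update(pivot);
        return pivot;
    }

    public static void main(String[] args) {
        // Left-leaning chain 3 -> 2 -> 1, heights refreshed bottom-up.
        Node root = new Node(3);
        root.left = new Node(2);
        root.left.left = new Node(1);
        update(root.left);
        update(root);
        System.out.println("balance before: " + balanceFactor(root));
        root = rotateRight(root);
        System.out.println("root after: " + root.key + ", balance: " + balanceFactor(root));
    }
}
```

Only the nodes on the path from the change back to the root need update, which is what keeps insert and remove logarithmic.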
Your final coding for this assignment is to develop an
OrderedMap<K, V> implementation called
TreapMap that provides a probabilistically balanced binary search tree by enforcing the treap properties we discussed in lecture. Again, you’ll want to start with the BinarySearchTreeMap implementation we gave you, but have a similar design decision to make.
All critical map operations must run in O(log n) expected time! Keep this in mind as you write your code!
Once you’re reasonably sure that your
AvlTreeMap and TreapMap implementations work as they should (using your tests from Part A), it’s time for some benchmarking with the XTime program from assignment 5. We have provided a
Words program that does a frequency count on all the “words” that occur in an input file.
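The core of such a frequency count can be sketched as follows. This is not the provided Words program: it uses java.util.TreeMap as a stand-in for the course's OrderedMap, the class name WordCountSketch is made up, and treating every whitespace-separated token as a "word" is an assumption that the real program may not share.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

final class WordCountSketch {
    // Counts how often each whitespace-separated token occurs in the text.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                counts.merge(in.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // An ordered map yields its keys in sorted order.
        count("to be or not to be")
            .forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Every token costs one lookup and one update, so the map's insert and find performance dominates the running time on large inputs.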
For the benchmarks you should first run the
Words program using the provided
SimpleMap and BinarySearchTreeMap implementations on a variety of data sets. You can either use data sets from prior assignments (treating integers as strings, for example) or make up some new ones. See below for suggestions on getting really big data sets with actual words. Then do the same XTime benchmarking tests with your AvlTreeMap and TreapMap implementations.
Put the actual data you collect for this problem in your
README file and describe your observations. Also try to explain your observations using your understanding of the code you’re benchmarking. Why are the numbers as they are?
Project Gutenberg is a good source for “natural language” test data if you would like to play with that. We’d suggest Einstein, Frankenstein, or Dracula as simple test cases. Religious texts are more voluminous and thus provide more challenging test cases, for example The Bible or The Koran. If you’re not into religious texts, try Dewey or Goldman instead; there is lots to learn. Feel free to test “natural language” on whatever you want; we’ll pick some of our grading test cases from Project Gutenberg as well.
Go to the Assignment 6 page for Gradescope and click submit. Note that you can resubmit any time up until the deadline. You will be prompted to upload your files at which point you will upload all of the necessary source files. In the future we might not list them out, but for this assignment they are listed explicitly below:
README
AvlTreeMap.java
AvlTreeMapTest.java
BinarySearchTreeMapTest.java
MapTest.java
OrderedMapTest.java
SimpleMapTest.java
TreapMap.java
TreapMapTest.java
You need to submit all of these files to the autograder along with a README. You can upload them individually or in a zip file. If you upload them in a zip file, make sure they are all at the top level; you cannot have any extra directories or else the autograder won’t be able to find them. This even applies to the test files: do not submit them in a tests subdirectory!
Make sure the code you hand in does not produce any extraneous debugging output. If you have commented out lines of code that no longer serve any purpose you should remove them.
You must hand in the source code and a README file. The README file can be plain text (
README with no extension), or markdown (
README.md). In your README be sure to answer the discussion questions posed in this description. You should discuss your solution as a whole and let the staff know anything important. If you are going to be using late days on an assignment, we ask that you note it in your README.
If you want to learn markdown formatting, here is a good starting point.
Once you are ready to submit your files, go to the Assignment 6 page for Gradescope and click submit. Note that you can resubmit any time up until the deadline. Only your most recent submission will be graded. Please refer to the course policies regarding late days and penalties.
After you submit, the autograder will run and you will get feedback on your functionality and how you performed on our test cases. Some test cases are “hidden” from you so you won’t actually know your final score on the test cases until after grades are released. We also include your checkstyle score as a test case.
If you see the “Autograder Failed to Execute” message, then either your submission did not compile at all or there was a packaging error. Please see the Gradescope Submission Notes in Piazza Resources for help debugging why your submission is not working.
You do not need to fully implement each file before you submit, but you’ll probably fail the test cases for the parts of the assignment you haven’t done yet. Also note that only the files with
// TODO items in them will be used. You cannot modify any of the provided interface files as the autograder will overwrite any changes you made with the original provided file.
For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.
Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments.
Style refers to Java programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable
javadoc documentation, etc. Many aspects of this are enforced automatically by Checkstyle when run with the provided configuration file.
Style also covers how you organize your code (using private appropriately, etc.). Simple, clean, readable code is what you should be aiming for.
Testing refers to proper unit tests for all of the data structure classes you developed for this assignment, using the JUnit 4 framework as introduced in lecture. Make sure you test all parts of the implementation that you can think of and all exception conditions that are relevant.
Performance refers to how fast/with how little memory your program can produce the required results compared to other submissions.
Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README.
If your submission does not compile, you will not receive any of the autograded-points for that assignment. It is always better to submit code that at least compiles. You will get freebie points just for compiling.
If your programs have unnecessary warnings when using
javac -Xlint:all you will be penalized 10% functionality per failed part. (You are also unable to use the
@SuppressWarnings annotation - we use it just to filter our accepted warnings from yours.)
If your programs fail because of an unexpected exception, you will be penalized 10% functionality per failed part. (You are not allowed to just wrap your whole program into a universal try-catch.)