Assignment 6: Balancing Maps


The sixth assignment is all about ordered maps, specifically fast ordered maps in the form of balanced binary search trees. You’ll work with a little program called Words that reads text from standard input and uses an (ordered) map to count how often different words appear.

We’re giving you a very basic (unbalanced) binary search tree implementation of OrderedMap that you can use to play around with the Words program and use as starter code for your own development. We have also included a SimpleMap implementation of an unordered map for comparisons.

Your main goal for this assignment is to write two new OrderedMap implementations: AvlTreeMap and TreapMap classes. These must provide balanced binary search trees with either worst-case (AVL tree) or expected (treap) logarithmic time performance for all critical map operations. You’ll also compare the actual performance of all four Map implementations using the provided Words client program.

It is important that you run your benchmarks using XTime (performance analysis) under “identical” conditions to make them comparable. They should be run on the same (virtual) machine, using the same Java version, and with as little “load” (unrelated programs running, other users logged in, etc.) as possible. Also you should repeat your measurements at least a few times to rule out embarrassing outliers.

Package Setup

Unlike for prior assignments, we are not giving you all the files you need to edit for this assignment. Rather, it’s your turn to create the new test and implementation files from scratch. Show us that you learned something about packages, imports, etc. from all the prior assignments. Below is the expected hierarchy and files as the autograder will use them. Files you will be creating and editing are marked with an asterisk (*).
hw6/
    AvlTreeMap.java *
    BinarySearchTreeMap.java
    Map.java
    OrderedMap.java
    SimpleMap.java
    TreapMap.java *
    Words.java
    tests/
        AvlTreeMapTest.java *
        BinarySearchTreeMapTest.java *
        MapTest.java *  (abstract)
        OrderedMapTest.java *  (abstract)
        SimpleMapTest.java *
        TreapMapTest.java *

Part A: Test-first development

We have provided the Map interface and its extension OrderedMap. Your first task is to create a full test suite for each of these. The tests for regular map operations should go into a MapTest JUnit test suite. Additional tests that only apply to OrderedMaps should go into an OrderedMapTest class. Both of these should be abstract, with a createUnit method similar to prior assignments. You should then create two concrete test suites for the basic implementations we've provided, SimpleMapTest and BinarySearchTreeMapTest, that build on the abstract test classes you have written.
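As a rough sketch of this factory-method layout (shown without JUnit so it stands alone; in your real MapTest each check would be an @Test method, and java.util.Map merely stands in for the course's own Map interface, which is an assumption here):

```java
import java.util.TreeMap;

// Sketch of the abstract-suite pattern: tests are written once against the
// interface; each concrete suite overrides createUnit() to pick the
// implementation under test. All names besides createUnit are illustrative.
public abstract class SuitePatternSketch {
    // Each concrete suite supplies the implementation to be tested.
    protected abstract java.util.Map<String, Integer> createUnit();

    // A "test" written once against the interface, inherited by every suite.
    public boolean insertThenFind() {
        java.util.Map<String, Integer> m = createUnit();
        m.put("cat", 1);
        return m.get("cat") == 1;
    }

    public static void main(String[] args) {
        // A concrete suite is just a subclass that picks the implementation;
        // here java.util.TreeMap stands in for, e.g., BinarySearchTreeMap.
        SuitePatternSketch suite = new SuitePatternSketch() {
            @Override protected java.util.Map<String, Integer> createUnit() {
                return new TreeMap<>();
            }
        };
        System.out.println(suite.insertThenFind()); // prints "true"
    }
}
```

Because insertThenFind is inherited, every concrete suite re-runs the same checks against its own implementation for free.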

You will also need to create concrete test classes for the AvlTreeMap and TreapMap to test the code you'll be writing in later parts of the assignment. Even though rotations are not part of the interface being tested, it is important to test all the possible rotation types thoroughly in AvlTreeMapTest. For an AVL tree, a helper method that returns the height or balance factor of a tree can be useful.

It is particularly challenging to test probabilistic data structures such as treaps, but again try to use carefully contrived data in TreapMapTest to test the expected rotations. One option is to add helper methods to your TreapMap implementation that let you set the priorities yourself instead of having them generated randomly.
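One way to get deterministic priorities is to make the random source a constructor parameter, so a test can pass in a fixed seed. A minimal sketch of the idea (all names here are illustrative, not part of the provided code):

```java
import java.util.Random;

// Sketch: a treap whose Random source is injected. Tests seed it, so every
// insertion draws a predictable priority and the tree shape is known.
public class TreapPrioritySketch {
    private final Random rand;

    public TreapPrioritySketch(Random rand) { this.rand = rand; }

    // Production code would construct this with new Random() (unseeded).
    public int nextPriority() { return rand.nextInt(); }

    public static void main(String[] args) {
        // Two instances seeded identically draw identical priority sequences,
        // so a test can predict exactly which rotations an insert triggers.
        TreapPrioritySketch a = new TreapPrioritySketch(new Random(42));
        TreapPrioritySketch b = new TreapPrioritySketch(new Random(42));
        System.out.println(a.nextPriority() == b.nextPriority()); // prints "true"
    }
}
```

An alternative is an extra constructor or package-private setter used only by tests; the injected-Random approach keeps the production code path identical to the test path.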


In your README, discuss the difficulties you encountered in testing rotations for your AVL and treap map implementations, which test cases you used, and why you chose those particular examples. You are encouraged to draw little ASCII trees to illustrate the situations your test cases cover.

Part B: AVL Trees

Your second task for this assignment is to develop an OrderedMap<K, V> implementation called AvlTreeMap that provides a balanced binary search tree by enforcing the AVL properties we discussed in lecture. You are strongly encouraged to use the provided BinarySearchTreeMap as a starting point. You'll need to decide whether to extend BinarySearchTreeMap or to simply copy and edit the code. Hint: think carefully about heights and balance factors (see below) before deciding!

All critical map operations must run in O(log n) worst-case time! Keep this in mind as you write your code, for example when you think about how to track the height of your subtrees. It's not okay to use the obvious O(n) algorithm to compute heights; that would ruin the whole data structure!
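The standard fix is to cache each node's height in the node itself and repair it locally after insertions and rotations. A sketch of the idea (this is not the provided starter code; Node, update, and rotateRight are illustrative names):

```java
// Sketch: caching heights in AVL nodes so balance factors cost O(1).
public class AvlHeightSketch {
    static class Node {
        int key;
        Node left, right;
        int height;               // cached; a leaf has height 0, null has -1
        Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? -1 : n.height; }

    // Recompute a node's cached height from its children in O(1).
    static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Balance factor in O(1) thanks to the cached heights.
    static int balance(Node n) { return height(n.left) - height(n.right); }

    // Right rotation; only the two nodes that moved need their heights fixed.
    static Node rotateRight(Node n) {
        Node child = n.left;
        n.left = child.right;
        child.right = n;
        update(n);      // order matters: n is now below child
        update(child);
        return child;
    }

    public static void main(String[] args) {
        // A left-left chain 3 <- 2 <- 1 is unbalanced at the root ...
        Node root = new Node(3);
        root.left = new Node(2);
        root.left.left = new Node(1);
        update(root.left); update(root);
        System.out.println(balance(root));          // prints 2: too left-heavy
        root = rotateRight(root);
        System.out.println(root.key + " " + balance(root)); // prints "2 0"
    }
}
```

Note that update touches only one node, and a rotation fixes only the two nodes it moved, which is why insert can restore all cached heights along the search path in O(log n) total.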

Part C: Treaping through the tulips

Your final coding task for this assignment is to develop an OrderedMap<K, V> implementation called TreapMap that provides a probabilistically balanced binary search tree by enforcing the treap properties we discussed in lecture. Again, you'll want to start with the BinarySearchTreeMap implementation we gave you, but you have a similar design decision to make.

All critical map operations must run in O(log n) expected time! Keep this in mind as you write your code!
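For reference, the heap-repair step after a BST insert can be sketched as follows. This assumes a min-heap on priorities (smaller priority wins the root); your lecture convention may be a max-heap, and all names here are illustrative:

```java
// Sketch: treap insert does a normal BST insert, then rotates the new node
// up while its priority violates the (min-)heap order with its parent.
public class TreapRotateSketch {
    static class Node {
        int key, priority;
        Node left, right;
        Node(int key, int priority) { this.key = key; this.priority = priority; }
    }

    static Node rotateRight(Node n) {
        Node c = n.left; n.left = c.right; c.right = n; return c;
    }

    static Node rotateLeft(Node n) {
        Node c = n.right; n.right = c.left; c.left = n; return c;
    }

    // Recursive insert: BST step down, then rotate on the way back up
    // whenever a child's priority violates the heap order.
    static Node insert(Node root, int key, int priority) {
        if (root == null) return new Node(key, priority);
        if (key < root.key) {
            root.left = insert(root.left, key, priority);
            if (root.left.priority < root.priority) root = rotateRight(root);
        } else {
            root.right = insert(root.right, key, priority);
            if (root.right.priority < root.priority) root = rotateLeft(root);
        }
        return root;
    }

    public static void main(String[] args) {
        // Insert keys 1,2,3 with priorities chosen so key 3 must rotate to the top.
        Node root = insert(insert(insert(null, 1, 30), 2, 20), 3, 10);
        System.out.println(root.key); // prints 3: highest priority wins the root
    }
}
```

Fixing the priorities by hand, as in main above, is exactly the kind of contrived data TreapMapTest can use to force each rotation type.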

Part D: Benching Word Counts

Once you’re reasonably sure that your AvlTreeMap and TreapMap implementations work as they should (using your tests from Part A), it’s time for some benchmarking with the XTime program from assignment 5. We have provided a Words program that does a frequency count on all the “words” that occur in an input file.

For the benchmarks you should first run the Words program using the provided SimpleMap and BinarySearchTreeMap implementations on a variety of data sets. You can either use data sets from prior assignments (treating integers as strings for example), or make up some new ones. See below for suggestions on getting really big data sets with actual words. Then do the same XTime benchmarking tests with your AvlTreeMap and TreapMap implementations.


Put the actual data you collect for this problem in your README file and describe your observations. Also try to explain your observations using your understanding of the code you’re benchmarking. Why are the numbers as they are?

Data Sources

Project Gutenberg is a good source for “natural language” test data if you would like to play with that. We’d suggest Einstein, Frankenstein, or Dracula as simple test cases. Religious texts are more voluminous and thus provide more challenging test cases, for example The Bible or The Koran. If you’re not into religious texts, try Dewey or Goldman instead; there is lots to learn. Feel free to test “natural language” input on whatever you want; we’ll pick some of our grading test cases from Project Gutenberg as well.
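If you want a quick cross-check on your Words output, a coreutils pipeline computes the same word frequencies. The download line is a hypothetical example (Frankenstein is Project Gutenberg ebook #84; the exact URL layout may change), and the tokenization rule here (split on non-letters, lowercase) is an assumption that may differ from how Words defines a “word”:

```shell
# Hypothetical fetch of a test file; adjust the URL to the book you want:
#   curl -sL https://www.gutenberg.org/cache/epub/84/pg84.txt -o frankenstein.txt

# Baseline word-frequency count: split on non-letters, lowercase,
# then count duplicates and sort by descending frequency.
printf 'the cat and the hat\n' \
  | tr -cs '[:alpha:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn
```

Replace the printf with `cat frankenstein.txt` for a real run; diffing the top of this output against Words is a cheap sanity check before you trust your benchmark numbers.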


Deliverables

Go to the Assignment 6 page for Gradescope and click submit; note that you can resubmit any time up until the deadline. You will be prompted to upload your files, at which point you should upload all of the necessary source files. In the future we might not list them out, but for this assignment they are listed explicitly below:


You need to submit all of these files to the autograder along with a README. You can upload them individually or in a zip file. If you upload a zip file, make sure all files are at the top level; you cannot have any extra directories or else the autograder won’t be able to find them. This even applies to the test files - do not submit them in a tests subdirectory!

Make sure the code you hand in does not produce any extraneous debugging output. If you have commented out lines of code that no longer serve any purpose you should remove them.


You must hand in the source code and a README file. The README file can be plain text (README with no extension) or markdown (README.md). In your README, be sure to answer the discussion questions posed in this description. You should discuss your solution as a whole and let the staff know anything important. If you are going to use late days on an assignment, we ask that you note it in your README.

If you want to learn markdown formatting, here is a good starting point.

Submitting to Gradescope

Once you are ready to submit your files, go to the Assignment 6 page for Gradescope and click submit. Note that you can resubmit any time up until the deadline. Only your most recent submission will be graded. Please refer to the course policies regarding late days and penalties.

After you submit, the autograder will run and you will get feedback on your functionality and how you performed on our test cases. Some test cases are “hidden” from you so you won’t actually know your final score on the test cases until after grades are released. We also include your checkstyle score as a test case.

If you see the “Autograder Failed to Execute” message, then either your submission did not compile at all or there was a packaging error. Please see the Gradescope Submission Notes in Piazza Resources for help debugging why your submission is not working.

You do not need to fully implement each file before you submit, but you’ll probably fail the test cases for the parts of the assignment you haven’t done yet. Also note that only the files with // TODO items in them will be used. You cannot modify any of the provided interface files as the autograder will overwrite any changes you made with the original provided file.


For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments.

Style refers to Java programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable javadoc documentation, etc. Many aspects of this are enforced automatically by Checkstyle when run with the provided configuration file.

Testing refers to proper unit tests for all of the data structure classes you developed for this assignment, using the JUnit 4 framework as introduced in lecture. Make sure you test all parts of the implementation that you can think of and all exception conditions that are relevant.

Performance refers to how fast/with how little memory your program can produce the required results compared to other submissions.

Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.

If your submission does not compile, you will not receive any of the autograded points for that assignment. It is always better to submit code that at least compiles; you will get freebie points just for compiling.

If your programs produce unnecessary warnings when compiled with javac -Xlint:all, you will be penalized 10% functionality per failed part. (You are also not allowed to use the @SuppressWarnings annotation - we use it just to filter our accepted warnings from yours.)

If your programs fail because of an unexpected exception, you will be penalized 10% functionality per failed part. (You are not allowed to just wrap your whole program in a universal try-catch.)