Assignment 5: Listing Sets


This assignment will give you an opportunity to get a better understanding of sets and priority queues. Additionally, we will be introducing the XTime program, which will allow better profiling of the time and space requirements of the programs that we write. We will start by using it sparingly in this assignment, and will use it more often in the assignments to come.

Package Setup

Files you will be editing for submission are marked with an asterisk (*). Unlike in prior assignments, this time you will need to figure out what to add and where for some of the files you’ll be editing. In other words, not everything is marked with explicit and complete TODO comments.
hw5/ * * *

Part A: Terrific Heuristics

For the first part of this assignment you get to write some code variations. You’ll extend two implementations of Set<T> to use the following heuristics:

We have also provided a version of the Unique program you wrote in the first two assignments, this time implemented with a Set. After you are reasonably sure that your new “adaptive” set implementations work (i.e., test them!), run the Unique program again with these Set versions. For input data, please generate large, random input files, and test your Unique program on these random inputs. Additionally, generate any other more controlled data sets that you think might be helpful.
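For generating large random inputs, a small helper along these lines may be useful. All names and sizes here are illustrative, not part of the assignment's required interface; a file produced this way can be fed to Unique through the XTime command shown below.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Builds `count` random integers in [0, bound) as lines of text; main writes
// them to a file suitable for redirecting into Unique's standard input.
public final class GenInput {
    public static List<String> randomLines(int count, int bound, long seed) {
        Random rng = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add(Integer.toString(rng.nextInt(bound)));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // e.g. one million values drawn from a small range -> many repeats
        Files.write(Paths.get("input.file"), randomLines(1_000_000, 50_000, 42L));
    }
}
```

Varying `count` and `bound` lets you control both data-set size and how many repeats appear, which is useful for the README discussion.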

To gather performance data, we will be using the XTime class included in this zip. It will also be available on the class Piazza. To use it, run the XTime program with the Java class containing your main method (in this case Unique) as the first parameter. On the command line, this would look like

java XTime ClassWithMyMain "InputParam" < input.file

The output will show runtime and memory usage.


In your README, discuss the results from using the heuristic “improvements” to the Set implementations. Were there any noticeable differences compared to using the default implementation of Unique we provided? Did you expect to see improvements? If they didn’t perform as expected, why do you think that was the case? Were your data sets big enough? Consider and describe any particular types of data sets that you think the heuristics would perform particularly well on, and why. If you didn’t see any measurable differences, what do you think it would take to measure the relative performance of each implementation?

Part B: More Set Operations

For this part of the assignment you don’t have to write any code. Instead, answer questions regarding implementations of the set operations Union and Intersection. Specifically, address these questions in your README:
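No code is required here, but when reasoning about the cost of these operations it may help to picture the classic merge-style Union and Intersection over sorted, duplicate-free arrays. This is illustrative only; your Set implementations may organize their elements quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Linear-time union and intersection of two sorted, duplicate-free int
// arrays: advance whichever index points at the smaller value, and emit
// shared values exactly once.
public final class SetOps {
    public static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) out.add(a[i++]);
            else if (a[i] > b[j]) out.add(b[j++]);
            else { out.add(a[i]); i++; j++; }   // in both: emit once
        }
        while (i < a.length) out.add(a[i++]);   // drain leftovers
        while (j < b.length) out.add(b[j++]);
        return out;
    }

    public static List<Integer> intersection(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out.add(a[i]); i++; j++; }   // keep only shared values
        }
        return out;
    }
}
```

Contrast this O(n + m) merge pattern with what the same operations would cost on unsorted representations when you write up your answers.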

Part C: Heaping Priorities

For the third part of this assignment, you will implement BinaryHeapPriorityQueue<T> using the binary heap data structure described in lecture. It’s your choice whether you want to use a plain Java array or the ArrayList<T> class from the Java library as the basis for this implementation. You will need to provide a default constructor, which uses the default comparator, and a second constructor which takes a custom comparator as input. If a client creates a BinaryHeapPriorityQueue<T> with no comparator, the best and remove methods should operate on the largest element in the queue, not the smallest element. In other words, the default behavior should be that of a maximum priority queue.
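As a rough sketch of the structure (not the required interface — check the provided files for the exact signatures you must implement), a max-by-default binary heap backed by an ArrayList might look like this:

```java
import java.util.ArrayList;
import java.util.Comparator;

// Sketch of a binary-heap priority queue. The parent of index i is
// (i - 1) / 2; its children are 2i + 1 and 2i + 2. The "best" element
// sits at the root (index 0).
public class BinaryHeapPriorityQueue<T extends Comparable<? super T>> {
    private final ArrayList<T> heap = new ArrayList<>();
    private final Comparator<T> cmp;

    // Default constructor: natural ordering, so best() is the LARGEST
    // element, i.e. a maximum priority queue.
    public BinaryHeapPriorityQueue() {
        this(Comparator.naturalOrder());
    }

    // Second constructor: client-supplied comparator.
    public BinaryHeapPriorityQueue(Comparator<T> cmp) {
        this.cmp = cmp;
    }

    public void insert(T t) {
        heap.add(t);
        int i = heap.size() - 1;          // sift the new element up
        while (i > 0 && cmp.compare(heap.get(i), heap.get((i - 1) / 2)) > 0) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public T best() {                     // precondition: queue not empty
        return heap.get(0);
    }

    public void remove() {                // precondition: queue not empty
        T last = heap.remove(heap.size() - 1);
        if (heap.isEmpty()) return;
        heap.set(0, last);
        int i = 0;                        // sift the moved element down
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, big = i;
            if (l < heap.size() && cmp.compare(heap.get(l), heap.get(big)) > 0) big = l;
            if (r < heap.size() && cmp.compare(heap.get(r), heap.get(big)) > 0) big = r;
            if (big == i) return;
            swap(i, big);
            i = big;
        }
    }

    public boolean empty() { return heap.isEmpty(); }

    private void swap(int i, int j) {
        T tmp = heap.get(i);
        heap.set(i, heap.get(j));
        heap.set(j, tmp);
    }
}
```

Note how a min-behaving queue falls out of the same code by passing a reversing comparator; no separate heap logic is needed.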

Testing files have been provided, although you are always advised to review them carefully and add any tests you think are missing.

Part D: Unique Priority Queues

We have provided a revised version of the unique program, UniquePQ, which uses a priority queue to find the unique values. The tricky part here is that the priority queue stores repeated values. But as you remove values from a priority queue, all the repeats of a value necessarily come out consecutively. The approach taken in UniquePQ is essentially to throw away the repeats.
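The repeat-skipping idea can be sketched as follows. Purely for illustration this uses the standard library's java.util.PriorityQueue (a min-queue, unlike your max-by-default implementation), since the skipping logic is the same either way; the actual UniquePQ we provide may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Duplicates are stored in the queue as-is. Because equal values leave a
// priority queue consecutively, we only keep a value when it differs from
// the previously removed one, throwing the rest of each run away.
public final class UniqueViaPQ {
    public static List<Integer> unique(int[] data) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int d : data) pq.add(d);
        List<Integer> out = new ArrayList<>();
        Integer prev = null;
        while (!pq.isEmpty()) {
            Integer next = pq.poll();
            if (!next.equals(prev)) {
                out.add(next);          // first of a run of repeats
            }
            prev = next;                // subsequent repeats are discarded
        }
        return out;
    }
}
```

One consequence worth noticing for the README: the output comes out in priority order, unlike a Set-based Unique, which makes no ordering promise.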

Once you have your BinaryHeapPriorityQueue<T> implementation working, run the UniquePQ program with it, using a variety of data sets. This will require changing the instantiation type of the priority queue used in the main method. (Hint: write program(s) to generate very large data sets.)


In your README, discuss the data sets you ran the UniquePQ program on. How large were they? Were the data values equally distributed across a range, or were they skewed? Were there many repeat values or not? Did you notice any measurable difference in how long UniquePQ took to run with each PriorityQueue implementation? What performance differences did you expect the implementations to have for this particular problem or specific data sets?

If the implementations didn’t perform as expected, why do you think that was the case? Were your data sets big enough? Were your expectations wrong? Consider and describe any particular types of data sets that you think the various implementations would perform particularly well on, and why. If you didn’t see any measurable differences, what do you think it would take to measure the relative performance of each implementation?

Lastly, theorize on how you would expect PriorityQueue implementations to perform on the Unique problem as compared to a Set implementation. Do you expect particular priority queues to perform better, worse, or the same as the ArraySet implementation from hw5 for example? How about compared to your original Unique solution from assignment 1? Run a few comparison cases to see if your theories seem to be correct. Discuss your trials and results.


The files you must modify and submit are listed explicitly below:

You need to submit all of these files to the autograder along with a README. You can upload them individually or in a zip file. If you upload them in a zip file, make sure they are all at the top level; you cannot have any extra directories, or else the autograder won’t be able to find them. This even applies to the test file - do not submit it in a tests subdirectory!

Make sure the code you hand in does not produce any extraneous debugging output. If you have commented out lines of code that no longer serve any purpose you should remove them.


You must hand in the source code and a README file. The README file can be plain text (README with no extension) or markdown (README.md). In your README, be sure to answer the discussion questions posed in this description. You should discuss your solution as a whole and let the staff know anything important. If you are going to be using late days on an assignment, we ask that you note it in your README.

If you want to learn markdown formatting, here is a good starting point.

Submitting to Gradescope

Once you are ready to submit your files, go to the assignment 5 page on Gradescope and click submit. Note that you can resubmit any time up until the deadline. Only your most recent submission will be graded. Please refer to the course policies regarding late days and penalties.

After you submit, the autograder will run and you will get feedback on your functionality and how you performed on our test cases. Some test cases are “hidden” from you so you won’t actually know your final score on the test cases until after grades are released. We also include your checkstyle score as a test case.

If you see the “Autograder Failed to Execute” message, then either your submission did not compile at all or there was a packaging error. Please see the Gradescope Submission Notes in Piazza Resources for help debugging why your submission is not working.

You do not need to fully implement each file before you submit, but you’ll probably fail the test cases for the parts of the assignment you haven’t done yet. Also note that only the files with // TODO items in them will be used. You cannot modify any of the provided interface files as the autograder will overwrite any changes you made with the original provided file.


For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments.

Style refers to Java programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable javadoc documentation, etc. Many aspects of this are enforced automatically by Checkstyle when run with the provided configuration file.

Testing refers to proper unit tests for all of the data structure classes you developed for this assignment, using the JUnit 4 framework as introduced in lecture. Make sure you test all parts of the implementation that you can think of and all exception conditions that are relevant.

Performance refers to how fast/with how little memory your program can produce the required results compared to other submissions.

Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.

If your submission does not compile, you will not receive any of the autograded points for that assignment. It is always better to submit code that at least compiles. You will get freebie points just for compiling.

If your programs have unnecessary warnings when compiled with javac -Xlint:all, you will be penalized 10% functionality per failed part. (You are also not allowed to use the @SuppressWarnings annotation - we use it just to filter our accepted warnings from yours.)

If your programs fail because of an unexpected exception, you will be penalized 10% functionality per failed part. (You are not allowed to just wrap your whole program in a universal try-catch.)