600.226: Data Structures (Spring 2017)

Assignment 3: Assorted Complexities

Overview

The third assignment is mostly about sorting and how fast things go. You will also write yet another implementation of the Array interface to help you analyze how many array operations various sorting algorithms perform.

Note: The grading criteria now include 10% for unit testing. This refers to JUnit 4 test drivers, not some custom test program you hacked. The problems (on this and future assignments) will state whether you are expected to produce/improve test drivers or not.

Problem 1: Arrays with Statistics (30%)

Your first task for this assignment is to develop a new kind of Array implementation that keeps track of how many read and write operations have been performed on it. Check out the Statable interface first, reproduced here in compressed form (be sure to use and read the full interface available on Piazza!):

public interface Statable {
    void resetStatistics();
    int numberOfReads();
    int numberOfWrites();
}

This describes what we expect of an object that can collect statistics about itself. After a Statable object has been “in use” for a while, we can check how many read and write operations it has been asked to perform. We can also tell it to “forget” what has happened before and start counting both kinds of operations from zero again.

You need to develop a class StatableArray that extends our dear old SimpleArray and also implements the Statable interface; yes, both at the same time. When a StatableArray is created, you initialize internal counters to keep track of the number of read and write operations it has been asked to perform so far; obviously both counts start at zero. Each time an operation is performed on the StatableArray object, you need to increment the relevant counter by one and invoke the actual operation in the super class using Java’s super keyword. Don’t forget that your constructor for StatableArray will also have to invoke the SimpleArray constructor!

Consider a freshly constructed StatableArray object. It would return 0 for both numberOfReads and numberOfWrites. Now imagine we call the length operation followed by three calls to the get operation. At this point, our object would return 4 for numberOfReads but still 0 for numberOfWrites. If we now call the put operation twice, the object would return 2 for numberOfWrites but still 4 for numberOfReads. Hopefully this is clear enough?

You need to write JUnit 4 test cases for StatableArray. Your focus should be on the Statable aspect of the class, but you will need to call Array methods to trigger the various possible outcomes. Call the file with your test cases StatableArrayTest.java please.

Hints

Problem 2: All Sorts of Sorts (50%)

Your second task for this assignment is to explore some of the basic sorting algorithms and their analysis. All of these algorithms are quadratic in terms of their asymptotic performance, but they nevertheless differ in their actual performance. We’ll focus on the following three algorithms:

The Piazza post for this assignment contains a basic framework for evaluating sorting algorithms. You’ll need a working StatableArray class from Problem 1, and you’ll need to understand the following interface as well (again compressed, be sure to to use and read the full interface available on Piazza!):

public interface SortingAlgorithm<T extends Comparable<T>> {
    void sort(Array<T> array);
    String name();
}

Let’s look at the simple stuff first: An object is considered an algorithm suitable for sorting in this framework if (a) we can ask it to sort a given Array and (b) we can ask it for its name (e.g. “Insertion Sort”).

The more complicated stuff is at the top: The use of extends inside the angle brackets means that any type T we want to sort must implement the interface Comparable as well. It obviously can’t just be any old type, it must be a type for which the expression “a is less than b” actually makes sense. Using Comparable in this form is Java’s way of saying that we can order the objects; you should definitely read up on the details here!

As an example for all this, we have provided an implementation of SelectionSort on Piazza already. (Actually, there are also two other algorithms, NullSort and GnomeSort, just so you start out with a few to run right away.)

You need to write classes implementing BubbleSort and InsertionSort for this problem. Just like our example algorithms, your classes have to implement the SortingAlgorithm interface. You may not use any Java built-in arrays or ArrayList objects in your implementions. Remember that if you adapt code from allowable other sources (prior courses or textbook) you must cite the origins of the code.

All of this should be fairly straightforward once you get used to the framework. Speaking of the framework, the way you actually “run” the various algorithms is by using the PolySort.java program we’ve provided as well. You should be able to compile and run it without yet having written any sorting code yourself. Here’s how:

$ java PolySort 4000 <random.data
Algorithm            Sorted?  Size         Reads        Writes       Seconds

Null Sort            false    4,000        0            0            0.000007
Gnome Sort           true     4,000        32,195,307   8,045,828    0.243852
Selection Sort       true     4,000        24,009,991   7,992        0.252085

This will read the first 4000 strings from the file random.data and sort them using all available algorithms. As you can see, the program checks if the algorithm actually worked (Sorted?) and reports how many operations of the underlying StatableArray were used in order to perform the sort (Reads, Writes). Finally, the program also prints out long it took to sort the array (Seconds) but that number will vary widely across machines so you can really only use it for relative comparisons on the machine actually running the experiment.

However, the main point of all this is not the coding work. Instead, the main point is to evaluate and compare the sorting algorithms on different sets of data. We’ve provided three sets of useful test data on Piazza and you can use the command line argument to vary how much of it is used (thereby changing the size of the problem). You should try to quantify how the various algorithms differ and explain why they differ as well (i.e. what about a given algorithm makes it better or worse than another one for a given data set). In your README file you should describe the series of experiments you ran, what data you collected, and what your conclusions about the performance of these algorithms are. Some ideas for what to address:

Just to be clear: Yes, we’ll need the code, and it should be up to the usual standards. But the “report” you put in your README is just as important as the code!

Problem 3: Analysis of Binary Search (20%)

Your final task for this assignment is to analyze the following binary search algorithm theoretically (without running it) in detail (without using O-notation). Here’s the code, and you must analyze exactly this code (the line numbers are given so you can refer to them in your writeup for this problem):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public static int binarySearch(int[] A, int key) {
    int lo = 0;
    int hi = A.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (A[mid] > key) {
            hi = mid - 1;
        } else if (A[mid] < key) {
            lo = mid + 1;
        } else {
            return mid;
        }
    }
    return -1;
}

You need to determine exactly how many comparisons C(n) and assignments A(n) are performed by this implementation of binary search in the worst case. (As usual we refer to the size of the problem, which is the length of the array to be sorted here, as “n” above.)

Important: Don’t just state the resulting formulas, your writeup has to explain how you derived them! Anyone can google for the answer, but you need to convince us that you actually did the work! Also, please remember that we’re interested in the worst case which, just like for linear search, means that the key is not actually in the array; that assumption simplifies a lot about this problem.

Even More Hints

Deliverables

You must turn in a zipped (.zip only) archive containing all source code, your README file, and any other deliverables required by the assignment. The filename should be HW##-jhed.zip with ## replaced by the 2-digit number (use leading 0s) of this assignment (see above) and jhed replaced by your Blackboard login. (For example, Peter would use HW03-pfroehl1.zip for his submission of Assignment 3.) The zip should contain no derived files whatsoever (i.e. no .class files, no .html files, etc.), but should allow building all derived files. Include a plain text README file (not README.txt or README.docx or whatnot) that briefly explains what your programs do and contains any other notes you want us to check out before grading. Your answers to written problems should be in this README file as well. Finally, make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Grading

For reference, here is a short explanation of the grading criteria; some of the criteria don't apply to all problems, and not all of the criteria are used on all assignments.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments.

Style refers to Java programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable javadoc documentation, etc. Many aspects of this are enforced automatically by Checkstyle when run with the configuration file available on Piazza. Style also includes proper modularization of your code (into interfaces, classes, methods, using public, protected, and private appropriately, etc.). Simple, clean, readable code is what you should be aiming for.

Testing refers to proper unit tests for all of the data structure classes you developed for this assignment, using the JUnit 4 framework as introduced in lecture. Make sure you test all (implied) axioms that you can think of and all exception conditions that are relevant.

Performance refers to how fast/with how little memory your program can produce the required results compared to other submissions.

Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.

If your programs cannot be built you will get no points whatsoever. If your programs cannot be built without warnings using javac -Xlint:all we will take off 10% (except if you document a very good reason; no, you cannot use the @SuppressWarnings annotation either). If your programs fail miserably even once, i.e. terminate with an exception of any kind, we will take off 10% (however we'll also take those 10% off if you're trying to be "excessively smart" by wrapping your whole program into a universal try-catch).