Assignment 3: Assorted Complexities

Overview

The third assignment is mostly about sorting and how fast things go. You will also write yet another implementation of the Array interface to help you analyze how many array operations various sorting algorithms perform.

Note: The grading criteria now include points for unit testing with JUnit, so you need to be able to work with JUnit for this assignment. This means JUnit 4 test drivers, not a custom test program you hacked together. Each assignment will specify which JUnit test drivers you should improve or add to.

Package Setup

This assignment has a slightly more advanced package setup. Because we rely on SimpleArray from the hw2 package, we provide the necessary files from homework 2 in the exceptions and hw2 directories. They are the same as in the last assignment. There is also a new directory, data, which holds the data files needed for Part B, and a directory hw3 with skeleton code for all parts.

Files you will be editing are marked with an asterisk (*).

hw3-student.zip
--
exceptions/
    IndexException.java
    LengthException.java
hw2/
    Array.java
    SimpleArray.java
data/
    random.data
    ascending.data
    descending.data
hw3/
    BubbleSort.java *
    GnomeSort.java
    InsertionSort.java *
    NullSort.java
    PolySort.java *
    SelectionSort.java
    SortingAlgorithm.java
    Measured.java
    MeasuredArray.java *
    MeasuredArrayTest.java *

These provided files should compile as is. You can compile everything by running $ javac -Xlint:all hw3/*.java hw2/*.java exceptions/*.java, or using the generic compile script from the Java Command Line notes. You should only need to compile exceptions/*.java and hw2/*.java one time.


Part A: Measured Arrays

Your first task for this assignment is to develop a new kind of Array implementation that keeps track of how many access and mutate operations have been performed on it. It can also count the number of occurrences of a particular value in the Array. Check out the Measured interface first, reproduced here in compressed form (be sure to read and use the full interface):

public interface Measured<T> {
    void reset();
    int accesses();
    int mutations();
    int count(T t);
}

This describes what we expect of an object that can collect statistics about itself. After a Measured object has been “in use” for a while, we can check how many access and mutate operations it has been asked to perform, through the accesses and mutations methods, respectively. We can also tell it to “forget” what has happened before and start counting both kinds of operations from zero again using the reset method.

You need to develop a class MeasuredArray that extends our dear old SimpleArray and also implements the Measured interface; yes, both at the same time. When a MeasuredArray is created, you initialize internal counters to keep track of the number of access and mutate operations it has been asked to perform so far; obviously both counts start at zero. You will need to override the accessor and mutator methods of the class so that the relevant counter is incremented each time that type of operation succeeds. The overridden methods must also call the actual operation in the superclass using Java’s super keyword. (Rewriting these methods instead of using inheritance properly will result in significant style deductions, as well as compiler issues, since you can’t change the provided SimpleArray.java code.)

Don’t forget that your constructor for MeasuredArray will also have to invoke the SimpleArray constructor! However, this operation is neither an accessor nor a mutator.

Consider a freshly constructed MeasuredArray object. It would return 0 for both accesses and mutations. Now imagine we call the length operation followed by three calls to the get operation. At this point, our object would return 4 for accesses but still 0 for mutations. If we now call the put operation twice, the object would return 2 for mutations but still 4 for accesses. (You don’t have to check whether a put operation actually changes the value or not since that’s not how the put operation was originally written.)

The reset operation should set both the number of accesses and mutations back to 0. Lastly, implement the count method which should determine and return the number of occurrences of the parameter value. Since it will always need to inspect every value in the array, it should also naturally update the accesses value accordingly. [NOTE: be careful about this if you use an iterator to implement count - see discussion questions below.]
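To make the override-then-count pattern concrete, here is a minimal sketch. It is not built on the real hw2.SimpleArray. `BaseArray` is a hypothetical stand-in, backed by a java.util.ArrayList, that offers only the three operations this section mentions (`length`, `get`, `put`), and `CountingArray` plays the role of MeasuredArray. Check the provided Array.java and SimpleArray.java for the actual constructor, signatures, and exceptions before adapting any of it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Same shape as the Measured interface above.
interface Measured<T> {
    void reset();
    int accesses();
    int mutations();
    int count(T t);
}

// Hypothetical stand-in for hw2.SimpleArray; the real class differs.
class BaseArray<T> {
    private final List<T> data;

    BaseArray(int n, T t) {
        if (n <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        data = new ArrayList<>(Collections.nCopies(n, t));
    }

    public int length() {
        return data.size();
    }

    public T get(int i) {
        return data.get(i); // throws IndexOutOfBoundsException on a bad index
    }

    public void put(int i, T t) {
        data.set(i, t);
    }
}

// Plays the role of MeasuredArray: every override delegates to super first.
class CountingArray<T> extends BaseArray<T> implements Measured<T> {
    private int accesses;
    private int mutations;

    CountingArray(int n, T t) {
        super(n, t); // construction is neither an access nor a mutation
    }

    @Override
    public int length() {
        int n = super.length();
        accesses++; // only reached if super.length() returned normally
        return n;
    }

    @Override
    public T get(int i) {
        T t = super.get(i); // if this throws, the counter is not touched
        accesses++;
        return t;
    }

    @Override
    public void put(int i, T t) {
        super.put(i, t);
        mutations++;
    }

    @Override
    public void reset() {
        accesses = 0;
        mutations = 0;
    }

    @Override
    public int accesses() {
        return accesses;
    }

    @Override
    public int mutations() {
        return mutations;
    }

    // Inspects every element through this.get, so each one adds one access.
    // The loop bound uses super.length() so it is not counted; whether the
    // length query should count here is an assumption to check against the spec.
    @Override
    public int count(T t) {
        int c = 0;
        for (int i = 0; i < super.length(); i++) {
            if (Objects.equals(this.get(i), t)) {
                c++;
            }
        }
        return c;
    }
}
```

Because each override calls the superclass method first and only then increments its counter, a call that throws (for example, get with a bad index) leaves the counts unchanged, which matches counting only operations that succeed. Running the scenario described earlier (length, then three gets, then two puts) on this sketch gives 4 accesses and 2 mutations.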

We provide you with SimpleArray, but we leave it as part of the hw2 package. So, you should only need to compile the hw2 folder one time to be able to use SimpleArray in your solution. In the skeleton code we already have the import taken care of (note the import hw2.SimpleArray statement). You do not need to hand in SimpleArray on Gradescope.

Testing

You will need to write JUnit 4 test cases for MeasuredArray. Your focus should be on the Measured aspect of the class (i.e., reset, accesses, mutations, count), but you will need to call Array methods to trigger the various possible outcomes. The tests you write do not need to check that it is a working Array implementation; we’ll do that ourselves. (Of course, you should still test for yourselves that the methods work correctly.) Later on we will show you a nice way to test with the same inheritance/interface-realization structure as the data structures you are testing, but for this first time we are just trying to get our feet wet with JUnit.

The file you need to add unit tests to is MeasuredArrayTest.java. We provide you skeleton code with the basic @Before and @Test annotations. For this requirement you don’t need anything fancier than that. Make sure that you pass your own tests in your final deliverable. If you don’t pass your tests but want to receive credit for writing tests, comment those tests out. You will receive some autograder points for passing your own tests.

Since we are only concerned with testing Measured, there isn’t any exception testing to cover. So, you don’t need to use the expected parameter of the @Test annotation. But, if you want to test some Array axioms, you could have something like this:

@Test(expected=IndexException.class)
public void testInvalidIndex() { ... }

and write code that should trigger an IndexException.

For help running your JUnit4 tests, see the JavaCommandLineNotes and JUnitIntellij notes in Piazza Resources.

Discussion

In your README for Part A, discuss from a design perspective whether or not iterating over a MeasuredArray should affect the accesses and mutations counts. Note that for the purposes of this assignment we are NOT asking you to rewrite the ArrayIterator to do so. However, if you wanted to include the next() and/or hasNext() methods in the statistics measured, could you inherit ArrayIterator from SimpleArray and override the relevant methods? Explain why or why not.


Part B: All Sorts of Sorts

Your second task for this assignment is to explore some of the basic sorting algorithms and their analysis. All of these algorithms are quadratic in terms of their asymptotic performance, but they nevertheless differ in their actual performance.

We’ll focus on three algorithms: Bubble Sort, Insertion Sort, and Selection Sort.

The provided files contain a basic framework for evaluating sorting algorithms. You’ll need a working MeasuredArray class from Part A, and you’ll need to understand the following interface as well (again compressed; be sure to read and use the full interface):

  public interface SortingAlgorithm<T extends Comparable<T>> {
      void sort(Array<T> array);
      String name();
  }

Let’s look at the simple stuff first:

An object is considered an algorithm suitable for sorting in this framework if (a) we can ask it to sort a given Array and (b) we can ask it for its name (e.g. “Insertion Sort”).

The more complicated stuff is at the top: the use of extends inside the angle brackets means that any type T we want to sort must implement the interface Comparable as well. It can’t just be any old type; it must be a type for which the expression “a is less than b” actually makes sense. Using Comparable in this form is Java’s way of saying that we can order the objects, so you should read up on the details of Comparable and its compareTo method in the Java documentation!
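As a small illustration of what the bound buys you, the sketch below declares generic methods with the same `T extends Comparable<T>` bound. The helper names are hypothetical, and it uses a plain java.util.List rather than the course’s Array interface. Inside them, `a.compareTo(b) < 0` is exactly the “a is less than b” test. String and Integer both qualify, while a type such as Object does not and would be rejected at compile time.

```java
import java.util.List;

final class ComparableDemo {
    private ComparableDemo() { }

    // "a is less than b": compareTo returns a negative number, zero, or a positive number.
    static <T extends Comparable<T>> boolean lessThan(T a, T b) {
        return a.compareTo(b) < 0;
    }

    // Returns the smallest element of a non-empty list using only lessThan.
    static <T extends Comparable<T>> T smallest(List<T> items) {
        T best = items.get(0);
        for (T item : items) {
            if (lessThan(item, best)) {
                best = item;
            }
        }
        return best;
    }

    // lessThan(new Object(), new Object()) would not compile:
    // Object does not implement Comparable<Object>.
}
```

A sorting algorithm written against this bound can compare any two elements the same way, without knowing whether T is String, Integer, or some other Comparable type.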

We provide a PolySort.java that runs the various sorting algorithms on input data and reports their statistics. We also provide a working GnomeSort implementation of SortingAlgorithm, a working SelectionSort, and a NullSort that doesn’t actually do anything. GnomeSort is an intentionally inefficient sorting algorithm that we can use for comparison.

Your first task is to implement BubbleSort and InsertionSort in the provided Java files. Note that these classes implement SortingAlgorithm.

PolySort takes one or two command line arguments: the name of the file to read is required, and the number of strings to read from that file is optional. It then runs each sorting algorithm and reports some statistics. The following is an example invocation:

  $ java PolySort random.data 4000
  Algorithm            Sorted?  Size         Accesses     Mutations    Seconds

  Null Sort            false    4,000        0            0            0.000007
  Gnome Sort           true     4,000        32,195,307   8,045,828    0.243852
  Selection Sort       true     4,000        24,009,991   7,992        0.252085

This will read the first 4000 strings from the file random.data and sort them using all available algorithms. As you can see, the program checks if the algorithm actually worked (Sorted?) and reports how many operations of the underlying MeasuredArray were used in order to perform the sort (Accesses, Mutations). Finally, the program also prints out how long it took to sort the array (Seconds) but that number will vary widely across machines so you can really only use it for relative comparisons on the machine actually running the experiment. It’s hard to use time as an actual benchmark.
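A check like the Sorted? column only needs one pass over adjacent pairs. The sketch below shows the idea on a java.util.List; it is an assumption about what such a check looks like, not the provided PolySort code, which works on the course’s Array type and may differ in its details.

```java
import java.util.List;

final class SortCheck {
    private SortCheck() { }

    // True if every adjacent pair is in non-decreasing order.
    // Equal neighbours are allowed, so duplicates do not count as unsorted.
    static <T extends Comparable<T>> boolean isSorted(List<T> items) {
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i - 1).compareTo(items.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

The check is linear in the size of the input, so it costs little next to a quadratic sort.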

Your second task is to add code to PolySort that times how long it takes the Java Collections library to perform a sort.

There are many tools in the Java Collections Framework that store data and implement sorting. We would like you to use java.util.ArrayList to store the strings from the file and java.util.Collections.sort to sort them. It may be helpful to look at the documentation for Collections and ArrayList. Please print the timing results for the Java Collections sort in the same format as above, with accesses and mutations set to 0, directly after the rows for the algorithms you implemented (with no blank lines in between). For example:
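One way to time the library sort is to read System.nanoTime immediately before and after the call to Collections.sort, as in the minimal sketch below. The class and method names are hypothetical, and how you collect the strings and print the row inside PolySort is up to you. Dividing the elapsed nanoseconds by 1e9 gives seconds, matching the Seconds column.

```java
import java.util.ArrayList;
import java.util.Collections;

final class CollectionsTiming {
    private CollectionsTiming() { }

    // Sorts the list in place with Collections.sort and returns the
    // elapsed wall-clock time in seconds. Only the sort itself is timed,
    // not reading the file or filling the list.
    static double sortAndTime(ArrayList<String> strings) {
        long start = System.nanoTime();
        Collections.sort(strings);
        long end = System.nanoTime();
        return (end - start) / 1e9;
    }
}
```

Because the ArrayList is not a MeasuredArray, there is nothing to count, which is why this row reports 0 for Accesses and Mutations.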

  $ java PolySort random.data 4000
  Algorithm            Sorted?  Size         Accesses     Mutations    Seconds

  Null Sort            false    4,000        0            0            0.000007
  Gnome Sort           true     4,000        32,195,307   8,045,828    0.243852
  Selection Sort       true     4,000        24,009,991   7,992        0.252085
  (Your implementations here)
  Java Collections     true     4,000        0            0            1.234567

The emphasis of this problem is not the coding work. Rather it is on evaluating and comparing the sorting algorithms on different sets of data. We’ve provided three different data sets, and you can vary the command line argument to experiment with different sizes as well.

Discussion

There is an intentional mistake in one of the provided data files. One goal of this assignment is to use your measurements to catch that mistake. If you catch it, please discuss it in your README, but avoid posting about it on Piazza so that other students can make the same connection on their own. You might also want to correct the problem and rerun your experiments on that file. Report and discuss all of your results.

In your README file you should describe the series of experiments you ran, what data you collected, and what your conclusions about the relative performance of these algorithms are. Specifically, you should address the following:

  1. Does the actual running time correspond to the asymptotic complexity as you would expect?
  2. What explains the practical differences between these algorithms?
  3. Does it matter what kind of data (random, already sorted in ascending order, sorted in descending order) you are sorting? How should each algorithm behave based on the type of data it receives?
  4. How do your implementations compare to the sorting done by Java Collections? Do you think the standard library code is simply better written than yours, or is it asymptotically more efficient? How would you go about determining which is the case?

If you use markdown formatting in your README and include data in table form, please make sure it is still readable as plain ASCII text, since the graders will not have a rendered markdown view.


Part C: Analysis of Selection Sort

Your final task for this assignment is to analyze the following descending selection sort algorithm mathematically (without running it) in detail (without using O-notation).

Here’s the code, and you must analyze exactly this code (the line numbers are given so you can refer to them in your writeup for this problem):

   1: public static void selectionSort(int[] a) {
   2:    int max, temp;
   3:    for (int i = 0; i < a.length - 1; i++) {
   4:        max = i;
   5:        for (int j = i + 1; j < a.length; j++) {
   6:            if (a[j] > a[max]) {
   7:                max = j;
   8:            }
   9:        }
  10:        temp = a[i];
  11:        a[i] = a[max]; 
  12:        a[max] = temp;
  13:     }
  14:  }

You need to determine exactly how many comparisons C(n) and assignments A(n) are performed by this implementation of selection sort in the worst case. Both of those should be polynomials of degree 2 since you know that the asymptotic complexity of selection sort is O(n^2). (As usual we refer to the size of the problem, which is the length of the array to be sorted here, as “n” above.) Don’t forget to include the operations that control the loops.

Important: Don’t just state the polynomials; your writeup has to explain how you derived them, ideally line by line! Anyone can google the answer, but you need to convince us that you actually did the work.


Deliverables

The files containing // TODO items are listed below:

BubbleSort.java
InsertionSort.java
MeasuredArray.java
MeasuredArrayTest.java
PolySort.java

You need to submit all of these files to the autograder along with a README. You can upload them individually or in a zip file. If you upload them in a zip file make sure they are all at the top level, you cannot have any extra directories or else the autograder won’t be able to find them.

Make sure the code you hand in does not produce any extraneous debugging output. If you have commented out lines of code that no longer serve any purpose you should remove them.

README

You must hand in the source code and a README file. The README file can be plain text (README with no extension), or markdown (README.md). In your README be sure to answer the discussion questions posed in this description. You should discuss your solution as a whole and let the staff know anything important. If you are going to be using late days on an assignment, we ask that you note it in your README.

If you want to learn markdown formatting, here is a good starting point.

Submitting to Gradescope

Once you are ready to submit your files, go to the Assignment 3 page on Gradescope and click Submit. You can resubmit any time up until the deadline; only your most recent submission will be graded. Please refer to the course policies regarding late days and penalties.

After you submit, the autograder will run and you will get feedback on your functionality and how you performed on our test cases. Some test cases are “hidden” from you so you won’t actually know your final score on the test cases until after grades are released. We also include your checkstyle score as a test case.

If you see the “Autograder Failed to Execute” message, then either your submission did not compile at all or there was a packaging error. Please see the Gradescope Submission Notes in Piazza Resources for help debugging why your submission is not working.

You do not need to fully implement each file before you submit, but you’ll probably fail the test cases for the parts of the assignment you haven’t done yet. Also note that only the files with // TODO items in them will be used. You cannot modify any of the provided interface files as the autograder will overwrite any changes you made with the original provided file.


Grading

For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all assignments.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments.

Style refers to Java programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable javadoc documentation, etc. Many aspects of this are enforced automatically by Checkstyle when run with the provided configuration file.

Testing refers to proper unit tests for all of the data structure classes you developed for this assignment, using the JUnit 4 framework as introduced in lecture. Make sure you test all parts of the implementation that you can think of and all exception conditions that are relevant.

Performance refers to how fast/with how little memory your program can produce the required results compared to other submissions.

Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.

If your submission does not compile, you will not receive any of the autograded-points for that assignment. It is always better to submit code that at least compiles. You will get freebie points just for compiling.

If your programs produce unnecessary warnings when compiled with javac -Xlint:all, you will be penalized 10% of functionality per failed part. (You are also not allowed to use the @SuppressWarnings annotation; we use it only to separate the warnings we accept from yours.)

If your programs fail because of an unexpected exception, you will be penalized 10% of functionality per failed part. (You are not allowed to simply wrap your whole program in a universal try-catch.)