600.226: Data Structures

Fall Semester 2005: September 8, 2005 - December 12, 2005

Assignment 2: Up and Running

Out on: September 15, 2005
Due by: September 21, 2005 by 5:59 pm for full credit (11:59 pm for 10% off, hard deadline)
Collaboration: None
Grading: Packaging 10%, Style 10%, Performance 10%, Design 20%, Functionality 50%

Overview

The second assignment for 600.226: Data Structures deals mostly with algorithm analysis and abstract data types. There are even some "written" problems this time around, to be answered in the README file.

Problem 1: Assorted Sorts

Your first task for this assignment is to explore some basic sorting algorithms and their (detailed) analysis. All of these are quadratic in their worst-case asymptotic running time but differ in their "actual" performance. We'll consider the following algorithms: BubbleSort, SmartBubbleSort, InsertionSort, and SelectionSort.

We're providing a basic "test bed" for sorting algorithms in the form of a Java application below. This will allow you to concentrate on the algorithms instead of writing (relatively) boilerplate code. However, you still need to understand this interface:

/**
 * A basic sorting algorithm.
 */
public interface SortingAlgorithm
{
  /** Sort an array of strings. */
  public void sort( String[] strings );
  /** Name of this algorithm. */
  public String name();
}

Our application relies on this interface to actually "run" a given sorting algorithm. You will have to write your algorithms as classes that implement this interface. The sort method performs the actual sorting, while the name method returns the name of the sorting algorithm for printing by the framework.
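
To make the shape of such a class concrete, here is a minimal sketch. The class LibrarySort is hypothetical and not one of the required algorithms; it simply delegates to the JDK's Arrays.sort. The interface is repeated (package-private, which is an assumption of this sketch) only so that the example compiles as a single file.

```java
import java.util.Arrays;

// Copy of the assignment's interface, package-private so this sketch
// compiles as a single file (an assumption of the sketch).
interface SortingAlgorithm
{
  /** Sort an array of strings. */
  void sort( String[] strings );
  /** Name of this algorithm. */
  String name();
}

// Hypothetical example class: not one of the required algorithms.
class LibrarySort implements SortingAlgorithm
{
  public void sort( String[] strings )
  {
    // Delegate to the JDK; Strings sort in their natural (lexicographic) order.
    Arrays.sort( strings );
  }

  public String name()
  {
    return "LibrarySort";
  }
}
```

The framework presumably only ever talks to the interface, so any class with these two public methods should plug in the same way.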

We give you an implementation of BubbleSort to study (is there anything wrong with it?) but it is your job to implement the classes SmartBubbleSort (which stops if no swaps were performed during a pass), InsertionSort, and SelectionSort. All of this should be fairly straightforward.
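
The "stops if no swaps were performed" idea can be sketched roughly as follows. This is one possible reading of the description, not the reference solution; the swapped flag and the shrinking end bound are choices made for this sketch, and the interface is repeated only so the example compiles on its own.

```java
// Copy of the assignment's interface, package-private so this sketch
// compiles as a single file (an assumption of the sketch).
interface SortingAlgorithm
{
  void sort( String[] strings );
  String name();
}

class SmartBubbleSort implements SortingAlgorithm
{
  public void sort( String[] strings )
  {
    boolean swapped = true;
    // Each pass bubbles the largest remaining element to index end - 1,
    // so the next pass can stop one position earlier. We also stop as
    // soon as a full pass makes no swap: the array is then sorted.
    for ( int end = strings.length; swapped && end > 1; end-- )
    {
      swapped = false;
      for ( int i = 1; i < end; i++ )
      {
        if ( strings[i - 1].compareTo( strings[i] ) > 0 )
        {
          String tmp = strings[i - 1];
          strings[i - 1] = strings[i];
          strings[i] = tmp;
          swapped = true;
        }
      }
    }
  }

  public String name()
  {
    return "SmartBubbleSort";
  }
}
```

On input that is already sorted, this version makes a single pass and stops, which is worth keeping in mind when you choose test data for your experiments.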

The main point of this problem is actually not coding. We want you to compare the running times of these algorithms and try to explain why they are different. In your README you should describe a series of experiments you ran, what data you collected, and what your conclusions about running times are. Things to address: Does the actual running time correspond to the asymptotic complexity as you would expect? What explains the practical differences between these algorithms? Does it matter what kind of data you are sorting? Can you perform a more detailed "theoretical" analysis by comparing, for instance, the exact number of comparisons or the exact number of data moves as a function of input size?
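
One way to get at "the exact number of comparisons" is to instrument an algorithm with counters. The sketch below does this for insertion sort; the class name CountingInsertionSort and the fields comparisons and moves are hypothetical, and it deliberately does not implement the framework's interface so that it stands on its own. Note that moves counts only the shifts inside the inner loop, not the final placement of the key; that is a choice of this sketch, and you may prefer a different accounting.

```java
// Hypothetical instrumented insertion sort for counting experiments.
class CountingInsertionSort
{
  long comparisons = 0; // number of compareTo calls
  long moves = 0;       // number of element shifts (final key placement not counted)

  void sort( String[] a )
  {
    for ( int i = 1; i < a.length; i++ )
    {
      String key = a[i];
      int j = i - 1;
      // Shift larger elements one slot to the right until key fits.
      while ( j >= 0 && less( key, a[j] ) )
      {
        a[j + 1] = a[j];
        moves++;
        j--;
      }
      a[j + 1] = key;
    }
  }

  // Single choke point for comparisons, so counting stays accurate.
  private boolean less( String x, String y )
  {
    comparisons++;
    return x.compareTo( y ) < 0;
  }
}
```

With this accounting, an already sorted array of n distinct strings costs n - 1 comparisons and no moves, while a reverse-sorted one costs n(n - 1)/2 of each. Comparing such counts against measured running times is one way to approach the questions above.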

Just to be clear: Yes, we'll need the code, and it should be up to the usual standards. But the "report" you put in your README is just as important as the code!

Here is our small framework for sorting: sorting.tar.gz. Let us know how you like the code. :-)

Problem 2: The Dreaded Double Selection (aka Peter's Double Whammy Sort)

Your second task is to implement a variation of the SelectionSort algorithm. In DoubleSelectionSort you locate both the minimum and the maximum of the as-yet-unsorted part of the array, and you swap them "to the front" and "to the back" as appropriate. This should increase the size of the sorted portion by two in each pass. After you are convinced that your implementation works (and actually sorts!), comment on its performance compared to the above algorithms. Feel free to use any measures you think appropriate, and discuss any problems you had implementing the algorithm in your README.
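
A rough sketch of one pass structure is given below, mostly to point out a subtle trap: if the maximum happens to sit at the front of the unsorted part, swapping the minimum to the front moves the maximum somewhere else. The variable names lo, hi, min, and max are choices of this sketch, not of the assignment, and the interface is repeated only so the example compiles on its own.

```java
// Copy of the assignment's interface, package-private so this sketch
// compiles as a single file (an assumption of the sketch).
interface SortingAlgorithm
{
  void sort( String[] strings );
  String name();
}

class DoubleSelectionSort implements SortingAlgorithm
{
  public void sort( String[] strings )
  {
    int lo = 0;                  // first index of the unsorted part
    int hi = strings.length - 1; // last index of the unsorted part
    while ( lo < hi )
    {
      int min = lo;
      int max = lo;
      // One scan finds the positions of both extremes.
      for ( int i = lo + 1; i <= hi; i++ )
      {
        if ( strings[i].compareTo( strings[min] ) < 0 ) min = i;
        if ( strings[i].compareTo( strings[max] ) > 0 ) max = i;
      }
      swap( strings, lo, min );
      // If the maximum was sitting at lo, the swap above moved it to min.
      if ( max == lo ) max = min;
      swap( strings, hi, max );
      lo++;
      hi--;
    }
  }

  public String name()
  {
    return "DoubleSelectionSort";
  }

  private static void swap( String[] a, int i, int j )
  {
    String tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

Each pass still scans the whole unsorted part and performs two comparisons per element, so it is not obvious in advance that this beats plain SelectionSort; that is exactly the kind of thing to measure.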

Problem 3: Bogusses or Bogi?

Check up on the definition of BogoSort. After being suitably amused, your third task is to implement a non-random variation of this "bogus" sorting scheme. Write a class BogoSort that sorts an array by trying out all possible permutations of the array and stopping when it gets to one that happens to be sorted. Any comments? :-)
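
One possible way to walk through all permutations without randomness is recursive backtracking with swaps, sketched below. This is just one enumeration order among several you could pick; the helper names permute and isSorted are hypothetical, and the interface is repeated only so the example compiles on its own.

```java
// Copy of the assignment's interface, package-private so this sketch
// compiles as a single file (an assumption of the sketch).
interface SortingAlgorithm
{
  void sort( String[] strings );
  String name();
}

class BogoSort implements SortingAlgorithm
{
  public void sort( String[] strings )
  {
    permute( strings, 0 );
  }

  public String name()
  {
    return "BogoSort";
  }

  // Tries every arrangement of a[k..]; returns true as soon as the whole
  // array is sorted, leaving the array in that sorted arrangement.
  private static boolean permute( String[] a, int k )
  {
    if ( k >= a.length ) return isSorted( a );
    for ( int i = k; i < a.length; i++ )
    {
      swap( a, k, i );
      if ( permute( a, k + 1 ) ) return true; // keep the winning order
      swap( a, k, i );                        // undo and try the next choice
    }
    return false;
  }

  private static boolean isSorted( String[] a )
  {
    for ( int i = 1; i < a.length; i++ )
    {
      if ( a[i - 1].compareTo( a[i] ) > 0 ) return false;
    }
    return true;
  }

  private static void swap( String[] a, int i, int j )
  {
    String tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

In the worst case this examines all n! permutations of an n-element array, so keep the inputs very small when you run it through the framework.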

Problem 4: Arrays and Variables

In lecture we derived algebraic specifications for the abstract data types Variable and Array. Think of a way to change the specification of Array to use the Variable specification for the individual "cells" of the array. Write up the algebraic specification for this VarArray in the format we used in lecture. Do you like this new specification better? Why? Why not?

Deliverables

Please turn in a gzip compressed tarball of your assignment (the extension should be .tar.gz). The tarball should uncompress into a directory cs226-assignment-2-login with login replaced by your Unix login name (so I would use cs226-assignment-2-phf); uncompressing should not create any other files in the current directory. The tarball should contain no derived files whatsoever (i.e. no .class files, no .html files, etc.), but allow building all derived files. Include a README file that briefly explains what your programs do and contains any other notes you want us to check out before grading (and of course your answers to "written" problems).

Grading

For reference, here is a short explanation of the grading criteria. Packaging refers to the proper organization of the stuff you hand in, following the guidelines for Deliverables above. Style refers to Java programming style, including things like consistent indentation, appropriate identifiers, useful comments, suitable javadoc documentation, etc. Simple, clean, readable code is what you should be aiming for. Performance refers to how fast your program can produce the required results compared to other submissions. Design refers to proper modularization and the proper choice of algorithms and data structures. Functionality refers to your programs being able to do what they should according to the specification given above; if the specification is ambiguous and you had to make a certain choice, defend that choice in your README file.

If your programs cannot be built you will get no points whatsoever. If your programs cannot be built without warnings using javac -Xlint we will take off 10% (except if you document a very good reason). If your programs fail miserably even once, i.e. terminate with an exception of any kind, we will take off 10%.

Bonus Problem

There is a second way we could have measured the performance of various sorting algorithms. Java comes with a built-in "profiler" that can tell you how much time your program spends "where" in the code. You can get a "short" version of this by using java -Xprof to run your programs instead of the plain java command. If you need more detail, you can use java -Xrunhprof:cpu=samples which will create a new file java.hprof.txt containing lots of details. The tail of that file has a useful little table, and you can select many more options for profiling (to see what the options are, run java -Xrunhprof:help). For this problem, compare the measurements of the sorting algorithms made by our framework to the measurements you get from Java's profiler. Do they correspond in the expected way or not? If not, why not? Note that we won't give you extra points for this, but we'll give you extra kudos. :-)

Updated: $Id: assignment-2.html 57 2005-09-16 01:36:51Z phf $
Copyright © 2005 Peter H. Fröhlich. All rights reserved.