Fall Semester 2005: September 8, 2005 - December 12, 2005
Out on: September 15, 2005
Due by: September 21, 2005 by 5:59 pm for full credit (11:59 pm for 10% off, hard deadline)
Collaboration: None
Grading: Packaging 10%, Style 10%, Performance 10%, Design 20%, Functionality 50%
The second assignment for 600.226: Data Structures deals mostly with algorithm analysis and abstract data types.
There are even some "written" problems this time around, to be
answered in the README file.
Your first task for this assignment is to explore some basic sorting algorithms and their (detailed) analysis. All of these are quadratic in the worst case but differ in their "actual" performance. We'll consider the following algorithms: bubble sort (plain and with an early exit when a pass makes no swaps), insertion sort, and selection sort.
We're providing a basic "test bed" for sorting algorithms in the form of a Java application below. This will allow you to concentrate on the algorithms instead of writing (relatively) boilerplate code. However, you still need to understand this interface:
/**
 * A basic sorting algorithm.
 */
public interface SortingAlgorithm
{
    /** Sort an array of strings. */
    public void sort( String[] strings );

    /** Name of this algorithm. */
    public String name();
}
Our application relies on this interface to actually "run"
a given sorting algorithm.
You will have to write your algorithms as classes that
implement this interface.
The sort method performs the actual sorting,
while the name method returns the name of the
sorting algorithm for printing by the framework.
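To show the shape of a class that implements this interface, here is a minimal sketch. LibrarySort is a hypothetical name that is not part of the handout; it simply delegates to the library sort and is not one of the algorithms you are asked to write. The interface is repeated only so that the sketch compiles on its own.

```java
/** The interface from the handout, repeated so this sketch is self-contained. */
interface SortingAlgorithm {
    void sort(String[] strings);
    String name();
}

/**
 * Hypothetical example class (not part of the handout). It delegates to the
 * library sort purely to illustrate how the two methods fit together.
 */
class LibrarySort implements SortingAlgorithm {
    /** Sorts the array in place using the natural ordering of String. */
    public void sort(String[] strings) {
        java.util.Arrays.sort(strings);
    }

    /** The label a framework could print next to its measurements. */
    public String name() {
        return "LibrarySort";
    }
}
```

Your own classes would presumably follow the same pattern, with the body of sort replaced by the algorithm in question.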
We give you an implementation of BubbleSort to
study (is there anything wrong with it?) but it is your job
to implement the classes SmartBubbleSort
(which stops if no swaps were performed during a pass),
InsertionSort, and SelectionSort.
All of this should be fairly straightforward.
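The early-exit idea behind SmartBubbleSort can be sketched as follows. This is one possible reading of "stops if no swaps were performed during a pass", not the reference solution. The class name EarlyExitBubble and the returned pass count are assumptions made for illustration, and the sketch deliberately does not implement the SortingAlgorithm interface.

```java
/** Hypothetical helper illustrating the early-exit check; not from the handout. */
class EarlyExitBubble {
    /**
     * Bubble sort that stops after the first pass that makes no swaps.
     * Returns the number of passes made, so the early exit is observable.
     */
    static int sort(String[] a) {
        int passes = 0;
        boolean swapped = true;
        // Each pass bubbles the largest remaining element to index end - 1.
        for (int end = a.length; swapped && end > 1; end--) {
            swapped = false;
            passes++;
            for (int i = 0; i + 1 < end; i++) {
                if (a[i].compareTo(a[i + 1]) > 0) {
                    String tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
        return passes;
    }
}
```

On input that is already sorted this makes a single pass, which is worth keeping in mind when you choose test data.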
The main point of this problem is actually not coding.
We want you to compare the running times of these algorithms
and try to explain why they are different.
In your README you should describe a series of
experiments you ran, what data you collected, and what your
conclusions about running times are.
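For the experiments, it may help to generate inputs of a chosen size in several orders. The sketch below is only an illustration: SortExperiment and its method names are hypothetical, the provided framework may already do its own timing, and the java.util.function.Consumer parameter assumes a Java version with lambdas.

```java
/** Hypothetical experiment helper; names and input shapes are illustrative only. */
class SortExperiment {
    /** Builds n zero-padded numeric strings in ascending order. */
    static String[] ascending(int n) {
        String[] a = new String[n];
        for (int i = 0; i < n; i++) {
            a[i] = String.format("%08d", i);
        }
        return a;
    }

    /** The same strings in descending order. */
    static String[] descending(int n) {
        String[] a = ascending(n);
        for (int i = 0, j = n - 1; i < j; i++, j--) {
            String tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        return a;
    }

    /** The same strings shuffled with a fixed seed, so runs are repeatable. */
    static String[] shuffled(int n, long seed) {
        java.util.List<String> list =
            new java.util.ArrayList<String>(java.util.Arrays.asList(ascending(n)));
        java.util.Collections.shuffle(list, new java.util.Random(seed));
        return list.toArray(new String[0]);
    }

    /** Times one run of a sort on a private copy of the input, in nanoseconds. */
    static long timeNanos(java.util.function.Consumer<String[]> sorter, String[] input) {
        String[] copy = input.clone();
        long start = System.nanoTime();
        sorter.accept(copy);
        return System.nanoTime() - start;
    }
}
```

A call such as SortExperiment.timeNanos(a -> new InsertionSort().sort(a), SortExperiment.descending(4000)) would then time your own class on reversed input. A single run is noisy, so repeating each measurement and doubling the input size between runs makes the growth rate easier to see.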
Things to address: Does the actual running time correspond
to the asymptotic complexity as you would expect? What
explains the practical differences between these algorithms?
Does it matter what kind of data you are sorting? Can you
perform a more detailed "theoretical" analysis by comparing,
for instance, the exact number of comparisons or the exact
number of data moves as a function of input size?
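One way to get exact counts is to instrument a sort with counters. The following is a sketch under assumptions: CountingInsertionSort is a hypothetical class, and "moves" is taken here to mean array writes, which is only one of several reasonable definitions.

```java
/** Hypothetical instrumented insertion sort for the written analysis. */
class CountingInsertionSort {
    long comparisons = 0; // number of compareTo calls
    long moves = 0;       // number of writes into the array

    void sort(String[] a) {
        for (int i = 1; i < a.length; i++) {
            String key = a[i];
            int j = i - 1;
            // Shift larger elements one slot right until key's position is found.
            while (j >= 0 && greater(a[j], key)) {
                a[j + 1] = a[j];
                moves++;
                j--;
            }
            a[j + 1] = key;
            moves++;
        }
    }

    /** Compares two strings and records that a comparison happened. */
    private boolean greater(String x, String y) {
        comparisons++;
        return x.compareTo(y) > 0;
    }
}
```

With this particular sketch, an already sorted array of n elements costs n - 1 comparisons, while a reversed array costs n(n - 1)/2, which gives you two data points to check against your theoretical analysis.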
Just to be clear: Yes, we'll need the code, and it should be
up to the usual standards. But the "report" you put in your
README is just as important as the code!
Here is our small framework for sorting: sorting.tar.gz. Let us know how you like the code. :-)
Your second task is to implement a variation of the
SelectionSort algorithm.
In DoubleSelectionSort you locate both the
minimum and the maximum of the
as-yet-unsorted part of the array, and you swap them
"to the front" and "to the back" as appropriate.
This should increase the size of the sorted portion
by two in each pass.
After you are convinced that your implementation works
(and actually sorts!), comment on its performance
compared to the above algorithms.
Feel free to use any measures you think appropriate,
and discuss any problems you had implementing the
algorithm in your README.
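The pass structure described above can be sketched as follows. This is one possible interpretation rather than the expected solution; DoubleSelectionSketch is a hypothetical name, and the handling of the case where the maximum sits in the front slot is an assumption about what "as appropriate" requires.

```java
/** Hypothetical sketch of the min-and-max selection idea; not from the handout. */
class DoubleSelectionSketch {
    static void sort(String[] a) {
        int lo = 0;
        int hi = a.length - 1;
        // The unsorted part is a[lo..hi]; it shrinks by two per pass.
        while (lo < hi) {
            int min = lo;
            int max = lo;
            for (int i = lo + 1; i <= hi; i++) {
                if (a[i].compareTo(a[min]) < 0) min = i;
                if (a[i].compareTo(a[max]) > 0) max = i;
            }
            swap(a, lo, min);
            // If the maximum was sitting at lo, the swap above moved it to min.
            if (max == lo) {
                max = min;
            }
            swap(a, hi, max);
            lo++;
            hi--;
        }
    }

    private static void swap(String[] a, int i, int j) {
        String tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Note that each pass here still performs two comparisons per element scanned, so halving the number of passes does not by itself halve the comparison count.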
Look up the definition of BogoSort.
After being suitably amused, your third task is to implement a
non-random variation of this "bogus" sorting scheme.
Write a class BogoSort that sorts an array by trying
out all possible permutations of the array and
stopping when it gets to one that happens to be sorted.
Any comments? :-)
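One way to enumerate permutations deterministically is recursive swapping with backtracking. The sketch below assumes that approach and uses the hypothetical name PermutationSortSketch; the handout does not prescribe how the permutations should be generated.

```java
/** Hypothetical sketch of a deterministic "try every permutation" sort. */
class PermutationSortSketch {
    /** Rearranges a into sorted order by exhaustive search over arrangements. */
    static void sort(String[] a) {
        permute(a, 0);
    }

    /** Tries every arrangement of a[k..]; returns true once the array is sorted. */
    private static boolean permute(String[] a, int k) {
        if (k == a.length) {
            return isSorted(a);
        }
        for (int i = k; i < a.length; i++) {
            swap(a, k, i);
            if (permute(a, k + 1)) {
                return true; // keep the sorted arrangement in place
            }
            swap(a, k, i); // undo and try the next candidate for position k
        }
        return false;
    }

    private static boolean isSorted(String[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1].compareTo(a[i]) > 0) {
                return false;
            }
        }
        return true;
    }

    private static void swap(String[] a, int i, int j) {
        String tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Since an array of n elements has n! arrangements, even inputs of a dozen or so elements can take a very long time with this approach.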
In lecture we derived algebraic specifications for the abstract data
types Variable and Array.
Think of a way to change the specification of Array
to use the Variable specification for the individual
"cells" of the array.
Write up the algebraic specification for this VarArray
in the format we used in lecture.
Do you like this new specification better? Why? Why not?
Please turn in a gzip-compressed tarball of your assignment (the extension should be .tar.gz).
The tarball should uncompress into a directory cs226-assignment-2-login
with login replaced by your Unix login name
(so I would use cs226-assignment-2-phf);
uncompressing should not create any other files
in the current directory.
The tarball should contain no derived files whatsoever
(i.e. no .class files, no .html files, etc.),
but allow building all derived files.
Include a README file that briefly explains what your
programs do and contains any other notes you want us to check out
before grading (and of course your answers to "written" problems).
For reference, here is a short explanation of the grading criteria.
Packaging refers to the proper organization of the
stuff you hand in, following the guidelines for Deliverables above.
Style refers to Java programming style, including
things like consistent indentation, appropriate identifiers,
useful comments, suitable javadoc documentation, etc.
Simple, clean, readable code is what you should be aiming for.
Performance refers to how fast your program can
produce the required results compared to other submissions.
Design refers to proper modularization and the
proper choice of algorithms and data structures.
Functionality refers to your programs being
able to do what they should according to the specification
given above; if the specification is ambiguous and you had
to make a certain choice, defend that choice in your
README file.
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using javac -Xlint
we will take off 10% (unless you document a very good reason).
If your programs fail miserably even once,
i.e. terminate with an exception of any kind,
we will take off 10%.
There is a second way we could have measured the performance of
various sorting algorithms.
Java comes with a built-in "profiler" that can tell you how much
time your program spends "where" in the code.
You can get a "short" version of this by using java -Xprof
to run your programs instead of the plain java command.
If you need more detail, you can use java -Xrunhprof:cpu=samples
which will create a new file java.hprof.txt containing lots of details.
The tail of that file has a useful little table, and you can
select many more options for profiling
(to see what the options are, run java -Xrunhprof:help).
For this problem, compare the measurements of the sorting
algorithms made by our framework to the measurements you
get from Java's profiler.
Do they correspond in the expected way or not?
If not, why not?
Note that we won't give you extra points for this, but we'll give
you extra kudos. :-)