Spring Semester 2006: January 30, 2006 - May 5, 2006
Out on: February 9, 2006
Due by: February 15, 2006 by 5:59 pm for full credit (11:59 pm for 10% off, hard deadline)
Collaboration: None
Grading: Packaging 10%, Style 10%, Performance 10%, Design 20%, Functionality 50%
The second assignment for
600.226: Data Structures
deals mostly with algorithm analysis and abstract data types.
There are even some "written" problems this time around, to be
answered in the README file.
Your first task for this assignment is to explore some basic sorting algorithms and their (detailed) analysis. All of these are quadratic in terms of their asymptotic worst-case performance but differ in their "actual" performance. We'll consider the following algorithms: BubbleSort, SmartBubbleSort, InsertionSort, and SelectionSort.
We're providing a basic "test bed" for sorting algorithms in the form of a Java application below. This will allow you to concentrate on the algorithms instead of writing (relatively) boilerplate code. However, you still need to understand this interface:
/**
 * A basic sorting algorithm.
 */
public interface SortingAlgorithm
{
    /** Sort an array of strings. */
    public void sort( String[] strings );

    /** Name of this algorithm. */
    public String name();
}
Our application relies on this interface to actually "run"
a given sorting algorithm.
You will have to write your algorithms as classes that
implement this interface.
The sort method performs the actual sorting,
while the name method returns the name of the
sorting algorithm for printing by the framework.
We give you an implementation of BubbleSort to
study (is there anything wrong with it?) but it is your job
to implement the classes SmartBubbleSort
(which stops if no swaps were performed during a pass),
InsertionSort, and SelectionSort.
All of this should be fairly straightforward.
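As a rough illustration of what a class implementing this interface might look like, here is a minimal sketch of a bubble sort that stops after a pass without swaps. It is not the reference solution. The class name SmartBubbleSortSketch and the returned name string are made up for this example, and the interface is redeclared (without public) only so the snippet compiles on its own.

```java
import java.util.Arrays;

// Redeclared (package-private) only so this sketch compiles on its own;
// with the real framework you would use the provided interface instead.
interface SortingAlgorithm {
    void sort(String[] strings);
    String name();
}

/** Bubble sort that stops as soon as a full pass performs no swaps. */
class SmartBubbleSortSketch implements SortingAlgorithm {
    @Override
    public void sort(String[] strings) {
        boolean swapped = true;
        // After each pass the largest remaining element has moved to the end,
        // so the unsorted region shrinks by one.
        for (int end = strings.length - 1; end > 0 && swapped; end--) {
            swapped = false;
            for (int i = 0; i < end; i++) {
                if (strings[i].compareTo(strings[i + 1]) > 0) {
                    String tmp = strings[i];
                    strings[i] = strings[i + 1];
                    strings[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    @Override
    public String name() {
        return "Smart Bubble Sort (sketch)";
    }

    public static void main(String[] args) {
        String[] data = {"pear", "apple", "fig", "banana"};
        new SmartBubbleSortSketch().sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

On input that is already sorted, the first pass makes no swaps and the outer loop exits immediately, which is the case where this variant should differ most from the plain version.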
The main point of this problem is actually not coding.
We want you to compare the running times of these algorithms
and try to explain why they are different.
In your README you should describe a series of
experiments you ran, what data you collected, and what your
conclusions about running times are.
Things to address: Does the actual running time correspond
to the asymptotic complexity as you would expect? What
explains the practical differences between these algorithms?
Does it matter what kind of data you are sorting? Can you
perform a more detailed "theoretical" analysis by comparing,
for instance, the exact number of comparisons or the exact
number of data moves as a function of input size?
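One way to get at the "exact number of comparisons" is to instrument the algorithms with a counter. The sketch below does this for selection sort and insertion sort. The class and method names (ComparisonCountSketch, selectionSortComparisons, insertionSortComparisons) are hypothetical and not part of our framework, and counting data moves would need a second counter added in the same style.

```java
/** Hypothetical helper: two sorts instrumented to count compareTo calls. */
class ComparisonCountSketch {
    /** Selection sort; returns the number of comparisons performed. */
    static long selectionSortComparisons(String[] a) {
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                comparisons++;
                if (a[j].compareTo(a[min]) < 0) {
                    min = j;
                }
            }
            String tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
        return comparisons;
    }

    /** Insertion sort; returns the number of comparisons performed. */
    static long insertionSortComparisons(String[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            String key = a[i];
            int j = i - 1;
            while (j >= 0) {
                comparisons++;
                if (a[j].compareTo(key) <= 0) {
                    break; // key already belongs right after a[j]
                }
                a[j + 1] = a[j]; // shift the larger element one slot right
                j--;
            }
            a[j + 1] = key;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String[] sorted = {"a", "b", "c", "d", "e"};
        String[] reversed = {"e", "d", "c", "b", "a"};
        System.out.println("selection, sorted:   " + selectionSortComparisons(sorted.clone()));
        System.out.println("selection, reversed: " + selectionSortComparisons(reversed.clone()));
        System.out.println("insertion, sorted:   " + insertionSortComparisons(sorted.clone()));
        System.out.println("insertion, reversed: " + insertionSortComparisons(reversed.clone()));
    }
}
```

For five elements, this selection sort performs 10 comparisons, that is n(n-1)/2, regardless of input order, while this insertion sort performs 4 on sorted input and 10 on reversed input. Differences like these are the kind of thing the report could try to explain.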
Just to be clear: Yes, we'll need the code, and it should be
up to the usual standards. But the "report" you put in your
README is just as important as the code!
Here is our small framework for sorting: sorting.tar.gz Let us know how you like the code. :-)
Your second task is to implement a variation of the
SelectionSort algorithm.
In DoubleSelectionSort you locate both the
minimum and the maximum of the as-yet-unsorted
part of the array, and you swap them
"to the front" and "to the back" as appropriate.
This should increase the size of the sorted portion
by two in each pass.
After you are convinced that your implementation works
(and actually sorts!) comment on its performance
compared to the above algorithms.
Feel free to use any measures you think appropriate,
and discuss any problems you had implementing the
algorithm in your README.
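The description above can be sketched roughly as follows. This is one possible reading of the task, not the reference solution. The class name DoubleSelectionSortSketch is made up, the interface is redeclared only so the snippet compiles on its own, and the adjustment after the first swap addresses one pitfall you may run into (the maximum sitting at the front position).

```java
import java.util.Arrays;

// Redeclared (package-private) only so this sketch compiles on its own.
interface SortingAlgorithm {
    void sort(String[] strings);
    String name();
}

/** Selection sort variant that places both the minimum and the maximum in each pass. */
class DoubleSelectionSortSketch implements SortingAlgorithm {
    @Override
    public void sort(String[] a) {
        int lo = 0;
        int hi = a.length - 1;
        // a[lo..hi] is the unsorted part; it shrinks by two per pass.
        while (lo < hi) {
            int min = lo;
            int max = lo;
            for (int i = lo + 1; i <= hi; i++) {
                if (a[i].compareTo(a[min]) < 0) {
                    min = i;
                }
                if (a[i].compareTo(a[max]) > 0) {
                    max = i;
                }
            }
            swap(a, lo, min);
            // If the maximum was at lo, the swap above just moved it to index min.
            if (max == lo) {
                max = min;
            }
            swap(a, hi, max);
            lo++;
            hi--;
        }
    }

    private static void swap(String[] a, int i, int j) {
        String tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    @Override
    public String name() {
        return "Double Selection Sort (sketch)";
    }

    public static void main(String[] args) {
        String[] data = {"c", "e", "a", "d", "b"};
        new DoubleSelectionSortSketch().sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Note that each pass still makes two comparisons per element scanned, so halving the number of passes does not by itself halve the number of comparisons.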
In lecture we derived algebraic specifications for the abstract data
types Variable and Array.
For both, I mentioned that it's easier to specify the ADT if we make
sure that instances are initialized to some known value when they are
created using the new function.
Following the example specifications for Variable and
Array, develop a specification for uninitialized
variables.
You will have to introduce at least one new function, make some
existing function partial, provide the proper preconditions, and
adjust the axioms.
Write up the algebraic specification for this kind of Variable
in the format we used in lecture.
Do you like this version of the specification better? Why? Why not?
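To make the idea of a partial function with a precondition more concrete, here is a small Java model of a variable that starts out uninitialized. This is only a sketch of the behavior such a specification would describe, not the specification itself and not in the lecture format. All names (UninitVariableSketch, create, assign, value, isInitialized) are assumptions chosen for this example.

```java
/**
 * Hypothetical model of a Variable that starts out uninitialized.
 * Operation names are illustrative assumptions, not the ones from lecture.
 */
final class UninitVariableSketch<T> {
    private final T value;
    private final boolean initialized;

    private UninitVariableSketch(T value, boolean initialized) {
        this.value = value;
        this.initialized = initialized;
    }

    /** Plays the role of "new": the result holds no value yet. */
    static <T> UninitVariableSketch<T> create() {
        return new UninitVariableSketch<>(null, false);
    }

    /** Total function: assigning always yields an initialized variable. */
    UninitVariableSketch<T> assign(T newValue) {
        return new UninitVariableSketch<>(newValue, true);
    }

    /** The new observer function that the precondition is stated in terms of. */
    boolean isInitialized() {
        return initialized;
    }

    /** Partial function; precondition: isInitialized() is true. */
    T value() {
        if (!initialized) {
            throw new IllegalStateException("precondition violated: variable is uninitialized");
        }
        return value;
    }

    public static void main(String[] args) {
        UninitVariableSketch<String> v = UninitVariableSketch.create();
        System.out.println(v.isInitialized());
        System.out.println(v.assign("x").isInitialized());
        System.out.println(v.assign("x").value());
    }
}
```

The three statements in main correspond to the kind of axioms you would expect: a fresh variable is not initialized, an assigned variable is, and reading an assigned variable yields the assigned value.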
Please turn in a gzip-compressed tarball of your assignment;
the filename should be cs226-assign-2-login.tar.gz
with login replaced by your Unix login name on ugradx.cs.jhu.edu
(so I would use cs226-assign-2-phf.tar.gz).
The tarball should contain no derived files whatsoever
(i.e. no .class files, no .html files, etc.),
but allow building all derived files.
Include a README file that briefly explains what your
programs do and contains any other notes you want us to check out
before grading.
For reference, here is a short explanation of the grading criteria.
Packaging refers to the proper organization of the
stuff you hand in, following the guidelines for Deliverables above.
Style refers to Java programming style, including
things like consistent indentation, appropriate identifiers,
useful comments, suitable javadoc documentation, etc.
Simple, clean, readable code is what you should be aiming for.
Performance refers to how fast your program can
produce the required results compared to other submissions.
Design refers to proper modularization and the
proper choice of algorithms and data structures.
Functionality refers to your programs being
able to do what they should according to the specification
given above; if the specification is ambiguous and you had
to make a certain choice, defend that choice in your
README file.
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using
javac -Xlint
we will take off 10% (except if you document a very good reason).
If your programs fail miserably even once,
i.e. terminate with an exception of any kind,
we will take off 10%.
There is a second way we could have measured the performance of
various sorting algorithms.
Java comes with a built-in "profiler" that can tell you how much
time your program spends "where" in the code.
You can get a "short" version of this by using java -Xprof
to run your programs instead of the plain java command.
If you need more detail, you can use java -Xrunhprof:cpu=samples
which will create a new file java.hprof.txt containing lots of details.
The tail of that file has a useful little table, and you can
select many more options for profiling
(to see what the options are, run java -Xrunhprof:help).
For this problem, compare the measurements of the sorting
algorithms made by our framework to the measurements you
get from Java's profiler.
Do they correspond in the expected way or not?
If not, why not?
Note that we won't give you extra points for this, but we'll give
you extra kudos. :-)
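For a third point of comparison next to the framework and the profiler, you could also time a sort by hand with wall-clock measurements. The sketch below is an assumption about how such a measurement might look. It is not how our framework is implemented, the names TimingSketch, timeNanos, and randomStrings are made up, and Arrays.sort stands in for whichever of your algorithms you want to measure.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical hand-rolled timing harness; not part of the provided framework. */
class TimingSketch {
    /** Returns the elapsed wall-clock nanoseconds for one run of the given task. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Builds n pseudo-random strings from a fixed seed so runs are repeatable. */
    static String[] randomStrings(int n, long seed) {
        Random rng = new Random(seed);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = Integer.toString(rng.nextInt(1_000_000));
        }
        return out;
    }

    public static void main(String[] args) {
        // Doubling n each step makes quadratic growth easy to spot:
        // the time should roughly quadruple for a quadratic algorithm.
        for (int n = 1000; n <= 8000; n *= 2) {
            String[] data = randomStrings(n, 42L);
            // Arrays.sort is only a stand-in for the algorithm under test.
            long nanos = timeNanos(() -> Arrays.sort(data));
            System.out.printf("n=%d  %.3f ms%n", n, nanos / 1e6);
        }
    }
}
```

Single runs like this are noisy (JIT compilation and garbage collection both interfere), so repeating each measurement several times and discarding the first few runs is usually worthwhile.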