600.226 Data Structures -- Spring 2013
Homework #8
The main purpose of this assignment is to work with binary search
trees, and compare BoundedOrderedSet implementations on large datasets.
Part A) Problem Prep
For this assignment you need to understand basic and balanced
binary search tree operations. As always, read and review before
starting the assignment. Here is a recap of resources:
Part B) Short Answer -- due 3pm on Monday, 4/1 -- 15 points
You must submit a legible hard-copy of your answers in class
on Monday. If you submit on Blackboard, please use a .txt or .pdf
file type only!
1) Draw the binary tree given by this ranked array representation. Is it a binary search tree? If not, why not? [4]
|   | M | A | R | C | H | | M | A | D | N | E |   |   | S | S |
2) Identify all the nodes that are not balanced in this binary search
tree. [3]
                              15
                       /                   \
               9                                      25
            /    \                              /       \
         7        10                      20             30
      /    \         \                /       \            \
   3        8           13       16             23          35
 /   \                 /  \        \           /  \
1     5              12    14       18       21    24
                                      \
                                       19
3) Draw this AVL tree after inserting p, rebalancing if
needed. Show all steps. [3]
               m
         /           \
   e                       u
 /    \                 /    \
c        j           o        v
       /   \       /   \
      g     l     n     s
4) Draw this AVL tree after deleting e, replacing it with the max value in
its left subtree, and rebalancing as much as necessary. Show all steps. [5]
                                    m
                        /                \
            e                                t
         /       \                         /       \
      c              h                    p              y
   /    \          /    \               /             /    \
a        d        g        j           n           v        z
  \             /        /   \                   /   \
   b           f        i     k                 u     w
                                \
                                 l
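For reference while working through questions 2 to 4, here is a minimal sketch of how a balance factor can be computed and how a single left rotation rearranges three nodes. The names used (AvlSketch, Node, height, balance, rotateLeft) are illustrative assumptions, not the BST.java from class or the Weiss code, and the tree is a small made-up example rather than any of the trees above. Heights are recomputed recursively here for clarity; a real AVL implementation would normally store them in the nodes.

```java
public class AvlSketch {
    /** Hypothetical node type, for illustration only. */
    static class Node {
        int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    /** Height of a subtree: -1 for an empty tree, 0 for a single leaf. */
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** Balance factor: left height minus right height. */
    static int balance(Node n) {
        return height(n.left) - height(n.right);
    }

    /** Single left rotation around x; returns the new subtree root. */
    static Node rotateLeft(Node x) {
        Node y = x.right;   // y moves up
        x.right = y.left;   // y's left subtree becomes x's right subtree
        y.left = x;         // x moves down to the left of y
        return y;
    }

    public static void main(String[] args) {
        // A right-leaning chain 1 -> 2 -> 3, which is unbalanced at 1.
        Node root = new Node(1);
        root.right = new Node(2);
        root.right.right = new Node(3);
        System.out.println(balance(root));                   // prints -2

        root = rotateLeft(root);
        System.out.println(root.key + " " + balance(root));  // prints 2 0
    }
}
```

A node counts as unbalanced when the absolute value of its balance factor exceeds 1. The mirror-image right rotation and the two double rotations follow the same pattern.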
Part C) Implementation -- due 2:30pm on Friday, 4/5 -- 85 points
(+20 Extra Credit)
The goal of this assignment is to be able to implement unbalanced
and balanced (AVL) binary search trees and use them to implement the
BoundedOrderedSet interface from assignment 6. We will then be able
to compare three implementations for actual practical efficiencies
using large datasets. The BoundedOrderedSet extends our Set
interface from assignment 5. For the purpose of this assignment I
have changed the Set interface to also extend Iterable, to
potentially enable you to implement some methods more easily without
unchecked casts. For testing purposes, we have a JUnit program
BoundedOrderedSetTest226.java that you may use. (A complete
list/zip/link of files to use in this assignment appears below.)
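Because Set now extends Iterable, a tree-backed set has to supply an iterator that visits its values in sorted order. One common way to do this is an in-order traversal driven by an explicit stack, sketched below. The names (InOrderSketch, Node, pushLeft) are hypothetical and this is not the BST.java from class. The sketch leaves remove unsupported, whereas the assignment asks you to implement it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class InOrderSketch<T extends Comparable<T>> implements Iterable<T> {
    /** Hypothetical node type, for illustration only. */
    private static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;

        Node(T data) {
            this.data = data;
        }
    }

    private Node<T> root;

    /** Unbalanced BST insert; returns false for a duplicate. */
    public boolean insert(T value) {
        if (root == null) {
            root = new Node<T>(value);
            return true;
        }
        Node<T> cur = root;
        while (true) {
            int cmp = value.compareTo(cur.data);
            if (cmp == 0) {
                return false;
            } else if (cmp < 0) {
                if (cur.left == null) {
                    cur.left = new Node<T>(value);
                    return true;
                }
                cur = cur.left;
            } else {
                if (cur.right == null) {
                    cur.right = new Node<T>(value);
                    return true;
                }
                cur = cur.right;
            }
        }
    }

    /** In-order iterator: the stack holds the path of nodes not yet visited. */
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private final Deque<Node<T>> stack = new ArrayDeque<Node<T>>();

            {
                pushLeft(root);
            }

            /** Push n and all of its left descendants. */
            private void pushLeft(Node<T> n) {
                while (n != null) {
                    stack.push(n);
                    n = n.left;
                }
            }

            public boolean hasNext() {
                return !stack.isEmpty();
            }

            public T next() {
                if (stack.isEmpty()) {
                    throw new NoSuchElementException();
                }
                Node<T> n = stack.pop();
                pushLeft(n.right);
                return n.data;
            }

            public void remove() {
                // Left out of this sketch; the assignment requires it.
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        InOrderSketch<Integer> t = new InOrderSketch<Integer>();
        for (int v : new int[] {15, 9, 25, 7, 10}) {
            t.insert(v);
        }
        t.insert(9);  // duplicate, rejected
        for (int v : t) {
            System.out.print(v + " ");  // prints 7 9 10 15 25
        }
        System.out.println();
    }
}
```

Each call to next does a constant amount of work amortized over the whole traversal, and the stack never holds more nodes than the height of the tree.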
- [20] Write an (unbalanced) binary search tree implementation of the
BoundedOrderedSet.java interface. This implementation should build
on the BST.java implementation we wrote in class (make sure to cite
it). You will need to implement the iterator (create, hasNext,
next, remove), union, intersection, and position. Try to make union
and intersection as efficient as possible, but remember that they
must also produce BSTs. (First get them working without worrying
about efficiency; then, if time permits, try to improve them.) As
before, add a comment to each method identifying its run-time. You
do not have to fully implement the bounding
methods (setMin, setMax, getMin, getMax) for regular credit - doing
so will be extra credit. You will need to include dummy methods for
them at this point so that everything compiles.
- [30] Write an AVL tree implementation of the BoundedOrderedSet.java
interface, with the same guidelines and details as for the binary
search tree implementation above. This implementation could build
on the BST.java implementation we wrote in class, and/or the Weiss AVL
tree code. You will need to write the delete method and incorporate
the height or balance factors and rotations.
- We have written a driver program that compares all three
BoundedOrderedSet implementations: linked list (from hw6), BST, and
AVL, using large datasets of Strings and Integers. Most of the code
for this part of the assignment has been written for you in
BOSCompare.java. You only have to add a few lines in main to
create BoundedOrderedSets for your BST and AVL implementations, for
Integers and Strings, create some test files, and do something
with the timing results. The given program produces many compiler
warnings because of our attempt to use generics too generically - you
do not have to fix any of those warnings. The basic operation of
this program is as follows:
- Get the name of any plain text input file containing only integers
from the user. Read one integer at a time and store it in an
Integer-based ArrayList (use the Java API). We will refer to the size of
this collection as N.
- [2] Create a collection of the various BoundedOrderedSet implementations.
- Create an array of N/10 random integers in the range [0,N) to be
used as indices into the ArrayList of values to give us a list of
values to search for or delete from the BoundedOrderedSet. We will
refer to this as the index subset.
- For each type of BoundedOrderedSet (BOS), do the following four
operations in the order given, timing each one: insert all the values
from the ArrayList into it (most succeed, duplicates should fail),
search the BOS for all the values in the ArrayList that are at
positions in the index subset (all succeed), delete from the BOS all
the ArrayList values corresponding to the index subset (most
succeed, duplicates fail), search the BOS for all the ArrayList
values corresponding to the index subset (all fail).
- Sort the original collection of integer values and repeat all the
testing. (You can use the Java API to sort.)
- Do the same entire process using Strings read from a plain text
file instead, with unordered and then ordered versions.
- [8] Make random files with at least 100,000 values in each to use for
testing (one with integers, one with strings). Include some
duplicate values in each.
- [15] You should have three implementations to compare. For each of
them we do an unsorted set of timing tests, and a sorted version. For
each of those, four tests are run, and this is all done for Integers
and Strings. That's a total of 3*2*4*2 = 48 timing results. Edit the
given comparison program to produce output for the timing results.
Create a report document called "results" which contains your timing results
(all four tables). (Do not make the program create this file when
run, instead manually create it in any format you want based on whatever data
you made the program spit out. Eventually you must submit it as plain
text or pdf.)
Report the timing values in 4 labelled tables: Integer Unsorted, Integer
Sorted, String Unsorted, String Sorted. Each table must have a row
for each implementation (LinkedList, BST, AVL) and a column for
each operation (insert, search1, delete, search2), where the entries are the
actual measured elapsed time for each.
In your report include
commentary on the results - what met your expectations, what didn't?
Can you explain any abnormalities? Submit this results file in
plain text or (preferred) pdf format.
- [20] Extra Credit: correctly implement the bounding methods (setMin,
setMax, getMin, getMax) for the binary search tree implementation
and the AVL tree implementation.
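The timing procedure described for the comparison program above (an index subset of N/10 random positions, then insert, search, delete, search) can be sketched roughly as follows. This is not BOSCompare.java: the names TimingSketch, Result, makeIndexSubset and runOnce are hypothetical, java.util.TreeSet stands in for a BoundedOrderedSet implementation because the interface is not reproduced here, and System.nanoTime is just one reasonable way to measure elapsed time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

public class TimingSketch {
    /** Elapsed nanoseconds and success counts for insert, search1, delete, search2. */
    static class Result {
        final long[] nanos = new long[4];
        final int[] successes = new int[4];
    }

    /** N/10 random indices in the range [0, N). */
    static int[] makeIndexSubset(int n, Random rng) {
        int[] subset = new int[n / 10];
        for (int i = 0; i < subset.length; i++) {
            subset[i] = rng.nextInt(n);
        }
        return subset;
    }

    /** Runs the four timed operations in order on a fresh set. */
    static Result runOnce(List<Integer> values, int[] indexSubset) {
        TreeSet<Integer> set = new TreeSet<Integer>();  // stand-in for a BOS
        Result r = new Result();

        long start = System.nanoTime();
        for (Integer v : values) {
            if (set.add(v)) { r.successes[0]++; }        // duplicates fail
        }
        r.nanos[0] = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i : indexSubset) {
            if (set.contains(values.get(i))) { r.successes[1]++; }  // all succeed
        }
        r.nanos[1] = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i : indexSubset) {
            if (set.remove(values.get(i))) { r.successes[2]++; }    // repeats fail
        }
        r.nanos[2] = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i : indexSubset) {
            if (set.contains(values.get(i))) { r.successes[3]++; }  // all fail
        }
        r.nanos[3] = System.nanoTime() - start;
        return r;
    }

    public static void main(String[] args) {
        Random rng = new Random(226);
        List<Integer> values = new ArrayList<Integer>();
        for (int i = 0; i < 100000; i++) {
            values.add(rng.nextInt(50000));  // small range forces duplicates
        }
        Result r = runOnce(values, makeIndexSubset(values.size(), rng));
        String[] ops = {"insert", "search1", "delete", "search2"};
        for (int i = 0; i < ops.length; i++) {
            System.out.println(ops[i] + ": " + r.nanos[i] / 1e6 + " ms, "
                + r.successes[i] + " succeeded");
        }
    }
}
```

Counting successes alongside the times gives a quick sanity check: the first search should succeed for every index, and the second should succeed for none.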
Submission: include all of the following files in one zip that you
submit on Blackboard. All our files that you may need are bundled
together here - make sure you start with these versions.
- Our interfaces: Set.java, BoundedOrderedSet.java
- Your implementations: Linked List bounded ordered set (yours or
ours from assignment 6), BST bounded ordered set, AVL bounded ordered
set. Be sure to include core BST.java and AVL.java files if your
BST and AVL BOS implementations are separate files that wrap around
these.
- Comparison program: BOSCompare.java and any additional
files you use for this part.
- Input text files: integers.txt, strings.txt to use for the
comparison.
- Your timing result report (.txt or .pdf) with your tables and discussion.
- Our BoundedOrderedSetTest226.java and any other BST or AVL JUnit
test programs you may have used.
Grading: here is a rough breakdown of each part.
- [20] BST implementation of BoundedOrderedSet
- [30] AVL implementation of BoundedOrderedSet (10 more for delete method)
- [10] BOSCompare implementation and input files
- [15] Timing results and commentary document
- [10] Submission, (check)style, etc. (You do not have to make our
files checkstyle compliant if you use them intact.)
- [20] Extra Credit (bounding methods for BST and AVL)