Data Structures Project 4

Due: Wed, Dec 13, 2000, 5pm. 100 points.

In this project, you will implement the AVL tree and a hashing scheme.

We are seeking to create a dictionary to help investigate similarity in people's outlook if they have a similar background. We will only consider people's places of residence for this exercise. We have divided the world into 50,000 distinct regions, each identified by an integer. There are 6 billion people in the world, each identified by a two (case insensitive) letter country code and a 10 digit social security number. Of course, not every human being has participated in our survey. Hence, we have only limited data.

For each participant in the survey, we have a list of region IDs (an integer between 1 and 50,000) the person has resided in. Searches would use these regions as keys. Thus, given a list of regions, we would be required to find all people who have resided in that list of regions. Note that the order of regions is important and there may be multiple people with the same key.

We decide to use hashing for our dictionary, but realize that the survey could be highly skewed resulting in many collisions. In order to keep the worst case behavior efficient, we further decide to use separate chaining and maintain each entry in the hash table as an AVL tree. Thus the worst case search time is still log(n).

In short, implement the Dictionary interface (from the textbook) using hashing. Given a list of region codes, hash the record to an index in your hash table. Use an initial hash table size equal to 7. The hash table is really a table of roots of AVL trees. Multiple records hashing to an index get inserted into the corresponding AVL tree. Thus the AVL tree must provide insert, search and remove methods. You may use the Dictionary interface for AVL tree methods as well.

In your main method, first read all records from this record file and insert them into your dictionary. Each record is on a new line. The first field is a two letter country code and the next field is the 10 digit social security number. Next comes the number of regions that person has stayed in followed by the region IDs. Next, read records from this example query file, one at a time and print on screen the content of the records if the key was found and RECORD NOT FOUND, otherwise. The query file lists the number of regions and region IDs for each query (which is on its own line in the file). Note that different record and query files will be used for testing your code.)

Styart Remember that the key length is not fixed. For the AVL tree, you will need to implement a ResidenceComparator class that compares two lists (you can implement this using vectors, lists or sequences) of region IDs. You may impose any order.

Assuming unique key values will lose some points. Implementing rehashing will earn some extra points. But be sure to provide a method to turn re-hashing off so that your AVL tree code may be tested.