600.226 Data Structures (Spring 2004 - Prof. Eisner)

Homework 10 ... due 11:59pm Tuesday 4 May Sunday 2 May 2004

Summary

Goal of this homework: To implement graphs and Dijkstra's algorithm in the context of a larger, "more real-world" system. Get used to reading and modifying other people's code. Get used to using the built-in Java classes. Try something open-ended where you have to carry out the design yourself.

Collaboration policy for this homework: You must follow the CS Department Academic Integrity Code, except that for this homework, you may optionally work with one other student in the class. You must name your partner (if any) at the top of your Readme file, and both of you will receive the same grade for the assignment.

How to hand in your work: instructions, upload page

What to hand in (all packaged up in a zip or tar file):

Please do NOT include baltimore.map or any other large data files. We don't have enough space!

If baltimore.map is your only large data file, then skip this paragraph. We already have it; no need to submit it. But perhaps you did 6 degrees of Kevin Bacon, or maybe you made some improvement that is best demonstrated on a .map file of your home town. If you need to submit these files in order for us to grade your assignment, please make the large files available by http:, ftp:, or an email attachment to cs226-staff. Explain in your README where to find them. You must still submit the entire assignment except the data files by the usual submission system. That is, submit a zipfile containing the README and all code etc., as descibed below.
Your commented code:
- Any .java files, written either by us or by you, that are necessary to make your route planner run correctly.
- Any .java files needed for your improvements. These may be improvements to the route planner, or separate from it.
- Please make sure that you include everything that is needed to make your code compile!
- Make sure that all the package directories are included in your zip file, and that all the files IN those directories are also included!
A doc subdirectory containing HTML documentation that you have generated with the command javadoc -private -linksource -d doc */*.java. The graders will look at this documentation, so you should provide javadoc comments for all of your methods, including private ones.
A Readme file that contains your answers to miscellaneous questions on the homework, and in particular describes your improvements. The top of your Readme should state the name of your partner (if any).

Acknowledgments: Thanks to Jim Gillispie, of the Sheridan Libraries' Maps Collection, for help finding and interpreting the data.

Introduction

Since computer-generated driving directions hit the web several years ago, they have been one of the more obvious and visible applications of graph algorithms. If you haven't used one of these systems lately, experiment a bit at one or two of these sites before starting the assignment. Note that the routes are not always great -- the underlying map data can be wrong, and even when they're right, they don't always distinguish between a cow path and a thruway.

I've written some support code to help you handle messy geographical data like map files and address strings. You'll write the clean underlying part -- graph data structures and Dijkstra's algorithm.

The basic system that you will write will behave something like this. Note that it tells you to drive the wrong way down Charles St, but you're only young once.

$ java geography/Streetmap ../data/baltimore.map
Reading map from ../data/baltimore.map (may take a moment) .................
Built graph with 84740 edges and 26988 vertices.
Address finder knows 3769 street names.
Welcome to the route planner!

Enter start address: 3500 charlas st
  Found nearby intersection at 3499 N Charles St (near (-76.61757, 39.329265))
Enter destination address: fayette
  Found nearby intersection at 1 E Fayette St (near (-76.615122, 39.290436))
  Explored 19386 edges from 6029 vertices; 7476 relaxations
Start on N Charles St going south and continue 0.33 miles.
Turn right onto E 30th St going west and continue 0.01 miles.
Turn left onto N Charles St going south and continue 1.79 miles.
Turn right onto W Madison St going west and continue 0.01 miles.
Turn left onto Washington Pl going south and continue 0.15 miles.
Bear left onto W Centre St going southeast and continue 0.01 miles.
Turn right onto N Charles St going south and continue 0.41 miles.
Total driving distance: 2.71 miles  (Straight-line distance: 2.69 miles.)

I'm only supplying a Baltimore city map for you to play with. However, you can also make your own US maps from data that are freely available on the web. (See below.)

Once you have got the basic system working, you are asked to make two improvements of your choice, as described below.

Files Available and Their Design

You can download everything you need for this assignment in one convenient zipfile, or browse the files on the web. You can also read the Javadocs for the supplied code.

Stuff from Past Assignments

You'll need to use some data structures from your past assignments, notably priority queues with locators (for Dijkstra's algorithm). As in HW7 and HW8, you can find working packages for that here. You should use the packages rather than your own HW6 code.

The `graph` Package

You will finish writing this graph package. It will include directed graphs (Digraph) and Dijkstra's shortest-path algorithm (Dijkstra).

We've provided the necessary interfaces and some class code to get you started. I have tried to simplify and improve the interfaces in the textbook, although this is partly just a matter of taste. Read the javadoc comments in the code!

We've also provided test methods in Dijkstra.java and HashKeyedDigraph.java that let you check your implementation of Dijkstra's algorithm against the airport examples illustrated in the book.

One important feature that is not in the textbook is the notion of a keyed graph. Such a graph has names for some or all of its vertices. The names serve as keys, so you can look up a vertex by its name. For example, you might want to fly from "BWI" to "LAX" -- and that means you need to find the vertices corresponding to those names. This requires a hash map or something like that.

You've actually already built keyed data structures. In the election classes of Homeworks 7 and 8, the entries on the priority queue had names like "Toomey" and "Born". So when saw the candidate's name on a ballot, you could find her entry. That is really the idea of a keyed adaptable priority queue, but you didn't write a KeyedAdaptablePriorityQueue interface and implement it. Rather, you used two objects -- an adaptable priority queue plus a separate hash dictionary that maintained the mapping from keys to entries.

This time let's notice the abstract idea of keying and handle it in a general way. You'll implement a KeyedDigraph interface that extends Digraph, so that keying is an official feature that happens inside the class and can be reused for multiple problems. In particular, once you've tested your keyed graph implementation with airports, you will be able to apply it to route planning - where the keys are not strings but geographic locations of the form (longitude, latitude).

Section 12.3.2 of the textbook discusses decorating graph vertices and edges, which is useful for many algorithms, including Dijkstra's. In general they talk about decorating positions. And even more generally we could decorate any object at all, so support for decoration belongs in the container package.

(If we'd had decorable objects before, then instead of extending MyEntry into LocationAwareEntry (for adaptable priority queues) and then TwinnedLocationAwareEntry (for priority deques), we could have simply decorated instances of MyEntry with extra positions as necessary. What are the pros and cons of this strategy?)

Here are some extra container classes for that. We've formalized the notion slightly more carefully for you by providing a container.Decorable interface that can be implemented in a general way. Your implementation can then be extended into specific kinds of decorable objects, like a decorable Vertex. Moreover, we have required the attribute names to be objects of a new class container.Attribute, rather than arbitrary objects. The javadoc comments explain why this is a good idea.

The Data

You will download the data. Let's look at a few lines from the middle of baltimore.map:

-76618945  +39333311  -76620362  +39334208        4           98    W  University  Pky
-76619746  +39335708  -76620493  +39335678  7     4     99    98    W  39th	   St     
-76618169  +39335774  -76619746  +39335708  1           5           W  39th	   St     
-76617992  +39332692  -76618032  +39333710  3600  3601  3698  3699  N  Charles     St
-76617992  +39332692  -76618945  +39333311  101         109         W  University  Pky
-76616286  +39331624  -76616273  +39331473  100   7     102   103   E  University  Pky
-76617570  +39329265  -76616273  +39331473  3400  3401  3598  3599     Greenway

Each line represents a segment of a street, generally from one intersection to the next. There are 11 tab-separated columns in each line. The long numbers are longitudes and latitudes, and the short numbers are house numbers.

For example, the 4th line above says that you can travel from (76.617992W, 39.332692N) to (76.618032W, 39.333710N), and that if you do so, you will be going along a block of N. Charles St. whose numbers run from 3600 and 3601 (on your left and right, respectively) to 3698 and 3699. Note that some house numbers are missing: that's because some segments of University Parkway only have buildings on one side.

We have to assume that all street segments are two-way -- i.e., that you can also travel in the reverse direction. The assumption is occasionally wrong (as above), but most streets are two-way. So each line will correspond to a pair of directed edges in our graph.

For the interested: The .map file is extracted from the US Census Bureau's TIGER/Line files for 2002 (which date back to the 2000 census). If you want to use your route planner to tool around your hometown, or drive cross-country, you are welcome to make other maps of the US and its territories and commonwealths. Here's how. First look up state and county codes for each county you want on your map. For example, Baltimore city, Maryland is state 24, county 510. Now download each corresponding file (in this case, MD/tgr24510.zip), and extract just its .rt1 file (in this case, tgr24510.rt1). The rest of the data is interesting but beyond the scope of this assignment. In fact, we won't even use the full .rt1 files. To boil them down into this assignment's simpler .map format, use this Perl script to pull out just the relevant columns and put tabs between them.

The `geography` Package

To apply your graph package to the above data, you will need classes for dealing with the .map file and with the addresses that the user types in. Here's what's in the geography package. I've left a few bits for you to write, as indicated below. Read the javadoc comments in the code!

StreetSegment corresponds to a line of the .map file, i.e., a city block. It has meaningful fields for the different columns. Its constructor takes a string and is able to parse that string appropriately. (If not, it throws DataFormatException.)
Point is a point on the surface of the earth -- a (longitude, latitude) pair. There are methods distanceTo() and directionTo() for finding how to get to another point. Those methods know about the size and curvature of the earth. There are also hashCode() and equals(), so that you can hash Point objects.
An Intersection is what you currently get when you look up an address. It contains a Point and a string that describes that point. In the example run above, looking up 3500 charlas st found the nearby intersection 3499 N Charles St (near (-76.61757, 39.329265)).
An AddressFinder is the magic object that lets you look up an address. When we read the line
```
-76617992  +39332692  -76618032  +39333710  3600  3601  3698  3699  N  Charles     St
```
from the .map file, we insert the resulting StreetSegment into our address finder. The address finder says "Gee! I now know 4 new addresses! For example, there is a 3600 N Charles St near the Point (76.617992W, +39.332692N). I'll remember that later in case someone looks up that address or a similar address. Similarly, I know roughly where 3601, 3698, and 3699 N Charles St. are."

You may have noticed that in the example run above, the user looked up 3500 charlas st -- an address that was not in the AddressFinder. The AddressFinder had to make a reasonable guess about where it was. In general, this process of locating addresses is called geocoding.

The geocoding strategy currently used by the AddressFinder class is described in AddressFinder.java. It involves some ordered dictionaries (java.util.SortedMap), implemented with red-black trees (java.util.TreeMap). You knew those had to find their way in somehow.
Street is an inner class that is defined and used only inside AddressFinder. Its full name is actually AddressFinder.Street. If you're not sure you understand Java's inner classes, read the comments for Street and for its constructor.
A Route is basically a graph.Path whose edges are streets. Your graph.Dijkstra class will return the best path, but if you printed that path directly, it wouldn't come out looking like directions. So you'll turn it into a Route, which has a very specialized toString() method.

The Route.toString() method knows, for example, that a turn of between 100 and 130 degrees should be described as "Turn sharply left", and that a direction of about 45 degrees is "northwest". It also knows how to combine segments of the same street, so that the directions say "go 2 miles" instead of reporting every block separately.
Streetmap is the class that puts it all together. As its constructor reads StreetSegments from a .map file, it adds them to an AddressFinder and also to a graph.KeyedDigraph. Each directed edge of the digraph holds a street name as its element. Each vertex of the digraph holds a Point as its element, and is also keyed by that Point.

Putting it together, a Streetmap can find a route between two address strings more or less as follows:
1. Use the AddressFinder to turn each address String into an Intersection.
2. Find each Intersection on the graph.KeyedDigraph, by using the Intersection's Point as a key.
3. Use the graph.Dijkstra class to find the shortest graph.Path between the points.
4. Turn the graph.Path into a Route that can be printed nicely
The main method for Streetmap doesn't run a website or anything, but it is more useful than most test methods. It prompts the user for two addresses and prints out the resulting route, as shown above.

I think you will be impressed by how fast this is, even when it has to explore most of Baltimore to find the destination. This is partly because computers are speedy. But when you realize that there are exponentially many possible routes from point A to point B, you really have to give a lot of credit to Dijkstra, for making it possible to find the shortest one in O(m log n) time. Especially when you realize that he did it in 1959 before anyone had ever heard of a data structure.

What You Need to Do

There is relatively new little code to write. However, to fill it in, you will have to study much of the provided code, particularly the method interfaces (which are commented).

That is one of the goals of this final assignment: to deeply understand how the data structures and techniques we've used over the semester can work together in a "real" system design.

Finish the `container.HashDecorable` class

This is trivial - a mere warmup. HashDecorable is basically just a wrapper around a small dictionary. The main reason to ask you to do it is so that you have a chance to try out Java's built-in dictionaries, which are called Maps (the java.util.Map interface) because they "map" each key to a value. You probably want to use a small java.util.HashMap. Refer as necessary to their documentation.

If you prefer, you can instead use your own dictionary class from Homework 7.

Implement `graph.Digraph`

First study the Digraph (directed graph) interface. It is about as simple as possible, because it is not very powerful. By contrast, the book tries to give you the only Graph and Digraph data structures you'll ever need (sections 12.1.1 and 12.4) -- at the cost of a complicated interface and a somewhat less efficient implementation.

(The book's authors allow various kinds of modifications to the graph, whereas we only need to build it. They support iteration over a vertex's in-edges as well as its out-edges. They store lists of all vertices and edges. Finally, they go to some trouble to allow the same graph to contain both directed and undirected edges, which is rarely useful.)

We provide basic, obvious implementations of vertices and edges: StoredVertex and StoredDirectedEdge. Also simple, eh? A vertex stores an element, and an edge stores an element plus two endpoint vertices. Also, they both support decorations.

While it's not so important for this assignment, we are careful not to say that all vertices and edges must be objects of these classes or their extensions. Extending these classes would make them store more information. But for some kinds of graphs, the vertices or edges might be able to store less information: the elements or endpoints might be computed on demand, or looked up in an auxiliary data structure, rather than stored in the object itself. So we say that StoredVertex and StoredDirectedEdge are merely implementations of abstract interfaces, Vertex and DirectedEdge. For generality, all your graph methods should manipulate vertices and edges where possible through these abstract interfaces.

Your job is to build an adjacency-list implementation of directed graphs. This involves writing two classes. You want to implement Digraph as a class AdjListDigraph. To do so, you will need to extend StoredVertex into a class DirectedAdjListVertex that stores a list of its out-edges. We have given you skeletons for both files.

Finish implementing `graph.KeyedDigraph`

Now study the KeyedDigraph interface, which extends Digraph. The basic idea was discussed earlier in this assignment.

You will implement KeyedDigraph as a wrapper class. That is, it will take an existing graph as an argument and turn it into a keyed graph. This doesn't care what the implementation of the existing graph is, so it is more flexible than just extending AdjListDigraph to implement KeyedDigraph. Instead of having a keyed AdjListDigraph, you will have the ability to build a keyed any-kind-of-graph.

Most of the wrapper class is already available in HashKeyedDigraph. Just edit the parts that say FILL THIS IN, and make sure the test methods work.

If you prefer, you can use your dictionary classes from Homework 7 instead of Java's HashMap class.

Answer in your Readme file:

Argue that if a KeyedDigraph is manipulated only through the KeyedDigraph interface, then it has the following invariant: every key of the keyed digraph refers to at most one vertex.

Finish implementing `graph.Dijkstra`

You'll implement Dijkstra's algorithm within this Dijkstra class. Much of the rest of the class has already been implemented for you, in order to lay out the design, which differs from the textbook's design in several ways. Read the comments in the code.

The core algorithm is the main part you have to write. You just need to edit the parts marked FILL THIS IN. You've already got a crucial portion -- an implementation of pqueue.AdaptablePriorityQueue from Homework 6.

The interface is especially different from the one in the textbook. It is more convenient for the user in several ways. The user specifies a weight function with a Weighter object, and gets back a shortest Path object. You will need to look at those small classes. Also, the user can specify a target vertex. The algorithm will stop when the target vertex is found, but can continue again later if the user is interested in distances from the same start vertex to other vertices.

The Dijkstra class contains test methods that try the toy examples from the book, so you can easily find out whether you're doing the right thing.

Answer in your Readme file:

Perhaps a better idea would be for the Dijkstra class to implement the java.util.Iterator interface. An instance of this iterator would iterate over graph vertices in order of increasing distance from the start vertex. Describe how you could add next() and hasNext() methods to your Dijkstra class.

Finish implementing `geography.Streetmap`

Now download the geography package, if you haven't already. Read through the mostly-finished Streetmap class. To hook it up to your Dijkstra class, you will just need to edit the few places that say FILL THIS IN. This is easy once you understand the design of the geography package. It's basically a matter of figuring out which existing methods to call.

Finish implementing `geography.AddressFinder`

Read through the AddressFinder class. This is finished except for the "interesting" calls that look things up in the red-black trees. As usual, you should FILL THEM IN. You will just have to figure out how to use Java's SortedMap interface, which is somewhat different from the book's ordered dictionary interface.

Answer in your Readme file:

The implementation of findStreetLike() obtains an iterator over a range of keys in the red-black tree, e.g., all keys that satisfy CHARL <= key < CHARM. Roughly, how would you implement the next() method of such an iterator? Your technique shouldn't need to rely on the red-black properties; it should work for any binary search tree.
You probably used the SortedMap.subMap() method. How can this method be implemented in a simple way that requires O(1) space and O(1) runtime?
Extra credit: The comments in findStreetLike() claim that if the string k falls right between keys x and y in the sorted map, then either x or y (or both) shares a maximal prefix with k. That is, no other key z in the sorted map shares a longer prefix with k. Prove it. (Hint: Proof by contradiction.)

Try it out

At this point, you should have a working route planner!

Improvements

You are required to make an improvement of your choice from the following list. The route planner is far from perfect, and it's appropriate to end the course with a more open-ended task. For extra credit, you can make multiple improvements.

As in Homework 8, a substantial part of your grade will depend on the clarity and elegance of your solution.

Answer in your Readme file:

Describe how you carried out the improvement. Give a clear overview of your design, mentioning anything that was interesting. (You may point the graders to high-level comments in the code for this.)

Show some interesting examples of input and output -- for example, illustrating the routes you get before and after your improvement.

Tell the graders exactly which files and methods to look at to see what you've done.

Possible Improvement: Simpler Routes

You'll notice that some of the routes you get are pretty complicated. In an effort to minimize the total distance, Dijkstra's algorithm will take all kinds of little shortcuts that would make the route hard to remember or follow for a real driver.

Come up with some techniques for making the routes shorter or simpler, and implement them. The basic technique is to change the notion of the weight of a path. Instead of just considering total distance, you should use some kind of combined criterion. You might say that paths are heavier (worse) if they (1) are longer, and (2) spend a lot of time on minor roads, and (3) have a lot of turns, and (4) have a lot of street name changes.

You are already doing (1). (2) is pretty easy if you have some way of defining a "minor" road. (Don't use AddressFinder.Street.numKnownAddresses(), since then Jones Falls Expressway and I-70 will come out as minor.) (3) and (4) are harder because now the cost is associated not just with an edge, but with a transition from one edge to the next. To handle these, you will either have to modify the street graph in some systematic way (how?), or modify Dijkstra's algorithm to act as if you have done that.

You will also need some way of combining these criteria into one definition of edge heaviness. Just make something up that seems to work well, like a linear combination of different criteria. Remember that Dijkstra's algorithm does not allow negative edge weights.

Possible Improvement: Address Interpolation

There are various ways to improve the geocoding done by AddressFinder. For example, the AddressFinder currently finds only intersections. But for addresses that are in the middle of the block, it might be helpful to return locations that are not Intersections.

To see why, try planning a route from 3360 Charles to 3370 St. Paul. The system will round to the nearest intersection and send you directly across 34th St. It would be better if it sent you a little way up Charles, across 34th, and down St. Paul.

Improve AddressFinder to do address interpolation in order to return approximate locations that are not Intersections.

Now improve Streetmap so that it can find the shortest route between non-intersections. You want to temporarily add two vertices to the graph, for the start and end locations, and then plan your route. When you're done you should remove those vertices again to prevent the graph from getting cluttered over time. (Alternatively, maybe you can think of a strategy that gets the same answer without actually modifying the graph.)

You can try looking up addresses at this geocoding website to see what the competition does. And here's an awesome feat of geocoding.

Possible Improvement: Intersection Lookup

Most of the online mapping applications allow you to specify intersections in the form "University and St. Paul" or "University Pkwy & St. Paul Street".

Improve AddressFinder so that it can do this as efficiently as it currently finds an intersection near a house number. The user should be able to enter either kind of address.

What if the two streets don't intersect at all? If you weren't sure which streets the user meant, you should consider alternative interpretations. If I say "Highfield and Norwood," and E Highfield Rd doesn't intersect with Norwood Ave, the system can try Norwood Rd, which is smaller but does intersect. You can use any reasonable strategy for trying the other interpretations. If you still can't find an intersection, return null as documented.

Possible Improvement: More Efficient Decorations

Hashes are overkill for storing attribute-value pairs. An object is likely to have only 0, 1, or 2 attributes most of the time. At that size, linked lists are plenty fast. In fact they are faster: no need to compute the hash code. And they take up less space: nothing is wasted on empty buckets, or on auxiliary variables to keep track of the size and capacity.

Design and implement a hybrid implementation of Decorable, called SlimDecorable, that uses a linked list at small sizes (< 10 attributes) and starts using a hash table at larger sizes. It should remain extremely space-efficient at small sizes. A Decorable object that currently has no attributes should only require 4 bytes (1 pointer) of storage more than an equivalent non-decorable object. Each additional attribute-value pair (up to about 10) should only require space for one new Item and a next pointer.

Comment your design well in SlimDecorable.java. Write a test method that checks that everything goes well when the decorable object switches to using a hash table.

Change your graph class to use SlimDecorable instead of HashDecorable, and make sure your route planner continues to work exactly as before.

Possible Improvement: More Efficient Edge Storage

Our graph implementation is a bit wasteful of storage, which might be an issue for really big graphs. Every out-edge stored at a DirectedAdjListVertex, v, contains a pointer to v itself as well as to the other endpoint w. Figure out a way to avoid storing the pointer to v.

Using your idea, construct a new implementation of Digraph that is similar to AdjListDigraph. However, it should not store full edges (StoredDirectedEdge) -- it should only store "out edges" that support destination() but not origin(). The total memory used should be less.

Call your new class HalfAdjListDigraph. Of course, it will still have to implement Digraph, so the user should not have to know or care that you are not storing the full edge. For example, insertReverseOf() and bestPathTo() need to get at the origin of an edge. They should still be able to do so.

Make sure that Dijkstra and Streetmap still work when you simply substitute HalfAdjListDigraph for AdjListDigraph. Notice that you didn't have to make a keyed version: the wrapper class HashKeyedDigraph wraps the new implementation as well as the old one.

When explaining your design in your Readme, discuss the implications for speed. Why will you be a bit slower?

Fortunately for speed, many algorithms mostly care about destination() and not origin(). For example, graph search algorithms (DFS, BFS, Dijkstra's) spend most of their time going forward on directed edges. It is very rare that they have to reverse direction and look at origin(). Use this observation to come up with a clean way to restore such algorithms to almost their original speed, while still using the more efficient storage scheme. (Hint: Add to the Digraph interface.) Modify Dijkstra to use this technique so that it works well with both AdjListDigraph and HalfAdjListDigraph. Make sure that it and Streetmap get the same answers as before.

Finally, in your Readme, discuss the case of a pair of opposing directed edges. They store the same two vertices; assume that they are also required to store the same element. (Really you can think of them as jointly constituting a single undirected edge.) Can you save any further memory in this case? If not, explain why not. If so, sketch a design.

Possible Improvement: Object Graphs

In a keyed digraph, one can use any objects as keys. Since vertices are in 1-to-1 correspondence with keys, one could conceptually think of a key object as being the vertex of the graph, instead of referring to the vertex.

This is powerful for two reasons. First, a given object (such as the string "BWI") might appear as a vertex in many different graphs. Second, two separately constructed objects that are equal are still considered to represent the same vertex, with the same incident edges. Neither of these is generally possible for a Vertex as we've defined it, such as a DirectedAdjListVertex.

In fact, many mathematicians do think of graphs this way. The mathematical definition of a graph is a set of arbitrary objects, V, and a set of edges between the objects, E. The objects are called "vertices" but there is no requirement that they come from some special Vertex class. They could be airport names, or points on the earth.

Redesign the directed graph interface so that it directly reflects this vision. Your interface should be similar to Digraph, but should not mention vertices or keys at all. It should never be necessary to go from an object to a vertex (keyed lookup) or a vertex back to an object (the element() method). Instead, the user should be able to think of an edge as connecting any two objects in the graph, such as strings.

Carry out your design as a new package objgraph, with an interface DirectedObjgraph and an implementation of that interface. Write some test methods that show off your interface, including the ability to decorate objects with respect to a graph (do this by decorating the underlying vertex).

Your implementation could just be a wrapper around graph.KeyedDigraph. Then there are Vertex objects there; the user just doesn't see them. Alternatively, it isn't hard to implement the idea without using Vertex objects at all; this saves space.

Now think about Dijkstra's algorithm (you don't have to reimplement it) and figure out why this pretty design hurts efficiency. Explain in your Readme.

Possible Improvement: Six Degrees of Kevin Bacon

This one is hard enough that it counts as more than 1 improvement (i.e., extra credit). I think it's not that hard, though.

Make a movies package that solves the "Six Degrees of Kevin Bacon" problem.

As we discussed in class, if you actually link all pairs of actors who appeared in each movie, you will have an enormous number of edges and you probably won't be able to handle it. Instead, you should build a bipartite graph with two kinds of vertices: actors and movies. Now each (actor,movie) pair needs only one edge. This greatly reduces the out-degree of each vertex, although it doubles the length of each path between actors.

You will need to download the Internet Movie Database (IMDB) and write a class or two to deal with its file format. Actually you might fare better with this subset of the IMDB, prepared by a CS226 student who did this problem last year (details).

You are not required to do approximate name lookup (although it would be a good idea -- see below). You can require the name to be typed exactly.

By the way, the original "six degrees of separation" study (Milgram, 1967) was not very careful -- it's now being replicated on a global scale, using email, and you can participate!

Possible Improvement: Graphical Map Display

This one is also hard enough that it counts as more than 1 improvement (i.e., extra credit).

Using Java's SWING classes, make a graphical route planning class that calls Streetmap's methods to do the work. You can interpret (longitude, latitude) coordinates as (x,y) coordinates.

You should prompt the user for addresses in a dialog box. Then you should display a map with the shortest route highlighted, as well as printing directions. It would be nice if you build up the map as you search, by displaying edges as you explore them.

Possible Improvement: An Approximate-Lookup Class

The approximate-string-lookup technique in geography.AddressFinder.findStringLike() finds the element associated with a string key k. If k isn't in the dictionary, it finds the "most important" element associated with any key in the dictionary that shares the longest possible prefix with k.

The approximate-number-lookup technique in geography.AddressFinder.Street.findIntersectionNear() is similar. It finds the element associated with a numeric key k. If k isn't in the dictionary, it returns the element associated with the first key in the dictionary that is as close as possible to k.

This kind of technique might be useful elsewhere, for example, for looking up actors' names in Six Degrees of Kevin Bacon. So it really deserves to be "encapsulated" -- put into a class of its own.

Design an extension of the java.util.SortedMap interface that supports approximate lookup. Also write an implementation that extends java.util.TreeMap. These files should go in the map package. Now simplify AddressFinder so that it just uses your new classes to handle the approximate lookup. This should let you shorten or remove several methods in AddressFinder. Check that everything still works as before.

In your design, the user of the class should be able to specify the "similarity" function and the "importance" function, preferably when constructing the map. (In the first case above, the similarity of two String keys was the length of their longest common prefix, and the importance of a Street element was the number of known intersections on that street.) What assumptions, if any, must be made about the similarity function for the design to work?

You might also want to include the ability for the search to fail quickly if no keys in the dictionary are sufficiently similar to the search key. Currently, if the user types in "b", the address finder will do a linear search through all the streets that begin with "b", and return the biggest one. That's a lot of time to waste on a typo.

Possible Improvement: A* Search

Dijkstra's algorithm finds the distance from the start vertex to every vertex. If you know your destination, this is wasteful, so our Dijkstra class knows to stop early as soon it finds the destination.

But it's still wasteful. It would be more accurate to say that it stops as soon as it blunders into the destination. It only uses the destination to decide when to stop, not what order to visit vertices in.

In the opening example of this assignment, why did we have to explore 19386 edges to figure out how to get from JHU to 1 E Fayette St.? Because we were just spreading out from JHU equally in all directions until we happened to find the destination. We spent as much time searching north as south. Dumb move. From the coordinates, it is clear that we want to go south, and we should concentrate our search in that direction.

The A* algorithm uses this insight to do a much quicker job of finding the shortest route to a single destination. (It doesn't necessarily find the shortest route to any other destination, though.) In our example, it only has to visit 247 edges instead of 19386, or about 1.3% as many. That's a big win!

The A* algorithm is a generalization of Dijkstra's that works by changing the definition of priority on the priority queue. The key of a vertex v isn't just its best known distance from vStart, denoted g(v) and stored as v.get(attrDist). Rather, it is f(v) = g(v)+h(v), where h(v) is an estimate of its true best distance to vEnd.

In other words, the key f(v) is an estimate of the total length of the shortest path from vStart through v all the way to vEnd. If that path is estimated to be comparatively short, then v is a promising choice and we should put it near the front of the priority queue so that we explore it soon.

Another way of thinking about it: g(v) is no worse for northern vertices than for southern vertices, which is why Dijkstra's algorithm is slow. But h(v) is increasingly bad as the vertices get farther north (or in general, farther from the destination), so f(v) = g(v)+h(v) will be bad for these vertices, and A* will appropriately put them at the back of the queue.

The A* algorithm works for any graph. The only requirement is that for all v, h(v) >= 0 is an optimistic estimate (i.e., an underestimate) of the true distance from v to vEnd. Suppose v is on the shortest path. If h(v) were a huge overestimate, then v would get mistakenly stuck at the back of the queue forever with a huge key, while we found and incorrectly returned another path that did not use v. But if h(v) is a huge underestimate, we're okay. (Note that if we underestimate all h(v) as 0, we are back to Dijkstra's algorithm, slow but still correct.

A proof of the A* algorithm's correctness will go here shortly.

It should be fairly easy to modify your Dijkstra class to implement A*. When calling getPathTo(vEnd), the user should supply a Heuristic object that is specific to vEnd, and which computes the h function on each vertex v -- the optimistic estimate of the best distance from v to vEnd. A null heuristic could be treated as h(v) = 0, giving Dijkstra's algorithm.

Note that if the user subsequently calls getPathTo on the same Dijkstra instance but with a different heuristic, then anything still on the priority queue must be reprioritized according to the new heuristic. You could either handle this (which would be nice) or throw an exception. But multiple calls with the same heuristic (in particular, null) should still be allowed.

For the Streetmap class, a good heuristic to use is the straight-line distance from v to the goal point. This is optimistic, as required, because no route can be shorter than a straight line. And it's easy to compute with p.distanceTo(pEnd).

Now do a systematic comparison of Dijkstra's algorithm to the A* algorithm for route planning. Find some way of picking a bunch of random routing problems (you could do it by hand, but perhaps you could get random points from the .map file). For each problem, record the distance d between vStart and vEnd, versus the amount of work done by each algorithm. Draw graphs if possible. How does the amount of work performed by Dijkstra's algorithm vary with the distance d? Explain why this might be expected. How about the amount of work performed by A*?

Possible Improvement: Something Else

If you have another idea about how to improve the system, great! Just check with me first that it is interesting enough to get full credit.

Thanks for a Great Semester!

Need I say that you've come a long way from Toilets and Stacks? The ideas in this course -- data structures and design principles -- will serve you well in many years of programming. A special thanks to Mike Goodrich and Roberto Tamassia for writing a textbook that I was actually happy to use.

Jason Eisner - jason@cs.jhu.edu - $Date: 2004/05/11 01:57:31 $

600.226 Data Structures (Spring 2004 - Prof. Eisner)

Homework 10 ... due 11:59pm Tuesday 4 May Sunday 2 May 2004

Summary

Introduction

Files Available and Their Design

Stuff from Past Assignments

The graph Package

The Data

The geography Package

What You Need to Do

Finish the container.HashDecorable class

Implement graph.Digraph

Finish implementing graph.KeyedDigraph

Finish implementing graph.Dijkstra

Finish implementing geography.Streetmap

Finish implementing geography.AddressFinder

Try it out

Improvements

Possible Improvement: Simpler Routes

Possible Improvement: Address Interpolation

Possible Improvement: Intersection Lookup

Possible Improvement: More Efficient Decorations

Possible Improvement: More Efficient Edge Storage

Possible Improvement: Object Graphs

Possible Improvement: Six Degrees of Kevin Bacon

Possible Improvement: Graphical Map Display

Possible Improvement: An Approximate-Lookup Class

Possible Improvement: A* Search

Possible Improvement: Something Else

Thanks for a Great Semester!

The `graph` Package

The `geography` Package

Finish the `container.HashDecorable` class

Implement `graph.Digraph`

Finish implementing `graph.KeyedDigraph`

Finish implementing `graph.Dijkstra`

Finish implementing `geography.Streetmap`

Finish implementing `geography.AddressFinder`