Goal of this homework: To implement graphs and Dijkstra's algorithm in the context of a larger, "more real-world" system. Get used to reading and modifying other people's code. Get used to using the built-in Java classes. Try something open-ended where you have to carry out the design yourself.
Collaboration policy for this homework: You must follow the
CS Department Academic
Integrity Code, except that for this homework, you may
optionally work with one other student in the class. You must name
your partner (if any) at the top of your
Readme file, and
both of you will receive the same grade for the assignment.
How to hand in your work: instructions, upload page
What to hand in (all packaged up in a zip or tar file):
Please do NOT include
any other large data files. We don't have enough space!
baltimore.map is your only large data file,
then skip this paragraph. We already have it; no need to submit
it. But perhaps you did 6 degrees of Kevin Bacon, or maybe you
made some improvement that is best demonstrated on a .map file of
your home town. If you need to submit these files in order for
us to grade your assignment, please make the large files available
by http:, ftp:, or an email attachment to
README where to find them. You must still
submit the entire assignment except the data files by the
usual submission system. That is, submit a zipfile containing
README and all code etc., as descibed below.
Your commented code:
.javafiles, written either by us or by you, that are necessary to make your route planner run correctly.
.javafiles needed for your improvements. These may be improvements to the route planner, or separate from it.
doc subdirectory containing HTML documentation
that you have generated with the command
-linksource -d doc */*.java. The graders will look at this
documentation, so you should provide javadoc comments for all of
your methods, including private ones.
Readme file that contains your answers to
miscellaneous questions on the homework, and in particular
describes your improvements. The top of your
should state the name of your partner (if any).
Acknowledgments: Thanks to Jim Gillispie, of the Sheridan Libraries' Maps Collection, for help finding and interpreting the data.
Since computer-generated driving directions hit the web several years ago, they have been one of the more obvious and visible applications of graph algorithms. If you haven't used one of these systems lately, experiment a bit at one or two of these sites before starting the assignment. Note that the routes are not always great -- the underlying map data can be wrong, and even when they're right, they don't always distinguish between a cow path and a thruway.
I've written some support code to help you handle messy geographical data like map files and address strings. You'll write the clean underlying part -- graph data structures and Dijkstra's algorithm.
The basic system that you will write will behave something like this. Note that it tells you to drive the wrong way down Charles St, but you're only young once.
$ java geography/Streetmap ../data/baltimore.map Reading map from ../data/baltimore.map (may take a moment) ................. Built graph with 84740 edges and 26988 vertices. Address finder knows 3769 street names. Welcome to the route planner! Enter start address: 3500 charlas st Found nearby intersection at 3499 N Charles St (near (-76.61757, 39.329265)) Enter destination address: fayette Found nearby intersection at 1 E Fayette St (near (-76.615122, 39.290436)) Explored 19386 edges from 6029 vertices; 7476 relaxations Start on N Charles St going south and continue 0.33 miles. Turn right onto E 30th St going west and continue 0.01 miles. Turn left onto N Charles St going south and continue 1.79 miles. Turn right onto W Madison St going west and continue 0.01 miles. Turn left onto Washington Pl going south and continue 0.15 miles. Bear left onto W Centre St going southeast and continue 0.01 miles. Turn right onto N Charles St going south and continue 0.41 miles. Total driving distance: 2.71 miles (Straight-line distance: 2.69 miles.)
I'm only supplying a Baltimore city map for you to play with. However, you can also make your own US maps from data that are freely available on the web. (See below.)
Once you have got the basic system working, you are asked to make two improvements of your choice, as described below.
You can download everything you need for this assignment in one convenient zipfile, or browse the files on the web. You can also read the Javadocs for the supplied code.
You'll need to use some data structures from your past assignments, notably priority queues with locators (for Dijkstra's algorithm). As in HW7 and HW8, you can find working packages for that here. You should use the packages rather than your own HW6 code.
You will finish writing this
graph package. It will include
directed graphs (
Dijkstra's shortest-path algorithm (
We've provided the necessary interfaces and some class code to get you started. I have tried to simplify and improve the interfaces in the textbook, although this is partly just a matter of taste. Read the javadoc comments in the code!
We've also provided test methods in
that let you check your implementation of Dijkstra's algorithm against
the airport examples illustrated in the book.
One important feature that is not in the textbook is the notion of
a keyed graph. Such a graph has names for some or all of its
vertices. The names serve as keys, so you can look up a vertex by its
name. For example, you might want to fly from
"LAX" -- and that means you need to find the vertices
corresponding to those names. This requires a hash map or something
You've actually already built keyed data structures. In the
election classes of Homeworks 7 and 8, the entries on the priority queue had names like
"Toomey" and "Born". So when saw the candidate's name on a ballot,
you could find her entry. That is really the idea of a
keyed adaptable priority queue, but you didn't write a
KeyedAdaptablePriorityQueue interface and implement it.
Rather, you used two objects -- an adaptable priority queue plus a
separate hash dictionary that maintained the mapping from keys
This time let's notice the abstract idea of keying and handle it in
a general way. You'll implement a
interface that extends
Digraph, so that
keying is an official feature that happens inside the class and
can be reused for multiple problems. In particular, once you've
tested your keyed graph implementation with airports, you will be able
to apply it to route planning - where the keys are not strings but
geographic locations of the form (longitude, latitude).
Section 12.3.2 of the textbook discusses decorating graph vertices
and edges, which is useful for many algorithms, including Dijkstra's.
In general they talk about decorating positions. And even more
generally we could decorate any object at all, so support for
decoration belongs in the
(If we'd had decorable objects before, then instead
(for adaptable priority queues) and then
TwinnedLocationAwareEntry (for priority deques), we could
have simply decorated instances of
MyEntry with extra
positions as necessary. What are the pros and cons of this
Here are some extra
container classes for
that. We've formalized the notion slightly more carefully for you
by providing a
interface that can be implemented in a general way. Your
implementation can then be extended into specific kinds of decorable
objects, like a decorable
Vertex. Moreover, we have
required the attribute names to be objects of a new class
rather than arbitrary objects. The javadoc comments explain why this
is a good idea.
You will download the data. Let's look at a few
lines from the middle of
-76618945 +39333311 -76620362 +39334208 4 98 W University Pky -76619746 +39335708 -76620493 +39335678 7 4 99 98 W 39th St -76618169 +39335774 -76619746 +39335708 1 5 W 39th St -76617992 +39332692 -76618032 +39333710 3600 3601 3698 3699 N Charles St -76617992 +39332692 -76618945 +39333311 101 109 W University Pky -76616286 +39331624 -76616273 +39331473 100 7 102 103 E University Pky -76617570 +39329265 -76616273 +39331473 3400 3401 3598 3599 Greenway
Each line represents a segment of a street, generally from one intersection to the next. There are 11 tab-separated columns in each line. The long numbers are longitudes and latitudes, and the short numbers are house numbers.
For example, the 4th line above says that you can travel from (76.617992W, 39.332692N) to (76.618032W, 39.333710N), and that if you do so, you will be going along a block of N. Charles St. whose numbers run from 3600 and 3601 (on your left and right, respectively) to 3698 and 3699. Note that some house numbers are missing: that's because some segments of University Parkway only have buildings on one side.
We have to assume that all street segments are two-way -- i.e., that you can also travel in the reverse direction. The assumption is occasionally wrong (as above), but most streets are two-way. So each line will correspond to a pair of directed edges in our graph.
For the interested: The .map file is extracted from the US
Census Bureau's TIGER/Line files for
2002 (which date back to the 2000 census). If you want to use your
route planner to tool around your hometown, or drive cross-country,
you are welcome to make other maps of the US and its territories
and commonwealths. Here's how. First look up state and
county codes for each county you want on your map. For example,
Baltimore city, Maryland is state 24, county 510. Now download each
corresponding file (in this case,
and extract just its
.rt1 file (in this case,
tgr24510.rt1). The rest of the data is interesting but
beyond the scope of this assignment. In fact, we won't even use the
.rt1 files. To boil them down into this
.map format, use this Perl script to pull out just the
relevant columns and put tabs between them.
To apply your
graph package to the above data, you will
need classes for dealing with the
.map file and with the
addresses that the user types in. Here's what's in the
I've left a few bits for you to write, as indicated below.
Read the javadoc comments in the code!
StreetSegment corresponds to a line of the
.map file, i.e., a city block. It has meaningful
fields for the different columns. Its constructor takes a string
and is able to parse that string appropriately. (If not,
Point is a point on the surface of the earth -- a
(longitude, latitude) pair. There are methods
directionTo() for finding
how to get to another point. Those methods know about the size and curvature of
the earth. There are also
equals(), so that you can hash
Intersection is what you currently get when
you look up an address. It contains a
Point and a
string that describes that point. In the example run above, looking
3500 charlas st found the nearby intersection
3499 N Charles St (near (-76.61757, 39.329265)).
AddressFinder is the magic object that lets
you look up an address. When we read the line
-76617992 +39332692 -76618032 +39333710 3600 3601 3698 3699 N Charles Stfrom the .map file, we insert the resulting
StreetSegmentinto our address finder. The address finder says "Gee! I now know 4 new addresses! For example, there is a 3600 N Charles St near the
Point(76.617992W, +39.332692N). I'll remember that later in case someone looks up that address or a similar address. Similarly, I know roughly where 3601, 3698, and 3699 N Charles St. are."
You may have noticed that in the example run above, the user
3500 charlas st -- an address that was not in
to make a reasonable guess about where it was. In general, this
process of locating addresses is called geocoding.
The geocoding strategy currently used by the
AddressFinder class is described in
It involves some ordered dictionaries (
implemented with red-black trees (
You knew those had to find their way in somehow.
Street is an inner class that is defined
and used only inside
AddressFinder. Its full name is
AddressFinder.Street. If you're not sure you understand Java's
inner classes, read the comments for
Street and for
graph.Path whose edges are streets. Your
graph.Dijkstra class will return the best path, but if
you printed that path directly, it wouldn't come out looking like
directions. So you'll turn it into a
Route, which has
a very specialized
knows, for example, that a turn of between 100 and 130 degrees
should be described as "Turn sharply left", and that a direction of
about 45 degrees is "northwest". It also knows how to combine
segments of the same street, so that the directions say "go 2 miles"
instead of reporting every block separately.
Streetmap is the class that puts it all together.
As its constructor reads
StreetSegments from a
.map file, it adds them to an
AddressFinder and also to a
graph.KeyedDigraph. Each directed edge of the digraph
holds a street name as its element. Each vertex of the digraph
Point as its element, and is also keyed by that
Putting it together, a
Streetmap can find
a route between two address strings more or less as follows:
AddressFinderto turn each address
graph.KeyedDigraph, by using the
Pointas a key.
graph.Dijkstraclass to find the shortest
graph.Pathbetween the points.
Routethat can be printed nicely
main method for
run a website or anything, but
it is more useful than most test methods. It prompts the user for
two addresses and prints out the resulting route, as shown
I think you will be impressed by how fast this is, even when it has to explore most of Baltimore to find the destination. This is partly because computers are speedy. But when you realize that there are exponentially many possible routes from point A to point B, you really have to give a lot of credit to Dijkstra, for making it possible to find the shortest one in O(m log n) time. Especially when you realize that he did it in 1959 before anyone had ever heard of a data structure.
There is relatively new little code to write. However, to fill it in, you will have to study much of the provided code, particularly the method interfaces (which are commented).
That is one of the goals of this final assignment: to deeply understand how the data structures and techniques we've used over the semester can work together in a "real" system design.
This is trivial - a mere warmup.
is basically just a wrapper around a small dictionary. The main
reason to ask you to do it is so that you have a chance to try out
Java's built-in dictionaries, which are called Maps (the
java.util.Map interface) because they "map" each key to a
value. You probably want to use a small
java.util.HashMap. Refer as necessary to their
If you prefer, you can instead use your own dictionary class from Homework 7.
First study the
Digraph (directed graph)
interface. It is about as simple as possible, because it is not very
powerful. By contrast, the book tries to give you the only
Digraph data structures you'll ever
need (sections 12.1.1 and 12.4) -- at the cost of a complicated
interface and a somewhat less efficient implementation.
(The book's authors allow various kinds of modifications to the graph, whereas we only need to build it. They support iteration over a vertex's in-edges as well as its out-edges. They store lists of all vertices and edges. Finally, they go to some trouble to allow the same graph to contain both directed and undirected edges, which is rarely useful.)
We provide basic, obvious implementations of vertices and edges:
Also simple, eh? A vertex stores an element, and an edge stores an
element plus two endpoint vertices. Also, they both support
While it's not so important for this assignment, we are careful
not to say that all vertices and edges must be objects of these
classes or their extensions. Extending these classes would make them
store more information. But for some kinds of graphs, the
vertices or edges might be able to store
less information: the elements or endpoints might be computed
on demand, or looked up in an auxiliary data structure, rather than
stored in the object itself. So we say that
StoredDirectedEdge are merely implementations
of abstract interfaces,
For generality, all your graph methods should manipulate vertices and
edges where possible through these abstract interfaces.
Your job is to build an adjacency-list implementation of directed
graphs. This involves writing two classes. You want to implement
Digraph as a class
To do so, you will need to extend
StoredVertex into a
that stores a list of its out-edges. We have given you skeletons for
Now study the
interface, which extends
Digraph. The basic idea was
discussed earlier in this assignment.
You will implement
KeyedDigraph as a wrapper class.
That is, it will take an existing graph as an argument and turn it
into a keyed graph. This doesn't care what the implementation of the
existing graph is, so it is more flexible than just extending
AdjListDigraph to implement
Instead of having a keyed
AdjListDigraph, you will
have the ability to build a keyed any-kind-of-graph.
Most of the wrapper class is already available
Just edit the parts that say
FILL THIS IN, and make sure
the test methods work.
If you prefer, you can use your dictionary classes
from Homework 7 instead of Java's
Answer in your
KeyedDigraphis manipulated only through the
KeyedDigraphinterface, then it has the following invariant: every key of the keyed digraph refers to at most one vertex.
You'll implement Dijkstra's algorithm within this
Dijkstra class. Much
of the rest of the class has already been implemented for you, in
order to lay out the design, which differs from the textbook's design
in several ways. Read the comments in the code.
The core algorithm is the main part you have to write. You just
need to edit the parts marked
FILL THIS IN. You've
already got a crucial portion -- an implementation of
pqueue.AdaptablePriorityQueue from Homework 6.
The interface is especially different from the one in the textbook.
It is more convenient for the user in several ways. The user
specifies a weight function with a
Weighter object, and
gets back a shortest
Path object. You will
need to look at those small classes. Also, the user can specify a
target vertex. The algorithm will stop when the target vertex is
found, but can continue again later if the user is interested in
distances from the same start vertex to other vertices.
Dijkstraclass contains test methods that try the toy examples from the book, so you can easily find out whether you're doing the right thing.
Answer in your
Dijkstraclass to implement the
java.util.Iteratorinterface. An instance of this iterator would iterate over graph vertices in order of increasing distance from the start vertex. Describe how you could add
hasNext()methods to your
Now download the
package, if you haven't already. Read through the mostly-finished
To hook it up to your
Dijkstra class, you will just need
to edit the few places that say
FILL THIS IN. This is
easy once you understand the design of the geography package. It's
basically a matter of figuring out which existing methods to call.
Read through the
class. This is finished except for the "interesting" calls that look
things up in the red-black trees. As usual, you should
THEM IN. You will just have to figure out how to use Java's
interface, which is somewhat different from the book's ordered
Answer in your
The implementation of
findStreetLike() obtains an
iterator over a range of keys in the red-black tree, e.g., all keys
CHARL <= key < CHARM. Roughly, how would
you implement the
next() method of such an iterator?
Your technique shouldn't need to rely on the red-black properties; it
should work for any binary search tree.
You probably used the
How can this method be implemented in a simple way that requires
O(1) space and O(1) runtime?
Extra credit: The comments in
findStreetLike() claim that if the string k falls
right between keys x and y in the sorted map, then
either x or y (or both) shares a maximal prefix with
k. That is, no other key z in the sorted map shares a
longer prefix with k. Prove it. (Hint: Proof by
You are required to make an improvement of your choice from the following list. The route planner is far from perfect, and it's appropriate to end the course with a more open-ended task. For extra credit, you can make multiple improvements.
As in Homework 8, a substantial part of your grade will depend on the clarity and elegance of your solution.
Answer in your
Describe how you carried out the improvement. Give a clear overview of your design, mentioning anything that was interesting. (You may point the graders to high-level comments in the code for this.)
Show some interesting examples of input and output -- for example, illustrating the routes you get before and after your improvement.
Tell the graders exactly which files and methods to look at to see what you've done.
You'll notice that some of the routes you get are pretty complicated. In an effort to minimize the total distance, Dijkstra's algorithm will take all kinds of little shortcuts that would make the route hard to remember or follow for a real driver.
Come up with some techniques for making the routes shorter or simpler, and implement them. The basic technique is to change the notion of the weight of a path. Instead of just considering total distance, you should use some kind of combined criterion. You might say that paths are heavier (worse) if they (1) are longer, and (2) spend a lot of time on minor roads, and (3) have a lot of turns, and (4) have a lot of street name changes.
You are already doing (1). (2) is pretty easy if you have some way
of defining a "minor" road. (Don't use
AddressFinder.Street.numKnownAddresses(), since then Jones
Falls Expressway and I-70 will come out as minor.) (3) and (4) are
harder because now the cost is associated not just with an edge, but
with a transition from one edge to the next. To handle these, you
will either have to modify the street graph in some systematic way
(how?), or modify Dijkstra's algorithm to act as if you have done
You will also need some way of combining these criteria into one definition of edge heaviness. Just make something up that seems to work well, like a linear combination of different criteria. Remember that Dijkstra's algorithm does not allow negative edge weights.
There are various ways to improve the geocoding done by
AddressFinder. For example, the
AddressFinder currently finds only intersections. But
for addresses that are in the middle of the block, it might be
helpful to return locations that are not
To see why, try planning a route from 3360 Charles to 3370 St. Paul. The system will round to the nearest intersection and send you directly across 34th St. It would be better if it sent you a little way up Charles, across 34th, and down St. Paul.
AddressFinder to do address
interpolation in order to return approximate locations that
are not Intersections.
Streetmap so that it can find the shortest
route between non-intersections. You want to temporarily add
two vertices to the graph, for the start and end locations, and then
plan your route. When you're done you should remove those vertices
again to prevent the graph from getting cluttered over time.
(Alternatively, maybe you can think of a strategy that gets the same
answer without actually modifying the graph.)
You can try looking up addresses at this geocoding website to see what the competition does. And here's an awesome feat of geocoding.
Most of the online mapping applications allow you to specify intersections in the form "University and St. Paul" or "University Pkwy & St. Paul Street".
AddressFinder so that it can do this as
efficiently as it currently finds an intersection near a house number.
The user should be able to enter either kind of address.
What if the two streets don't intersect at all? If you weren't
sure which streets the user meant, you should consider alternative
interpretations. If I say "Highfield and Norwood," and E Highfield Rd
doesn't intersect with Norwood Ave, the system can try Norwood Rd,
which is smaller but does intersect. You can use any reasonable
strategy for trying the other interpretations. If you still can't
find an intersection, return
null as documented.
Hashes are overkill for storing attribute-value pairs. An object is likely to have only 0, 1, or 2 attributes most of the time. At that size, linked lists are plenty fast. In fact they are faster: no need to compute the hash code. And they take up less space: nothing is wasted on empty buckets, or on auxiliary variables to keep track of the size and capacity.
Design and implement a hybrid implementation of
SlimDecorable, that uses a
linked list at small sizes (< 10 attributes) and starts using a hash
table at larger sizes. It should remain extremely space-efficient at
small sizes. A
Decorable object that currently has no
attributes should only require 4 bytes (1 pointer) of storage more
than an equivalent non-decorable object. Each additional attribute-value
pair (up to about 10) should only require space for one new
Comment your design well in
a test method that checks that everything goes well when the decorable
object switches to using a hash table.
Change your graph class to use
HashDecorable, and make sure your route planner
continues to work exactly as before.
Our graph implementation is a bit wasteful of storage, which might
be an issue for really big graphs. Every out-edge stored at a
DirectedAdjListVertex, v, contains a pointer to
v itself as well as to the other endpoint w. Figure out
a way to avoid storing the pointer to v.
Using your idea, construct a new implementation of
Digraph that is similar to
However, it should not store full edges
StoredDirectedEdge) -- it should only store "out edges"
destination() but not
The total memory used should be less.
Call your new class
HalfAdjListDigraph. Of course, it
will still have to implement
Digraph, so the user should
not have to know or care that you are not storing the full edge. For
need to get at the origin of an edge. They should still be able to
Make sure that
still work when you simply substitute
AdjListDigraph. Notice that you didn't have to make
a keyed version: the wrapper class
the new implementation as well as the old one.
When explaining your design in your
the implications for speed. Why will you be a bit slower?
Fortunately for speed, many algorithms mostly care about
destination() and not
example, graph search algorithms (DFS, BFS, Dijkstra's) spend most of
their time going forward on directed edges. It is very rare that they
have to reverse direction and look at
origin(). Use this
observation to come up with a clean way to restore such algorithms to
almost their original speed, while still using the more efficient
storage scheme. (Hint: Add to the
Dijkstra to use this technique so
that it works well with both
HalfAdjListDigraph. Make sure that it and
Streetmap get the same answers as before.
Finally, in your
Readme, discuss the case of a pair of
opposing directed edges. They store the same two vertices; assume
that they are also required to store the same element. (Really you
can think of them as jointly constituting a single undirected edge.)
Can you save any further memory in this case? If not, explain
why not. If so, sketch a design.
In a keyed digraph, one can use any objects as keys. Since vertices are in 1-to-1 correspondence with keys, one could conceptually think of a key object as being the vertex of the graph, instead of referring to the vertex.
This is powerful for two reasons. First, a given object (such as
the string "BWI") might appear as a vertex in many different graphs.
Second, two separately constructed objects that are
are still considered to represent the same vertex, with the same
incident edges. Neither of these is generally possible for a
Vertex as we've defined it, such as a
In fact, many mathematicians do think of graphs this way. The
mathematical definition of a graph is a set of arbitrary
objects, V, and a set of edges between the objects, E. The
objects are called "vertices" but there is no requirement that they
come from some special
Vertex class. They could
be airport names, or points on the earth.
Redesign the directed graph interface so that it directly reflects
this vision. Your interface should be similar to Digraph, but should
not mention vertices or keys at all. It should never be necessary to
go from an object to a vertex (keyed lookup) or a vertex back to an
element() method). Instead, the user should
be able to think of an edge as connecting any two objects in the
graph, such as strings.
Carry out your design as a new package
DirectedObjgraph and an implementation of
that interface. Write some test methods that show off your interface,
including the ability to decorate objects with respect to a graph (do
this by decorating the underlying vertex).
Your implementation could just be a wrapper around
graph.KeyedDigraph. Then there are
objects there; the user just doesn't see them. Alternatively, it
isn't hard to implement the idea without using
objects at all; this saves space.
Now think about Dijkstra's algorithm (you don't have to reimplement it) and
figure out why this pretty design hurts efficiency. Explain in your
This one is hard enough that it counts as more than 1 improvement (i.e., extra credit). I think it's not that hard, though.
movies package that solves the "Six Degrees of
Kevin Bacon" problem.
As we discussed in class, if you actually link all pairs of actors who appeared in each movie, you will have an enormous number of edges and you probably won't be able to handle it. Instead, you should build a bipartite graph with two kinds of vertices: actors and movies. Now each (actor,movie) pair needs only one edge. This greatly reduces the out-degree of each vertex, although it doubles the length of each path between actors.
You will need to download the Internet Movie Database (IMDB) and write a class or two to deal with its file format. Actually you might fare better with this subset of the IMDB, prepared by a CS226 student who did this problem last year (details).
You are not required to do approximate name lookup (although it would be a good idea -- see below). You can require the name to be typed exactly.
By the way, the original "six degrees of separation" study (Milgram, 1967) was not very careful -- it's now being replicated on a global scale, using email, and you can participate!
This one is also hard enough that it counts as more than 1 improvement (i.e., extra credit).
Using Java's SWING classes, make a graphical route planning class
Streetmap's methods to do the work. You can
interpret (longitude, latitude) coordinates as (x,y) coordinates.
You should prompt the user for addresses in a dialog box. Then you should display a map with the shortest route highlighted, as well as printing directions. It would be nice if you build up the map as you search, by displaying edges as you explore them.
The approximate-string-lookup technique in
geography.AddressFinder.findStringLike() finds the element
associated with a string key
k. If k isn't in the dictionary, it finds the "most
important" element associated with any key in the dictionary that shares
the longest possible prefix with k.
The approximate-number-lookup technique in
similar. It finds the element associated with a numeric key k.
If k isn't in the dictionary, it returns the element
associated with the first key in the dictionary that is as close
as possible to k.
This kind of technique might be useful elsewhere, for example, for looking up actors' names in Six Degrees of Kevin Bacon. So it really deserves to be "encapsulated" -- put into a class of its own.
Design an extension of the
interface that supports approximate lookup. Also write an
implementation that extends
files should go in the
map package. Now simplify
AddressFinder so that it just uses your new classes to
handle the approximate lookup. This should let you shorten or remove
several methods in
AddressFinder. Check that everything
still works as before.
In your design, the user of the class should be able to specify the
"similarity" function and the "importance" function, preferably when
constructing the map. (In the first case above, the similarity of two
String keys was the length of their longest common
prefix, and the importance of a
Street element was the
number of known intersections on that street.) What assumptions, if
any, must be made about the similarity function for the design to
You might also want to include the ability for the search to fail quickly if no keys in the dictionary are sufficiently similar to the search key. Currently, if the user types in "b", the address finder will do a linear search through all the streets that begin with "b", and return the biggest one. That's a lot of time to waste on a typo.
Dijkstra's algorithm finds the distance from the start vertex to
every vertex. If you know your destination, this is wasteful,
Dijkstra class knows to stop early as soon
it finds the destination.
But it's still wasteful. It would be more accurate to say that it stops as soon as it blunders into the destination. It only uses the destination to decide when to stop, not what order to visit vertices in.
In the opening example of this assignment, why did we have to explore 19386 edges to figure out how to get from JHU to 1 E Fayette St.? Because we were just spreading out from JHU equally in all directions until we happened to find the destination. We spent as much time searching north as south. Dumb move. From the coordinates, it is clear that we want to go south, and we should concentrate our search in that direction.
The A* algorithm uses this insight to do a much quicker job of finding the shortest route to a single destination. (It doesn't necessarily find the shortest route to any other destination, though.) In our example, it only has to visit 247 edges instead of 19386, or about 1.3% as many. That's a big win!
The A* algorithm is a generalization of Dijkstra's that works by
changing the definition of priority on the priority queue. The key of
a vertex v isn't just its best known distance from
vStart, denoted g(v) and stored as
v.get(attrDist). Rather, it is
f(v) = g(v)+h(v), where h(v) is an estimate of its true
best distance to
In other words, the key f(v) is an estimate of the total
length of the shortest path from
v all the way to
vEnd. If that path is
estimated to be comparatively short, then
v is a
promising choice and we should put it near the front of the priority
queue so that we explore it soon.
Another way of thinking about it: g(v) is no worse for northern vertices than for southern vertices, which is why Dijkstra's algorithm is slow. But h(v) is increasingly bad as the vertices get farther north (or in general, farther from the destination), so f(v) = g(v)+h(v) will be bad for these vertices, and A* will appropriately put them at the back of the queue.
The A* algorithm works for any graph. The only requirement is that
for all v,
h(v) >= 0 is an optimistic estimate
(i.e., an underestimate) of the true distance from v to
vEnd. Suppose v is on the shortest path. If
h(v) were a huge overestimate, then
v would get mistakenly stuck at the back of the queue forever
with a huge key, while we found and incorrectly returned another path
that did not use v. But if h(v) is a huge
underestimate, we're okay. (Note that if we underestimate all
h(v) as 0, we are back to Dijkstra's algorithm, slow but still
A proof of the A* algorithm's correctness will go here shortly.
It should be fairly easy to modify your
to implement A*. When calling
getPathTo(vEnd), the user
should supply a
Heuristic object that is specific to
vEnd, and which computes the h function on each
vertex v -- the optimistic estimate of the best distance from
vEnd. A null heuristic could be treated
as h(v) = 0, giving Dijkstra's algorithm.
Note that if the user subsequently calls
Dijkstra instance but with a different
heuristic, then anything still on the priority queue must be
reprioritized according to the new heuristic. You could either
handle this (which would be nice) or throw an exception. But
multiple calls with the same heuristic (in particular, null)
should still be allowed.
Streetmap class, a good heuristic to use is
the straight-line distance from v to the goal point. This is
optimistic, as required, because no route can be shorter than a
straight line. And it's easy to compute with
Now do a systematic comparison of Dijkstra's algorithm to the A*
algorithm for route planning. Find some way of picking a bunch of
random routing problems (you could do it by hand, but perhaps you
could get random points from the
.map file). For each
problem, record the distance
vEnd, versus the
amount of work done by each algorithm. Draw graphs if possible. How
does the amount of work performed by Dijkstra's algorithm vary with
the distance d? Explain why this might be expected. How about
the amount of work performed by A*?
If you have another idea about how to improve the system, great! Just check with me first that it is interesting enough to get full credit.
Need I say that you've come a long way from
The ideas in this course -- data structures and design principles --
will serve you well in many years of programming. A special thanks to
Mike Goodrich and Roberto Tamassia for writing a textbook that I
was actually happy to use.
Jason Eisner -
firstname.lastname@example.org - $Date: 2004/05/11 01:57:31 $