geography
Class AddressFinder

java.lang.Object
  |
  +--geography.AddressFinder

public class AddressFinder
extends java.lang.Object

Maps from a string like "3400 charles street" to a address nearby whose coordinates we actually know, such as 3398 N Charles St. (The known addresses come from StreetSegments added to this AddressFinder.) We try to make a decent guess even if the user types in "3403 charle road." or something like that.

This process of locating addresses is generally known as geocoding. We are not solving the whole geocoding problem here. An address finder for the whole country (or world) would have to locate the city first. We're just building an address finder for a single city.

A really good solution to this might use some kind of approximate string matching null; //

What we do currently is quite a bit simpler than that, but still pretty reasonable. First we look up the street in an alphabetically ordered dictionary of streets we know about. If we can't find it exactly, then we pick the closest one. Then we look up the street number in a numerically ordered dictionary for that street. Again, if we can't find it exactly, we pick the closest one.

We will use red-black trees for both dictionaries, courtesy of the java.util.TreeMap class. These implement the java.util.SortedMap interface (Java's name for an ordered dictionary).

Note that we have a nested data structure -- a big dictionary whose elements are little dictionaries. We use the big dictionary to turn an approximate street name into a Street object, which contains a little dictionary that we use to turn an approximate house number on that street into an address whose location we know.

There are a few tricks we use to make the approximate matching work better:


Nested Class Summary
protected  class AddressFinder.Street
          A class that represents information about a particular street.
 
Field Summary
protected  java.util.SortedMap smStreets
          The "big map" that stores Streets, keyed by simplified versions of their names (see simplify()).
 
Constructor Summary
AddressFinder()
          Constructor: New address finder, contains no information yet.
 
Method Summary
 void addStreetSegment(geography.StreetSegment ss)
          In general, a StreetSegment specifies 2 intersections and up to 4 addresses (up to two per intersection).
protected static java.lang.Object[] closestKeys(java.util.SortedMap sm, java.lang.Object key)
          Look up key in the SortedMap sm.
 geography.Intersection findIntersectionNear(java.lang.String address)
          Finds an intersection near this address.
protected  AddressFinder.Street findOrInsertStreet(geography.StreetSegment ss)
          Given a StreetSegment, look up its Street by name in the database.
protected  AddressFinder.Street findStreetLike(java.lang.String name)
          Find the street whose name is most similar to the given name, breaking ties in favor of streets with more addresses.
protected static java.lang.String keyFromStreetName(java.lang.String streetName)
           
private static int lcpLength(java.lang.String s1, java.lang.String s2)
           
 int numKnownStreets()
          Number of street names that this AddressFinder knows about.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

smStreets

protected java.util.SortedMap smStreets
The "big map" that stores Streets, keyed by simplified versions of their names (see simplify()). The streets are sorted in their "natural" (alphabetical) order using a comparator.

Constructor Detail

AddressFinder

public AddressFinder()
Constructor: New address finder, contains no information yet.

Method Detail

addStreetSegment

public void addStreetSegment(geography.StreetSegment ss)
In general, a StreetSegment specifies 2 intersections and up to 4 addresses (up to two per intersection). Add these addresses to our database of known addresses, associated with those intersections.


numKnownStreets

public int numKnownStreets()
Number of street names that this AddressFinder knows about. Substreets and superstreets are both counted.


findIntersectionNear

public geography.Intersection findIntersectionNear(java.lang.String address)
                                            throws DataFormatException
Finds an intersection near this address. The address comes from the user and may be grungy; we'll do our best. If it looks like a valid address but we just can't find anything remotely close in the database, we may return null.

Throws:
DataFormatException - if the address is too grungy for us to deal with.

findOrInsertStreet

protected AddressFinder.Street findOrInsertStreet(geography.StreetSegment ss)
Given a StreetSegment, look up its Street by name in the database. If there is no such street in the database, create one.


findStreetLike

protected AddressFinder.Street findStreetLike(java.lang.String name)
Find the street whose name is most similar to the given name, breaking ties in favor of streets with more addresses. Return null if there are no streets at all in the database.


keyFromStreetName

protected static java.lang.String keyFromStreetName(java.lang.String streetName)

closestKeys

protected static java.lang.Object[] closestKeys(java.util.SortedMap sm,
                                                java.lang.Object key)
Look up key in the SortedMap sm. Returns an array of the 2 closest keys (< and >=). If there is an exact match, these keys will be equal. If there are no keys < the search key, then the first element will be null, and symmetrically for >= and the second element.


lcpLength

private static int lcpLength(java.lang.String s1,
                             java.lang.String s2)