Goal of this homework: To implement graphs and Dijkstra's algorithm in the context of a larger, "more real-world" system. Get used to reading and modifying other people's code. Get used to using the built-in Java classes. Try something open-ended where you have to carry out the design yourself.
Collaboration policy for this homework: You must follow the
CS Department Academic
Integrity Code, except that for this homework, you may
optionally work with one other student in the class. You must name
your partner (if any) at the top of your Readme
file, and
both of you will receive the same grade for the assignment.
How to hand in your work: instructions, upload page
What to hand in (all packaged up in a zip or tar file):
Please do NOT include baltimore.map
or
any other large data files. We don't have enough space!
If baltimore.map
is your only large data file,
then skip this paragraph. We already have it; no need to submit
it. But perhaps you did 6 degrees of Kevin Bacon, or maybe you
made some improvement that is best demonstrated on a .map file of
your home town. If you need to submit these files in order for
us to grade your assignment, please make the large files available
by http:, ftp:, or an email attachment to cs226-staff
. Explain
in your README
where to find them. You must still
submit the entire assignment except the data files by the
usual submission system. That is, submit a zipfile containing
the README
and all code etc., as descibed below.
Your commented code:
.java
files, written either by us or by
you, that are necessary to make your route planner run correctly.
.java
files needed for your improvements.
These may be improvements to the route planner, or separate from it.
A doc
subdirectory containing HTML documentation
that you have generated with the command javadoc -private
-linksource -d doc */*.java
. The graders will look at this
documentation, so you should provide javadoc comments for all of
your methods, including private ones.
A Readme
file that contains your answers to
miscellaneous questions on the homework, and in particular
describes your improvements. The top of your Readme
should state the name of your partner (if any).
Acknowledgments: Thanks to Jim Gillispie, of the Sheridan Libraries' Maps Collection, for help finding and interpreting the data.
Since computer-generated driving directions hit the web several years ago, they have been one of the more obvious and visible applications of graph algorithms. If you haven't used one of these systems lately, experiment a bit at one or two of these sites before starting the assignment. Note that the routes are not always great -- the underlying map data can be wrong, and even when they're right, they don't always distinguish between a cow path and a thruway.
I've written some support code to help you handle messy geographical data like map files and address strings. You'll write the clean underlying part -- graph data structures and Dijkstra's algorithm.
The basic system that you will write will behave something like this. Note that it tells you to drive the wrong way down Charles St, but you're only young once.
$ java geography/Streetmap ../data/baltimore.map Reading map from ../data/baltimore.map (may take a moment) ................. Built graph with 84740 edges and 26988 vertices. Address finder knows 3769 street names. Welcome to the route planner! Enter start address: 3500 charlas st Found nearby intersection at 3499 N Charles St (near (-76.61757, 39.329265)) Enter destination address: fayette Found nearby intersection at 1 E Fayette St (near (-76.615122, 39.290436)) Explored 19386 edges from 6029 vertices; 7476 relaxations Start on N Charles St going south and continue 0.33 miles. Turn right onto E 30th St going west and continue 0.01 miles. Turn left onto N Charles St going south and continue 1.79 miles. Turn right onto W Madison St going west and continue 0.01 miles. Turn left onto Washington Pl going south and continue 0.15 miles. Bear left onto W Centre St going southeast and continue 0.01 miles. Turn right onto N Charles St going south and continue 0.41 miles. Total driving distance: 2.71 miles (Straight-line distance: 2.69 miles.) |
I'm only supplying a Baltimore city map for you to play with. However, you can also make your own US maps from data that are freely available on the web. (See below.)
Once you have got the basic system working, you are asked to make two improvements of your choice, as described below.
You can download everything you need for this assignment in one convenient zipfile, or browse the files on the web. You can also read the Javadocs for the supplied code.
You'll need to use some data structures from your past assignments, notably priority queues with locators (for Dijkstra's algorithm). As in HW7 and HW8, you can find working packages for that here. You should use the packages rather than your own HW6 code.
graph
PackageYou will finish writing this graph
package. It will include
directed graphs (Digraph
) and
Dijkstra's shortest-path algorithm (Dijkstra
).
We've provided the necessary interfaces and some class code to get you started. I have tried to simplify and improve the interfaces in the textbook, although this is partly just a matter of taste. Read the javadoc comments in the code!
We've also provided test methods in Dijkstra.java
and
HashKeyedDigraph.java
that let you check your implementation of Dijkstra's algorithm against
the airport examples illustrated in the book.
One important feature that is not in the textbook is the notion of
a keyed graph. Such a graph has names for some or all of its
vertices. The names serve as keys, so you can look up a vertex by its
name. For example, you might want to fly from "BWI"
to
"LAX"
-- and that means you need to find the vertices
corresponding to those names. This requires a hash map or something
like that.
You've actually already built keyed data structures. In the
election classes of Homeworks 7 and 8, the entries on the priority queue had names like
"Toomey" and "Born". So when saw the candidate's name on a ballot,
you could find her entry. That is really the idea of a
keyed adaptable priority queue, but you didn't write a
KeyedAdaptablePriorityQueue
interface and implement it.
Rather, you used two objects -- an adaptable priority queue plus a
separate hash dictionary that maintained the mapping from keys
to entries.
This time let's notice the abstract idea of keying and handle it in
a general way. You'll implement a KeyedDigraph
interface that extends Digraph
, so that
keying is an official feature that happens inside the class and
can be reused for multiple problems. In particular, once you've
tested your keyed graph implementation with airports, you will be able
to apply it to route planning - where the keys are not strings but
geographic locations of the form (longitude, latitude).
Section 12.3.2 of the textbook discusses decorating graph vertices
and edges, which is useful for many algorithms, including Dijkstra's.
In general they talk about decorating positions. And even more
generally we could decorate any object at all, so support for
decoration belongs in the container
package.
(If we'd had decorable objects before, then instead
of extending MyEntry
into LocationAwareEntry
(for adaptable priority queues) and then
TwinnedLocationAwareEntry
(for priority deques), we could
have simply decorated instances of MyEntry
with extra
positions as necessary. What are the pros and cons of this
strategy?)
Here are some extra container
classes for
that. We've formalized the notion slightly more carefully for you
by providing a container.Decorable
interface that can be implemented in a general way. Your
implementation can then be extended into specific kinds of decorable
objects, like a decorable Vertex
. Moreover, we have
required the attribute names to be objects of a new class
container.Attribute
,
rather than arbitrary objects. The javadoc comments explain why this
is a good idea.
You will download the data. Let's look at a few
lines from the middle of baltimore.map
:
-76618945 +39333311 -76620362 +39334208 4 98 W University Pky -76619746 +39335708 -76620493 +39335678 7 4 99 98 W 39th St -76618169 +39335774 -76619746 +39335708 1 5 W 39th St -76617992 +39332692 -76618032 +39333710 3600 3601 3698 3699 N Charles St -76617992 +39332692 -76618945 +39333311 101 109 W University Pky -76616286 +39331624 -76616273 +39331473 100 7 102 103 E University Pky -76617570 +39329265 -76616273 +39331473 3400 3401 3598 3599 Greenway |
Each line represents a segment of a street, generally from one intersection to the next. There are 11 tab-separated columns in each line. The long numbers are longitudes and latitudes, and the short numbers are house numbers.
For example, the 4th line above says that you can travel from (76.617992W, 39.332692N) to (76.618032W, 39.333710N), and that if you do so, you will be going along a block of N. Charles St. whose numbers run from 3600 and 3601 (on your left and right, respectively) to 3698 and 3699. Note that some house numbers are missing: that's because some segments of University Parkway only have buildings on one side.
We have to assume that all street segments are two-way -- i.e., that you can also travel in the reverse direction. The assumption is occasionally wrong (as above), but most streets are two-way. So each line will correspond to a pair of directed edges in our graph.
For the interested: The .map file is extracted from the US
Census Bureau's TIGER/Line files for
2002 (which date back to the 2000 census). If you want to use your
route planner to tool around your hometown, or drive cross-country,
you are welcome to make other maps of the US and its territories
and commonwealths. Here's how. First look up state and
county codes for each county you want on your map. For example,
Baltimore city, Maryland is state 24, county 510. Now download each
corresponding file (in this case, MD/tgr24510.zip
),
and extract just its .rt1
file (in this case,
tgr24510.rt1
). The rest of the data is interesting but
beyond the scope of this assignment. In fact, we won't even use the
full .rt1
files. To boil them down into this
assignment's simpler .map
format, use this Perl script to pull out just the
relevant columns and put tabs between them.
geography
PackageTo apply your graph
package to the above data, you will
need classes for dealing with the .map
file and with the
addresses that the user types in. Here's what's in the
geography
package.
I've left a few bits for you to write, as indicated below.
Read the javadoc comments in the code!
StreetSegment
corresponds to a line of the
.map
file, i.e., a city block. It has meaningful
fields for the different columns. Its constructor takes a string
and is able to parse that string appropriately. (If not,
it throws DataFormatException
.)
Point
is a point on the surface of the earth -- a
(longitude, latitude) pair. There are methods
distanceTo()
and directionTo()
for finding
how to get to another point. Those methods know about the size and curvature of
the earth. There are also hashCode()
and
equals()
, so that you can hash Point
objects.
An Intersection
is what you currently get when
you look up an address. It contains a Point
and a
string that describes that point. In the example run above, looking
up 3500 charlas st
found the nearby intersection
3499 N Charles St (near (-76.61757, 39.329265))
.
An AddressFinder
is the magic object that lets
you look up an address. When we read the line
-76617992 +39332692 -76618032 +39333710 3600 3601 3698 3699 N Charles Stfrom the .map file, we insert the resulting
StreetSegment
into our address finder.
The address finder says "Gee! I now know 4 new addresses! For example, there
is a 3600 N Charles St near the Point
(76.617992W, +39.332692N).
I'll remember that later in case someone looks up that address or a similar
address. Similarly, I know roughly where 3601, 3698, and 3699 N Charles
St. are."
You may have noticed that in the example run above, the user
looked up 3500 charlas st
-- an address that was not in
the AddressFinder
. The AddressFinder
had
to make a reasonable guess about where it was. In general, this
process of locating addresses is called geocoding.
The geocoding strategy currently used by the
AddressFinder
class is described in AddressFinder.java
.
It involves some ordered dictionaries (java.util.SortedMap
),
implemented with red-black trees (java.util.TreeMap
).
You knew those had to find their way in somehow.
Street
is an inner class that is defined
and used only inside AddressFinder
. Its full name is
actually AddressFinder.Street
. If you're not sure you understand Java's
inner classes, read the comments for Street
and for
its constructor.
A Route
is
basically a graph.Path
whose edges are streets. Your
graph.Dijkstra
class will return the best path, but if
you printed that path directly, it wouldn't come out looking like
directions. So you'll turn it into a Route
, which has
a very specialized toString()
method.
The Route.toString()
method
knows, for example, that a turn of between 100 and 130 degrees
should be described as "Turn sharply left", and that a direction of
about 45 degrees is "northwest". It also knows how to combine
segments of the same street, so that the directions say "go 2 miles"
instead of reporting every block separately.
Streetmap
is the class that puts it all together.
As its constructor reads StreetSegment
s from a
.map
file, it adds them to an
AddressFinder
and also to a
graph.KeyedDigraph
. Each directed edge of the digraph
holds a street name as its element. Each vertex of the digraph
holds a Point
as its element, and is also keyed by that
Point
.
Putting it together, a Streetmap
can find
a route between two address strings more or less as follows:
AddressFinder
to turn each address String
into an Intersection
.
Intersection
on the graph.KeyedDigraph
,
by using the Intersection
's Point
as a key.
graph.Dijkstra
class to find the shortest graph.Path
between the points.
graph.Path
into a Route
that can
be printed nicely
The main
method for Streetmap
doesn't
run a website or anything, but
it is more useful than most test methods. It prompts the user for
two addresses and prints out the resulting route, as shown
above.
I think you will be impressed by how fast this is, even when it has to explore most of Baltimore to find the destination. This is partly because computers are speedy. But when you realize that there are exponentially many possible routes from point A to point B, you really have to give a lot of credit to Dijkstra, for making it possible to find the shortest one in O(m log n) time. Especially when you realize that he did it in 1959 before anyone had ever heard of a data structure.
There is relatively new little code to write. However, to fill it in, you will have to study much of the provided code, particularly the method interfaces (which are commented).
That is one of the goals of this final assignment: to deeply understand how the data structures and techniques we've used over the semester can work together in a "real" system design.
container.HashDecorable
classThis is trivial - a mere warmup. HashDecorable
is basically just a wrapper around a small dictionary. The main
reason to ask you to do it is so that you have a chance to try out
Java's built-in dictionaries, which are called Maps (the
java.util.Map
interface) because they "map" each key to a
value. You probably want to use a small
java.util.HashMap
. Refer as necessary to their
documentation.
If you prefer, you can instead use your own dictionary class from Homework 7.
graph.Digraph
First study the Digraph
(directed graph)
interface. It is about as simple as possible, because it is not very
powerful. By contrast, the book tries to give you the only
Graph
and Digraph
data structures you'll ever
need (sections 12.1.1 and 12.4) -- at the cost of a complicated
interface and a somewhat less efficient implementation.
(The book's authors allow various kinds of modifications to the graph, whereas we only need to build it. They support iteration over a vertex's in-edges as well as its out-edges. They store lists of all vertices and edges. Finally, they go to some trouble to allow the same graph to contain both directed and undirected edges, which is rarely useful.)
We provide basic, obvious implementations of vertices and edges:
StoredVertex
and
StoredDirectedEdge
.
Also simple, eh? A vertex stores an element, and an edge stores an
element plus two endpoint vertices. Also, they both support
decorations.
While it's not so important for this assignment, we are careful
not to say that all vertices and edges must be objects of these
classes or their extensions. Extending these classes would make them
store more information. But for some kinds of graphs, the
vertices or edges might be able to store
less information: the elements or endpoints might be computed
on demand, or looked up in an auxiliary data structure, rather than
stored in the object itself. So we say that StoredVertex
and StoredDirectedEdge
are merely implementations
of abstract interfaces, Vertex
and DirectedEdge
.
For generality, all your graph methods should manipulate vertices and
edges where possible through these abstract interfaces.
Your job is to build an adjacency-list implementation of directed
graphs. This involves writing two classes. You want to implement
Digraph
as a class AdjListDigraph
.
To do so, you will need to extend StoredVertex
into a
class DirectedAdjListVertex
that stores a list of its out-edges. We have given you skeletons for
both files.
graph.KeyedDigraph
Now study the KeyedDigraph
interface, which extends Digraph
. The basic idea was
discussed earlier in this assignment.
You will implement KeyedDigraph
as a wrapper class.
That is, it will take an existing graph as an argument and turn it
into a keyed graph. This doesn't care what the implementation of the
existing graph is, so it is more flexible than just extending
AdjListDigraph
to implement KeyedDigraph
.
Instead of having a keyed AdjListDigraph
, you will
have the ability to build a keyed any-kind-of-graph.
Most of the wrapper class is already available
in HashKeyedDigraph
.
Just edit the parts that say FILL THIS IN
, and make sure
the test methods work.
If you prefer, you can use your dictionary classes
from Homework 7 instead of Java's
HashMap
class.
Answer in your Readme
file:
KeyedDigraph
is manipulated only
through the KeyedDigraph
interface, then it has the
following invariant: every key of the keyed digraph refers to at most one
vertex.
graph.Dijkstra
You'll implement Dijkstra's algorithm within this Dijkstra
class. Much
of the rest of the class has already been implemented for you, in
order to lay out the design, which differs from the textbook's design
in several ways. Read the comments in the code.
The core algorithm is the main part you have to write. You just
need to edit the parts marked FILL THIS IN
. You've
already got a crucial portion -- an implementation of
pqueue.AdaptablePriorityQueue
from Homework 6.
The interface is especially different from the one in the textbook.
It is more convenient for the user in several ways. The user
specifies a weight function with a Weighter
object, and
gets back a shortest Path
object. You will
need to look at those small classes. Also, the user can specify a
target vertex. The algorithm will stop when the target vertex is
found, but can continue again later if the user is interested in
distances from the same start vertex to other vertices.
Dijkstra
class contains test methods that try the toy
examples from the book, so you can easily find out whether you're
doing the right thing.
Answer in your Readme
file:
Dijkstra
class
to implement the java.util.Iterator
interface. An instance
of this iterator would iterate over graph vertices in order of
increasing distance from the start vertex. Describe how you could add
next()
and hasNext()
methods to your
Dijkstra
class.
geography.Streetmap
Now download the geography
package, if you haven't already. Read through the mostly-finished
Streetmap
class.
To hook it up to your Dijkstra
class, you will just need
to edit the few places that say FILL THIS IN
. This is
easy once you understand the design of the geography package. It's
basically a matter of figuring out which existing methods to call.
geography.AddressFinder
Read through the AddressFinder
class. This is finished except for the "interesting" calls that look
things up in the red-black trees. As usual, you should FILL
THEM IN
. You will just have to figure out how to use Java's SortedMap
interface, which is somewhat different from the book's ordered
dictionary interface.
Answer in your Readme
file:
The implementation of findStreetLike()
obtains an
iterator over a range of keys in the red-black tree, e.g., all keys
that satisfy CHARL <= key < CHARM
. Roughly, how would
you implement the next()
method of such an iterator?
Your technique shouldn't need to rely on the red-black properties; it
should work for any binary search tree.
You probably used the SortedMap.subMap()
method.
How can this method be implemented in a simple way that requires
O(1) space and O(1) runtime?
Extra credit: The comments in
findStreetLike()
claim that if the string k falls
right between keys x and y in the sorted map, then
either x or y (or both) shares a maximal prefix with
k. That is, no other key z in the sorted map shares a
longer prefix with k. Prove it. (Hint: Proof by
contradiction.)
You are required to make an improvement of your choice from the following list. The route planner is far from perfect, and it's appropriate to end the course with a more open-ended task. For extra credit, you can make multiple improvements.
As in Homework 8, a substantial part of your grade will depend on the clarity and elegance of your solution.
Answer in your Readme
file:
Describe how you carried out the improvement. Give a clear overview of your design, mentioning anything that was interesting. (You may point the graders to high-level comments in the code for this.)
Show some interesting examples of input and output -- for example, illustrating the routes you get before and after your improvement.
Tell the graders exactly which files and methods to look at to see what you've done.
You'll notice that some of the routes you get are pretty complicated. In an effort to minimize the total distance, Dijkstra's algorithm will take all kinds of little shortcuts that would make the route hard to remember or follow for a real driver.
Come up with some techniques for making the routes shorter or simpler, and implement them. The basic technique is to change the notion of the weight of a path. Instead of just considering total distance, you should use some kind of combined criterion. You might say that paths are heavier (worse) if they (1) are longer, and (2) spend a lot of time on minor roads, and (3) have a lot of turns, and (4) have a lot of street name changes.
You are already doing (1). (2) is pretty easy if you have some way
of defining a "minor" road. (Don't use
AddressFinder.Street.numKnownAddresses()
, since then Jones
Falls Expressway and I-70 will come out as minor.) (3) and (4) are
harder because now the cost is associated not just with an edge, but
with a transition from one edge to the next. To handle these, you
will either have to modify the street graph in some systematic way
(how?), or modify Dijkstra's algorithm to act as if you have done
that.
You will also need some way of combining these criteria into one definition of edge heaviness. Just make something up that seems to work well, like a linear combination of different criteria. Remember that Dijkstra's algorithm does not allow negative edge weights.
There are various ways to improve the geocoding done by
AddressFinder
. For example, the
AddressFinder
currently finds only intersections. But
for addresses that are in the middle of the block, it might be
helpful to return locations that are not Intersections
.
To see why, try planning a route from 3360 Charles to 3370 St. Paul. The system will round to the nearest intersection and send you directly across 34th St. It would be better if it sent you a little way up Charles, across 34th, and down St. Paul.
Improve AddressFinder
to do address
interpolation in order to return approximate locations that
are not Intersections.
Now improve Streetmap
so that it can find the shortest
route between non-intersections. You want to temporarily add
two vertices to the graph, for the start and end locations, and then
plan your route. When you're done you should remove those vertices
again to prevent the graph from getting cluttered over time.
(Alternatively, maybe you can think of a strategy that gets the same
answer without actually modifying the graph.)
You can try looking up addresses at this geocoding website to see what the competition does. And here's an awesome feat of geocoding.
Most of the online mapping applications allow you to specify intersections in the form "University and St. Paul" or "University Pkwy & St. Paul Street".
Improve AddressFinder
so that it can do this as
efficiently as it currently finds an intersection near a house number.
The user should be able to enter either kind of address.
What if the two streets don't intersect at all? If you weren't
sure which streets the user meant, you should consider alternative
interpretations. If I say "Highfield and Norwood," and E Highfield Rd
doesn't intersect with Norwood Ave, the system can try Norwood Rd,
which is smaller but does intersect. You can use any reasonable
strategy for trying the other interpretations. If you still can't
find an intersection, return null
as documented.
Hashes are overkill for storing attribute-value pairs. An object is likely to have only 0, 1, or 2 attributes most of the time. At that size, linked lists are plenty fast. In fact they are faster: no need to compute the hash code. And they take up less space: nothing is wasted on empty buckets, or on auxiliary variables to keep track of the size and capacity.
Design and implement a hybrid implementation of
Decorable
, called SlimDecorable
, that uses a
linked list at small sizes (< 10 attributes) and starts using a hash
table at larger sizes. It should remain extremely space-efficient at
small sizes. A Decorable
object that currently has no
attributes should only require 4 bytes (1 pointer) of storage more
than an equivalent non-decorable object. Each additional attribute-value
pair (up to about 10) should only require space for one new Item
and a next
pointer.
Comment your design well in SlimDecorable.java
. Write
a test method that checks that everything goes well when the decorable
object switches to using a hash table.
Change your graph class to use SlimDecorable
instead
of HashDecorable
, and make sure your route planner
continues to work exactly as before.
Our graph implementation is a bit wasteful of storage, which might
be an issue for really big graphs. Every out-edge stored at a
DirectedAdjListVertex
, v, contains a pointer to
v itself as well as to the other endpoint w. Figure out
a way to avoid storing the pointer to v.
Using your idea, construct a new implementation of
Digraph
that is similar to AdjListDigraph
.
However, it should not store full edges
(StoredDirectedEdge
) -- it should only store "out edges"
that support destination()
but not origin()
.
The total memory used should be less.
Call your new class HalfAdjListDigraph
. Of course, it
will still have to implement Digraph
, so the user should
not have to know or care that you are not storing the full edge. For
example, insertReverseOf()
and bestPathTo()
need to get at the origin of an edge. They should still be able to
do so.
Make sure that Dijkstra
and Streetmap
still work when you simply substitute HalfAdjListDigraph
for AdjListDigraph
. Notice that you didn't have to make
a keyed version: the wrapper class HashKeyedDigraph
wraps
the new implementation as well as the old one.
When explaining your design in your Readme
, discuss
the implications for speed. Why will you be a bit slower?
Fortunately for speed, many algorithms mostly care about
destination()
and not origin()
. For
example, graph search algorithms (DFS, BFS, Dijkstra's) spend most of
their time going forward on directed edges. It is very rare that they
have to reverse direction and look at origin()
. Use this
observation to come up with a clean way to restore such algorithms to
almost their original speed, while still using the more efficient
storage scheme. (Hint: Add to the Digraph
interface.) Modify Dijkstra
to use this technique so
that it works well with both AdjListDigraph
and
HalfAdjListDigraph
. Make sure that it and
Streetmap
get the same answers as before.
Finally, in your Readme
, discuss the case of a pair of
opposing directed edges. They store the same two vertices; assume
that they are also required to store the same element. (Really you
can think of them as jointly constituting a single undirected edge.)
Can you save any further memory in this case? If not, explain
why not. If so, sketch a design.
In a keyed digraph, one can use any objects as keys. Since vertices are in 1-to-1 correspondence with keys, one could conceptually think of a key object as being the vertex of the graph, instead of referring to the vertex.
This is powerful for two reasons. First, a given object (such as
the string "BWI") might appear as a vertex in many different graphs.
Second, two separately constructed objects that are equal
are still considered to represent the same vertex, with the same
incident edges. Neither of these is generally possible for a
Vertex
as we've defined it, such as a
DirectedAdjListVertex
.
In fact, many mathematicians do think of graphs this way. The
mathematical definition of a graph is a set of arbitrary
objects, V, and a set of edges between the objects, E. The
objects are called "vertices" but there is no requirement that they
come from some special Vertex
class. They could
be airport names, or points on the earth.
Redesign the directed graph interface so that it directly reflects
this vision. Your interface should be similar to Digraph, but should
not mention vertices or keys at all. It should never be necessary to
go from an object to a vertex (keyed lookup) or a vertex back to an
object (the element()
method). Instead, the user should
be able to think of an edge as connecting any two objects in the
graph, such as strings.
Carry out your design as a new package objgraph
, with
an interface DirectedObjgraph
and an implementation of
that interface. Write some test methods that show off your interface,
including the ability to decorate objects with respect to a graph (do
this by decorating the underlying vertex).
Your implementation could just be a wrapper around
graph.KeyedDigraph
. Then there are Vertex
objects there; the user just doesn't see them. Alternatively, it
isn't hard to implement the idea without using Vertex
objects at all; this saves space.
Now think about Dijkstra's algorithm (you don't have to reimplement it) and
figure out why this pretty design hurts efficiency. Explain in your
Readme
.
This one is hard enough that it counts as more than 1 improvement (i.e., extra credit). I think it's not that hard, though.
Make a movies
package that solves the "Six Degrees of
Kevin Bacon" problem.
As we discussed in class, if you actually link all pairs of actors who appeared in each movie, you will have an enormous number of edges and you probably won't be able to handle it. Instead, you should build a bipartite graph with two kinds of vertices: actors and movies. Now each (actor,movie) pair needs only one edge. This greatly reduces the out-degree of each vertex, although it doubles the length of each path between actors.
You will need to download the Internet Movie Database (IMDB) and write a class or two to deal with its file format. Actually you might fare better with this subset of the IMDB, prepared by a CS226 student who did this problem last year (details).
You are not required to do approximate name lookup (although it would be a good idea -- see below). You can require the name to be typed exactly.
By the way, the original "six degrees of separation" study (Milgram, 1967) was not very careful -- it's now being replicated on a global scale, using email, and you can participate!
This one is also hard enough that it counts as more than 1 improvement (i.e., extra credit).
Using Java's SWING classes, make a graphical route planning class
that calls Streetmap
's methods to do the work. You can
interpret (longitude, latitude) coordinates as (x,y) coordinates.
You should prompt the user for addresses in a dialog box. Then you should display a map with the shortest route highlighted, as well as printing directions. It would be nice if you build up the map as you search, by displaying edges as you explore them.
The approximate-string-lookup technique in
geography.AddressFinder.findStringLike()
finds the element
associated with a string key
k. If k isn't in the dictionary, it finds the "most
important" element associated with any key in the dictionary that shares
the longest possible prefix with k.
The approximate-number-lookup technique in
geography.AddressFinder.Street.findIntersectionNear()
is
similar. It finds the element associated with a numeric key k.
If k isn't in the dictionary, it returns the element
associated with the first key in the dictionary that is as close
as possible to k.
This kind of technique might be useful elsewhere, for example, for looking up actors' names in Six Degrees of Kevin Bacon. So it really deserves to be "encapsulated" -- put into a class of its own.
Design an extension of the java.util.SortedMap
interface that supports approximate lookup. Also write an
implementation that extends java.util.TreeMap
. These
files should go in the map
package. Now simplify
AddressFinder
so that it just uses your new classes to
handle the approximate lookup. This should let you shorten or remove
several methods in AddressFinder
. Check that everything
still works as before.
In your design, the user of the class should be able to specify the
"similarity" function and the "importance" function, preferably when
constructing the map. (In the first case above, the similarity of two
String
keys was the length of their longest common
prefix, and the importance of a Street
element was the
number of known intersections on that street.) What assumptions, if
any, must be made about the similarity function for the design to
work?
You might also want to include the ability for the search to fail quickly if no keys in the dictionary are sufficiently similar to the search key. Currently, if the user types in "b", the address finder will do a linear search through all the streets that begin with "b", and return the biggest one. That's a lot of time to waste on a typo.
Dijkstra's algorithm finds the distance from the start vertex to
every vertex. If you know your destination, this is wasteful,
so our Dijkstra
class knows to stop early as soon
it finds the destination.
But it's still wasteful. It would be more accurate to say that it stops as soon as it blunders into the destination. It only uses the destination to decide when to stop, not what order to visit vertices in.
In the opening example of this assignment, why did we have to explore 19386 edges to figure out how to get from JHU to 1 E Fayette St.? Because we were just spreading out from JHU equally in all directions until we happened to find the destination. We spent as much time searching north as south. Dumb move. From the coordinates, it is clear that we want to go south, and we should concentrate our search in that direction.
The A* algorithm uses this insight to do a much quicker job of finding the shortest route to a single destination. (It doesn't necessarily find the shortest route to any other destination, though.) In our example, it only has to visit 247 edges instead of 19386, or about 1.3% as many. That's a big win!
The A* algorithm is a generalization of Dijkstra's that works by
changing the definition of priority on the priority queue. The key of
a vertex v isn't just its best known distance from
vStart
, denoted g(v) and stored as
v.get(attrDist)
. Rather, it is
f(v) = g(v)+h(v), where h(v) is an estimate of its true
best distance to vEnd
.
In other words, the key f(v) is an estimate of the total
length of the shortest path from vStart
through
v
all the way to vEnd
. If that path is
estimated to be comparatively short, then v
is a
promising choice and we should put it near the front of the priority
queue so that we explore it soon.
Another way of thinking about it: g(v) is no worse for northern vertices than for southern vertices, which is why Dijkstra's algorithm is slow. But h(v) is increasingly bad as the vertices get farther north (or in general, farther from the destination), so f(v) = g(v)+h(v) will be bad for these vertices, and A* will appropriately put them at the back of the queue.
The A* algorithm works for any graph. The only requirement is that
for all v,
h(v) >= 0 is an optimistic estimate
(i.e., an underestimate) of the true distance from v to
vEnd
. Suppose v is on the shortest path. If
h(v) were a huge overestimate, then
v would get mistakenly stuck at the back of the queue forever
with a huge key, while we found and incorrectly returned another path
that did not use v. But if h(v) is a huge
underestimate, we're okay. (Note that if we underestimate all
h(v) as 0, we are back to Dijkstra's algorithm, slow but still
correct.
A proof of the A* algorithm's correctness will go here shortly.
It should be fairly easy to modify your Dijkstra
class
to implement A*. When calling getPathTo(vEnd)
, the user
should supply a Heuristic
object that is specific to
vEnd
, and which computes the h function on each
vertex v -- the optimistic estimate of the best distance from
v to vEnd
. A null heuristic could be treated
as h(v) = 0, giving Dijkstra's algorithm.
Note that if the user subsequently calls getPathTo
on
the same Dijkstra
instance but with a different
heuristic, then anything still on the priority queue must be
reprioritized according to the new heuristic. You could either
handle this (which would be nice) or throw an exception. But
multiple calls with the same heuristic (in particular, null)
should still be allowed.
For the Streetmap
class, a good heuristic to use is
the straight-line distance from v to the goal point. This is
optimistic, as required, because no route can be shorter than a
straight line. And it's easy to compute with
p.distanceTo(pEnd)
.
Now do a systematic comparison of Dijkstra's algorithm to the A*
algorithm for route planning. Find some way of picking a bunch of
random routing problems (you could do it by hand, but perhaps you
could get random points from the .map
file). For each
problem, record the distance
d between vStart
and vEnd
, versus the
amount of work done by each algorithm. Draw graphs if possible. How
does the amount of work performed by Dijkstra's algorithm vary with
the distance d? Explain why this might be expected. How about
the amount of work performed by A*?
If you have another idea about how to improve the system, great! Just check with me first that it is interesting enough to get full credit.
Need I say that you've come a long way from Toilet
s and Stack
s?
The ideas in this course -- data structures and design principles --
will serve you well in many years of programming. A special thanks to
Mike Goodrich and Roberto Tamassia for writing a textbook that I
was actually happy to use.
Jason Eisner - jason@cs.jhu.edu
- $Date: 2004/05/11 01:57:31 $