Goal of this homework: Learn to implement hash tables and
apply them to a realistic problem. Also familiarize yourself with
some standard Java classes, such as the java.util.List
interface and java.lang.String
.
Finally, start using packages.
Collaboration policy for this homework: You must follow the
CS Department Academic
Integrity Code and this time may not collaborate with
another student. Exception: This assignment builds on homework
6's ReportedSimpleElection
class. You may work with your
homework 6 partner to fix any bugs in HW6, or to revise it to work
with the packaged priority-queue interface. See the "Packages"
section below.
How to hand in your work: instructions, upload page
What to hand in (all packaged up in a zip or tar file):
Your commented code:
ReportedSimpleElection.java
(revised version)
HashDictionary.java
.java
files that are
necessary to make your program run correctly. Please make sure
that you include everything that is needed to make your code
compile! This includes files that we gave you, files that you
already handed in last week, etc.
Make sure that your zip file or tar file preserves the correct directory structure, with every package in its own directory!
A doc
subdirectory containing HTML documentation
that you have generated with the command javadoc -private
-linksource -d doc *.java
. The graders will look at this
documentation, so you should provide javadoc comments for all of
your methods, including private ones.
A Readme
file that contains your
answers to miscellaneous questions on the homework (i.e., answers that
don't go in any other file). Usually these are English paragraphs or small
fragments of code. The top of your Readme
should
state the name of your partner (if any).
Acknowledgments: Thanks to Josh Dunkelman for his help in preparing this assignment.
Last week in Homework 6, you built a vote-counting system where the ballots were numbers. Unfortunately, when your system was used in an election over the weekend, many voters complained. They couldn't remember their candidate's number, or they got 6's and 9's confused, or they forgot they were voting and wrote down their lucky Lotto numbers instead.
So this week, you will make your system more user-friendly. No
more candidate numbers. You'll make it work when the ballots are names. Click on the link now
to see the data. Notice that different voters use different
capitalization; so your system will have to recognize that
DECKER
and Decker
refer to the same candidate.
The output of your system will be very much like it was before, but without the candidate numbers.
Born pulls into the lead with 1 of 1 votes sullivan pulls into the lead with 2 of 4 votes ... (2730 votes) Galluccio (1669 votes) Born (1653 votes) DECKER ... |
Last week, if you saw a vote for candidate 6
, then you
looked up her locator in an array. The array was a kind of dictionary
that mapped integers to locators. This week, you will see a vote for
"decker"
instead, so you need a dictionary that can map
strings to locators. In terms of the Homework 6 diagram, you will
upgrade the blue array to become a blue dictionary.
Most of the work this week lies in writing a HashDictionary
class that implements Dictionary
. As a general
dictionary class, it will be able to map arbitrary key objects to
arbitrary element objects. At the end of the homework, you will
modify your ReportedSimpleElection
to use this dictionary
as described above.
In this assignment (and subsequent assignments), you will be
dealing with several interacting data structures. That involves a
confusing number of .java
files. Before things get out
of hand, you should start using Java's package mechanism to
keep everything organized.
Download and examine this solution to most of Homework 6, which has been reorganized into 3 packages as explained here. You should use these packages as a starting point for this assignment. (You may want to read something about how Java packages work. It's quite easy. Section 1.8 of the textbook covers it in 2 pages, or you could check out this online tutorial. The tutorial doesn't mention an alternative way to set the classpath in Windows: Start / Settings / Control Panel / System / Advanced / Environment Variables.)
You should now create a fourth package: election
.
Your election
directory should contain a copy of the
ReportedSimpleElection.java
that you wrote for Homework 6. You will need to modify this file
slightly so that it works with the package system. It should now begin
with a package
declaration and perhaps some
import
directives:
package election; // says what package this class is in import pqueue.HeapAdaptablePriorityQueue; // says what package another class is in // or just "import pqueue.*;" to import all class names from pqueue package ... |
You don't have to import class names from other packages,
but if you don't, you will have to spell them out in full every
time you use them
(e.g., new pqueue.HeapAdaptablePriorityQueue()
instead of just new HeapAdaptablePriorityQueue()
).
If your election.ReportedSimpleElection
class depends
on other supporting classes (such as special comparators), put those
into the election
class as well.
Make sure that you can compile and run the full solution to
homework 6, in its packaged form. This should all be pretty easy if
you got homework 6 working correctly. If not, finish writing,
debugging and testing ReportedSimpleElection
before you
continue.
Note: If you did the 2nd-edition version of Homework 6, you
will also have to edit your ReportedSimpleElection
slightly to work with the 3rd-edition PriorityQueue
/AdaptablePriorityQueue
interface, since that's what's in the provided packages. Prof. Eisner
or the course staff would be happy to help you with that.
One more thing: include a file election/package.html
file that briefly describes what the election
package
does. This file will be used by Javadoc. (See the other packages for
examples.)
You may work with your homework 6 partner to get the above things running. For the rest of the assignment, you are on your own.
Dictionary
Now create a fifth package, map
, to hold map and
dictionary classes. (You'll only be implementing a dictionary for
this assignment.)
Download the partial map
package that we're providing you. It holds a Map
interface extended by
Dictionary
, and an
incomplete HashDictionary
implementation for you to finish.
The Dictionary
interface has more methods than you
actually need for this application, but this homework asks you to
implement and test the entire interface anyway. We have provided
a test method in HashDictionary
.
Your hash table should have the following design:
You may not make use of any existing map or dictionary classes (e.g.,
implementations of java.util.Map
).
You should have a no-argument hash table constructor. It
should use a default equality tester that just calls a key's built-in
equals()
and hashCode()
methods.
You should also have a hash table constructor that takes an EqualityTester
argument. We have corrected the textbook's
EqualityTester
interface by adding a hash()
method. (That's because if a user provides a new definition of key
equality, he/she had better also provide a matching hash function, so
that equal keys get the same hash code!)
You may use any compression function you like to get from the hash code to a bucket number.
To handle collisions, use separate chaining. In other words,
each bucket will contain a list of entries. For convenience and for
practice, you should specifically make this a java.util.List
.
Thus the hash table is basically stored in an array like this:
import java.util.List; ... protected List[] buckets; // note: an empty bucket could hold either an empty list or null, as you prefer
The java.util.List
interface is quite easy to use.
Besides the add()
method, you will be particularly
interested in the iterator()
method, which returns an iterator
that supports remove()
. Java provides several
implementations of the java.util.List
interface; you
may choose one that you like, such as java.util.LinkedList
.
The book explains why you should probably have a prime number of buckets. When constructing the hash table, you should round the user's requested capacity up to the nearest prime number, using the following method:
import java.math.BigInteger; ... /** Returns the first probably-prime number that is >= i. */ static private int nextProbablePrime(int i) { // while i definitely isn't a prime, keep adding 1 while (!BigInteger.valueOf(i).isProbablePrime(20)) i++; // we quit the loop, so isProbablePrime must have thought i was probably a prime // (isProbablePrime(20) has error probability of // 1 in 2^20, i.e., about 1 in a million) return i; } |
Whenever the hash table's load factor exceeds 90% (see section 8.2.7 in the 3rd edition, or 8.3.7 in the 2nd edition), you should approximately double its size. This requires you to rehash all the items into the new, larger set of buckets, where they will be distributed differently.
The easiest way to do this without duplicating code is to use your
HashDictionary.insert()
method. Create a temporary new
hash table, ht
, whose requested initial capacity is
2*buckets.length
(the constructor will round this up to
the nearest prime). Insert all the entries in the current hash table
into the new hash table ht
. Then simply set
buckets = ht.buckets
. (Java will eventually
garbage-collect the old buckets
array and the temporary
ht
object, since they become inaccessible.)
If you find yourself duplicating code, reorganize it. For
example, I kept needing an expression to figure out what bucket to
look in -- so I packaged it up in a private helper method,
private int bucketOf(Object key)
.
You would expect that hash % N
would give you a
result from 0 to N
-1 that you can use as an array index.
Unfortunately, Java's version of the %
operator says that
hash % N
is negative if hash
is negative.
Adding N
to a negative result will put it back in the
right range. Example: Counting downward by 10's, if
hash
=23, 13, 3, -7, -17, -27 then hash %
10
=3, 3, 3, -7, -7, -7. Add 10 to the negative versions
to get 3, 3, 3, 3, 3, 3.
When writing iterators, you should generally not waste space or time by constructing a new list with copies of all the objects you want to iterate over. Rather, the iterator should simply remember where you are in the data structure you are iterating over, just as in Homework 4. This saves space, and it also saves time, particularly if the user might stop early instead of "using up" the iterator.
An exception to the previous point is the iterator returned by
removeAll()
, since it is iterating over elements
that are no longer in the data structure. The
removeAll()
method must build a new list of the
elements that it removes, and
return an iterator over that list.
The documentation of java.util.Iterator
says that
the remove()
method is optional; it is allowed to
simply throw UnsupportedOperationException
. If you wish,
you may do that in the iterator classes you write here. If you do
implement remove()
, read
this description first so you know how it's supposed to work.
You will need to write some special iterator classes. It is
convenient to write these as inner classes of
HashDictionary
, and in fact we've included skeletons
for them.
Finally, don't forget to test your HashDictionary
class.
A pretty good test method was provided for you above.
Answer in your Readme
file:
Why do we use a EqualityTester
rather than just a Comparator
?
The EqualityTester
documentation requires that if
equal(key1,key2)
then we need
hash(key1)==hash(key2)
. In other words, there is a
compatibility requirement between the two methods of
EqualityTester
. What, exactly, would go wrong if this
requirement were violated?
All objects in Java have a hashCode
method. We
use that method by default, since
EqualityTester.hashValue(key)
simply returns
key.hashCode()
unless it is overridden. For example, our
hash code for "decker"
will be
"decker".hashCode()
.
By experimentation (just write a small test class), figure out what
Java's String.hashCode()
actually does. Write a short,
fast Java function that ought to compute the same thing. Hand in your
function as part of your Readme
file.
Is it possible to check whether the load factor is under 90% without using the relatively slow operation of division? (Hint: Use multiplication.)
You have already implemented
HashDictionary.replaceValue()
. Sketch how you would
implement HashDictionary.replaceKey()
. It is tempting
for replaceKey()
to call insert(k,v)
to insert
a new key/value pair; why would that be wrong?
You have already implemented
HashDictionary.entries()
. Sketch how you would implement
a HashDictionary.keys()
iterator that returns each key
only once, even if there are multiple entries with that key.
Assume you are stuck with the current design of
HashDictionary
, in which each bucket simply holds a
linked list of Entry
s.
Same as the previous question, but this time you are also
allowed to change the design of HashDictionary
. Can you come
up with an easier and more efficient way to support iteration over
the unique keys? (There is more than one good answer - extra credit for
two!)
Now to use your hash table class in the service of society! Modify
your ReportedSimpleElection
class from last week so that
it can handle the new, user-friendlier format of the ballots
file. This will be an
amazingly easy change. The output format and implementation were
discussed in the introduction to this homework.
Your class should probably have a countVote(String
name)
method that looks up a candidate in the dictionary and
gives him or her an extra vote. (If you didn't do something like this
last week, it's time to improve your design - you will be graded on
style too.)
Every candidate you have seen will have an entry on the priority
queue, keyed by the number of votes he or she has so far. So
countVote()
has to look up this entry and increase its
key, thereby moving it ahead on the priority queue.
You don't have a list of candidates ahead of time -- you will
become aware of new candidates only as you see votes for them. The
first time that countVote()
receives a vote for a new
candidate, it should enqueue that candidate with a key of one vote.
The resulting entry on the priority queue should be inserted into the
dictionary, keyed by the candidate's name so that it can be found
again when it the candidate gets additional votes. Don't get confused
between the Entries in the dictionary and the Entries on the priority
queue:
Remember that you should ignore uppercase/lowercase distinctions in
candidate names. As discussed in class, there are multiple ways to
solve this - you could change the EqualityTester
or you
could change the keys. There are methods of String
and Character
that can help you convert or ignore case.
But when you print a candidate's name, you should use the capitalization from the first ballot you saw for that candidate. This policy means that if people are pretty consistent about capitalizing as "O'Malley," then you will probably see "O'Malley" first, which means you'll nicely print "O'Malley" rather than "O'malley" or "o'malley" or something. The result is illustrated in the introduction.
Note: Get rid of the array of candidate names,
CANDIDATES
, that you used last week. You don't need it
anymore.
As before, test your class by running java
ReportedSimpleElection 9 ballots
.
Extra credit: Improve your hash table so that its iterators are fail-fast.
What does that mean? A user is not supposed to modify the hash
table in between calls to next()
. So it is "okay" if the
iterator does something weird in that case. However, the user might
accidentally do this, leading to a bug that would be hard to
find.
We generally strive to design classes that are hard to misuse, even
accidentally. The methods of a fail-fast
iterator are
designed to throw an exception right away if they are misused, rather
than returning a bogus value. Each public method of the iterator
starts by checking that the hash table has not changed since the
iterator was created. If it has, the iterator throws a ConcurrentModificationException
.
The usual implementation is for the hash table to maintain a "modification count." The iterator just checks whether this modification count is the same as it was when the iterator was created.
Answer in your Readme
file (if you are doing the extra
credit):
The user is allowed to modify the hash table via the
iterator's remove()
method. Discuss how this should
affect the modification count design. Consider especially the
possibility of two iterators over the same hash table that exist at
the same time, one of which calls remove()
without the
other's knowledge.
Coming in Homework 8!
Jason Eisner - jason@cs.jhu.edu
- $Date: 2004/04/01 04:23:32 $