600.226 Data Structures (Spring 2004 - Prof. Eisner)

Homework 7 ... due 4pm Friday 2 Apr 2004

Summary

This is the second part of a three-part project.

Goal of this homework: Learn to implement hash tables and apply them to a realistic problem. Also familiarize yourself with some standard Java classes, such as the java.util.List interface and java.lang.String. Finally, start using packages.

Collaboration policy for this homework: You must follow the CS Department Academic Integrity Code and this time may not collaborate with another student. Exception: This assignment builds on homework 6's ReportedSimpleElection class. You may work with your homework 6 partner to fix any bugs in HW6, or to revise it to work with the packaged priority-queue interface. See the "Packages" section below.

How to hand in your work: instructions, upload page

What to hand in (all packaged up in a zip or tar file):

Your commented code:
- ReportedSimpleElection.java (revised version)
- HashDictionary.java
- Any other .java files that are necessary to make your program run correctly. Please make sure that you include everything that is needed to make your code compile! This includes files that we gave you, files that you already handed in last week, etc.
Make sure that your zip file or tar file preserves the correct directory structure, with every package in its own directory!
A doc subdirectory containing HTML documentation that you have generated with the command javadoc -private -linksource -d doc *.java. The graders will look at this documentation, so you should provide javadoc comments for all of your methods, including private ones.
A Readme file that contains your answers to miscellaneous questions on the homework (i.e., answers that don't go in any other file). Usually these are English paragraphs or small fragments of code. The top of your Readme should state the name of your partner (if any).

Acknowledgments: Thanks to Josh Dunkelman for his help in preparing this assignment.

Introduction

Last week in Homework 6, you built a vote-counting system where the ballots were numbers. Unfortunately, when your system was used in an election over the weekend, many voters complained. They couldn't remember their candidate's number, or they got 6's and 9's confused, or they forgot they were voting and wrote down their lucky Lotto numbers instead.

So this week, you will make your system more user-friendly. No more candidate numbers. You'll make it work when the ballots are names. Click on the link now to see the data. Notice that different voters use different capitalization; so your system will have to recognize that DECKER and Decker refer to the same candidate.

The output of your system will be very much like it was before, but without the candidate numbers.

Born pulls into the lead with 1 of 1 votes
sullivan pulls into the lead with 2 of 4 votes
   ...
(2730 votes) Galluccio
(1669 votes) Born
(1653 votes) DECKER
   ...

Last week, if you saw a vote for candidate 6, then you looked up her locator in an array. The array was a kind of dictionary that mapped integers to locators. This week, you will see a vote for "decker" instead, so you need a dictionary that can map strings to locators. In terms of the Homework 6 diagram, you will upgrade the blue array to become a blue dictionary.

Most of the work this week lies in writing a HashDictionary class that implements Dictionary. As a general dictionary class, it will be able to map arbitrary key objects to arbitrary element objects. At the end of the homework, you will modify your ReportedSimpleElection to use this dictionary as described above.

Packages

In this assignment (and subsequent assignments), you will be dealing with several interacting data structures. That involves a confusing number of .java files. Before things get out of hand, you should start using Java's package mechanism to keep everything organized.

Download and examine this solution to most of Homework 6, which has been reorganized into 3 packages as explained here. You should use these packages as a starting point for this assignment. (You may want to read something about how Java packages work. It's quite easy. Section 1.8 of the textbook covers it in 2 pages, or you could check out this online tutorial. The tutorial doesn't mention an alternative way to set the classpath in Windows: Start / Settings / Control Panel / System / Advanced / Environment Variables.)

You should now create a fourth package: election. Your election directory should contain a copy of the ReportedSimpleElection.java that you wrote for Homework 6. You will need to modify this file slightly so that it works with the package system. It should now begin with a package declaration and perhaps some import directives:

package election;                          // says what package this class is in
import pqueue.HeapAdaptablePriorityQueue;  // says what package another class is in
// or just "import pqueue.*;" to import all class names from pqueue package
...

You don't have to import class names from other packages, but if you don't, you will have to spell them out in full every time you use them (e.g., new pqueue.HeapAdaptablePriorityQueue() instead of just new HeapAdaptablePriorityQueue()).

If your election.ReportedSimpleElection class depends on other supporting classes (such as special comparators), put those into the election class as well.

Make sure that you can compile and run the full solution to homework 6, in its packaged form. This should all be pretty easy if you got homework 6 working correctly. If not, finish writing, debugging and testing ReportedSimpleElection before you continue.

Note: If you did the 2nd-edition version of Homework 6, you will also have to edit your ReportedSimpleElection slightly to work with the 3rd-edition PriorityQueue/AdaptablePriorityQueue interface, since that's what's in the provided packages. Prof. Eisner or the course staff would be happy to help you with that.

One more thing: include a file election/package.html file that briefly describes what the election package does. This file will be used by Javadoc. (See the other packages for examples.)

You may work with your homework 6 partner to get the above things running. For the rest of the assignment, you are on your own.

Implementing a `Dictionary`

Now create a fifth package, map, to hold map and dictionary classes. (You'll only be implementing a dictionary for this assignment.)

Download the partial map package that we're providing you. It holds a Map interface extended by Dictionary, and an incomplete HashDictionary implementation for you to finish.

The Dictionary interface has more methods than you actually need for this application, but this homework asks you to implement and test the entire interface anyway. We have provided a test method in HashDictionary.

Your hash table should have the following design:

You may not make use of any existing map or dictionary classes (e.g., implementations of java.util.Map).
You should have a no-argument hash table constructor. It should use a default equality tester that just calls a key's built-in equals() and hashCode() methods.

You should also have a hash table constructor that takes an EqualityTester argument. We have corrected the textbook's EqualityTester interface by adding a hash() method. (That's because if a user provides a new definition of key equality, he/she had better also provide a matching hash function, so that equal keys get the same hash code!)
You may use any compression function you like to get from the hash code to a bucket number.
To handle collisions, use separate chaining. In other words, each bucket will contain a list of entries. For convenience and for practice, you should specifically make this a java.util.List. Thus the hash table is basically stored in an array like this:
```
 
  import java.util.List; 
    ...  
  protected List[] buckets; // note: an empty bucket could hold either an empty list or null, as you prefer
```
The java.util.List interface is quite easy to use. Besides the add() method, you will be particularly interested in the iterator() method, which returns an iterator that supports remove(). Java provides several implementations of the java.util.List interface; you may choose one that you like, such as java.util.LinkedList.

The book explains why you should probably have a prime number of buckets. When constructing the hash table, you should round the user's requested capacity up to the nearest prime number, using the following method:

  import java.math.BigInteger;

     ...

  /** Returns the first probably-prime number that is >= i. */
  static private int nextProbablePrime(int i) {

    // while i definitely isn't a prime, keep adding 1
    while (!BigInteger.valueOf(i).isProbablePrime(20))  
      i++;  

    // we quit the loop, so isProbablePrime must have thought i was probably a prime 
    //   (isProbablePrime(20) has error probability of 
    //   1 in 2^20, i.e., about 1 in a million)
    return i;
  }

Whenever the hash table's load factor exceeds 90% (see section 8.2.7 in the 3rd edition, or 8.3.7 in the 2nd edition), you should approximately double its size. This requires you to rehash all the items into the new, larger set of buckets, where they will be distributed differently.

The easiest way to do this without duplicating code is to use your HashDictionary.insert() method. Create a temporary new hash table, ht, whose requested initial capacity is 2*buckets.length (the constructor will round this up to the nearest prime). Insert all the entries in the current hash table into the new hash table ht. Then simply set buckets = ht.buckets. (Java will eventually garbage-collect the old buckets array and the temporary ht object, since they become inaccessible.)
If you find yourself duplicating code, reorganize it. For example, I kept needing an expression to figure out what bucket to look in -- so I packaged it up in a private helper method, private int bucketOf(Object key).
Overall, I used several helper methods to make the major code nicer.
You would expect that hash % N would give you a result from 0 to N-1 that you can use as an array index. Unfortunately, Java's version of the % operator says that hash % N is negative if hash is negative. Adding N to a negative result will put it back in the right range. Example: Counting downward by 10's, if hash=23, 13, 3, -7, -17, -27 then hash % 10=3, 3, 3, -7, -7, -7. Add 10 to the negative versions to get 3, 3, 3, 3, 3, 3.

Hints on the iterator methods:

When writing iterators, you should generally not waste space or time by constructing a new list with copies of all the objects you want to iterate over. Rather, the iterator should simply remember where you are in the data structure you are iterating over, just as in Homework 4. This saves space, and it also saves time, particularly if the user might stop early instead of "using up" the iterator.
An exception to the previous point is the iterator returned by removeAll(), since it is iterating over elements that are no longer in the data structure. The removeAll() method must build a new list of the elements that it removes, and return an iterator over that list.
The documentation of java.util.Iterator says that the remove() method is optional; it is allowed to simply throw UnsupportedOperationException. If you wish, you may do that in the iterator classes you write here. If you do implement remove(), read this description first so you know how it's supposed to work.
You will need to write some special iterator classes. It is convenient to write these as inner classes of HashDictionary, and in fact we've included skeletons for them.

Finally, don't forget to test your HashDictionary class. A pretty good test method was provided for you above.

Answer in your Readme file:

Why do we use a EqualityTester rather than just a Comparator?
The EqualityTester documentation requires that if equal(key1,key2) then we need hash(key1)==hash(key2). In other words, there is a compatibility requirement between the two methods of EqualityTester. What, exactly, would go wrong if this requirement were violated?
All objects in Java have a hashCode method. We use that method by default, since EqualityTester.hashValue(key) simply returns key.hashCode() unless it is overridden. For example, our hash code for "decker" will be "decker".hashCode().

By experimentation (just write a small test class), figure out what Java's String.hashCode() actually does. Write a short, fast Java function that ought to compute the same thing. Hand in your function as part of your Readme file.
Is it possible to check whether the load factor is under 90% without using the relatively slow operation of division? (Hint: Use multiplication.)
You have already implemented HashDictionary.replaceValue(). Sketch how you would implement HashDictionary.replaceKey(). It is tempting for replaceKey() to call insert(k,v) to insert a new key/value pair; why would that be wrong?
You have already implemented HashDictionary.entries(). Sketch how you would implement a HashDictionary.keys() iterator that returns each key only once, even if there are multiple entries with that key. Assume you are stuck with the current design of HashDictionary, in which each bucket simply holds a linked list of Entrys.
Same as the previous question, but this time you are also allowed to change the design of HashDictionary. Can you come up with an easier and more efficient way to support iteration over the unique keys? (There is more than one good answer - extra credit for two!)

A User-Friendly Election

Now to use your hash table class in the service of society! Modify your ReportedSimpleElection class from last week so that it can handle the new, user-friendlier format of the ballots file. This will be an amazingly easy change. The output format and implementation were discussed in the introduction to this homework.

Your class should probably have a countVote(String name) method that looks up a candidate in the dictionary and gives him or her an extra vote. (If you didn't do something like this last week, it's time to improve your design - you will be graded on style too.)

Every candidate you have seen will have an entry on the priority queue, keyed by the number of votes he or she has so far. So countVote() has to look up this entry and increase its key, thereby moving it ahead on the priority queue.

You don't have a list of candidates ahead of time -- you will become aware of new candidates only as you see votes for them. The first time that countVote() receives a vote for a new candidate, it should enqueue that candidate with a key of one vote. The resulting entry on the priority queue should be inserted into the dictionary, keyed by the candidate's name so that it can be found again when it the candidate gets additional votes. Don't get confused between the Entries in the dictionary and the Entries on the priority queue:

Remember that you should ignore uppercase/lowercase distinctions in candidate names. As discussed in class, there are multiple ways to solve this - you could change the EqualityTester or you could change the keys. There are methods of String and Character that can help you convert or ignore case.

But when you print a candidate's name, you should use the capitalization from the first ballot you saw for that candidate. This policy means that if people are pretty consistent about capitalizing as "O'Malley," then you will probably see "O'Malley" first, which means you'll nicely print "O'Malley" rather than "O'malley" or "o'malley" or something. The result is illustrated in the introduction.

Note: Get rid of the array of candidate names, CANDIDATES, that you used last week. You don't need it anymore.

As before, test your class by running java ReportedSimpleElection 9 ballots.

Extra Credit

Extra credit: Improve your hash table so that its iterators are fail-fast.

What does that mean? A user is not supposed to modify the hash table in between calls to next(). So it is "okay" if the iterator does something weird in that case. However, the user might accidentally do this, leading to a bug that would be hard to find.

We generally strive to design classes that are hard to misuse, even accidentally. The methods of a fail-fast iterator are designed to throw an exception right away if they are misused, rather than returning a bogus value. Each public method of the iterator starts by checking that the hash table has not changed since the iterator was created. If it has, the iterator throws a ConcurrentModificationException.

The usual implementation is for the hash table to maintain a "modification count." The iterator just checks whether this modification count is the same as it was when the iterator was created.

Answer in your Readme file (if you are doing the extra credit):

The user is allowed to modify the hash table via the iterator's remove() method. Discuss how this should affect the modification count design. Consider especially the possibility of two iterators over the same hash table that exist at the same time, one of which calls remove() without the other's knowledge.

Electoral Reform

Coming in Homework 8!

Jason Eisner - jason@cs.jhu.edu - $Date: 2004/04/01 04:23:32 $