600.226 Data Structures (Spring 2004 - Prof. Eisner)

Homework 7 ... due 4pm Friday 2 Apr 2004

Summary

This is the second part of a three-part project.

Goal of this homework: Learn to implement hash tables and apply them to a realistic problem. Also familiarize yourself with some standard Java classes, such as the java.util.List interface and java.lang.String. Finally, start using packages.

Collaboration policy for this homework: You must follow the CS Department Academic Integrity Code and this time may not collaborate with another student. Exception: This assignment builds on homework 6's ReportedSimpleElection class. You may work with your homework 6 partner to fix any bugs in HW6, or to revise it to work with the packaged priority-queue interface. See the "Packages" section below.

How to hand in your work: instructions, upload page

What to hand in (all packaged up in a zip or tar file):

Acknowledgments: Thanks to Josh Dunkelman for his help in preparing this assignment.

Introduction

Last week in Homework 6, you built a vote-counting system where the ballots were numbers. Unfortunately, when your system was used in an election over the weekend, many voters complained. They couldn't remember their candidate's number, or they got 6's and 9's confused, or they forgot they were voting and wrote down their lucky Lotto numbers instead.

So this week, you will make your system more user-friendly. No more candidate numbers. You'll make it work when the ballots are names. Click on the link now to see the data. Notice that different voters use different capitalization; so your system will have to recognize that DECKER and Decker refer to the same candidate.

The output of your system will be very much like it was before, but without the candidate numbers.

Born pulls into the lead with 1 of 1 votes
sullivan pulls into the lead with 2 of 4 votes
   ...
(2730 votes) Galluccio
(1669 votes) Born
(1653 votes) DECKER
   ...

Last week, if you saw a vote for candidate 6, then you looked up her locator in an array. The array was a kind of dictionary that mapped integers to locators. This week, you will see a vote for "decker" instead, so you need a dictionary that can map strings to locators. In terms of the Homework 6 diagram, you will upgrade the blue array to become a blue dictionary.

Most of the work this week lies in writing a HashDictionary class that implements Dictionary. As a general dictionary class, it will be able to map arbitrary key objects to arbitrary element objects. At the end of the homework, you will modify your ReportedSimpleElection to use this dictionary as described above.

Packages

In this assignment (and subsequent assignments), you will be dealing with several interacting data structures. That involves a confusing number of .java files. Before things get out of hand, you should start using Java's package mechanism to keep everything organized.

Download and examine this solution to most of Homework 6, which has been reorganized into 3 packages as explained here. You should use these packages as a starting point for this assignment. (You may want to read something about how Java packages work. It's quite easy. Section 1.8 of the textbook covers it in 2 pages, or you could check out this online tutorial. The tutorial doesn't mention an alternative way to set the classpath in Windows: Start / Settings / Control Panel / System / Advanced / Environment Variables.)

You should now create a fourth package: election. Your election directory should contain a copy of the ReportedSimpleElection.java that you wrote for Homework 6. You will need to modify this file slightly so that it works with the package system. It should now begin with a package declaration and perhaps some import directives:

package election;                          // says what package this class is in
import pqueue.HeapAdaptablePriorityQueue;  // says what package another class is in
// or just "import pqueue.*;" to import all class names from pqueue package
...

You don't have to import class names from other packages, but if you don't, you will have to spell them out in full every time you use them (e.g., new pqueue.HeapAdaptablePriorityQueue() instead of just new HeapAdaptablePriorityQueue()).

If your election.ReportedSimpleElection class depends on other supporting classes (such as special comparators), put those into the election class as well.

Make sure that you can compile and run the full solution to homework 6, in its packaged form. This should all be pretty easy if you got homework 6 working correctly. If not, finish writing, debugging and testing ReportedSimpleElection before you continue.

Note: If you did the 2nd-edition version of Homework 6, you will also have to edit your ReportedSimpleElection slightly to work with the 3rd-edition PriorityQueue/AdaptablePriorityQueue interface, since that's what's in the provided packages. Prof. Eisner or the course staff would be happy to help you with that.

One more thing: include a file election/package.html file that briefly describes what the election package does. This file will be used by Javadoc. (See the other packages for examples.)

You may work with your homework 6 partner to get the above things running. For the rest of the assignment, you are on your own.

Implementing a Dictionary

Now create a fifth package, map, to hold map and dictionary classes. (You'll only be implementing a dictionary for this assignment.)

Download the partial map package that we're providing you. It holds a Map interface extended by Dictionary, and an incomplete HashDictionary implementation for you to finish.

The Dictionary interface has more methods than you actually need for this application, but this homework asks you to implement and test the entire interface anyway. We have provided a test method in HashDictionary.

Your hash table should have the following design:

Hints on the iterator methods:

Finally, don't forget to test your HashDictionary class. A pretty good test method was provided for you above.

Answer in your Readme file:

  1. Why do we use a EqualityTester rather than just a Comparator?

  2. The EqualityTester documentation requires that if equal(key1,key2) then we need hash(key1)==hash(key2). In other words, there is a compatibility requirement between the two methods of EqualityTester. What, exactly, would go wrong if this requirement were violated?

  3. All objects in Java have a hashCode method. We use that method by default, since EqualityTester.hashValue(key) simply returns key.hashCode() unless it is overridden. For example, our hash code for "decker" will be "decker".hashCode().

    By experimentation (just write a small test class), figure out what Java's String.hashCode() actually does. Write a short, fast Java function that ought to compute the same thing. Hand in your function as part of your Readme file.

  4. Is it possible to check whether the load factor is under 90% without using the relatively slow operation of division? (Hint: Use multiplication.)

  5. You have already implemented HashDictionary.replaceValue(). Sketch how you would implement HashDictionary.replaceKey(). It is tempting for replaceKey() to call insert(k,v) to insert a new key/value pair; why would that be wrong?

  6. You have already implemented HashDictionary.entries(). Sketch how you would implement a HashDictionary.keys() iterator that returns each key only once, even if there are multiple entries with that key. Assume you are stuck with the current design of HashDictionary, in which each bucket simply holds a linked list of Entrys.

  7. Same as the previous question, but this time you are also allowed to change the design of HashDictionary. Can you come up with an easier and more efficient way to support iteration over the unique keys? (There is more than one good answer - extra credit for two!)

A User-Friendly Election

Now to use your hash table class in the service of society! Modify your ReportedSimpleElection class from last week so that it can handle the new, user-friendlier format of the ballots file. This will be an amazingly easy change. The output format and implementation were discussed in the introduction to this homework.

Your class should probably have a countVote(String name) method that looks up a candidate in the dictionary and gives him or her an extra vote. (If you didn't do something like this last week, it's time to improve your design - you will be graded on style too.)

Every candidate you have seen will have an entry on the priority queue, keyed by the number of votes he or she has so far. So countVote() has to look up this entry and increase its key, thereby moving it ahead on the priority queue.

You don't have a list of candidates ahead of time -- you will become aware of new candidates only as you see votes for them. The first time that countVote() receives a vote for a new candidate, it should enqueue that candidate with a key of one vote. The resulting entry on the priority queue should be inserted into the dictionary, keyed by the candidate's name so that it can be found again when it the candidate gets additional votes. Don't get confused between the Entries in the dictionary and the Entries on the priority queue:

Remember that you should ignore uppercase/lowercase distinctions in candidate names. As discussed in class, there are multiple ways to solve this - you could change the EqualityTester or you could change the keys. There are methods of String and Character that can help you convert or ignore case.

But when you print a candidate's name, you should use the capitalization from the first ballot you saw for that candidate. This policy means that if people are pretty consistent about capitalizing as "O'Malley," then you will probably see "O'Malley" first, which means you'll nicely print "O'Malley" rather than "O'malley" or "o'malley" or something. The result is illustrated in the introduction.

Note: Get rid of the array of candidate names, CANDIDATES, that you used last week. You don't need it anymore.

As before, test your class by running java ReportedSimpleElection 9 ballots.

Extra Credit

Extra credit: Improve your hash table so that its iterators are fail-fast.

What does that mean? A user is not supposed to modify the hash table in between calls to next(). So it is "okay" if the iterator does something weird in that case. However, the user might accidentally do this, leading to a bug that would be hard to find.

We generally strive to design classes that are hard to misuse, even accidentally. The methods of a fail-fast iterator are designed to throw an exception right away if they are misused, rather than returning a bogus value. Each public method of the iterator starts by checking that the hash table has not changed since the iterator was created. If it has, the iterator throws a ConcurrentModificationException.

The usual implementation is for the hash table to maintain a "modification count." The iterator just checks whether this modification count is the same as it was when the iterator was created.

Answer in your Readme file (if you are doing the extra credit):

  1. The user is allowed to modify the hash table via the iterator's remove() method. Discuss how this should affect the modification count design. Consider especially the possibility of two iterators over the same hash table that exist at the same time, one of which calls remove() without the other's knowledge.

Electoral Reform

Coming in Homework 8!


Jason Eisner - jason@cs.jhu.edu - $Date: 2004/04/01 04:23:32 $