Goal of this homework: This is the third part of a 3-part assignment on election systems. You'll put several of your standard data structures together into a genuinely useful system! In a way, the course has been building up to this sort of thing. Data structures are valuable because they make applications like this much easier and more efficient. The interesting part here is in making multiple data structures work together.
Collaboration policy for this homework: You must follow the
CS Department Academic
Integrity Code, except that for this homework, you may
optionally work with one other student in the class. You must name
your partner (if any) at the top of your Readme
file, and
both of you will receive the same grade for the assignment.
IMPORTANT:
How to hand in your work: instructions, upload page
What to hand in (all packaged up in a zip or tar file):
Your commented code: Any .java
files, written either by us or by
you, that are necessary to make your program run correctly.
Please make sure that you include everything that is needed
to make your code compile!
Also make sure that your zip file or tar file preserves the correct directory structure, with every package in its own directory.
A doc
subdirectory containing HTML documentation
that you have generated with the command javadoc -private
-linksource -d doc *.java
. The graders will look at this
documentation, so you should provide javadoc comments for all of
your methods, including private ones.
A Readme
file that contains your
answers to miscellaneous questions on the homework (i.e., answers that
don't go in any other file). Usually these are English paragraphs or small
fragments of code. The top of your Readme
should
state the name of your partner (if any).
Acknowledgments: Thanks to Teresa Neighbor, the Executive Director of the Cambridge Election Commission, for help with the dataset.
An election system is a method for defining society's preferences by combining individual preferences. One way to get a better election system is to allow individuals to express more detailed preferences. For example, here are the actual ballots from the 1999 City Council elections in Cambridge, MA:
Born, Davis, Braude, Snowberg, Winters Toomey, Sullivan, Maher, Triantafillou sullivan, galluccio, peixoto, born Sullivan, Toomey, Galluccio, Born MAHER, SULLIVAN, TOOMEY, GOODWIN, WINTERS, GALLUCCIO HOICKA, GIACOBBE, WILLIAMSON, NIDLE, DIXON, SNOWBERG Braude, Davis, Born, Triantafillou, Winters, Hoicka Galluccio, Toomey, Sullivan maher, galluccio, sullivan Galluccio, Peixoto, Sullivan ... |
Notice that each ballot lists several candidates: 1st choice, 2nd choice, 3rd choice, until the voter doesn't care anymore. That's allowed because Cambridge uses the Single Transferable Vote (STV) system.
The idea behind STV is that you should never waste your vote. In most American elections, your vote could go to waste if you spend it on a third-party candidate ... or on someone who was going to win anyway. This distorts the results. Some voters don't get to express a preference between the major candidates (e.g., Nader voters in 2000 or Perot voters in 1992). Other voters choose to hold their noses and vote for the "wrong" person rather than waste their vote.
But in STV, if you vote for someone who turns out to be a loser, your vote automatically gets transferred to your second choice. More subtly, if you vote for someone who wins with 50% more votes than she needed, then she only needed 2/3 of your vote, and the other 1/3 is automatically transferred to your second choice. It's great!
The procedure is formally like this. Suppose we are trying to elect an n-person City Council. To win a seat, a candidate must get more than 1/(n+1) of the ballots cast. This number of votes is called the quota. For our dataset, to win a seat on a 9-person City Council, a candidate must get more than the quota of 1877.7 of the 18,777 ballots cast. However, a candidate can get these ballots by transfer from other candidates.
Why is the quota formula 1/(n+1)? You are already familiar with the n=1 case -- to win the presidency, you have to get more than 1/2 of the votes. For general n, the quota formula is motivated by the observation that no more than n people can possibly achieve the quota. You can't have 10 people each getting more than 1/10 of the vote -- only 9 of them can do that.
Here's the procedure:
As the ballots arrive at election headquarters, divide them into piles according to first choice.
Compute the quota based on the total number of votes.
Your job now is to figure out who is elected:
Otherwise, no candidate is over quota. To make progress, eliminate the candidate with the smallest pile, and redistribute all the ballots in that pile. (This may push someone else over quota.)
Repeat step 3 until n candidates have been elected.
To redistribute a ballot, you cross off names (starting at the top) until the top name left is someone who is still in the running -- that is, someone who hasn't been either elected or eliminated yet. Then you transfer it to that person's pile. (If all the names get crossed off, you have to throw that ballot in the trash. The voter should have listed some more choices - too bad.)
When a candidate is eliminated, you redistribute all of his or her ballots (step 3b above).
When a candidate wins, you redistribute just his or her "excess" ballots (step 3a above). That's tricky: which ballots are the excess ones? If someone needed 1877.7 votes to make quota, but got 1900 votes, how do you choose which 22.3 ballots to redistribute? It matters which ones you pick, because they might list different next choices.
Cambridge's system is to choose 22 of the 1900 ballots at random, and redistribute those. We will do something fairer and less capricious -- we will redistribute a fraction (22/1900) of each ballot.
As another example, if candidate A wins with 50% too many votes, we don't redistribute a random 1/3 of the ballots. Instead, we mark each individual ballot in the pile as only counting as 1/3 vote (since the other 2/3 of its vote was used up in helping to elect candidate A), and redistribute all of these "reduced" ballots.
Actually it's a little more interesting: some of the ballots in candidate A's pile might already have been fractional ballots. So they are marked as counting 1/3 as much as they did before.
Your job in this assignment is to write a class
STVElection
that can be run at election headquarters to
conduct this procedure. If you run
java STVElection 9
ballots
26 candidates, 18777 votes, 9 seats. A candidate needs to obtain more than 1877.7 votes to get elected. first & last are (Galluccio, 2716.0 votes) & (Write-in-05, 1.0 votes) +++++ Electing Galluccio to seat #1 +++++ first & last are (Born, 1662.0 votes) & (Write-in-05, 1.0 votes) Redistributing 0.30865246 of each ballot for Galluccio to next choice if any first & last are (BORN, 1696.5625 votes) & (Write-in-05, 1.0 votes) ----- Eliminating Write-in-05 ----- first & last are (BORN, 1696.5625 votes) & (Write-in-01, 28.617306 votes) Redistributing 1.0 of each ballot for Write-in-05 to next choice if any first & last are (BORN, 1696.5625 votes) & (Write-in-01, 29.617306 votes) ----- Eliminating Write-in-01 ----- first & last are (BORN, 1696.5625 votes) & (wormwood-malone, 32.543262 votes) Redistributing 1.0 of each ballot for Write-in-01 to next choice if any first & last are (Born, 1697.5625 votes) & (WORMWOOD-MALONE, 33.543262 votes) [... a bunch more stuff here ...] +++++ Electing DAVIS to seat #7 +++++ first & last are (reeves, 2236.742 votes) & (Maher, 1953.6962 votes) Redistributing 0.17724937 of each ballot for DAVIS to next choice if any first & last are (reeves, 2351.3967 votes) & (MAHER, 2017.7767 votes) +++++ Electing reeves to seat #8 +++++ first & last are (MAHER, 2017.7767 votes) & (MAHER, 2017.7767 votes) Redistributing 0.20145339 of each ballot for reeves to next choice if any first & last are (maher, 2090.009 votes) & (maher, 2090.009 votes) +++++ Electing maher to seat #9 +++++ first & last are nonexistent -- no more candidates! Redistributing 0.101582825 of each ballot for maher to next choice if any Final list: [Galluccio, BORN, SULLIVAN, Decker, TOOMEY, BRAUDE, DAVIS, reeves, maher] |
There are lots of other explanations of STV on the web, if you want to read more. (But be warned that there are some other variants of STV - you should implement the one described here.)
You can check your output against the detailed counting in the Cambridge elections, available here and here. The numbers are not quite identical, because the Cambridge system redistributes votes randomly instead of fractionally. (Also, the Cambridge system has some cosmetic differences.) But the numbers will be very close.
A note on capitalization: The name associated with each pile is easily taken from the ballot on top of that pile. In the trace above, you can see Born's name changing capitalization as more ballots are added to her pile.
On this assignment, a substantial part of your grade will depend on the elegance of your solution. Be simple, general, and safe in your design. Divide the problem up into intuitive, reusable classes. Divide each class up sensibly into nice methods. Use meaningful names And especially, avoid duplicate code!
You can download some data and source code. Follow these instructions carefully. In particular, you will have to re-download the packages we gave you for HW7; we had to change them slightly (because the obfuscation was interfering with your ability to extend them).
(The original data files for the election are also online, if you're curious, but we have preprocessed them for you. We replaced candidates' numbers with their real names, and interpreted (or removed) spoiled ballots according to Cambridge's election rules.)
Let's think about design. In my opinion, there are about five major kinds of objects involved in this problem: ballots, piles, the election itself, an ordering of piles by size, and an association of piles with candidates.
Those would be the main conceptual objects if you were counting STV votes by hand. In object-oriented programming, each kind of conceptual object should correspond to a class. The class should support methods appropriate for that kind of object. As you work on the assignment, you will find yourself adding new methods to do what's necessary.
Ballots. You can think of a ballot as a queue of names (Strings). As you read the successive candidates from a ballot (1st choice, 2nd choice, 3rd choice ...), you will put them on the back of the queue. To find out what pile the ballot should go in, you look at the front of the queue. To cross off the top-ranked candidate if he/she is elected or eliminated, you just dequeue a string from the front of the queue.
To implement the queue, use the ArrayQueue
class you
wrote in Homework 3. You should put this code in
a queue
package. We have actually provided you with a
simple Ballot
class that extends ArrayQueue
. The main extension is
that a ballot isn't just a queue of names; it also has a
fractional weight. A ballot's weight is initially 1 -- i.e.,
one person, one vote -- but it may get reduced if only part of that
vote is redistributed to a new pile.
Piles. A pile is collection of ballots. The order of
the ballots in a pile doesn't matter, so any collection type will do.
java.util.Stack
is probably easiest and most
efficient.
But we also need to be able to find out quickly how big the pile
is. Stack.size()
will only tell you the number of
ballots, without taking into account that some of them are fractional.
You want the total weight of the ballots. It is probably wise to
store this total weight as part of the pile. When ballots are added,
this total weight must be updated.
You may find that it is useful for ballot piles to
support interesting constructors, and other methods. My personal
implementation evolved to include several of these. For example,
Pile.toString()
made it easy to print the phrases like
(Galluccio, 2716.0 votes)
in the output above.
In general, when you need to write a new method, ask yourself which class it should go into.
An ordering of piles by size. We want to be able to find and remove the biggest pile, as in the last 2 assignments. But we also want to be able to find and remove the smallest pile ... so a priority queue of piles isn't quite enough. We need something like a priority deque! See the section "Implementing a Priority Deque" below.
An association between candidate names and piles. This
is basically just the dictionary that you used in Homework 7 to map
from candidate names to Locator
s. Now you'll map
candidate names to BallotPile
s, whch is almost the same
thing. You won't have to write any new classes for this, but you'll
want to use your HashDictionary
class from Homework 7.
If you didn't finish Homework 7, you may use java.util.HashMap
instead.
Once you elect or eliminate a candidate, I suggest that you change
that candidate's dictionary entry to have a value()
of null
.
This records that the pile has been eliminated. (Having an entry
with value null
is different from not having an entry
at all. A candidate will have an entry if he or she has ever received
any ballots.)
The election. You must write an
STVElection
class. Note that like any class, it might
have many instances; each instance is created with new
STVElection()
and corresponds to a different election. An
election needs a bunch of piles, stored in a priority deque and a
dictionary as described above. It will also have some internal
variables for the number of votes, the number of seats to fill, etc.
Some useful methods you might consider writing, other than
main()
:
STVElection(int numSeats) ... // constructor BallotPile pileOf(Ballot b) ... // finds or adds the appropriate pile, unless previously eliminated void addBallot(Ballot b) ... void addBallots(BufferedReader br) ... void dispersePile(BallotPile pile, float multiplier) ... // deletes pile and redistributes its ballots String electBestCandidate() ... String eliminateWorstCandidate() ... float quota() ... void printBestWorst() ... List runElection() ... // returns a java.util.List of elected candidates |
Let's draw the above design carefully (details of the shaded objects are omitted, and some classes are drawn schematically). We can now see that it's a bit awkward. One wasteful aspect is that we are storing 1696.5625 twice. It's also a little odd that those two redundant objects are pointing to each other.
The problem is that we decided to set up a priority deque of ballot piles. If the piles are the values on the priority deque, then their weights must be the keys. But then we end up storing two copies of the pile weight 1696.5625, as shown above.
Furthermore, we have to keep those two copies of the weight in
sync. When the pile's weight increases (because we add new ballots),
we must increase the entry's key correspondingly (via
replaceKey()
). That's why the pile has to point to the
entry. (Conversely, the deque entry has to point back to the pile, so
that when it's removed from the deque we can find the
ballots and redistribute them.)
Can we rearrange this diagram to avoid these duplicate copies and extra pointers? Yes: we'll merge the two red things that point to each other. In the new design, the pile is the entry -- its weight and ballot stack are stored as the entry's key and value. It looks like this:
BallotPile
is more than just an implementation of
Entry
. It should also have nicely named methods for
adding ballots to the pile (by modifying the entry value), checking
the weight of the pile (by looking at the entry key), and so forth.
The reason to have a proper BallotPile
class is so you'll
have a sensible place to put such methods. Remember, each kind of
conceptual object in the real world should correspond to a Java
class.
Unfortunately, the design drawn above doesn't work well if we want
to use our existing priority queue (or deque) classes. That's
because the priority queue's insert()
method creates the
entries itself. It's not going to create a BallotPile
for us. It'll create some other implementation of Entry
that suits its own needs, such as the inner class
PriorityQueue.MyEntry
or
AdaptablePriorityQueue.LocationAwareEntry
.
It is possible to implement the above picture, but the
solution is ugly. The election programmers would have to write a
specialized priority queue class that uses BallotPile
entries. (Yuck! That's awfully specialized.) Their class would
extend the priority deque class, extending or overriding certain
details of it. But then the election
package's
programmers would have to know the implementation details of the
pqueue
package, which isn't fair to them. They should
just be able to use that package through its public interfaces.
It's much better to change the picture slightly. We can use the
adapter pattern, and have BallotPile
be a wrapper
around Entry
(rather than an extension of a
particular implementation of Entry
). When the pqueue's
insert()
method returns the new Entry
, we
can just wrap it up into a BallotPile
and put that into
the dictionary. Now we, as election programmers, don't have to know
how the pqueue works internally or what class of Entry
it
returns. The result looks like this:
As before, BallotPile
should have some nice methods
for dealing with ballot piles. But now these methods will manipulate
the Entry
that is wrapped inside the pile. We've gotten
you started on an implementation of BallotPile
.
I've never heard of a priority deque. But it's obviously just what we need for this problem, since we want to remove both the smallest and the largest pile. So we may as well make up the word, and do it in a general way!
Here are the PriorityDeque
and AdaptablePriorityDeque
interfaces:
/** A "priority deque" is like a priority queue, except that * items can be removed from either end. * * We start with a priority queue, and add max() and removeMax() * methods to get the entry with maximum key. We already have a way * to get the entry with minimum key. */ public interface PriorityDeque extends PriorityQueue { /** Returns but does not remove an entry with maximum key. */ public Entry max() throws EmptyPriorityQueueException; /** Removes and returns an entry with maximum key. */ public Entry removeMax() throws EmptyPriorityQueueException; } public interface AdaptablePriorityDeque extends AdaptablePriorityQueue, PriorityDeque { // Inherit all the methods from both the interfaces it extends. // No other methods are needed. } |
One could extend HeapAdaptablePriorityQueue
with a
max()
method that simply does a complete search through
the heap tree (or its external nodes only) to find the entry of
maximum key. You can do it that way if you're desperate, but then
max()
is O(n) rather than O(log n), so you won't get full
credit for it.
The right solution is to implement a repriority deque using two repriority queues. One lets us find the minimum, and the other lets us find the maximum. (Their comparators are mirror images of each other.)
To insert a new entry into the deque, we should actually insert "twin" entries on both queues, with the same key and value. To change the deque entry's key, we have to reprioritize both twins on their queues. And to remove a deque entry, we have to remove both twins from their queues.
The min()
and max()
methods only find an
entry on one of the queues. If we want to remove the min or max from
the deque entirely, we also have to find its twin on the other queue.
So the two twins need to point to each other.
Where should each entry store the pointer to its twin?
We'll have to continue the progression of fancier and fancier
implementations of Entry
:
Pqueue Class | Nested Entry Class |
Contents of Entry class |
HeapPriorityQueue |
MyEntry |
key, value |
HeapAdaptablePriorityQueue |
LocationAwareEntry |
key, value, position |
TwoHeapAdaptablePriorityDeque |
TwinnedLocationAwareEntry |
key, value, position, twin entry |
We have written most of a TwoHeapAdaptablePriorityDeque
class for you, including a test method. But you have to fill in some
little bits. One thing you will have to write is the nested class
ReverseComparator
, which produces the mirror image of an
existing comparator. For inspiration on how to do that, check out
JavaComparator
and NullComparator
on the Homework 6 webpage.
Inside TwoHeapAdaptablePriorityDeque
, we extended the
inner class LocationAwareEntry
into
TwinnedLocationAwareEntry
, and overrode
insert()
so that it inserts the latter. This was a
little trickier than you might expect. Read the code to figure out
how it's done.
Answer in your
Readme
file:
There is apparently some redundancy in this priority deque implementation. The same key and value are being stored on both heaps. How much extra memory does it take to store the key and value twice instead of once?
What if you stored (key, value) on the first heap, but (key,
null) on the second heap. Would that work? Does this save any
memory? Does it make any of the methods any faster? Would the
entries returned by the priority deque still correctly implement the
Entry
interface?
Each TwinnedLocationAwareEntry
stores a key, an
value, a position, and a pointer to another
TwinnedLocationAwareEntry
. So our design requires us to
create 2 objects storing 8 pointers each time we put something on the
priority queue. Can you think of a priority deque design where
insertion requires only half as much memory -- i.e., 1 object storing
4 pointers? (Ignore the space required for the
CompleteBinaryTree
s; just pay attention to what is stored
in those trees.) Hint: Think about the original design for location-aware entries.
When we first discussed comparators in class, some students
argued that they were unnecessary. They suggested that we could write
our priority queue to use the object's native compareTo
method; if we wanted to use a different comparison function, we could
subclass the object and override its compareTo
method.
Would that work well in this case? Discuss.
Earlier in this assignment, I
argued that it would be bad design to make a special priority queue
class that extended LocationAwareEntry
into a specialized
kind of entry, BallotPile
. But here we are making such
a class that extends LocationAwareEntry
into another
specialized kind of entry, TwinnedLocationAwareEntry
.
This requires the same trickery (overriding insert()
,
calling upgradeLocator()
). So why isn't it just as
bad?
In the implementation we gave you,
TwinnedLocationAwareEntry
extends
LocationAwareEntry
. Suppose you changed it to be a
wrapper around the LocationAwareEntry
(just as we
ultimately did with BallotPile
). Would that let you
eliminate the inelegant upgradeLocator()
code? Would it
make your code nicer or uglier in other ways?
Try running your STVElection
main program with
different numbers of council seats. Sometimes the program won't fill
all the seats - e.g., java STVElection 6 ballots
only
elects 5 candidates, not 6. Why does this happen? How might you
modify the STV procedure to fix it?
Congratulations! You've solved a non-trivial real-world problem. I trust you were pleased with the speed with which your program was able to tabulate 18,000 ballots according to this complicated procedure.
There are actually companies that make a living by selling software to do this kind of thing ... and now you know enough to do so, too!
Or if you don't want to start a business, you can use your program to elect the leadership of your extracurricular club. Seriously. Just test it carefully first so you don't get run out of town on a rail by angry club members.
Jason Eisner - jason@cs.jhu.edu
- $Date: 2004/04/08 03:58:52 $