600.226 Data Structures (Spring 2004 - Prof. Eisner)

Homework 8 ... due 4pm Friday 9 Apr 2004

Summary

Goal of this homework: This is the third part of a 3-part assignment on election systems. You'll put several of your standard data structures together into a genuinely useful system! In a way, the course has been building up to this sort of thing. Data structures are valuable because they make applications like this much easier and more efficient. The interesting part here is in making multiple data structures work together.

Collaboration policy for this homework: You must follow the CS Department Academic Integrity Code, except that for this homework, you may optionally work with one other student in the class. You must name your partner (if any) at the top of your Readme file, and both of you will receive the same grade for the assignment. IMPORTANT:

How to hand in your work: instructions, upload page

What to hand in (all packaged up in a zip or tar file):

Acknowledgments: Thanks to Teresa Neighbor, the Executive Director of the Cambridge Election Commission, for help with the dataset.

Introduction

Building a Better Ballot

An election system is a method for defining society's preferences by combining individual preferences. One way to get a better election system is to allow individuals to express more detailed preferences. For example, here are the actual ballots from the 1999 City Council elections in Cambridge, MA:

Born, Davis, Braude, Snowberg, Winters
Toomey, Sullivan, Maher, Triantafillou
sullivan, galluccio, peixoto, born
Sullivan, Toomey, Galluccio, Born
MAHER, SULLIVAN, TOOMEY, GOODWIN, WINTERS, GALLUCCIO
HOICKA, GIACOBBE, WILLIAMSON, NIDLE, DIXON, SNOWBERG
Braude, Davis, Born, Triantafillou, Winters, Hoicka
Galluccio, Toomey, Sullivan
maher, galluccio, sullivan
Galluccio, Peixoto, Sullivan
  ...
click for full set of ballots

Notice that each ballot lists several candidates: 1st choice, 2nd choice, 3rd choice, until the voter doesn't care anymore. That's allowed because Cambridge uses the Single Transferable Vote (STV) system.

Why STV?

The idea behind STV is that you should never waste your vote. In most American elections, your vote could go to waste if you spend it on a third-party candidate ... or on someone who was going to win anyway. This distorts the results. Some voters don't get to express a preference between the major candidates (e.g., Nader voters in 2000 or Perot voters in 1992). Other voters choose to hold their noses and vote for the "wrong" person rather than waste their vote.

But in STV, if you vote for someone who turns out to be a loser, your vote automatically gets transferred to your second choice. More subtly, if you vote for someone who wins with 50% more votes than she needed, then she only needed 2/3 of your vote, and the other 1/3 is automatically transferred to your second choice. It's great!

How Does STV Work?

The procedure is formally like this. Suppose we are trying to elect an n-person City Council. To win a seat, a candidate must get more than 1/(n+1) of the ballots cast. This number of votes is called the quota. For our dataset, to win a seat on a 9-person City Council, a candidate must get more than the quota of 1877.7 of the 18,777 ballots cast. However, a candidate can get these ballots by transfer from other candidates.

Why is the quota formula 1/(n+1)? You are already familiar with the n=1 case -- to win the presidency, you have to get more than 1/2 of the votes. For general n, the quota formula is motivated by the observation that no more than n people can possibly achieve the quota. You can't have 10 people each getting more than 1/10 of the vote -- only 9 of them can do that.

Here's the procedure:

  1. As the ballots arrive at election headquarters, divide them into piles according to first choice.

  2. Compute the quota based on the total number of votes.

  3. Your job now is to figure out who is elected:

    1. Look at the biggest pile. If that person is over quota, he or she is elected. Any excess ballots in that pile are redistributed.
    2. Otherwise, no candidate is over quota. To make progress, eliminate the candidate with the smallest pile, and redistribute all the ballots in that pile. (This may push someone else over quota.)

    3. Repeat step 3 until n candidates have been elected.

Notice that as the pile sizes change, you always need to be able to find both the biggest pile (step 3a) and the smallest pile (step 3b). That's interesting from a data structures point of view. We'll come back to this point.

How Do You Redistribute Ballots?

To redistribute a ballot, you cross off names (starting at the top) until the top name left is someone who is still in the running -- that is, someone who hasn't been either elected or eliminated yet. Then you transfer it to that person's pile. (If all the names get crossed off, you have to throw that ballot in the trash. The voter should have listed some more choices - too bad.)

When a candidate is eliminated, you redistribute all of his or her ballots (step 3b above).

When a candidate wins, you redistribute just his or her "excess" ballots (step 3a above). That's tricky: which ballots are the excess ones? If someone needed 1877.7 votes to make quota, but got 1900 votes, how do you choose which 22.3 ballots to redistribute? It matters which ones you pick, because they might list different next choices.

Cambridge's system is to choose 22 of the 1900 ballots at random, and redistribute those. We will do something fairer and less capricious -- we will redistribute a fraction (22/1900) of each ballot.

As another example, if candidate A wins with 50% too many votes, we don't redistribute a random 1/3 of the ballots. Instead, we mark each individual ballot in the pile as only counting as 1/3 vote (since the other 2/3 of its vote was used up in helping to elect candidate A), and redistribute all of these "reduced" ballots.

Actually it's a little more interesting: some of the ballots in candidate A's pile might already have been fractional ballots. So they are marked as counting 1/3 as much as they did before.

Implementing an STV Election

Your job in this assignment is to write a class STVElection that can be run at election headquarters to conduct this procedure. If you run

java STVElection 9 ballots
then here is what the output should look like. ("First" and "last" refer to the candidates who are still in the running who have the most and fewest votes, respectively.)

26 candidates, 18777 votes, 9 seats.
A candidate needs to obtain more than 1877.7 votes to get elected.
first & last are (Galluccio, 2716.0 votes) & (Write-in-05, 1.0 votes)

   +++++ Electing Galluccio to seat #1 +++++
first & last are (Born, 1662.0 votes) & (Write-in-05, 1.0 votes)
   Redistributing 0.30865246 of each ballot for Galluccio to next choice if any
first & last are (BORN, 1696.5625 votes) & (Write-in-05, 1.0 votes)

   ----- Eliminating Write-in-05 -----
first & last are (BORN, 1696.5625 votes) & (Write-in-01, 28.617306 votes)
   Redistributing 1.0 of each ballot for Write-in-05 to next choice if any
first & last are (BORN, 1696.5625 votes) & (Write-in-01, 29.617306 votes)

   ----- Eliminating Write-in-01 -----
first & last are (BORN, 1696.5625 votes) & (wormwood-malone, 32.543262 votes)
   Redistributing 1.0 of each ballot for Write-in-01 to next choice if any
first & last are (Born, 1697.5625 votes) & (WORMWOOD-MALONE, 33.543262 votes)

            [... a bunch more stuff here ...]

   +++++ Electing DAVIS to seat #7 +++++
first & last are (reeves, 2236.742 votes) & (Maher, 1953.6962 votes)
   Redistributing 0.17724937 of each ballot for DAVIS to next choice if any
first & last are (reeves, 2351.3967 votes) & (MAHER, 2017.7767 votes)

   +++++ Electing reeves to seat #8 +++++
first & last are (MAHER, 2017.7767 votes) & (MAHER, 2017.7767 votes)
   Redistributing 0.20145339 of each ballot for reeves to next choice if any
first & last are (maher, 2090.009 votes) & (maher, 2090.009 votes)

   +++++ Electing maher to seat #9 +++++
first & last are nonexistent -- no more candidates!
   Redistributing 0.101582825 of each ballot for maher to next choice if any

Final list: [Galluccio, BORN, SULLIVAN, Decker, TOOMEY, BRAUDE, DAVIS, reeves, maher]

General Hints

A Note on Grading

On this assignment, a substantial part of your grade will depend on the elegance of your solution. Be simple, general, and safe in your design. Divide the problem up into intuitive, reusable classes. Divide each class up sensibly into nice methods. Use meaningful names And especially, avoid duplicate code!

Files Available

You can download some data and source code. Follow these instructions carefully. In particular, you will have to re-download the packages we gave you for HW7; we had to change them slightly (because the obfuscation was interfering with your ability to extend them).

(The original data files for the election are also online, if you're curious, but we have preprocessed them for you. We replaced candidates' numbers with their real names, and interpreted (or removed) spoiled ballots according to Cambridge's election rules.)

Data Structures You Should Use

Let's think about design. In my opinion, there are about five major kinds of objects involved in this problem: ballots, piles, the election itself, an ordering of piles by size, and an association of piles with candidates.

Those would be the main conceptual objects if you were counting STV votes by hand. In object-oriented programming, each kind of conceptual object should correspond to a class. The class should support methods appropriate for that kind of object. As you work on the assignment, you will find yourself adding new methods to do what's necessary.

Improving the Design

Let's draw the above design carefully (details of the shaded objects are omitted, and some classes are drawn schematically). We can now see that it's a bit awkward. One wasteful aspect is that we are storing 1696.5625 twice. It's also a little odd that those two redundant objects are pointing to each other.

The problem is that we decided to set up a priority deque of ballot piles. If the piles are the values on the priority deque, then their weights must be the keys. But then we end up storing two copies of the pile weight 1696.5625, as shown above.

Furthermore, we have to keep those two copies of the weight in sync. When the pile's weight increases (because we add new ballots), we must increase the entry's key correspondingly (via replaceKey()). That's why the pile has to point to the entry. (Conversely, the deque entry has to point back to the pile, so that when it's removed from the deque we can find the ballots and redistribute them.)

Can we rearrange this diagram to avoid these duplicate copies and extra pointers? Yes: we'll merge the two red things that point to each other. In the new design, the pile is the entry -- its weight and ballot stack are stored as the entry's key and value. It looks like this:

BallotPile is more than just an implementation of Entry. It should also have nicely named methods for adding ballots to the pile (by modifying the entry value), checking the weight of the pile (by looking at the entry key), and so forth. The reason to have a proper BallotPile class is so you'll have a sensible place to put such methods. Remember, each kind of conceptual object in the real world should correspond to a Java class.

Unfortunately, the design drawn above doesn't work well if we want to use our existing priority queue (or deque) classes. That's because the priority queue's insert() method creates the entries itself. It's not going to create a BallotPile for us. It'll create some other implementation of Entry that suits its own needs, such as the inner class PriorityQueue.MyEntry or AdaptablePriorityQueue.LocationAwareEntry.

It is possible to implement the above picture, but the solution is ugly. The election programmers would have to write a specialized priority queue class that uses BallotPile entries. (Yuck! That's awfully specialized.) Their class would extend the priority deque class, extending or overriding certain details of it. But then the election package's programmers would have to know the implementation details of the pqueue package, which isn't fair to them. They should just be able to use that package through its public interfaces.

It's much better to change the picture slightly. We can use the adapter pattern, and have BallotPile be a wrapper around Entry (rather than an extension of a particular implementation of Entry). When the pqueue's insert() method returns the new Entry, we can just wrap it up into a BallotPile and put that into the dictionary. Now we, as election programmers, don't have to know how the pqueue works internally or what class of Entry it returns. The result looks like this:

As before, BallotPile should have some nice methods for dealing with ballot piles. But now these methods will manipulate the Entry that is wrapped inside the pile. We've gotten you started on an implementation of BallotPile.

Implementing a Priority Deque

I've never heard of a priority deque. But it's obviously just what we need for this problem, since we want to remove both the smallest and the largest pile. So we may as well make up the word, and do it in a general way!

Here are the PriorityDeque and AdaptablePriorityDeque interfaces:

/** A "priority deque" is like a priority queue, except that
 * items can be removed from either end.  
 * 
 * We start with a priority queue, and add max() and removeMax()
 * methods to get the entry with maximum key.  We already have a way
 * to get the entry with minimum key.
 */

public interface PriorityDeque extends PriorityQueue {

  /** Returns but does not remove an entry with maximum key. */
  public Entry max() throws EmptyPriorityQueueException;

  /** Removes and returns an entry with maximum key. */
  public Entry removeMax() throws EmptyPriorityQueueException;
}


public interface AdaptablePriorityDeque extends AdaptablePriorityQueue, PriorityDeque {

  // Inherit all the methods from both the interfaces it extends.
  // No other methods are needed.
}

One could extend HeapAdaptablePriorityQueue with a max() method that simply does a complete search through the heap tree (or its external nodes only) to find the entry of maximum key. You can do it that way if you're desperate, but then max() is O(n) rather than O(log n), so you won't get full credit for it.

The right solution is to implement a repriority deque using two repriority queues. One lets us find the minimum, and the other lets us find the maximum. (Their comparators are mirror images of each other.)

To insert a new entry into the deque, we should actually insert "twin" entries on both queues, with the same key and value. To change the deque entry's key, we have to reprioritize both twins on their queues. And to remove a deque entry, we have to remove both twins from their queues.

The min() and max() methods only find an entry on one of the queues. If we want to remove the min or max from the deque entirely, we also have to find its twin on the other queue. So the two twins need to point to each other.

Where should each entry store the pointer to its twin? We'll have to continue the progression of fancier and fancier implementations of Entry:

Pqueue Class Nested Entry Class Contents of Entry class
HeapPriorityQueue MyEntry key, value
HeapAdaptablePriorityQueue LocationAwareEntry key, value, position
TwoHeapAdaptablePriorityDeque  TwinnedLocationAwareEntry  key, value, position, twin entry

We have written most of a TwoHeapAdaptablePriorityDeque class for you, including a test method. But you have to fill in some little bits. One thing you will have to write is the nested class ReverseComparator, which produces the mirror image of an existing comparator. For inspiration on how to do that, check out JavaComparator and NullComparator on the Homework 6 webpage.

Inside TwoHeapAdaptablePriorityDeque, we extended the inner class LocationAwareEntry into TwinnedLocationAwareEntry, and overrode insert() so that it inserts the latter. This was a little trickier than you might expect. Read the code to figure out how it's done.

Answer in your Readme file:

  1. There is apparently some redundancy in this priority deque implementation. The same key and value are being stored on both heaps. How much extra memory does it take to store the key and value twice instead of once?

  2. What if you stored (key, value) on the first heap, but (key, null) on the second heap. Would that work? Does this save any memory? Does it make any of the methods any faster? Would the entries returned by the priority deque still correctly implement the Entry interface?

  3. Each TwinnedLocationAwareEntry stores a key, an value, a position, and a pointer to another TwinnedLocationAwareEntry. So our design requires us to create 2 objects storing 8 pointers each time we put something on the priority queue. Can you think of a priority deque design where insertion requires only half as much memory -- i.e., 1 object storing 4 pointers? (Ignore the space required for the CompleteBinaryTrees; just pay attention to what is stored in those trees.) Hint: Think about the original design for location-aware entries.

  4. When we first discussed comparators in class, some students argued that they were unnecessary. They suggested that we could write our priority queue to use the object's native compareTo method; if we wanted to use a different comparison function, we could subclass the object and override its compareTo method. Would that work well in this case? Discuss.

  5. Earlier in this assignment, I argued that it would be bad design to make a special priority queue class that extended LocationAwareEntry into a specialized kind of entry, BallotPile. But here we are making such a class that extends LocationAwareEntry into another specialized kind of entry, TwinnedLocationAwareEntry. This requires the same trickery (overriding insert(), calling upgradeLocator()). So why isn't it just as bad?

  6. In the implementation we gave you, TwinnedLocationAwareEntry extends LocationAwareEntry. Suppose you changed it to be a wrapper around the LocationAwareEntry (just as we ultimately did with BallotPile). Would that let you eliminate the inelegant upgradeLocator() code? Would it make your code nicer or uglier in other ways?

  7. Try running your STVElection main program with different numbers of council seats. Sometimes the program won't fill all the seats - e.g., java STVElection 6 ballots only elects 5 candidates, not 6. Why does this happen? How might you modify the STV procedure to fix it?

Conclusion

Congratulations! You've solved a non-trivial real-world problem. I trust you were pleased with the speed with which your program was able to tabulate 18,000 ballots according to this complicated procedure.

There are actually companies that make a living by selling software to do this kind of thing ... and now you know enough to do so, too!

Or if you don't want to start a business, you can use your program to elect the leadership of your extracurricular club. Seriously. Just test it carefully first so you don't get run out of town on a rail by angry club members.


Jason Eisner - jason@cs.jhu.edu - $Date: 2004/04/08 03:58:52 $