[EM] A Deterministic Version of Rob LeGrand's Ballot by Ballot DSV

Thu Jul 1 15:53:03 PDT 2004

In my humble opinion one of the best methods ever invented is Rob
LeGrand's Ballot by Ballot DSV based on Approval Strategy A.

Here's my rendition of the stochastic version (with apologies to Rob):

1. Shuffle the ranked ballots (equal rankings allowed) into random order.

2. [my initialization procedure] Use the Borda Count to initialize
approvals; each candidate's initial approval is its Borda Count divided by
ten times the largest such count, so every initial approval is a fraction
between 0 and 0.1, and only the Borda leader starts at exactly 0.1.

[From this point on all approval increments will be whole numbers, so
there will be no tied approvals, assuming there were no tied Borda
Counts.]

3. While any ballots are left to be processed,

     Take the next ballot B and increment (by one) the approval of
     each candidate ranked (on B) above the current approval champion, and

       If there is some candidate ranked (on ballot B) below this "champ"
       that has more current approval than any candidate ranked (on B)
       above this champ,
       Then also increment the approval for the current champ.

4. The winner is the approval champ after all the ballots have been
processed.
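
Steps 2 through 4 can be sketched in Python roughly as follows.  This is
only an illustration under several assumptions of mine, not Rob's code: a
ballot is a list of tiers (best first), each tier a list of candidate
names, and every ballot ranks every candidate; a candidate's Borda score on
a ballot is the number of candidates ranked strictly below it; the
comparison for the champ is made before the increments for that ballot are
applied; and candidates ranked equal to the champ are left unchanged.  The
names borda_scores and count_in_order are hypothetical.

```python
from fractions import Fraction


def borda_scores(ballots, candidates):
    """Borda score = number of candidates ranked strictly below (assumed convention)."""
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        remaining = len(candidates)
        for tier in ballot:
            remaining -= len(tier)  # candidates strictly below this tier
            for c in tier:
                scores[c] += remaining
    return scores


def count_in_order(ballots, candidates):
    """Process ballots in the given order and return the final approval champ."""
    scores = borda_scores(ballots, candidates)
    top = max(scores.values())
    # Step 2: fractional head start, at most 1/10, kept exact with Fraction.
    approval = {
        c: Fraction(scores[c], 10 * top) if top else Fraction(0)
        for c in candidates
    }
    # Step 3: one pass over the ballots.
    for ballot in ballots:
        champ = max(candidates, key=lambda c: approval[c])
        rank = {c: i for i, tier in enumerate(ballot) for c in tier}
        above = [c for c in candidates if rank[c] < rank[champ]]
        below = [c for c in candidates if rank[c] > rank[champ]]
        # Compare before incrementing (an assumption; the text is not explicit).
        best_above = max((approval[c] for c in above), default=None)
        best_below = max((approval[c] for c in below), default=None)
        for c in above:
            approval[c] += 1
        if best_below is not None and (best_above is None or best_below > best_above):
            approval[champ] += 1
    # Step 4: the winner is the champ after the last ballot.
    return max(candidates, key=lambda c: approval[c])
```

Because the initial approvals are exact fractions and every later increment
is a whole number, two candidates can only tie if their Borda counts tied.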

This method is stochastic because of the first step, randomization of the
order of the ballots.

But what if we replaced the random shuffle of the ballots by a
pseudorandom shuffle?  Then we would have a deterministic method that
would be statistically indistinguishable from the stochastic version.

There are two details to worry about.

(1) Given a list L of ballots, how do we do a "deterministic shuffle"?

(2) How do we determine a starting order L to which we may apply our
shuffle?

The first question is the easier to answer.  Suppose we have a list L of
ballots.  We can get a shuffled list L' by the "taffy pull" method of
folding and blending:

Suppose, for example, that the original list has 100 ballots.  Then the
order for L' would be 100, 1, 99, 2, 98, 3, 97, 4, ..., 51, 50.

No matter what deterministic order we start with for L, four applications
of this shuffle L -> L' -> L'' -> L''' -> L'''' would be a
pseudorandomization adequate for statistical purposes.
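
Here is a small Python sketch of the taffy pull, for illustration only.  The
function names taffy_pull and pull_times are hypothetical, and the handling
of an odd-length list (the middle item goes last) is my assumption, since
the example above only covers an even count.

```python
def taffy_pull(items):
    """Interleave from the two ends: last, first, next-to-last, second, ..."""
    out = []
    lo, hi = 0, len(items) - 1
    while lo < hi:
        out.append(items[hi])
        out.append(items[lo])
        hi -= 1
        lo += 1
    if lo == hi:  # odd length: the middle item is placed last (assumption)
        out.append(items[lo])
    return out


def pull_times(items, times=4):
    """Apply the taffy pull repeatedly; four times gives L''''."""
    for _ in range(times):
        items = taffy_pull(items)
    return items
```

With the ballots numbered 1 through 100, taffy_pull(list(range(1, 101)))
starts 100, 1, 99, 2 and ends 51, 50, as in the example.  The output is
always a rearrangement of the input, so no ballot is lost or duplicated.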

The second question is a little more difficult, because we want to get the
same answer no matter the order in which the ballots were cast or
collected.  We want this to work for secret ballots, and if the names of
the candidates were permuted, we would want the name of the winner to
follow the same permutation.

So first we find the Borda order of the candidates, and assign to each
candidate a letter of the alphabet corresponding to his placement by
Borda.  The candidate with the highest Borda score gets the label A, the
second highest, B, etc.

Next encode each ballot by making use of the labels according to our
customary usage on the EM list.

For example,  B > A = D > C .

Finally, form the list L by sorting the coded ballots lexicographically
[using the ASCII order for the blanks and symbols like ">" and "=" ].
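
The labelling, encoding and sorting might look like this in Python.  Again
this is a sketch under assumptions of mine: ballots are lists of tiers of
candidate names that rank every candidate, the Borda score is the number of
candidates ranked strictly below, there are no tied Borda scores and at
most 26 candidates, and labels inside an "=" group are written in
alphabetical order so that equal rankings always encode the same way.  The
function names are hypothetical.

```python
import string


def borda_labels(ballots, candidates):
    """Map each candidate to A, B, C, ... in decreasing Borda order."""
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        remaining = len(candidates)
        for tier in ballot:
            remaining -= len(tier)  # candidates strictly below this tier
            for c in tier:
                scores[c] += remaining
    ordered = sorted(candidates, key=lambda c: -scores[c])
    return {c: string.ascii_uppercase[i] for i, c in enumerate(ordered)}


def encode(ballot, labels):
    """Write a ballot in the list's usual notation, e.g. 'B > A = D > C'."""
    return " > ".join(
        " = ".join(sorted(labels[c] for c in tier)) for tier in ballot
    )


def canonical_order(ballots, candidates):
    """Encoded ballots sorted by code point, which is ASCII order here."""
    labels = borda_labels(ballots, candidates)
    return sorted(encode(b, labels) for b in ballots)
```

The sorted list depends only on which ballots were cast, not on the order
in which they were collected, because both the labels and the sort are
computed from the ballots themselves.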

Note that if there had to be a recount, and the ballot boxes got dumped on
the floor between the original and the recount, the recount would assign
the same labels to the same candidates, it would encode the ballots in the
same way, it would list them in the same order L, and the "taffy pulled"
list L'''' would be the same, so the outcome of the ballot-by-ballot
approval count would be the same.

If an unordered copy of the original ballots were sent to some
international election watch organization for independent certification of
the outcome, they would find the same winner if they followed the same
procedure.

The results are indeed reproducible, because the method is deterministic,
but in practice it would be as impossible to second-guess as Rob's
stochastic version.

I doubt that the general public is ready for this kind of method, but
perhaps some association of software engineers or statisticians might be
charmed by its beauties, with or without the reproducibility
feature.

Comments? Suggestions?

Forest