# [EM] A Clone Free Metric on ballots

Forest Simmons fsimmons at pcc.edu
Wed Mar 3 23:50:02 PST 2021

```A lot of good ideas have died on the vine because clones were ignored. The
three best ones are Borda, Copeland, and Kemeny.

The key to fixing them is to make appropriate use of lotteries that
re-distribute the probability of an alternative to the members of the clone
set that takes its place. We have mentioned a few of them and have shown
how to use them to fix Borda and Copeland in recent posts to this EM list.

Some people will quit reading at the first mention of probabilities because
they think the only use for probabilities is to introduce randomness into
the determination of the outcome ... but that confuses the probabilities
with the random variables. For example, let X be one or zero depending on
the outcome of a coin toss ... heads or tails respectively. Then X is a
Bernoulli random variable with parameter p, the probability of "success,"
and q equal to 1-p, the probability of "failure." If the coin is "fair"
then both p and q are equal to 50 percent. These parameters p and q are
not random variables ... any more than the fraction 1/2 is a random
variable.

Once all ballots have been counted, the probability that a random ballot
ranks alternative X in first place is just a well defined number ... the
fraction or percentage of the ballots that rank X first. Just because we
use probabilistic language to define a number doesn't mean that number is
going to make our method outcomes random. In all of my recent posts
referring to lotteries the only purpose of the lotteries has been to define
deterministic probabilities for use in deterministic methods.

So in this post, to be definite, let P(X) be the probability that X will be
chosen by randomly drawn ballots ... if the first ballot is ambiguous,
additional ballots are drawn at random to narrow down to one alternative
... this is a well defined Markov process, so for any given set of marked
ballots and any candidate X, the probability P(X) is a well defined number
between zero and one, a number that can be calculated by elementary
arithmetic.

Without further ado, let's de-clone Kemeny. Kemeny finds the ranking of the
candidates that minimizes the sum of the distances from that ranking to the
ballot rankings. The problem with classical Kemeny is that it uses a clone
dependent distance metric, simply counting the number of adjacent pairs in
one order that need to be swapped to transform it into the other order.
When you replace one alternative with a clone set it increases the number
of needed transpositions, thus distorting the metric.

Here's how to do it right:

1. Construct the pairwise matrices M1 and M2 for the respective orderings
(rankings) of the alternatives:
The entry in row i column j of M1 is a one or a zero, depending on whether
or not the first ranking ranks alternative i ahead of j.  M2 is defined
similarly from the second ranking.

2. Let M be (M1 - M2) minus its transpose.

3. Replace each entry of M with its absolute value and divide the result by
two to get the matrix A.

4. Let D be the matrix product bAb', where b is the benchmark lottery in
the form of a row vector and b' is its transpose.  Since b is a row vector
and b' a column vector, this product is a single number D, the decloned
distance between the two rankings.

Let's say that the distance between two ranked preference ballots is the
distance D between their rankings as defined above. The pairwise matrices
used above in no way required complete rankings, so the distance is well
defined between any two ballots, complete or not!

Kemeny is just one potential use of a decent metric on the ballots.  How
would you use it for designing a PR multiwinner method?
```
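
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the post: the function names, the dict representation of a ranking (candidate mapped to rank position, lower is better), and the example lottery `b` are all made up for the example. In particular, a candidate missing from a ballot is treated here as having no expressed preference against anyone, which is one possible reading of how incomplete rankings enter the pairwise matrices.

```python
def pairwise_matrix(ranking, candidates):
    """Step 1: entry [i][j] is 1 if `ranking` puts candidates[i] ahead of
    candidates[j], else 0. Assumption: a candidate absent from `ranking`
    expresses no preference, so all of its entries stay 0."""
    n = len(candidates)
    m = [[0] * n for _ in range(n)]
    for i, ci in enumerate(candidates):
        for j, cj in enumerate(candidates):
            ri, rj = ranking.get(ci), ranking.get(cj)
            if ri is not None and rj is not None and ri < rj:
                m[i][j] = 1
    return m


def decloned_distance(ranking1, ranking2, candidates, b):
    """Steps 2-4: D = b A b', with b a lottery over `candidates`."""
    n = len(candidates)
    m1 = pairwise_matrix(ranking1, candidates)
    m2 = pairwise_matrix(ranking2, candidates)
    # Step 2: M = (M1 - M2) minus its transpose.
    diff = [[m1[i][j] - m2[i][j] for j in range(n)] for i in range(n)]
    m = [[diff[i][j] - diff[j][i] for j in range(n)] for i in range(n)]
    # Step 3: absolute values, divided by two.
    a = [[abs(m[i][j]) / 2 for j in range(n)] for i in range(n)]
    # Step 4: the quadratic form b A b' is a single number.
    return sum(b[i] * a[i][j] * b[j] for i in range(n) for j in range(n))


# Hypothetical example: two ballots that differ only in the order of A and B.
candidates = ["A", "B", "C"]
b = [0.5, 0.3, 0.2]                      # assumed benchmark lottery
r1 = {"A": 1, "B": 2, "C": 3}            # A > B > C
r2 = {"B": 1, "A": 2, "C": 3}            # B > A > C
d = decloned_distance(r1, r2, candidates, b)
print(d)
```

In this example the only disagreement is over the pair (A, B), so A has a 1 in both off-diagonal positions for that pair and the distance works out to 2 * 0.5 * 0.3 = 0.3. A pair on which one ballot expresses a preference and the other expresses none contributes with weight 1/2 instead of 1.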