Unidentified subject!

Blake Cretney bcretney at postmark.net
Sun Feb 28 14:33:51 PST 1999


First off, a warning.  The following is highly mathematical.  If you
don't like or understand this kind of stuff, don't read it.

The purpose of this article is to provide a stronger foundation for
the belief that the beat-path winner, as described by Markus, is the
most probable best candidate.

First, some definitions.

I usually measure the strength of a pair-wise victory by the margin
of majority.  Others, including Markus, prefer to use only the number
on the winning side.  I don't intend to debate this point here.  

If there is a series of pair-wise victories leading from one
candidate to another, for example
A->B, B->C, C->D  ("->" means beats pair-wise)
then there is a path from A to D.
The strength of a path is equal to the strength of its weakest
victory.
If the strongest path from A to B is stronger than the strongest path
from B to A, then I say that A has a Schulze victory over B.  Assuming
there are no ties, we can create a complete ranking of the candidates
by ranking a candidate over every candidate they Schulze-defeat.
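
The definitions above can be sketched in code.  This is only a minimal illustration, not anyone's official implementation: the function names and the example strengths (10, 6, 8) are made up for the sketch, and a missing pair is treated as "no victory" with strength 0.  It finds the strongest paths with a Floyd-Warshall style loop and ranks candidates by how many others they Schulze-defeat, which gives the complete ranking only under the no-ties assumption.

```python
def strongest_paths(candidates, strength):
    """Return p[(x, y)] = strength of the strongest path from x to y.

    strength[(x, y)] is the strength of x's pair-wise victory over y;
    pairs that are absent count as 0 (no victory).
    """
    p = {(x, y): strength.get((x, y), 0)
         for x in candidates for y in candidates if x != y}
    for k in candidates:              # intermediate candidate
        for i in candidates:
            if i == k:
                continue
            for j in candidates:
                if j == i or j == k:
                    continue
                # A path through k is as strong as its weakest half.
                via_k = min(p[(i, k)], p[(k, j)])
                if via_k > p[(i, j)]:
                    p[(i, j)] = via_k
    return p


def schulze_ranking(candidates, strength):
    """Rank candidates by number of Schulze victories (assumes no ties)."""
    p = strongest_paths(candidates, strength)
    wins = {x: sum(p[(x, y)] > p[(y, x)] for y in candidates if y != x)
            for x in candidates}
    return sorted(candidates, key=lambda x: -wins[x])


# Hypothetical strengths for the example path A->B, B->C, C->D.
candidates = ["A", "B", "C", "D"]
strength = {("A", "B"): 10, ("B", "C"): 6, ("C", "D"): 8}

paths = strongest_paths(candidates, strength)
ranking = schulze_ranking(candidates, strength)
print(paths[("A", "D")])   # 6, the weakest victory on the path
print(ranking)             # ['A', 'B', 'C', 'D']
```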

Next, some premises

I start with the assumption that people will have a tendency towards
making correct decisions.  So, if presented with two options, they
will be slightly more likely to pick the correct one.

This means that the more people support a proposition, the more
likely it is to be true.  And the more who oppose, the less likely. 
One, although not the only, way we can use this is by saying that the
probability of a statement being correct increases with the margin by
which it is supported.

I should point out that the kind of statement we have available to
use this kind of test is the A>B kind, with every person who ranks A
over B being evidence for this proposition and every person who ranks
B over A being evidence against.

Kemeny's method takes these pair-wise victories and assumes that they
are independent.  That is, that a person's favouring of A>B is
independent of their favouring of C>D.  In fact, this is a very poor
assumption.  Since decisions are being made by the same people, on the
same basis, often about similar candidates, independence is unlikely. 
For example, if someone rates candidates A, B, and C low on their
ballot because they are all conservatives, all want to decrease farm
subsidies, or are all women, this should not be assumed to be an
independent decision.  

As a result, Kemeny has some clone problems.  For example, if someone
submits a proposal identical to another, this could change the
results, and not merely by deciding whether that proposal itself is
elected.  Kemeny's method assumes that the decisions voters are making
on the two identical proposals are in fact made independently.

So, suppose there are three sets of twins A, B and C,
with Ax1->Bx2->Cx3->Ax4 for all x1,x2,x3,x4,
where Ai denotes the i-th twin of set A, and likewise for B and C.
 
Then, the more twins of set C there are, the worse the alternatives
of set A are perceived, since each new twin is considered to be
independent information against the twins in A.

So, the question is, is it possible to make a method on a
probabilistic basis which does not assume independence?  I suggest
that Schulze is such a method, possibly the only one, at least for
ranked ballots.

---

If we have a majority decision A>B, this can be taken as evidence in favour
of the proposition that A is better than B.

So, it follows that if we have a path,
A>B B>C C>D D>E
that this is evidence that A is better than E.

If we write [A>B] to mean the margin by which A defeats B, we could write

F([A>B],[B>C],[C>D],[D>E]) 

to mean the increasing function which assigns a strength to this evidence.

Let me now suggest a useful analogy.  Consider a lottery that awards a free
ticket 1/2 the time, and $1000 1/10000 of the time.  What is your
chance of winning both?  If you assume they are independent, 1/20000.
However, what if every $1000 winner automatically receives a free
ticket?  This would mean the chance would rise to 1/10000.  We know, however,
that it is impossible that every free ticket winner automatically wins $1000,
because the free ticket probability is greater.
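
The arithmetic of the lottery can be checked with exact fractions.  This is just a sketch with variable names of my own choosing.  It computes the chance of winning both prizes under independence, and the largest and smallest values that chance could take under any kind of dependence (the standard Frechet bounds).

```python
from fractions import Fraction

p_ticket = Fraction(1, 2)       # chance of a free ticket
p_cash = Fraction(1, 10000)     # chance of the $1000 prize

# If the two prizes are awarded independently:
both_if_independent = p_ticket * p_cash

# If every cash winner automatically gets a ticket, "both" is the same
# event as "cash", so the joint chance is the smaller probability.
both_upper_bound = min(p_ticket, p_cash)

# The joint chance can never go below this, whatever the dependence.
both_lower_bound = max(Fraction(0), p_ticket + p_cash - 1)

print(both_if_independent)   # 1/20000
print(both_upper_bound)      # 1/10000
print(both_lower_bound)      # 0
```

The reverse dependence, where every ticket winner also wins the cash, would need a joint chance of 1/2, which is above the upper bound of 1/10000 and therefore impossible.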

Now, we want F to be unaffected by any assumption of independence.  For
example, if we knew that C>D was dependent on B>C, so that these voters by
deciding B>C automatically decided C>D, we would want 

F([A>B],[B>C],[C>D],[D>E]) = F'([A>B],[B>C],[D>E])

That is, we wouldn't want F to make use of the extraneous information that
C>D.  Of course if [B>C]>[C>D], we would know that it is not true that
B>C implies C>D, at least for some people.  However, if there are two majorities,
there is no way to know that the larger majority is not dependent on the
lesser.  Therefore, if we don't want to make any assumptions about dependence,
we have to use a function F that only depends on the strength of the smallest
majority in the path.  This is how Schulze defines the strength of a beat
path.
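
As a small illustration of this point, compare a weakest-link F with one that multiplies the links as if they were independent.  The numbers below are hypothetical confidences invented for the sketch, not real margins, and the function names are mine.  Dropping the possibly redundant C>D link leaves the weakest-link value unchanged, while the independence-based value moves.

```python
from math import prod

# Hypothetical per-link confidences for the path A>B, B>C, C>D, D>E.
full_path = {"A>B": 0.60, "B>C": 0.55, "C>D": 0.70, "D>E": 0.65}

# The same path with C>D removed, as if it carried no new information.
reduced_path = {k: v for k, v in full_path.items() if k != "C>D"}


def strength_weakest_link(links):
    """F that depends only on the smallest majority in the path."""
    return min(links.values())


def strength_independent(links):
    """F that treats every link as independent evidence."""
    return prod(links.values())


print(strength_weakest_link(full_path) == strength_weakest_link(reduced_path))
print(strength_independent(full_path) == strength_independent(reduced_path))
```

The first comparison prints True and the second False.  The weakest-link value only stays put here because C>D is not the smallest link, which is exactly the case where C>D might be dependent on B>C.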

Note that this is not exactly the same as assuming dependence.  We aren't
actually saying that
p(A>B and B>C) = p(A>B)
just that the function F will only be based on evidence we know not to be
affected by level of dependency.  In the same way, when we count each person's
vote as equal, we aren't assuming that they are all equally likely to be right
or wrong.  We just aren't assuming anything about who is more likely to be
right.

This brings up the question of what to do if you have several paths from A to
E.  The argument is similar.  If you don't want to assume any independence
you have to use only the maximum beat path.  After all, the beat paths which
provide less strong evidence could be redundant (dependent) information.  It
therefore makes sense to do as Schulze describes.  Comparing the strongest beat
paths between A and B is equivalent to comparing the strength of the evidence
for A>B with the strength of the evidence that B>A, without assuming any level
of independence.

-----

To help explain this argument, I will suggest the following situation.  We
have several statements made by an oracle.  The oracle does not promise that
she is stating the truth.  Sometimes she ignores the truth and replies
randomly.  However, she always gives a probability that the statement was
based on truth instead of chance.

Imagine that you have these oracle statements about candidates A, B, C and D,
and you want to know what the most likely best candidate is.

1. A>B 60%
2. B>C 70%
3. C>A 80%
4. A>D 55%
5. B>D 55%
6. C>D 55%

If we use a Minimax (Simple Condorcet) procedure to resolve this, we would
say that D is the most likely winner.  After all, we would argue,
statement #1 says with 60% certainty that B is not the winner.  #2 gives
70% that C is not.  #3 gives 80% that A is not.  If we don't want to assume
any level of independence, we have to say that there is only 55% certainty
against D.

But we are making the mistake of assuming that the not B, not C, and not A
statements are independent.  Although each when considered independently has a
high chance of truth, we know that one of them is false, and that one must be
false whether or not D wins.
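
To make the comparison concrete, here is a rough sketch that runs both procedures on the six oracle statements.  It assumes the oracle's percentages can be used directly as victory strengths, which is my simplification for the example, and the function names are made up.  Minimax picks the candidate whose worst defeat is smallest.  The beat-path procedure picks the candidate whose strongest path to every rival is stronger than the rival's strongest path back.

```python
candidates = ["A", "B", "C", "D"]

# (winner, loser): oracle's stated percentage, used here as a strength.
statements = {
    ("A", "B"): 60, ("B", "C"): 70, ("C", "A"): 80,
    ("A", "D"): 55, ("B", "D"): 55, ("C", "D"): 55,
}


def minimax_winner(candidates, strength):
    """Candidate whose strongest defeat is the weakest."""
    worst = {x: max((s for (_, loser), s in strength.items() if loser == x),
                    default=0)
             for x in candidates}
    return min(candidates, key=lambda x: worst[x])


def strongest_paths(candidates, strength):
    """p[(x, y)] = strength of the strongest path from x to y (0 if none)."""
    p = {(x, y): strength.get((x, y), 0)
         for x in candidates for y in candidates if x != y}
    for k in candidates:
        for i in candidates:
            for j in candidates:
                if len({i, j, k}) < 3:
                    continue
                p[(i, j)] = max(p[(i, j)], min(p[(i, k)], p[(k, j)]))
    return p


def beat_path_winners(candidates, strength):
    """Candidates that Schulze-defeat every other candidate."""
    p = strongest_paths(candidates, strength)
    return [x for x in candidates
            if all(p[(x, y)] > p[(y, x)] for y in candidates if y != x)]


print(minimax_winner(candidates, statements))      # D
print(beat_path_winners(candidates, statements))   # ['B']
```

Under these assumptions Minimax chooses D, as described above, while the beat-path comparison chooses B: the strongest path from B to A (through C) has strength 70 against 60 the other way, and B's direct 70 over C beats C's strongest path back (through A) at 60.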

Many of the points I've made here have suffered due to my desire for brevity.
 If anyone has any questions, I'm happy to explain myself more thoroughly.  I
hope to kick off a discussion about probabilistic justifications for voting
methods, particularly Schulze's method.


---
Blake Cretney
See the EM Resource:  http://www.fortunecity.com/meltingpot/harrow/124


