[EM] Societal ranking from incomplete pairwise information. (Pinewood derby.)

Sat Mar 17 11:16:51 PDT 2012

Kristofer, (and others too)

If I recall, you were recently experimenting with how to best determine a
winner (or was it a full ranking) from incomplete pairwise information.
 What are the methods that you (or others) consider best for that?  It
seems like Kemeny would be a good fit (if it weren't computationally
prohibitive): for each pairwise contest you have data on, you just add one
point to each possible societal ranking which agrees with that pairwise
contest.  If your pairwise information has a notion of defeat strength,
then you can sum the defeat strengths to get a Borda-like method.
 Otherwise you can count the pairwise defeats and have a Copeland-like
method.  But for those last two each candidate should have about the same
number of contests, right?

Does the Schulze method extend naturally to incomplete data?

In your experimentation, were you the one who decided which pairwise
contests to run or was it decided by some other actor and you just had to
live with whatever data was generated?

A couple weeks ago, I found myself keeping score for a pinewood derby,
where 8-11 year old boys race wooden cars down a track.  When we thought
there were only 20 entrants, we were going to try to run the entire
pairwise matrix (190 races).  When 28 boys showed up, that became
impossible.  I quickly drew up a racing schedule.  Each boy got a number
and ended up racing against the next five and the previous five cars (mod
28).  (Then we did a quick tournament afterwards.)

It occurred to me that it would be better to use the outcomes of the early
races to decide who races against each other in the later races.  (Let A>B
signify that A has raced and won against B.)  If A>B, A>C, B>D, and C>D,
then there is really no point racing A against D.  You really want to use
the early race outcomes to determine which cars are comparable and race
those cars against each other.  This increases the information content of
each outcome.  It also contributes to the enjoyment by racing cars of the
same caliber and getting every boy a win if possible.  I've been thinking
about good methods for attacking this.

One other important constraint is that all cars should have about the same
amount of races, which rules out an "insertion-sort" type algorithm.

If the number of cars is 2^n, then I think the first n rounds are pretty
obvious.  For 32 cars, for example:  (Assuming no ties)

First round: Just pair them up randomly and race them.  There will be 16
winners and 16 losers.
Second round:  Race the 16 winners against each other (randomly) and the 16
losers against each other (randomly).
Third round:  Race the 8 undefeated cars against each other (randomly), the
8 winless cars against each other (randomly), and the 16 one-and-one cars
against each other (randomly).
etc.

(Instead of pure randomness, I would avoid racing cars that had faced each
other before if possible.)

After five rounds, you have one undefeated car, 5 cars with four wins, 10
cars with three wins, 10 cars with two wins, 5 cars with one win, and one
car with zero wins.

In each round, you're basically dividing the cars into tiers based on their
win-loss record and then racing cars in the same tier against each other
randomly.  And, for at least the first n rounds, there will always be an
even number of cars in each tier.  This is equivalent to a pure binary
tournament that has a losers' bracket as well as a two-loss bracket, a
three-loss bracket, etc. .

The question is what to do if you want to run more than n rounds, or if the
number of cars is not exactly a power of two.  I think the idea of dividing
them into tiers (or brackets) and racing them against other cars in the
same tier is still good, but using just their win-loss record is not
enough.  Lots of tiers will have an odd number of cars, so we need to know
which are the best and worst cars in each tier and have a few inter-tier
races.

So we can think of it as needing to come up with a secondary sort criterion
to use inside the win-loss tiers.

Or we can generalize and say that after each round we just come up with a
full societal ranking and then race the first against the second, the third
against the fourth, the fifth against the sixth, etc.  So I'm back to
needing to know the best way to come up with a societal ranking from
incomplete pairwise data.

Thoughts?

~ Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.electorama.com/pipermail/election-methods-electorama.com/attachments/20120317/3a3f5aee/attachment-0003.htm>