# [EM] Societal ranking from incomplete pairwise information. (Pinewood derby.)

Kristofer Munsterhjelm km_elmet at lavabit.com
Sat Mar 17 13:39:45 PDT 2012

```On 03/17/2012 07:16 PM, Andy Jennings wrote:
> Kristofer, (and others too)
>
> If I recall, you were recently experimenting with how to best determine
> a winner (or was it a full ranking) from incomplete pairwise
> information.

I don't seem to recall that offhand, but life has been somewhat chaotic
the last few weeks, so I may simply not be remembering what I did say.

> What are the methods that you (or others) consider best for that?  It
> seems like Kemeny would be a good fit (if it weren't computationally
> prohibitive): for each pairwise contest you have data on, you just
> add one point to each possible societal ranking which agrees with
> that pairwise contest.

Kemeny isn't that unreasonable in practical use. My integer linear
programming implementation even manages 20-30 candidates, though it does
take quite a bit of time on the high end.

So if you have an ordinary election, you'll likely have fewer than 30
candidates -- or, to be more precise, fewer than 30 candidates in a
cycle -- so you could use Kemeny if you wanted.

> If your pairwise information has a notion of defeat strength, then
> you can sum the defeat strengths to get a Borda-like method.

Just summing them up could produce a bias, however. If A wins half his
contests and B does the same, but B has participated in fewer contests
than A, B might be as good as A but summation will consistently rank A
above B.

You might respond with Laplacian rule-of-succession logic, saying that
it's more probable that A is the winner than that B is. Yet if the
Laplacian logic is based on something else than raw summation, you could
always adjust the numbers so that A wins by summation but the heuristic
says A and B are tied.

> Otherwise you can count the pairwise defeats and have a Copeland-like
> method. But for those last two each candidate should have about the
> same number of contests, right?

Yes.

> Does the Schulze method extend naturally to incomplete data?

I don't think so, since it makes use of shortest paths. I suppose you
could set the unknown pairwise contests to have infinite strength
against the current member of the beatpath so they never figure into the
beatpath calculation, but that isn't very elegant.

It would be easier to extend Ranked Pairs. Just take the contests you do
know about and put them in order of strength. Go through the list like
in ordinary Ranked Pairs. If you're unlucky, at the end you will have a
number of incomplete relations (e.g. A>B, A>C, B>D, D>E would give
A>B>D>E but say nothing about C except that A is above C). If that
happens, it means there's not enough information to tell, and any method
would be limited by it. Perhaps one could use the incomplete relations
to find out what contest would give the most additional information, but
that's tricky.

Another option, if you have truly huge datasets, would be to use the
PageRank algorithm, since it's designed for that sort of problem. If
webpages vote on other webpages by linking to them, it would be
unreasonable to expect every (webpage, webpage) pair to have a link
either in one or the other direction. Warren has implemented some
similar methods (e.g. Sinkhorn), but to my knowledge, they're not Condorcet.

> In your experimentation, were you the one who decided which pairwise
> contests to run or was it decided by some other actor and you just had
> to live with whatever data was generated?

In every "have user judge between two candidates" task I've run, I
decided contests to run. Now that I think about it, I remember two types
I ran. In the first case, I used binary insertion sort, which
(obviously) dictated which pairwise contests to run. In the second, I
ran all O(n^2) contests and pushed the result through Schulze. In the
latter type, the contests were arranged to make the distance between a
contest involving A and the next contest involving A be very large. My
idea in doing so was to limit bias. If a user got a string of
A-vs-someone-else contests, he might get used to A and so his responses
could be affected.

> A couple weeks ago, I found myself keeping score for a pinewood derby,
> where 8-11 year old boys race wooden cars down a track.  When we thought
> there were only 20 entrants, we were going to try to run the entire
> pairwise matrix (190 races).  When 28 boys showed up, that became
> impossible.  I quickly drew up a racing schedule.  Each boy got a number
> and ended up racing against the next five and the previous five cars
> (mod 28).  (Then we did a quick tournament afterwards.)
>
> It occurred to me that it would be better to use the outcomes of the
> early races to decide who races against each other in the later races.
>   (Let A>B signify that A has raced and won against B.)  If A>B, A>C,
> B>D, and C>D, then there is really no point racing A against D.  You
> really want to use the early race outcomes to determine which cars are
> comparable and race those cars against each other.  This increases the
> information content of each outcome.  It also contributes to the
> enjoyment by racing cars of the same caliber and getting every boy a win
> if possible.  I've been thinking about good methods for attacking this.

It depends on whether you think there are going to be cycles. If there
aren't going to be any cycles, then you can reduce the problem to one of
sorting. On the other hand, if there might be cycles, that won't work
and something more sophisticated would be needed.

I think cycles would be unlikely in racing, because so much of the
racing skill involves something that stays static whatever your opponent
is. But if you were ranking chessplayers, for instance, it might be the
case that different playing styles produce a rock-paper-scissors effect.

Also, even if there are no cycles, sorting might still not do it if the
results are sufficiently noisy. If each contest may go the wrong way by
some probability, then sorting can propagate that error arbitrarily.
Some Condorcet methods (Kemeny, for instance) are maximum likelihood
estimators over noise models, and so would fit better, but I'm not
certain how they do when given incomplete information.

> One other important constraint is that all cars should have about the
> same amount of races, which rules out an "insertion-sort" type algorithm.

How about merge sort? That works like an elimination tournament. You
could get some imbalance if the number of contestants isn't a power of
two, but you'd get that in an ordinary elimination tournament as well,
through byes.

Alternately, you could "overfulfill" on a binary insertion sort. In an
ordinary binary insertion sort, the last contestant needs to go through
ceil(log2(n)) contests. So say that some contestant who's not the last
finds his final position after k < ceil(log2(n)) contests. Then you say
"what if the next-to-last contest had gone the other way, who would he
have for his last contest?". Do that contest. Then do the
next-to-next-to-last (which will give two alternate contests), and so
on. These additional contests are not needed to pinpoint the
contestant's position, but you could use them as additional information
for Kemeny or Ranked Pairs. If the contests are noisy, then the "what
if" explores the neighborhood.
Ideally, you'd have a way of finding the contestant's true position so
the noise resistance propagates, but I'm not sure how to combine that
with binary insertion sort. Use only the contests involving the newcomer
and those already sorted, put those into Kemeny, and use the partial
social ordering as the order for "already sorted" for next contestant,
perhaps?

Yet another option would be to look into sorting networks. Sorting
networks are parallel sorting algorithms (more or less), and so should
have better distribution properties.

> So we can think of it as needing to come up with a secondary sort
> criterion to use inside the win-loss tiers.
>
> Or we can generalize and say that after each round we just come up with
> a full societal ranking and then race the first against the second, the
> third against the fourth, the fifth against the sixth, etc.  So I'm back
> to needing to know the best way to come up with a societal ranking from
> incomplete pairwise data.

Couldn't you just insert some lose-to-all candidates to pad the number
up to a power of two? When a contestant faces a lose-to-all virtual
candidate, he just goes on to the next round. Then arrange the seeding
so that these candidates are distributed throughout the lower tiers,
i.e. that they produce as little mess as possible. It will cause some
departure from an even number of matches, but I think that's unavoidable
unless you want to add matches that would not be strictly needed (in a
variant of the noise-proofing idea).

```