[EM] Societal ranking from incomplete pairwise information. (Pinewood derby.)

Sat Mar 17 13:39:45 PDT 2012

On 03/17/2012 07:16 PM, Andy Jennings wrote:
> Kristofer, (and others too)
>
> If I recall, you were recently experimenting with how to best determine
> a winner (or was it a full ranking) from incomplete pairwise
> information.

I don't seem to recall that offhand, but life has been somewhat chaotic 
the last few weeks, so I may simply not be remembering what I did say.

> What are the methods that you (or others) consider best for that?  It
> seems like Kemeny would be a good fit (if it weren't computationally
> prohibitive): for each pairwise contest you have data on, you just
> add one point to each possible societal ranking which agrees with
> that pairwise contest.

Kemeny isn't that unreasonable in practical use. My integer linear 
programming implementation even manages 20-30 candidates, though it does 
take quite a bit of time on the high end.

So if you have an ordinary election, you'll likely have fewer than 30 
candidates -- or, to be more precise, fewer than 30 candidates in a 
cycle -- so you could use Kemeny if you wanted.

> If your pairwise information has a notion of defeat strength, then
> you can sum the defeat strengths to get a Borda-like method.

Just summing them up could produce a bias, however. If A wins half his 
contests and B does the same, but B has participated in fewer contests 
than A, B might be as good as A but summation will consistently rank A 
above B.

You might respond with Laplacian rule-of-succession logic, saying that 
it's more probable that A is the winner than that B is. Yet if the 
Laplacian logic is based on something else than raw summation, you could 
always adjust the numbers so that A wins by summation but the heuristic 
says A and B are tied.

> Otherwise you can count the pairwise defeats and have a Copeland-like
> method. But for those last two each candidate should have about the
> same number of contests, right?

Yes.

> Does the Schulze method extend naturally to incomplete data?

I don't think so, since it makes use of shortest paths. I suppose you 
could set the unknown pairwise contests to have infinite strength 
against the current member of the beatpath so they never figure into the 
beatpath calculation, but that isn't very elegant.

It would be easier to extend Ranked Pairs. Just take the contests you do 
know about and put them in order of strength. Go through the list like 
in ordinary Ranked Pairs. If you're unlucky, at the end you will have a 
number of incomplete relations (e.g. A>B, A>C, B>D, D>E would give 
A>B>D>E but say nothing about C except that A is above C). If that 
happens, it means there's not enough information to tell, and any method 
would be limited by it. Perhaps one could use the incomplete relations 
to find out what contest would give the most additional information, but 
that's tricky.

Another option, if you have truly huge datasets, would be to use the 
PageRank algorithm, since it's designed for that sort of problem. If 
webpages vote on other webpages by linking to them, it would be 
unreasonable to expect every (webpage, webpage) pair to have a link 
either in one or the other direction. Warren has implemented some 
similar methods (e.g. Sinkhorn), but to my knowledge, they're not Condorcet.

> In your experimentation, were you the one who decided which pairwise
> contests to run or was it decided by some other actor and you just had
> to live with whatever data was generated?

In every "have user judge between two candidates" task I've run, I 
decided contests to run. Now that I think about it, I remember two types 
I ran. In the first case, I used binary insertion sort, which 
(obviously) dictated which pairwise contests to run. In the second, I 
ran all O(n^2) contests and pushed the result through Schulze. In the 
latter type, the contests were arranged to make the distance between a 
contest involving A and the next contest involving A be very large. My 
idea in doing so was to limit bias. If a user got a string of 
A-vs-someone-else contests, he might get used to A and so his responses 
could be affected.

> A couple weeks ago, I found myself keeping score for a pinewood derby,
> where 8-11 year old boys race wooden cars down a track.  When we thought
> there were only 20 entrants, we were going to try to run the entire
> pairwise matrix (190 races).  When 28 boys showed up, that became
> impossible.  I quickly drew up a racing schedule.  Each boy got a number
> and ended up racing against the next five and the previous five cars
> (mod 28).  (Then we did a quick tournament afterwards.)
>
> It occurred to me that it would be better to use the outcomes of the
> early races to decide who races against each other in the later races.
>   (Let A>B signify that A has raced and won against B.)  If A>B, A>C,
> B>D, and C>D, then there is really no point racing A against D.  You
> really want to use the early race outcomes to determine which cars are
> comparable and race those cars against each other.  This increases the
> information content of each outcome.  It also contributes to the
> enjoyment by racing cars of the same caliber and getting every boy a win
> if possible.  I've been thinking about good methods for attacking this.

It depends on whether you think there are going to be cycles. If there 
aren't going to be any cycles, then you can reduce the problem to one of 
sorting. On the other hand, if there might be cycles, that won't work 
and something more sophisticated would be needed.

I think cycles would be unlikely in racing, because so much of the 
racing skill involves something that stays static whatever your opponent 
is. But if you were ranking chessplayers, for instance, it might be the 
case that different playing styles produce a rock-paper-scissors effect.

Also, even if there are no cycles, sorting might still not do it if the 
results are sufficiently noisy. If each contest may go the wrong way by 
some probability, then sorting can propagate that error arbitrarily. 
Some Condorcet methods (Kemeny, for instance) are maximum likelihood 
estimators over noise models, and so would fit better, but I'm not 
certain how they do when given incomplete information.

> One other important constraint is that all cars should have about the
> same amount of races, which rules out an "insertion-sort" type algorithm.

How about merge sort? That works like an elimination tournament. You 
could get some imbalance if the number of contestants isn't a power of 
two, but you'd get that in an ordinary elimination tournament as well, 
through byes.

Alternately, you could "overfulfill" on a binary insertion sort. In an 
ordinary binary insertion sort, the last contestant needs to go through 
ceil(log2(n)) contests. So say that some contestant who's not the last 
finds his final position after k < ceil(log2(n)) contests. Then you say 
"what if the next-to-last contest had gone the other way, who would he 
have for his last contest?". Do that contest. Then do the 
next-to-next-to-last (which will give two alternate contests), and so 
on. These additional contests are not needed to pinpoint the 
contestant's position, but you could use them as additional information 
for Kemeny or Ranked Pairs. If the contests are noisy, then the "what 
if" explores the neighborhood.
Ideally, you'd have a way of finding the contestant's true position so 
the noise resistance propagates, but I'm not sure how to combine that 
with binary insertion sort. Use only the contests involving the newcomer 
and those already sorted, put those into Kemeny, and use the partial 
social ordering as the order for "already sorted" for next contestant, 
perhaps?

Yet another option would be to look into sorting networks. Sorting 
networks are parallel sorting algorithms (more or less), and so should 
have better distribution properties.

> So we can think of it as needing to come up with a secondary sort
> criterion to use inside the win-loss tiers.
>
> Or we can generalize and say that after each round we just come up with
> a full societal ranking and then race the first against the second, the
> third against the fourth, the fifth against the sixth, etc.  So I'm back
> to needing to know the best way to come up with a societal ranking from
> incomplete pairwise data.

Couldn't you just insert some lose-to-all candidates to pad the number 
up to a power of two? When a contestant faces a lose-to-all virtual 
candidate, he just goes on to the next round. Then arrange the seeding 
so that these candidates are distributed throughout the lower tiers, 
i.e. that they produce as little mess as possible. It will cause some 
departure from an even number of matches, but I think that's unavoidable 
unless you want to add matches that would not be strictly needed (in a 
variant of the noise-proofing idea).