[EM] Kemeny update

Sat Sep 24 14:42:43 PDT 2011

Warren Smith wrote:
> --seems to me, KM is correct; if the Smith Set has less than about 20
> members (and if this also is true recursively upon removing it), then
> it always will be feasible to find the Kemeny order.

There are two related criteria in play here. First is plain old Smith. 
If the Kemeny method passes Smith, then we know all Smith set members 
must rank above non-Smith-set members (and so on recursively). Kemeny 
does pass Smith. If Kemeny also passes ISDA, then we can outright remove 
all the candidates that aren't in the Smith set because they won't alter 
the outcome anyway. So if Kemeny passes ISDA, your "and this also is 
true recursively" property doesn't have to hold.

> How often will a (>=20)-member Smith set arise in practice?
> 
> I do not know.   KM "can't imagine" it will happen, but that's his guess.
> I think if you take a 100-candidate election with random-uniform
> votes, it'd happen pretty often. Such elections have Condorcet winners
> about 9-12% of the time (says my computer), which leads me to guess
> that they have (>=20)-member Smith sets about 10% of the time.  See
> also puzzles
> 25 and 26 here for some theorems about the closely related problem of
> random "round robin tournaments" rather than random "elections":
>    http://www.rangevoting.org/PuzzlePage.html

I think a ballot with significantly more than 20 members will at least 
be hard to fill out (and the Australian ballots tend to support this 
claim). Say each voter fills out the ranks of k candidates, where k is 
1...i (random), i < 20 <= number of candidates. Does that decrease the 
probability of a >= 20 member Smith set?

> But it could be argued/hoped that 100-candidate elections, when they
> occur, will have vote distributions quite dissimilar to
> random-uniform.
> It could be counterargued that humanity's experience with
> 100-candidate elections is small, and with 100-candidate
> rank-order-ballot
> elections is even smaller, so there is little basis for KM's guess.

I'm basing my guess on what election data exists (the vast majority of 
which have singular Condorcet winners), but it is of course possible 
that the voters will vote in a way that leads to a greater Smith set - 
either because there are many candidates or because the candidates are 
shielded from vote-splitting strategy.

> Tideman in his book has got a statistical semiempirical model, based
> on real election statistics, of rank-order ballot elections.  His
> model is described/corrected/recalculated here:
>     http://rangevoting.org/TidemanElModel.html
> and it finds the probability, in a 100-canddt election,that a
> Condorcet Winner exists, is 27%, and for a 200-canddt
> Tideman-statistics election this probability is 5.6%.

Then I suppose it should be possible to make a simulator to see how 
often it generates size k Smith sets. If only I had the time :-)

>> I think I solved a few instances of Warren's 27x27 matrix on [0...1
>> billion], but I can't be sure since I can't confirm that my Kemeny cost is
>> indeed the least possible. I am, however, assuming that it is, and so I'm
>> looking at larger sizes. Since these will be harder, I've thought of adding
>> an initialization step where the program tells the IP solver of a reasonably
>> low-cost guess. This could speed up the process,
> 
> --certainly it could.  If you use some nonrigorous heuristic to find a
> good order, then you could then use the fact that the optimal order
> must be at least that good, to help "prune" a subsequent exhaustive
> search.
> Lower bounds (i.e. in the other direction) can also be used for pruning
> (if you have any).
> 
> If you have an integer program it can sometimes happen that the optimal
> solution of the linear program (without constraint of having to have integers)
> happens by luck to be all-integer.  In such a case, that proves it is optimal
> for the integer program too.   However, in general integer programming
> is NP-hard and such luck cannot be guaranteed.
> 
> Sometimes you can get partial such luck, e.g. you exhaustively try all
> integer possibilities for the first two variables, and then you get
> lucky as above about all the remaining variables in each case.   That
> also would
> find the optimum and yield an optimality proof.
> 
> If that is what happened for KM, then we could ask: what is the
> "success rate," i.e. what percentage of the time do his integer
> programs enjoy these
> kinds of luck?

To the extent that branch-and-bound or branch-and-cut strategies can be 
considered to have luck, they get lucky to some degree on all problems, 
but they get more lucky on some than on others. If that wasn't the case 
and integer programming in n variables was as hard as brute force, 
Kemeny would be dead in the water.

As for linear programming being integer optimal or close enough that I 
don't notice any difference in solution speed, that *seems* to happen to 
me in the vast majority (again >90%, and I'd guess >95%, but I haven't 
tested) of the randomized instances on the distributions generated by my 
voting program.
I do know your 27x27 matrix is not one of these instances, since the IP 
solver goes past the LP stage there, but it's still solvable. Perhaps 
better integer programming libraries could solve them faster - I use 
glpk. I don't know the details of branch depth, extent of pruning, and 
so on, though.