[EM] Whoops! Correction to Statistical Condorcet

Wed Sep 3 11:10:47 PDT 2014

Apparently the unconstrained MLE for the multinomial distribution with 
given probabilities (i.e. the seat vector that maximizes the pmf) isn't 
the Sainte-Laguë or Webster apportionment, but the D'Hondt or Jefferson 
apportionment[1].

This came as quite a surprise to me, given that the chi-squared and 
G-tests are claimed to approach the exact test as n goes to infinity. I 
may look into this in detail later, but I suspect the situation is this: 
although the exact test value, call it a_n, and the G-test value, call 
it b_n, obey lim n->inf (a_n - b_n) = 0, for any finite n the exact test 
will still give the D'Hondt assignment a greater value (more likely 
draw) than the Sainte-Laguë one, whereas it's the other way around for 
the G-test (or chi-squared test).

I am not a formal statistician, though! And since I got the implications 
of the convergence wrong, I might be wrong about this as well.

For clarification purposes: let s be the number of seats, x a draw 
vector (i.e. so many seats for party 1, so many for party 2, etc.), 
p = (p_1, p_2, ...) a probability vector, and P the multinomial pmf. The 
exact test value for x is then

Pr(x) = sum [for all y so that P(y; s, p) <= P(x; s, p)] P(y;s,p)

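With s again the number of seats, and the sum running over the n 
parties, define the chi-squared test statistic as

chi(x, p) = SUM i=1...n (x_i/s - p_i)^2/(p_i)

(This is Pearson's statistic divided by s, which doesn't change the 
ranking of apportionments for a fixed s.)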

Then, say, for the following votes: (10, 9, 8, 5, 4) and 5 seats, we have:

p-vector (rounded): 0.28, 0.25, 0.22, 0.14, 0.11

Sainte-Laguë: (1, 1, 1, 1, 1)
D'Hondt:      (2, 1, 1, 1, 0)

Exact test value for Sainte-Laguë: 0.899
Exact test value for D'Hondt: 1.0

multinomial pmf for the Sainte-Laguë assignment: 0.028
multinomial pmf for the assignments with greater probability than this:

0.03234 for [1, 2, 1, 1, 0]
0.03622 for [2, 1, 1, 1, 0]
0.03234 for [2, 2, 1, 0, 0]
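
The exact test values above can be checked by brute force for small s. 
What follows is only a minimal sketch: the function names are made up 
for illustration, it enumerates every possible seat vector (so it is 
only practical for a handful of seats), and it uses exact fractions 
with the rounded p-vector so that ties between equally likely draws are 
handled exactly.

```python
from fractions import Fraction
from math import factorial

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def multinomial_pmf(x, p):
    """P(x; s, p) with s = sum(x), computed exactly."""
    coeff = factorial(sum(x))
    prob = Fraction(1)
    for x_i, p_i in zip(x, p):
        coeff //= factorial(x_i)   # each intermediate quotient is an integer
        prob *= p_i ** x_i
    return coeff * prob

def exact_test(x, p):
    """Sum of P(y; s, p) over all draws y that are no more likely than x."""
    threshold = multinomial_pmf(x, p)
    total = Fraction(0)
    for y in compositions(sum(x), len(p)):
        pmf_y = multinomial_pmf(y, p)
        if pmf_y <= threshold:
            total += pmf_y
    return total

# Rounded p-vector from the example, as exact fractions (assumption:
# the figures in this post were computed from the rounded values).
p = [Fraction(n, 100) for n in (28, 25, 22, 14, 11)]
print(float(exact_test((1, 1, 1, 1, 1), p)))   # about 0.899
print(float(exact_test((2, 1, 1, 1, 0), p)))   # 1.0
```

With the rounded p-vector, several draws are exactly as likely as the 
Sainte-Laguë one; the "<=" in the definition counts them towards its 
exact test value.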

But the chi-squared statistic for Sainte-Laguë is 0.13403 while the one 
for the D'Hondt apportionment is 0.19896. A lower statistic means a 
closer fit, so the chi-squared test ranks the former above the latter.
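
For reference, the chi-squared figures can be reproduced with a few 
lines. This is a sketch of the statistic exactly as defined above (the 
function name is just for illustration), fed with the rounded p-vector.

```python
def chi_statistic(x, p):
    """SUM over parties of (x_i/s - p_i)^2 / p_i, with s = sum(x)."""
    s = sum(x)
    return sum((x_i / s - p_i) ** 2 / p_i for x_i, p_i in zip(x, p))

p = [0.28, 0.25, 0.22, 0.14, 0.11]
sainte_lague = [1, 1, 1, 1, 1]
dhondt = [2, 1, 1, 1, 0]

print(round(chi_statistic(sainte_lague, p), 5))   # 0.13403
print(round(chi_statistic(dhondt, p), 5))         # 0.19896
```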

The two tests still disagree for large s, e.g.:
	p = (0.3786, 0.245265, 0.1846, 0.06637, 0.06583, 0.059335)
	150 seats
	Sainte-Laguë: [56, 37, 28, 10, 10, 9]
	D'Hondt: [57, 37, 28, 10, 9, 9]
	pmf for Sainte-Laguë: 1.64*10^-5
	pmf for D'Hondt: 1.66*10^-5
	chi-square for Sainte-Laguë: 0.00012
	chi-square for D'Hondt: 0.00058
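
Both apportionments can be recomputed with the usual highest-averages 
procedure. The sketch below assumes the standard divisor sequences 
(1, 2, 3, ... for D'Hondt and 1, 3, 5, ... for Sainte-Laguë) and breaks 
ties in favor of the earlier party, which is an arbitrary choice. The 
helper names are illustrative. The pmfs are compared on a log scale to 
avoid huge factorials.

```python
from math import lgamma, log

def highest_averages(weights, seats, divisor):
    """Award seats one at a time to the largest weight / divisor(seats held)."""
    alloc = [0] * len(weights)
    for _ in range(seats):
        # max() returns the first maximal index, so ties go to the earlier party
        winner = max(range(len(weights)),
                     key=lambda i: weights[i] / divisor(alloc[i]))
        alloc[winner] += 1
    return alloc

def dhondt(weights, seats):
    return highest_averages(weights, seats, lambda k: k + 1)

def sainte_lague(weights, seats):
    return highest_averages(weights, seats, lambda k: 2 * k + 1)

def log_pmf(x, p):
    """Natural log of the multinomial pmf P(x; sum(x), p)."""
    result = lgamma(sum(x) + 1)
    for x_i, p_i in zip(x, p):
        result += x_i * log(p_i) - lgamma(x_i + 1)
    return result

p = [0.3786, 0.245265, 0.1846, 0.06637, 0.06583, 0.059335]
sl = sainte_lague(p, 150)
dh = dhondt(p, 150)
print(sl)                                # [56, 37, 28, 10, 10, 9]
print(dh)                                # [57, 37, 28, 10, 9, 9]
print(log_pmf(dh, p) > log_pmf(sl, p))   # True
```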

-

Of course, if you like D'Hondt (for stability reasons or otherwise), you 
don't need to do anything to Statistical Condorcet to fix the above. 
Because it's Condorcet-based, it should also favor compromise parties 
rather than parties that get large numbers of first preference votes, so 
it is better than ordinary D'Hondt in that respect.

But if you don't, then the elegance of maximizing the pmf is lost. So 
we'd have to find some way of using, say, the global optimality 
properties mentioned in http://rangevoting.org/Apportion.html directly. 
But this is tricky because they are all minimization properties, which 
means that the optimizer might just decide that zero voters participate 
and thus get a perfect zero every time.

That is again something to investigate later. Perhaps taking the area of 
the chi-squared distribution above the point given by the chi-squared or 
G-test statistic would work: that turns it into a maximization problem 
again. But since the cdf for chi-squared involves gamma functions, 
optimizing that might be rather difficult.
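
To make the tail-area idea concrete, here is a rough sketch under some 
assumptions that are not settled above: the statistic is rescaled by s 
to get the usual Pearson statistic, the reference distribution has 
(number of parties - 1) degrees of freedom, and the tail is evaluated 
with the standard recurrence for integer degrees of freedom rather than 
a general incomplete gamma routine. The function names are hypothetical, 
and for very few seats the chi-squared approximation itself is crude.

```python
from math import erfc, exp, gamma, sqrt

def chi2_tail(stat, df):
    """Upper tail area of the chi-squared distribution, integer df >= 1."""
    t = stat / 2
    # Base case: df = 2 (even df) or df = 1 (odd df)
    if df % 2 == 0:
        a, tail = 1.0, exp(-t)
    else:
        a, tail = 0.5, erfc(sqrt(t))
    # Recurrence Q(a + 1, t) = Q(a, t) + t^a e^(-t) / Gamma(a + 1)
    while a < df / 2:
        tail += t ** a * exp(-t) / gamma(a + 1)
        a += 1
    return tail

def tail_area(x, p):
    """Tail area above the Pearson statistic of seat vector x (to be maximized)."""
    s = sum(x)
    pearson = s * sum((x_i / s - p_i) ** 2 / p_i for x_i, p_i in zip(x, p))
    return chi2_tail(pearson, len(p) - 1)

p = [0.28, 0.25, 0.22, 0.14, 0.11]
print(tail_area([1, 1, 1, 1, 1], p))   # Sainte-Laguë
print(tail_area([2, 1, 1, 1, 0], p))   # D'Hondt
```

Since the tail area is decreasing in the statistic, for a fixed number 
of seats it ranks apportionments the same way the chi-squared statistic 
does; the difference only matters if the optimizer is allowed to vary 
other things.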

Alternatively, we might go deeper. Why choose Sainte-Laguë to begin 
with? Because it's unbiased: it doesn't consistently favor small or 
large parties (and, because it's a divisor method, it has certain 
favorable properties we'd like to carry over). So we could look for 
some other criterion that is unbiased in the same sense. But the problem 
with that is that we might lose the "reduction to Webster when everybody 
plumps" property.

-

[1] I uncovered this when reading "A fast and simple algorithm for 
finding the modes of a multinomial distribution" by White and Hendy. It 
gives an algorithm for finding a mode of the multinomial, i.e. an 
apportionment that maximizes the pmf (and hence the exact test value). 
The paper is paywalled, but the algorithm essentially combines the two 
usual ways of computing the same apportionment: first they get a 
Jefferson (divisor-style) solution for a number of seats that's close 
enough to the number of seats specified, and then they run D'Hondt (one 
seat at a time) either forwards or in reverse until they get the number 
of seats you want. The authors don't appear to recognize the solution as 
D'Hondt, though.
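
The following is not the White and Hendy algorithm, only a sanity check 
on why a D'Hondt-style step shows up at all. Moving one seat from party 
j to party i multiplies the multinomial pmf by 
(p_i/(x_i+1)) / (p_j/x_j), so no single transfer helps exactly when the 
largest p_i/(x_i+1) is no bigger than the smallest p_j/x_j, which is 
the condition D'Hondt's highest averages leave behind. The helper name 
is_local_mode is made up for this sketch.

```python
def is_local_mode(x, p):
    """True if no single-seat transfer would increase the multinomial pmf."""
    # Largest factor gained by giving some party one more seat
    gain = max(p_i / (x_i + 1) for x_i, p_i in zip(x, p))
    # Smallest factor given up by taking a seat from a party that holds one
    loss = min(p_i / x_i for x_i, p_i in zip(x, p) if x_i > 0)
    return gain <= loss

small_p = [0.28, 0.25, 0.22, 0.14, 0.11]
print(is_local_mode([2, 1, 1, 1, 0], small_p))   # True  (D'Hondt)
print(is_local_mode([1, 1, 1, 1, 1], small_p))   # False (Sainte-Laguë)

large_p = [0.3786, 0.245265, 0.1846, 0.06637, 0.06583, 0.059335]
print(is_local_mode([57, 37, 28, 10, 9, 9], large_p))    # True  (D'Hondt)
print(is_local_mode([56, 37, 28, 10, 10, 9], large_p))   # False (Sainte-Laguë)
```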