[EM] Unbiasing Statistical Condorcet

Mon Oct 27 12:14:29 PDT 2014

Some time ago, I wrote about a method based on Schulze STV and on 
statistics (more specifically, the relation between the chi-squared 
test, the exact multinomial test, and Sainte-Laguë/Webster). I came up 
with a MLE approach where one maximizes the pmf of the multinomial with 
the "samples" fixed to the council composition and the probability 
parameters adjusted to maximize the pmf subject to certain constraints 
derived from the ballots.

But then I found out that the probability parameters that would maximize 
the pmf in the ideal case (that the ballots permitted it to be fully 
maximized, i.e. that they posed no constraints whatsoever) would lead to 
the D'Hondt allocation, not the Sainte-Laguë allocation. And I know that 
the D'Hondt allocation is biased in favor of large parties, which I 
consider undesirable.

Then I got kinda confused because the relation between the chi-square 
test and the Sainte-Laguë allocation is exact. How could the 
approximation (the chi-square test) be better than the real thing (the 
multinomial pmf)?

But I think I have an idea now. Maximum likelihood estimators can be 
biased for finite samples (in this case, a finite number of seats). MLEs 
are unbiased in the limit, but they may stay quite biased before one 
gets there. For instance, the MLE for the variance of a Gaussian is 
biased for finite n.

Something similar might be the case for the multinomial. If so, the 
imprecision in the chi-square just happens to cancel out the bias in the 
"real thing", which means that we should try to cancel out the bias in 
the exact method in some other way to achieve the same effect.

(Another approach is to just use the chi-squared test right away. That 
would be more pragmatic and we wouldn't need to go through all the 
derivation, but I think an explicit canceling would be more 
well-founded, or at least simpler to generalize.)

-

If the number of parties is k and the number of seats is n, and the 
number of members of party r in the council we're considering is c_r, 
then the uncorrected score function is |v_x| * f(c_1,...,c_k; p_1,...,p_k).

Here v_x is the number of voters that we can marshal in support of this 
council (against the other council that has one candidate that differs), 
and p_1...p_k are the parameters to alter to maximize f, where f is the 
multinomial pmf.

Then, if we hold all values of p constant but one (say it's p_1), the 
optimum value of p_1 should have the property that floor(n * p_1) = c_1. 
This is what causes the D'Hondt reduction. (NOTE: I have not rigorously 
proven this. I only know that when you're free to set all ps, this holds 
in aggregate. An actual proof would probably be based on the multinomial 
pmf's monotonicity.)

What we would like is to have the property that floor(n * (q + p_1)) = 
c_1 instead, where q is chosen to cancel out the bias, which means the 
p_1 we should give to the pmf should instead be p_1' = q + p_1.

If we just want Sainte-Laguë right away, it's pretty easy to start off 
with floor(n * p_1 + 0.5) = c_1, i.e. floor(n * (0.5/n + p_1) = c_1 
which gives an adjustment of p_1' = 1/2n + p_1.

But there's another snag: this might cause a violation of two 
constraints on a probability parameter. First, the sum of all the 
parameters must be one, and second, no single parameter can exceed one. 
So let's find out a better p_i' by making it of the form

p_i' = (1/2n + p_i)/x

where x is chosen to make the sum come out to one. We have:

SUM i=1..k p_i' = 1
SUM i=1..k (1/2nx) + SUM i=1..k (p_i/x) = 1
k/(2nx) + 1/x = 1
x = 1 + k/2n

p_i' = (1/2n + p_i)/(1 + k/2n)     ("1")
or
p_i' = (2n*p_i + 1) / (k + 2n).    ("2")

(If I've got this wrong, do tell me.)

In the limit of n->inf, ("1") becomes: (0 + p_i)/(1 + 0) = p_i. So the 
MLE is asymptotically unbiased as expected.

---

Some might object that I just plucked the Sainte-Laguë divisor out of 
nowhere, so I'll give a quick reasoning for why it is unbiased.

Let X be a random variable (a party's support, in terms of the fraction 
of the total electorate) on 0..1. Then the transformation g is unbiased 
with respect to x if the expected value of g(X) is equal to the expected 
value of X. In other words, if you draw lots of different party 
fractions and run them through the rounding function, you should, on 
average, get the same value as if you did nothing to it.

So let g(X) = floor(n(X + r)) where n is the number of seats and r is 
some adjustment constant to unbias.

Then the expected value of g(X) is the integral from 0 to 1 of g(X). 
Intuitively, this is the same as saying: the mean value is: the 
probability of getting a party fraction that is rounded to 0 seats, 
times 0, plus the probability of getting a party fraction that is 
rounded to 1 seat, times 1, plus... plus the probability of getting a 
party fraction that is rounded to n seats, times n.

Since g(X) gives a result from 0..n, we also have to scale X by n when 
doing the "if you did nothing to it" comparison. So floor(n(x+r)) is 
unbiased if the integral from 0 to 1 of it is equal to the integral from 
0 to 1 of nx dx - which gives the expected value if fractional seats 
were allowed.

The integral from 0 to 1 of nx dx is simply n/2, which is reasonable 
enough: the average number of seats is half of the maximum.
The integral from 0 to 1 of floor(nx) is (n-1)/2, which clearly does not 
equal n/2. So D'Hondt is biased.

Now consider integral from 0 to 1 of floor(n(x+r)) dx. This is clearly 
equal to integral from r to 1+r of floor(nx) dx. If r < 1/n, then we can 
further simplify this to

integral from 0 to 1 of floor(nx) dx + integral from 1 to 1+r of 
floor(nx) dx

which is

(n-1)/2 + integral from 1 to 1+r of floor(nx) dx

but because r < 1/n, floor(nx) is simply n for x on [1 ... 1+r], so

= (n-1)/2 + integral from 1 to 1+r of n dx

= (n-1)/2 + n * (1+r) - n
= (n-1)/2 + nr

Set this equal to n/2 to cancel out bias, and solve for r:

(n-1)/2 + nr = n/2

r = 1/2n = 0.5/n

which is the Sainte-Laguë correction.

It should in theory be possible to apply this to other distributions as 
well. The above implicitly assumes the uniform distribution, but you 
could weight the probability of getting 0 seats, 1 seats etc according 
to any distribution. However, there are two problems with this. First: 
since Statistical Condorcet uses pairwise comparisons, it's far from 
obvious what kind of distribution to use, and even less so when you 
consider that voting patterns might change as a result of using the 
method. Second: it's really easy to get lost in this rabbit hole. I did, 
and it ate a week of my time just like that. Since probabilities can 
only range from 0 to 1, the distribution also has to be somehow divided 
by the number of people that are voting in total, and constructing ratio 
distributions in general is pretty hard.

I did attempt to do so with a toy distribution with cdf F(r) = x^2, but 
my notes are kind of messy. Let me know if you'd like to see them.