[EM] The Sainte-Lague index and proportionality

Wed Jul 18 06:05:55 PDT 2012

On 07/15/2012 11:47 AM, Michael Ossipoff wrote:
>>> If unbias in each allocation is all-important, then can anything else be
>>> as good as trial-and-error minimization of the measured correlation
>>> between q and s/q, for each allocation?
>>
>>
>> You answered this below. If you know the distribution, then you can directly
>> find out what kind of rounding rule would be best and you'd use that one.
>
> Yes, but that's a big "if".  But if you, by trial and error, in each
> allocation, minimize the measured correlation between q and s/q, then
> you're achieving unbias, and you needn't know or assume anything about
> that distribution.
>
> Besides, Pearson correlation is well-known, and WW is new. And
> minimizing correlation is obvious and easily explained.  Of course
> you're losing the minimization of the deviation of  states' s/q from
> its ideal equal value.
>
> On the other hand, if one exponential function, over the entire
> range of states, is a good approximation, then we have a constant p.
> And that p that is just very slightly less than .5 wouldn't be so hard
> to get acceptance for, if it's explained that it gets rid of Webster's
> tiny bias of about 1/3 of one percent.   ...to better attain more true
> unbias.
>
> So either approach would be proposable, if that one overall
> exponential is a good approximation. But Warren himself admitted that,
> at the low-population end, it isn't accurate, because the states, at
> some point, stop getting smaller. But Warren said that his single
> exponential function worked pretty well in his tests.
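
For concreteness, here is a minimal Python sketch of the trial-and-error
idea in the quoted text. It rests on several assumptions of mine: q is
read as a state's exact quota (population share times seats), s as its
seat count, and the rounding point p enters through highest-averages
divisors k + p, so that p = 0.5 is Webster/Sainte-Lague. The function
names and the four populations are made up for illustration and are not
real data.

```python
import heapq
import math


def allocate(populations, seats, p=0.5):
    """Highest-averages allocation with divisors k + p (requires p > 0)."""
    alloc = [0] * len(populations)
    # Max-heap (negated keys) of each state's next quotient pop / (s + p).
    heap = [(-pop / p, i) for i, pop in enumerate(populations)]
    heapq.heapify(heap)
    for _ in range(seats):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-populations[i] / (alloc[i] + p), i))
    return alloc


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bias_correlation(populations, seats, p):
    """Measured correlation between q and s/q for one allocation."""
    total = sum(populations)
    quotas = [pop * seats / total for pop in populations]
    alloc = allocate(populations, seats, p)
    ratios = [s / q for s, q in zip(alloc, quotas)]
    return pearson(quotas, ratios)


def least_correlated_p(populations, seats, candidates):
    """Trial and error: the candidate p whose |correlation| is smallest."""
    return min(candidates,
               key=lambda p: abs(bias_correlation(populations, seats, p)))


# Hypothetical populations, for illustration only.
populations = [53_000, 24_000, 15_000, 8_000]
seats = 10
candidates = [0.40 + 0.01 * k for k in range(21)]  # p from 0.40 to 0.60
best_p = least_correlated_p(populations, seats, candidates)
print(best_p, allocate(populations, seats, best_p))
```

With so few states, many values of p give the same allocation, so the
search only picks the first of a tied group. On a realistic number of
states the correlation would vary more finely with p.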

Gibrat's law ( https://en.wikipedia.org/wiki/Gibrat%27s_law ) suggests a 
log-normal distribution, and indeed the tail of such a distribution is 
exponential.

After reading about it, I did some tests with US state populations (from
the latest census), and a kernel density estimate of the logarithms of
the populations shows a distribution that looks a lot like a sum of
Gaussians. So log-normal is a reasonable first approximation. To get a
better approximation, fit a sum of Gaussians to the log-populations,
which in population terms would be a mixture of log-normals (a weighted
sum, not a product) -- I think.
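
A rough sketch of that kind of test, in Python with only the standard
library. The populations below are synthetic draws from a log-normal,
standing in for real census figures, which are not reproduced here.
The bandwidth rule (the normal-reference rule of thumb) and the name
gaussian_kde are my own assumptions, not necessarily what the original
test used.

```python
import math
import random
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, using Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumed choice of bandwidth.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Synthetic stand-in data (NOT census figures): 50 log-normal draws.
rng = random.Random(0)
populations = [rng.lognormvariate(15.0, 1.0) for _ in range(50)]

# Work with log-populations, as in the text.
logs = [math.log(p) for p in populations]
kde = gaussian_kde(logs)

# Single-Gaussian fit to the logs, i.e. a log-normal fit to the populations.
mu, sigma = statistics.mean(logs), statistics.stdev(logs)
normal = statistics.NormalDist(mu, sigma)

# Compare the two densities at a few points across the range.
lo, hi = min(logs) - sigma, max(logs) + sigma
for k in range(11):
    x = lo + (hi - lo) * k / 10
    print(f"{x:6.2f}  kde={kde(x):.4f}  normal={normal.pdf(x):.4f}")
```

Where the two columns track each other, a single log-normal is doing
well. Where the estimate shows extra bumps, a sum of Gaussians fitted
to the logs would be the next thing to try.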

I haven't tested on past populations, so perhaps my sample size is
insufficient. Still, if the sample is enough to go on, that would
explain why Warren's exponential function worked well -- and
presumably, a log-normal fit would work better yet.