[EM] Some possibilities for less biased divisor methods

Fri Jul 20 10:56:38 PDT 2012

Even among the divisor methods, there are all sorts of possibilities.

BF is in the spirit of Hill. While the traditional divisor methods all
have a fixed rounding point, BF, like Hill, uses a formula, and claims
less bias. (But BF really is unbiased, in the sense of having the
average s/q equal to 1 in each interval, instead of seeking _expected_
s/q to be 1, based on a non-uniform probability-density.)

Kristofer used "p" to stand for the fractional part of q, and I'll
keep that convention.

The traditional methods have fixed p. If fixed p is desired, then it
could be the one that BF gives, for the lowest interval that is among
those that are expected to have equal s/q.  It could be the one that
BF gives for the 1 to 2 interval. That would put the fixed p at about
.47. For a fixed-p proposal, I'd propose p = .47.
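
The .47 figure can be checked with a short Python sketch. It assumes
that "average s/q equal to 1" means the uniform average of s/q over
the interval from a to b = a+1, with s = a below the rounding point R
and s = b above it. Under that assumption a*ln(R/a) + b*ln(b/R) = 1,
which gives R = b^b / (e * a^a). The function names are only
illustrative. Hill's rounding point (the geometric mean of a and b) is
printed alongside for comparison.

```python
import math

def bf_rounding_point(a):
    """Rounding point R in [a, a+1] making the uniform average of s/q equal 1.

    Assumes s = a for q < R and s = a+1 for q >= R, so that
    a*ln(R/a) + b*ln(b/R) = 1, which solves to R = b**b / (e * a**a).
    """
    b = a + 1
    return b**b / (math.e * a**a)

def hill_rounding_point(a):
    """Hill's rounding point: the geometric mean of a and a+1."""
    return math.sqrt(a * (a + 1))

def average_s_over_q(a, R):
    """Uniform average of s/q over [a, a+1] for rounding point R."""
    b = a + 1
    return a * math.log(R / a) + b * math.log(b / R)

# Print a, BF's p and Hill's p for the first few intervals.
for a in range(1, 6):
    print(a, round(bf_rounding_point(a) - a, 4),
          round(hill_rounding_point(a) - a, 4))
```

For the 1 to 2 interval this gives R = 4/e, about 1.4715, which is
where the p of about .47 comes from.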

The lower intervals, with lower q, are the ones in which changing the
rounding point has the greatest effect on s/q. So it's better to have
BF's accuracy in the 1-2 interval than in the higher ones.

For that matter, one could choose a fixed p that would give the least
maximum magnitude of bias, given some non-uniform probability density.

Kristofer described some complicated probability-density
approximations that have good justification from principles, and would
probably be more accurate. I'll call those "the complicated
functions". As I was saying, WBF integrates (s/q)F(q), with respect
to q, between the two consecutive integers a and b, where F(q) is the
probability-density function. I don't know if the integration and the
solution of the resulting equation for R (the rounding point) could be
done analytically, with an exact solution by formula, when the
complicated functions are used. Probably not, I would expect.

But it could be done numerically, and the solution for R could be done
numerically too--with a new numerical integration being used in each
iteration of the numerical equation-solving method, when solving for
R.
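
A minimal sketch of that nested numerical approach, in Python, might
look like the following. It uses Simpson's rule for the integration
and bisection for the equation-solving, which are just one possible
choice of methods. The density example_density is a made-up
placeholder, not one of the complicated functions, and all of the
function names are hypothetical.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    if hi == lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def expected_s_over_q(a, R, F):
    """Expected s/q in [a, a+1] under density F, with rounding point R."""
    b = a + 1
    num = (a * simpson(lambda q: F(q) / q, a, R)
           + b * simpson(lambda q: F(q) / q, R, b))
    return num / simpson(F, a, b)

def wbf_rounding_point(a, F, tol=1e-10):
    """Bisect on R until the expected s/q in [a, a+1] equals 1."""
    lo, hi = float(a), float(a + 1)
    # Expected s/q falls as R rises, since more of the interval rounds down.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_s_over_q(a, mid, F) > 1:
            lo = mid   # still too many seats on average: raise R
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical density, for illustration only: exponential decay in q.
def example_density(q):
    return math.exp(-q / 10.0)

print(wbf_rounding_point(1, example_density))
```

Each bisection step calls the integrator again, which is the "new
numerical integration in each iteration" described above. With a
constant F the result should agree with BF's rounding point.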

Or a simpler approximation of F could be used, to make possible a
numerical solution for R. A polynomial approximation of F would work,
for that purpose.

Say G(q) is the cumulative state-number function. Number the states,
starting with the smallest. Those numbers are the cumulative state
numbers for those states. G(q) is that number as a function of q.  q
is the quotient resulting from dividing a state's population by some
common divisor (used to calculate all of the states' q values).

Over some range of q, one could approximate G(q) with a polynomial. It
could be an interpolation, or a least-squares approximation based on a
larger range of q. Differentiating G(q) gives the probability density
function, F(q), implied by or consistent with G(q).
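
As a rough illustration, here is a Python sketch of the least-squares
version, using only the standard library. The quotients are made-up
numbers, the cubic degree is an arbitrary choice, and the helper names
are hypothetical. Note that differentiating G gives states per unit of
q, so dividing by the number of states would be needed to turn it into
a density that integrates to 1.

```python
def polyfit_least_squares(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def poly_derivative(coeffs):
    """Coefficients of the derivative of c[0] + c[1]*x + c[2]*x**2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# Hypothetical quotients q for ten states, smallest first (made-up numbers).
quotients = [1.2, 1.6, 2.1, 2.9, 3.8, 5.0, 6.7, 9.1, 12.4, 17.0]
state_numbers = list(range(1, len(quotients) + 1))  # G at each state's q

G = polyfit_least_squares(quotients, state_numbers, degree=3)
F = poly_derivative(G)  # unnormalized density implied by the fitted G
print(poly_eval(F, 2.5))
```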

With the complicated functions (if I understand what the complicated
functions look like), there are two points of inflection, and, there,
dF/dq has its greatest magnitudes. Maybe near those points are the
points where s/q differs from 1 by the greatest factor. Maybe that
occurs when abs(dF/dq)/q is the greatest.

If all one wants to do is compare the performance of low-bias
proposals, then those two points are the ones of interest. Finding s/q
at those two points, with some particular apportionment method,  is of
course a lot easier than solving for R.

One thing that one could do would be to find, by trial and error, what
constant p would give the least maximum factor by which the s/q, for
two (a,b) intervals somewhere in q's range, could differ.
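
The trial and error could be as simple as a grid search, sketched here
in Python. To keep it short, this sketch assumes a uniform density
within each interval, so the average s/q has the closed form
a*ln(R/a) + b*ln(b/R) with R = a + p. With a non-uniform F that closed
form would be replaced by a numerical integral. The range of intervals
and the step count are arbitrary assumptions, and the names are
hypothetical.

```python
import math

def average_s_over_q(a, p):
    """Uniform average of s/q over [a, a+1] with rounding point R = a + p."""
    b, R = a + 1, a + p
    return a * math.log(R / a) + b * math.log(b / R)

def max_factor(p, intervals):
    """Largest ratio between the average s/q of any two intervals."""
    values = [average_s_over_q(a, p) for a in intervals]
    return max(values) / min(values)

def best_constant_p(intervals, steps=1000):
    """Trial-and-error grid search over p in (0, 1)."""
    candidates = [k / steps for k in range(1, steps)]
    return min(candidates, key=lambda p: max_factor(p, intervals))

intervals = range(1, 51)  # assumed range of a; the real range depends on the data
p = best_constant_p(intervals)
print(p, max_factor(p, intervals))
```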

Also, one could compare BF, Hill, and Webster, to find which of those
would have the least maximum factor by which s/q could differ for two
(a,b) intervals, somewhere in q's range.
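
Such a comparison could be sketched in Python as below. The density is
a made-up placeholder rather than one of the complicated functions,
and the range of intervals is arbitrary. The BF rounding point is
taken to be R = b^b / (e * a^a), which is what "average s/q equal to 1
in each interval" gives under a uniform average. Hill uses the
geometric mean of a and b, and Webster uses a + .5. The integration is
a crude midpoint rule, so the printed factors are only good to a few
decimal places.

```python
import math

def expected_s_over_q(a, R, F, n=4000):
    """Midpoint-rule estimate of expected s/q on [a, a+1] under density F."""
    h = 1.0 / n
    num = den = 0.0
    for i in range(n):
        q = a + (i + 0.5) * h
        s = a if q < R else a + 1   # round down below R, up at or above it
        num += s / q * F(q)
        den += F(q)
    return num / den

ROUNDING_POINTS = {
    "BF": lambda a: (a + 1) ** (a + 1) / (math.e * a ** a),
    "Hill": lambda a: math.sqrt(a * (a + 1)),
    "Webster": lambda a: a + 0.5,
}

def max_factor(rounding_point, F, intervals):
    """Largest ratio between the expected s/q of any two intervals."""
    values = [expected_s_over_q(a, rounding_point(a), F) for a in intervals]
    return max(values) / min(values)

# Hypothetical density, for illustration only: exponential decay in q.
def example_density(q):
    return math.exp(-q / 10.0)

for name, rounding_point in ROUNDING_POINTS.items():
    print(name, max_factor(rounding_point, example_density, range(1, 31)))
```

Which method comes out best depends on the density that is plugged in,
so the numbers from this placeholder F shouldn't be read as a result.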

Or one could do WBF, by finding the R, for each a to b interval, that
makes the expected s/q equal to 1 in that interval. With the
complicated functions that would surely be a numerical problem, but it
could be analytically solved if F(q) is gotten by differentiating a
polynomial approximation of G(q).

My first apportionment proposal would be BF. If more simplicity were
desired, I'd suggest p = .47

If unbias is judged based on expected s/q in all of the intervals being
1, based on non-uniform F(q), then WBF would be the best. For more
simplicity, the other solutions described above could be proposed, the
ones that minimize the maximum factor by which expected s/q in two
intervals can differ, when p is some constant, or when one must choose
between BF, Hill and Webster.

But, though it isn't part of the subject of this posting, because it
isn't a divisor method, one could also propose minimizing, by trial
and error, the Pearson correlation between q and s/q among the states.
Or between total s and total q in the intervals. Correlation is a
familiar concept, and a simple principle to describe and propose, and
doesn't need a probability-density function approximation.
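
The correlation itself is easy to compute. Here is a minimal Python
sketch of the measure for the per-state version, with made-up
quotients and seat counts. It only evaluates the correlation for one
given allocation; how the trial and error would adjust the allocation
is left open, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def bias_correlation(quotients, seats):
    """Correlation between q and s/q across states; near 0 suggests little size bias."""
    ratios = [s / q for s, q in zip(seats, quotients)]
    return pearson(quotients, ratios)

# Made-up quotients and seat counts for six states, for illustration only.
quotients = [1.3, 2.6, 3.4, 5.8, 9.2, 14.7]
seats = [1, 3, 3, 6, 9, 15]
print(bias_correlation(quotients, seats))
```

A positive value would mean that s/q tends to rise with q, that is,
that larger states tend to be favored, and a negative value the
opposite.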

Of course these methods could be used for PR too.

Mike  Ossipoff