# [EM] s/q distance minimizations are incompatible with unbias. Some bias discussion.

MIKE OSSIPOFF nkklrp at hotmail.com
Fri Jan 12 23:34:23 PST 2007

```I've sent this message to a few individuals. My apologies if you receive it
individually and on the list.

Let me start with a conclusion, and then justify it later. Webster's and
Hill's minimizations of the distance of states' s/q from 1, or from
each other, are incompatible with unbias. If you choose those Hill or Webster
optimizations for single states or pairs of states, then you're also
choosing bias. Maybe some believe that those 1 and 2 state s/q distance
minimizations are more important than unbias. Hello? Would anyone want to
try to justify intentionally systematically giving small states more s/q
than large states? (or vice versa)

Maybe a good starting definition of bias is: "That which, in PR, would give
the smallest states incentive to coalesce, or give the largest states
incentive to split, in order to maximize their s/q".

When I found out that, with a flat state-size frequency distribution,
Webster is slightly large-biased, I posted about it to EM. That's when I
first proposed Bias-Free. I don't know if all of you are on EM, and so I
feel that I should repeat that discussion in this letter.

Say we graph Webster's s(q) step-function, with the horizontal axis, the q
scale, labeled in quotas, and the vertical scale labeled in seats. The
1-seat-per-quota line, the s=q line, of course rises at 45 degrees, from the
origin. At first glance, Webster looks unbiased, because its step-function
is perfectly symmetrical about the s=q line.  You can't get any closer to 1
seat per quota than that, right? But there's a problem. Let me define a few
terms:

A cycle is the range between two integer values of q. For example, between 4
quotas and 5 quotas. A cycle's lower section is the part below the rounding
point. Its upper section is the part above the rounding point.

Consider two corresponding points in a cycle's upper and lower sections--two
points equidistant from the rounding point (which in Webster is in the
middle of the cycle). Their seats differ equally and oppositely from what 1
seat per quota would give them. But, for the state in the lower section,
that represents a greater loss of s/q, because q is lower.

So the overall s/q in the cycle is less than 1. That problem is more
pronounced for low-population cycles, because q differs by a greater factor
in the lower and upper sections. If the state-size frequency distribution is
flat, then the lower cycles have less overall s/q.
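
As a rough check of that claim (an illustration, assuming states are spread
evenly over q within each cycle), the average s/q over the cycle from a to
b = a+1 under Webster is

    a*ln((a+0.5)/a) + b*ln(b/(a+0.5))

For the 1-to-2 cycle that is ln(1.5) + 2*ln(2/1.5), about 0.9808. For the
4-to-5 cycle it is 4*ln(4.5/4) + 5*ln(5/4.5), about 0.9979. Both are below
1, and the shortfall is larger in the lower cycle.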

One way to solve that: (Computer keyboards don't have "delta", and so I'm
going to use "D" to stand for finite differences). Sum Ds/q over a cycle.
Set it equal to zero, and solve for R, the rounding point. That gives
Bias-Free. Bias-Free's rounding point, between the consecutive integers a
and b, is ((b^b)/(a^a))(1/e). Bias-Free ensures that a cycle's overall s/q
is 1 (or as close to it as possible).

Cycle-Webster accomplishes the same thing by applying Webster to cycles.

Looking at Hill's s(q) step-function graph, Hill departs blatantly
asymmetrically from the s=q line, tending to be above it, more so for the
lower population cycles.

Webster's and Hill's rounding points differ from those of Bias-Free. (Hill's
differ by about twice as much as Webster's). Both the graphs and the
differences in the rounding points tell that Webster and Hill are biased.
Their s/q distance minimizations are incompatible with unbias, as I said
earlier.
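
For illustration, here are the rounding points for a few cycles, computed
from the Bias-Free formula above, Webster's arithmetic mean (a+b)/2, and
Hill's geometric mean sqrt(a*b):

    cycle    Hill     Bias-Free   Webster
    1 to 2   1.4142   1.4715      1.5
    2 to 3   2.4495   2.4832      2.5
    4 to 5   4.4721   4.4907      4.5

In each row Webster's point is a little above Bias-Free's, and Hill's is
about twice as far below it.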

When we speak of bias, isn't it understood that we're speaking of a tendency
that is consistent in its direction (favoring larger or smaller states)?
That we're speaking of something that has its effect even over great
population differences?

The trouble is that, if we measure bias as the correlation between states' q
and s/q, we aren't just looking at that long-range consistent trend. We're
also including a different kind of bias, within the cycles, a bias that
reverses itself within each cycle. I'll call that "micro-bias". In any
cycle, states above the rounding point have more s/q than states below the
rounding point. But, just looking at states below the rounding point (or
above it), s/q decreases with increasing q.

So I suggest that that intra-cycle micro-bias is not what we mean by bias.
It isn't even _part_ of what we mean by bias, because bias is a trend that
is consistent in its direction over long ranges of population.

Earlier I said that a good starting definition of bias is "That which, in
PR, would give the smallest states incentive to coalesce, or give the
largest states incentive to split, in order to maximize their s/q".
Micro-bias doesn't do that. Bias that's consistent in its direction over
long ranges of population does that.

Jefferson is strongly large-biased, but it's small-biased within each cycle.

If you agree with that starting definition, then you agree that bias should
be measured on the large scale, ignoring intra-cycle micro-bias. That cycles
should be the smallest units looked at to measure bias. And that
Cycle-Webster is the unbiased method. And Bias-Free, when the distribution
is flat.

I agree with Warren that, if we measure bias as _states'_ correlation of q
and s/q, then, with micro-bias in the mix, yes it would be very difficult to
say something theoretical about bias, and only empirical measurement can say
anything. But if we leave out micro-bias, looking only at large-scale bias,
on a scale no finer than cycles, then it becomes simpler, and theory can say
a few things.

A few things still remain to be found out by empirical testing, such as
whether our census' state-size frequency distribution, tending to cause some
large-bias, can save Hill from its small-bias. Well, every empirical result
I've heard of says "No". Even with the distribution's large-bias, Hill is
still much more biased than Webster.

I suggest that bias-testing should mean looking at the correlation of
_cycles'_ average q and their average s/q. I suggest that, as long as we
aren't calculating the probability of the correlation, the more sensitive
Pearson correlation should be used.

When looking at the correlation with respect to individual states, maybe
Spearman's rank correlation, by ignoring some detail, might ignore some
micro-bias, and that would be a good thing, suggesting that Spearman is
right for correlation measured with respect to individual states.

By the way, Cycle-Webster can have two versions. In one version, which I'll
call "Hare Cycle-Webster", the cycles defined according to the states' Hare
quotas remain the cycles that Webster is being applied to, throughout the
Webster process. So, since we're talking about Hare quotas, the cycles
continue to be the same as initially, and they contain the same states they
initially did, and have the same total quotas as they initially did. Of
course, when Webster is applied to the cycles, changing quotas are applied
to give the right house-size. The same iterative process that is used for
ordinary Webster (and Hill and Jefferson, etc.) can of course then be used
when Hare Cycle-Webster applies Webster to the cycles.

The alternative would be to make the cycles, and their states and their
total quota, be based on the current quota being used in the Webster
process. Much more work to handcount or program. Almost surely not
necessary.

So when I speak of Cycle-Webster, I mean Hare Cycle-Webster.

No doubt the 2 versions could give different results. That doesn't mean that
one is biased: Webster and Hamilton sometimes give different results, but
they don't differ in their longterm bias. Hamilton is more random. Then, is
one of the Cycle-Webster versions more random than the other? Maybe one
steady and one random? Well, all methods have an unavoidable random
component. The 2 Cycle-Webster versions could be equally random, and get
different random results, to the extent that they're random. That's how I
expect it is.

Mike Ossipoff
