[EM] s/q distance minimizations are incompatible with unbias. Some bias discussion.

Fri Jan 12 23:34:23 PST 2007

I've sent this message to a few individuals. My apologies if you receive it 
individually and on the list.

Let me start with a conclusion, and then justify it later. Webster's and 
Hill's minimizations of the distance of states' s/q (seats per quota) from 
1, or from each other, are incompatible with unbias. If you choose those 
Hill or Webster optimizations for single states or pairs of states, then 
you're also choosing bias. Maybe some believe that those 1- and 2-state s/q 
distance minimizations are more important than unbias. Hello? Would anyone 
want to try to justify intentionally and systematically giving small states 
more s/q than large states (or vice versa)?

Maybe a good starting definition of bias is: "That which, in PR, would give 
the smallest states incentive to coalesce, or give the largest states 
incentive to split, in order to maximize their s/q".

When I found out that, with a flat state-size frequency distribution, 
Webster is slightly large-biased, I posted about it to EM. That's when I 
first proposed Bias-Free. I don't know if all of you are on EM, and so I 
feel that I should repeat that discussion in this letter.

Say we graph Webster's s(q) step-function, with the horizontal scale, the q 
scale, labeled in quotas, and the vertical scale labeled in seats. The 
1-seat-per-quota line, the s=q line, of course rises at 45 degrees, from the 
origin. At first glance, Webster looks unbiased, because its step-function 
is perfectly symmetrical about the s=q line.  You can't get any closer to 1 
seat per quota than that, right? But there's a problem. Let me define a few 
terms:

A cycle is the range between two integer values of q. For example, between 4 
quotas and 5 quotas. A cycle's lower section is the part below the rounding 
point. Its upper section is the part above the rounding point.

Consider two corresponding points in a cycle's upper and lower sections--two 
points equidistant from the rounding point (which in Webster is in the 
middle of the cycle). Their seats differ equally and oppositely from what 1 
seat per quota would give them. But, for the state in the lower section, 
that represents a greater loss of s/q, because q is lower.

So the overall s/q in the cycle is less than 1. That problem is more 
pronounced for low-population cycles, because q differs by a greater factor 
in the lower and upper sections. If the state-size frequency distribution is 
flat, then the lower cycles have less overall s/q.
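
As a rough numerical check of that argument, here is a minimal Python 
sketch. It assumes q is spread evenly (flat distribution) across the cycle 
from a to a+1, and that states below the rounding point get a seats while 
states above it get a+1. The function name cycle_mean_sq is made up for this 
illustration.

```python
import math

def cycle_mean_sq(a, r):
    """Mean s/q over the cycle [a, a + 1] (a >= 1), with q spread evenly.

    States with q below the rounding point r get a seats; states above it
    get a + 1 seats. The cycle is 1 quota wide, so the mean is the integral
    of a/q from a to r plus the integral of (a + 1)/q from r to a + 1.
    """
    b = a + 1
    return a * math.log(r / a) + b * math.log(b / r)

# Webster rounds at the middle of each cycle.
for a in range(1, 6):
    print(a, round(cycle_mean_sq(a, a + 0.5), 5))
```

Under those assumptions the printed means are all below 1, and they climb 
toward 1 as a increases, which is the effect described above.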

One way to solve that: (Computer keyboards don't have "delta", and so I'm 
going to use "D" to stand for finite differences). Sum Ds/q over a cycle, 
where Ds is a state's seats minus its quotas, s - q. Set the sum equal to 
zero, and solve for R, the rounding point. That gives 
Bias-Free. Bias-Free's rounding point, between the consecutive integers a 
and b, is ((b^b)/(a^a))(1/e). Bias-Free ensures that a cycle's overall s/q 
is 1 (or as close to it as possible).
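
The formula can be checked numerically. The sketch below is only an 
illustration, assuming a flat distribution of q within the cycle and a >= 1; 
the function names are invented here. It computes the Bias-Free rounding 
point and then the cycle's mean s/q at that rounding point.

```python
import math

def bias_free_rounding_point(a):
    """Rounding point between a and b = a + 1: (b^b / a^a) * (1/e)."""
    b = a + 1
    return (b ** b) / (a ** a) / math.e

def cycle_mean_sq(a, r):
    """Mean s/q over [a, a + 1] with q spread evenly and rounding point r."""
    b = a + 1
    return a * math.log(r / a) + b * math.log(b / r)

for a in range(1, 6):
    r = bias_free_rounding_point(a)
    print(a, round(r, 4), round(cycle_mean_sq(a, r), 6))
```

With these assumptions the mean s/q comes out as 1 for every cycle, up to 
floating-point error.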

Cycle-Webster accomplishes the same thing by applying Webster to cycles 
instead of individual states.

Looking at Hill's s(q) step-function graph, Hill departs blatantly 
asymmetrically from the s=q line, tending to be above it, more so for the 
lower-population cycles.

Webster's and Hill's rounding points differ from those of Bias-Free. (Hill's 
differ by about twice as much as Webster's). Both the graphs and the 
differences in the rounding points tell that Webster and Hill are biased. 
Their s/q distance minimizations are incompatible with unbias, as I said 
earlier.
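
To put numbers on the rounding-point differences, here is a small sketch 
comparing the three methods. It uses the usual rounding points: the 
arithmetic mean a + 1/2 for Webster and the geometric mean sqrt(a*b) for 
Hill, along with the Bias-Free formula given above. The function name 
rounding_points is an assumption of this example.

```python
import math

def rounding_points(a):
    """Return (hill, bias_free, webster) rounding points between a and a + 1."""
    b = a + 1
    hill = math.sqrt(a * b)                     # geometric mean
    bias_free = (b ** b) / (a ** a) / math.e    # (b^b / a^a) * (1/e)
    webster = a + 0.5                           # arithmetic mean
    return hill, bias_free, webster

for a in range(1, 6):
    hill, bf, web = rounding_points(a)
    # How far Hill and Webster sit from Bias-Free, and the ratio of the two.
    print(a, round(bf - hill, 4), round(web - bf, 4),
          round((bf - hill) / (web - bf), 3))
```

For small a, Hill's rounding point falls below Bias-Free's and Webster's 
falls above it, with Hill's gap roughly twice Webster's.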

When we speak of bias, isn't it understood that we're speaking of a tendency 
that is consistent in its direction (favoring larger or smaller states)? 
That we're speaking of something that has its effect even over great 
population differences?

The trouble is that, if we measure bias as the correlation between states' q 
and s/q, we aren't just looking at that long-range consistent trend. We're 
also including a different kind of bias, within the cycles, a bias that 
reverses itself within each cycle. I'll call that "micro-bias". In any 
cycle, states above the rounding point have more s/q than states below the 
rounding point. But, just looking at states below the rounding point (or 
above it), s/q decreases with increasing q.

So I suggest that that intra-cycle micro-bias is not what we mean by bias. 
It isn't even _part_ of what we mean by bias, because bias is a trend that 
is consistent in its direction over long ranges of population.

Earlier I said that a good starting definition of bias is "That which, in 
PR, would give the smallest states incentive to coalesce, or give the 
largest states incentive to split, in order to maximize their s/q". 
Micro-bias doesn't do that. Bias that's consistent in its direction over 
long ranges of population does that.

Jefferson is strongly large-biased, but it's small-biased within each cycle.
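
A simplified sketch of that Jefferson statement follows. It assumes q is 
measured in units of the divisor actually used, so that Jefferson gives 
each state floor(q) seats, and it assumes a flat distribution of q within 
each cycle. Both function names are made up for this example.

```python
import math

def jefferson_sq(q):
    """s/q for one state under Jefferson, which rounds q down."""
    return math.floor(q) / q

def jefferson_cycle_mean(a):
    """Mean s/q over the cycle [a, a + 1] with q spread evenly: a * ln((a + 1) / a)."""
    return a * math.log((a + 1) / a)

# Within one cycle, s/q falls as q rises (small-biased micro-bias).
print([round(jefferson_sq(q), 3) for q in (2.1, 2.5, 2.9)])

# Across cycles, the mean s/q rises with population (large bias).
print([round(jefferson_cycle_mean(a), 3) for a in range(1, 6)])
```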

If you agree with that starting definition, then you agree that bias should 
be measured on the large scale, ignoring intra-cycle micro-bias. That cycles 
should be the smallest units looked at to measure bias. And that 
Cycle-Webster is the unbiased method. And Bias-Free, when the distribution 
is flat.

I agree with Warren that, if we measure bias as _states'_ correlation of q 
and s/q, then, with micro-bias in the mix, yes it would be very difficult to 
say something theoretical about bias, and only empirical measurement can say 
anything. But if we leave out micro-bias, looking only at large-scale bias, 
on a scale no finer than cycles, then it becomes simpler, and theory can say 
a few things.

A few things still remain to be found out by empirical testing, such as 
whether our census' state-size frequency distribution, tending to cause some 
large-bias, can save Hill from its small-bias. Well, every empirical result 
I've heard of says "No". Even with the distribution's large-bias, Hill is 
still much more biased than Webster.

I suggest that bias-testing should mean looking at the correlation of 
_cycles'_ average q and their average s/q. I suggest that, as long as we 
aren't calculating the probability of the correlation, the more sensitive 
Pearson correlation should be used.
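
One possible way to compute such a cycle-level measure is sketched below. 
Several things here are assumptions rather than anything fixed by the 
proposal: a cycle's "average s/q" is taken as the plain mean of its states' 
s/q, cycles are keyed by floor(q), and the function names and the sample 
data are hypothetical. At least two cycles with differing averages are 
needed, or the correlation is undefined.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def cycle_bias_correlation(states):
    """Correlate cycles' average q with their average s/q.

    states is an iterable of (q, s) pairs: quotas and seats for each state.
    """
    cycles = defaultdict(list)
    for q, s in states:
        cycles[math.floor(q)].append((q, s))
    avg_q, avg_sq = [], []
    for a in sorted(cycles):
        members = cycles[a]
        avg_q.append(sum(q for q, _ in members) / len(members))
        avg_sq.append(sum(s / q for q, s in members) / len(members))
    return pearson(avg_q, avg_sq)

# Hypothetical data: evenly spaced q values, with seats rounded down.
sample = [(a + k / 10 + 0.05, a) for a in range(1, 6) for k in range(10)]
print(round(cycle_bias_correlation(sample), 3))
```

A positive result indicates large-bias on the cycle scale and a negative 
result indicates small-bias; the rounded-down sample data gives a strongly 
positive value.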

When looking at the correlation with respect to individual states, maybe 
Spearman's rank correlation, by ignoring some detail, might ignore some 
micro-bias, and that would be a good thing, suggesting that Spearman is 
right for correlation measured with respect to individual states.

By the way, Cycle-Webster can have two versions. In one version, which I'll 
call "Hare Cycle-Webster", the cycles defined according to the states' Hare 
quotas remain the cycles that Webster is being applied to, throughout the 
Webster process. So, since we're talking about Hare quotas, the cycles 
continue to be the same as initially, and they contain the same states they 
initially did, and have the same total quotas as they initially did. Of 
course, when Webster is applied to the cycles, changing quotas are applied 
to give the right house size. The same iterative process that is used for 
ordinary Webster (and Hill and Jefferson, etc.) can of course then be used 
when Hare Cycle-Webster applies Webster to the cycles.
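
The cycle-level step might be sketched as below. This is only one reading 
of the description, not a definitive implementation: states are grouped by 
the integer part of their Hare quota, and Webster is applied to each 
cycle's total population. Instead of adjusting a quota iteratively, the 
sketch uses the highest-averages (Sainte-Lague) form of Webster, which 
gives the same seat counts apart from ties. It stops at seats per cycle; 
how a cycle's seats are divided among its states is not covered here. The 
function names and the populations are hypothetical.

```python
import math
from collections import defaultdict

def webster(weights, seats):
    """Webster by highest averages: each seat goes to the largest w / (2s + 1)."""
    alloc = {k: 0 for k in weights}
    for _ in range(seats):
        winner = max(weights, key=lambda k: weights[k] / (2 * alloc[k] + 1))
        alloc[winner] += 1
    return alloc

def hare_cycle_webster(populations, house_size):
    """Seats per cycle, with cycles fixed by the states' Hare quotas."""
    hare = sum(populations.values()) / house_size   # population per seat
    cycle_pop = defaultdict(float)
    for state, pop in populations.items():
        cycle_pop[math.floor(pop / hare)] += pop    # cycle a holds a <= q < a + 1
    return webster(dict(cycle_pop), house_size)

# Hypothetical populations, for illustration only.
pops = {"A": 150, "B": 160, "C": 340, "D": 350}
print(hare_cycle_webster(pops, 10))
```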

The alternative would be to make the cycles, and their states and their 
total quota, be based on the current quota being used in the Webster 
process. Much more work to handcount or program. Almost surely not 
necessary.

So when I speak of Cycle-Webster, I mean Hare Cycle-Webster.

No doubt the 2 versions could give different results. That doesn't mean that 
one is biased: Webster and Hamilton sometimes give different results, but 
they don't differ in their long-term bias. Hamilton is more random. Then, is 
one of the Cycle-Webster versions more random than the other? Maybe one 
steady and one random? Well, all methods have an unavoidable random 
component. The 2 Cycle-Webster versions could be equally random, and get 
different random results, to the extent that they're random. That's how I 
expect it is.

Mike Ossipoff
