[EM] Re: redistricting - objectively

Sat Jan 8 10:22:24 PST 2005

Hi Andrew,

On Jan 8, 2005, at 5:58 AM, Andrew Myers wrote:
> Brian Olson replies:
>>
>> This sounds like one of the parts of a classic set of problems in
>> Computer Science that are like cryptography in that a solution is
>> hard to find, but easy to verify.
> Actually, it sounds to me like a minimum graph cut problem. There are
> efficient network flow algorithms for solving these problems.

Hmm, I'd be interested in hearing that.  My gut reaction is that those 
algorithms work if you're optimizing a single variable that can be 
calculated locally (e.g., compactness), but not if you want to optimize 
multiple variables simultaneously.  I'd love to be proven wrong, 
though.

However, I think Brian makes an excellent point:  the most important 
question is what the *goal* of redistricting is.  If we can come up 
with objective criteria for measuring the 'goodness' of a map, then 
it really doesn't matter how it is drawn or by whom. That would 
effectively decouple the political issues from the technical ones.

In the debate (generally, not just here) I've heard a range of 
criteria/concerns that people want to maximize:

1. Population Equality (P)
2. Community Connectedness (C)
3. Geographic Compactness (G)
4. Consistency with existing Districts (D)

Is that a fairly complete list?   I think these all have some (though 
not equal) validity.   The question then becomes:
	a) How to measure each of these
	b) How to normalize them in a scale-independent way
	c) How to weight them relative to each other

I think it is best to (a) define these in terms of quantities to be 
minimized, and (b) scale each of them to be of order 1:

	P[i] = population of a district / average population - 1
	C[i] = (lanes of traffic cut by circumference + modifiers*)/normalization**
	G[i] = moment of inertia / inertia of a disk of the same area
	D[i] = 1 - fraction of overlap with the largest existing district
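
To make P and G concrete, here is a minimal Python sketch.  It assumes 
each district is approximated by a set of equal-sized square grid cells 
(one unit on a side), which is purely an illustrative choice on my part; 
the function names are hypothetical as well.

```python
import math

def population_deviation(populations):
    """P[i] = population of district i / average population - 1."""
    avg = sum(populations) / len(populations)
    return [p / avg - 1 for p in populations]

def compactness(cells, cell_area=1.0):
    """G = polar moment of inertia about the centroid, divided by that
    of a disk with the same area.

    `cells` is a list of (x, y) centres of equal square cells.  This
    gridded representation is an assumption, not part of the proposal;
    coordinates must be in units where each cell has area `cell_area`.
    """
    n = len(cells)
    cx = sum(x for x, _ in cells) / n
    cy = sum(y for _, y in cells) / n
    # Parallel-axis theorem: each square cell contributes its own polar
    # moment (a^2 / 6) plus a * (distance to the district centroid)^2.
    j = sum(cell_area ** 2 / 6 + cell_area * ((x - cx) ** 2 + (y - cy) ** 2)
            for x, y in cells)
    area = n * cell_area
    # A disk of area A has polar moment A^2 / (2 * pi).
    j_disk = area ** 2 / (2 * math.pi)
    return j / j_disk

square = [(x, y) for x in range(4) for y in range(4)]
strip = [(x, 0) for x in range(16)]
print(population_deviation([90, 100, 110]))
print(compactness(square), compactness(strip))
```

By this measure a disk scores exactly 1, a square scores pi/3 (about 
1.05), and a long thin strip scores much higher, so smaller is more 
compact.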

The hard one is C, both politically and technically.  One of the chief 
gripes against brute-force equal-sized districts is that they don't 
respect existing political and ethnic associations (admittedly a touchy 
subject).  Traffic lanes (L) are a "reasonable" proxy for city and 
county connectivity, but such measures don't always respect ethnic and 
civic boundaries.  Should they?  That's a political question, not a 
technical one.  One could imagine something like:

	C[i] = w1*L[same city] + w2*L[same demographics]
	     + w3*L[same county] + w4*L[other]

Again, that's not my preference, but if that's the political mandate 
it's best to embed it explicitly here where it can be factored in ahead 
of time, rather than leave it as a court challenge for later (perhaps 
duplicitous) use.
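
As a rough illustration of the weighted form of C[i], here is a short 
Python sketch.  The category names mirror the formula above, but the 
weight values and lane counts are made up for the example; choosing 
real weights is exactly the political question.

```python
def weighted_cut_lanes(cut_lanes, weights):
    """Sum w_k * L_k over the categories of lanes cut by the boundary.

    cut_lanes: dict mapping category -> number of lanes cut
    weights:   dict mapping category -> weight w_k
    """
    return sum(weights[k] * lanes for k, lanes in cut_lanes.items())

# Hypothetical weights: cutting a road inside one city costs the most.
example_weights = {
    "same city": 1.0,
    "same demographics": 0.8,
    "same county": 0.5,
    "other": 0.2,
}

# Hypothetical lane counts for one district boundary.
example_cuts = {
    "same city": 10,
    "same demographics": 5,
    "same county": 4,
    "other": 20,
}

print(weighted_cut_lanes(example_cuts, example_weights))
```

This is only the numerator of C[i]; it still needs a normalization 
before it can be compared with the other factors.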

The harder challenge is actually the technical one: how do you 
normalize C?   The others are all scale independent -- they should be 
similarly sized no matter how big or urbanized a district is.  But, while 
C[i] can easily be compared to other C[j], how do you compare it to the 
other factors?

This is crucial because the ultimate goal is to come up with a single 
figure of merit Q, defined as the weighted sum of the other parameters:

	Q = w_p * |P| + w_c *|C| + w_g * |G|  + w_d *|D|

where |P| would probably combine the range (Pmax - Pmin) and the 
standard deviation (sigma) of the P[i], and is thus always positive.   
Assuming appropriate normalizations, I would propose a weighting along 
the lines of:
	w_p: 40%
	w_c: 30%
	w_g: 20%
	w_d: 10%

But again, those are political decisions; the goal is to set them first 
based on objective criteria, rather than a posteriori to justify a 
politically-motivated outcome.
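
For concreteness, Q with the weights proposed above could be computed 
along these lines in Python.  How the range and standard deviation are 
combined into a single |P| is left open above; simply adding them, as 
`spread` does here, is my assumption for the sketch, and both function 
names are hypothetical.

```python
import statistics

# Weights proposed in the text; they are political choices, not derived.
WEIGHTS = {"P": 0.40, "C": 0.30, "G": 0.20, "D": 0.10}

def spread(values):
    """One possible |X| for per-district values: range plus population
    standard deviation.  Adding the two is an assumption."""
    return (max(values) - min(values)) + statistics.pstdev(values)

def figure_of_merit(p, c, g, d, weights=WEIGHTS):
    """Q = w_p*|P| + w_c*|C| + w_g*|G| + w_d*|D|; lower is better.

    p, c, g, d are the already-aggregated, normalized map-level values.
    """
    return (weights["P"] * p + weights["C"] * c
            + weights["G"] * g + weights["D"] * d)

# Hypothetical per-district population deviations for a three-district map.
p_abs = spread([-0.1, 0.0, 0.1])
print(figure_of_merit(p_abs, c=1.0, g=1.0, d=1.0))
```

Because each term is meant to be of order 1, the weights can be read 
directly as the relative importance of each criterion.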

The point is that we need a scale-independent way to define C, so that 
it doesn't overwhelm (or underwhelm) the other factors.  How can we do 
that?

Well, one option is to normalize by the total number of roads (R) 
within the district.   That is,
	C = L / sqrt(R)

The square root is there because the number of roads (presumably 
defined as named streets, which is not perfect but probably good 
enough) goes roughly as the area (r^2), whereas the number of cut lanes 
goes as the circumference (r).
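
The scaling argument can be checked numerically with a toy model in 
Python.  The model assumes a circular district with a uniform density 
of roads per unit area and a uniform number of lanes crossing each unit 
of boundary; both densities are invented numbers, and real districts 
will of course be messier.

```python
import math

def normalized_connectivity(lanes_cut, roads):
    """C = L / sqrt(R)."""
    return lanes_cut / math.sqrt(roads)

def toy_district(radius, road_density=50.0, lanes_per_unit_boundary=3.0):
    """Return (L, R) for a circular district in the toy model.

    road_density and lanes_per_unit_boundary are hypothetical values.
    """
    lanes_cut = lanes_per_unit_boundary * 2 * math.pi * radius  # grows as r
    roads = road_density * math.pi * radius ** 2                # grows as r^2
    return lanes_cut, roads

for r in (1.0, 5.0, 25.0):
    lanes, roads = toy_district(r)
    print(r, round(normalized_connectivity(lanes, roads), 4))
```

In this model C comes out the same for every radius, which is the 
scale independence we are after; it still depends on the road density, 
so urban and rural districts would not necessarily score alike.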

One would need to try this against actual data to ensure the parameters 
were appropriately tuned, but the number of them is small enough that it 
should be infeasible to choose them in a way that strongly favors one 
group over another (at least for reasonable weights: w_d =100% would of 
course be silly).     We could perhaps calibrate against something like 
Iowa (with w_d=0) to see how well it does against what's widely 
considered non-partisan redistricting.

Thoughts?

-- Ernie P.