[EM] Re: Compactness

Tue Jan 20 19:35:02 PST 2004

Hi Joe,

On Jan 20, 2004, at 5:30 PM, Joe Weinstein wrote:
> However, after posting the message I started wondering whether I got 
> carried away on my point (3) as to supposed advantage of DC over BC in 
> computation.  So I’ve since tried to subject raw intuition to a bit of 
> precision and reality check.  Having done so to a point, perhaps not 
> enough, I must revise what I implied, and go halfway to your position.

Thanks for being big enough to revise your views based on the evidence 
(not something to be taken for granted in this forum :-).  I think I'm 
starting to appreciate the virtue of your position, as well.

> Very likely, in your context, BC is easier to handle than DC, or 
> anyhow is no harder to handle.  It sounds like you have a good 
> algorithm, and for good reasons. ....
> In this latter case especially, it would pay to pre-process the 
> specification of plan P into convenient supplementary tables.  In 
> particular, one could prepare a list L of 5-tuples, one for each of 
> the at most 3T boundaries.  Each 5-tuple would list not only the three 
> data in the boundary’s triple, but also the two atoms’ current 
> district IDs.  Then, a small change in plan P would require but small 
> changes in L, and an easy pass through L would readily sum the 
> boundary values for different-district atom pairs - or, even faster, 
> would get the complementary sum of all values for same-district pairs.

Actually, I think there's a really easy trick to make it easy to 
calculate boundary changes.   To find the effect of transferring node A 
from District X to District Y, we can just calculate A(Y) - A(X) (that 
is, A's edge-weight to nodes in Y vs. A's edge-weight to other nodes in 
X).  With an object-oriented approach, both node A and node B would 
carry the edge-weight from A to B, so its easy to look it up.

> On the other hand (details below) there seems to be just about equally 
> good news from the DC side too, at least if DC uses squared distance.  
> ... But there is good news from the DC side too, if the compactness 
> criterion be to minimize the average SQUARED distance between 
> same-district atom pairs.  This criterion amounts to minimizing 
> average within-district atom-centroid squared distance, alias average 
> district (moment of) inertia.  It turns out that one can readily 
> account for small changes in districts by storing and updating for 
> each district the X- and Y- components both of its centroid and of its 
> inertia.  (Thanks to Pythagoras, the X- and Y- inertias sum to the 
> total inertia.)  For those who want a named theorem behind the 
> technique, it’s the  Parallel Axis Theorem.  Years ago I saw a 
> programmer’s writeup of the one-dimensional analog: for the sake of 
> scratch-memory conservation, he calculated the mean and variance of a 
> sequence of read-in numbers, by updating simultaneously the running 
> mean and running variance.

That's a very good point.  The use of moment of inertia as the measure 
of compactness (if I understand you correctly) seems like a good 
metric.  So, we can treat each atom as if its entire population (w) was 
concentrated at a single point p =  (x,y) (ideally the centroid, though 
if its a small enough region the geographical center will do fine). 
Thus, for each district we maintain these vector sums:
	P = sum(p) # centroid
          W = sum(w) # mass
	PW = sum(p*w) # moment
	P2W = sum(p^2 * w) # variance
I believe this gives us:
	C (center of mass) = PW / W
so the moment of inertia M would be:
	M = sum (w * (p-C)^2) = sum (w*(p^2 - 2*p*C + C^2))
	= P2W - 2 W * P *C + W*C^2

Is that pretty much what you had in mind?   That implies that if adding 
node A (p_A, w_A) to District Y doesn't significantly change the mass W 
or Center of Mass C, then:
	dM_Y = (P2W' - P2W')  - 2 (P'-P) *C * W = w*p^2 - 2 p * C * W

Does that seem right?

The only thing I'm not sure about is how to define a simple 
standard-deviation of M to measure the impact of a given change (like 
sqrt(N) would for a population N), but hopefully I can figure something 
out.

> Anyhow, Ernie, I do not doubt that you have an excellent algorithm for 
> updating the BC effect of small changes in the districting plan, and I 
> thank you for your corrective!

Thank you.  That seems simple enough; I'll see if I can add those 
calculations into my program.   Ultimately I'd like it to be a runtime 
option which to use, since the 'right' measure is ultimately  more of a 
political one than a technical one.

Cheers,
- Ernie P.