[EM] A computationally feasible method (algorithmic redistricting)
Brian Olson
bql at bolson.org
Wed Sep 3 05:57:19 PDT 2008
On Sep 3, 2008, at 5:32 AM, Raph Frank wrote:
> On 9/3/08, Brian Olson <bql at bolson.org> wrote:
>> Anyway, what my pseudo-voronoi solver does is this:
>>
>> Initialization:
>> • Assign district centers to random points (block
>> coordinates).
>
>> • Assign census blocks to nearest district center.
>>
>> Repeat until done:
>> • Move each district center towards the center of population
>> of the blocks actually assigned to it.
>
> Is this guaranteed to improve the district quality?
>
> In fact, do you only make an update if that improves quality by your
> measure?
I think it's not guaranteed to be an improvement because the
simultaneous update of all the other district centers and weights
could wind up creating overshoots or other destructive interference. I
always make the update. It may actually not be terribly well thought
out beyond 'seems like an intuitively good idea' and 'seems to work
pretty well'.
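In sketch form, the center-move step looks something like this (Python
for illustration rather than my actual code; whether to jump all the
way to the centroid or only part way is a tunable detail, so the
'step' fraction here is illustrative):

    # Sketch of the center-move step: shift each district center toward
    # the population-weighted centroid of its currently assigned blocks.
    # step=1.0 would jump straight to the centroid.
    def move_centers(centers, blocks, assignment, step=0.5):
        n = len(centers)
        sums = [[0.0, 0.0, 0.0] for _ in range(n)]  # sum of x*pop, y*pop, pop
        for (x, y, pop), d in zip(blocks, assignment):
            sums[d][0] += x * pop
            sums[d][1] += y * pop
            sums[d][2] += pop
        new_centers = []
        for (cx, cy), (sx, sy, p) in zip(centers, sums):
            if p > 0:
                gx, gy = sx / p, sy / p  # population centroid
                new_centers.append((cx + step * (gx - cx),
                                    cy + step * (gy - cy)))
            else:
                new_centers.append((cx, cy))  # empty district: leave it
        return new_centers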
>
>> • Adjust district weight, nudging weight up or down if there
>> are too few or too many people assigned to it.
>
> Do you do this until it completely converges, or just 1 nudge?
One nudge per cycle through all the "Repeat until done" stages. 10,000
to 20,000 cycles are needed to settle, depending on the state.
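A sketch of the nudge step (the 1.01/0.99 factors are the ones I
describe below; the direction follows from the distance-times-weight
cost, where a larger weight shrinks a district):

    # Sketch of one weight nudge per cycle.  With cost = d^2 * weight,
    # raising a district's weight sheds blocks; lowering it attracts them.
    def nudge_weights(weights, district_pops, target_pop):
        for d, pop in enumerate(district_pops):
            if pop > target_pop:
                weights[d] *= 1.01   # overpopulated: push blocks away
            elif pop < target_pop:
                weights[d] *= 0.99   # underpopulated: pull blocks in
        return weights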
>> • (optional) Nudge district centers away from underpopulated
>> district centers and towards overpopulated district centers.
>> • For each census block, assign census block to the district
>> with
>> the lowest ((distance to block from district center) * weight)
>
> You would get straight edges if you used
>
> lowest( (distance^2) - weight )
>
> This also saves you having to do a square-root, though I guess your
> method could be implemented as
>
> lowest( (distance^2)*(weight)^2 )
>
> This would speed up the calculations as there would be no need for
> finding the square root of the distance, while still giving the same
> result.
I checked my code and I'm not doing the expensive square root. It's
not quite the second though, it's actually:
((dx*dx + dy*dy) * weight)
The weight gets nudged by multiplying by 1.01 or 0.99. Whether the
weight is squared, and exactly how it's nudged, is probably just an
efficiency issue and not a correctness issue; it should still converge
to the weight it needs, just maybe along some suboptimal curve.
I think a multiplicative weight makes more sense: no chance of
underflow, and it might scale better between districts with different
densities.
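For the record, the two formulas differ geometrically: with
(distance^2) - weight, equating two districts' costs cancels the
squared terms and leaves a straight-line boundary (a power diagram),
while (distance^2) * weight gives circular-arc boundaries (a
multiplicatively weighted Voronoi diagram). Either way no square root
is needed in the inner loop, roughly like this sketch (illustrative,
not my actual code):

    # Sketch of the block-assignment step using squared distance;
    # each block goes to the district minimizing d^2 * weight.
    def assign_blocks(blocks, centers, weights):
        assignment = []
        for (bx, by, _pop) in blocks:
            best_d, best_cost = -1, float('inf')
            for d, (cx, cy) in enumerate(centers):
                dx, dy = bx - cx, by - cy
                cost = (dx * dx + dy * dy) * weights[d]
                if cost < best_cost:
                    best_d, best_cost = d, cost
            assignment.append(best_d)
        return assignment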
>
>> • Fixup district contiguity (because straight-line-nearest
>> can jump
>> across water and other gaps in odd ways)
>
> I think water counts as part of a district, so technically that isn't
> a breach of contiguity.
>
> What could be an issue is if a State has a concave boundary.
>
> Btw, do you parse the block geography file to work out which blocks
> border each other?
Yes, the block shape file determines adjacency. They handily label
every line segment with which block is on either side of it, so I can
pretty quickly filter that to keep a list of all the pairings I've seen.
Of course I do also wind up going back and rendering that geometry to
an actual map.
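Roughly, the filtering and the contiguity check it enables look like
this (the (left_block, right_block) pairs here stand in for the
labeled segments in the real shape file):

    from collections import defaultdict, deque

    # Build block adjacency from segments labeled with the block on each
    # side, then count the connected pieces of one district; more than
    # one piece means a contiguity fixup is needed.  (Sketch only.)
    def build_adjacency(segments):
        neighbors = defaultdict(set)
        for left, right in segments:
            if left is not None and right is not None and left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)
        return neighbors

    def district_pieces(district_blocks, neighbors):
        unvisited = set(district_blocks)
        pieces = []
        while unvisited:
            queue = deque([unvisited.pop()])
            piece = set(queue)
            while queue:  # BFS within this district's blocks
                b = queue.popleft()
                for nb in neighbors[b]:
                    if nb in unvisited:
                        unvisited.discard(nb)
                        piece.add(nb)
                        queue.append(nb)
            pieces.append(piece)
        return pieces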
>> (That's now enshrined here:
>> http://code.google.com/p/redistricter/wiki/NearestNeighborSolver
>> )
>>
>> Oh, and when it's done (an hour or two for most states), start it
>> over
>> again with a different random starting state because it may have
>> gotten
>> stuck in a local optimum.
>
> My splitline software was able to do all the States in one evening.
> That is the benefit of a non-search algorithm.
I can get decent mappings for 90% of the states in a couple of days. CA
and TX are tough: lots of data and lots of districts to balance.
There are some tunable parameters in the search process that I should
try adjusting specifically for solving those states.
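The restart loop itself is trivial; in this sketch, solve() and its
lower-is-better quality score are stand-in names, not my program's
real interface:

    # Sketch of the restart strategy: rerun the solver from different
    # random starting states and keep the best map found.
    def best_of_restarts(state_data, tries=10):
        best_map, best_score = None, float('inf')
        for seed in range(tries):
            district_map, score = solve(state_data, seed=seed)
            if score < best_score:
                best_map, best_score = district_map, score
        return best_map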
Brian Olson
http://bolson.org/