[EM] Redistricting, now with racial demographics

Sun Jul 19 14:37:43 PDT 2009

Brian Olson wrote:
> I've updated my redistricting site ( http://bolson.org/dist/ ) to 
> include the racial breakdown of all current congressional districts 
> (sometimes interesting by itself) and that of the compactness based 
> districts I have come up with. If you want you can jump directly to 
> http://bolson.org/dist/ XX where XX is any US state abbreviation.

As I've mentioned before, I've been doing something like this, but out 
of no other purpose than abstract interest, and applied to the world at 
large.

Thus, I think I can give some ideas as to how to improve it. The first 
and most immediate one is to use K-means++ clustering. For my own 
simulator, this immediately gave an order of magnitude standard 
deviation improvement.
See http://en.wikipedia.org/wiki/K-means%2B%2B for that. Basically, 
K-means++ picks the first center randomly. Then it picks the next center 
with a probability for each point proportional to the distance from the 
closest existing center, times a power (originally squared, but my 
program uses 0.25). Set the power to whatever gives the best results. 
The simplest way of picking a new center is to just record the distance 
to closest centers (you have to do that anyway to draw the map), then 
use roulette selection for the probabilities.

Second, by recording distances in such a manner, further optimization is 
possible. When adding a new district, you can simply grow outwards from 
its centroid (center point) without having to recalculate all the other 
districts. This kind of heap optimization enables my "global 
redistricter" to work with distances based on elevation deltas (a rough 
approximation to travel time). Of course, if you're using plain 
Euclidean distance, you can just use Fortune's plane sweep algorithm to 
calculate the Voronoi diagrams.

After all the district centers have been picked, you can't use K-means++ 
any more. Revert to ordinary K-means clustering at that point (or 
whatever other clustering you may want). For my program, this means that 
the K-means++ initial phase can be quicker than the subsequent 
move-centroid-to-population-center K-means phase, so you might also try 
running the initial phase for various different random seeds and then 
only picking the most promising to go onto the next phase.

Another refinement I've been considering, but I haven't implemented, is 
a way to split districts internally. Usually, most districts 
("countries" in my case) are about even, but some have much lower 
population and some have much greater. Remove one of the lower 
population districts and use the spare point to split a high population 
district. One way of doing that would be to turn a point (x, y) 
(centroid of a populous district) into (x+r, y+r) and (x-r, y-r) for 
some small value r, then letting k-means clustering draw the centroids 
apart in the right direction.
The split might affect other districts and the direction of the r 
component might be wrong, but it might also turn out well (and my manual 
emulation of this does simply because the large districts are so large 
that it distorts the others less than the split gains).