[EM] Redistricting, now with racial demographics
Kristofer Munsterhjelm
km-elmet at broadpark.no
Sun Jul 19 14:37:43 PDT 2009
Brian Olson wrote:
> I've updated my redistricting site ( http://bolson.org/dist/ ) to
> include the racial breakdown of all current congressional districts
> (sometimes interesting by itself) and that of the compactness based
> districts I have come up with. If you want you can jump directly to
> http://bolson.org/dist/ XX where XX is any US state abbreviation.
As I've mentioned before, I've been doing something like this, but out
of no other purpose than abstract interest, and applied to the world at
large.
Thus, I think I can give some ideas as to how to improve it. The first
and most immediate one is to use K-means++ clustering. For my own
simulator, this immediately gave an order of magnitude standard
deviation improvement.
See http://en.wikipedia.org/wiki/K-means%2B%2B for that. Basically,
K-means++ picks the first center randomly. Then it picks the next center
with a probability for each point proportional to the distance from the
closest existing center, times a power (originally squared, but my
program uses 0.25). Set the power to whatever gives the best results.
The simplest way of picking a new center is to just record the distance
to closest centers (you have to do that anyway to draw the map), then
use roulette selection for the probabilities.
Second, by recording distances in such a manner, further optimization is
possible. When adding a new district, you can simply grow outwards from
its centroid (center point) without having to recalculate all the other
districts. This kind of heap optimization enables my "global
redistricter" to work with distances based on elevation deltas (a rough
approximation to travel time). Of course, if you're using plain
Euclidean distance, you can just use Fortune's plane sweep algorithm to
calculate the Voronoi diagrams.
After all the district centers have been picked, you can't use K-means++
any more. Revert to ordinary K-means clustering at that point (or
whatever other clustering you may want). For my program, this means that
the K-means++ initial phase can be quicker than the subsequent
move-centroid-to-population-center K-means phase, so you might also try
running the initial phase for various different random seeds and then
only picking the most promising to go onto the next phase.
Another refinement I've been considering, but I haven't implemented, is
a way to split districts internally. Usually, most districts
("countries" in my case) are about even, but some have much lower
population and some have much greater. Remove one of the lower
population districts and use the spare point to split a high population
district. One way of doing that would be to turn a point (x, y)
(centroid of a populous district) into (x+r, y+r) and (x-r, y-r) for
some small value r, then letting k-means clustering draw the centroids
apart in the right direction.
The split might affect other districts and the direction of the r
component might be wrong, but it might also turn out well (and my manual
emulation of this does simply because the large districts are so large
that it distorts the others less than the split gains).
More information about the Election-Methods
mailing list