[EM] Measures of Compactness
Kristofer Munsterhjelm
km-elmet at munsterhjelm.no
Fri Sep 26 13:20:12 PDT 2025
On 2025-09-22 01:15, Andrew B Jennings (elections) via Election-Methods
wrote:
> I'm a supporter of defining compactness and then using a districting map
> that optimizes it. Brian Olson (bdistricting.com/about.html
> <http://bdistricting.com/about.html>) has always advocated that:
>
> > The best district map is the one where people have the lowest average
> distance to the center of their district.
>
> But what about using the square of the distance? Mathematically, it
> makes so many square roots disappear that the problem might be much more
> tractable.
I have the impression that better approximation algorithms are known for
capacitated k-medians than capacitated k-means, but k-means is more
commonly used because it's easier to deal with.[1]
See e.g. https://arxiv.org/pdf/2208.14129; and
https://arxiv.org/pdf/1812.07721 for k-median in the 2D Euclidean space.
> It also seems that minimizing the sum of squares of distances from each
> voter to the (population-weighted) centroid of their district and
> minimizing the sum of squares of distances between each pair of voters
> in the same district is practically the same problem (at least on the
> plane). That seems nice.
>
> I have some friends who actually want to propose redistricting our city
> mathematically. Is it worth using the square of the distance?
>
> I have done some maps minimizing the sum of the distances (and squares
> of distances) between all pairs of voters in the same district. They
> look pretty similar, at least for my city.
I tinkered with redistricting a little over half a year ago:
https://github.com/kristomu/redistricter
I don't think I found much of a change between the different forms, so I
just kept to k-means. (What *doesn't* work is to minimize the unweighted
distance subject to a population constraint - that just cracks cities.)
My experience was that finding a reasonably good solution is not all
that hard, but finding the best is extremely hard (and you won't know if
you found it). In addition there are tons of annoying real-world data
problems like census blocks with holes in them and whose borders produce
wiggly district borders seemingly no matter how you cut them. Compare
e.g.
https://github.com/kristomu/redistricter/blob/master/examples/california.png
and
https://github.com/kristomu/redistricter/blob/master/examples/california_straight_lines.png.
There are some nice postprocessing algorithms, though. It's often
possible to start with some assignment of centers and then use local
optimization to make their populations equal - by adding some penalty to
the center-population distances of districts that have too many people
in it compared to its size. That's how I got the difference of most
densely and least densely populated districts down to about 0.005% of
the total population for the California and Colorado assignments.[2]
The subgradient method of Janáček and Gábrišová
(https://www.tandfonline.com/doi/pdf/10.3846/1648-4142.2009.24.274-282
section 3.1) is a good example of such an algorithm, though I couldn't
get their exact method to converge every time.
-km
[1] It might also be the traditional approach, e.g. this paper from 1965
uses squared distances: https://doi.org/10.1287/opre.13.6.998
[2] I wonder if a variant of Loyd's algorithm could be used to turn such
weighted assignments to unweighted ones. I was thinking of trying that
out, but my code got too complicated and I left it.
More information about the Election-Methods
mailing list