[EM] Measures of Compactness

Fri Sep 26 13:20:12 PDT 2025

On 2025-09-22 01:15, Andrew B Jennings (elections) via Election-Methods 
wrote:
> I'm a supporter of defining compactness and then using a districting map 
> that optimizes it. Brian Olson (bdistricting.com/about.html 
> <http://bdistricting.com/about.html>) has always advocated that:
> 
>  > The best district map is the one where people have the lowest average 
> distance to the center of their district.
> 
> But what about using the square of the distance? Mathematically, it 
> makes so many square roots disappear that the problem might be much more 
> tractable.

I have the impression that better approximation algorithms are known for 
capacitated k-medians than capacitated k-means, but k-means is more 
commonly used because it's easier to deal with.[1]

See e.g. https://arxiv.org/pdf/2208.14129; and 
https://arxiv.org/pdf/1812.07721 for k-median in the 2D Euclidean space.

> It also seems that minimizing the sum of squares of distances from each 
> voter to the (population-weighted) centroid of their district and 
> minimizing the sum of squares of distances between each pair of voters 
> in the same district is practically the same problem (at least on the 
> plane). That seems nice.
> 
> I have some friends who actually want to propose redistricting our city 
> mathematically. Is it worth using the square of the distance?
> 
> I have done some maps minimizing the sum of the distances (and squares 
> of distances) between all pairs of voters in the same district. They 
> look pretty similar, at least for my city.

I tinkered with redistricting a little over half a year ago: 
https://github.com/kristomu/redistricter

I don't think I found much of a change between the different forms, so I 
just kept to k-means. (What *doesn't* work is to minimize the unweighted 
distance subject to a population constraint - that just cracks cities.)

My experience was that finding a reasonably good solution is not all 
that hard, but finding the best is extremely hard (and you won't know if 
you found it). In addition there are tons of annoying real-world data 
problems like census blocks with holes in them and whose borders produce 
wiggly district borders seemingly no matter how you cut them. Compare 
e.g. 
https://github.com/kristomu/redistricter/blob/master/examples/california.png 
and 
https://github.com/kristomu/redistricter/blob/master/examples/california_straight_lines.png.

There are some nice postprocessing algorithms, though. It's often 
possible to start with some assignment of centers and then use local 
optimization to make their populations equal - by adding some penalty to 
the center-population distances of districts that have too many people 
in it compared to its size. That's how I got the difference of most 
densely and least densely populated districts down to about 0.005% of 
the total population for the California and Colorado assignments.[2]

The subgradient method of Janáček and Gábrišová 
(https://www.tandfonline.com/doi/pdf/10.3846/1648-4142.2009.24.274-282 
section 3.1) is a good example of such an algorithm, though I couldn't 
get their exact method to converge every time.

-km

[1] It might also be the traditional approach, e.g. this paper from 1965 
uses squared distances: https://doi.org/10.1287/opre.13.6.998

[2] I wonder if a variant of Loyd's algorithm could be used to turn such 
weighted assignments to unweighted ones. I was thinking of trying that 
out, but my code got too complicated and I left it.