[EM] Impractical districting idea, complexity

Thu Jun 9 14:39:33 PDT 2022

A private post by Forest Simmons made me realize that finding the proper
boundaries for districts given their centers is the same problem as
finding the assignment of voters to candidates in Monroe's method for a
particular winning assembly.

Monroe (the cardinal version) works like this: Each voter rates each
candidate. The quality of some given winning assembly is the sum of
ratings that each voter gives to his assigned winning candidate, under
the following constraints:
	- each candidate must be assigned the same number of voters
	- this rating sum should be as high as possible.

So we can imagine a bipartite graph with all the voters on the left, the
winning candidates on the right, and we want to pick one edge from each
voter to one of the candidate to maximize the sum of the edge weights
(the ratings), under the constraints given.

But the districting problem is just the same, except we're minimizing
instead of maximizing: let the "rating" of voter x for "winner" y be the
squared distance from person x to center y. We want to assign people to
districts so that the sum of squared distances is minimized, under the
constraint that each district must have the same number of inhabitants.

Since minimization and maximization is equally hard, that means that all
the algorithm results Warren give on
https://rangevoting.org/MonroeMW.html apply to the districting problem.

(Furthermore, the construction doesn't need the metric to obey any sort
of nice property. So if you'd like to make the distance from point x to
y some estimate of the time it would take to walk from x to y, then you
can get naturally-shaped districts... if so desired. Maybe not so
practical for political redistricting, but perhaps if you want to draw
county maps for a fictional country?)

The reduction is better than using linear programming... but it doesn't
help.

If there are k districts and the state population is p, then there are
(p+k) nodes in the graph and m=p*k edges. The complexity of the
assignment algorithm is just about O(p^2k + p^2 log p). The p^2 is the
killer. Suppose we have a population of a million, then p^2 = 1e12. Ouch.

Other stuff/ideas:
	https://www.bowaggoner.com/courses/2019/csci5454/docs/approx.pdf gives
the approximation factor for *maximum* matching with the simple greedy
algorithm of always admitting the person who's closest to some center,
to that center, until everybody's assigned. (But this doesn't have
capacity constraints? I'm not sure?)
	https://people.cs.vt.edu/~sharathr/as_match.pdf gives approximate
matching with metric costs, but there's still an n^2 (i.e. p^2) term to
its complexity.
	Sampling would seem to be the obvious solution (just look at 1000
people or so), but as I understand it, the population constraints on
districting are exact: districts are not allowed to have even the
smallest difference in populations, which excludes approaches based on
randomness.

-km