[EM] A reduction for a generalization of median ratings/MJ

Tue Dec 2 04:03:17 PST 2014

Consider an election with rated ballots, v voters, and c candidates, and 
let r_{a, b} be the ath voter's rating of the bth candidate. 
Furthermore, let p_1...p_n be an enumeration of the candidates and 
voters, and say p_a "is a candidate" if it corresponds to a candidate, 
and correspondingly for "is a voter".

Define a "distance" d_xy (from x to y) as
	infinite if p_x is a candidate or both of p_x and p_y are voters,
	-r_{a, b} otherwise (if p_x is the ath voter and p_y is the bth candidate)

Now, the following generalization of MJ springs to mind:

Given some integer k >= 1, pick a set S of k points so that the sum of 
distances from all points not in S to their closest point in S is minimized.

(If k = 1, S consists of the point corresponding to the candidate with 
greatest median rating.)

Then elect the candidates corresponding to the k points picked, and give 
each of them weight equal to the number of voters who are closer to him 
than to any other. (Tiebreaks are left as a detail.)

But the problem above is just the k-median problem! ( 
https://www.cs.cmu.edu/~anupamg/adv-approx/lecture4.pdf )

It's NP-hard, at least for continuous distances[1]. Exact solvers don't 
seem to blow up that horribly, though. There's an exact solver with 
complexity O(n^sqrt(k)) for a minmax variant of the problem ( 
https://en.wikipedia.org/wiki/Facility_location_problem#Minimax_facility_location 
). Perhaps one could make a similar complexity solver for the minsum 
variant (i.e. the one above). If so, it would be manageable for 
realistic numbers of candidates or parties.

I also imagine one could use greedy approximations to the problem to 
solve it in polytime. The k-median problem is solved much more easily if 
d gives a metric[2], but it should be possible to reduce to set cover 
even if not and use the greedy (simple) algorithm for set cover there. 
These approaches would, like Tideman is to Kemeny, be methods of their 
own, and would have to be analyzed thus. Finally, the minmax variant 
also reduces to MJ (absent tiebreaks) in the 1-seat case, and so could 
possibly be used.

---

What would that give us? Applying it directly would give a weighted 
candidate method, and one that doesn't seem to have the first preference 
bias of methods based on IRV. The approximation methods might have 
biases of their own, of course.

Applying it to party list also seems pretty obvious at first. Say the 
voters rate/grade parties the way one would candidates in MJ. Then, to 
solve, just set k very large (or solve the uncapacitated facility 
location problem, which has no such limit k) and get the party weights. 
Then feed those to Webster's algorithm to get seat counts for each party.

However, that could leave some parties with zero seats. One way of 
handling this would be by elimination of those parties and running the 
method again. That's the IRV approach, and probably would lead to or 
increase nonmonotonicity. Another approach is more Bucklinesque. Start 
with k=1 and increase until one of the parties get zero seats. Pick the 
greatest k that gives every party at least one seat, and go with that. 
It's not perfect, but probably the best one can do short of an integer 
program (which would be rather complex)[3]. Furthermore, one would have 
to prove that "early factionalist lock-in" wouldn't happen. Say that 
there are a few voters that give their party max and every other party 
min. Then perhaps these would, due to their uncompromising nature, would 
get the fringe party a nonzero weight at low k. In that case, the party 
would get zero seats, but in doing so, would bar other minor parties 
that could appear later on. But perhaps this can't happen because the 
fringe party is always given support after less fringe parties.

In any case, the Bucklin approach would appear to handle an LCR 
equivalent the way we'd expect it to, e.g.:

46: L(Excellent, 3) C(Good, 2) R(Poor, 1)
43: R(Excellent, 3) C(Good, 2) L(Poor, 1)
11: C(Excellent, 3) R(Good, 2) L(Poor, 1)

If there's only one seat, then k=1 will give C one seat, but k=2 will 
give R and L less than a seat each, so k=1 is chosen and C wins. But 
with two seats, k=2 gives R and L one seat each, as desired.

For an ordinary (unweighted) candidate multiwinner method, one would 
need a capacitated version of the k-median problem. Set k = number of 
seats and the max capacity to ceil((num voters)/(num seats)). Then, by 
pigeonhole, k candidates will be closer to one quota's worth of weight 
than to any other number of quotas' worth, and every other candidate 
will have weight zero.

Capacitated k-median could also be used to set an upper limit to the 
weight of any candidate in a weighted method, e.g. a 10% limit mentioned 
elsewhere.

---

To conclude, though, there are many approximation algorithms for the 
k-median problem. Some of these might be simple enough, fast enough, and 
close enough to the exact result that they could be used. They could 
then be used for weighted candidate methods, or might be combined with 
the Bucklin approach to create a party list method that, although not 
perfect, would be considerably better than plurality-based party list 
PR. (Whether it is possible depends on whether the factionalist lock-in 
scenario is a real problem.)

---

[1] I'd imagine it to be NP-hard for discrete ones, too, by reduction 
from set cover.

[2] At first glance, the distance seems to obey the triangle inequality, 
simply because of the infinities. To make d fully metric, we would need 
to relax it to the following:
	d_xy is
		infinite if both of p_x and p_y are voters or candidates,
		-r{a, b} otherwise (if p_x is the ath voter and p_y the bth candidate 
or vice versa)
But that raises the possibility that the method would pick from the 
voters rather than the candidates. This could, I think, be avoided by 
adding a large number of "phantom voters" that vote every candidate 
equally so that v >> c.

[3] The problem is that ineffective votes can happen here as well, due 
to how the quantization is not connected to the rest of the method. Say 
that a bunch of left parties get slightly short of two seats each, and a 
major right-wing party takes the remaining seats. Had the left voters 
known this, they could have moved the excess from each party into some 
other left-wing party and taken at least one seat from the right-wing; 
and so, we'd like the method to do that automatically. But the above 
won't do so because the weighted vote stage has no idea of how the 
quantization by rounding works. (Statistical Condorcet will do the right 
thing, though in a ranked setting; and until I've optimized it, it's 
intractable for large party list elections.)