[EM] A multiwinner election method based on Kemeny

Mon Oct 5 12:12:20 PDT 2009

I continue to tinker with vector quantization, and in doing so, I've
found a multiwinner version of Kemeny. Some of the ideas for it, I've
mentioned before, but it might still be useful.

As with Kemeny, we define the Kemeny distance metric between two orders 
to be the sum of pairs where the orders' preferences differ: that is, 
for all pairs of candidates listed, add one to the distance if one of 
the orders rank the first candidate ahead of the other whereas the other 
does not, or vice versa.
To phrase it differently, consider a tournament matrix generated from 
each order. The matrix for A vs B is 1 if A beats B, -1 if B beats A, 
and 0 if they're equally ranked. Then the Kemeny distance is equal to 
the number of differing fields in the orders' tournament matrices.

The single-winner Kemeny rule proceeds by finding the ranking (social 
ordering) that is the geometric median of the orders given in the 
supplied ballots, with the metric in question being Kemeny distance.

Here's where the vector quantization idea comes into play: for 
multiwinner Kemeny with k winners, find k points (orders) so that the 
sum of the distances between each ballot and its closest point (order) 
is minimized.

Left to itself, the top ranked candidates in these n "winner orders" 
should provide a good council. However, there is one constraint that 
would have to be imposed: a single member can't be assigned more than 
one seat. Thus, the top rank of each of these orders must be unique: no 
candidate must be ranked top on more than one winner order.

-

To rephrase that, the multiwinner Kemeny rule is:

For some ranking p and q, define dist(p, q) equal to the Kemeny distance 
between p and q.

When electing k members from n candidates using i ballot rankings (b_1 
... b_i), a winner ranking set W is a vector of k rankings so that no 
candidate is ranked top on more than one of these k rankings.

Define cdist(p, W) as the distance, according to dist, from p to the 
member in W that minimizes the result.

Then determine W, subject to the rank constraint above, so that
  sum(x = 1..i) cdist(b_x, W)

is minimized.

The winners are those first ranked on the rankings that constitute W.

-

This method makes no mention of quotas, nor is it an "elect and punish" 
method (except indirectly, in that a ballot only counts against its 
closest winner ranking, not all of them). Yet, it seems to be fairly 
proportional, and it retains some of the Condorcet qualities of Kemeny 
(as I'll show in an example below).

On the other hand, it is very hard to determine the actual winners. 
Since a winner rank can be any of the n! possible rankings (I have not 
tried this with equal-rank), it's even worse than combinatorial methods 
like Schulze STV. Perhaps integer programming could be used to speed up 
the multiwinner method, since that has been used to find Kemeny winners 
more quickly.

Also, since Kemeny is a special case of this multiwinner method (with 
k=1), it inherits the flaws of Kemeny: vulnerability to clones, for 
instance. Furthermore, like many PR methods, it's not summable (at least 
not that I can see, because it's unclear ahead of time which the members 
of W are going to be).

I don't know if it retains reinforcement. Also, it resists vote
management for the example on Wikipedia's Schulze STV page, but since it 
can't strictly satisfy the DPC without an appropriate tiebreaker, it 
can't be weakly invulnerable to vote management either.
I'm not sure if it satisfies DPC in the weak sense that winner vectors 
satisfying the DPC will always either win or be tied at top.

Let's show how it works by a (2,3) election using the PSC-CLE example
where the Condorcet winner is far from those included in the STV winner 
set.

For the sake of computational efficiency, the reported distances are 
half of the actual distances (since if p has A > B and q has B > A, 
dist(p,q) would report 2 - one for p's A>B value not matching q's A>B 
value, and one for p's B>A value not matching q's B>A value, whereas one 
can make the check quicker by only checking A>B in this case).

33 A>D>B>C
33 B>D>A>C
32 C>D>A>B
2  D>A>B>C

The Droop quota is 100/3, 33.33^, so no single candidate is supported by
more than a Droop quota. STV elects {A, B}, and the CW is D.

For the multiwinner Kemeny rule above, W is
	a. C>D>A>B
	b. D>A>B>C
with sum of distances 99, because:

  33: A>D>B>C is closest to b. with a dist. of 1, and 33 * 1 = 33
  33: B>D>A>C is closest to b. with a dist. of 2, and 33 * 2 = 66
  32: C>D>A>B is closest to a. with a dist. of 0, and 32 * 0 =  0
   2: D>A>B>C is closest to b. with a dist. of 0, and  2 * 0 =  0

Thus the winners are {C, D}. One might object that A is better than C,
because C is ranked last on more ballots; however, this disagrees with
the B>D>A>C and C>D>A>B voters. If W is:
	a. A>D>B>C
	b. D>C>A>B
then
  33: A>D>B>C is closest to a. with 33 * 0 =  0
  33: B>D>A>C is closest to a. with 33 * 3 = 99
  32: C>D>A>B is closest to b. with 32 * 1 = 32
   2: D>A>B>C is closest to a. with  2 * 1 =  2

and the sum is 133, which is greater than 99.

Note that if we force A above the Droop quota by doing
	34 A>D>B>C
	33 B>D>A>C
	32 C>D>A>B
	1  D>A>B>C
then we have a tie between AC (A>D>B>C C>D>A>B) and CD (C>D>A>B D>A>B>C) 
with sum distance 100. Forcing it further to
	35 A>D>B>C
	32 B>D>A>C
	32 C>D>A>B
	1  D>A>B>C
makes AC win (sum distance 97).