[EM] CFC-Young, a more time-complexity friendly clustering multiwinner method

Mon Feb 21 03:25:02 PST 2011

Since mentioning CFC-Kemeny, I've been thinking about whether it's 
possible to make a CFC-method with a more viable runtime. I think I have 
found one, but let's define my terminology first.

CFC means "continuous forced clustering". The idea of these methods is 
like Monroe's method, in that each voter can split his vote among 
different "clusters", each corresponding to a seat. The voter doesn't 
actually do so, but the method does so as to maximize an objective 
function. There are usually three constraints:

- Each voter has a fixed (and equal) amount of power that he may spread 
among the clusters, so that no voter has more power than another.
- Each cluster must have the same amount of power allocated to it, in 
total, so that the result is proportional.
- Each cluster must elect a different candidate.

CFC-Kemeny works by assigning a ranking to each cluster so that the top 
rank is different for each, then calculating the point score values 
(Kendall-tau distance) for each voter with respect to his vote and the 
cluster's ranking. Any given assignment of a voter's power gives a score 
for that voter, which is the sum of scores of his ranking with respect 
to each cluster's ranking, weighted by the amount of power he allocates 
to the cluster. Then the method finds the power assignment that 
maximizes the sum of voters' scores (minimizes distance).
That takes the form of an assignment problem, and means that a voter who 
votes A>... will probably allocate most of his power to the A-cluster, 
unless that would leave someone else unable to allocate power there 
because of the proportionality constraint.

There is one problem with CFC-Kemeny. It is extremely slow. While each 
instance's objective score (sum of scores across voters) can be 
calculated in polynomial time, there are somewhere between (n-1)! and n! 
possible rankings per cluster, so this is a combinatorial method where 
each seat has an exponential number of choices. No wonder, then, that it 
doesn't work for more than a few candidates and seats.

I have been trying to find other Condorcet methods to use than Kemeny, 
but most are very hard to express in linear programming and leave few 
variables left for weighting-by-power. For instance, it is possible to 
find the minmax winner by linear programming, but only by optimizing a 
variable, and multiplying one variable-to-optimize (the voter's power to 
a given cluster) with another (the minmax objective variable) is not 
permitted under linear programming.

However, I think I have found a method that would work. It is a 
linear-programming relaxation of the Young method (not Kemeny-Young, 
simply Young). The Young method is this: the candidate where you have to 
remove the least number of voters to have him become the CW, is the 
winner. In integer programming, it is expressed like this:

c*'s Young score is:
   max sum over non-negative values x[i], given that
           for A being any and every candidate except c*,
               sum p = 1...i
                        x[p] * e[p][c*][A]
               > 0

for voters 1..i, x[i] is integer (either 0 if the voter is not part or 1 
if he is), and where e[i][c*][A] is 1 if the ith voter ranks c* above A, 
otherwise 0.

The candidate with the greatest Young score wins.

This works because the "must be greater than zero" constraint implies 
that more voters have c* above A than A above c* no matter who A is. 
Thus the program maximizes the sum of x[i] across all i, i.e. the number 
of voters who are still being counted, subject to that c* is the CW.

In the integer programming variety, one deals with discrete voters, but 
it's not such a stretch to consider each voter a collection of 
"mini-voters", so that it's possible to have ballots of any weight. 
Doing so turns the problem polytime, too, since we can use linear 
instead of integer programming.

The clustering variant of this, is:

Assign the council to check as candidates c[1]...c[k] for k seats. Then,

Council c[1]...c[k]'s Young score is
   max sum over non-negative values x[i][k], given that

   for each C = 1...k
       for A[C] being any and every candidate except c[C],
         sum p = 1...i
                x[p][C] * e[p][c[C]][A]
         > 0

   for each p = 1...i
        sum C = 1...k
          x[p][C]
         <= 1

   for each C = 1...k, D = 1...k except C
         sum p = 1...i
              x[p][C] - x[p][D]
         = 0

(end of linear program)

As before, the council with the maximal Young score wins.

Here, x[a][b] is the ath voter's power given to cluster b. Unlike the 
original method, x[a][b] can be continuous given that 0 <= x[a][b] <= 1. 
The first for-each (really C constraints) implements Young for each 
cluster, i.e. that the given candidate must be the CW in his cluster. 
The second for-each (p constraints) sets the voter proportionality 
criterion - the voter can't contribute more than one point of power in 
total. The third for-each sets the per-cluster proportionality 
constraint, that each cluster has an equal amount of power in total.

It might work. I'm a bit unsure about the third for-each, though, 
because Young is not monotone as an optimization function. That is, if 
you have something like Dodgson (minimal number of changes needed to 
turn the candidate into the CW), guessing at a too high value does no 
harm. If 3 changes can turn A into a CW, so can 4 or 10. But that is not 
the case for Young. It may be the case that four voters can make A into 
a CW, but that doesn't mean that three - or five - can. So CFC-Young 
should work if leaving the third for-each out still keeps 
proportionality. If not, who knows? I'd have to implement it.