[EM] Feature extraction and criteria for multiwinner elections

Fri Jan 2 02:38:55 PST 2009

One way of making multiwinner elections proportional is to have the 
method pass certain criteria. Most obvious of these are Droop 
proportionality, which is the multiwinner analog to mutual majority. 
However, such criteria can only say what the method should do, in 
certain cases, not what it does in all cases. This is like Condorcet - a 
Condorcet method elects a CW whenever there is one, but it says nothing 
about what happens when there isn't a CW.

So, if we're going to make better multiwinner methods, I think we need a 
way that's applicable to all situations, so that we can apply the same 
rule consistently. Another approach would be to try to devise criteria 
that cover increasingly more of the situations, but I'll get to that later.

If we're going to have a rule or metric for a good multiwinner election, 
what should it be? Well, the intuitive nature of proportional 
representation is that if there are some factions, and those factions 
all vote accordingly to their preferences, then those factions should be 
represented according to their numbers. This suggests some way of 
feature extraction: find underlying points in issue space, or the 
composition of issue space itself; then pick the candidates that most 
accurately represent this configuration.

That, in a sense, is what my test system does to find good multiwinner 
systems: it constructs n issues (of varying support among the people), 
then randomly assigns agrees and disagrees for each issue to each voter 
(and candidate) so that the support reaches the fraction in question. 
For instance, for an issue set so that 70% are in favor, a random 
fraction of 0.7 of the people (voters and candidates) have that issue in 
favor as well.

It may be possible to use this idea in reverse: construct some model of 
how voters weight disagreements, then assign n bits (for some n) so that 
the RMSE between their ballot scores and the predicted ballot scores 
(based on assigned issues to voters and candidates) is minimized. For a 
binary issue profile, in my simulations, I used simple "Hamming 
agreement". That is, if a voter and a candidate agree on five issues out 
of ten, the voter gives five points of ten (or 2.5 of 5 or whatnot). 
Obviously, this lends itself much more to cardinal than to ordinal 
ballots. What's important is not whether voters actually do this, but 
whether it's a good model - that is, whether the RMSE can be made very 
low by doing this. If voters vote based on personal appeal or something 
like that, and that can be modeled as three or four virtual "issues", 
then no great loss.

The good thing about binary issues is that once we have reconstructed 
them, it's simple (well, in the NP sense) to ensure proportionality. 
Simply pick the set of candidates so that the difference between the 
fractions supporting each issue, and those fractions for the entire 
people, is minimized according to some error (probably should be the 
Sainte-Lague metric, Gini, or RMSE).

What's not so easy is to assign the issues in the first place. The 
formal problem becomes something like: "define an issue matrix, n_i * 
(voters + candidates), where n_i[issue][person] is either 0 or 1; 
further define a model scoring function f(voter, cand) = SUM(k = 
1..num_issues, (1 - |n_i[k][voter] - n_i[k][cand]|)); now, given a 
voters*candidates score matrix q, and an integer p > 0, populate an 
issue array of p*(voter+candidates) so that the RMSE of the matrix, 
where difference at a point is defined as (f(voter, cand) - 
q[voter][cand]), is minimized". The decision version of the problem is, 
"is there any way of constructing such a binary issue matrix so that the 
RMSE (or some other error) is below a certain value?". I think the 
decision version is in NP, so at worst, the problem is NP-complete, but 
what's worse is that if it is, it's in not just the number of 
candidates, but also in the number of voters.

So that might be too hard. The idea seems sound, though; construct an 
opinion space and then pick proportionally from it. Could we use other 
feature extraction methods? There is one such function/method that can 
be done in polytime, namely SVD - singular value decomposition. It's 
used in, among other things, predicting movie ratings by most entries to 
the Netflix contest. However, though I have tinkered a bit with SVD, I 
haven't found any way of translating its result into issue space, or 
getting good results in my simulations by any such SVD-driven selection. 
The two ways I've tried have been to pick candidates so that the sum of 
each row is the same for the candidates and for the population at large, 
and also one so that a histogram over the candidates (or rather, a KDE, 
but it's roughly the same) is similar to one over the people, for all 
"issues". Neither seems to give much better results than random. Do any 
of you have ideas as to how to use SVD for this purpose? I'm no expert 
in statistics.

The binary issue model might also be expanded into a continuous value 
model, where each "issue" is no longer a yes/no but a point along a 
line. The right way of reproducing that issue space would probably be, 
as above, to use a KDE to synthesize a probability distribution and then 
check the similarity of that for some candidate set against that for the 
people; but if we can't construct binary issue models, we probably can't 
construct continuous value ones either, unless there's some 
"discontinuity makes it more difficult" case analogous to the difference 
between linear and integer programming.

-

I said I was going to mention some criteria that would cover more than 
Droop proportionality. I'm going to use the shorthand (k,n) for an 
election that picks a council of k out of n candidates. Some ideas are:

Condorcet: If the election is (1, n) and there's a CW, it should be elected.

Reverse Condorcet: If the election is (n-1, n) and there's a Condorcet 
loser, all but the Condorcet loser should be elected.

Fragmented Condorcet: If the election is (k, n), and there's a way of 
dividing the ballots into k piles so that each of those piles have a CW, 
and all k CWs are different, then those CWs should be elected. If there 
are multiple such partitions, the method passes if it elects according 
to one of them.

The idea of Condorcet is rather simple; if the multiwinner method's any 
good, it should be a good single-winner method in the (1,n) election 
case as well.

Reverse Condorcet is inspired by something I observed in Raph Frank's 
2-of-3 multiwinner diagrams. If you assign the three candidates 
according to primary colors (that is, red, green, and blue), then you 
get a diagram with Voronoi-type shapes in composite colors. See this, 
for instance: http://munsterhjelm.no/km/composite_multiwinner.png , 
which was made by overlaying "Optimal Utility (PAV)" from 
http://ivnryan.com/ping_yee/triang_10000_2.html

Fragmented Condorcet is something I'm more uncertain about, but it can 
be traced to two ideas. The first one is that of PR being like elections 
by faction: if there are two factions, one can imagine each faction 
having a separate CW, in which case, if the factions are of equal size, 
those two are those that should be elected. The other idea is that of 
how PR methods have to fail house monotonicity. Say you have something like
30: Left > Center > Right
30: Right > Center > Left
  4: Center > Left > Right
When electing one, that one should be Center to be Condorcet. When 
electing two, it should be Left and Right. This can be done by splitting 
like this:

15: Left > Center > Right
  2: Center > Left > Right

15: Right > Center > Left
  2: Center > Left > Right

which gives {Left, Right}, W5. By similar reasoning, if there's a direct 
democracy and everybdoy votes for themselves first, and the council's 
the same size as the people (direct democracy) then everybody is in it.

However, the Left-Center-Right example may also suggest that Fragmented 
Condorcet isn't desirable at all, since it splits the Center voters, and 
thus is merely a way of doing automatic gerrymandering.

What are your opinions on the criteria in general?