[EM] Proportional representation methods with very many candidates

Tue Dec 3 04:49:00 PST 2013

Recently, I've been fiddling a bit with recommender systems, and I have 
noticed what is often a problem with these systems: they recommend very 
many things of the same sort.

That is, if it detects that a person likes action movies, it may 
recommend a whole slate of them. This may even be correct in the respect 
that the user would like any randomly picked action movie ranked highly 
by the system more than the non-action movies that come later. But if he 
were to watch all of the movies from most recommended to least, he would 
grow tired very quickly.

I noted that in political terms, this is similar to the cloning problem 
you get when using a good majoritarian method (say, Condorcet) to fill a 
council. If the majority-preferred candidate has many clones, then all 
of that candidate's clones will fill the council, because the method 
(correctly) judges that the majority prefers each clone to the other 
candidates.

How can we deal with that in a political context? We can do so by using 
PR. And this suggests that one could also do something similar in a 
recommender system context.

That is, consider a cosine distance system: "you're likely to like what 
others that liked what you did also like". Then each user has a weighted 
vote corresponding to the similarity of sets (how much he likes what you 
likes), and the recommender system works by running a weighted Approval 
or Range election.

Which brings me to the point of my post (I tend to write long 
introductions!). In such systems you can easily have thousands or 
millions of "candidates". Which method should we choose if we want a PR 
election in order to provide diversity? The sheer number poses problems 
for many of the known systems.

For one, because there are so many "candidates", most of them will be 
tied in a Plurality elimination stage like the one in STV. So 
elimination will be more or less random at first, until one gets down to 
sufficiently few candidates that the votes can accumulate. So ordinary 
STV doesn't seem to be all that good.

So how about the virtual contest or combinatorial methods that avoid 
this problem? Both virtual contest methods (like Schulze STV or CPO-STV) 
and combinatorial methods (like PAV) don't scale well enough. And 
sequential ones (PRV, sequential PAV, etc) fail setwise election criteria.

Thus my question: Can you think of any method that would work well in 
this setting?

The only ones I can think of at the moment are STV-ME and a quadratic 
STV method I don't remember the details of. The former, by its 
"elimination election" by Condorcet among the losers, should have a 
greater chance of telling different zero-Plurality-votes candidates 
apart. And the quadratic STV method was a patch to STV that would let 
one use any positional method possible while still retaining the quota 
guarantees of STV. Again, that would make it easier to tell losers apart.

There is of course the compromise solution of just using sequential PAV 
or proportional Range voting. In "worse-is-better" style, it wouldn't 
care very much about theoretical guarantees as long as the method is 
good enough. These sequential methods could also be brought closer to 
the combinatorial ones by considering chunks of k candidates at a time, 
with the greatest k that is still practical for the available processing 
power.