[EM] Forward from Warren Smith

Sat Feb 23 08:55:00 PST 2002

To voting-theory people...

A new paper on voting methods just came out, with a rather different
angle than usual (1. written by computer, not political, scientists; 2.
oriented toward task of ranking web pages by search engines, i.e. 100000
candidates, 20 voters, versus usual political scenario of 10 candidates,
100000 voters; 3. rather different notions of "good quality"... want the top 10
web pages to have a good chance of containing the one you want and not "spam"...).
It might be interesting to try to take their voting methods and put them into
my computer program that compares different voting systems, and see what happens.

Cynthia Dwork  Ravi Kumar  Moni Naor  D. Sivakumar:
Rank Aggregation Methods for the Web,
http://www10.org/cdrom/papers/577/ 

They do not give pseudocode for their algorithms... 
They have 3 main ideas (#1 and #2 are competing ideas, and they ultimately prefer #2):
1. use of "min-weight matching in a complete bipartite graph"
to find the permutation "best approximating" the vote-permutations
2. construction of various Markov chains on the candidates, and the 
rank-ordering output by the voting system then corresponds to the stationary
distribution of the Markov chain. 
3. "local Kemenyization" to "improve" the rank-ordering output by any voting
system.
[See, e.g. their proposition 4 and their Markov chains MC1,MC2,MC3,MC4.]
All these ideas are much more sophisticated algorithmically than the
usual voting algorithms, but still are polynomial time.

After experiments they conclude "Of all the methods... MC4 outperforms all others...
[based on their human judgement of how the web pages "should" have been ranked]
beats Borda by a huge margin... Local Kemenyization seems to improve [everything tried]
by about 1-3%..."  UnKemenyized MC4 seems to be the following:
  A. consider the following Markov chain. If at a candidate P, pick a candidate Q
     at random: if Q>P according to a majority of voters, go to Q, else stay at P.
  B. find the stationary-limit distribution of this Markov chain [this can be
     done by finding the Perron-Frobenius eigenvector of a certain CxC matrix if there
     are C candidates] and rank candidates according to their (decreasing) probabilities 
     in this distribution.
MC4 is rather similar to Condorcet's method, but intuitively seems
like it might be better than Condorcet since it takes advantage of information that
Condorcet ignores/discards?  On the other hand, note that, like the "Copeland"
system it discards some info that Condorcet uses -- namely, the victory-margins
in the pairwise comparisons. That suggests it would be worse than Condorcet?
Their MC1 and MC3 schemes do not discard victory margins but experimentally are worse
than MC4 in the sense that they are "more vulnerable to spam."
(They did not do the MC4 vs Condorcet experimental
comparison, so one cannot tell.)  Typically most candidates will have ZERO
stationary probability and there will be a small intransitive cycle of candidates 
with nonzero, and these candidates often will have non-tied rankings...

Here is the abstract of a talk Dwork gave on this:
-------------------------------------------------------------------------
Topic: Rank Aggregation Revisited
Speaker: Cynthia Dwork
Date: 01/11/2001 

Abstract:
The rank aggregation problem is to combine many different rank orderings on the
same set of candidates, or alternatives, in order to obtain a ``better'' ordering.
Rank aggregation has been studied extensively in the context of social choice
theory, where several ``voting paradoxes'' have been discovered. It also arises in
sports (How to rank/compare players?), collaborative filtering, meta-search, and
database middleware.

A natural step toward aggregation was taken by Kemeny. Informally, given k input
orderings, a Kemeny-Optimal aggregation is an ordering that minimizes the sum of
``bubble sort'' distances to the input orderings. Thus, intuitively, Kemeny optimal
solutions produce best compromise orderings. However, finding a Kemeny optimal
aggregation is NP-hard.

We revisit rank aggregation with an eye toward reducing spam - -- strategic
manipulation of web pages in order to achieve an ``undeservedly'' high ranking --
in meta-search. We note the virtues of Kemeny-Optimal aggregation in this context
and develop a natural relaxation that preserves the spam-fighting capabilities of
Kemeny-Optimality at vastly reduced cost. We show how to efficiently take *any*
initial aggregation of the input orderings and produce its ``local Kemenization''
-- intuitively, a maximally consistent locally Kemeny-Optimal solution. This yields
a new approach to rank aggregation: ``Given the input orderings, first construct
any desirable initial aggregation and then output its local Kemenization.'' We
propose the use of Markov chains for obtaining the initial aggregation, and suggest
specific chains for this purpose.

Joint work with Ravi Kumar, Moni Naor, and D. Sivakumar.

---------------------------------------------------------------