IIA performance (was Re: [EM] IMC, I2C and LIIA criteria)

Fri Mar 14 14:06:02 PST 2003

Markus Schulze wrote (13 March 2003):
> Steve wrote:
> > As for your conjecture that MAM and BeatpathWinner would
> > probably perform about the same in a simulation that adds
> > a randomly ranked candidate (or, equivalently, a
> > simulation that retallies after deleting a random loser,
> > which might be easier to write), I guess I'd be willing to
> > make a small wager that MAM would do slightly better than
> > BeatpathWinner, based on the other random voting
> > simulations that show MAM winners beat BeatpathWinner
> > winners pairwise more often than vice versa.
> 
> The MinMax method has the property that an additional
> candidate can change the winner without being elected only
> when the new candidate pairwise beats the original winner.

I assume that by "MinMax" Markus means one of the variations that 
measures each candidate's largest pairwise "defeat."  The property 
does not hold if the method elects the candidate whose largest 
pairwise opposition is smallest (assuming votes can express pairwise 
indifference). 
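
To make the distinction concrete, here is a minimal Python sketch of the 
two variations.  The function names, the list-of-lists matrix layout and 
the example ballots are my own assumptions for illustration, not taken 
from anyone's simulation code.  The ballots allow pairwise indifference, 
which is what lets the two variations disagree.

```python
def minmax_scores(pair, variant):
    """pair[x][y] = number of ballots ranking x over y (assumed layout)."""
    n = len(pair)
    scores = []
    for x in range(n):
        worst = 0
        for y in range(n):
            if y == x:
                continue
            if variant == "defeat":
                # Count y's votes against x only if y actually beats x.
                s = pair[y][x] if pair[y][x] > pair[x][y] else 0
            else:
                # "opposition": count y's votes against x regardless.
                s = pair[y][x]
            worst = max(worst, s)
        scores.append(worst)
    return scores

def minmax_winners(pair, variant):
    scores = minmax_scores(pair, variant)
    best = min(scores)
    return [x for x, s in enumerate(scores) if s == best]

# Hypothetical 100 ballots over A=0, B=1, C=2:
#   21: A=B>C    30: A>B=C    20: B>C>A    29: C>A=B
pair = [[0, 30, 51],
        [20, 0, 41],
        [49, 29, 0]]

print(minmax_winners(pair, "defeat"))      # A, the Condorcet winner
print(minmax_winners(pair, "opposition"))  # B
```

In this example A beats both B and C pairwise, yet the "opposition" 
variation elects B because so many ballots are indifferent between A 
and B.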

That property doesn't sound like a good proxy for IIA; else it would 
be an indicator that Minmax is the best method.  Minmax fails clone 
independence and local independence of irrelevant alternatives, which 
are satisfied by MAM, so it seems silly to argue that Minmax is a 
good IIA benchmark.

Markus' message had nothing more to say about MinMax (which is not 
the same as Smith//MinMax, which Markus went on to discuss but which 
doesn't share that property).  So I don't see the relevance of that 
property to the discussion of MAM vs. BeatpathWinner.

> Random simulations by Norman Petry in 2000 demonstrated that
> the winner of the beat path method is almost always the
> Smith//MinMax winner. 

As I recall, Norm made other errors besides the one pointed out by 
Rob LeGrand (13 Mar 2003, "[EM] Smith//MinMax").  Rob wrote:
   > I believe the incidences of agreement among 
   > Smith//MinMax, plain MinMax, beatpath and 
   > Ranked Pairs are all well over 90% for 
   > 15 candidates or fewer. Norm Petry was 
   > randomly generating pairwise matrices, 
   > not voter preferences, driving the incidence 
   > of Condorcet candidates down and disagreement 
   > among Condorcet methods up.

Also, randomly generating pairwise matrices can produce some matrices 
that are impossible for real votes to produce, if each element of the 
matrix represents a "winning votes" count rather than a "margin" 
count. (In other words, the number of votes that rank x over y, 
rather than the number that rank x over y minus the number that rank 
y over x.)  It's been so long that I don't recall if Norm explained 
how he avoided impossible matrices, or if he provided an argument why 
randomly generated matrices would appear in the same proportion as 
the matrices produced by randomly generated votes (and if not why 
this can't lead to biased results). 
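
As an illustration of the simplest such impossibility, here is a short 
Python sketch (my own, with hypothetical names; I am not claiming this is 
how Norm or anyone else coded it).  It tallies a "winning votes" matrix 
from randomly generated strict rankings and applies one necessary 
condition: the votes for x over y plus the votes for y over x cannot 
exceed the number of voters.  Passing this check does not prove a matrix 
is producible by real votes; failing it proves it is not.

```python
import random

def pairwise_from_ballots(ballots, n):
    """Return pair[x][y] = number of ballots ranking x over y."""
    pair = [[0] * n for _ in range(n)]
    for ranking in ballots:            # ranking: candidates listed best first
        for i, x in enumerate(ranking):
            for y in ranking[i + 1:]:
                pair[x][y] += 1
    return pair

def passes_basic_check(pair, voters):
    """Necessary (not sufficient) condition for a winning-votes matrix."""
    n = len(pair)
    return all(pair[x][y] + pair[y][x] <= voters
               for x in range(n) for y in range(x + 1, n))

rng = random.Random(2003)              # fixed seed, only for repeatability
n, voters = 4, 100
ballots = [rng.sample(range(n), n) for _ in range(voters)]
from_votes = pairwise_from_ballots(ballots, n)

# A matrix whose entries are filled in independently can claim more
# votes in a pairing than there are voters.
made_up = [[0, 70], [60, 0]]

print(passes_basic_check(from_votes, voters))  # True
print(passes_basic_check(made_up, 100))        # False
```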

I'm not convinced Norm implemented the correct definitions of 
BeatpathWinner or RankedPairs(wv) in his simulations.  If my memory 
is correct, Norm indicated at one point that he implemented an old 
definition that Markus posted long before, which was not equivalent 
to BeatpathWinner.  

The huge discrepancy between Smith//Minmax and RP suggests he didn't 
program RP correctly.  And I don't recall if Norm said how his 
implementation of RP broke ties among same size majorities, which 
can't be done according to spec given only the pairwise matrix.

> Therefore, I would give a small wager that the 
> beat path method does it better. 

Markus originally conjectured that MAM and BeatpathWinner would 
perform about the same on IIA.  He appears to have changed his mind.  
Perhaps after I posted a reason why I'd wager MAM would do better, he 
felt he should try to manufacture a counter-argument.

> For example, when there are 15 candidates then the
> Smith//MinMax winner and the winner of the beat path method
> are identical in 91.7% while the Smith//MinMax winner and
> the Ranked Pairs winner are identical in only 41.8% of all
> situations.

My own simulations directly compared MAM and BeatpathWinner using 
randomly generated votes, and produced data that conflict 
considerably with Norm's, and I place more faith in my results than 
his. (Apparently Rob LeGrand also has data that conflict with 
Norm's.)  

Here are some relevant excerpts from my data.  The third column shows 
the percentage of scenarios where the MAM winner beat the 
BeatpathWinner winner pairwise, and the fourth column shows the 
percentage of scenarios where the BeatpathWinner winner beat the MAM 
winner pairwise:

   #Alts    #Voters    MAM>BPW    BPW>MAM
     15       100       9.756%     1.319%
     15       101       5.290%     0.277%
     15      1000      13.23%      2.08% 

(The rest of my data can be seen at my web pages at 
www.alumni.caltech.edu/~seppley by following the appropriate link.  
There's also a link to a comparison of MAM vs. Instant Runoff, which 
shows MAM winners beat IRV winners pairwise more often than vice 
versa.  For example, for 15 alternatives and 100 voters, MAM>IRV is 
28.71% and IRV>MAM is 9.43%.)

Assuming Markus correctly quoted Norm's stats, they imply that MAM 
and BeatpathWinner disagree in about half of the 15-alternatives 
scenarios, whereas my stats show they disagree in far less than half.  
Thus Rob LeGrand and I believe Norm's data is flawed.
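
The "about half" figure can be checked with a little arithmetic.  The 
sketch below is mine (the function name is hypothetical) and assumes both 
quoted percentages were measured over the same set of scenarios, and that 
Norm's Ranked Pairs figure is a fair stand-in for MAM.  If method A 
agrees with B in a fraction p_ab of scenarios and with C in a fraction 
p_ac, then B and C must disagree in at least |p_ab - p_ac| of them.

```python
def agreement_bounds(p_ab, p_ac):
    """Bounds on P(B == C) given P(A == B) and P(A == C)."""
    # B and C surely agree whenever both agree with A.
    lower = max(0.0, p_ab + p_ac - 1.0)
    # Whenever exactly one of B, C agrees with A, B and C differ.
    upper = 1.0 - abs(p_ab - p_ac)
    return lower, upper

# Quoted figures for 15 candidates: Smith//MinMax vs. beat path 91.7%,
# Smith//MinMax vs. Ranked Pairs 41.8%.
lo, hi = agreement_bounds(0.917, 0.418)
print(f"agree in {lo:.1%} to {hi:.1%} of scenarios")
print(f"disagree in {1 - hi:.1%} to {1 - lo:.1%} of scenarios")
```

So the quoted stats put the disagreement between the beat path and 
Ranked Pairs winners at roughly 50% or more.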

I could run simulations comparing the agreements of Smith//Minmax or 
Minmax with MAM and with BeatpathWinner, but I don't accept Markus' 
argument that that would shed light on which method provides the most 
independence from irrelevant alternatives.  I think simulations 
directly comparing MAM and BeatpathWinner, such as mine excerpted 
above, provide a real reason to believe MAM provides more 
independence than BeatpathWinner, by showing that MAM winners beat 
BeatpathWinner winners pairwise more often than vice versa.  Directly 
comparing MAM and BeatpathWinner by the IIA performance 
simulations that Markus suggested be tried would of course be more 
definitive.

If a clearer argument is provided that simulations comparing the 
agreement of Smith//Minmax or Minmax with MAM and with BeatpathWinner 
would be useful, I'll try to find time to do it. (Rob's message hints 
that he or someone besides Norm has done it.  Is that data available?  
If so, I could execute a small part of the simulation to compare a 
few data points, and stop early if my data points are about the same 
as Rob's.)

If anyone decides to program the IIA performance simulations, I 
recommend doing it by deleting a loser (the "contraction consistency" 
test) rather than adding a new alternative (the "expansion 
consistency" test).  With random voting, an added alternative has 
nearly a 50% chance of beating the previous winner pairwise, and I'd 
guess an added alternative would have about a 40% chance of becoming 
the new winner (which is hardly an irrelevant alternative), so 
many scenarios would have to be discarded.  To speed the tests, 
I think it makes sense to reuse the votes n-1 times, where n is the 
number of alternatives, by deleting each of the losers (such that 
only one is deleted at a time) instead of deleting only one randomly 
picked loser. (Also, the votes could be reused even more by deleting 
randomly selected subsets of the losers.)
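
A rough Python sketch of the contraction test follows.  Everything in it 
is an assumption for illustration: the function names are hypothetical, 
ballots are strict complete rankings, tied scenarios are simply skipped, 
and beatpath_winners is one common formulation of the beat path method 
measured by winning votes, which may not match any particular posted 
definition of BeatpathWinner.  MAM is not implemented here; any function 
that maps a pairwise tally to a list of winners could be passed in 
instead.

```python
def tally(ballots, cands):
    """pair[x][y] = ballots ranking x over y, restricted to cands."""
    pair = {x: {y: 0 for y in cands} for x in cands}
    for ranking in ballots:
        kept = [c for c in ranking if c in pair]   # drop deleted candidates
        for i, x in enumerate(kept):
            for y in kept[i + 1:]:
                pair[x][y] += 1
    return pair

def beatpath_winners(pair):
    cands = list(pair)
    # Direct strength: winning votes if x beats y, else 0.
    p = {x: {y: (pair[x][y] if pair[x][y] > pair[y][x] else 0)
             for y in cands} for x in cands}
    # Strongest-path strengths, Floyd-Warshall style.
    for k in cands:
        for i in cands:
            if i == k:
                continue
            for j in cands:
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    return [x for x in cands
            if all(p[x][y] >= p[y][x] for y in cands if y != x)]

def contraction_changes(ballots, cands, method):
    """Delete each loser in turn, reusing the same votes n-1 times."""
    winners = method(tally(ballots, cands))
    if len(winners) != 1:
        return None                    # tied scenario, skipped
    w = winners[0]
    changes = 0
    for loser in cands:
        if loser == w:
            continue
        rest = [c for c in cands if c != loser]
        if method(tally(ballots, rest)) != [w]:
            changes += 1
    return changes, len(cands) - 1

# Small cyclic example: 4 ballots A>B>C, 3 B>C>A, 2 C>A>B (A=0, B=1, C=2).
ballots = [[0, 1, 2]] * 4 + [[1, 2, 0]] * 3 + [[2, 0, 1]] * 2
print(contraction_changes(ballots, [0, 1, 2], beatpath_winners))  # (1, 2)
```

In the example A wins the three-way tally, and deleting B hands the win 
to C while deleting C leaves A the winner, so one of the two deletions 
changes the outcome.  A full simulation would sum these counts over many 
randomly generated sets of ballots for each method being compared.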

One might also consider testing IIA performance by checking how often 
deletion of the last-place alternative causes the winner to change.  
An argument for this test is that the last-place alternative ought to 
be the "least relevant" alternative.  Two arguments against this test 
are (1) being ranked last by a method is not necessarily the same 
thing as being least relevant, and (2) IIA says nothing about "least 
relevant" alternatives.  MAM would trounce BeatpathWinner on this 
test, since MAM satisfies LIIA and BeatpathWinner does not. (A 
corollary of LIIA satisfaction is that MAM's winner will not change 
if the last-place alternative is deleted.)  I assume Markus will 
argue that this is not a reasonable test of IIA performance.  :-)

Did Peyton Young provide a clear argument anywhere to explain his 
claim that LIIA is a "slight weakening" of IIA?
Presumably, Young thought LIIA is more than just a curiosity.  
Perhaps we should ask him.  I think he's currently at Johns Hopkins 
(www.jhu.edu), or maybe at the Brookings Institution.

-- Steve Eppley