[EM] Scatter plot of clone independence versus IIA

Wed May 26 09:17:51 PDT 2021

I'm getting very interesting results from creating a scatter plot that 
shows success rates for Clone Independence (CI) versus Independence of 
Irrelevant Alternatives (IIA).  Of course, each success-rate percentage 
converts into a failure-rate percentage by subtracting it from 100.

Specifically, the following chart shows the success-rate percentages for
different vote-counting methods using different numbers of candidates.

https://www.rankedchoiceoregon.org/img/clone_iia_success_rates.jpg

The numbers "2" and "9" (in the method labels) indicate how many 
candidates were used for those data points.  The other points along each 
line indicate the intermediate candidate counts.

Disclaimer: My software probably still has some bugs.  Plus I might have 
made mistakes when I pieced together this formatted scatter plot.  I'm 
hoping that your feedback will help me identify any errors.

The Clone Independence (CI) test adds two clone candidates (who are 
similar to one of the original candidates) and identifies how often the 
winner changes.

Note: This scatter plot does NOT separate CI failures into their 
different types: (1) helps the similar candidate, (2) harms the similar 
candidate, (3) causes one of the clones to win, or (4) causes some other 
candidate to win.

The Independence of Irrelevant Alternatives (IIA) test removes each 
non-winning candidate, one at a time, to identify whether the winner 
changes.  If any of these removals causes the winner to change, then the 
method fails that IIA test.

Note: The IIA test cannot fail when there are just two candidates 
because removing the non-winning candidate always leaves the winner as 
the only remaining candidate.  As a consequence, the scatter-plot points 
along the right edge (which have the digit "2") only indicate the success 
rate for Clone Independence.

When a set of randomly generated ballots yields a tie for winner 
according to the Condorcet-Kemeny method, the IRV method, or the 
plurality method, that attempted test is ignored for all the methods. 
Without these omissions, too many tests would yield a "tie" result, 
which cannot be meaningfully categorized as either a failure or a success.

Ties that occur at earlier stages of a method's computation are resolved 
in simple ways if the method does not define a way to resolve such ties. 
In particular, when a set of ballots causes IRV to reach a tie in an 
earlier elimination round, all those tied candidates are eliminated in 
the same round.  This is why the IRV method sometimes fails the Clone 
Independence test even though IRV would never fail if lower-level ties 
were never encountered.

Randomly generated ballots yield tied results and Condorcet cycles more 
often than real-life elections do, because real elections commonly have 
clear popularity differences.

Yet instead of interpreting these results as "unrealistic," I suggest 
that these tests be regarded as a kind of "stress test" where the 
method's fairness is tested in these challenging test cases. 
Specifically, high fairness rates in these challenging cases provide 
evidence of greater fairness in less-challenging real elections.

With these considerations in mind, here are some notable results:

* Approval voting has the lowest success rates for both clone 
independence (CI) and IIA.  About the simulation: To convert from a ranking
ballot to an Approval ballot, the candidates above the halfway 
preference level are approved.  When there is an odd number of 
candidates, the middle candidate gets one-half of an approval vote.

* Borda count has lower CI (clone) success rates (higher failure rates) 
compared to the other methods, except Approval.  About the simulation: These
simulations assume non-tactical voting, yet in real elections 
Borda-count ballots would be marked tactically (using burying, betrayal, 
etc. and ranking multiple candidates at the same preference level) to 
increase the voter's influence.

* IRV-BTR -- Instant-runoff voting with bottom-two runoff -- yields 
about the same results as IRV.  I included this method in hopes that the 
results would help Robert in his reform efforts in Burlington.  Alas, 
unless I've got a bug in the code -- which is very possible -- IRV-BTR 
isn't looking better than IRV for these two fairness criteria (CI and IIA).

* IRV -- Instant Runoff Voting -- becomes more vulnerable to IIA 
failures as the number of candidates increases, especially compared to 
the other methods (except Approval).  This vulnerability might be a 
consequence of the method having a zero CI failure rate (when there are 
no ties at any level).

* Plurality voting reverses direction and shows fewer IIA failures as 
the number of candidates increases, which differs from the other 
methods.  Also, plurality voting has a less-predictable pattern than the 
other methods, possibly because it collects so little preference 
information.  The need for tactical voting in plurality elections means 
that real-life plurality failure rates (for close elections) will be 
higher than in these non-tactical simulations.  About the simulation: 
Each ballot's single vote goes to its highest-ranked candidate.

* The Condorcet-Kemeny method has higher failure rates for Clone 
Independence when there are just two or three candidates.  The failure 
rate decreases when there are more candidates.  As a reminder, this 
scatter plot does not separate clone-independence failures into their 
different kinds, and that breakdown is important for judging whether a 
method is vulnerable to money-based election tactics.

* STAR voting -- Score Then Automatic Runoff -- always uses six 
preference levels, so only the data point for six candidates 
meaningfully matches this method.  About the simulation: The other data
points for "STAR" voting simulate a STAR-like ballot on which the number 
of preference levels matches the number of candidates.  Also, these 
tests simulate non-tactical ballot marking, but human voters using STAR 
voting in a real-life election, especially a US general election, would 
use ballot-marking tactics that exploit the method's ability to express 
strong preferences.

* RCIPE -- Ranked Choice Including Pairwise Elimination -- has fewer CI 
failures compared to the other methods.  About this method: RCIPE is 
similar to IRV except that if an elimination round includes a "pairwise 
losing candidate" (a.k.a. Condorcet loser), then this candidate is
eliminated instead of the fewest-transferred-votes candidate.  When an 
elimination round involves a tie, this method resolves the tie using the 
IPE counting method.

* IPE -- Instant Pairwise Elimination -- has the lowest failure rates 
compared to the other methods.  About this method: IPE eliminates the 
candidate who has the highest "opposition" count, where the opposition 
count for a candidate is the number of remaining candidates who (on a 
ballot) are ranked higher than that candidate, summed across all the 
ballots.  If there is a tie, it's resolved using the lowest "support" 
count, which counts the remaining candidates who are ranked lower than 
the specified candidate.  Method clarification: This counting method 
does not actually re-count the ballots for each elimination round; 
instead, a calculator plus pen and paper can be used with the overall
pairwise-count table to calculate the opposition and support counts 
(which change after each elimination).

If someone wants a closer look at the actual numbers used in this 
scatter plot, they are at the bottom of this text file:

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/results_from_generate_random_ballots.txt

Here's a link to the C++ program (which mostly uses the C language 
subset) that generates random ballots, supplies these ballots to 
separate software where the winners are calculated, and analyzes the 
results:

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp

I hope you agree that these results are very helpful for comparing 
vote-counting methods.

For perspective, my interest in CI and IIA failure/success rates is 
based on my frustration that money-based tactics are used to manipulate 
election results by exploiting CI and IIA failures in ways that achieve 
strategic nomination.  These money-based tactics include:

* Financially supporting clones (similar candidates) who split votes 
away from a popular reform-minded candidate

* Removing financial support from clones (similar candidates) who split 
votes away from the status-quo-supporting (puppet-like) candidate

* Possibly supporting distracting (irrelevant) candidates

In contrast, I believe it's more difficult to use money to manipulate an 
election by exploiting other kinds of failures such as: monotonicity, 
favorite betrayal, later no help/harm, etc.  In other words, I believe 
these other fairness criteria too often get more attention than they 
deserve (under current money-dominant political conditions).

Here's a final note to those of you who are in the academic world.  I 
invite you to reproduce these results and add other methods and publish 
the results.  Why is this important?  The improvement of Wikipedia 
articles about better vote-counting methods requires peer-reviewed 
academic articles that meaningfully compare vote-counting methods in 
ways that go beyond Wikipedia's simple pass/fail checklist in the 
"Comparison of electoral systems" article.  Alas, too often that 
over-simplistic checklist gets cited as evidence of support for inferior 
vote-counting methods.

In other words, flagging vote-counting failures as either zero or 
non-zero is primitive.  We need more measurements of HOW OFTEN the 
non-zero failures occur.

Richard Fobes
"The VoteFair guy"