[EM] Scatter plot of clone independence versus IIA
VoteFair
electionmethods at votefair.org
Wed May 26 09:17:51 PDT 2021
I'm getting very interesting results from creating a scatter plot that
shows success rates for Clone Independence (CI) versus Independence of
Irrelevant Alternatives (IIA). Of course these success-rate percentages
convert into failure rates simply by subtracting them from 100.
Specifically the following chart shows the success-rate percentages for
different vote-counting methods using different numbers of candidates.
https://www.rankedchoiceoregon.org/img/clone_iia_success_rates.jpg
The numbers "2" and "9" (in the method labels) indicate how many
candidates were used for those data points. The other points along each
line indicate the intermediate candidate counts.
Disclaimer: My software probably still has some bugs. Plus I might have
made mistakes when I pieced together this formally formatted scatter
plot. I'm hoping that your feedback will help me identify any errors.
The Clone Independence (CI) test adds two clone candidates (who are
similar to one of the original candidates) and identifies how often the
winner changes.
Note: This scatter plot does NOT separate CI failures into their
different types: (1) helps the similar candidate, (2) harms the similar
candidate, (3) causes one of the clones to win, or (4) causes some other
candidate to win.
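As a rough illustration of the clone test described above (this is not the C++ code used for the chart), here is a minimal Python sketch. It rests on assumptions the text does not state: ballots are complete rankings stored as lists, the two clones are placed next to the cloned candidate on every ballot, and the order within that group of three is shuffled per ballot. The names add_clones and passes_clone_test are hypothetical.

```python
import random

def add_clones(ballots, target, rng, num_clones=2):
    """Return new ballots where `target` is joined by adjacent clones.

    Assumption: clones sit right next to the original candidate on every
    ballot, with the order inside that group shuffled per ballot.
    """
    clone_names = [f"{target}_clone{i}" for i in range(1, num_clones + 1)]
    cloned = []
    for ballot in ballots:
        group = [target] + clone_names
        rng.shuffle(group)
        position = ballot.index(target)
        cloned.append(ballot[:position] + group + ballot[position + 1:])
    return cloned

def passes_clone_test(ballots, target, find_winner, rng):
    """True if adding clones of `target` leaves the winner unchanged."""
    original_winner = find_winner(ballots)
    new_winner = find_winner(add_clones(ballots, target, rng))
    return new_winner == original_winner
```

Any change of winner counts as a failure here, including a clone winning, which matches the four failure types listed in the note above without distinguishing between them.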
The Independence of Irrelevant Alternatives (IIA) test removes each
non-winning candidate, one at a time, to identify if the winner changes.
If any of these removals causes the winner to change, then the method
fails that IIA test.
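The IIA test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the linked C++ program: ballots are complete rankings stored as lists, plurality_winner is a stand-in counting method, and a tie after a removal is treated as a changed winner (the text does not say how that case is handled). The function names are hypothetical.

```python
from collections import Counter

def plurality_winner(ballots):
    """Candidate with the most first choices, or None if the top is tied."""
    counts = Counter(ballot[0] for ballot in ballots)
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def remove_candidate(ballots, candidate):
    """Copy the ballots with one candidate struck from every ranking."""
    return [[c for c in ballot if c != candidate] for ballot in ballots]

def passes_iia(ballots, find_winner):
    """Remove each non-winner, one at a time; fail if the winner changes."""
    winner = find_winner(ballots)
    if winner is None:
        raise ValueError("tied cases should be skipped before this test")
    for loser in set(ballots[0]) - {winner}:
        if find_winner(remove_candidate(ballots, loser)) != winner:
            return False
    return True
```

For example, with four ballots ranking A, B, C, three ranking B, C, A, and two ranking C, B, A, plurality elects A, but removing C hands B five first choices against four, so passes_iia returns False for that case.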
Note: The IIA test cannot fail when there are just two candidates
because removing the only non-winning candidate leaves the winner
unopposed. As a consequence, the scatter-plot points along the right
edge (which have the digit "2") only indicate the success rate for Clone
Independence.
When a set of randomly generated ballots yields a tie for the winner
according to the Condorcet-Kemeny method, the IRV method, or the Plurality
method, that attempted test is ignored for all the methods.
Without these omissions too many tests would yield a "tie" result, which
cannot be meaningfully categorized as either a failure or a success.
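The tie-skipping rule and the resulting success-rate percentage can be sketched as follows. This is a hedged illustration, not the actual test harness: it assumes ballots are uniformly random complete rankings (the post does not specify the distribution), and the names random_ballots, success_rate, tie_check_methods, and passes_test are hypothetical.

```python
import random

def random_ballots(num_candidates, num_ballots, rng):
    """Uniformly random complete rankings (an assumed ballot model)."""
    candidates = [chr(ord("A") + i) for i in range(num_candidates)]
    return [rng.sample(candidates, num_candidates) for _ in range(num_ballots)]

def success_rate(num_cases, make_ballots, tie_check_methods, passes_test):
    """Percentage of counted cases that pass, skipping tied cases.

    A case is ignored when any method in tie_check_methods returns None
    (a tie for winner). Returns None if every case was skipped.
    """
    counted = 0
    successes = 0
    for _ in range(num_cases):
        ballots = make_ballots()
        if any(method(ballots) is None for method in tie_check_methods):
            continue  # tie under a reference method: ignore this case
        counted += 1
        if passes_test(ballots):
            successes += 1
    if counted == 0:
        return None
    return 100.0 * successes / counted
```

The failure rate is then 100 minus the returned value. Note that skipped cases shrink the denominator rather than counting as either outcome.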
Ties that occur at earlier stages of a method's computation are resolved
in simple ways if the method does not define a way to resolve such ties.
In particular, when a set of ballots causes IRV to reach a tie in an
earlier elimination round, all those tied candidates are eliminated in
the same round. This is why the IRV method sometimes fails the Clone
Independence test even though IRV would never fail if lower-level ties
were never encountered.
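To make the tie handling concrete, here is a minimal Python sketch of IRV in which every candidate tied for the fewest votes is eliminated in the same round, as described above. It assumes complete rankings stored as lists and returns None when all remaining candidates are tied; the function name irv_winner is hypothetical and this is not the code behind the chart.

```python
from collections import Counter

def irv_winner(ballots):
    """IRV where all candidates tied for last are eliminated together."""
    remaining = set(ballots[0])
    while True:
        # Count each ballot for its highest-ranked remaining candidate.
        counts = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for candidate in ballot:
                if candidate in remaining:
                    counts[candidate] += 1
                    break
        total = sum(counts.values())
        leader, top = max(counts.items(), key=lambda item: item[1])
        if top * 2 > total:
            return leader  # strict majority of the counted ballots
        lowest = min(counts.values())
        tied_for_last = {c for c, n in counts.items() if n == lowest}
        if tied_for_last == remaining:
            return None  # every remaining candidate is tied
        remaining -= tied_for_last
```

For instance, if A has four first choices, D has three, and B and C have one each, then B and C are dropped together in the first round. If both of those ballots rank D next, D wins five to four.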
This use of randomly generated ballots often yields tied results and
Condorcet cycles, especially compared to real-life elections where clear
popularity differences are common.
Yet instead of interpreting these results as "unrealistic," I suggest
that these tests be regarded as a kind of "stress test" where the
method's fairness is tested in these challenging test cases.
Specifically, high fairness rates in these challenging cases provide
evidence of greater fairness in less-challenging real elections.
With these considerations in mind, here are some notable results:
* Approval voting has the lowest success rates for both clone
independence (CI) and IIA. About simulation: To convert from a ranking
ballot to an Approval ballot, the candidates above the halfway
preference level are approved. When there is an odd number of
candidates, the middle candidate gets one-half of an approval vote.
* Borda count has lower CI (clone) success rates (higher failure rates)
compared to the other methods, except Approval. About simulation: These
simulations assume non-tactical voting, yet in real elections
Borda-count ballots would be marked tactically (using burying, betrayal,
etc., and ranking multiple candidates at the same preference level) to
increase the voter's influence.
* IRV-BTR -- Instant-runoff voting with bottom-two runoff -- yields
about the same results as IRV. I included this method in hopes that the
results would help Robert in his reform efforts in Burlington. Alas,
unless I've got a bug in the code -- which is very possible -- IRV-BTR
isn't looking better than IRV for these two fairness criteria (CI and IIA).
* IRV -- Instant Runoff Voting -- becomes more vulnerable to IIA
failures as the number of candidates increases, especially compared to
the other methods (except Approval). This vulnerability might be a
consequence of the method having a zero CI failure rate (when there are
no ties at any level).
* Plurality voting returns to fewer IIA failures as the number of
candidates increases, unlike the other methods. Also,
plurality voting has a less-predictable pattern compared to the other
methods, which might be because it collects so little preference
information. The need for tactical voting in plurality elections means
that real-life plurality failure rates (for close elections) will be
higher than in these non-tactical simulations. About simulation: The
ballot's single vote goes to the candidate who is ranked highest.
* The Condorcet-Kemeny method has higher failure rates for Clone
Independence when there are just two or three candidates. The failure
rate decreases when there are more candidates. As a reminder, this
scatter plot does not separate clone-independence failures into their
different kinds, which is important information regarding
whether a method is vulnerable to money-based election tactics.
* STAR voting -- Score Then Automatic Runoff -- always uses six
preference levels, so only the data point for six candidates
meaningfully matches this method. About simulation: The other data
points for "STAR" voting simulate a STAR-like ballot on which the number
of preference levels matches the number of candidates. Also, these
tests simulate non-tactical ballot marking, but human voters using STAR
voting in a real-life election, especially a US general election, would
use ballot-marking tactics that exploit the method's ability to express
strong preferences.
* RCIPE -- Ranked Choice Including Pairwise Elimination -- has fewer CI
failures compared to the other methods. About this method: RCIPE is
similar to IRV except that if an elimination round includes a "pairwise
losing candidate" (a.k.a. Condorcet loser) then this candidate is
eliminated instead of the fewest-transferred-votes candidate. When an
elimination round involves a tie, this method resolves the tie using the
IPE counting method.
* IPE -- Instant Pairwise Elimination -- has the lowest failure rates
compared to the other methods. About this method: IPE eliminates the
candidate who has the highest "opposition" count, where the opposition
count for a candidate is the number of remaining candidates who (on a
ballot) are ranked higher than that candidate, summed across all the
ballots. If there is a tie, it's resolved using the lowest "support"
count, which counts the remaining candidates who are ranked lower than
the specified candidate. Method clarification: This counting method
does not actually re-count the ballots for each elimination round;
instead a calculator and pen and paper can be used with the overall
pairwise-count table to calculate the opposition and support counts
(which change after each elimination).
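The IPE description in the list above (opposition counts, support counts as the tie-breaker, all computed from the pairwise-count table) can be sketched in Python as follows. This is an illustration only, not the VoteFair implementation. It assumes complete rankings stored as lists, and it returns None when the support count fails to break a tie, a case the description does not cover. The function names are hypothetical.

```python
def pairwise_counts(ballots):
    """pairwise[(x, y)] = number of ballots that rank x above y."""
    candidates = sorted(ballots[0])
    pairwise = {(x, y): 0 for x in candidates for y in candidates if x != y}
    for ballot in ballots:
        for i, higher in enumerate(ballot):
            for lower in ballot[i + 1:]:
                pairwise[(higher, lower)] += 1
    return candidates, pairwise

def ipe_winner(ballots):
    """Repeatedly eliminate the candidate with the highest opposition count."""
    candidates, pairwise = pairwise_counts(ballots)
    remaining = set(candidates)
    while len(remaining) > 1:
        # Both counts use only the table, restricted to remaining candidates.
        opposition = {c: sum(pairwise[(o, c)] for o in remaining if o != c)
                      for c in remaining}
        support = {c: sum(pairwise[(c, o)] for o in remaining if o != c)
                   for c in remaining}
        worst = max(opposition.values())
        tied = [c for c in remaining if opposition[c] == worst]
        if len(tied) > 1:
            least = min(support[c] for c in tied)
            tied = [c for c in tied if support[c] == least]
        if len(tied) > 1:
            return None  # assumption: an unbroken tie is reported as a tie
        remaining.remove(tied[0])
    return next(iter(remaining))
```

With four ballots ranking A, B, C, three ranking B, C, A, and two ranking C, B, A, the opposition counts are A = 10, B = 6, C = 11, so C is eliminated first. A then has opposition 5 against 4 for B, so A is eliminated and B wins. The ballots are read only once, to build the table.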
If someone wants a closer look at the actual numbers used in this
scatter plot, they are at the bottom of this text file:
https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/results_from_generate_random_ballots.txt
Here's a link to the C++ program (which mostly uses the C language
subset) that generates random ballots, supplies these ballots to
separate software where the winners are calculated, and analyzes the
results:
https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp
I hope you agree that these results are very helpful for comparing
vote-counting methods.
For perspective, my interest in CI and IIA failure/success rates is
based on my frustration that money-based tactics are used to manipulate
election results by exploiting CI and IIA failures in ways that achieve
strategic nomination. These money-based tactics include:
* Financially supporting clones (similar candidates) who split votes
away from a popular reform-minded candidate
* Removing financial support from clones (similar candidates) who split
votes away from the status-quo-supporting (puppet-like) candidate
* Possibly supporting distracting (irrelevant) candidates
In contrast, I believe it's more difficult to use money to manipulate an
election by exploiting other kinds of failures such as: monotonicity,
favorite betrayal, later no help/harm, etc. In other words, I believe
these other fairness criteria too often get more attention than they
deserve (under current money-dominant political conditions).
Here's a final note to those of you who are in the academic world. I
invite you to reproduce these results and add other methods and publish
the results. Why is this important? The improvement of Wikipedia
articles about better vote-counting methods requires peer-reviewed
academic articles that meaningfully compare vote-counting methods in
ways that go beyond Wikipedia's simple pass/fail checklist in the
"Comparison of electoral systems" article. Alas, too often that
over-simplistic checklist gets cited as evidence of support for inferior
vote-counting methods.
In other words, flagging vote-counting failures as either zero or
non-zero is primitive. We need more measurements of HOW OFTEN the
non-zero failures occur.
Richard Fobes
"The VoteFair guy"