[EM] Scatter plot of clone independence versus IIA

Sat Jun 19 17:41:06 PDT 2021

I have updated the scatter plot that charts Clone Independence and IIA 
(Independence of Irrelevant Alternatives) for various methods.  The 
scatter plot is (still) at:

   https://www.rankedchoiceoregon.org/img/clone_iia_success_rates.jpg

Here are the most important changes to the software and chart:

* The Clone Independence test now categorizes the case of a clone 
candidate displacing the similar candidate as a success, not a failure. 
This interpretation was requested on the r/EndFPTP subreddit, and it 
makes sense to me.  The other three kinds of CI failures continue to be 
categorized as failures.  This means that when the original candidate 
who is similar to the clones (is there a name for this candidate?) wins 
without the clones, and then loses to one of the clones (when they are 
added), that displacement is not regarded as a failure of clone 
independence.  The wording in the Wikipedia article on Clone 
Independence implies that such displacements are failures, so that's the 
interpretation I originally used.  I'm not suggesting that the Wikipedia 
wording is incorrect.  Rather, I'm suggesting that when measuring CI 
success/failure rates it's important to also measure how often each kind 
of CI failure occurs (which is what I wrote earlier).  I made this change 
because this scatter plot is intended to be a meaningful summary of the 
measurements.

* Because of this change, the success rates for the Condorcet-Kemeny 
method and the Borda count method are much higher.

* The success rates for the Condorcet-Kemeny method are now very similar 
to the Instant Pairwise Elimination (IPE) success rates.  This is not 
surprising to me because I designed opposition counts -- which are used 
in the IPE method -- as a quick way to estimate Condorcet-Kemeny 
results.  (I had been surprised by the earlier results.)

* The Instant Pairwise Elimination (IPE) method calculations now 
correctly eliminate Condorcet losers when they occur.  The previous 
calculations just used the pairwise opposition counts.  The resulting CI 
and IIA measurements remain about the same.  (Typically when there is a 
Condorcet loser, the Condorcet loser has an opposition count that is 
very close to the highest opposition count.)

* I've eliminated the Approval method from the plot.  As someone pointed 
out on r/EndFPTP, converting ranked ballots into Approval ballots is not 
meaningful.  This is especially true for these tests because my counting 
software is separate from the ballot-generation software, so the clone 
candidates often do not get the same approval rating as the original 
similar candidate.

* Minor change: Now that the Approval results are gone, the scale starts 
at 50 percent compliance.
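
The revised Clone Independence categorization in the first item above can 
be sketched in Python.  This is only an illustration, not the code in the 
C++ software: the function names and labels are hypothetical, and the 
mapping of winner changes onto the kinds of failure is one reading of the 
descriptions.  In particular, the text does not say how a clone win is 
counted when the similar candidate was not the earlier winner; the sketch 
counts it as a failure, which is an assumption.

```python
def classify_clone_test(winner_before, winner_after, similar, clones):
    """Return (label, is_success) for one Clone Independence test.

    winner_before: winner without the clones
    winner_after:  winner after the clones are added
    similar:       the original candidate the clones resemble
    clones:        collection of the added clone candidates
    """
    if winner_after == winner_before:
        return ("winner unchanged", True)
    if winner_after in clones:
        if winner_before == similar:
            # The case now categorized as a success.
            return ("clone displaces similar candidate", True)
        # Assumption: a clone win over an unrelated winner is a failure.
        return ("clone wins", False)
    if winner_after == similar:
        return ("helps similar candidate", False)
    if winner_before == similar:
        return ("harms similar candidate", False)
    return ("some other candidate wins", False)


def success_percentage(outcomes):
    """Percentage of (label, is_success) outcomes counted as successes."""
    successes = sum(1 for _, ok in outcomes if ok)
    return 100.0 * successes / len(outcomes)


clones = {"A1", "A2"}
outcomes = [
    classify_clone_test("A", "A", "A", clones),   # unchanged
    classify_clone_test("A", "A1", "A", clones),  # displacement
    classify_clone_test("A", "B", "A", clones),   # harms similar
    classify_clone_test("B", "A", "A", clones),   # helps similar
]
print(success_percentage(outcomes))  # 50.0
```

Keeping the label alongside the success flag makes it possible to report 
how often each kind of outcome occurs, not just the overall rate.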

Additional details are in the software, which is on GitHub here:

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp

As a reminder (and because someone on Reddit asked), I chose CI and IIA 
because they measure vulnerability to strategic nomination, which is 
easily exploited through the control of campaign contributions.

Richard Fobes

On 5/26/2021 9:17 AM, VoteFair wrote:
> I'm getting very interesting results from creating a scatter plot that
> shows success rates for Clone Independence (CI) versus Independence of
> Irrelevant Alternatives (IIA).  Of course these success-rate percentages
> convert into failure rates simply by subtracting them from 100.
>
> Specifically the following chart shows the success-rate percentages for
> different vote-counting methods using different numbers of candidates.
>
> https://www.rankedchoiceoregon.org/img/clone_iia_success_rates.jpg
>
> The numbers "2" and "9" (in the method labels) indicate how many
> candidates were used for those data points.  The other points along each
> line indicate the intermediate candidate counts.
>
> Disclaimer: My software probably still has some bugs.  Plus I might have
> made mistakes when I pieced together this formally formatted scatter
> plot.  I'm hoping that your feedback will help me identify any errors.
>
> The Clone Independence (CI) test adds two clone candidates (who are
> similar to one of the original candidates) and identifies how often the
> winner changes.
>
> Note: This scatter plot does NOT separate CI failures into their
> different types: (1) helps the similar candidate, (2) harms the similar
> candidate, (3) causes one of the clones to win, or (4) causes some other
> candidate to win.
>
> The Independence of Irrelevant Alternatives (IIA) test removes each
> non-winning candidate, one at a time, to identify if the winner changes.
> If any of these removals causes the winner to change, then the method
> fails that IIA test.
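
The removal loop described above can be sketched in Python (the actual 
software is C++).  This is only an illustration: plurality_winner is a 
stand-in method, the function names and the ballot format (tuples listing 
candidates from most to least preferred) are hypothetical, and returning 
None for any tie, including a tie that appears after a removal, is an 
assumption rather than something specified here.

```python
from collections import Counter


def plurality_winner(ballots, candidates):
    """Stand-in method: most first choices among `candidates`.

    Returns None when the top count is tied.
    """
    tally = Counter({c: 0 for c in candidates})
    for ballot in ballots:
        for choice in ballot:
            if choice in candidates:
                tally[choice] += 1
                break
    ranked = tally.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def passes_iia(ballots, candidates, find_winner):
    """True if removing any single non-winner leaves the winner unchanged.

    Returns None when a tie prevents categorizing the test (assumption).
    """
    winner = find_winner(ballots, candidates)
    if winner is None:
        return None
    for removed in candidates:
        if removed == winner:
            continue
        remaining = [c for c in candidates if c != removed]
        new_winner = find_winner(ballots, remaining)
        if new_winner is None:
            return None
        if new_winner != winner:
            return False
    return True


ballots = (
    [("A", "B", "C")] * 4
    + [("B", "C", "A")] * 3
    + [("C", "B", "A")] * 2
)
print(passes_iia(ballots, ["A", "B", "C"], plurality_winner))  # False
print(passes_iia(ballots, ["A", "B"], plurality_winner))       # True
```

In the three-candidate example A wins with 4 first choices, but removing 
B moves B's 3 ballots to C, so the winner changes.  With only two 
candidates the test cannot fail.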
>
> Note: The IIA test cannot fail when there are just two candidates
> because removing the non-winning candidate always causes the winner to
> still win.  As a consequence, the scatter-plot points along the right
> edge (which have the digit "2") only indicate the success rate for Clone
> Independence.
>
> When a set of randomly generated ballots yields a tie -- for winner --
> according to the Condorcet-Kemeny method or IRV method or Plurality
> method, then that attempted test is ignored for all the methods. Without
> these omissions too many tests would yield a "tie" result, which cannot
> be meaningfully categorized as either a failure or success.
>
> Ties that occur at earlier stages of a method's computation are resolved
> in simple ways if the method does not define a way to resolve such ties.
> In particular, when a set of ballots causes IRV to reach a tie in an
> earlier elimination round, all those tied candidates are eliminated in
> the same round.  This is why the IRV method sometimes fails the Clone
> Independence test even though IRV would never fail if lower-level ties
> were never encountered.
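
The tie handling described above for IRV can be sketched in Python.  This 
is a minimal illustration, not the code in the C++ software: the function 
name and the ballot format (tuples listing candidates from most to least 
preferred) are hypothetical, and stopping as soon as one candidate holds 
a majority of the still-countable ballots is the usual IRV shortcut, 
assumed here.

```python
from collections import Counter


def irv_winner(ballots, candidates):
    """IRV in which all candidates tied for fewest votes leave together.

    Returns None when every remaining candidate is tied (a tie for winner).
    """
    remaining = set(candidates)
    while remaining:
        # Count each ballot for its highest-ranked remaining candidate.
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        total = sum(tally.values())
        leader, top = max(tally.items(), key=lambda item: item[1])
        if top * 2 > total:
            return leader
        fewest = min(tally.values())
        tied_lowest = {c for c in remaining if tally[c] == fewest}
        if tied_lowest == remaining:
            return None
        # Eliminate every candidate tied for last in the same round.
        remaining -= tied_lowest
    return None


ballots = (
    [("A", "B", "C", "D")] * 4
    + [("B", "A", "C", "D")] * 3
    + [("C", "B", "A", "D")]
    + [("D", "B", "A", "C")]
)
print(irv_winner(ballots, ["A", "B", "C", "D"]))  # B
```

In this example C and D each have one first-choice vote, so both are 
eliminated in the first round, and their ballots transfer to B.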
>
> This use of randomly generated ballots often yields tied results and
> Condorcet cycles, especially compared to real-life elections where clear
> popularity differences are common.
>
> Yet instead of interpreting these results as "unrealistic," I suggest
> that these tests be regarded as a kind of "stress test" where the
> method's fairness is tested in these challenging test cases.
> Specifically, high fairness rates in these challenging cases provide
> evidence of greater fairness in less-challenging real elections.
>
> With these considerations in mind, here are some notable results:
>
> * Approval voting has the lowest success rates for both clone
> independence (CI) and IIA.  About simulation: To convert from a ranking
> ballot to an Approval ballot, the candidates above the halfway
> preference level are approved.  When there is an odd number of
> candidates, the middle candidate gets one-half of an approval vote.
>
> * Borda count has lower CI (clone) success rates (higher failure rates)
> compared to the other methods, except Approval.  About simulation: These
> simulations assume non-tactical voting, yet in real elections
> Borda-count ballots would be marked tactically (using burying, betrayal,
> etc. and ranking multiple candidates at the same preference level) to
> increase the voter's influence.
>
> * IRV-BTR -- Instant-runoff voting with bottom-two runoff -- yields
> about the same results as IRV.  I included this method in hopes that the
> results would help Robert in his reform efforts in Burlington.  Alas,
> unless I've got a bug in the code -- which is very possible -- IRV-BTR
> isn't looking better than IRV for these two fairness criteria (CI and IIA).
>
> * IRV -- Instant Runoff Voting -- becomes more vulnerable to IIA
> failures as the number of candidates increases, especially compared to
> the other methods (except Approval).  This vulnerability might be a
> consequence of the method having a zero CI failure rate (when there are
> no ties at any level).
>
> * Plurality voting reverses back to fewer IIA failures as the number of
> candidates increases, which is different from the other methods.  Also,
> plurality voting has a less-predictable pattern compared to the other
> methods, which might be because it collects so little preference
> information.  The need for tactical voting in plurality elections means
> that real-life plurality failure rates (for close elections) will be
> higher than in these non-tactical simulations.  About simulation: The
> ballot's single vote goes to the candidate who is ranked highest.
>
> * The Condorcet-Kemeny method has higher failure rates for Clone
> Independence when there are just two or three candidates.  The failure
> rate decreases when there are more candidates.  As a reminder, this
> scatter plot does not separate clone-independence failures into their
> different kinds of failures, which is important information regarding
> whether a method is vulnerable to money-based election tactics.
>
> * STAR voting -- Score Then Automatic Runoff -- always uses six
> preference levels, so only the data point for six candidates
> meaningfully matches this method.  About simulation: The other data
> points for "STAR" voting simulate a STAR-like ballot on which the number
> of preference levels matches the number of candidates.  Also, these
> tests simulate non-tactical ballot marking, but human voters using STAR
> voting in a real-life election, especially a US general election, would
> use ballot-marking tactics that exploit the method's ability to express
> strong preferences.
>
> * RCIPE -- Ranked Choice Including Pairwise Elimination -- has fewer CI
> failures compared to the other methods.  About this method: RCIPE is
> similar to IRV except that if an elimination round includes a "pairwise
> losing candidate" (a.k.a. Condorcet loser) then this candidate is
> eliminated instead of the fewest-transferred-votes candidate.  When an
> elimination round involves a tie, this method resolves the tie using the
> IPE counting method.
>
> * IPE -- Instant Pairwise Elimination -- has the lowest failure rates
> compared to the other methods.  About this method: IPE eliminates the
> candidate who has the highest "opposition" count, where the opposition
> count for a candidate is the number of remaining candidates who (on a
> ballot) are ranked higher than that candidate, summed across all the
> ballots.  If there is a tie, it's resolved using the lowest "support"
> count, which counts the remaining candidates who are ranked lower than
> the specified candidate.  Method clarification: This counting method
> does not actually re-count the ballots for each elimination round;
> instead a calculator and pen and paper can be used with the overall
> pairwise-count table to calculate the opposition and support counts
> (which change after each elimination).
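
The pairwise-table arithmetic described in the IPE item above can be 
sketched in Python.  This is only an illustration, not the code in the 
C++ software: the function names and the ballot format (a dict mapping 
each candidate to a preference level, where a lower number is more 
preferred and equal levels are allowed) are hypothetical.  The 
description does not cover a tie that survives the support-count 
tie-break, so the sketch returns None in that case, which is an 
assumption.

```python
from itertools import permutations


def pairwise_table(ballots, candidates):
    """table[(x, y)] = number of ballots that rank x above y."""
    table = {pair: 0 for pair in permutations(candidates, 2)}
    for ballot in ballots:
        for x, y in table:
            if ballot[x] < ballot[y]:
                table[(x, y)] += 1
    return table


def ipe_winner(table, candidates):
    """Eliminate the highest-opposition candidate until one remains."""
    remaining = list(candidates)
    while len(remaining) > 1:
        # Both counts come from the table alone; ballots are not re-read.
        opposition = {
            x: sum(table[(y, x)] for y in remaining if y != x)
            for x in remaining
        }
        support = {
            x: sum(table[(x, y)] for y in remaining if y != x)
            for x in remaining
        }
        worst = max(opposition.values())
        tied = [x for x in remaining if opposition[x] == worst]
        if len(tied) > 1:
            least = min(support[x] for x in tied)
            tied = [x for x in tied if support[x] == least]
        if len(tied) > 1:
            return None  # assumption: unresolved tie is reported as a tie
        remaining.remove(tied[0])
    return remaining[0]


ballots = (
    [{"A": 1, "B": 2, "C": 3}] * 4
    + [{"B": 1, "C": 2, "A": 3}] * 3
    + [{"C": 1, "B": 2, "A": 3}] * 2
)
table = pairwise_table(ballots, ["A", "B", "C"])
print(ipe_winner(table, ["A", "B", "C"]))  # B
```

In this example the opposition counts start at A=10, B=6, C=11, so C is 
eliminated first.  The counts among the remaining pair are then A=5, B=4, 
so A is eliminated and B wins.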
>
> If someone wants a closer look at the actual numbers used in this
> scatter plot, they are at the bottom of this text file:
>
> https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/results_from_generate_random_ballots.txt
>
>
> Here's a link to the C++ program (which mostly uses the C language
> subset) that generates random ballots, supplies these ballots to
> separate software where the winners are calculated, and analyzes the
> results:
>
> https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp
>
>
> I hope you agree that these results are very helpful for comparing
> vote-counting methods.
>
> For perspective, my interest in CI and IIA failure/success rates is
> based on my frustration that money-based tactics are used to manipulate
> election results by exploiting CI and IIA failures in ways that achieve
> strategic nomination.  These money-based tactics include:
>
> * Financially supporting clones (similar candidates) who split votes
> away from a popular reform-minded candidate
>
> * Removing financial support from clones (similar candidates) who split
> votes away from the status-quo-supporting (puppet-like) candidate
>
> * Possibly supporting distracting (irrelevant) candidates
>
> In contrast, I believe it's more difficult to use money to manipulate an
> election by exploiting failures of other criteria, such as monotonicity,
> favorite betrayal, later-no-help, and later-no-harm.  In other words, I believe
> these other fairness criteria too often get more attention than they
> deserve (under current money-dominant political conditions).
>
> Here's a final note to those of you who are in the academic world.  I
> invite you to reproduce these results and add other methods and publish
> the results.  Why is this important?  The improvement of Wikipedia
> articles about better vote-counting methods requires peer-reviewed
> academic articles that meaningfully compare vote-counting methods in
> ways that go beyond Wikipedia's simple pass/fail checklist in the
> "Comparison of electoral systems" article.  Alas, too often that
> over-simplistic checklist gets cited as evidence of support for inferior
> vote-counting methods.
>
> In other words, flagging vote-counting failures as either zero or
> non-zero is primitive.  We need more measurements of HOW OFTEN the
> non-zero failures occur.
>
> Richard Fobes
> "The VoteFair guy"
> ----
> Election-Methods mailing list - see https://electorama.com/em for list info
>