# [EM] Calculated failure/success rates using randomized ballots and candidates

VoteFair electionmethods at votefair.org
Sun Jan 3 23:15:32 PST 2021

```
I've written a C++ program that generates randomized ballots, feeds
these ballots to a separate program that calculates a winner according
to various vote-counting methods, compares the results, and calculates
failure/success rates.

The program does two kinds of tests:

* IIA: Tests successes/failures according to the Independence of
Irrelevant Alternatives (IIA) criterion.  Specifically it calculates
which candidate would win with all the candidates, and then it removes
each of the non-winning candidates, one by one, to test whether a
different candidate would win.  If any of the comparisons yield a
different result -- such as a 6-candidate contest giving a different
winner compared to the 7-candidate contest that uses the same ballots
(with one candidate omitted in the 6-candidate contest) -- then that's
counted as one failure.

* "Agree/Disagree": Tests how often one counting method yields a winner
who is the same as, or different from, the winner according to another
vote-counting method.

ASSUMPTIONS/CONDITIONS:

The randomized ballots assume that the voters are expressing their
sincere preferences, without any tactical voting, and without ranking or
rating two candidates at the same preference level.

For both tests, when a tie occurs for the winner of either test case,
the tied case is ignored.  This means ties do not count as a failure or
a success.

For my purposes I've used 4,000 randomized cases per test, and each test
uses 17 ballots.  Unless otherwise specified, the test (or full test in
the case of IIA) uses 7 candidates.

IIA RESULTS:

The Independence of Irrelevant Alternatives (IIA) success rate of the
Condorcet-Kemeny method is 79%, which is a failure rate of 21%.

The IIA success rates of the following methods are all about 1% or 2%,
which corresponds to failure rates of about 99% or 98%:

* IRV: Instant Runoff Voting

* IPE: Instant Pairwise Elimination (described in ElectoWiki)

* IRMPL: Instant Runoff Minus Pairwise Loser, which uses PLE (see below)
as a safety net under IRV.

* STAR: Score Then Automatic Runoff

Another method, PLE, which is an abbreviation for Pairwise Loser
Elimination, has a calculated 100% success rate, which is zero failures.
This method successively eliminates the Condorcet loser, one round at
a time, and stops with a tie when it encounters a Condorcet
(rock-paper-scissors-like) cycle.  This perfect success rate occurs
because tied cases are ignored, which leaves only cases that have no
cycles at any level, which means the method finds the Condorcet winner
in both the 7-candidate case and the 6-candidate case.

Conclusion: Methods that eliminate one candidate at a time frequently
fail the Independence of Irrelevant Alternatives (IIA) criterion.  In
contrast, the Condorcet-Kemeny method, which uses all the pairwise
counts (not just the biggest or smallest pairwise counts or differences
between pairwise counts), yields a dramatically better IIA success rate.

Important: In real elections the success rates would be higher -- there
would be fewer failures -- because real elections have meaningful
differences between candidates.  Remember that this software randomizes
the ballots without any bias.  This means that almost all the test cases
are "semi-balanced" or "sitting on the fence" (or maybe "finding the
highest sand dune rather than finding the highest mountain") kinds of cases.

Advocates of STAR voting may claim these numbers are not meaningful for
STAR voting because STAR voting uses Score ballots, not ranked ballots.
(On Score ballots the gap between preference levels is significant.)
Yet fans of STAR voting also claim that its use of a top-two runoff
discourages tactical voting, particularly the tactic of favoring the use
of high and low preference levels, and avoiding the use of middle
preference levels.  Keeping in mind that these tests assume the voters
are voting sincerely, I believe these two claims are contradictory.
(Feedback on this or any other part of this message is welcome.)

AGREE/DISAGREE RESULTS:

Below are the results from the "Agree/Disagree" test, which in my tests
compares the Condorcet-Kemeny winner with the winner from each of the
indicated vote-counting methods.  Specifically the "agree" percentages
refer to matches with the Condorcet-Kemeny winner, and the "disagree"
percentages apply when the method identifies a different winner.  (By
definition the Condorcet-Kemeny method would yield 100% agreement.)

About the "ties" numbers specified in parentheses:  They are counts out
of 4,000 cases, not percentages.  These tied cases are not counted in
either the agree or disagree percentages.

Note that when there are only two candidates, all the methods always agree.

number of candidates: 2
IPE agree/disagree: 100.0%  0%  (0 ties)
IRMPL agree/disagree: 100.0%  0%  (0 ties)
STAR agree/disagree: 100.0%  0%  (0 ties)
IRV agree/disagree: 100.0%  0%  (0 ties)
PLE agree/disagree: 100.0%  0%  (0 ties)

number of candidates: 3
IPE agree/disagree: 95.1%  4.8%  (0 ties)
IRMPL agree/disagree: 95.7%  4.2%  (0 ties)
STAR agree/disagree: 95.2%  4.7%  (296 ties)
IRV agree/disagree: 93.0%  6.9%  (643 ties)
PLE agree/disagree: 100.0%  0%  (286 ties)

number of candidates: 4
IPE agree/disagree: 92.5%  7.4%  (59 ties)
IRMPL agree/disagree: 91.3%  8.6%  (9 ties)
STAR agree/disagree: 94.8%  5.1%  (440 ties)
IRV agree/disagree: 84.0%  15.9%  (1582 ties)
PLE agree/disagree: 100.0%  0%  (943 ties)

number of candidates: 5
IPE agree/disagree: 92.3%  7.6%  (103 ties)
IRMPL agree/disagree: 88.9%  11.0%  (14 ties)
STAR agree/disagree: 93.6%  6.3%  (435 ties)
IRV agree/disagree: 77.7%  22.2%  (2485 ties)
PLE agree/disagree: 100.0%  0%  (1724 ties)

number of candidates: 6
IPE agree/disagree: 90.6%  9.3%  (172 ties)
IRMPL agree/disagree: 84.9%  15.0%  (27 ties)
STAR agree/disagree: 91.4%  8.5%  (420 ties)
IRV agree/disagree: 69.7%  30.2%  (3203 ties)
PLE agree/disagree: 100.0%  0%  (2513 ties)

number of candidates: 7
IPE agree/disagree: 88.7%  11.2%  (219 ties)
IRMPL agree/disagree: 81.6%  18.3%  (67 ties)
STAR agree/disagree: 89.4%  10.5%  (441 ties)
IRV agree/disagree: 59.5%  40.4%  (3517 ties)
PLE agree/disagree: 100.0%  0%  (3063 ties)

As the number of candidates increases, the methods more often disagree
with the Condorcet-Kemeny method.  So the bottom numbers, where there
are 7 candidates, are the most revealing.

The bottom numbers show that IRV -- Instant Runoff Voting -- is the
worst of these methods.  It agrees in about 60 percent of the non-tie
cases.  The other three methods -- IPE, IRMPL, and STAR -- have
agreement rates of about 82 to 89 percent.

Of course this result -- that IRV is not a good vote-counting method --
is not surprising.  Yet it's nice to have numeric confirmation.

Here are links to the two programs used in these tests:

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/votefair_ranking.cpp

GOAL:

My hope is that this software takes us a step closer to numbers that
better characterize HOW OFTEN each vote-counting method passes or
fails each of the "fairness" criteria, the ones that are currently
flagged as "yes" or "no" in this comparison table:

https://en.wikipedia.org/wiki/Comparison_of_electoral_systems#Compliance_of_selected_single-winner_methods

I realize the numbers calculated by my software are not suitable as
estimates for real-life elections -- because randomized ballots and
randomized candidates do not match real-life elections.  Yet these
calculated numbers provide a peek at ways to compare methods more
meaningfully than just flagging methods as pass-or-fail.

THANKS:

If you find any software bugs, please tell me, either here or on GitHub.

Feedback is welcome.  That's why I've posted these results here.

Richard Fobes
```
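
The IIA test described in the message can be illustrated with a minimal Python sketch. This is not the author's C++ code from the linked repository. The function names (`random_ballots`, `irv_winner`, `iia_outcome`, `run_iia_test`) are hypothetical, and IRV is used only as an example method. Two details are assumptions rather than facts from the message: a tie for last place in any IRV round is treated as a tied case, and a tie in any reduced contest causes the whole case to be ignored.

```python
import random

def random_ballots(num_ballots, num_candidates, rng):
    """Each ballot is a strict ranking (no shared levels), most preferred first."""
    candidates = list(range(num_candidates))
    return [rng.sample(candidates, num_candidates) for _ in range(num_ballots)]

def irv_winner(ballots, candidates):
    """Return the IRV winner among `candidates`, or None for a tie.

    Assumes at least one ballot. Assumption: a tie for fewest
    first-choice votes in any round counts as a tied case.
    """
    remaining = set(candidates)
    while True:
        tallies = {c: 0 for c in remaining}
        for ballot in ballots:
            # Count the highest-ranked candidate still in the contest.
            top = next(c for c in ballot if c in remaining)
            tallies[top] += 1
        leader = max(tallies, key=tallies.get)
        if 2 * tallies[leader] > len(ballots):
            return leader  # strict majority of ballots
        lowest = min(tallies.values())
        losers = [c for c in remaining if tallies[c] == lowest]
        if len(losers) > 1:
            return None
        remaining.remove(losers[0])

def iia_outcome(ballots, num_candidates, winner_fn):
    """Classify one case as 'success', 'failure', or 'tie'."""
    everyone = list(range(num_candidates))
    full_winner = winner_fn(ballots, everyone)
    if full_winner is None:
        return "tie"
    # Re-run the contest once per removed non-winning candidate.
    reduced_winners = [
        winner_fn(ballots, [c for c in everyone if c != removed])
        for removed in everyone
        if removed != full_winner
    ]
    if any(w is None for w in reduced_winners):
        return "tie"  # assumption: any tied reduced contest voids the case
    if any(w != full_winner for w in reduced_winners):
        return "failure"
    return "success"

def run_iia_test(num_cases=4000, num_ballots=17, num_candidates=7, seed=0):
    """Count outcomes over many randomized cases; ties are tallied separately."""
    rng = random.Random(seed)
    counts = {"success": 0, "failure": 0, "tie": 0}
    for _ in range(num_cases):
        ballots = random_ballots(num_ballots, num_candidates, rng)
        counts[iia_outcome(ballots, num_candidates, irv_winner)] += 1
    return counts

if __name__ == "__main__":
    print(run_iia_test(num_cases=200))
```

As a small hand-checkable case, consider nine ballots over three candidates: four ranking 0 > 2 > 1, three ranking 1 > 2 > 0, and two ranking 2 > 1 > 0. Candidate 1 wins under this sketch's IRV, but removing the non-winning candidate 0 makes candidate 2 the winner, so the case counts as one IIA failure. Success and failure rates would then be computed over the non-tie cases only, as in the message.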