[EM] Calculated failure/success rates using randomized ballots and candidates

Sun Jan 3 23:15:32 PST 2021

I've written a C++ program that generates randomized ballots, feeds 
these ballots to a separate program that calculates a winner according 
to various vote-counting methods, compares the results, and calculates 
failure/success rates.

The program does two kinds of tests:

* IIA: Tests successes/failures according to the Independence of 
Irrelevant Alternatives (IIA) criterion.  Specifically it calculates 
which candidate would win with all the candidates, and then it removes 
each of the non-winning candidates, one by one, to test whether a 
different candidate would win.  If any of the comparisons yield a 
different result -- such as a 6-candidate contest giving a different 
winner compared to the 7-candidate contest that uses the same ballots 
(with one candidate omitted in the 6-candidate contest) -- then that 
case is counted as one failure.  Otherwise it is counted as one success.

* "Agree/Disagree": Tests how often one counting method yields a winner 
who is the same as, or different from, the winner according to another 
vote-counting method.
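As a rough illustration of the IIA test described above, here is a C++ 
sketch.  It is not taken from votefair_ranking.cpp.  The names Ballot, 
Method, IiaResult, testIia, and pluralityWinner are hypothetical, and 
plurality is only a stand-in method so the sketch can run.  A ballot is 
assumed to be a full ranking of candidate indices, and a method is 
assumed to return -1 for a tie.  The sketch ignores the whole case when 
any of the compared contests ties, which is one reading of the tie rule 
described under ASSUMPTIONS/CONDITIONS.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A ranked ballot: ballot[0] is the voter's most-preferred candidate index.
using Ballot = std::vector<int>;

// Hypothetical method signature: returns the winning candidate index, or -1
// for a tie.  Only candidates with active[c] == true take part.
using Method = std::function<int(const std::vector<Ballot>&,
                                 const std::vector<bool>&)>;

enum class IiaResult { Success, Failure, Ignored };

// Stand-in method for illustration only: plurality, counting each ballot
// for its highest-ranked active candidate.  Returns -1 for a tie.
int pluralityWinner(const std::vector<Ballot>& ballots,
                    const std::vector<bool>& active) {
    std::vector<int> firstChoices(active.size(), 0);
    for (const Ballot& ballot : ballots) {
        for (int candidate : ballot) {
            if (active[candidate]) { ++firstChoices[candidate]; break; }
        }
    }
    int winner = -1, best = -1;
    bool tied = false;
    for (int c = 0; c < (int)active.size(); ++c) {
        if (!active[c]) continue;
        if (firstChoices[c] > best) { best = firstChoices[c]; winner = c; tied = false; }
        else if (firstChoices[c] == best) tied = true;
    }
    return tied ? -1 : winner;
}

// Removes each non-winning candidate in turn and checks whether the winner
// changes.  Any tie (full or reduced contest) makes the case Ignored.
IiaResult testIia(const std::vector<Ballot>& ballots, int numCandidates,
                  const Method& method) {
    assert(numCandidates > 0);
    std::vector<bool> active(numCandidates, true);
    const int fullWinner = method(ballots, active);
    if (fullWinner < 0) return IiaResult::Ignored;
    bool changed = false;
    for (int removed = 0; removed < numCandidates; ++removed) {
        if (removed == fullWinner) continue;
        active[removed] = false;
        const int reducedWinner = method(ballots, active);
        active[removed] = true;  // restore before the next removal
        if (reducedWinner < 0) return IiaResult::Ignored;
        if (reducedWinner != fullWinner) changed = true;
    }
    return changed ? IiaResult::Failure : IiaResult::Success;
}
```

For example, with 4 ballots ranking A>B>C, 3 ranking B>A>C, and 2 
ranking C>B>A, plurality elects A, but removing C elects B, so testIia 
reports a failure.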

ASSUMPTIONS/CONDITIONS:

The randomized ballots assume that the voters are expressing their 
sincere preferences, without any tactical voting, and without ranking or 
rating two candidates at the same preference level.

For both tests, when either of the two compared results has a tie for 
the winner, that case is ignored.  This means ties count as neither a 
failure nor a success.

For my purposes I've used 4,000 randomized cases per test, and each case 
uses 17 ballots.  Unless otherwise specified, each case (or, for IIA, 
the full contest before any candidate is removed) uses 7 candidates.
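The kind of ballots described above can be sketched in C++ by shuffling 
the candidate list once per ballot, which gives each voter a uniformly 
random strict ranking with no shared preference levels.  This is a 
hypothetical sketch, not the code in generate_random_ballots.cpp, and 
the names randomBallots, kCasesPerTest, kBallotsPerCase, and kCandidates 
are illustrative.  One side effect of these conditions: with strict 
rankings and an odd number of ballots (17), no pairwise contest between 
two candidates can end in a tie.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// A ranked ballot: ballot[0] is the voter's most-preferred candidate index.
using Ballot = std::vector<int>;

// Parameters taken from the description above.
constexpr int kCasesPerTest = 4000;
constexpr int kBallotsPerCase = 17;
constexpr int kCandidates = 7;

// Each ballot is a uniformly random strict ranking of all candidates
// (sincere, unbiased, and no two candidates at the same level).
std::vector<Ballot> randomBallots(int numBallots, int numCandidates,
                                  std::mt19937& rng) {
    assert(numBallots >= 0 && numCandidates > 0);
    std::vector<Ballot> ballots;
    ballots.reserve(numBallots);
    Ballot order(numCandidates);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., numCandidates-1
    for (int i = 0; i < numBallots; ++i) {
        std::shuffle(order.begin(), order.end(), rng);
        ballots.push_back(order);
    }
    return ballots;
}
```

A test run would call randomBallots kCasesPerTest times with one 
std::mt19937, passing each set of ballots to the methods being compared. 
Seeding the generator with a fixed value makes a run repeatable.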

IIA RESULTS:

The Independence of Irrelevant Alternatives (IIA) success rate of the 
Condorcet-Kemeny method is 79%, which is a failure rate of 21%.

The IIA success rates of the following methods are all about 1% or 2%, 
which is a failure rate of 99% or 98%:

* IRV: Instant Runoff Voting

* IPE: Instant Pairwise Elimination (described in ElectoWiki)

* IRMPL: Instant Runoff Minus Pairwise Loser, which uses PLE (see below) 
as a safety net under IRV.

* STAR: Score Then Automatic Runoff
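For readers unfamiliar with IRV, the following C++ sketch shows the 
standard counting procedure: each ballot counts for its highest-ranked 
remaining candidate, and the candidate with the fewest votes is 
eliminated until someone has a majority.  The function name irvWinner 
is hypothetical, and so is the tie convention: this sketch reports a 
tie (-1) whenever two candidates tie for last place, which may not 
match how votefair_ranking.cpp handles such ties.

```cpp
#include <cassert>
#include <vector>

// A ranked ballot: ballot[0] is the voter's most-preferred candidate index.
using Ballot = std::vector<int>;

// Returns the IRV winner, or -1 when the count cannot continue without a
// tie-break (two or more candidates tied for last place).
int irvWinner(const std::vector<Ballot>& ballots, int numCandidates) {
    assert(numCandidates > 0);
    std::vector<bool> active(numCandidates, true);
    int remaining = numCandidates;
    while (true) {
        // Each ballot counts for its highest-ranked candidate still in the race.
        std::vector<int> tally(numCandidates, 0);
        for (const Ballot& ballot : ballots)
            for (int c : ballot)
                if (active[c]) { ++tally[c]; break; }
        int leader = -1, loser = -1;
        bool loserTied = false;
        for (int c = 0; c < numCandidates; ++c) {
            if (!active[c]) continue;
            if (leader < 0 || tally[c] > tally[leader]) leader = c;
            if (loser < 0 || tally[c] < tally[loser]) { loser = c; loserTied = false; }
            else if (tally[c] == tally[loser]) loserTied = true;
        }
        // Full rankings mean no ballot is exhausted, so a majority of all
        // ballots is the stopping rule.
        if (2 * tally[leader] > (int)ballots.size() || remaining == 1) return leader;
        if (loserTied) return -1;
        active[loser] = false;
        --remaining;
    }
}
```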

Another method, PLE, which is an abbreviation for Pairwise Loser 
Elimination, has a calculated 100% success rate, which is zero failures. 
This method successively eliminates the Condorcet loser (the candidate 
who loses every pairwise contest against the other remaining 
candidates), one round at a time, and stops with a tie when it 
encounters a Condorcet (rock-paper-scissors-like) cycle.  This perfect 
success rate occurs because tied cases are ignored.  That leaves only 
cases that have no cycles at any level, and in those cases the method 
finds the Condorcet winner in both the 7-candidate case and the 
6-candidate case.
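The PLE procedure described above can be sketched in C++ as follows. 
The names pairwiseCounts and pleWinner are hypothetical, and the sketch 
assumes full rankings, so each ballot contributes to every pairwise 
count.  In each round it looks for a remaining candidate who loses every 
pairwise contest against the other remaining candidates; if no such 
candidate exists, it reports a tie (-1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A ranked ballot: ballot[0] is the voter's most-preferred candidate index.
using Ballot = std::vector<int>;
using Matrix = std::vector<std::vector<int>>;

// prefers[a][b] = number of ballots that rank candidate a above candidate b.
Matrix pairwiseCounts(const std::vector<Ballot>& ballots, int numCandidates) {
    Matrix prefers(numCandidates, std::vector<int>(numCandidates, 0));
    for (const Ballot& ballot : ballots)
        for (std::size_t i = 0; i < ballot.size(); ++i)
            for (std::size_t j = i + 1; j < ballot.size(); ++j)
                ++prefers[ballot[i]][ballot[j]];
    return prefers;
}

// Pairwise Loser Elimination: repeatedly remove the Condorcet loser among
// the remaining candidates.  Returns -1 if some round has no Condorcet loser.
int pleWinner(const std::vector<Ballot>& ballots, int numCandidates) {
    assert(numCandidates > 0);
    const Matrix prefers = pairwiseCounts(ballots, numCandidates);
    std::vector<bool> active(numCandidates, true);
    for (int remaining = numCandidates; remaining > 1; --remaining) {
        int condorcetLoser = -1;
        for (int c = 0; c < numCandidates && condorcetLoser < 0; ++c) {
            if (!active[c]) continue;
            bool losesToAll = true;
            for (int other = 0; other < numCandidates; ++other)
                if (other != c && active[other] &&
                    prefers[c][other] >= prefers[other][c])
                    losesToAll = false;
            if (losesToAll) condorcetLoser = c;
        }
        if (condorcetLoser < 0) return -1;  // cycle: nobody to eliminate
        active[condorcetLoser] = false;
    }
    for (int c = 0; c < numCandidates; ++c)
        if (active[c]) return c;
    return -1;  // not reached when numCandidates > 0
}
```

Each eliminated candidate lost to every candidate still remaining at 
that time, so the final survivor beats every eliminated candidate, which 
makes the survivor the Condorcet winner.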

Conclusion: Methods that eliminate one candidate at a time frequently 
fail the Independence of Irrelevant Alternatives (IIA) criterion.  In 
contrast, the Condorcet-Kemeny method, which uses all the pairwise 
counts (not just the biggest or smallest pairwise counts or differences 
between pairwise counts), yields a dramatically better IIA success rate.
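To show what "uses all the pairwise counts" means, here is a brute-force 
C++ sketch of Condorcet-Kemeny.  It scores every possible ordering of 
the candidates by adding up, for every pair of candidates, the number of 
voters who rank that pair in the same order as the ranking being scored, 
and the top candidate of the highest-scoring ordering wins.  The names 
are hypothetical, and this is not necessarily how votefair_ranking.cpp 
computes the result.  If equally high-scoring orderings put different 
candidates on top, the sketch reports a tie (-1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// A ranked ballot: ballot[0] is the voter's most-preferred candidate index.
using Ballot = std::vector<int>;
using Matrix = std::vector<std::vector<int>>;

// prefers[a][b] = number of ballots that rank candidate a above candidate b.
Matrix pairwiseCounts(const std::vector<Ballot>& ballots, int numCandidates) {
    Matrix prefers(numCandidates, std::vector<int>(numCandidates, 0));
    for (const Ballot& ballot : ballots)
        for (std::size_t i = 0; i < ballot.size(); ++i)
            for (std::size_t j = i + 1; j < ballot.size(); ++j)
                ++prefers[ballot[i]][ballot[j]];
    return prefers;
}

// Brute-force Condorcet-Kemeny: every ordering's score is the sum of all
// pairwise counts that agree with it.  Returns the top candidate of the
// best ordering, or -1 if best-scoring orderings disagree about the top.
int kemenyWinner(const std::vector<Ballot>& ballots, int numCandidates) {
    assert(numCandidates > 0);
    const Matrix prefers = pairwiseCounts(ballots, numCandidates);
    std::vector<int> order(numCandidates);
    std::iota(order.begin(), order.end(), 0);  // first (sorted) permutation
    long bestScore = -1;
    int winner = -1;
    bool tied = false;
    do {
        long score = 0;
        for (int i = 0; i < numCandidates; ++i)
            for (int j = i + 1; j < numCandidates; ++j)
                score += prefers[order[i]][order[j]];
        if (score > bestScore) { bestScore = score; winner = order[0]; tied = false; }
        else if (score == bestScore && order[0] != winner) tied = true;
    } while (std::next_permutation(order.begin(), order.end()));
    return tied ? -1 : winner;
}
```

With 7 candidates there are 7! = 5,040 orderings, each needing 21 pair 
lookups, so brute force is fast at this size, but the cost grows 
factorially as candidates are added.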

Important: In real elections the success rates would be higher -- there 
would be fewer failures -- because real elections have meaningful 
differences between candidates.  Remember that this software randomizes 
the ballots without any bias.  This means that almost all the test cases 
are "semi-balanced" or "sitting on the fence" (or maybe "finding the 
highest sand dune rather than finding the highest mountain") kinds of cases.

Advocates of STAR voting may claim these numbers are not meaningful for 
STAR voting because STAR voting uses Score ballots, not ranked ballots. 
(On Score ballots the gap between preference levels is significant.) 
Yet fans of STAR voting also claim that its use of a top-two runoff 
discourages tactical voting, particularly the tactic of favoring the use 
of high and low preference levels, and avoiding the use of middle 
preference levels.  Keeping in mind that these tests assume the voters 
are voting sincerely, I believe these two claims are contradictory. 
(Feedback on this or any other part of this message is welcome.)

AGREE/DISAGREE RESULTS:

Below are the results from the "Agree/Disagree" test, which compares 
the Condorcet-Kemeny winner with the winner from each of the indicated 
vote-counting methods.  Specifically the "agree" percentages 
refer to matches with the Condorcet-Kemeny winner, and the "disagree" 
percentages apply when the method identifies a different winner.  (By 
definition the Condorcet-Kemeny method would yield 100% agreement.)

About the "ties" numbers specified in parentheses:  They are counts out 
of 4,000 cases, not percentages.  These tied cases are not counted in 
either the agree or the disagree percentages.
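The bookkeeping described above can be sketched in C++ as a small tally 
per method.  The names AgreeTally, record, agreeTenths, and 
disagreeTenths are hypothetical.  Apart from the 100%/0% lines, each 
percentage pair in the tables below sums to 99.9% rather than 100% (for 
example 59.5% and 40.4%), which would be consistent with truncating to 
one decimal place instead of rounding; the sketch truncates, but that is 
an assumption.  Because ties are excluded, the denominators can be 
small: in the 7-candidate IRV line, 3,517 of the 4,000 cases were ties, 
so its percentages describe the remaining 483 cases.

```cpp
#include <cassert>

// Counts for one method compared against the Condorcet-Kemeny winner.
struct AgreeTally {
    int agree = 0;
    int disagree = 0;
    int ties = 0;  // reported as a raw count, excluded from the percentages

    // Winners are candidate indices; -1 means that method produced a tie.
    void record(int kemenyWinner, int otherWinner) {
        if (kemenyWinner < 0 || otherWinner < 0) ++ties;
        else if (kemenyWinner == otherWinner) ++agree;
        else ++disagree;
    }

    // Percentages in tenths of a percent, truncated (595 means 59.5%).
    int agreeTenths() const { return tenths(agree); }
    int disagreeTenths() const { return tenths(disagree); }

private:
    int tenths(int count) const {
        const int decided = agree + disagree;  // non-tie cases only
        assert(decided > 0);
        return count * 1000 / decided;  // integer division truncates
    }
};
```

For example, two agreements, one disagreement, and one tie give 666 and 
333 tenths, that is 66.6% and 33.3%, with the tie left out.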

Note that when there are only two candidates, all the methods always agree.

number of candidates: 2
IPE agree/disagree: 100.0%  0%  (0 ties)
IRMPL agree/disagree: 100.0%  0%  (0 ties)
STAR agree/disagree: 100.0%  0%  (0 ties)
IRV agree/disagree: 100.0%  0%  (0 ties)
PLE agree/disagree: 100.0%  0%  (0 ties)

number of candidates: 3
IPE agree/disagree: 95.1%  4.8%  (0 ties)
IRMPL agree/disagree: 95.7%  4.2%  (0 ties)
STAR agree/disagree: 95.2%  4.7%  (296 ties)
IRV agree/disagree: 93.0%  6.9%  (643 ties)
PLE agree/disagree: 100.0%  0%  (286 ties)

number of candidates: 4
IPE agree/disagree: 92.5%  7.4%  (59 ties)
IRMPL agree/disagree: 91.3%  8.6%  (9 ties)
STAR agree/disagree: 94.8%  5.1%  (440 ties)
IRV agree/disagree: 84.0%  15.9%  (1582 ties)
PLE agree/disagree: 100.0%  0%  (943 ties)

number of candidates: 5
IPE agree/disagree: 92.3%  7.6%  (103 ties)
IRMPL agree/disagree: 88.9%  11.0%  (14 ties)
STAR agree/disagree: 93.6%  6.3%  (435 ties)
IRV agree/disagree: 77.7%  22.2%  (2485 ties)
PLE agree/disagree: 100.0%  0%  (1724 ties)

number of candidates: 6
IPE agree/disagree: 90.6%  9.3%  (172 ties)
IRMPL agree/disagree: 84.9%  15.0%  (27 ties)
STAR agree/disagree: 91.4%  8.5%  (420 ties)
IRV agree/disagree: 69.7%  30.2%  (3203 ties)
PLE agree/disagree: 100.0%  0%  (2513 ties)

number of candidates: 7
IPE agree/disagree: 88.7%  11.2%  (219 ties)
IRMPL agree/disagree: 81.6%  18.3%  (67 ties)
STAR agree/disagree: 89.4%  10.5%  (441 ties)
IRV agree/disagree: 59.5%  40.4%  (3517 ties)
PLE agree/disagree: 100.0%  0%  (3063 ties)

As the number of candidates increases, the methods more often disagree 
with the Condorcet-Kemeny method.  So the bottom numbers, where there 
are 7 candidates, are the most revealing.

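The bottom numbers show that IRV (Instant Runoff Voting) is the worst 
of these methods.  It agrees in about 60 percent of the non-tie cases. 
Three of the other methods, IPE, IRMPL, and STAR, have similar agreement 
rates of roughly 80 to 90 percent.  PLE's 100% agreement reflects the 
ignored ties: PLE identifies a winner only when it finds the Condorcet 
winner, and the Condorcet-Kemeny method always picks the Condorcet 
winner when one exists.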

Of course this result -- that IRV is not a good vote-counting method -- 
is not surprising.  Yet it's nice to have numeric confirmation.

LINKS:

Here are links to the two programs used in these tests:

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/votefair_ranking.cpp

GOAL:

My hope is that this software takes us a step closer to yielding numbers 
to better characterize HOW OFTEN each vote-counting method passes or 
fails each of the "fairness" criteria, the ones that are currently 
flagged as "yes" or "no" in this comparison table:

https://en.wikipedia.org/wiki/Comparison_of_electoral_systems#Compliance_of_selected_single-winner_methods

I realize the numbers calculated by my software are not suitable as 
estimates for real-life elections -- because randomized ballots and 
randomized candidates do not match real-life elections.  Yet these 
calculated numbers provide a peek at ways to compare methods more 
meaningfully than just flagging methods as pass-or-fail.

THANKS:

If you find any software bugs, please tell me, either here or on GitHub.

Feedback is welcome.  That's why I've posted these results here.

Richard Fobes