# [EM] Simple voting methods

Colin Champion colin.champion at routemaster.app
Tue Oct 26 14:10:04 PDT 2021

```
This post summarises an empirical evaluation of some simple voting
methods, seeking to fill gaps in the evaluations I've seen. Rather than
present results here (where the formatting might get mangled) I've put
them on the web at http://www.masterlyinactivity.com/condorcet.html.
Results are in good faith but the absence of bugs is not guaranteed.

The generative process is a spatial model. Voters and candidates are
drawn from the same mixture of 3 two-dimensional gaussian components.
This allows quite a high degree of asymmetry, yet Condorcet cycles arise
in only 0.7% of elections. I suspect that this is an upper bound on
their frequency under sincere voting in real life. There are a million
elections each with 10001 voters and 9 candidates.
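
A minimal sketch of a generative process of this kind (the mixture
weights, means and covariances below are illustrative placeholders, not
the parameters actually used here):

import numpy as np

def sample_points(n, rng):
    # 3-component mixture of 2-D gaussians; parameters are made up
    means = np.array([[0.0, 0.0], [1.5, 0.5], [-1.0, 1.0]])
    weights = np.array([0.5, 0.3, 0.2])
    comps = rng.choice(3, size=n, p=weights)
    return means[comps] + rng.standard_normal((n, 2))

rng = np.random.default_rng(1)
voters = sample_points(10001, rng)     # one election: 10001 voters...
candidates = sample_points(9, rng)     # ...and 9 candidates from the same mixture

# each voter ranks the candidates by increasing euclidean distance
dists = np.linalg.norm(voters[:, None, :] - candidates[None, :, :], axis=2)
ballots = np.argsort(dists, axis=1)    # ballots[v] lists candidates, best first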

'sptp' elects the candidate with the highest sum of first and second
preferences; 'av'=IRV=Hare; 'mj'
is an idealised highest median method in which a voter's rating of a
candidate is minus the Euclidean distance between them. This is
unrealistically favourable to MJ since it eliminates the requirement on
voters to quantise their ratings, which they will do inconsistently.
MJ's performance is estimated from 10x fewer elections than other methods.
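
To illustrate the definitions on a toy distance matrix (the numbers are
made up), sptp and idealised mj can elect different candidates:

import numpy as np

# 5 voters x 3 candidates: dists[v][c] = distance from voter v to candidate c
dists = np.array([[1., 2., 3.],
                  [1., 2., 3.],
                  [1., 2., 3.],
                  [3., 1., 2.],
                  [3., 2., 1.]])
ballots = np.argsort(dists, axis=1)   # each voter's ranking, best first

# sptp: the winner has the most first- plus second-place votes
top_two = ballots[:, :2]
sptp_scores = np.bincount(top_two.ravel(), minlength=dists.shape[1])
sptp_winner = int(np.argmax(sptp_scores))

# idealised mj: a voter's rating of a candidate is minus the distance,
# and the winner has the highest median rating
mj_winner = int(np.argmax(np.median(-dists, axis=0)))

Here sptp elects candidate 1 while mj elects candidate 0.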

'condorcet' denotes the bare method, which is scored as giving the wrong
result whenever there is a cycle. 'condorcet+X' uses X as a cycle-breaker, so
'condorcet+borda' = Black.
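
A sketch of the underlying machinery, assuming ballots are complete
rankings listed best-first (the minimax here uses margins; the variant
actually evaluated is not stated):

import numpy as np

def pairwise(ballots, m):
    # wins[i][j] = number of voters ranking candidate i above candidate j
    pos = np.argsort(ballots, axis=1)   # pos[v][c] = rank of c on ballot v
    wins = np.zeros((m, m), dtype=int)
    for i in range(m):
        for j in range(m):
            if i != j:
                wins[i, j] = int(np.sum(pos[:, i] < pos[:, j]))
    return wins

def condorcet_winner(wins):
    # candidate who beats every other head-to-head, or None if none exists
    m = len(wins)
    for i in range(m):
        if all(wins[i, j] > wins[j, i] for j in range(m) if j != i):
            return i
    return None

def minimax(wins):
    # elect the candidate whose worst pairwise defeat (by margin) is smallest
    m = len(wins)
    worst = [max(wins[j, i] - wins[i, j] for j in range(m) if j != i)
             for i in range(m)]
    return int(np.argmin(worst))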

Tiebreaks may exist in 'full' and 'restricted' forms. If Llull's method
has a full Borda tiebreak, then a Borda ranking is computed on the
entire field, and the winner is the Llull winner who comes highest in
the Borda ranking. A restricted Borda tiebreak is computed by applying
the Borda count to the Llull tie set – that is, ballots are compressed
by squeezing out candidates not belonging to the set, and Borda's method
is applied to the result. (Llull=Copeland.) SPTP can only be full, and an
AV tiebreak is
generally understood to be restricted.
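
The two forms can be sketched as follows ('tied' stands for the tie set
produced by whatever base method is in use; the ballots in the test below
are made up):

import numpy as np

def borda_scores(ballots, m):
    # each ballot lists candidates best-first; rank r earns m-1-r points
    scores = np.zeros(m)
    for b in ballots:
        for r, c in enumerate(b):
            scores[c] += m - 1 - r
    return scores

def full_tiebreak(tied, ballots, m):
    # full: rank the entire field by borda, pick the tied candidate
    # who comes highest in that ranking
    scores = borda_scores(ballots, m)
    return max(tied, key=lambda c: scores[c])

def restricted_tiebreak(tied, ballots, m):
    # restricted: squeeze non-members out of every ballot, then apply
    # borda to the tie set alone
    idx = {c: k for k, c in enumerate(tied)}
    squeezed = [[idx[c] for c in b if c in idx] for b in ballots]
    scores = borda_scores(squeezed, len(tied))
    return tied[int(np.argmax(scores))]

With 4 ballots 0>1>2 and 3 ballots 1>2>0 and tie set {0,1}, the full
tiebreak picks 1 (higher full-field Borda) while the restricted tiebreak
picks 0 (who beats 1 head-to-head).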

I don't place much faith in Darlington's model, which draws voters from
a symmetric distribution. This relies on small sample sizes to prise
apart the Condorcet methods. There is no reason to suppose that the
behaviour seen in small samples (which is liable to be influenced by
numerical ties) will be representative of large electorates.

I find it to be very rare for Llull's method to produce a unique winner
when there are Condorcet cycles (<3% of the time). Perhaps I should be
suspicious of this. Smith's method never gives a unique winner in the
presence of cycles.
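
For reference, sketches of the two constructions, given the pairwise wins
matrix (the Smith set is built my way: seed with a highest-Copeland
candidate and close under 'ties or beats a member'):

def copeland_scores(wins):
    # number of strict pairwise victories for each candidate
    m = len(wins)
    return [sum(1 for j in range(m) if j != i and wins[i][j] > wins[j][i])
            for i in range(m)]

def smith_set(wins):
    # smallest set whose members all pairwise beat everyone outside it
    m = len(wins)
    scores = copeland_scores(wins)
    s = {max(range(m), key=lambda i: scores[i])}
    changed = True
    while changed:
        changed = False
        for i in range(m):
            if i not in s and any(wins[i][j] >= wins[j][i] for j in s):
                s.add(i)
                changed = True
    return sorted(s)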

There are two metrics. The first is the percentage of elections in which
each method elects the rightful winner, defined as the candidate whose
mean distance from voters is minimised. The second is mean loss. If a
voting method elects a candidate whose mean distance from voters is y,
and if the minimum mean distance from voters over candidates is x, then
the loss is y-x. I consider losses to be intrinsically meaningless, but
to provide a sounder basis for comparison than percentage correctness.
The average distance from a voter to the best candidate in any election
is around 1.7, so the differences in performance between competitive
methods are very small.
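
In code, scoring one election under the two metrics reduces to something
like this:

import numpy as np

def score_election(mean_dists, winner):
    # mean_dists[c] = mean distance from candidate c to the voters;
    # the rightful winner minimises it
    best = float(np.min(mean_dists))
    rightful = int(np.argmin(mean_dists))
    correct = (winner == rightful)
    loss = float(mean_dists[winner]) - best   # y - x, zero when correct
    return correct, loss

Averaging 'correct' over elections gives the first metric, averaging
'loss' the second.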

Table 5 (measuring percentage correct for a particular form of tactical
voting) is the sole case in which Condorcet+AV (ie. Condorcet/Hare)
outperforms Minimax, and when we look at the corresponding figures under
a loss metric the tables are turned: evidently Condorcet/Hare makes
fewer but larger mistakes than Minimax, and is weaker overall.

The best result in each table has a solid underline; all other results
which are better than Minimax have broken underlines. There are various
interesting points of detail, but overall the striking feature is the
consistently good performance of Minimax. Smith+BordaR, Smith+MinimaxR
and Smith+MinimaxF run it close.

The 8 tables apply 2 metrics to sincere voting and to 3 forms of
tactical voting. In all 3 types of tactical voting voters whose sincere
first preference is for a certain candidate (c0) place another candidate
(c1) at an insincere position in their lists. Type 1 (compromising)
moves c1 to the top; type 2 (false cycles) puts him second; and type 3
(burying) puts him at the bottom.
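
Applied to a single ballot, the three manipulations might look like this
(a sketch; the simulation presumably applies it to every c0 supporter at
once):

def subvert(ballot, c0, c1, kind):
    # voters whose sincere first preference is c0 move c1 to an insincere
    # position: kind 1 (compromising) -> top, kind 2 (false cycles) ->
    # second, kind 3 (burying) -> bottom; other voters are unchanged
    if ballot[0] != c0:
        return list(ballot)
    b = [c for c in ballot if c != c1]
    pos = {1: 0, 2: 1, 3: len(b)}[kind]
    return b[:pos] + [c1] + b[pos:]

For instance, with sincere ballot [3,1,4,0,2], c0=3 and c1=0, the three
types yield [0,3,1,4,2], [3,0,1,4,2] and [3,1,4,2,0].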

The number of candidates is reduced to 5 for tactical voting. Each of
the 20 (c0,c1) pairs corresponds to an attempt at subversion, which is
considered to be successful if the winning candidate is closer to c0,
and further from the average voter, than the winner under sincere
voting. The result attributed to each method is the worst amongst its
sincere result and all successful subversions of the given type. I have
not considered the effect of simultaneous tactical voting by supporters
of more than one candidate, either in the same or in different ways.
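
The scoring rule for a tactical table can be sketched as follows (the
names are mine):

def successful(w_tac, w_sin, dist_to_c0, mean_dist):
    # a subversion succeeds if it elects a candidate closer to c0, and
    # further from the average voter, than the sincere winner
    return (dist_to_c0[w_tac] < dist_to_c0[w_sin]
            and mean_dist[w_tac] > mean_dist[w_sin])

def attributed_loss(sincere_loss, subversion_losses, succeeded):
    # a method is credited with its worst outcome over the sincere
    # election and every successful subversion of the given type
    return max([sincere_loss] +
               [l for l, ok in zip(subversion_losses, succeeded) if ok])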

A choice between voting methods should not be based purely on figures
such as those given. An important factor is a method's sensitivity to
'irrelevant' candidates (eg. through strategic nomination). This is not
tested here, but the Borda count is notoriously weak, and I would be
inclined to reject any use of it (even as a tiebreak) in consequence.

Another consideration is the pragmatic acceptability of the various
methods (with simplicity the main aspect). For my part, I would be
reluctant to accept any use of the Smith set on account of its lack of
conceptual simplicity.

The last additional consideration is vulnerability to forms of tactical
voting not studied here. I don't know whether looking at more
complicated forms would change anything.

Taking everything together, there really seems to be nothing better than
Minimax. My instinctive preference beforehand was for Llull's method; in
particular, I found Darlington's reasons for excluding it to be
unpersuasive. In fact Llull+SPTP performs reasonably well (and can't be
accused of trading on the strength of its tiebreak), but it doesn't
offer any advantage over pure Minimax.

And my final conclusion is that it's ridiculously easy to perform an
evaluation of this sort. 300 lines of code isn't very much. I would
expect dozens of such evaluations to have been performed so that correct
ones would confirm each other while buggy ones stood out. Have they? I
haven't seen them. For what it's worth, when there's an overlap my
results are generally consistent with Darlington's and with a much older
evaluation by Chamberlin and Cohen (but the overlap is small).

My software is on the same web page as the results - not because I think
it's of interest, but because I think reproducible results are of more
value than unreproducible ones.

CJC
```