[EM] The Borda count with spatial data and truncation
Colin Champion
colin.champion at routemaster.app
Wed Mar 8 01:49:03 PST 2023
There was a discussion last year about the merits of enforced truncation
of ballots. I modified my evaluation code to allow me to measure the
effects under a spatial model. The results for Condorcet methods and IRV
were as you'd expect, but the results for the Borda count were surprising.
The first oddity is that with aggressive truncation (e.g. from 10 to
4 candidates), the Borda count may outperform Condorcet methods. In this
particular case (10 candidates truncated to 4) I find that IRV is 47%
correct, the Borda count 83% correct, and Condorcet methods have lower
and upper bounds of 70% and 76%, with minimax coming in at 72%. (The
model is bivariate Gaussian.) I don't think this is hard to explain: a
truncated Condorcet method isn't a true Condorcet method, doesn't
satisfy the conditions of the median voter theorem, and need not work
particularly well.
The second oddity is that the Borda count does not behave monotonically.
When 10 candidates are truncated to 6 rather than 4, the Borda count
returns 80% accuracy - i.e. it does *less* well than when subjected to
more drastic truncation.
This is surprising but not astonishing. The Borda count is a
positional method whose coefficients have no claim to optimality; the
truncated Borda count is a method with different coefficients; and
changing the degree of truncation, even in an information-destroying
way, is not guaranteed to affect performance in any particular direction.
It struck me that it should be possible to improve on the Borda
coefficients. A natural way to do so is to look at Euclidean distances.
We can compute (by numerical integration) the average distance of a
voter from the candidate she ranks top, second, etc., and if we use
these averages as weights, then the average score obtained by a
candidate should be an approximation to his average distance from
voters; so we elect the candidate with the lowest score. For a more
conventional statistic we can transform the weights so as to be
justified in electing the candidate with the highest score. I will call
this the Euclidean Borda count.
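
A rough sketch of how such weights might be estimated follows. It uses Monte Carlo sampling rather than the numerical integration mentioned above, and it assumes that candidates as well as voters are drawn from a standard Gaussian, which the post does not state. The function name and parameters are hypothetical.

```python
import math
import random


def mean_distance_by_rank(n_cands=10, n_dims=2, n_samples=20000, seed=0):
    """Estimate the mean voter-candidate distance at each ballot position.

    Assumption: voters and candidates are both standard Gaussian.
    Returns a list w where w[r] is the average distance from a voter to
    the candidate she ranks in position r (0 = top of the ballot).
    """
    rng = random.Random(seed)
    totals = [0.0] * n_cands
    for _ in range(n_samples):
        voter = [rng.gauss(0, 1) for _ in range(n_dims)]
        # Sorting the distances gives the voter's sincere ranking.
        dists = sorted(
            math.dist(voter, [rng.gauss(0, 1) for _ in range(n_dims)])
            for _ in range(n_cands)
        )
        for r, d in enumerate(dists):
            totals[r] += d
    return [t / n_samples for t in totals]


weights = mean_distance_by_rank()
# One possible transformation so that the highest score wins (an assumption):
high_is_best = [weights[-1] - w for w in weights]
```

With weights of this kind a candidate's average score approximates his average distance from the voters, so the raw `weights` go with a lowest-score-wins rule and `high_is_best` with the usual highest-score-wins rule.
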
The results aren't very good. In one dimension the EBC scores
*worse* than the standard Borda count whereas in two it performs
slightly better; but it still isn't monotonic under truncation.
At this point I abandoned all subtlety in favour of brute force. I
simply hillclimbed to maximise the accuracy of the Borda count with
parametric weights. I will call the resulting method the Spatial Borda
Count. To be precise, I simulated a large number of elections under the
model, and maximised the total number of occasions on which candidate A
outscored candidate B when A was actually better than B. This is more
symmetric (but further away from the intended use) than maximising the
number of times *the best* candidate outscored each rival
individually, or outscored all rivals together.
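
The pairwise objective and the hillclimb can be sketched as below. This is an illustration under stated assumptions, not the evaluation code itself: candidates are assumed Gaussian like the voters, high scores are taken as best, every weight is perturbed freely (the post's "parametric weights" are not specified), and all names and sizes are hypothetical.

```python
import math
import random


def simulate_election(rng, n_cands=5, n_voters=200, n_dims=2):
    """Return (d, counts) for one election under the assumed model.

    d[i] is candidate i's distance from the origin; counts[i][r] is the
    number of voters who rank candidate i in position r (0 = top).
    """
    cands = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(n_cands)]
    counts = [[0] * n_cands for _ in range(n_cands)]
    for _ in range(n_voters):
        voter = [rng.gauss(0, 1) for _ in range(n_dims)]
        order = sorted(range(n_cands), key=lambda i: math.dist(voter, cands[i]))
        for r, i in enumerate(order):
            counts[i][r] += 1
    return [math.hypot(*c) for c in cands], counts


def agreement(weights, elections):
    """Count candidate pairs in which the nearer one gets the higher score."""
    total = 0
    for d, counts in elections:
        scr = [sum(w * c for w, c in zip(weights, row)) for row in counts]
        for i in range(len(d)):
            for j in range(i + 1, len(d)):
                if (d[i] - d[j]) * (scr[i] - scr[j]) < 0:
                    total += 1
    return total


def hillclimb(elections, n_cands, steps=300, seed=1):
    """Perturb one weight at a time, keeping any change that does not hurt."""
    rng = random.Random(seed)
    weights = [float(n_cands - 1 - r) for r in range(n_cands)]  # start at Borda
    best = agreement(weights, elections)
    for _ in range(steps):
        trial = list(weights)
        trial[rng.randrange(n_cands)] += rng.gauss(0, 0.2)
        score = agreement(trial, elections)
        if score >= best:
            weights, best = trial, score
    return weights, best


rng = random.Random(0)
elections = [simulate_election(rng) for _ in range(200)]
borda_agree = agreement([4.0, 3.0, 2.0, 1.0, 0.0], elections)
sbc_weights, sbc_agree = hillclimb(elections, n_cands=5)
```

The fitted weights are determined only up to a positive scaling and a shift, since neither changes the order of the scores. The agreement figure here is measured on the same elections that were used for fitting, so a fair comparison would need fresh simulations.
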
We can view the hillclimb as maximising the sum of terms
sgn(d[k,i]-d[k,j]) * sgn(scr[k,i]-scr[k,j]) where d[k,i] is the distance
of the ith candidate from the origin in election k (the voters being
centred on the origin) and scr[k,i] is the candidate's score, taking
low scores as best as in the EBC.
This can be simplified to the sum of terms sgn( (d[k,i]-d[k,j]) *
(scr[k,i]-scr[k,j]) ), and if we adopt the approximation sgn(x)=x, we
end up looking for scores which are maximally correlated with distances;
and these are obviously obtained by letting the scores be proportional
to distances. So the EBC drops out as a natural approximation to the SBC.
The attachment is a graph showing the weights against ballot
position for various methods (normalising the weights for comparability
and so that high rather than low scores are best). The lowest position
on a ballot is at the left and the top at the right.
I measure the accuracy of the Borda count (on untruncated ballots)
as 85% whereas the SBC is 91%. The number of errors is thus reduced by
40% (from 15% of elections to 9%) by a minor change in coefficients.
(The improvement is less impressive in a single dimension.)
When I look at truncation, I find to my relief that the SBC is
monotonic. For sufficient levels of truncation the Borda count does
better than the SBC; but if we wanted to optimise performance for
truncated ballots, we’d actually generate a different set of weights.
The results are as follows, using a bivariate Gaussian model and
truncating 10 candidates to k:
(accuracy in percent; "Cond. low" and "Cond. up" are the lower and upper
bounds for Condorcet methods)
 k   IRV   Borda   SBC    Cond. low   minimax   Cond. up
10   51    85      91.2   99          99        99
 8   51    85      91.0   99          99        99
 7   51    84      89.6   99          99        99
 6   51    80      86.5   96          97        97
 5   50    81      86.0   88          89        90
 4   47    83      83.7   70          72        76
 3   41    66      64.7   50          52        59
 2   35    44      43.5   36          38        43
 1   30    30      30.0   30          30        30
CJC
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbc.svg
Type: image/svg+xml
Size: 14174 bytes
Desc: not available
URL: <http://lists.electorama.com/pipermail/election-methods-electorama.com/attachments/20230308/3f004162/attachment-0001.svg>