[EM] The Borda count with spatial data and truncation

Wed Mar 8 01:49:03 PST 2023

There was a discussion last year about the merits of enforced truncation 
of ballots. I modified my evaluation code to allow me to measure the 
effects under a spatial model. The results for Condorcet methods and IRV 
were as you'd expect, but the results for the Borda count were surprising.
    The first oddity is that with aggressive truncation (e.g. from 10 to 
4 candidates), the Borda count may outperform Condorcet methods. In this 
particular case (10 candidates truncated to 4) I find that IRV is 47% 
correct, the Borda count 83% correct, and Condorcet methods have lower 
and upper bounds of 70% and 76%, with minimax coming in at 72%. (The 
model is bivariate Gaussian.) I don't think this is hard to explain: a 
truncated Condorcet method isn't a true Condorcet method, doesn't 
satisfy the conditions of the median voter theorem, and need not work 
particularly well.
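
For anyone wanting to reproduce the setup, here is a minimal Python sketch of one simulated election with truncation. It is not the code behind the figures in this post. It assumes that voters and candidates are both drawn from a standard bivariate Gaussian centred on the origin, and that the correct winner is the candidate nearest the origin. simulate_election is a hypothetical name.

```python
import numpy as np

def simulate_election(n_voters, n_cands, k, rng):
    """One election under an assumed bivariate Gaussian spatial model.

    Voters and candidates are drawn from a standard bivariate normal centred
    on the origin; each voter ranks candidates by Euclidean distance and the
    ballot keeps only the top k names.
    """
    voters = rng.standard_normal((n_voters, 2))
    cands = rng.standard_normal((n_cands, 2))
    # dist[v, c] = distance from voter v to candidate c
    dist = np.linalg.norm(voters[:, None, :] - cands[None, :, :], axis=2)
    # ballots[v] lists candidate indices, nearest first, truncated to k
    ballots = np.argsort(dist, axis=1)[:, :k]
    # "correct" winner: the candidate nearest the voters' centre (the origin)
    best = int(np.argmin(np.linalg.norm(cands, axis=1)))
    return ballots, best

rng = np.random.default_rng(0)
ballots, best = simulate_election(1000, 10, 4, rng)
print(ballots.shape)  # (1000, 4)
```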

The second oddity is that the Borda count does not behave monotonically. 
When 10 candidates are truncated to 6 rather than 4, the Borda count 
returns 80% accuracy, i.e. it does *less* well than when subjected to 
more drastic truncation.
    This is surprising but not astonishing. The Borda count is a 
positional method whose coefficients have no claim to optimality; the 
truncated Borda count is a method with different coefficients; and 
changing the degree of truncation, even in an information-destroying 
way, is not guaranteed to affect performance in any particular direction.
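
To make the "different coefficients" point concrete: a truncated Borda count is itself a positional method applied to each voter's full (hidden) ranking, because all unranked candidates are treated alike. The sketch below assumes one common convention, in which unranked candidates share the leftover points equally. Other conventions, such as giving them nothing, produce yet other coefficient vectors. The function name is hypothetical.

```python
def truncated_borda_weights(n, k):
    """Effective positional weights when n-candidate ballots are cut to k.

    Assumed convention: ranked position p (0 = top) scores n-1-p, and each
    of the n-k unranked candidates scores the mean of the unused points
    0 .. n-k-1, i.e. (n-k-1)/2.
    """
    top = [float(n - 1 - p) for p in range(k)]
    rest = (n - k - 1) / 2
    return top + [rest] * (n - k)

print(truncated_borda_weights(10, 10))  # ordinary Borda: 9.0 down to 0.0
print(truncated_borda_weights(10, 4))   # [9.0, 8.0, 7.0, 6.0, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5]
```

Under this convention the points per ballot (45 for ten candidates) are the same for every k. Only their distribution across positions changes.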

It struck me that it should be possible to improve on the Borda 
coefficients. A natural way to do so is to look at Euclidean distances. 
We can compute (by numerical integration) the average distance of a 
voter from the candidate she ranks top, second, etc., and if we use 
these averages as weights, then the average score obtained by a 
candidate should be an approximation to his average distance from 
voters; so we elect the candidate with the lowest score. For a more 
conventional statistic we can apply a decreasing transformation to the 
weights (e.g. subtracting each from the largest) so that the candidate 
with the highest score is elected. I will call this the Euclidean Borda 
count (EBC).
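
A minimal sketch of the EBC weights. The post uses numerical integration; this sketch swaps in a Monte Carlo estimate instead. It assumes one voter and n candidates drawn from a standard bivariate Gaussian, and it handles untruncated ballots only. The function names are hypothetical.

```python
import numpy as np

def euclidean_borda_weights(n_cands, n_samples=20000, seed=1):
    """Monte Carlo estimate of the mean distance from a voter to her r-th choice.

    Assumes one voter and n_cands candidates drawn independently from a
    standard bivariate Gaussian. weights[r] is the estimate for position r
    (0 = top), so low weights are good.
    """
    rng = np.random.default_rng(seed)
    voter = rng.standard_normal((n_samples, 1, 2))
    cands = rng.standard_normal((n_samples, n_cands, 2))
    dist = np.sort(np.linalg.norm(cands - voter, axis=2), axis=1)  # nearest first
    return dist.mean(axis=0)

def ebc_winner(ballots, weights, n_cands):
    """Lowest total wins: the score approximates a candidate's mean distance."""
    scores = np.zeros(n_cands)
    for ballot in ballots:                     # untruncated ballots only
        scores[ballot] += weights[: len(ballot)]
    return int(np.argmin(scores))

w = euclidean_borda_weights(10)
print(np.round(w, 2))
# w.max() - w gives equivalent weights under which the highest score wins
```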
    The results aren't very good. In one dimension the EBC scores 
*worse* than the standard Borda count whereas in two it performs 
slightly better; but it still isn't monotonic under truncation.

At this point I abandoned all subtlety in favour of brute force. I 
simply hillclimbed to maximise the accuracy of the Borda count with 
parametric weights. I will call the resulting method the Spatial Borda 
Count (SBC). To be precise, I simulated a large number of elections under the 
model, and maximised the total number of occasions on which candidate A 
outscored candidate B when A was actually better than B. This is more 
symmetric (but further removed from the intended use) than maximising 
the number of times that *the best* candidate outscored each rival 
individually, or outscored all rivals together.
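
The post doesn't say which hillclimb was used, so what follows is only a sketch of the idea. It applies random single-weight perturbations, keeps any change that doesn't reduce the pairwise count, and starts from ordinary Borda weights. It assumes untruncated ballots and the same bivariate Gaussian model. Each election is stored as a table of how many voters put each candidate in each position, so a candidate's score under any weight vector is one matrix product. All names are hypothetical.

```python
import numpy as np

def make_elections(n_elections, n_voters, n_cands, rng):
    """Simulate elections under an assumed bivariate Gaussian model.

    Returns counts[e, c, r] = number of voters in election e who rank
    candidate c in position r (0 = top), and d[e, c] = distance of
    candidate c from the origin, where the voters are centred.
    """
    counts = np.zeros((n_elections, n_cands, n_cands))
    d = np.zeros((n_elections, n_cands))
    for e in range(n_elections):
        voters = rng.standard_normal((n_voters, 2))
        cands = rng.standard_normal((n_cands, 2))
        d[e] = np.linalg.norm(cands, axis=1)
        dist = np.linalg.norm(voters[:, None] - cands[None], axis=2)
        # rank[v, c] = position of candidate c on voter v's ballot
        rank = np.argsort(np.argsort(dist, axis=1), axis=1)
        for r in range(n_cands):
            counts[e, :, r] = (rank == r).sum(axis=0)
    return counts, d

def pairwise_accuracy(w, counts, d):
    """Count pairs (A, B) where A is nearer the origin and A outscores B."""
    scr = counts @ w                             # scr[e, c]
    better = d[:, :, None] < d[:, None, :]       # A nearer than B
    outscores = scr[:, :, None] > scr[:, None, :]
    return int((better & outscores).sum())

def hillclimb(counts, d, n_steps, rng, step=0.1):
    """Perturb one weight at a time; keep changes that don't lower accuracy."""
    n = counts.shape[2]
    w = np.arange(n - 1, -1, -1, dtype=float)   # start from ordinary Borda
    best = pairwise_accuracy(w, counts, d)
    for _ in range(n_steps):
        trial = w.copy()
        trial[rng.integers(n)] += rng.normal(scale=step)
        acc = pairwise_accuracy(trial, counts, d)
        if acc >= best:
            w, best = trial, acc
    return w, best

rng = np.random.default_rng(0)
counts, d = make_elections(200, 51, 6, rng)
borda = np.arange(5, -1, -1, dtype=float)
w, acc = hillclimb(counts, d, 300, rng)
print(pairwise_accuracy(borda, counts, d), acc)
```

On untruncated ballots every voter hands out each weight exactly once. Adding a constant to all weights, or scaling them by a positive factor, therefore leaves every outscoring relation unchanged, so only the shape of the weight vector matters.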
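    We can view the hillclimb as minimising the sum, over elections k 
and pairs of candidates i, j, of terms sgn(d[k,i]-d[k,j]) * 
sgn(scr[k,i]-scr[k,j]), where d[k,i] is the distance of the ith 
candidate from the origin in election k (the voters being centred on 
the origin) and scr[k,i] is the candidate's score. With high scores 
best, a correctly ordered pair contributes -1 and a misordered pair +1.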
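
The sgn form of the objective can be written directly. This sketch uses NumPy and a hypothetical function name. It sums over unordered pairs i < j in each election, with high scores best, so a lower value means more correctly ordered pairs.

```python
import numpy as np

def sgn_objective(d, scr):
    """Sum over elections k and pairs i < j of sgn(d[k,i]-d[k,j]) * sgn(scr[k,i]-scr[k,j]).

    d, scr: arrays of shape (elections, candidates). With high scores best,
    each correctly ordered pair contributes -1 and each misordered pair +1.
    """
    i, j = np.triu_indices(d.shape[1], k=1)
    terms = np.sign(d[:, i] - d[:, j]) * np.sign(scr[:, i] - scr[:, j])
    return int(terms.sum())

d = np.array([[1.0, 2.0, 3.0]])                      # candidate 0 is nearest
print(sgn_objective(d, np.array([[3.0, 2.0, 1.0]])))  # -3: every pair correct
print(sgn_objective(d, np.array([[3.0, 1.0, 2.0]])))  # -1: one pair misordered
```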
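    This can be simplified to the sum of terms sgn( (d[k,i]-d[k,j]) * 
(scr[k,i]-scr[k,j]) ), and if we adopt the crude approximation 
sgn(x)=x, the sum for each election becomes (up to a constant factor) 
the covariance between the candidates' scores and their distances. We 
then end up looking for scores which are maximally anti-correlated with 
distances (since high scores are best); and these are obtained by 
letting the scores be a decreasing linear function of the distances, or 
equivalently by letting them be proportional to distances and electing 
the lowest scorer. So the EBC drops out as a natural approximation to 
the SBC.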
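
The step from sgn(x)=x to a covariance can be checked numerically. Summed over unordered pairs, sum of (d_i-d_j)(s_i-s_j) equals n^2 times the population covariance of d and s within the election. For scores of fixed spread, it is therefore smallest when s is a decreasing linear function of d. Here is a quick check with random numbers; the names are illustrative.

```python
import numpy as np

def relaxed_objective(d, s):
    """Sum over pairs i < j of (d[i] - d[j]) * (s[i] - s[j]) in one election."""
    i, j = np.triu_indices(len(d), k=1)
    return float(((d[i] - d[j]) * (s[i] - s[j])).sum())

rng = np.random.default_rng(0)
d = rng.random(10)              # candidate distances from the origin
s = rng.random(10)              # arbitrary candidate scores
n = len(d)
cov = np.mean(d * s) - d.mean() * s.mean()   # population covariance (ddof=0)
print(np.isclose(relaxed_objective(d, s), n**2 * cov))  # True
# scores that fall linearly with distance make the relaxed objective negative
print(relaxed_objective(d, 5.0 - 2.0 * d) < 0)          # True
```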
    The attachment is a graph showing the weights against ballot 
position for various methods (normalising the weights for comparability 
and so that high rather than low scores are best). The lowest position 
on a ballot is at the left and the top at the right.
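
To compare weight vectors in a graph like the attached one, a normalisation along these lines would work. Min-max scaling to [0, 1] is an assumption, because the post doesn't say which normalisation it uses, and normalise_weights is a hypothetical name. The function flips weights for which low is best, such as raw EBC mean distances. It then orders the result from the lowest ballot position to the top, matching the left-to-right layout described above.

```python
import numpy as np

def normalise_weights(w, low_is_best=False):
    """Rescale positional weights to [0, 1] with 1 = best, bottom position first.

    w[r] is the weight for ballot position r (r = 0 is the top choice).
    Min-max scaling is an assumption, not necessarily what the graph uses.
    """
    w = np.asarray(w, dtype=float)
    if low_is_best:               # e.g. raw EBC weights (mean distances)
        w = -w
    w = (w - w.min()) / (w.max() - w.min())
    return w[::-1]                # lowest position at the left, top at the right

print(normalise_weights([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]))  # evenly spaced from 0 to 1
```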
    I measure the accuracy of the Borda count (on untruncated ballots) 
as 85% whereas the SBC is 91%. The number of errors is thus reduced by 
40% by a minor change in coefficients. (The improvement is less 
impressive in a single dimension.)
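
The error-reduction figure is simple arithmetic on the error rates (100% minus accuracy). Borda's error rate is 15% and the SBC's is 9%, so 6 of every 15 errors disappear. As a sketch with a hypothetical helper:

```python
def error_reduction(acc_old, acc_new):
    """Fraction of errors removed when accuracy (in %) rises from acc_old to acc_new."""
    err_old, err_new = 100 - acc_old, 100 - acc_new
    return (err_old - err_new) / err_old

print(error_reduction(85, 91))              # 0.4
print(round(error_reduction(85, 91.2), 2))  # 0.41, using the k = 10 SBC figure
```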
    When I look at truncation, I find to my relief that the SBC is 
monotonic. Under severe truncation (k = 3 or 2 below) the Borda count 
does slightly better than the SBC; but if we wanted to optimise 
performance for truncated ballots, we'd actually generate a different 
set of weights.
The results are as follows, using a bivariate Gaussian model and 
truncating 10 candidates to k:

     k   IRV  Borda    SBC  Cond-lo  minimax  Cond-hi
    10    51     85   91.2       99       99       99
     8    51     85   91.0       99       99       99
     7    51     84   89.6       99       99       99
     6    51     80   86.5       96       97       97
     5    50     81   86.0       88       89       90
     4    47     83   83.7       70       72       76
     3    41     66   64.7       50       52       59
     2    35     44   43.5       36       38       43
     1    30     30   30.0       30       30       30

  (All figures are % correct. Cond-lo and Cond-hi are the lower and upper
  bounds for Condorcet methods; minimax is shown separately.)
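
The monotonicity claims can be read straight off the table. This small check uses hypothetical names and figures copied from the table, and it flags any column whose accuracy rises as k falls.

```python
# Accuracy (% correct) copied from the table, for k = 10, 8, 7, 6, 5, 4, 3, 2, 1
results = {
    "irv":     [51, 51, 51, 51, 50, 47, 41, 35, 30],
    "borda":   [85, 85, 84, 80, 81, 83, 66, 44, 30],
    "sbc":     [91.2, 91.0, 89.6, 86.5, 86.0, 83.7, 64.7, 43.5, 30.0],
    "cond_lo": [99, 99, 99, 96, 88, 70, 50, 36, 30],
    "minimax": [99, 99, 99, 97, 89, 72, 52, 38, 30],
    "cond_hi": [99, 99, 99, 97, 90, 76, 59, 43, 30],
}

def is_monotonic(acc):
    """True if accuracy never rises as truncation gets more severe (k falls)."""
    return all(a >= b for a, b in zip(acc, acc[1:]))

for name, acc in results.items():
    print(name, is_monotonic(acc))  # only borda prints False
```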

    CJC
-------------- next part --------------
Attachment: sbc.svg (image/svg+xml, 14174 bytes)
URL: <http://lists.electorama.com/pipermail/election-methods-electorama.com/attachments/20230308/3f004162/attachment-0001.svg>