<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <font face="Helvetica, Arial, sans-serif">There was a discussion

      last year about the merits of enforced truncation of ballots. I

      modified my evaluation code to allow me to measure the effects

      under a spatial model. The results for Condorcet methods and IRV

      were as you'd expect, but the results for the Borda count were

      surprising. <br>

         The first oddity is that with aggressive truncation (e.g. from

      10 to 4 candidates), the Borda count may outperform Condorcet

      methods. In this particular case (10 candidates truncated to 4) I

      find that IRV is 47% correct, the Borda count 83% correct, and

      Condorcet methods have lower and upper bounds of 70% and 76%, with

      minimax coming in at 72%. (The model is bivariate Gaussian.) I

      don't think this is hard to explain: a truncated Condorcet method

      isn't a true Condorcet method, doesn't satisfy the conditions of

      the median voter theorem, and need not work particularly well. <br>

      <br>

      The second oddity is that the Borda count does not behave

      monotonically. When 10 candidates are truncated to 6 rather than

      4, the Borda count returns 80% accuracy, i.e. it does *less* well

      than when subjected to more drastic truncation. <br>

         This is surprising but not astonishing. The Borda count is a

      positional method whose coefficients have no claim to optimality;

      the truncated Borda count is a method with different coefficients;

      and changing the degree of truncation, even in an

      information-destroying way, is not guaranteed to affect

      performance in any particular direction. <br>
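         A minimal sketch of the truncated Borda count viewed as a
      positional method with its own coefficients. The treatment of
      unranked candidates is an assumption (not stated above): with k
      ranks kept, ranked candidates score k, k-1, ..., 1 and unranked
      candidates score 0. The function names are hypothetical. <br>
<pre>
```python
def truncated_borda_weights(n_candidates, k):
    """Positional weights, top rank first, when only k ranks are kept.

    Assumption: ranked candidates get k, k-1, ..., 1; unranked get 0.
    """
    return [max(k - pos, 0) for pos in range(n_candidates)]


def positional_scores(ballots, weights):
    """Sum the weight of each candidate's ballot position over all ballots.

    Each ballot lists candidate indices from most to least preferred and
    may be shorter than the full candidate list (a truncated ballot).
    """
    scores = [0] * len(weights)
    for ballot in ballots:
        for pos, cand in enumerate(ballot):
            scores[cand] += weights[pos]
    return scores


# Truncating 10 candidates to 4 and to 6 gives two different weight vectors.
print(truncated_borda_weights(10, 4))
print(truncated_borda_weights(10, 6))
```
</pre>
      Under this assumption, changing k changes the whole coefficient
      vector rather than just dropping information, which is consistent
      with accuracy not moving in a fixed direction. <br>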

      <br>

      It struck me that it should be possible to improve on the Borda

      coefficients. A natural way to do so is to look at Euclidean

      distances. We can compute (by numerical integration) the average

      distance of a voter from the candidate she ranks top, second,

      etc., and if we use these averages as weights, then the average

      score obtained by a candidate should be an approximation to his

      average distance from voters; so we elect the candidate with the

      lowest score. For a more conventional statistic we can transform

      the weights so as to be justified in electing the candidate with

      the highest score. I will call this the Euclidean Borda count (EBC). <br>
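         The distance-based weights can be sketched as follows. This
      uses Monte Carlo sampling in place of the numerical integration
      mentioned above, and it assumes that candidates are drawn from the
      same standard Gaussian as the voters (the candidate distribution
      is not stated here). The function names and the particular
      rescaling are hypothetical. <br>
<pre>
```python
import numpy as np


def mean_distance_by_rank(n_candidates=10, n_dims=2, n_samples=20000, seed=0):
    """Estimate the mean distance from a voter to her 1st, 2nd, ... choice.

    Assumption: voters and candidates are both standard Gaussian.
    Each sample is one voter facing a fresh set of candidates.
    """
    rng = np.random.default_rng(seed)
    voters = rng.standard_normal((n_samples, 1, n_dims))
    cands = rng.standard_normal((n_samples, n_candidates, n_dims))
    dists = np.linalg.norm(cands - voters, axis=2)  # (n_samples, n_candidates)
    dists.sort(axis=1)  # column r holds the distance to the (r+1)-th choice
    return dists.mean(axis=0)


def ebc_weights(mean_dists):
    """Rescale so the top rank scores 1, the bottom rank 0, high is best."""
    return (mean_dists.max() - mean_dists) / (mean_dists.max() - mean_dists.min())


raw = mean_distance_by_rank()
print(np.round(raw, 3))               # lowest total wins with these weights
print(np.round(ebc_weights(raw), 3))  # highest total wins with these
```
</pre>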

         The results aren't very good. In one dimension the EBC scores

      *worse* than the standard Borda count whereas in two it performs

      slightly better; but it still isn't monotonic under truncation.<br>

      <br>

      At this point I abandoned all subtlety in favour of brute force. I

      simply hillclimbed to maximise the accuracy of the Borda count

      with parametric weights. I will call the resulting method the

      Spatial Borda Count (SBC). To be precise, I simulated a large number of

      elections under the model, and maximised the total number of

      occasions on which candidate A outscored candidate B when A was

      actually better than B. This is more symmetric (but further away

      from the intended use) than maximising the number of occasions on

      which *the best* candidate outscored each rival individually, or

      outscored all rivals together. <br>
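         A rough sketch of this objective and a basic hillclimb over it.
      It is not the code used for the results here: the electorate size,
      the candidate distribution, the step size and the
      random-perturbation search are all assumptions, and the function
      names are hypothetical. A candidate counts as better when he is
      closer to the origin, on which the voters are centred. <br>
<pre>
```python
import numpy as np


def simulate_election(rng, n_candidates=10, n_voters=200, n_dims=2):
    """Return (candidate distances from the origin, position counts).

    counts[c, p] is the number of voters placing candidate c at ballot
    position p (0 = top). Assumption: voters and candidates are both
    standard Gaussian.
    """
    cands = rng.standard_normal((n_candidates, n_dims))
    voters = rng.standard_normal((n_voters, n_dims))
    d = np.linalg.norm(voters[:, None, :] - cands[None, :, :], axis=2)
    ranks = d.argsort(axis=1).argsort(axis=1)  # ranks[v, c] = position of c
    counts = np.zeros((n_candidates, n_candidates))
    for pos in range(n_candidates):
        counts[:, pos] = (ranks == pos).sum(axis=0)
    return np.linalg.norm(cands, axis=1), counts


def concordant_pairs(weights, elections):
    """Count ordered pairs (A, B) where A is better than B and outscores B."""
    total = 0
    for dist, counts in elections:
        scores = counts @ weights
        better = dist[None, :] > dist[:, None]      # row candidate is closer
        outscores = scores[:, None] > scores[None, :]
        total += int((better & outscores).sum())
    return total


def hillclimb(elections, n_candidates=10, steps=300, step_size=0.05, seed=1):
    """Perturb the weights at random, keeping any change that improves."""
    rng = np.random.default_rng(seed)
    weights = np.linspace(1.0, 0.0, n_candidates)  # start from Borda weights
    best = concordant_pairs(weights, elections)
    for _ in range(steps):
        trial = weights + step_size * rng.standard_normal(n_candidates)
        value = concordant_pairs(trial, elections)
        if value > best:
            weights, best = trial, value
    return weights, best
```
</pre>
      Scores depend on the ballots only through the position counts, so
      each election is simulated once and reused for every trial set of
      weights. <br>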

         We can view the hillclimb as minimising the sum of terms

      sgn(d[k,i]-d[k,j]) * </font><font face="Helvetica, Arial,

      sans-serif"><font face="Helvetica, Arial, sans-serif">sgn(scr[k,i]-scr[k,j])

        where d[k,i] is the distance of the ith candidate from the

        origin in election k (the voters being centred on the origin)

        and </font></font><font face="Helvetica, Arial, sans-serif"><font

        face="Helvetica, Arial, sans-serif"><font face="Helvetica,

          Arial, sans-serif"><font face="Helvetica, Arial, sans-serif">scr[k,i]

            is the candidate's score.<br>

               This can be simplified to the sum of terms </font></font></font></font><font

      face="Helvetica, Arial, sans-serif"><font face="Helvetica, Arial,

        sans-serif"><font face="Helvetica, Arial, sans-serif"><font

            face="Helvetica, Arial, sans-serif"><font face="Helvetica,

              Arial, sans-serif">sgn( (d[k,i]-d[k,j]) * </font><font

              face="Helvetica, Arial, sans-serif"><font face="Helvetica,

                Arial, sans-serif">(scr[k,i]-scr[k,j]) ), and if we

                adopt the approximation sgn(x)=x, we end up looking for
                scores which are maximally correlated with distances
                (negatively so under the convention that high scores are
                best); and these are obtained by letting the scores be
                proportional to distances, up to that choice of sign. So
                the EBC drops out as a natural approximation to the SBC.<br>

                   The attachment is a graph showing the weights against

                ballot position for various methods (normalising the

                weights for comparability and so that high rather than

                low scores are best). The lowest position on a ballot is

                at the left and the top at the right. <br>

                   I measure the accuracy of the Borda count (on

                untruncated ballots) as 85% and that of the SBC as 91%.
                The error rate thus falls from 15% to 9%, a reduction of
                40%, through a minor change in coefficients. (The
                improvement is less

                impressive in a single dimension.)<br>

                   When I look at truncation, I find to my relief that

                the SBC is monotonic. Under sufficiently severe
                truncation (to 3 or 2 candidates in the results below)
                the Borda count does better than the SBC; but

                if we wanted to optimise performance for truncated

                ballots, we’d actually generate a different set of

                weights.<br>

                The results are as follows (percentage correct), using a
                bivariate Gaussian model and truncating 10 candidates to k:<br>
                <br>
                <table>
                  <tr><th>k</th><th>IRV</th><th>Borda</th><th>SBC</th><th>Condorcet lower bound</th><th>Minimax</th><th>Condorcet upper bound</th></tr>
                  <tr><td>10</td><td>51</td><td>85</td><td>91.2</td><td>99</td><td>99</td><td>99</td></tr>
                  <tr><td>8</td><td>51</td><td>85</td><td>91.0</td><td>99</td><td>99</td><td>99</td></tr>
                  <tr><td>7</td><td>51</td><td>84</td><td>89.6</td><td>99</td><td>99</td><td>99</td></tr>
                  <tr><td>6</td><td>51</td><td>80</td><td>86.5</td><td>96</td><td>97</td><td>97</td></tr>
                  <tr><td>5</td><td>50</td><td>81</td><td>86.0</td><td>88</td><td>89</td><td>90</td></tr>
                  <tr><td>4</td><td>47</td><td>83</td><td>83.7</td><td>70</td><td>72</td><td>76</td></tr>
                  <tr><td>3</td><td>41</td><td>66</td><td>64.7</td><td>50</td><td>52</td><td>59</td></tr>
                  <tr><td>2</td><td>35</td><td>44</td><td>43.5</td><td>36</td><td>38</td><td>43</td></tr>
                  <tr><td>1</td><td>30</td><td>30</td><td>30.0</td><td>30</td><td>30</td><td>30</td></tr>
                </table>

                <br>

                   CJC<br>

              </font></font></font></font></font></font>

  </body>

</html>