[Election-Methods] "Correlation" revisited

Thu Oct 18 20:53:49 PDT 2007

For his "Correlated Instant Borda Runoff" method, Ken Kuhlman introduced 
the concept of candidate correlation.  I proposed two measures of 
correlation: mean Borda distance and third-order correlation.  Recall 
that the latter is defined as follows:

corr(X, Y; Z) = (% of ballots on which Z is not voted between X and Y)
corr(X, Y) = min(corr(X, Y; Z) for Z in candidates other than X and Y)

For example, given the ballots based on a linear spectrum with 
candidates {A=0.1, B=0.3, C=0.5, D=0.7, E=0.9}, i.e.,

   20: A>B>C>D>E
   10: B>A>C>D>E
   10: B>C>A>D>E
   10: C>B>D>A>E
   10: C>D>B>E>A
   10: D>C>E>B>A
   10: D>E>C>B>A
   20: E>D>C>B>A

We have:

   corr(A, B; C) = 90%
   corr(A, B; D) = 90%
   corr(A, B; E) = 90%
   corr(A, B) = 90%

   corr(A, C; B) = 20%
   corr(A, C; D) = 80%
   corr(A, C; E) = 80%
   corr(A, C) = 20%

   corr(A, D; B) = 30%
   corr(A, D; C) = 30%
   corr(A, D; E) = 70%
   corr(A, D) = 30%

   corr(A, E; B) = 40%
   corr(A, E; C) = 40%
   corr(A, E; D) = 40%
   corr(A, E) = 40%

   corr(B, C; A) = 90%
   corr(B, C; D) = 90%
   corr(B, C; E) = 90%
   corr(B, C) = 90%

   corr(B, D; A) = 80%
   corr(B, D; C) = 20%
   corr(B, D; E) = 80%
   corr(B, D) = 20%

   corr(B, E; A) = 70%
   corr(B, E; C) = 30%
   corr(B, E; D) = 30%
   corr(B, E) = 30%

   corr(C, D; A) = 90%
   corr(C, D; B) = 90%
   corr(C, D; E) = 90%
   corr(C, D) = 90%

   corr(C, E; A) = 80%
   corr(C, E; B) = 80%
   corr(C, E; D) = 20%
   corr(C, E) = 20%

   corr(D, E; A) = 90%
   corr(D, E; B) = 90%
   corr(D, E; C) = 90%
   corr(D, E) = 90%
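
As a cross-check, the figures above can be reproduced with a short 
Python sketch.  The names BALLOTS, corr3, and corr are illustrative 
assumptions, not part of the original definition, and the sketch assumes 
every ballot is a strict ranking of all candidates.

```python
# Example ballots as (weight, ranking), most preferred candidate first.
BALLOTS = [
    (20, "ABCDE"), (10, "BACDE"), (10, "BCADE"), (10, "CBDAE"),
    (10, "CDBEA"), (10, "DCEBA"), (10, "DECBA"), (20, "EDCBA"),
]

def corr3(x, y, z, ballots=BALLOTS):
    """Percentage of ballots on which z is NOT ranked between x and y."""
    total = sum(weight for weight, _ in ballots)
    outside = 0
    for weight, ranking in ballots:
        px, py, pz = (ranking.index(c) for c in (x, y, z))
        if not min(px, py) < pz < max(px, py):
            outside += weight
    return 100 * outside / total

def corr(x, y, ballots=BALLOTS):
    """Minimum of corr3 over every third candidate z."""
    candidates = ballots[0][1]  # assumption: each ballot ranks everyone
    return min(corr3(x, y, z, ballots) for z in candidates if z not in (x, y))

for x, y in ("AB", "AC", "AD", "AE"):
    print(f"corr({x}, {y}) = {corr(x, y):.0f}%")
```

For A paired with B, C, D, and E, this prints 90%, 20%, 30%, and 40%, 
in agreement with the table.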

Note that pairs of consecutive candidates are highly correlated, at 
90%.  This makes sense.  However, the rest of the correlations are 
completely counterintuitive:

   corr(A, E)
   > corr(A, D) = corr(B, E)
   > corr(A, C) = corr(B, D) = corr(C, E)

That is, the further apart candidates are on the political spectrum, the 
more correlated they are!  This is clearly flawed, so we need a better 
definition of correlation.

So, I will now propose a new method of ranking correlations.  It does 
not define an absolute measure of correlation, but that shouldn't 
matter much, because the correlation-based methods proposed so far 
refer only to the "most correlated" and "least correlated" candidates.

*** THIRD-ORDER CORRELATION BEATPATH RANKING (TOCBR) ***

Step 1: For every set of 3 candidates, count the frequency of each 
possible ordering.  For the example ballots (orderings that never occur 
are omitted):

   A>B>C: 20
   A>B>D: 20
   A>B>E: 20
   A>C>D: 30
   A>C>E: 30
   A>D>E: 40
   B>A>C: 10
   B>A>D: 20
   B>A>E: 30
   B>C>A: 10
   B>C>D: 40
   B>C>E: 40
   B>D>A: 10
   B>D>E: 50
   B>E>A: 10
   C>A>D: 10
   C>A>E: 20
   C>B>A: 60
   C>B>D: 10
   C>B>E: 20
   C>D>A: 20
   C>D>B: 10
   C>D>E: 60
   C>E>A: 20
   C>E>B: 10
   D>A>E: 10
   D>B>A: 50
   D>B>E: 10
   D>C>A: 40
   D>C>B: 40
   D>C>E: 10
   D>E>A: 30
   D>E>B: 20
   D>E>C: 10
   E>B>A: 40
   E>C>A: 30
   E>C>B: 30
   E>D>A: 20
   E>D>B: 20
   E>D>C: 20
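
A minimal sketch of this counting step in Python, assuming strict 
rankings only (the names BALLOTS and triple_counts are illustrative):

```python
from collections import Counter
from itertools import combinations

# Example ballots as (weight, ranking), most preferred candidate first.
BALLOTS = [
    (20, "ABCDE"), (10, "BACDE"), (10, "BCADE"), (10, "CBDAE"),
    (10, "CDBEA"), (10, "DCEBA"), (10, "DECBA"), (20, "EDCBA"),
]

def triple_counts(ballots=BALLOTS):
    """Map each ordered triple such as 'A>B>C' to its total ballot weight."""
    counts = Counter()
    for weight, ranking in ballots:
        # combinations() keeps the ballot's order, so every 3-tuple it
        # yields is already listed from most to least preferred.
        for triple in combinations(ranking, 3):
            counts[">".join(triple)] += weight
    return counts

counts = triple_counts()
for ordering in sorted(counts):
    print(f"{ordering}: {counts[ordering]}")
```

On the example ballots this yields the 40 orderings listed above.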

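Step 2: From the triples in step 1, compute a pairwise matrix whose 
entries compare pairs of candidates, where (X=Y) denotes the pair X, Y.  
Each triple votes as follows (in the example matrix after these rules, 
each row lists that pair's votes over the column pair):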

* X>Y>Z -> add 1 vote for (X=Y) > (X=Z) and (Y=Z) > (X=Z)
* X>Y=Z -> add 1 vote for (Y=Z) > (X=Y) and (Y=Z) > (X=Z)
* X=Y>Z -> add 1 vote for (X=Y) > (X=Z) and (X=Y) > (Y=Z)
* X=Y=Z -> ignore

        A=B A=C A=D A=E B=C B=D B=E C=D C=E D=E
   A=B:  0, 80, 70, 60, 10, 20, 30,  0,  0,  0
   A=C: 10,  0, 70, 60, 10,  0,  0, 10, 20,  0
   A=D: 10, 20,  0, 60,  0, 20,  0, 10,  0, 10
   A=E: 10, 20, 30,  0,  0,  0, 30,  0, 20, 10
   B=C: 10, 80,  0,  0,  0, 80, 70, 10, 20,  0
   B=D: 10,  0, 70,  0, 10,  0, 70, 10,  0, 10
   B=E: 10,  0,  0, 60, 10, 20,  0,  0, 20, 10
   C=D:  0, 20, 70,  0, 10, 80,  0,  0, 80, 10
   C=E:  0, 20,  0, 60, 10,  0, 70, 10,  0, 10
   D=E:  0,  0, 30, 60,  0, 20, 70, 10, 80,  0
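
The first rule can be sketched in Python as follows.  This sketch 
assumes strict rankings, so the tie rules (X>Y=Z, X=Y>Z, X=Y=Z) are not 
implemented; the names PAIRS, pair_key, and pair_matrix are illustrative.

```python
from itertools import combinations

# Example ballots as (weight, ranking), most preferred candidate first.
BALLOTS = [
    (20, "ABCDE"), (10, "BACDE"), (10, "BCADE"), (10, "CBDAE"),
    (10, "CDBEA"), (10, "DCEBA"), (10, "DECBA"), (20, "EDCBA"),
]
PAIRS = [a + "=" + b for a, b in combinations("ABCDE", 2)]

def pair_key(a, b):
    """Canonical label for an unordered pair, e.g. ('C', 'A') -> 'A=C'."""
    return "=".join(sorted((a, b)))

def pair_matrix(ballots=BALLOTS, pairs=PAIRS):
    """votes[p][q] = number of votes for pair p over pair q."""
    votes = {p: {q: 0 for q in pairs} for p in pairs}
    for weight, ranking in ballots:
        # Each 3-tuple is x > y > z on this ballot.  The two adjacent
        # pairs (x=y) and (y=z) each get a vote over the outer pair (x=z).
        for x, y, z in combinations(ranking, 3):
            outer = pair_key(x, z)
            votes[pair_key(x, y)][outer] += weight
            votes[pair_key(y, z)][outer] += weight
    return votes

votes = pair_matrix()
for p in PAIRS:
    print(p + ":", ", ".join(f"{votes[p][q]:2d}" for q in PAIRS))
```

Working directly from the ballots is equivalent to working from the 
step 1 counts, since each ballot contributes its weight to exactly one 
ordering of every set of 3 candidates.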

Step 3: Find the strongest beatpaths between each pair of candidate 
pairs.  Rank the candidate pairs according to (# of beatpath wins) - (# 
of beatpath losses).

Using Margins in the example, we get a ranking of:

   1. B=C and C=D (6)
   3. A=B and D=E (5)
   5. A=C, B=D, and C=E (-1)
   8. A=D and B=E (-5)
   10. A=E (-9)
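
A sketch of step 3 in Python, starting from the matrix in step 2.  It 
assumes that "Margins" means each defeat's strength is the winner's 
votes minus the loser's, and that strongest beatpaths are found with the 
usual Floyd-Warshall-style widest-path computation; the names VOTES and 
beatpath_scores are illustrative.

```python
PAIRS = ["A=B", "A=C", "A=D", "A=E", "B=C", "B=D", "B=E", "C=D", "C=E", "D=E"]
# VOTES[i][j] = votes for PAIRS[i] over PAIRS[j], copied from step 2.
VOTES = [
    [0, 80, 70, 60, 10, 20, 30, 0, 0, 0],
    [10, 0, 70, 60, 10, 0, 0, 10, 20, 0],
    [10, 20, 0, 60, 0, 20, 0, 10, 0, 10],
    [10, 20, 30, 0, 0, 0, 30, 0, 20, 10],
    [10, 80, 0, 0, 0, 80, 70, 10, 20, 0],
    [10, 0, 70, 0, 10, 0, 70, 10, 0, 10],
    [10, 0, 0, 60, 10, 20, 0, 0, 20, 10],
    [0, 20, 70, 0, 10, 80, 0, 0, 80, 10],
    [0, 20, 0, 60, 10, 0, 70, 10, 0, 10],
    [0, 0, 30, 60, 0, 20, 70, 10, 80, 0],
]

def beatpath_scores(votes=VOTES, names=PAIRS):
    """Return {name: beatpath wins - beatpath losses} using margins."""
    n = len(names)
    # Direct defeats: keep positive margins only; ties and losses are 0.
    p = [[max(votes[i][j] - votes[j][i], 0) for j in range(n)]
         for i in range(n)]
    # Widest path: a path's strength is that of its weakest defeat.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i != j:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    scores = {}
    for i in range(n):
        wins = sum(p[i][j] > p[j][i] for j in range(n) if j != i)
        losses = sum(p[i][j] < p[j][i] for j in range(n) if j != i)
        scores[names[i]] = wins - losses
    return scores

scores = beatpath_scores()
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]}")
```

Under these assumptions the scores agree with the ranking above: 6 for 
B=C and C=D, 5 for A=B and D=E, -1 for A=C, B=D, and C=E, -5 for A=D and 
B=E, and -9 for A=E.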

The only counterintuitive results are B being more correlated to C than 
to A, and D being more correlated to C than to E, even though the 
distances on the political spectrum are identical in each case.  Still, 
this is far more reasonable than the original definition of third-order 
correlation.