[Election-Methods] "Correlation" revisited
Dan Bishop
danbishop04 at gmail.com
Thu Oct 18 20:53:49 PDT 2007
For his "Correlated Instant Borda Runoff" method, Ken Kuhlman introduced
the concept of candidate correlation. I proposed two measures of
correlation: mean Borda distance and third-order correlation. Recall
that the latter is defined as follows:
corr(X, Y; Z) = (% of ballots on which Z is not voted between X and Y)
corr(X, Y) = min(corr(X, Y; Z) for Z in candidates other than X and Y)
For example, given the ballots based on a linear spectrum with
candidates {A=0.1, B=0.3, C=0.5, D=0.7, E=0.9}, i.e.,
20: A>B>C>D>E
10: B>A>C>D>E
10: B>C>A>D>E
10: C>B>D>A>E
10: C>D>B>E>A
10: D>C>E>B>A
10: D>E>C>B>A
20: E>D>C>B>A
We have:
corr(A, B; C) = 90%
corr(A, B; D) = 90%
corr(A, B; E) = 90%
corr(A, B) = 90%
corr(A, C; B) = 20%
corr(A, C; D) = 80%
corr(A, C; E) = 80%
corr(A, C) = 20%
corr(A, D; B) = 30%
corr(A, D; C) = 30%
corr(A, D; E) = 70%
corr(A, D) = 30%
corr(A, E; B) = 40%
corr(A, E; C) = 40%
corr(A, E; D) = 40%
corr(A, E) = 40%
corr(B, C; A) = 90%
corr(B, C; D) = 90%
corr(B, C; E) = 90%
corr(B, C) = 90%
corr(B, D; A) = 80%
corr(B, D; C) = 20%
corr(B, D; E) = 80%
corr(B, D) = 20%
corr(B, E; A) = 70%
corr(B, E; C) = 30%
corr(B, E; D) = 30%
corr(B, E) = 30%
corr(C, D; A) = 90%
corr(C, D; B) = 90%
corr(C, D; E) = 90%
corr(C, D) = 90%
corr(C, E; A) = 80%
corr(C, E; B) = 80%
corr(C, E; D) = 20%
corr(C, E) = 20%
corr(D, E; A) = 90%
corr(D, E; B) = 90%
corr(D, E; C) = 90%
corr(D, E) = 90%
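The figures above can be reproduced with a short Python sketch. The names BALLOTS, corr3, and corr are my own choices for illustration; the ballots are the eight weighted rankings from the example, and all rankings are assumed to be strict and complete.

```python
from itertools import combinations

# Ballots from the example: (weight, ranking from most to least preferred).
BALLOTS = [
    (20, "ABCDE"), (10, "BACDE"), (10, "BCADE"), (10, "CBDAE"),
    (10, "CDBEA"), (10, "DCEBA"), (10, "DECBA"), (20, "EDCBA"),
]
CANDIDATES = "ABCDE"

def corr3(x, y, z, ballots=BALLOTS):
    """Percentage of ballots on which z is NOT ranked between x and y."""
    total = sum(weight for weight, _ in ballots)
    outside = 0
    for weight, ranking in ballots:
        px, py, pz = ranking.index(x), ranking.index(y), ranking.index(z)
        if not (min(px, py) < pz < max(px, py)):
            outside += weight
    return 100 * outside / total

def corr(x, y, ballots=BALLOTS, candidates=CANDIDATES):
    """Minimum of corr3 over every third candidate z."""
    return min(corr3(x, y, z, ballots) for z in candidates if z not in (x, y))

for x, y in combinations(CANDIDATES, 2):
    print(f"corr({x}, {y}) = {corr(x, y):.0f}%")
```

Skipping Z = X and Z = Y does not change the minimum, since X and Y are never "between" themselves and would always score 100%.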
Note that pairs of consecutive candidates are highly correlated, at
90%. This makes sense. However, the rest of the correlations are
completely counterintuitive:
corr(A, E)
> corr(A, D) = corr(B, E)
> corr(A, C) = corr(B, D) = corr(C, E)
That is, the further apart candidates are on the political spectrum, the
more correlated they are! This is clearly flawed, so we need a better
definition of correlation.
So, I will now propose a new method of ranking correlations. I have not
defined an absolute measure of correlation, but that shouldn't matter
much, because the correlation-based methods proposed so far refer only
to the "most correlated" and "least correlated" candidates.
*** THIRD-ORDER CORRELATION BEATPATH RANKING (TOCBR) ***
Step 1: For every set of 3 candidates, count the frequency of each
possible ordering (orderings that appear on no ballot are omitted).
A>B>C: 20
A>B>D: 20
A>B>E: 20
A>C>D: 30
A>C>E: 30
A>D>E: 40
B>A>C: 10
B>A>D: 20
B>A>E: 30
B>C>A: 10
B>C>D: 40
B>C>E: 40
B>D>A: 10
B>D>E: 50
B>E>A: 10
C>A>D: 10
C>A>E: 20
C>B>A: 60
C>B>D: 10
C>B>E: 20
C>D>A: 20
C>D>B: 10
C>D>E: 60
C>E>A: 20
C>E>B: 10
D>A>E: 10
D>B>A: 50
D>B>E: 10
D>C>A: 40
D>C>B: 40
D>C>E: 10
D>E>A: 30
D>E>B: 20
D>E>C: 10
E>B>A: 40
E>C>A: 30
E>C>B: 30
E>D>A: 20
E>D>B: 20
E>D>C: 20
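Step 1 might be sketched in Python as follows. This is only an illustration: BALLOTS and triple_counts are names I made up, and the code assumes strict, complete rankings like those in the example.

```python
from collections import Counter
from itertools import combinations

# Ballots from the example: (weight, ranking from most to least preferred).
BALLOTS = [
    (20, "ABCDE"), (10, "BACDE"), (10, "BCADE"), (10, "CBDAE"),
    (10, "CDBEA"), (10, "DCEBA"), (10, "DECBA"), (20, "EDCBA"),
]

def triple_counts(ballots=BALLOTS):
    """Map each ordered triple, e.g. ('C', 'B', 'A'), to its total ballot weight."""
    counts = Counter()
    for weight, ranking in ballots:
        # combinations() keeps the ballot's order, so each tuple reads X>Y>Z.
        for triple in combinations(ranking, 3):
            counts[triple] += weight
    return counts

counts = triple_counts()
for triple in sorted(counts):
    print(">".join(triple) + f": {counts[triple]}")
```

Each ballot contains 10 triples, so the counts sum to 10 times the number of ballots, which is a quick sanity check on the list above.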
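Step 2: Based on the triples from step 1, compute a pairwise matrix over
pairs of candidates, writing X=Y for the pair {X, Y}. A vote for
(X=Y) > (X=Z) says that X and Y are more correlated than X and Z. Each
entry counts the votes for the row pair over the column pair, tallied
as follows: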
* X>Y>Z -> add 1 vote for (X=Y) > (X=Z) and (Y=Z) > (X=Z)
* X>Y=Z -> add 1 vote for (Y=Z) > (X=Y) and (Y=Z) > (X=Z)
* X=Y>Z -> add 1 vote for (X=Y) > (X=Z) and (X=Y) > (Y=Z)
* X=Y=Z -> ignore
Pair  A=B  A=C  A=D  A=E  B=C  B=D  B=E  C=D  C=E  D=E
A=B:    0   80   70   60   10   20   30    0    0    0
A=C:   10    0   70   60   10    0    0   10   20    0
A=D:   10   20    0   60    0   20    0   10    0   10
A=E:   10   20   30    0    0    0   30    0   20   10
B=C:   10   80    0    0    0   80   70   10   20    0
B=D:   10    0   70    0   10    0   70   10    0   10
B=E:   10    0    0   60   10   20    0    0   20   10
C=D:    0   20   70    0   10   80    0    0   80   10
C=E:    0   20    0   60   10    0   70   10    0   10
D=E:    0    0   30   60    0   20   70   10   80    0
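A rough Python sketch of step 2 follows. It covers only the first rule (X>Y>Z), since every ballot in the example is a strict ranking; the tie rules would need extra handling. The helper names pair and pair_matrix are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Ballots from the example: (weight, ranking from most to least preferred).
BALLOTS = [
    (20, "ABCDE"), (10, "BACDE"), (10, "BCADE"), (10, "CBDAE"),
    (10, "CDBEA"), (10, "DCEBA"), (10, "DECBA"), (20, "EDCBA"),
]
CANDIDATES = "ABCDE"
PAIRS = ["=".join(p) for p in combinations(CANDIDATES, 2)]  # 'A=B', 'A=C', ...

def pair(x, y):
    """Canonical label for an unordered pair, e.g. pair('C', 'A') == 'A=C'."""
    return "=".join(sorted((x, y)))

def pair_matrix(ballots=BALLOTS):
    """votes[p][q] = total weight voting pair p as more correlated than pair q."""
    votes = defaultdict(lambda: defaultdict(int))
    for weight, ranking in ballots:
        for x, y, z in combinations(ranking, 3):  # x > y > z on this ballot
            # Strict rankings only: both adjacent pairs beat the outer pair.
            votes[pair(x, y)][pair(x, z)] += weight
            votes[pair(y, z)][pair(x, z)] += weight
    return votes

votes = pair_matrix()
for p in PAIRS:
    print(p + ":", ", ".join(f"{votes[p][q]:3d}" for q in PAIRS))
```

Pairs that share no candidate (such as A=B and C=D) never appear in the same triple, which is why those entries are always 0.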
Step 3: Find the strongest beatpaths between each pair of candidate
pairs. Rank the candidate pairs according to (# of beatpath wins) - (#
of beatpath losses).
Using Margins in the example, we get a ranking of:
1. B=C and C=D (6)
3. A=B and D=E (5)
5. A=C, B=D, and C=E (-1)
8. A=D and B=E (-5)
10. A=E (-9)
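For anyone who wants to check the ranking, here is a minimal Python sketch of step 3. It assumes the usual strongest-path (widest-path) computation with a Floyd-Warshall style loop, and takes "Margins" to mean that a defeat's strength is the winner's votes minus the loser's. The function name beatpath_scores is my own; the matrix is copied from step 2.

```python
PAIRS = ["A=B", "A=C", "A=D", "A=E", "B=C", "B=D", "B=E", "C=D", "C=E", "D=E"]

# Pairwise matrix from step 2: VOTES[i][j] = votes for PAIRS[i] over PAIRS[j].
VOTES = [
    [ 0, 80, 70, 60, 10, 20, 30,  0,  0,  0],
    [10,  0, 70, 60, 10,  0,  0, 10, 20,  0],
    [10, 20,  0, 60,  0, 20,  0, 10,  0, 10],
    [10, 20, 30,  0,  0,  0, 30,  0, 20, 10],
    [10, 80,  0,  0,  0, 80, 70, 10, 20,  0],
    [10,  0, 70,  0, 10,  0, 70, 10,  0, 10],
    [10,  0,  0, 60, 10, 20,  0,  0, 20, 10],
    [ 0, 20, 70,  0, 10, 80,  0,  0, 80, 10],
    [ 0, 20,  0, 60, 10,  0, 70, 10,  0, 10],
    [ 0,  0, 30, 60,  0, 20, 70, 10, 80,  0],
]

def beatpath_scores(votes):
    """Return (# beatpath wins) - (# beatpath losses) for each row of votes."""
    n = len(votes)
    # Margins: a direct defeat's strength is the vote surplus, otherwise 0.
    p = [[max(votes[i][j] - votes[j][i], 0) for j in range(n)] for i in range(n)]
    # Widest-path update: a path is only as strong as its weakest link.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i != j and i != k and j != k:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    scores = []
    for i in range(n):
        wins = sum(p[i][j] > p[j][i] for j in range(n) if j != i)
        losses = sum(p[i][j] < p[j][i] for j in range(n) if j != i)
        scores.append(wins - losses)
    return scores

scores = dict(zip(PAIRS, beatpath_scores(VOTES)))
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score}")
```

In this example the margin defeats contain no cycles, so a pair beatpath-beats another exactly when a chain of direct defeats leads from one to the other.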
The only counterintuitive results are B being more correlated to C than
to A, and D being more correlated to C than to E -- distinctions
appearing between identical distances on the political spectrum. But
it's far more reasonable than the original definition of third-order
correlation.