[EM] 2004 baseball grid, revisited

Fri May 6 16:13:48 PDT 2005

Last fall Jobst analyzed the 2004 team-to-team stats of the American
League as an example of using River.

I decided to revisit the example since I hadn't followed his work the
first time, and found that his River analysis had an error.

In the example, scores were listed as percentages of head-to-head
matches won.

Jobst suggested that I find something to stand in for Approval, to
see how DMC (Definite Majority Choice) would work.

I started with row sums.  If the entries were games won instead of
percentages, each row sum would be the team's total games won, which
is exactly what is used to rank teams currently.

The row sum of wv (winning votes) is equivalent to the Borda score.
It then occurred to me that dividing the row sum by N-1, where N is
the number of candidates, gives the average number of winning votes.
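
A small check of the row-sum/Borda equivalence, as a sketch only.  It
assumes every ballot ranks all candidates, that the row sum is taken
over all pairwise votes for a candidate, and that Borda gives N-1
points for first place down to 0 for last.  The ballots and the names
pairwise_matrix and borda_scores are made up for illustration.

```python
# Hypothetical ballots: (ranking from first to last, number of voters).
ballots = [
    (["A", "B", "C"], 4),
    (["B", "C", "A"], 3),
    (["C", "A", "B"], 2),
]
candidates = ["A", "B", "C"]


def pairwise_matrix(candidates, ballots):
    """votes[x][y] = number of ballots ranking x above y."""
    votes = {x: {y: 0 for y in candidates if y != x} for x in candidates}
    for ranking, count in ballots:
        for pos, x in enumerate(ranking):
            for y in ranking[pos + 1:]:
                votes[x][y] += count
    return votes


def borda_scores(candidates, ballots):
    """Borda count with N-1 points for first place, 0 for last."""
    n = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking, count in ballots:
        for pos, c in enumerate(ranking):
            scores[c] += (n - 1 - pos) * count
    return scores


votes = pairwise_matrix(candidates, ballots)
row_sums = {x: sum(votes[x].values()) for x in candidates}
borda = borda_scores(candidates, ballots)
# Dividing by N-1 gives the average pairwise vote per opponent.
row_avgs = {x: row_sums[x] / (len(candidates) - 1) for x in candidates}

print(row_sums)
print(borda)
print(row_avgs)
```

With these ballots the row sums and the Borda scores come out the
same for every candidate.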

When applied to winning percentages, this gives the average percentage
of games won, a perfectly reasonable way to rank baseball teams, if
not candidates.

So here is that example, reposted.  Because the average percentage is
hard to tell apart from the pairwise entries, I put brackets around it.

Best viewed with a fixed-width font (e.g. Courier) and a wide screen:

Original matrix (row averages on diagonal):

Tms          a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
a ( Bal)  [50]   53   33   50  100   67   33   44   26    0   78   58   71   58   28
b ( Bos)    47 [60]   67   43   86   67   56   33   58   89   56   74   44   74   50
c ( CWS)    67   33 [51]   53   42   68   44   47   43   22   78   67   67   43   44
d ( Cle)    50   57   47 [50]   47   58   56   37   33   67   56   50   11   71   56
e ( Det)     0   14   58   53 [42]   42   22   37   57   44   56   50   44   67   50
f (  KC)    33   33   32   42   58 [33]    0   37   17   22   29   33   44   50   33
g ( LAA)    67   44   56   44   78  100 [60]   56   56   53   65   86   47   44   39
h ( Min)    56   67   53   63   63   63   44 [55]   33   29   56   44   71   67   61
i ( NYY)    74   42   57   67   43   83   44   67 [62]   78   67   79   56   63   56
j ( Oak)   100   11   78   33   56   78   47   71   22 [58]   58   78   55   67   56
k ( Sea)    22   44   22   44   44   71   35   44   33   42 [39]   29   37   22   50
l (  TB)    42   26   33   50   50   67   14   56   21   22   71 [43]   22   50   83
m ( Tex)    29   56   33   89   56   56   53   29   44   45   63   78 [54]   78   56
n ( Tor)    42   26   57   29   33   50   56   33   37   33   78   50   22 [42]   44
o (Intr)    72   50   56   44   50   67   61   39   44   44   50   17   44   56 [50]

Grid reordered in descending order of row average:

Tms          i    b    g    j    h    m    c    a    d    o    l    e    n    k    f
i ( NYY)  [62]   42   44   78   67   56   57   74   67   56   79   43   63   67   83
b ( Bos)    58 [60]   56   89   33   44   67   47   43   50   74   86   74   56   67
g ( LAA)    56   44 [60]   53   56   47   56   67   44   39   86   78   44   65  100
j ( Oak)    22   11   47 [58]   71   55   78  100   33   56   78   56   67   58   78
h ( Min)    33   67   44   29 [55]   71   53   56   63   61   44   63   67   56   63
m ( Tex)    44   56   53   45   29 [54]   33   29   89   56   78   56   78   63   56
c ( CWS)    43   33   44   22   47   67 [51]   67   53   44   67   42   43   78   68
a ( Bal)    26   53   33    0   44   71   33 [50]   50   28   58  100   58   78   67
d ( Cle)    33   57   56   67   37   11   47   50 [50]   56   50   47   71   56   58
o (Intr)    44   50   61   44   39   44   56   72   44 [50]   17   50   56   50   67
l (  TB)    21   26   14   22   56   22   33   42   50   83 [43]   50   50   71   67
e ( Det)    57   14   22   44   37   44   58    0   53   50   50 [42]   67   56   42
n ( Tor)    37   26   56   33   33   22   57   42   29   44   50   33 [42]   78   50
k ( Sea)    33   44   35   42   44   37   22   22   44   50   29   44   22 [39]   71
f (  KC)    17   33    0   22   37   44   32   33   42   33   33   58   50   29 [33]

"Intr" is inter-league play.  I didn't count Intr victories when doing
River.

Percentages above 50 are wins; 50 is a tie.

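After reordering, the winner is quickly seen to be Boston: it beats
New York (58 to 42), the only team ranked above it, while every team
further down loses to at least one team above it.  This agrees with
River.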
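
The by-hand check on the reordered grid can also be sketched in
Python.  This is only a sketch: it assumes the winner is the
lowest-ranked team that beats (scores above 50 against) every team
ranked above it, with row average as the seed ranking, and it ignores
ties.  Intr is kept as a row and column, as in the grid.  The names
TEAMS, PCT, row_average and seeded_dmc_winner are mine.

```python
TEAMS = ["Bal", "Bos", "CWS", "Cle", "Det", "KC", "LAA", "Min",
         "NYY", "Oak", "Sea", "TB", "Tex", "Tor", "Intr"]

X = None  # diagonal placeholder: a team does not play itself

# Rows copied from the original matrix, diagonal replaced by X.
PCT = [
    [X, 53, 33, 50, 100, 67, 33, 44, 26, 0, 78, 58, 71, 58, 28],
    [47, X, 67, 43, 86, 67, 56, 33, 58, 89, 56, 74, 44, 74, 50],
    [67, 33, X, 53, 42, 68, 44, 47, 43, 22, 78, 67, 67, 43, 44],
    [50, 57, 47, X, 47, 58, 56, 37, 33, 67, 56, 50, 11, 71, 56],
    [0, 14, 58, 53, X, 42, 22, 37, 57, 44, 56, 50, 44, 67, 50],
    [33, 33, 32, 42, 58, X, 0, 37, 17, 22, 29, 33, 44, 50, 33],
    [67, 44, 56, 44, 78, 100, X, 56, 56, 53, 65, 86, 47, 44, 39],
    [56, 67, 53, 63, 63, 63, 44, X, 33, 29, 56, 44, 71, 67, 61],
    [74, 42, 57, 67, 43, 83, 44, 67, X, 78, 67, 79, 56, 63, 56],
    [100, 11, 78, 33, 56, 78, 47, 71, 22, X, 58, 78, 55, 67, 56],
    [22, 44, 22, 44, 44, 71, 35, 44, 33, 42, X, 29, 37, 22, 50],
    [42, 26, 33, 50, 50, 67, 14, 56, 21, 22, 71, X, 22, 50, 83],
    [29, 56, 33, 89, 56, 56, 53, 29, 44, 45, 63, 78, X, 78, 56],
    [42, 26, 57, 29, 33, 50, 56, 33, 37, 33, 78, 50, 22, X, 44],
    [72, 50, 56, 44, 50, 67, 61, 39, 44, 44, 50, 17, 44, 56, X],
]


def row_average(row):
    """Average of the off-diagonal entries of one row."""
    vals = [v for v in row if v is not None]
    return sum(vals) / len(vals)


def seeded_dmc_winner(teams, pct):
    """Return (teams in descending row-average order, winner)."""
    n = len(teams)
    order = sorted(range(n), key=lambda i: row_average(pct[i]),
                   reverse=True)
    winner = order[0]  # the top seed has nobody above it
    for pos in range(1, n):
        cand = order[pos]
        # Keep the lowest-seeded team that beats everyone above it.
        if all(pct[cand][above] > 50 for above in order[:pos]):
            winner = cand
    return [teams[i] for i in order], teams[winner]


order, winner = seeded_dmc_winner(TEAMS, PCT)
print(order)
print(winner)
```

Run on the grid above, the order matches the column order of the
reordered grid and the winner is Bos.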

I found that the 2004 National League grid was similar -- St. Louis
was the winner with both DMC-AvgPct and River.

I find it interesting that both league winners were predicted by
Condorcet -- possibly one could use this in betting pools ;-).  But
the DMC winner is much faster to find by hand than the River winner.

One could of course use Borda/Row-average-seeded DMC for elections as
well.  That would be equivalent to Pairwise Sorted Borda.  And no
extra approval cutoff would be required.

But using Borda score as the seed ranking would overly encourage
strategic burying, and it would remove the voter's ability to adjust
the approval cutoff without changing the ranking.

Ted
-- 
araucaria dot araucana at gmail dot com