Condorcet voting methods typically preprocess ballots into a pairwise matrix, which is convenient because the tabulation step then works from a much smaller set of "input data" than the full set of individual ballots.  This is particularly convenient if we wish to allow the "2nd stage" of tabulation to happen on the client, such as in JavaScript on a web page (for instance, I have been building a JavaScript vote tabulator which, if provided with a matrix, can do the processing client side: 

<a href="http://www.karmatics.com/voting/testharness.html">http://www.karmatics.com/voting/testharness.html</a> ).  If we have to process all ballots, this could be inconvenient because all ballots must now be delivered to the client, which could be bulky if there are a large number of voters.  In other words, the size of the matrix is determined by the number of candidates, while the size of the ballot data is determined by the number of voters.
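To make the size difference concrete, here is a minimal sketch of the preprocessing step. It assumes ballots arrive as strings such as "A>B=C" (">" for preference, "=" for a tie) and that candidates left off a ballot rank below every listed candidate; both are assumptions for illustration, and parseBallot and pairwiseMatrix are hypothetical names, not functions from the test harness linked above.

```javascript
// Hypothetical helper: parse "A>B=C" into tiers [["A"], ["B", "C"]].
function parseBallot(text) {
  return text.split(">").map(tier => tier.split("=").map(name => name.trim()));
}

// Build an N x N matrix where matrix[i][j] counts the ballots that rank
// candidates[i] strictly above candidates[j].
// Assumption: candidates left off a ballot rank below every listed one.
function pairwiseMatrix(candidates, ballots) {
  const index = new Map(candidates.map((name, i) => [name, i]));
  const n = candidates.length;
  const matrix = Array.from({ length: n }, () => new Array(n).fill(0));
  for (const text of ballots) {
    const tiers = parseBallot(text);
    const listed = new Set(tiers.flat());
    const unlisted = candidates.filter(name => !listed.has(name));
    if (unlisted.length > 0) tiers.push(unlisted);
    // Every candidate in a higher tier beats every candidate in a lower tier.
    for (let hi = 0; hi < tiers.length; hi++) {
      for (let lo = hi + 1; lo < tiers.length; lo++) {
        for (const winner of tiers[hi]) {
          for (const loser of tiers[lo]) {
            matrix[index.get(winner)][index.get(loser)] += 1;
          }
        }
      }
    }
  }
  return matrix;
}

// Made-up example data: three candidates, four ballots.
const candidates = ["A", "B", "C"];
const sampleBallots = ["A>B>C", "B>C>A", "A>B=C", "C"];
const matrix = pairwiseMatrix(candidates, sampleBallots);
console.log(matrix);
```

However many ballots go in, the result here stays a 3 x 3 array, which is the property that makes the matrix cheap to send to the client.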

Unfortunately, as Paul K has pointed out, the pairwise matrix is "lossy": you can never retrieve the actual ballots from it.  Whether or not the voting method itself uses this data, people who want to see how everyone actually voted, and possibly run various statistical analyses on it, are limited in what they can do because they cannot see all the data.

Since I am now exploring methods that rely directly on ballot data, rather than on the matrix, I am especially interested in finding a convenient non-lossy way to compress the ballot data.  This compression will not only make it convenient to pass the data around (such as delivering it to a client-side JavaScript application), it can also potentially make it much more efficient to batch process.

<br><br>So let's say I have the following ballot data:<br><br>A>B>C=D<br>A>C=D>B<br>D>B<br>A>B>C=D<br>D>B<br><br>Since there are two pairs of identical ballots, this can obviously be compressed into 

<br><br>2: A>B>C=D<br>

1: A>C=D>B<br>

2: D>B<br><br>As the number of ballots becomes large (say, in the thousands or tens of thousands), this becomes quite significant.  Given N candidates, there is a fixed number of possible unique ballots, which caps the quantity of data.  It will still be more data than the pairwise matrix, but potentially far less than having to store each ballot as a separate piece of data.
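The grouping step itself can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: ballots are the strings shown above, and tied candidates may be written in any order, so "D=C" is normalised to "C=D" before counting. The names canonicalBallot and compressBallots are hypothetical.

```javascript
// Normalise a ballot string so that equivalent ballots compare equal.
// Assumption: candidates tied with "=" may be listed in any order,
// so "D=C" and "C=D" describe the same tier.
function canonicalBallot(text) {
  return text
    .split(">")
    .map(tier => tier.split("=").map(name => name.trim()).sort().join("="))
    .join(">");
}

// Group identical ballots into a Map from ballot string to count.
// A Map iterates in insertion order, so ballots appear in the order
// they were first seen.
function compressBallots(ballots) {
  const counts = new Map();
  for (const text of ballots) {
    const key = canonicalBallot(text);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const ballots = ["A>B>C=D", "A>C=D>B", "D>B", "A>B>C=D", "D>B"];
const compressed = compressBallots(ballots);
const lines = [...compressed].map(([ballot, count]) => `${count}: ${ballot}`);
console.log(lines.join("\n"));
```

Run on the five ballots above, this prints the same three lines as the hand-compressed list. Nothing is lost: expanding each line back into that many copies reproduces the original set of ballots, though not the order in which they were cast.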

<br><br>My question is, what is this number?  I could probably work it out, but I'm sure someone has already done it....<br><br>Thanks,<br>-rob<br><br>