[EM] Identifying ranked ballot sets

Thu Oct 4 03:56:05 PDT 2018

On 2018-10-02 01:18, John wrote:
> [Note:  I'm not on-list; please CC me on replies]
> 
> Somebody suggested a two-ballot set to me with three candidates, and
> it turns out you can mutate this set and achieve the same pairwise
> votes.

[...]
> I'm trying to identify a concise representation of a set of ranked
> ballots.  At this time, I believe this may be:
> 
> 1.  The pairwise race matrix (Win-Lose)
> 2.  The matrix of appearances at each ordinal rank
> 
> For BACD and DCAB, #2 is as such:
> 
>    1  2  3  4
> A 0  1  1  0
> B 1  0  0  1
> C 0  1  1  0
> D 1  0  0  1
> 
> It probably doesn't matter if you rank the truncated ballots by
> counting all unranked as ties at that rank or by not counting them at
> that rank.

If I understand correctly, you want to find a way of representing
ballots that takes less space than just listing the ballots directly.

Matrices like the positional matrix (how many ballots have A as first, B
as first, ..., D as fourth) and the pairwise matrix have the property
that the number of bits you need to store the data is a polynomial of
the number of candidates and the logarithm of the number of voters.

However, a pigeonhole argument shows that a data structure that
unambiguously defines a ballot set can't be polynomial in both the
number of candidates and log(number of voters). There must thus exist
two ballot sets that are different but look the same by pairwise and
positional matrices.

If we assume no truncation or equal-rank ballots and that there are c
candidates, then there is a mapping from an individual ranked ballot to
a number between 0 and c!, since there are c! possible ways to list c
candidates in order. (If we allow truncation and equal-rank, then there
will only be more ballots, so this is a conservative assumption.)

Recording a single ballot then requires log(c!) = O(c log c) bits. (All
logarithms from this point on are base-2 logarithms.) If the order of
the candidates and the voters matter, and there are v voters, then the
best you can do is to record each ballot, which will require v * c *
log(c) bits (up to some constant factor). This is exponential in log(c),
and thus neither positional nor pairwise matrices can accomplish this.

Another possibility is to have a c!-tuple, where each element counts how
many voters voted that way. Such a structure would require c! * log(v)
bits; so it's possible to get something that is polynomial in log(v),
but at the cost of no longer being polynomial in c. (Strictly speaking,
the log(v) term can be reduced by noting that not all the elements need
the full bit length because the sum over all of them is supposed to be v.)

If the order of the candidates don't matter, then that's equivalent to
assuming that the first ballot is A>B>C>D>... (you can always relabel
the candidates so that this is the case). That makes the first ballot
redundant and so the required space is (v-1) * c log c, but that's still
not polynomial in log(v).

If the order of the voters don't matter, then we have a stars-and-bars
problem. We have a c!-tuple (one nonnegative integer for each possible
ordering) whose elements sum up to v, so the total possible number of
ballot sets is v+c!-1 choose v. The number of bits required is the
base-2 logarithm of this. As far as I recall, the number of bits is
still exponential in log(v), although I don't have the exact
calculations available at the moment.

Here's a quick attempt to prove what I said. My logic or calculations
might be wrong; check them if you want to be sure:

First use Stirling's approximation: log(N choose K) ~= N log(N/K) +
(N-K) log(N/(N-K)), or N * (log2(N) - log2(K)) + (N-K) * (log2(N) -
log2(N-K)).

Inserting v+c!-1 for N and v for K, and letting x=c!-1 gives

f(x) = (v+x) * (log(v+x) - log(v)) + x * (log(v+x) - log(x))

Taking the limit as x goes to infinity of f(x)/(x * log(x)) gives a
result of 1. This means that f(x) is O(x log x). Thus there's a way of
storing the data that requires O(c! log c!) space, which is not very
useful since it's not polynomial in c.

Alternatively, v+c!-1 choose v = v+c!-1 choose c!-1. Inserting v+c!-1
for N and c!-1 for K, and letting x = c!-1 as before gives

g(v) = (v+x) * (log(v+x) - log(x)) + v * (log(v+x) - log(v))

Since this is the same as f(x) only with x and v flipped, g(v) is O(v
log v).

So we can either have a data structure that is exponential in c (being
O(c! log c!)) or one that's exponential in log(v) (being O(v log v)),
but we can't have one that's polynomial in both c and log(v).