[EM] IRV counting with integers to handle multiple candidates at the same preference level

Sun Jan 30 18:12:06 PST 2022

On 1/29/2022 5:28 AM, Kristofer Munsterhjelm wrote:
 > On 27.01.2022 20:49, Forest Simmons wrote:
 >> El jue., 27 de ene. de 2022 2:44 a. m., Kristofer Munsterhjelm
 >> <km_elmet at t-online.de <mailto:km_elmet at t-online.de>> escribió:
 >>     On 27.01.2022 04:26, Forest Simmons wrote:
 >> The precinct summary would be something like
 >>
 >> 150 ABCDEFG+920BADEFG+ ...13GFED
 >
 > Yes. That is an explicit compression version of my method, and using
 > text instead of binary.
 >
 > If you want to put every bit to maximum use, then instead of the string
 > "ABCDEFG", you can use a 13 bit integer field; in this case, the field
 > would have value 0. GFEDCBA would correspond to value 5039. Truncation
 > and equal rank make it a bit harder; I tried to construct a bijection
 > between truncated/equal-rank ballots and integers once but didn't quite
 > manage to do it.

Thank you Forest and Kristofer for answering my question.

I'm interpreting your replies to mean that of course we can summarize 
the ranked choice ballot data in the way that IRV counting requires.  Is 
that right?

As Kristofer says" "Truncation and equal rank make it a bit harder."

I believe it might be useful to summarize the ballot info in four ways:

1. Counts and encoded rankings for ballots that rank each candidate at a 
different preference level.  Using the method above.

2. Separate counts and encoded rankings for the ballots that have shared 
ranking levels, but just sending a subset of these, and truncated to 
only show the ranks for the most popular candidates (omitting the 
clearly can't-win candidates)

3. Sending the pairwise vote counts -- both for use in finding pairwise 
losing candidates, and as a complete summary of all the ballot data (but 
not sufficient to track IRV or STV vote transfers).

4. Maybe also sending counts for how many ballots mark each oval in the 
pattern.  This is like saying 236 ballots have a mark in column 1 and 
row 1, 932 ballots have a mark in column 2 and row 1, etc.  (If a ballot 
marks more than one oval for the same candidate, only the higher-ranked 
oval would be counted.)  Of course this is equivalent to how Score 
ballots can be summarized.

Regarding type 2, namely shared-preference-level ballots, if a precinct 
has say 50,000 voters, is there a way to estimate whether the upload 
time is hours versus minutes (at normal internet speeds)?

Obviously there can be huge numbers of possible marking patterns when 
shared-preference-level ballots are involved, but how many of those 
patterns would appear on more than just a few ballots?

Would it be reasonable to allow the precinct software to estimate the 
possible winners and truncate the detailed pattern data to only include 
the possible winners (based on this precinct-level estimate).  Or should 
the central counting location be the only source for requesting this 
kind of truncation?

Does anyone recognize whether summary method 4 offers any meaningful way 
to eliminate some of the candidates as can't win candidates?

I continue to believe there must be ways to overcome the IRV limitation 
of not being precinct summable.  At least for gathering enough 
information within an hour of poll closing time to either:

* tentatively reveal enough data to reveal a winner, ...

* or else reveal the election is very close, and a day or two is needed 
to collect the full detailed vote counts.

Richard Fobes
The VoteFair guy

On 1/29/2022 5:28 AM, Kristofer Munsterhjelm wrote:
> On 27.01.2022 20:49, Forest Simmons wrote:
>>
>>
>> El jue., 27 de ene. de 2022 2:44 a. m., Kristofer Munsterhjelm
>> <km_elmet at t-online.de <mailto:km_elmet at t-online.de>> escribió:
>>
>>     On 27.01.2022 04:26, Forest Simmons wrote:
>>     > Here's how to make even IRV precinct summable in a practical way, even
>>     > when there are 135 candidates with 135 factorial permutations, and
>>     many
>>     > more when you take equal rankings an truncations into account:
>>     >
>>     > Encode each ballot as a string, just as we do on EM list election
>>     > profiles. Add the ballot strings together algebraically as though each
>>     > string were a variable name. Algebra software easily adds these
>>     strings
>>     > together algebraically (as opposed to string concatenation) by
>>     > collecting like terms on the fly. The precinct sums are added
>>     together,
>>     > as well, again by collecting like terms. The coefficients of the
>>     > collected terms tell how many ballots in each faction, just as in
>>     our EM
>>     > example profiles.
>>     >
>>     > You don't need to initialize 135 factorial bins, when almost all
>>     of the
>>     > would end up empty, anyway!
>>
>>     It has always been known that any method is summable in O(V) space :-)
>>     Summable in the sense of the criterion means more like O(log V) *
>>     O(C^k).
>>
>>     Strictly speaking, yes, you could store all the votes on a memory card.
>>     Suppose the election has a billion voters and there are 10 candidates.
>>     We need 22 bits to store the permutation index since log_2(10!) =
>>     21.791... (Arithmetic coding might get you even closer, but ignore
>>     for now.)
>>     This times a billion voters gives 22 gigabit = 3 gigabytes, simple and
>>     easy. And that's before compression, so it's a worst bound. Most likely
>>     there are a lot of voters who would vote from left to right or vice
>>     versa, and considerably fewer who would vote far right first, far left
>>     second.
>>
>>     So, if this is easy, then what are the downsides? I guess there are two.
>>     First, that detailed ballot information makes it easier to buy and sell
>>     votes, because you could use the ranking as a signature to prove that
>>     you're voting the way your benefactor wants you to.
>>
>>
>> The precinct summary would be something like
>>
>> 150 ABCDEFG+920BADEFG+ ...13GFED
>
> Yes. That is an explicit compression version of my method, and using
> text instead of binary.
>
> If you want to put every bit to maximum use, then instead of the string
> "ABCDEFG", you can use a 13 bit integer field; in this case, the field
> would have value 0. GFEDCBA would correspond to value 5039. Truncation
> and equal rank make it a bit harder; I tried to construct a bijection
> between truncated/equal-rank ballots and integers once but didn't quite
> manage to do it.
>
> If V >> c!, then the best format is an array c! long that gives how many
> voters voted for each particular ranked ballot. If V << c!, then the
> best approach is just a list of the rank indices for each voter (this
> list should probably be sorted so as not to leak information about what
> the tenth voter voted).
>
> If the number of distinct orders is small, then it may pay off to, in
> effect, RLE encode by index number. The fields would then look like:
>
> (index number, number of voters voting this way)
>
> for every index where there's at least one vote. E.g. if 1024 people
> vote ABCDEFG, then the pair is (0, 1024). The loss here is that you use
> some bits explicitly denoting the index.
>
> -km
>