[EM] Re: CONFIRMATION SAMPLE SIZE

Sun Nov 23 18:21:09 PST 2003

Joe Weinstein wrote:

>
> I will treat both cases together. As I understand the matter (or maybe 
> don't), the required sample-size N should be the same in both cases; 
> the cases differ only in the kind of validation demanded, to ensure 
> that an object in the sample may be accepted as an authentic 
> individual of the population being sampled. (Here in each case the 
> population comprises 10 million ballots.)

That's not exactly what I had in mind. I actually wanted to assume the 
same confidence level (99.99%) in both cases, and find out how much the 
resulting N differs between the two cases.

>
> The answer N to Ken’s question depends on what, precisely, one means 
> by ‘the election result’ - and also (though I won't go explicitly very 
> far into the issue here) by ‘confirm with 99.99% confidence’. Namely, 
> either of two hypotheses (i.e. conditions) may be deemed ‘the result’ 
> of interest here. The hypothesis ‘P=0.51’ expresses the computer’s 
> finding, whereas the hypothesis ‘P>0.50’ expresses the operative 
> result that A wins.

I'm not specifically interested in whether the vote tally 
(5,100,000:4,900,000) is exactly right; my primary concern is whether 
the election result "A wins" is correct. Probably a more precise 
statement of the question is "IF the election is rigged, resulting in an 
erroneous result (B should have won), then what is the probability of my 
NOT discovering this from a random sampling of N ballots, and how big 
should N be to make the probability < 0.01%?"

>
> ...
> Actually N may be taken slightly smaller, because in the above formula 
> for S we omitted the factor SQRT(1-f), where f is the 
> finite-population ‘sampling fraction’ N/NTOT. Here, to first cut, f is 
> about 150,000 / 10 million, or 0.015; so that a refined calculation 
> allows us to take N approximately
> (1-0.015)*152,100, i.e. just under 150,000. 
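
For concreteness, the refinement quoted above can be checked with a few lines of Python. This is only a sketch: the names n0, n_total and fpc_sample_size are illustrative and not from the quoted post, and the "self-consistent" variant assumes the correction applies as N = n0 * (1 - N/NTOT), with f computed from the corrected N rather than from the rounded 0.015.

```python
def fpc_sample_size(n0: float, n_total: int) -> float:
    """Sample size after the finite-population correction.

    Assumption: the corrected size N satisfies N = n0 * (1 - N / n_total),
    which rearranges to N = n0 / (1 + n0 / n_total).
    """
    return n0 / (1.0 + n0 / n_total)


n0 = 152_100          # uncorrected sample size from the quoted post
n_total = 10_000_000  # ballots in the population

# First cut, as in the quoted text: plug in f = 0.015 directly.
first_cut = (1 - 0.015) * n0

# Self-consistent version: f is computed from the corrected N itself.
refined = fpc_sample_size(n0, n_total)

print(f"first cut: {first_cut:,.1f}")
print(f"refined:   {refined:,.1f}")
```

Both figures come out a little under 150,000, so the rounded value used below is not sensitive to which form of the correction is taken.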

I think this answers my question, at least for the case of unserialized 
ballots: N = 150,000. If the ballots are serialized, however, it seems 
to me that the result would differ. Following is an outline of my 
(faulty?) logic:

First, I assume that the number of database ballot records has been 
verified to be NO GREATER than the number of official (paper) ballots, 
and that the ballot serial numbers in the database have been confirmed 
to be unique. Then if any database ballot records do not correlate to 
official ballots, an equal or greater number of official ballots will 
not correlate to database records, so inspection of sufficiently many 
official ballots will uncover the discrepancy.

Now assume that the vote is rigged and the reported result is wrong. 
Then at least 1% of the official ballots will correspond to erroneous 
or missing database records. If I inspect a single ballot and try to 
correlate it to the database, there will be a > 1% chance of finding a 
discrepancy, or a < 99% chance of not finding it. If I check N 
randomly-selected ballots, the probability of not finding a discrepancy 
is < 0.99^N; if I want this value to be less than 0.01%, it suffices that 
0.99^N <= 0.0001. Hence N >= log(0.0001)/log(0.99) = 916.4, i.e. N = 917; 
and with fewer than 1000 ballots I can confirm, with > 99.99% confidence, 
that the election is not rigged. If, however, I determine that the election IS 
rigged, I still may not have enough information to determine whether the 
discrepancy is sufficient to affect the election outcome. But the ballot 
serialization at least makes it possible to determine, very quickly and 
inexpensively, whether there is any significant probability of election 
fraud. If there is, then you can take it to the next level by getting 
courts and law enforcement involved and, if necessary, doing a full recount.
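
The arithmetic in the serialized case can be sketched in Python as follows. The function names and parameters (min_sample_size, miss_probability, bad_fraction, max_miss) are made up for this illustration, and the sketch treats each inspected ballot as an independent draw. That assumption is slightly pessimistic for sampling without replacement.

```python
import math


def miss_probability(n: int, bad_fraction: float) -> float:
    """Upper bound on the chance that n random ballots all look fine
    when at least bad_fraction of the ballots are discrepant."""
    return (1.0 - bad_fraction) ** n


def min_sample_size(bad_fraction: float, max_miss: float) -> int:
    """Smallest whole n with (1 - bad_fraction)**n <= max_miss."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - bad_fraction))


# 1% of ballots discrepant, at most a 0.01% chance of missing them all.
n = min_sample_size(bad_fraction=0.01, max_miss=0.0001)
print(n, miss_probability(n, 0.01))
```

The ratio log(0.0001)/log(0.99) is about 916.4, so rounding up to a whole ballot gives 917; one ballot fewer leaves the miss probability just above 0.01%.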

Consider another situation in which A wins, not by a 2% margin, but by 
just 0.02%. Then only 0.01% of the ballots need be wrong, and the same 
formula gives N = log(0.0001)/log(0.9999), or about 92,100 ballots with 
serialization. That is still a small fraction of the 10 million cast, 
but how big would the sample size have to be without serialization?

>
> PROVOCATIVE COMMENT. This sample size N=150,000 in effect will with 
> high confidence confirm any majority margin of 51-49 or better, 
> regardless of whether that margin has been ‘counted’ by a computer or 
> ‘predicted’ by a prior poll or simply happens quietly to be the 
> voters' mass preference.
>
> One may argue that if the true margin is closer than 51-49, there is 
> no effective ‘mandate’ anyhow from the total electorate. Hence, why 
> not simply use this sample size N=150,000, randomly choose N electors, 
> and go with their decision? At least you then have only 150,000 
> ballots to authenticate and count, not millions. Also, mass 
> ‘campaigning’ could at one and the same time be both far cheaper and 
> yet more directly and meaningfully involve the electors.

BLASPHEMY! EVERY VOTE COUNTS!
Consider the 2000 U.S. presidential race in which Bush beat Gore in 
Florida by a scant 0.02% margin (537 votes), winning him the Electoral 
College even though Gore won the popular vote by a (comparatively) 
whopping 0.5% (532,994 votes). Statistically, 0.02% is effectively a tie 
- they might just as well have determined the outcome by a coin toss and 
saved all the lawyers' fees. In this kind of situation it doesn't really 
matter much who wins; what is important is that the process by which the 
winner is selected be institutionally sanctioned and unbiased. Given 
people's firm belief that "every vote counts" (or should count), I don't 
think there would be any support for a statistically-determined election 
outcome, or even a coin toss, unless every last vote has been counted 
and the tally is an exact numerical tie.

Ken Johnson