# [EM] Re: CONFIRMATION SAMPLE SIZE

Sun Nov 23 18:21:09 PST 2003

```
Joe Weinstein wrote:

>
> I will treat both cases together. As I understand the matter (or maybe
> don't), the required sample-size N should be the same in both cases;
> the cases differ only in the kind of validation demanded, to ensure
> that an object in the sample may be accepted as an authentic
> individual of the population being sampled. (Here in each case the
> population comprises 10 million ballots.)

That's not exactly what I had in mind. I actually wanted to assume the
same confidence level (99.99%) in both cases, and find out how much the
resulting N differs between the two cases.

>
> The answer N to Ken’s question depends on what, precisely, one means
> by ‘the election result’ - and also (though I won't go explicitly very
> far into the issue here) by ‘confirm with 99.99% confidence’. Namely,
> either of two hypotheses (i.e. conditions) may be deemed ‘the result’
> of interest here. The hypothesis ‘P=0.51’ expresses the computer’s
> finding, whereas the hypothesis ‘P>0.50’ expresses the operative
> result that A wins.

I'm not specifically interested in whether the vote tally
(5,100,000:4,900,000) is exactly right; my primary concern is whether
the election result "A wins" is correct. Probably a more precise
statement of the question is "IF the election is rigged, resulting in an
erroneous result (B should have won), then what is the probability of my
NOT discovering this from a random sampling of N ballots, and how big
should N be to make the probability < 0.01%?"

>
> ...
> Actually N may be taken slightly smaller, because in the above formula
> for S we omitted the factor SQRT(1-f), where f is the
> finite-population ‘sampling fraction’ N/NTOT. Here, to first cut, f is
> about 150,000 / 10 million, or 0.015; so that a refined calculation
> allows us to take N approximately
> (1-0.015)*152,100, i.e. just under 150,000.

I think this answers my question, at least for the case of unserialized
ballots: N = 150,000. If the ballots are serialized, however, it seems
to me that the result would differ. Following is an outline of my
(faulty?) logic:

First, I assume that the number of database ballot records has been
verified to be NO GREATER than the number of official (paper) ballots,
and that the ballot serial numbers in the database have been confirmed
to be unique. Then if any database ballot records do not correlate to
official ballots, an equal or greater number of official ballots will
not correlate to database records, so inspection of sufficiently many
official ballots will uncover the discrepancy.

Under the assumption that the vote is rigged, resulting in the wrong
result, then at least 1% of the official ballots will correspond to
erroneous or missing database records. If I inspect a single ballot and
try to correlate it to the database, there will be a > 1% chance of
finding a discrepancy, or a < 99% chance of not finding it. If I check N
randomly-selected ballots, the probability of not finding a discrepancy
is < 0.99^N; if I want this value to be less than 0.01% it suffices that
0.99^N <= 0.0001. Hence, N >= log(0.0001)/log(0.99) = 916.4, so N = 917
suffices; and with fewer
than 1000 ballots I can confirm, with > 99.99% confidence, that the
election is not rigged. If, however, I determine that the election IS
rigged, I still may not have enough information to determine whether the
discrepancy is sufficient to affect the election outcome. But the ballot
serialization at least makes it possible to determine, very quickly and
inexpensively, whether there is any significant probability of election
fraud. If there is, then you can take it to the next level by getting
courts and law enforcement involved and, if necessary, doing a full recount.

Consider another situation in which A wins, not by a 2% margin, but by
just 0.02%. With serialization, it still takes a comparatively small
number of ballots (< 1400) to validate the election, but now how big
would the sample size have to be without serialization?

>
> PROVOCATIVE COMMENT. This sample size N=150,000 in effect will with
> high confidence confirm any majority margin of 51-49 or better,
> regardless of whether that margin has been ‘counted’ by a computer or
> ‘predicted’ by a prior poll or simply happens quietly to be the
> voters' mass preference.
>
> One may argue that if the true margin is closer than 51-49, there is
> no effective ‘mandate’ anyhow from the total electorate. Hence, why
> not simply use this sample size N=150,000, randomly choose N electors,
> and go with their decision? At least you then have only 150,000
> ballots to authenticate and count, not millions. Also, mass
> ‘campaigning’ could at one and the same time be both far cheaper and
> yet more directly and meaningfully involve the electors.

BLASPHEMY! EVERY VOTE COUNTS!
Consider the 2000 U.S. presidential race in which Bush beat Gore in
Florida by a scant 0.02% margin (537 votes), winning him the Electoral
College even though Gore won the popular vote by a (comparatively)
whopping 0.5% (532,994 votes). Statistically, 0.02% is effectively a tie
- they might just as well have determined the outcome by a coin toss and
saved all the lawyers' fees. In this kind of situation it doesn't really
matter much who wins; what is important is that the process by which the
winner is selected be institutionally sanctioned and unbiased. Given
people's firm belief that "every vote counts" (or should count), I don't
think there would be any support for a statistically-determined election
outcome, or even a coin toss, unless every last vote has been counted
and the tally is an exact numerical tie.

Ken Johnson

```
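The serialized-ballot arithmetic in the message can be checked with a short Python sketch. The function names are hypothetical, introduced here only for illustration. The first function follows the message's own approximation (each inspected ballot is treated as an independent draw with a fixed chance of being bad); the second is an added cross-check, not part of the original argument, that computes the exact miss probability when the N ballots are drawn without replacement from the 10 million.

```python
import math


def detection_sample_size(bad_fraction: float, miss_probability: float) -> int:
    """Smallest N such that (1 - bad_fraction)**N <= miss_probability.

    Assumes independent draws, as in the message's 0.99^N argument.
    """
    if not (0.0 < bad_fraction < 1.0):
        raise ValueError("bad_fraction must be strictly between 0 and 1")
    if not (0.0 < miss_probability < 1.0):
        raise ValueError("miss_probability must be strictly between 0 and 1")
    return math.ceil(math.log(miss_probability) / math.log(1.0 - bad_fraction))


def miss_probability_without_replacement(total: int, bad: int, n: int) -> float:
    """Exact probability that n distinct ballots, drawn at random from
    `total` ballots of which `bad` are discrepant, contain no bad ballot."""
    if not (0 <= bad <= total and 0 <= n <= total):
        raise ValueError("need 0 <= bad <= total and 0 <= n <= total")
    if n > total - bad:
        return 0.0  # more draws than good ballots: a bad one must turn up
    # Product of (total - bad - i) / (total - i) for i = 0 .. n-1, in log space.
    log_p = sum(
        math.log(total - bad - i) - math.log(total - i) for i in range(n)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    # Figures from the message: 10 million ballots, at least 1% bad, 0.01% miss rate.
    n = detection_sample_size(0.01, 0.0001)
    print(n)  # 917
    print(miss_probability_without_replacement(10_000_000, 100_000, n) < 0.0001)  # True
```

Sampling without replacement can only lower the miss probability relative to 0.99^N, so the independent-draw figure is a safe (slightly conservative) bound for a population this large.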