[EM] CONFIRMATION SAMPLE SIZE

Thu Nov 20 00:11:01 PST 2003

CONFIRMATION  SAMPLE SIZE   (WAS Re: Re: Touch Screen Voting Machines)

THE  QUESTION.  In EM message 12737, Wed. 19 Nov 03, Ken Johnson asked:

“Suppose you have a two-candidate election with 10,000,000 voters, and the 
computer says that candidate A beats candidate B 51% to 49%. How many 
randomly-selected ballots would you need ...to confirm the election result 
with 99.99% confidence?”

Ken asked the question for two different cases: one where you not only 
inspect each sampled ballot but also ‘correlate’ it to a database, and the 
other where you don’t correlate (because presumably you can’t or needn’t).

I will treat both cases together.  As I understand the matter (or maybe 
don't), the required sample-size N should be the same in both cases;  the 
cases differ only in the kind of validation demanded, to ensure that an 
object in the sample may be accepted as an authentic individual of the 
population being sampled.  (Here in each case the population comprises 10 
million ballots.)

For a discussion and solution, we need a few definitions.  Let P be the 
proportion of ballots for A, and NTOT the total number of ballots cast: here 
NTOT = 10 million.  Let N be the size (to be chosen) of the random sample of 
ballots, and let p be the sample’s proportion of ballots for A.

The answer N to Ken’s question depends on what, precisely, one means by ‘the 
election result’ - and also (though I won't go far into that issue here) by 
‘confirm with 99.99% confidence’.  Either of two hypotheses (i.e. 
conditions) may be deemed ‘the result’ of interest here.  The hypothesis 
‘P=0.51’ expresses the computer’s finding, whereas the hypothesis ‘P>0.50’ 
expresses the operative result that A wins.

Remember that unless the sample is total and exhaustive, the sample data 
cannot unshakeably confirm a given value or restricted range of values for 
P.  Also - with very few exceptions - a reasonably small sample cannot 
unshakeably reject or invalidate a value or range either.  (One of the few 
exceptions is the value P=0, which is ruled out as soon as the sample 
contains one or more votes for A.)

What a sample CAN do is give data which may show that one given value of P - 
e.g. P=0.51 - is more likely (maybe much more likely) than another such 
value - e.g. P=0.49 or P=0.50.  In other words, sample data can help us 
decide among two hypotheses H0 and H1 (statisticians typically call these 
the ‘null hypothesis’ and the ‘alternative hypothesis’), each of form ‘P=X’.

THE QUESTION, INTERPRETED, IN  PRINCIPLE AND IN PRACTICE.  I therefore 
interpret the question in principle as asking each of a host of questions of 
the following form, one for each value X less than or equal to 0.50 .

Please bear briefly with this theoretical host.  In a short while, for the 
practical solution, the host of questions will reduce to a single question - 
for the case X=0.50.  Namely (and I will not show this, but it can readily 
be shown), a value N big enough for that case will work for any other case.

Consider the following two hypotheses:

	H0: P=0.51 and H1(X): P=X (where X is some given value at most 0.50).

H0 says that the computer is accurate.  H1 says that the computer is 
inaccurate and worse: it is  operationally wrong, in that A actually loses, 
and has a (given) minority (X) of the votes.

The question for given X then asks us to choose sample size N large enough 
so that:

	If H0 is in fact true, then with probability at least 99.99% the resulting 
sample data will be more likely under H0 than under H1(X); and
	If H1(X) is in fact true, then with probability at least 99.99% the 
resulting sample data will be more likely under H1(X) than under H0.

So, in effect, the sample size will allow us with high (99.99%) ‘confidence’ 
to ‘confirm’  whichever of H0, H1(X) is true, at least vs. the other 
hypothesis.
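
The phrase ‘more likely under H0 than under H1(X)’ can be made concrete. 
The following is a minimal sketch, not part of the original argument, and it 
assumes a simple binomial model for the sample (i.e. it ignores the 
finite-population effect).  The function names are hypothetical.  Under that 
assumption, the sample favors H0 exactly when the observed proportion p 
exceeds a threshold that depends only on X and not on N:

```python
import math

P0 = 0.51  # H0: the computer's reported proportion for A


def likelihood_threshold(x, p0=P0):
    """Sample proportion p above which the data are more likely under
    H0 (P = p0) than under H1(X) (P = x).

    Assumes a binomial likelihood and 0 < x < p0.  Derived by setting
    p*log(p0/x) + (1-p)*log((1-p0)/(1-x)) > 0 and solving for p.
    """
    num = math.log((1.0 - x) / (1.0 - p0))
    den = math.log(p0 * (1.0 - x) / (x * (1.0 - p0)))
    return num / den


def favors_h0(p, x, p0=P0):
    """True if a sample with proportion p is more likely under H0."""
    return p > likelihood_threshold(x, p0)


# Threshold for the hardest case, X = 0.50: very close to the midpoint 0.505.
p_star = likelihood_threshold(0.50)
print(p_star)
```

For X=0.50 the threshold comes out very close to 0.505, the midpoint between 
0.50 and 0.51.  For smaller X the threshold moves further below 0.51, which 
is consistent with the claim that X=0.50 is the hardest case.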

(Comment: if M is any desired multiplier >1, and if in the displayed 
requirement statements you want to replace the condition ‘will be more 
likely’ by the condition ‘will be at least M times more likely’, you can 
still satisfy the statements, by a suitable larger choice of N.  My solution 
below will not go further into this, but it could readily be modified to do 
so.)

In other words, on the question (defined for each nonnegative X up to 0.50) 
as to which of two possibilities is really more credible - the computer 
being right, or A losing with P=X - sample size N will be large enough to 
enable us almost surely to find out.

As stated above, we need only choose N large enough for the case X=0.50.

SOLUTION  CALCULATIONS.  Given that in fact P=X, and with N of sufficient 
size, the observed proportion p of pro-A ballots, considered over all 
possible size-N samples, will very nearly follow a bell-curve (i.e. Gaussian 
normal) distribution, with mean X and standard deviation S = 
SQRT(X(1-X)/N).  Here, we use the instances X=0.51 and X=0.50.  In both 
instances, for practical purposes, S = (1/2)*SQRT(1/N).  (This is exact for 
X=0.50; S is very slightly less when X=0.51.)
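
As a quick numerical check of how close the two standard deviations are, 
here is a small sketch.  The function name and the trial value of N are 
illustrative assumptions only; the formula is the one given above, without 
any finite-population correction.

```python
import math


def sample_proportion_sd(x, n):
    """Approximate standard deviation of the sample proportion p when the
    true proportion is x and the sample size is n."""
    return math.sqrt(x * (1.0 - x) / n)


n = 150_000  # illustrative sample size only

s_50 = sample_proportion_sd(0.50, n)   # exact case
s_51 = sample_proportion_sd(0.51, n)   # very slightly smaller
s_approx = 0.5 * math.sqrt(1.0 / n)    # the simplified formula

print(s_50, s_51, s_approx)
```

The relative difference between the X=0.51 value and the simplified formula 
is on the order of a few hundredths of a percent, so using 
(1/2)*SQRT(1/N) for both cases is slightly conservative.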

For X=0.51 the idea is to take N so large that 99.99% of the distribution of 
p will be closer to 0.51 than to 0.50.  The same N will work for X=0.50: 
then 99.99% of the distribution will be closer to 0.50 than to 0.51.  For 
both cases at once, it suffices to ensure that N is big enough to make a 
suitable multiple zS of standard deviation S be smaller than 
(1/2)(0.51-0.50), = 0.005.

Recall that S = (1/2)*SQRT(1/N).  The requirement zS < 0.005 then reads 
(z/2)*SQRT(1/N) < 0.005, so we need N big enough to make z*SQRT(1/N) < 0.01, 
or equivalently so that:

	 N>(10^4)*z^2.

Here, z^2 denotes z-squared, and z is defined as that number (or any 
convenient larger value) for which in a standard bell-curve distribution 
(which has mean 0 and standard deviation 1) 99.99% of the distribution mass 
(i.e. area under the bell curve) is less than z.

To find z, one may resort to any usual table of standard bell-curve tail 
areas, or use an equivalent computation routine.  (Temporarily lacking both, 
I first used an inequality readily derived from the analytic definition of 
the standard bell curve: namely that the tail area under the curve beyond z 
is less than - but nearly equal to - (1/z) times the height of the curve at 
z.)  The result is that it suffices to take z=3.9 or any larger value, and 
then, from the above displayed inequality, N may be taken to be any 
convenient value > 152,100.
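
For readers who do have a computation routine to hand, the following sketch 
checks the choice z=3.9 both ways: against the tail-area inequality 
described above and against the exact normal quantile.  It uses Python's 
standard-library statistics.NormalDist; the helper name tail_upper_bound is 
hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # standard bell curve: mean 0, standard deviation 1

TAIL = 1.0 - 0.9999  # allowed upper-tail area for 99.99% confidence


def tail_upper_bound(z):
    """Upper bound on the tail area beyond z: (1/z) * curve height at z."""
    return std.pdf(z) / z


z = 3.9
tail_39 = 1.0 - std.cdf(z)       # actual tail area beyond 3.9
bound_39 = tail_upper_bound(z)   # the inequality's bound at 3.9

# Exact one-sided 99.99% point, for comparison (somewhat below 3.9).
z_exact = std.inv_cdf(0.9999)

# Minimum N from the displayed inequality N > (10^4) * z^2.
n_min = 1e4 * z ** 2
n_min_exact = 1e4 * z_exact ** 2

print(tail_39, bound_39, z_exact, n_min, n_min_exact)
```

The bound at z=3.9 falls below the allowed tail area of 0.0001, which 
confirms that 3.9 suffices.  The exact quantile is a little smaller (about 
3.72), so a table lookup would permit a somewhat smaller N, roughly 138,000; 
the value 152,100 is simply a safe, convenient choice.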

Actually N may be taken slightly smaller, because in the above formula for S 
we omitted the factor SQRT(1-f), where f is the finite-population ‘sampling 
fraction’ N/NTOT.  Including that factor multiplies the required N by 
(1-f).  Here, to a first approximation, f is about 150,000 / 10 million, or 
0.015, so a refined calculation allows us to take N approximately 
(1-0.015)*152,100, i.e. just under 150,000.
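
The refinement can be sketched numerically as follows.  This is only an 
illustration under the assumptions already stated (z=3.9, the simplified S, 
and the correction factor SQRT(1-f)); the variable names are hypothetical. 
Since f itself depends on N, the condition N > (10^4)*z^2*(1 - N/NTOT) can 
also be solved for N directly rather than by a first-cut estimate of f.

```python
NTOT = 10_000_000          # total ballots cast
z = 3.9                    # multiplier chosen above
n_uncorrected = 1e4 * z ** 2   # about 152,100, ignoring the correction

# First-cut refinement, as in the text: plug in f of about 0.015.
f_first_cut = 0.015
n_first_cut = (1.0 - f_first_cut) * n_uncorrected

# Solving N = n_uncorrected * (1 - N / NTOT) for N gives a closed form.
n_fpc = n_uncorrected / (1.0 + n_uncorrected / NTOT)

print(n_first_cut, n_fpc)
```

Both routes give a value a little under 150,000, and they agree to within a 
few ballots, so the first-cut estimate of f is good enough here.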

PROVOCATIVE COMMENT.  This sample size N=150,000 will, in effect, confirm 
with high confidence any majority margin of 51-49 or better, regardless of 
whether that margin has been ‘counted’ by a computer or ‘predicted’ by a 
prior poll or simply happens quietly to be the voters' mass preference.

One may argue that if the true margin is closer than 51-49, there is no 
effective ‘mandate’ anyhow from the total electorate.  Hence, why not simply 
use this sample size N=150,000, randomly choose N electors, and go with 
their decision?  At least you then have only 150,000 ballots to authenticate 
and count, not millions.  Also, mass ‘campaigning’ could at one and the same 
time be both far cheaper and yet more directly and meaningfully involve the 
electors.

More in this vein another time.  I still plan in a later posting to show 
that in fact average voter power, counting both sampled and unsampled 
voters, is typically enhanced, not reduced, by replacing a mass electorate 
by a much smaller random sample.

Joe Weinstein
Long Beach CA USA
