# [EM] Consistency Criterion

Forest Simmons fsimmons at pcc.edu
Sun Aug 26 15:17:39 PDT 2001

```Here's another "paradox" of the same type as the candy jar example below:

Joe and Jill are members of the same basketball team.

In the first game of the season Joe sinks 30 out of 40 attempted shots.

In that same game Jill makes 9 out of 10.

So in the first game their respective averages are 75 and 90 percent.

In the second game Joe fouls out early, having made only one shot out of
ten tries.

In that game Jill makes ten out of forty shots.

Their respective averages are 10 and 25 percent.

So Jill's averages are significantly better than Joe's in both games (90
vs 75 and 25 vs 10).

Yet Jill's overall average is only 38 percent (19/50) compared to Joe's
62 percent (31/50).

Similarly, one baseball player can have a better batting average than
another every week of the season, yet have a worse overall average.

How could this apply to voting?

Well, the statement about batting averages is strangely similar to the
following true statement:

A candidate could be the IRV winner in every voting district without being
the overall IRV winner.

However, this could never happen in a two-way contest, since in that case
IRV and all other reasonable methods pick the majority winner, and
majorities in every district entail an overall majority.

Keeping in mind that the candy jar, basketball, and batting average
examples are all two-way comparisons (two jars, two basketball
players, two batters), we see that the resemblance with voting
inconsistency is mostly superficial.

However, to clarify the relationship between the two kinds of
inconsistency, here's an example that shows how Cardinal Ratings with
slightly relaxed rules is "candy jar inconsistent" even though CR is
consistent in the usual sense:

Joe and Jill are candidates in a two-way contest to be decided by CR
ballots in two precincts.

In the first precinct the forty voters all gave Joe a rating of 75
percent.  Thirty of these voters knew nothing about Jill, so they gave her
no rating at all. The other ten rated Jill at 90 percent.

By the relaxed rules of this CR method, Jill would have won if there were
no other precinct, because all of her ratings were 90 percent compared to
Joe's ratings of 75 percent.

This rule was adopted because it was recommended by statisticians who
liked the best estimates of the candidates' utilities to determine the
outcome of the election.

As you have guessed by now, Jill was better known than Joe in the other
precinct. She was rated 25 percent by all forty voters in that precinct,
while Joe was rated by only ten of the voters, each of whom gave him a
ten percent rating.

So Jill won in the second precinct also, 25% versus 10%.

Yet when the results of the two precincts are combined we see that Joe is
the overall winner with an average rating of 62 percent versus 38 percent
for Jill.

Now we see clearly why this kind of paradox doesn't apply to the usual CR:
Usually an unrated candidate will get a default rating. Otherwise the
ballot would be considered spoiled.

Similar considerations apply to other methods. Truncated preference
ballots either receive default (perhaps tied) ranks or (when truncation
is not allowed) are considered spoiled.

Nevertheless, Blake's main point is well taken. Just because a criterion
seems reasonable on the surface doesn't mean that it really is.

In the case of ordinary voting consistency, I consider it to be a
reasonable criterion. If it is violated flagrantly (as in IRV) that's a
demerit.  If it is violated mildly (as in Ranked Pairs), then that's
excusable.

The IIAC is an example of a criterion that seems reasonable on the
surface, yet turns out to be impossible for any reasonable method based on
preference ballots, because all of the ballot information about one
candidate is relative to the other candidates. Remove candidates and you
remove reference points. That's also why clones are so insidious; add
clones and you add deceptive reference points.

I hope this helps to clarify these seeming paradoxes.

Forest

On Sun, 19 Aug 2001, Blake Cretney wrote:

> On Fri, 17 Aug 2001 12:55:04 -0400
> Douglas Greene <douggreene at earthlink.net> wrote:
>
> > ---------------------------
> > also in the literature called the "consistency" problem:
> > if 2 subsets of the voters agree on a winner, to be consistent, the
> > combined election should produce that same winner - but
> > with the IRV, that is not necessarily so.
>
> Here's an interesting paradox along the same lines.  I'm getting this
> from a book called "The Paradoxicon" by Nicholas Falletta.
>
> Let's say you have two jars, a tall jar and a fat jar.  You also have
> two kinds of candies, orange and mint.  The jars have the following
> contents.
>
> Tall:  50 orange 60 mint
> Fat:   30 orange 40 mint
>
> If you want an orange flavoured candy, and the candies all look the
> same, then you'd want to pick from the tall jar, since it has a higher
> proportion of orange:mint candies.
>
> Now, imagine a different pair of jars with these proportions.
>
> Tall:  60 orange 30 mint
> Fat:   90 orange 50 mint
>
> Once again, you would pick the tall jar for an orange candy.
>
> OK, now imagine that you dumped the contents of both tall jars into
> one giant tall jar, and both the fat jars into one giant fat jar.
> Now, the question is, which giant jar do you pick for the best chance
> of an orange candy.
>
> It would seem sensible that we could rely on a statistical consistency
> criterion to give us an answer, similar to the criterion suggested for
> elections.  That is, if we picked the tall jar both times, it is only
> common sense that we should pick the giant tall jar with their
> combined contents.  Unfortunately, this is wrong.  The giant jars
> contain.
>
> Tall:  110 orange 90 mint
> Fat:   120 orange 90 mint
>
> It turns out, the fat jar is the correct choice, in violation of my
> proposed statistical consistency criterion.
>
> So, I guess it makes sense to be cautious about what criteria one