Ratings as a Standard

Mon Jun 7 01:25:55 PDT 1999

Blake Cretney wrote:
> 
> Bart Ingles wrote:
> 
> > Blake Cretney wrote:
> > >
> > > The purpose of a standard method, as we were using the term, was to
> > > be able to judge when practical methods were arriving at the proper
> > > conclusion, based on the sincere preferences of voters.  So, it made
> > > no sense to criticize a method as a potential standard because of
> > > strategy problems.
> > >
> > > The plan to hypothesize away extremist voters is a different matter,
> > > but I can still see some merit in it.  It may, however, be difficult
> > > to define exactly what constitutes an extremist voter.
> > >
> > > For example, imagine that one issue confronting voters is Proposal X.
> > > Most voters dislike Proposal X, but only as one of many issues.  A
> > > smaller percentage of voters love Proposal X with a zeal that causes
> > > them to rate exclusively on this basis.  Under these circumstances,
> > > support for Proposal X would tend to help a candidate (in any method).
> > > However, this is particularly the case in methods based on ratings,
> > > where the supporters will have an effect far above what their numbers
> > > might suggest.  So, my question is, [1] are the Proposal X supporters the
> > > kind of voters that you will hypothesize do not exist?  If not, [2] is the
> > > likely result using ratings a problem?  [3] Ratings would seem to suggest
> > > that Proposal X is overwhelmingly likely to be right. Do you agree
> > > with this?
> >
> > [1] I was thinking more in terms of assuming all examples have
> > candidates rated between 0 and 100, where 100 is fully qualified for the
> > position, and 0 is lacking any particular qualifications.  I wouldn't
> > necessarily exclude the X supporters in your example, but would make
> > sure that their ratings were bounded reasonably.
> >
> > [2] I don't think the problem is as bad here as with ranking in
> > unfavorable ranked examples.  In the above case, the X advocates will
> 
> The ranked examples that precipitated this discussion occurred when a
> majority of voters voted strategically, and the strategy back-fired.
> Since they involved strategy, they should not be considered for
> determining which method makes the best standard as we are defining
> it.  Do you have bad Condorcet or Path Voting examples where everyone
> votes sincerely?

The first of the two "bad Borda" examples assumed sincere voting:

      100    80    60    40    20    0
      ----------------------------------
45     A                          B  C
15     B  C                          A
40     C                          B  A
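Since the question was whether a Condorcet method goes wrong with sincere votes, here is a minimal Python sketch that checks this example. It assumes numeric ratings read off the chart above (B about 10 for the 45 and 40 groups, C about 90 for the 15 group); those numbers, and the helper names, are illustrative assumptions, not part of the original example. It computes the head-to-head contests, a plain Borda count, and vote-weighted rating totals (Average Ratings without the division).

```python
from itertools import combinations

# ASSUMED ratings, read approximately off the text chart:
# B sits about 10 points up for the 45 and 40 groups, C about 90 for the 15 group.
GROUPS = [
    (45, {"A": 100, "B": 10, "C": 0}),
    (15, {"A": 0, "B": 100, "C": 90}),
    (40, {"A": 0, "B": 10, "C": 100}),
]
CANDIDATES = ["A", "B", "C"]


def pairwise_wins(groups, x, y):
    """Return (voters rating x above y, voters rating y above x)."""
    x_over_y = sum(n for n, r in groups if r[x] > r[y])
    y_over_x = sum(n for n, r in groups if r[y] > r[x])
    return x_over_y, y_over_x


def beats(groups, x, y):
    """True if a majority of voters rate x above y."""
    x_over_y, y_over_x = pairwise_wins(groups, x, y)
    return x_over_y > y_over_x


def condorcet_winner(groups, candidates):
    """Candidate who beats every other candidate head-to-head, or None."""
    for c in candidates:
        if all(beats(groups, c, o) for o in candidates if o != c):
            return c
    return None


def borda_scores(groups, candidates):
    """Borda count: k-1 points for first place down to 0 for last (no ties here)."""
    k = len(candidates)
    scores = {c: 0 for c in candidates}
    for n, r in groups:
        ranked = sorted(candidates, key=lambda c: r[c], reverse=True)
        for place, c in enumerate(ranked):
            scores[c] += n * (k - 1 - place)
    return scores


def rating_totals(groups, candidates):
    """Vote-weighted rating totals (Average Ratings times the voter count)."""
    return {c: sum(n * r[c] for n, r in groups) for c in candidates}


if __name__ == "__main__":
    for x, y in combinations(CANDIDATES, 2):
        print(x, "vs", y, pairwise_wins(GROUPS, x, y))
    print("Condorcet winner:", condorcet_winner(GROUPS, CANDIDATES))
    print("Borda:", borda_scores(GROUPS, CANDIDATES))
    print("Rating totals:", rating_totals(GROUPS, CANDIDATES))
```

With these assumed numbers, B beats A 55 to 45 and C 60 to 40, and also tops the Borda count, even though 85 of the 100 voters rate B near the bottom; the rating totals put C first and B last.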

> > win only if the multi-issue voters can't find a candidate with whom they
> > agree on most issues, or if they are split so that the X voters can act
> > as the tiebreaker.
> 
>              A anti   B  pro    C  pro
> 90  I anti   100      90        0
> 10  II pro   0        100       100
> Total        9000     9100      1000
> 
> Group I may agree totally with A (after all they rate A at 100), but
> B still wins.  Is this what you want?  Is group II reasonably bounded
> when their votes of B over A carry ten times the weight of group I's
> A over B votes?

I don't have too much of a problem with it, since it was group I's
90-point rating of B that made this possible.  If you are concerned that
group II's 100-point spread between A and B could mean that they are
extremists, how about the following example? 

             A anti   B  pro    C  pro
55  I anti   100      98        0
45  II pro   80       100       100
Total        9100     9890      4500

Since group II now only has a 20-point spread between A and B, I don't
think you could call their position extreme.  On the other hand, group I
is now showing extreme indifference between A and B.  I would view group
I's vote as a kind of near-abstention from the A/B race.
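Both tables can be read as a margin decomposition. Under Average Ratings, B's lead over A is the sum, over groups, of (number of voters) times (that group's B rating minus its A rating). A minimal Python sketch of that arithmetic, using only the numbers from the two tables above; the helper names are made up for illustration:

```python
# Per-group contributions to B's lead over A under Average Ratings.
# Data are taken directly from the 90/10 example and the 55/45 counter-example.
EXAMPLES = {
    "90/10 example": [(90, {"A": 100, "B": 90, "C": 0}),
                      (10, {"A": 0, "B": 100, "C": 100})],
    "55/45 counter-example": [(55, {"A": 100, "B": 98, "C": 0}),
                              (45, {"A": 80, "B": 100, "C": 100})],
}


def margin_contributions(groups, x="B", y="A"):
    """Each group's share of total(x) - total(y): n * (r[x] - r[y])."""
    return [n * (r[x] - r[y]) for n, r in groups]


def totals(groups):
    """Vote-weighted rating total for every candidate."""
    return {c: sum(n * r[c] for n, r in groups) for c in groups[0][1]}


if __name__ == "__main__":
    for name, groups in EXAMPLES.items():
        parts = margin_contributions(groups)
        print(name, totals(groups), "B-A parts:", parts, "net:", sum(parts))
```

In the 90/10 example, group II's 1000 points (ten voters times a 100-point spread) outweigh group I's 900 (ninety voters times 10 points), a net of 100. In the counter-example, group I's 2-point preference for A contributes only 110 points against group II's 900, which is the near-abstention described above.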

You could address the possibility of extreme voting without making
unwarranted use of near-indifferent votes by limiting the maximum
spread between ratings.  For example, by limiting the spread between
adjacently ranked choices to 50 points, the results in your example
would change from 9000/9100/1000 to:

             A anti   B  pro    C  pro
90  I anti   100      90        40
10  II pro   50       100       100
Total        9500     9100      4600

-- presumably more to your liking.  My counter-example, with no
extremist voters deciding between the front-runners, would be unchanged
(at least between the front-runners):

             A anti   B  pro    C  pro
55  I anti   100      98        48
45  II pro   80       100       100
Total        9100     9890      7140
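The corrected tables can be reproduced mechanically. A minimal Python sketch, assuming the 50-point limit applies to gaps between adjacently ranked candidates and that each ballot keeps its top rating. That reading is an assumption, chosen because it matches both tables: a cap on every pair would not, since group I's corrected C rating of 40 sits 60 points below A. `cap_gaps` and `capped_totals` are hypothetical names.

```python
# ASSUMED rule: sort each ballot from highest to lowest rating, keep the top
# rating, and shrink any gap between adjacently ranked candidates to MAX_GAP.
MAX_GAP = 50

EXAMPLE_90_10 = [(90, {"A": 100, "B": 90, "C": 0}),
                 (10, {"A": 0, "B": 100, "C": 100})]
COUNTER_55_45 = [(55, {"A": 100, "B": 98, "C": 0}),
                 (45, {"A": 80, "B": 100, "C": 100})]


def cap_gaps(ratings, max_gap=MAX_GAP):
    """Return a copy of one ballot's ratings with adjacent gaps capped."""
    # sorted() is stable even with reverse=True, so tied candidates keep order.
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    capped = {ranked[0]: ratings[ranked[0]]}
    for higher, lower in zip(ranked, ranked[1:]):
        gap = min(ratings[higher] - ratings[lower], max_gap)
        capped[lower] = capped[higher] - gap
    return capped


def capped_totals(groups, max_gap=MAX_GAP):
    """Vote-weighted rating totals after capping every ballot's gaps."""
    result = {}
    for n, ratings in groups:
        for c, value in cap_gaps(ratings, max_gap).items():
            result[c] = result.get(c, 0) + n * value
    return result


if __name__ == "__main__":
    print(capped_totals(EXAMPLE_90_10))
    print(capped_totals(COUNTER_55_45))
```

Gaps of 50 points or less pass through unchanged, so the near-indifferent 100/98 vote in the counter-example keeps its small weight, while group II's 100-point spread in the 90/10 example is halved.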

> > At the very least, you can still say that the larger
> > the majority, the more ambivalent they can be and still cancel out a
> > strident minority.
> 
> That's true, although not very reassuring.
> 
> > At worst, this kind of problem is merely the opposite of the problems
> > you have when ignoring ratings completely.  You could avoid either
> > extreme by using ratings, but by giving less weight to the ratings than
> > Average Ratings would, or by limiting the rating value that can come
> > from a single issue (say to 50 points or so).
> >
> > [3] I don't believe that ratings (or rankings) necessarily mean that the
> > voters are correct.  This question comes up later in this message.
> 
> I didn't say, "which side is correct?"  I said:
> 
> > > [3] Ratings would seem to suggest
> > > that Proposal X is overwhelmingly likely to be right. Do you agree
> > > with this?
> 
> "LIKELY to be right," not, "right."  I was talking about probability,
> not certainty.

I took "overwhelmingly likely" to mean "near certainty".  But the short
answer to your question is, no, I don't agree with the statement.

> --snip--
> >
> > But if ratings above 100 and below 0 are the product of extremist
> > thinking or emotionalism, then we have good reason for excluding this
> > information.  We could even go farther and say that no two candidates
> > can be more than 50 points apart (or some other number), without having
> > to discard ratings information entirely.
> 
> How are the 100 and 0 ratings defined?  This seems to imply that
> absolute levels of support are defined.  How do you decide how much
> someone has to like a candidate for it to count as 100, or how much
> dislike results in 0?  This decision could alter the outcome
> prescribed by the standard.

I agree that definitions for the 0 and 100 levels are needed.  Since the
benchmark is only intended for hypothetical examples, this should not be an
insurmountable problem.  We have been using them in this thread without
a formal definition, after all.

I would define 100 as being fully qualified for the position, using
whatever attributes are being discussed at the time, and 0 as having no
particular qualifications.  The main thing is that the parties to any
debate would have to agree on the definitions.

> > Just to be clear, I'm not that hung up on Average Ratings per se.  I
> > have used it because it's easy to understand and it takes both number of
> > voters and rating values into account.  It gives as much weight to
> > ratings as to number of voters, which may be viewed as an extreme (I
> > can't imagine why I would want to give more weight to ratings).
> > Condorcet would be at the opposite extreme, not using ratings at all.
> 
> It seems that the reason you are not "hung up" on any particular
> Average Ratings method as a standard is that they all have
> demonstrable flaws.  And yet Condorcet is criticized for not being
> enough like these methods.
>
> If you would give a specific, defined standard then I could argue
> whether it is better or worse than Condorcet.  Right now, you seem to
> have a huge group of widely different possible standards.  It seems to
> me that the purpose of this discussion is to come to some agreement on
> (or at least debate) a standard for judging the proper winner based on
> sincere preferences.  Then we could decide which method will come the
> closest to this standard when votes are not necessarily sincere.  I
> don't see how such a large and vaguely defined mass of potential
> methods as you are proposing can be said to be more or less like any
> other method.

True, there are a number of ways that Average Ratings could be modified
to answer your extremist voter objection, while leaving the method
capable of ignoring extremely weak ratings differences.  I don't see
this as a weakness, but I can understand your objection to the lack of
something concrete to debate.  I gave one example of a "quick and dirty"
method near the top of this post, by simply limiting the maximum rating
difference.  I used 50 as a maximum, but I don't know where you draw the
line between reasonable and extreme.  Maybe we could agree on a cutoff
level, or possibly a curve where x is the sincere rating and f(x) is the
"corrected" rating, stripped of the likely extremist component.

In the examples that started this thread, I used average ratings merely
to give numbers to what appeared to me to be visually obvious.  I hadn't
considered using the method as a standard that would be valid in any
situation.  Even if the method is vulnerable to extremist voting in
other situations, that doesn't make Average Ratings invalid in the
original examples, nor does it defend the ranked methods in those
examples.  It might if those examples used extremist voting, but I
don't see anything that would give that impression (unless all of the
voters were extremist).

> 
> --snip--
> >
> > Interesting.  If you break the votes into separate decisions on each
> > issue, then one wrong decision on the part of each of the A>B voters
> > should have the same effect as an equal number of wrong decisions
> > allocated among the B>A voters.  It might be easier for a small group to
> > make a wrong decision than for the large, but they would have to make
> > more wrong decisions to have the same impact, so it sort of balances
> > out.
> 
> Except that
> 1) The small group may not be making more wrong decisions, they may
> just be making one that they feel particularly strongly about.
> 2) The decisions an individual makes are not made independently from
> each other.  A wrong decision is likely based on incorrect principles
> that will lead to other wrong decisions.
> 
> >
> > This assumes that the likelihood of a wrong decision is inversely
> > proportional to the size of the group.
> 
> Right.
> 
> > Offhand I would guess that this
> > would be the case if the errors were "random".  Errors caused by
> > extremism on the part of the voters might be more likely than average in
> > a small group.  On the other hand, if the errors were not the fault of
> > the voters, but rather the result of a political operative supplying bad
> > information on one issue, then the large group could more easily be the
> > cause of a wrong election outcome.
> 
> I still don't see why a large group would be easier to fool than a
> small group.

Not easier to fool, but perhaps just as easy to fool, and with greater
consequences if they are fooled.  If the "error" was due to
manipulation, then group size may not matter.  A large group might be
just as easy to manipulate as a small one.  The payoff would be larger,
though.  Multiplying that effect, if the large group's rating difference
between two candidates was close (which it would have to be for a
smaller group to be a threat), then the manipulation might more easily
result in a reversal in rankings.