Ratings as a Standard

Bart Ingles bartman at netgate.net
Thu May 27 03:31:09 PDT 1999


Blake Cretney wrote:
> 
> Bart Ingles wrote:
> 
> > In an earlier post, Blake Cretney wrote:
> >
> > It occurs to me that extremist voting problems should be excluded from
> > the question of rating-based standards, just as strategy considerations
> > are.  An actual election method would need to deal with both, but the
> > standards are only dealing with hypothetical situations.  We can then
> > relate these standards to actual methods in two steps:
> 
> The purpose of a standard method, as we were using the term, was to
> be able to judge when practical methods were arriving at the proper
> conclusion, based on the sincere preferences of voters.  So, it made
> no sense to criticize a method as a potential standard because of
> strategy problems.
> 
> The plan to hypothesize away extremist voters is a different matter,
> but I can still see some merit in it.  It may, however, be difficult
> to define exactly what constitutes an extremist voter.
> 
> For example, imagine that one issue confronting voters is Proposal X.
>  Most voters dislike Proposal X, but only as one of many issues.  A
> smaller percentage of voters love proposal X with a zeal that causes
> them to rate exclusively on this basis.  Under these circumstances,
> support for proposal X would tend to help a candidate (in any method).
>  However, this is particularly the case in methods based on ratings,
> where the supporters will have an effect far above what their numbers
> might suggest.  So, my question is, [1] are the Proposal X supporters the
> kind of voters that you will hypothesize do not exist?  If not, [2] is the
> likely result using ratings a problem?  [3] Ratings would seem to suggest
> that Proposal X is overwhelmingly likely to be right. Do you agree
> with this?

[1] I was thinking more in terms of assuming all examples have
candidates rated between 0 and 100, where 100 is fully qualified for the
position, and 0 is lacking any particular qualifications.  I wouldn't
necessarily exclude the X supporters in your example, but would make
sure that their ratings were bounded reasonably.

[2] I don't think the problem is as bad here as with ranking in
unfavorable ranked examples.  In the above case, the X advocates will
win only if the multi-issue voters can't find a candidate with whom they
agree on most issues, or if they are split so that the X voters can act
as the tiebreaker.  At the very least, you can still say that the larger
the majority, the more ambivalent they can be and still cancel out a
strident minority.

At worst, this kind of problem is merely the opposite of the problems
you have when ignoring ratings completely.  You could avoid either
extreme by using ratings, but by giving less weight to the ratings than
Average Ratings would, or by limiting the rating value that can come
from a single issue (say to 50 points or so).

[3] I don't believe that ratings (or rankings) necessarily mean that the
voters are correct.  This question comes up later in this message.


> Also, at what point does the tendency for people to inaccurately
> describe the degree of difference between alternatives become the kind
> of voting that you would ignore for stage one?

Since these are hypothetical examples, I don't think this applies here.


[...]
> Bart Ingles wrote:
> 
> > Blake Cretney wrote:
> > >
> > > Bart Ingles wrote:
> > > > In the following ratings example:
> > > >
> > > >                   Candidates
> > > > group votes   A     B     C
> > > >   I    50      100    95    0
> > > >  II    50        5   100    0
> > > >
> > > > By arguing that ratings are irrelevant, you are saying that the above
> > > > example is an exact tie between A and B.  To me this is not even a
> > > > close contest.
> > >
> > > That is our point of disagreement.  More on this below, but let me
> > > point out that if not for candidate C, this would have been an exact
> > > tie in ratings as well.
> >
> > Only if you normalize the ratings.  Since the ratings are intended to
> > describe a scenario that could be handled differently by different
> > election methods, it is probably better to leave the numbers alone as
> > much as is practical, and let the actual methods decide how to handle
> > the situation.  For example, the group I voters might be more likely to
> > abstain under some methods.
> 
> Because in average ratings it is generally assumed that people will
> vote from 0 to 100, and because this was the method you were first
> proposing as a standard, I was assuming normalized ratings.

Actually I was simply choosing examples that happened to have high and
low votes of 100 and zero.
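
For reference, the quoted two-group example can be tallied both ways.
The following is a minimal Python sketch, not part of the original
exchange; the data layout and the function names are illustrative
assumptions.  It computes the Average Ratings score of each candidate
and the pairwise A-versus-B count that a pure ranking would see.

```python
# Hypothetical encoding of the example: (number of voters, rating per candidate).
ballots = [
    (50, {"A": 100, "B": 95, "C": 0}),   # group I
    (50, {"A": 5, "B": 100, "C": 0}),    # group II
]

def average_ratings(ballots):
    """Voter-weighted mean rating for each candidate."""
    total = sum(n for n, _ in ballots)
    candidates = ballots[0][1].keys()
    return {c: sum(n * r[c] for n, r in ballots) / total for c in candidates}

def pairwise(ballots, x, y):
    """Return (voters rating x above y, voters rating y above x)."""
    x_over_y = sum(n for n, r in ballots if r[x] > r[y])
    y_over_x = sum(n for n, r in ballots if r[y] > r[x])
    return x_over_y, y_over_x

averages = average_ratings(ballots)
a_vs_b = pairwise(ballots, "A", "B")
print(averages)  # {'A': 52.5, 'B': 97.5, 'C': 0.0}
print(a_vs_b)    # (50, 50)
```

Under these assumptions the rankings alone give a 50-50 tie between A
and B, while the averages put B well ahead.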

> It is arguable, however, that non-normalized ratings makes a better
> standard, since it corresponds to the principle of maximizing
> perceived utility.  Of course, this method cannot be implemented, but
> that doesn't matter to whether it can be used as a standard.
> 
> > If you want to start with unlimited absolute ratings, then any such
> > "Raw" ratings above 100 or below 0 should be handled by compressing or
> > clipping the values, so that all such values are at or near 100 (or 0),
> > without disturbing middle ratings.  This should allow the rated
> > scenarios to predict outcomes under various actual methods, since the
> > degree to which a candidate is "overqualified" could be considered
> > irrelevant.
> 
> That doesn't make sense to me.  Clipped ratings don't have the
> intuitive use as a standard that raw utility ratings do, and they
> can't be used in a method like normalized ratings can.  I could
> provide a more detailed criticism of this method, but I'm taking this
> suggestion as a momentary musing on your part.

I would use raw ratings for comparing various real methods, but use
whatever form of normalization is appropriate (if any) when applying a
particular benchmark method.

If using Averages as a benchmark, raw ratings would not work, since
extreme positive or negative ratings could skew the result, but neither
would linear normalization.  If you set the 0-100
scale to represent the boundaries of what could be considered relevant,
then clipping to these limits would be a simple way to preserve absolute
differences where they are most meaningful, while limiting extreme
influences.
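
A rough Python sketch of the difference between clipping and linear
normalization on a single ballot follows.  The raw ballot values, the
function names, and the handling of a flat ballot are assumptions made
for illustration only.

```python
def clip(rating, low=0, high=100):
    """Pull out-of-range ratings back to the nearest limit."""
    return max(low, min(high, rating))

def normalize_linear(ballot, low=0, high=100):
    """Stretch the ballot so its lowest rating is `low` and its highest is `high`."""
    lo, hi = min(ballot.values()), max(ballot.values())
    if hi == lo:
        # A flat ballot has no spread to stretch; the midpoint is an arbitrary choice.
        return {c: (low + high) / 2 for c in ballot}
    return {c: low + (r - lo) * (high - low) / (hi - lo) for c, r in ballot.items()}

# Hypothetical raw ballot with one rating above 100 and one below 0.
raw = {"A": 140, "B": 60, "C": -30}

clipped = {c: clip(r) for c, r in raw.items()}
linear = normalize_linear(raw)

print(clipped)                 # {'A': 100, 'B': 60, 'C': 0}
print(round(linear["B"], 1))   # 52.9
```

Clipping leaves the middle rating of 60 where it was, whereas linear
normalization moves it because the whole scale is rescaled to fit the
extremes.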

You could also use a more complicated non-linear function for
normalizing, but I think the further improvement would be slight.  I
don't think minor differences in benchmark results are that significant
anyway; I'm more interested in situations where the benchmark and the
actual election method are at opposite extremes.


> > As for the absence of candidate C in the above example, I don't believe
> > that comparing isolated pairs of candidates is necessarily meaningful.
> > For example, if C had stayed out of the race, how do you know that C'
> > wouldn't have run in his place?  The A vs. B race, with B "normalized"
> > to zero, would just be a hypothetical construct that presents a
> > (potentially) distorted view of the voters' preferences.
> 
> That's my point.  Average Ratings is so affected by the presence or
> absence of candidates like C, candidates with no support themselves,
> but which serve to alter the ratings of other candidates, that results
> can seem pretty arbitrary.  Of course, all realistic methods can be
> affected by the introduction or removal of non-winning candidates, but
> not to so profound a degree.
> 
> This is not the case if we use total utility (raw ratings), for
> example, if candidates A and B are running
> group votes  A    B
>  I    51     100  0
>  II   49     0    100
> 
> And we add candidate C
> 
>     C
> I   -100
> II  -10
> 
> Now, if ratings are normalized, the effect is similar to the example
> above, and B wins.  However, A still has the greatest total utility,
> unaffected by the presence or absence of alternative C.

Which is why I wouldn't use linear normalization.  And of course raw
ratings would allow Average results to be skewed by extreme positive or
negative ratings.  Hence clipping, or something similar.
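
The quoted A/B/C example can be run through all three treatments.  This
is a sketch under stated assumptions: the helper names are invented
here, "linear" means rescaling each ballot so its lowest rating is 0
and its highest is 100, and "clipped" means limiting each rating to the
0-100 range.

```python
# Quoted example: 51 voters in group I, 49 in group II, with C added.
ballots = [
    (51, {"A": 100, "B": 0, "C": -100}),  # group I
    (49, {"A": 0, "B": 100, "C": -10}),   # group II
]

def raw(ballot):
    return dict(ballot)

def linear(ballot):
    # Assumes the ballot is not flat (highest rating differs from lowest).
    lo, hi = min(ballot.values()), max(ballot.values())
    return {c: 100 * (r - lo) / (hi - lo) for c, r in ballot.items()}

def clipped(ballot):
    return {c: max(0, min(100, r)) for c, r in ballot.items()}

def averages(ballots, transform):
    """Voter-weighted mean of the transformed ratings."""
    total = sum(n for n, _ in ballots)
    sums = {}
    for n, ballot in ballots:
        for c, r in transform(ballot).items():
            sums[c] = sums.get(c, 0) + n * r
    return {c: s / total for c, s in sums.items()}

def winner(avgs):
    return max(avgs, key=avgs.get)

results = {f.__name__: averages(ballots, f) for f in (raw, linear, clipped)}
for name, avgs in results.items():
    print(name, winner(avgs))
# raw A
# linear B
# clipped A
```

In this particular example clipping removes the influence of C on the
A-versus-B result, so the clipped winner matches the raw-utility winner.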

> 
> --snip--
> > > That is, I am defining a correct choice as the true best choice, not
> > > merely an accurate representation of the individual voter's thinking.
> > > Since what we want to find is the best candidate, this makes sense.
> >
> > That's fine, if you have an oracle to tell you who the best candidate
> > is.  The point of democracy is to let the voters decide who is the
> > best.
> 
> My method is not at all "Oracle" based.  The result depends on the
> voters just like in any other method.  However, I look at the voters
> as an admittedly fallible means of determining the objectively best
> candidate.  This is in contrast to those who decide that a particular
> method corresponds to the voter's will, and then answer any criticism
> by saying that as their method represents the will of the voters, it
> must be perfect.  Or those who claim that there is no best candidate,
> as all opinions are equally right.
> 
> I don't actually put you in either of these categories, but I point
> these positions out to explain that I use the goal of "best guess at
> best candidate" in the hope of not falling into one of these traps.
> 
> > Why wouldn't you want an accurate representation of the voters'
> > thinking?
> 
> I'm not sure what you mean by this.  If I understand you correctly,
> you are questioning me for advocating deliberately throwing away
> information in choosing rankings over ratings.  However, to get the
> most information possible about voters' thinking, we would use
> unbounded utility ratings, with each candidate rated from negative to
> positive infinity.  If we could ensure sincere votes, this would
> result in government by the most passionate, which is clearly not a
> good idea.  Therefore, I think we can conclude that at least in some
> cases, throwing away information can be a good thing.

And of course I am advocating clipped ratings :-)

But if ratings above 100 and below 0 are the product of extremist
thinking or emotionalism, then we have good reason for excluding this
information.  We could even go farther and say that no two candidates
can be more than 50 points apart (or some other number), without having
to discard ratings information entirely.

Just to be clear, I'm not that hung up on Average Ratings per se.  I
have used it because it's easy to understand and it takes both number of
voters and rating values into account.  It gives as much weight to
ratings as to number of voters, which may be viewed as an extreme (I
can't imagine why I would want to give more weight to ratings). 
Condorcet would be at the opposite extreme, not using ratings at all.


> > > But if we look at the more narrowly defined kind of wrong vote that
> > > your example is interested in, I still think it is handled well by
> > > ranking.  I do not agree with the contention that "50% of the votes
> > > would be absolutely wrong".  Unless error is playing favourites, it
> > > will serve to elevate A further over B at least as often as it
> > > elevates B over A.  Since more of a change is necessary to make B over
> > > A than to keep A over B, the result will be simply to reduce the
> > > margin of A's victory over B.  This is desirable if the difference
> > > between the two is so shaky, and tends to replicate some of the effect
> > > you hope to get from Ratings.
> >
> > Not all errors are random, especially within a single election.  The
> > voters who rate A over B could all be using the same bad information,
> > for example.  You could say that the errors are random across many
> > elections, of course, so that the likelihood that such an error would
> > have caused a reversal of position would always be less than 50%, but
> > would approach 50% when A and B approach equality in the ratings.  Only
> > errors that caused a reversal of position would change the outcome of
> > the election (at least in the example).
> 
> It's an interesting point.  I summarize it as follows:
> 
> People who rate two candidates similarly may be deciding on the basis
> of very little information (how the candidates feel about some minor
> issue for example).

Along with a larger amount of information showing the candidates to be
similar, of course.

> Since it is easy for the voter to be wrong on a single minor issue,
> we should avoid giving these votes the same weight as a vote that
> rates the candidates further apart (and is therefore likely based on
> more issues).
> 
> This isn't a real problem if the "wrong" decision is held only by a
> few, or even if voters are deciding randomly on this decision, because
> this would result only in a decreased margin for the better over the
> worse candidate.  Where it is a problem is where there is a widespread
> misconception about a single issue that causes people to tend towards
> deciding wrongly about it.
> 
> As well, for ratings to give a better result, there have to be a
> smaller number of more strongly felt B over A opinions to counter.
> Otherwise, ratings would come to the same conclusion as rankings.
> 
> But if we remove the assumption that B is the right choice, and
> instead try to determine who is the most likely best candidate based
> on the ratings, I suggest the answer is A.  After all, consider the B
> over A voters.  It is true that their decision to rank B over A is not
> that contrary to the A over B voters, since their actual difference in
> ratings is small.  But, the difference in opinion between an A over B
> and a B over A voter is great.
> 
> For example        A           B          C     D ...
> A over B           50          49         ?     ?
> B over A           0           100        ?     ?
> 
> The A over B voters see the candidates as very close, the B over A
> voters see them as radically different.  This difference must have a
> cause, and under your assumptions, this cause would be that the two
> groups actually differ on other issues.  So, who is right about these
> other issues?  I suggest that the A over B voters are more likely to
> be right simply because there are more of them.  And if the wider
> difference expressed by the B over A voters is caused simply by them
> being wrong on a number of issues, it doesn't make sense to use this
> greater difference against A.

Interesting.  If you break the votes into separate decisions on each
issue, then one wrong decision on the part of each of the A>B voters
should have the same effect as an equal number of wrong decisions
allocated among the B>A voters.  It might be easier for a small group to
make a wrong decision than for a large one, but they would have to make
more wrong decisions to have the same impact, so it sort of balances
out.

This assumes that the likelihood of a wrong decision is inversely
proportional to the size of the group.  Offhand I would guess that this
would be the case if the errors were "random".  Errors caused by
extremism on the part of the voters might be more likely than average in
a small group.  On the other hand, if the errors were not the fault of
the voters, but rather the result of a political operative supplying bad
information on one issue, then the large group could more easily be the
cause of a wrong election outcome.

[snip]
[Bart:]
> > But when a scenario with nearly equally-rated candidates A and B is
> > transposed onto an election using rankings, the question of whether or
> > not A is ranked over, under, or equal to B will depend entirely on
> > strategy incentives and on how closely rated the two are.  If the two
> > are close enough, strategy will be more important to the voter than true
> > ranking.  Better to provide some incentive for equal ranking in such
> > cases, than to encourage coin-flipping or to allow strategy to take
> > priority.
> 
> I think that's a debatable point, but aren't we trying to avoid
> strategy discussions here?  Whether ranked methods or rating methods
> (including Approval) are best at avoiding strategy is its own huge
> issue.  Whether a ranked method is prone to strategy has no bearing on
> whether it provides the best guess at best candidate based on sincere
> votes.

Good point about strategy, but "coin-flipping" might still apply.

> --snip--
> > > I think that I am on pretty solid ground with my averaging process,
> > > under two assumptions:
> > > 1.  That group II's perception of their selfish interest is the same
> > > as group I's perception of group II's selfish interest.
> > > 2.  That utility and normalized (rated from 0 to 100) utility are the
> > > same in the example.  Of course, I can just assume they are for this
> > > example, but an argument could be raised if this is rarely the case,
> > > and that therefore my example is atypical.
> > >
> > > If this is the case then group II's "altruistic" vote is designed to
> > > maximize utility.
> > >
> > > In reality, however, you don't have to agree with my precise method
> > > of averaging.  It is obvious that if group I wants to think of the
> > > community interest, this has to be some combination of group I and
> > > group II's selfish interests, because together they are the community.
> >
> > But if group II's selfish interests are harmful to the community, while
> > group I's are not, then group I would benefit the community more by
> > voting "selfishly".
> 
> Precisely.  It seems paradoxical, but if group I votes sincerely and
> selfishly, then the winner maximizes utility, if they vote sincerely
> and with the intent of maximizing utility, then the result does not.

I still don't buy this.  If group I averages their preferences with those
of group II, they are no longer voting sincerely; they are using a sort
of reverse strategy for whatever reason.  Since we have ruled out
strategy, this question seems to be moot.

> If, however, they vote strategically with the purpose of maximizing
> utility, they can do so by voting selfishly.

So the best strategic vote is the same as the best selfish vote.


> So, my point is that although it is quite possible for people to vote
> altruistically using Average Ratings, and in fact they may do so as
> often as in any other method, doing so will tend to keep the method
> from maximizing utility, which is often the main basis on which the
> method is advocated.

I don't know how else to respond to this, other than to say that I don't
agree with your definition of altruistic voting.  The idea that in order
to vote altruistically, a group should average their sincere preferences
with the likely vote of the opposing side, makes no sense to me.


> This is a practical problem, because all methods rely on some
> altruism in voting.  Since the ratings are normalized in Average
> Ratings, it will often be that a group cannot properly express the
> extent to which they are damaged by a proposal.  Consider the
> following utilities
> group votes  A      B
>  I    51     100    90
>  II   49     -400   100
> 
> Here I am attempting to represent a situation in which proposal A is
> particularly damaging to group II, even though it benefits group I,
> although only slightly.  In such an example, we would hope that group
> I would vote altruistically to pick B.  Of course, they can do this in
> any method, including Average Ratings.  However, since average ratings
> tends to punish altruistic voting, this is a problem.

If the ratings were clipped before averaging, the sincere average for A
is 51 and B gets approximately 95.  I'm not sure what you intend here.
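
The two figures above can be checked with a few lines of Python.  This
is only a sketch: the function name is made up, and it assumes that
clipping means limiting each rating to the 0-100 range before taking
the voter-weighted average.

```python
# Quoted utilities: 51 voters in group I, 49 in group II.
groups = [
    (51, {"A": 100, "B": 90}),    # group I
    (49, {"A": -400, "B": 100}),  # group II
]

def clipped_average(groups, candidate, low=0, high=100):
    """Clip each group's rating to [low, high], then average over all voters."""
    total = sum(n for n, _ in groups)
    return sum(n * max(low, min(high, r[candidate])) for n, r in groups) / total

avg_a = clipped_average(groups, "A")
avg_b = clipped_average(groups, "B")
print(avg_a, avg_b)  # 51.0 94.9
```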

Bart Ingles



More information about the Election-Methods mailing list