[EM] Underneath the hood: Controversy over the "best" voting system.
Abd ul-Rahman Lomax
abd at lomaxdesign.com
Mon May 3 11:04:09 PDT 2010
For years, voting systems were studied through the use of "criteria,"
standards which a system either passes or fails. These criteria often
assumed a preference order, sometimes one that exists outside what is
expressed on the ballot. Arrow's accomplishment was in showing that a
simple set of criteria, each intuitively fair, could not
simultaneously be satisfied by any voting system (though he wasn't
really writing about voting systems, but about finding a social
preference order from a set of individual preference orders).
However, Arrow neglected utility, and there is an obvious method for
amalgamating individual *utilities* into a social preference order,
based on the sum of utilities. Arrow was aware of this omission; his
argument against using utilities was that he could imagine no
practical method of finding commensurable utilities. My suspicion is
that biology knew better, and developed decision-making methods for
the human brain that involve, in effect, comparing utilities (on an
attraction/aversion scale) and, in the end, using a sum of reaction
strengths to decide between options.
Be that as it may, here is an approach for comparing voting systems,
in terms of theoretical performance:
Given a set of absolute individual utilities for the universe of
possible candidates, on a scale from maximum attraction to maximum
aversion: the scale is linearized so that a step of the same size
anywhere on the scale represents the same effect on individual
choice, measured in the effort the individual would dedicate to
achieving that improvement. We can assume that different individuals
will have different overall ranges of utility. A depressed
individual, caring nothing about the world, may have a very small
range of absolute utilities, while a highly motivated and engaged
individual may have high absolute preference strength. We will start
by assuming that each linearized scale is expanded or contracted so
that the individual's overall range represents his or her absolute
motivation between the extremes.
There is then a clear possibility for a standard from which to
determine a social preference order: Candidates are ranked in order
of their sum of individual utilities on this absolute scale, across
all participants.
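
To make the standard concrete, here is a minimal sketch in Python.
The data representation is my own choice for illustration, not
anything prescribed here: "utilities" maps each voter to a dictionary
of candidate -> absolute utility, already linearized and scaled as
described above.

    def social_preference_order(utilities):
        """Rank candidates by their sums of absolute individual utilities."""
        candidates = next(iter(utilities.values())).keys()
        totals = {c: sum(u[c] for u in utilities.values()) for c in candidates}
        # Highest total utility first; the top entry is the "absolute winner."
        return sorted(totals, key=totals.get, reverse=True)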
There is no question but that collecting such absolute utility data
is difficult or impossible. It can, however, be approached; more on
that later.
There is an immediate application for this, suggested by the work of
Warren Smith in his simulations (IEVS). I will propose a "voting
system" as a candidate for an ideal one. This voting system will use
the absolute individual profiles as its input.
"No election," i.e., the choice to repeat the process possibly with
new candidates, investigation of candidates, etc., must be one of the
"candidates" rated.
The system will amalgamate these profiles in two ways. First, it will
order the candidates by their absolute utility sums, finding the
"absolute winner." Second, it will amalgamate them using a "strategic
voting robot": each individual, having "voted" the true absolute
utilities, sees that vote optimized into a Range vote of maximum
strength, according to the bot's knowledge of all other votes. The
bot is thus a perfect strategic "advisor," knowing all other votes.
But since all other votes may affect the bot's vote, the bot must
approach this iteratively, treating all "voters" the same at each
stage. Thus each "round" of bot voting is based on the results of the
previous round. The bot can use normalized Range ballots, since the
only information it needs is relative utility differences; it works
for each voter to maximize that voter's expected outcome.
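
The normalization step itself is simple; a sketch, assuming a 0-100
Range ballot (the resolution is an assumption, not prescribed here):

    def normalized_ballot(voter_utilities, top=100):
        """Rescale one voter's utilities so the favorite gets `top` and
        the least-favored gets 0; only relative differences survive."""
        lo, hi = min(voter_utilities.values()), max(voter_utilities.values())
        if hi == lo:
            return {c: 0 for c in voter_utilities}  # no real preference
        return {c: top * (u - lo) / (hi - lo)
                for c, u in voter_utilities.items()}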
I will not describe the bot's detailed operation at this time, beyond
saying that it iterates, using the results of the previous round to
determine votes with maximum strategic power, always participating in
the determination of the social preference order with maximally
effective votes, according to the best strategy definable from the
previous round. The criterion for "best" is maximization of expected
individual utility. The bot begins with first-preference information
to determine the first-round winner, and then uses this information
to determine the optimal vote for all voters in subsequent rounds.
"Optimal" means most likely to influence the eventual outcome,
assuming that the overall preference sequence, absent this voter's
vote, is as shown in the previous round, but that it may shift in the
next round toward a tie, making the voter's vote effective. A voter's
vote for a candidate is reduced from the normally optimal
full-approval vote if it would shift the overall preference order
with respect to that voter's higher preferences. The bot satisfies
later-no-harm, casting the maximum fractional vote that does not harm
the overall preference order.
Thus if a preference the bot previously voted for the voter would
slip below the preference position of the additional approval being
added, due to that additional vote, the vote for all such voters is
reduced to the fractional value necessary to avoid the slip. To avoid
problems with roundoff, the bot is allowed to vote, for preliminary
purposes, the exact fractional vote that creates a tie. Later
examination is performed to see whether reducing this vote by a
minimum increment would change the result. At any point, if removing
or reducing a previously cast vote improves the outcome for the
individual, it is removed or reduced.
(The bot votes for all "preference groups" as if they were a
coordinated block, following its algorithm).
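
Since the detailed operation is deliberately left open, the following
is only a schematic of the iteration structure, not the strategy rule
itself. Here "best_response" is a hypothetical placeholder for that
rule, which would implement the later-no-harm-respecting fractional
voting described above.

    def run_bot(utilities, best_response, max_rounds=100):
        """Iterate rounds of strategic voting until ballots stop changing."""
        # Round 1: first-preference information only.
        ballots = {v: bullet_vote(u) for v, u in utilities.items()}
        for _ in range(max_rounds):
            standings = tally(ballots)  # the previous round's result
            # All voters are treated the same: every ballot is re-optimized
            # against the same previous-round standings.
            new_ballots = {v: best_response(utilities[v], standings)
                           for v in utilities}
            if new_ballots == ballots:  # fixed point: no vote can improve
                break
            ballots = new_ballots
        return tally(ballots)

    def bullet_vote(voter_utilities, top=100):
        """Full-strength vote for the first preference only."""
        fav = max(voter_utilities, key=voter_utilities.get)
        return {c: (top if c == fav else 0) for c in voter_utilities}

    def tally(ballots):
        """Sum Range scores into a preference order, best first."""
        totals = {}
        for ballot in ballots.values():
            for c, score in ballot.items():
                totals[c] = totals.get(c, 0) + score
        return sorted(totals, key=totals.get, reverse=True)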
So for each collection of preference profiles, there is determined an
"absolute winner" and a "strategic winner," who may be different. The
strategic winner is the candidate who will win if all of his or her
relative supporters vote to maximize personal utility.
There is one additional consideration. A "voting likelihood standard"
will be set: a level of absolute utility below which a voter will not
bother to participate in an election (or, alternatively, a formula
for the likelihood of such participation). An "increment threshold"
will also be set, below which neither the voter nor the bot will act
to improve the outcome, there being no real preference.
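
In code, both thresholds are one-line tests; a sketch, with the
threshold values left as free parameters since no particular values
are proposed here:

    def participates(voter_utilities, turnout_threshold):
        """Voting likelihood standard: abstain if too little is at stake."""
        at_stake = max(voter_utilities.values()) - min(voter_utilities.values())
        return at_stake >= turnout_threshold

    def worth_acting(utility_gain, increment_threshold):
        """Increment threshold: below this there is no real preference."""
        return utility_gain >= increment_threshold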
To use this to test voting systems, a set of reference profiles
(collections of absolute utilities for a set of individuals) should
be developed, probably based on normal distributions in issue space,
with utility derived from distance. The development of this set of
reference profiles should be based purely on preference and utility
theory. It should be possible to predict the voting in some voting
systems based on these profiles and data about the election, which
would itself be a test of the set of preference profiles.
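
A sketch of how such profiles might be generated, assuming a simple
spatial model (the dimensionality and the linear distance-to-utility
falloff are my illustrative choices, not requirements of the proposal):

    import random

    def make_profiles(n_voters, n_candidates, dims=2, seed=None):
        """Voters and candidates drawn from a normal distribution in
        issue space; utility falls off with issue-space distance."""
        rng = random.Random(seed)
        point = lambda: [rng.gauss(0.0, 1.0) for _ in range(dims)]
        voters = [point() for _ in range(n_voters)]
        cands = {chr(65 + j): point() for j in range(n_candidates)}

        def utility(v, c):
            dist = sum((a - b) ** 2 for a, b in zip(v, c)) ** 0.5
            return -dist  # closer in issue space means higher utility

        return {i: {name: utility(v, c) for name, c in cands.items()}
                for i, v in enumerate(voters)}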
If the set of reference profiles is modified to include less likely
profiles, i.e., if for efficiency some profiles are sampled with
lower probability than they would actually occur, each profile will
carry relative probability information (a weight). Again, I won't
describe the details; they are better left to those doing this work.
Then, to test a voting system, the system is first applied with
"sincere" votes, cast according to the information the system
collects, and the "sincere winner" is determined. With systems that
consider preference strength, the absolute preference strength is
used to determine the "absolute sincere winner"; the vote normalized
to the full range determines the "normalized sincere winner"; and the
vote optimized by the bot determines the "strategic winner."
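
Pulling the categories together, one profile set yields the named
winners like this (a sketch reusing the earlier helpers; "system"
stands for whatever voting system is under test, a function taking
ballots and returning a winner):

    def winners(utilities, system, best_response):
        """The winner categories for one set of preference profiles."""
        absolute = social_preference_order(utilities)[0]
        absolute_sincere = system(utilities)  # raw absolute ratings as votes
        normalized_sincere = system({v: normalized_ballot(u)
                                     for v, u in utilities.items()})
        strategic = run_bot(utilities, best_response)[0]
        return absolute, absolute_sincere, normalized_sincere, strategic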
Regret is defined as the absolute-utility difference between the
"absolute winner" and the result in each of the categories
enumerated. For each category the distribution is given, i.e., in how
many elections out of the reference set this difference arose,
weighted by the probability data for the reference set if it is not
uniform across the profiles. Thus, for each system under test, the
"average regret" is determined, along with its variation and
distribution. When the system fails to find the absolute winner, how
much utility was lost? How likely was this, in a series of, say, 1000
elections?
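
The tabulation amounts to this (a sketch; each election in the
reference set carries the winner the system under test produced and
the profile's weight):

    def regret(utilities, absolute_winner, actual_winner):
        """Absolute-utility gap between the ideal and the actual result."""
        total = lambda c: sum(u[c] for u in utilities.values())
        return total(absolute_winner) - total(actual_winner)

    def average_regret(elections):
        """`elections` is a list of (utilities, actual_winner, weight)."""
        num = den = 0.0
        for utilities, actual, weight in elections:
            best = social_preference_order(utilities)[0]
            num += weight * regret(utilities, best, actual)
            den += weight
        return num / den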
Performance of each system under common voting-system criteria is
also determined, particularly criteria that require a particular
winner given the base preference profiles. How often did that winner
prevail? How often did that winner lose? And what was the average
regret in those cases? (With a Condorcet winner, the regret can be
negative, i.e., utility is improved when the Condorcet winner is
beaten by a sincere Range winner.)
There is a lot of work to be done to develop methods of comparing
voting systems. The *goal* of voting systems should be considered.
The approach described here assumes a value to maximizing absolute
utility. There are situations where absolute utility can actually be
determined, such as where there is a common medium of exchange and
equalized relationships to that medium. But by working with
simulations, the need to determine absolute utility is avoided. We
are not proposing the use of absolute utility ratings in real
elections. Rather, absolute ratings serve, in simulation, as the
measure guiding voting and voting strategy, against which the
performance of a voting system is assessed.
In addition to what is described above, realistic models of voter
behavior as to strategy, where the bot optimizing the vote is not
available, may be of use as well. It would also be interesting to
study a multi-round system whose primary offers "None of the above,
other than those I've voted for" as an option, since this ties the
approval cutoff to a real-world and equalized measure: the preference
for a candidate over holding a runoff. Simulating a runoff exactly is
impossible because the voter set will be different, but if we start
with simulated preference profiles for the entire population, we can
study what will happen if the preference profiles don't change
(simulating only the effect on turnout). We know, however, that one
of the major arguments for repeated balloting is that voters gain new
information. We can simulate this by incorporating an "ignorance
factor," which makes the voter utility profile murky in the primary
for some voters. This "murkiness" is then removed in considering
turnout for the runoff and how voters will vote there.
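
A sketch of that ignorance factor, assuming additive Gaussian noise
on the primary-round utilities (the noise model and the fraction of
ignorant voters are illustrative assumptions):

    import random

    def primary_view(utilities, ignorance, frac_ignorant=0.5, seed=None):
        """Murky utilities for some voters in the primary; the runoff
        uses the true `utilities`, i.e., the murkiness is removed."""
        rng = random.Random(seed)
        def murky(voter_utilities):
            return {c: u + rng.gauss(0.0, ignorance)
                    for c, u in voter_utilities.items()}
        return {v: murky(u) if rng.random() < frac_ignorant else u
                for v, u in utilities.items()}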
Without some agreement, however, on the purpose of voting, and what
goals are appropriate for it, we will find it continually difficult
to agree upon the relative performance of voting systems. A
particular criterion failure, for example, may be so rare, and/or so
low in damage to the voters involved, that it is negligible, even
when it *looks* horrible. This is common in consideration of Range
voting, for example, where supposedly, of 100 voters, 99 vote A:100,
B:99, and one voter votes A:0, B:100. It's claimed that the A voters
will be outraged that their favorite lost, but their votes indicate
that they really didn't care! Where this argument makes sense is if
there were an unsupported clone, C, who was rated 0 by all the A
voters, and they only rated B at 99 because they were worried that C
might win.
What this boils down to is an argument that 99% of voters were
freaked out by a no-hope C. No system can perform well if 99% of
voters are totally ignorant of the real situation! In Bucklin, of
course, this problem would not arise; Bucklin has the reverse
problem: if we assume that the utilities are sincere, the utility
maximizer, B, loses. But this is a very close election, as stated.
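
Working the sincere Range totals for that example:

    A: 99 voters x 100 + 1 voter x 0   = 9900
    B: 99 voters x 99  + 1 voter x 100 = 9901

B wins by a single point out of nearly ten thousand, so electing A
instead would cost about one part in ten thousand of the expressed
utility.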
The loss of overall utility is very low. In a real group, meeting in
person and using Range for a quick assessment, if I were the chair I
know what I'd do. I'd inform the meeting of the Range results, and
that there was a conflict between the Majority criterion and the
Range result. Depending on context, I'd suggest a motion. But it
doesn't much matter what that motion would be, because it could be
amended by a majority, and quickly. I'd certainly allow the B
supporter to present the reasons for his or her vote. And then the
majority would decide the election. I have known elections where a
supermajority voted one way, then saw a minority report, and the
supermajority changed its decision. If the situation were really such
that the A voters had no true preference, as the Range votes seem to
indicate, they might well stand aside, given a reasonable argument
from the B supporter. But if, on the other hand, the votes of 99 for
B were artificially high due to the presence of C, and the voters,
now knowing that C wasn't a real option, weren't willing to support B
any more, A would win. It would only take one voter!
I do not see public elections going so far to maximize utility that
they would produce a result like this. It's purely a straw man,
invented by those with reasons to propose a problem with Range
Voting, as, for example, Saari.
But, of course, Borda, Saari's favorite, really does have the same
"problem." Just make it 101 candidates, with 99 of them being truly
awful, and with all the voters preferring A and B to them except one,
the B supporter, who reverses the preference, putting B on top and A
at the bottom. Converting the Borda positions to Range scores of
0-100, and assuming that all voters rank all candidates, we end up
with the same votes in Borda as in Range, except that now we really
do suspect B of voting strategically, to bury A. B wins. The "sin" of
Range here is having so many ratings; but give Borda and Range the
same resolution, in an election with a smaller number of candidates,
and the same phenomenon can be shown.
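
A quick numerical check of that claim, as a Python sketch (the
candidate names and the 0-100 point scale are my choices for
illustration):

    # 101 candidates: A, B, and 99 truly awful clones. Borda points run
    # from 100 for first place down to 0 for last.
    awful = ["X%d" % i for i in range(99)]
    ballots = 99 * [["A", "B"] + awful] + [["B"] + awful + ["A"]]

    totals = {}
    for ranking in ballots:
        for points, cand in zip(range(100, -1, -1), ranking):
            totals[cand] = totals.get(cand, 0) + points

    print(totals["A"], totals["B"])  # 9900 9901: B wins, just as in Range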