[EM] Historical perspective about FairVote organization

Mon Mar 18 02:00:56 PDT 2013

On 03/17/2013 06:32 PM, Richard Fobes wrote:
> On 3/15/2013 2:12 AM, Kristofer Munsterhjelm wrote:
>> On 03/14/2013 06:45 PM, robert bristow-johnson wrote:
>>> IRV will prevent a true spoiler (that is a candidate
>>> with no viable chance of winning, but whose presence in the race changes
>>> who the winner is) from spoiling the election, but if the "spoiler" and
>>> the two leaders are all roughly equal going into the election, IRV can
>>> fail and *has* failed (and Burlington 2009 is that example).
>>
>> If you think about it, even Plurality is immune to spoilers... if the
>> spoilers are small enough. More specifically, if the "spoilers" have
>> less support in total than the difference in support between party
>> number one and two, Plurality is immune to them.
>>
>> So instead of saying method X resists spoilers and Y doesn't, it seems
>> better to say that X resists larger spoilers than Y. And that raises the
>> question of how much spoiler-resistance you need. Plurality's result is
>> independent of very small spoilers. IRV's is of somewhat larger
>> spoilers, and Condorcet larger still (through mutual majority or
>> independence of Smith-dominated alternatives, depending on the method).
>
> This is a good example of the need to _quantify_ the failure rate for
> each election method for each "fairness" criteria.
>
> Just a yes-or-no checkmark -- which is the approach in the comparison
> table in the Wikipedia "Voting systems" article -- is not sufficient for
> a full comparison.

Spoiler resistance is to some degree already quantified. If a method 
passes the majority criterion, then it's resistant to spoilers when a 
party or candidate has a majority. A method that passes mutual majority 
is resistant to spoilers outside a group that's ranked first by a 
majority. Independence from Smith-dominated alternatives gives 
resistance to spoilers not in the Smith set; and so on.
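
To make the last of these concrete, here is a rough Python sketch of a 
Smith set computation. It assumes every ballot is a strict ranking, and 
the function names (pairwise, smith_set) are only illustrative, not taken 
from any existing library. ISDA then says that dropping a candidate 
outside the returned set should not change the winner.

```python
def pairwise(ballots, candidates):
    """wins[a][b] = number of ballots that rank a above b."""
    wins = {a: {b: 0 for b in candidates if b != a} for a in candidates}
    for ballot in ballots:
        # Ignore anybody who is not in the candidate list we were given.
        ranked = [c for c in ballot if c in wins]
        for i, a in enumerate(ranked):
            for b in ranked[i + 1:]:
                wins[a][b] += 1
    return wins


def smith_set(ballots, candidates):
    """Smallest nonempty set whose members all beat every outsider pairwise."""
    wins = pairwise(ballots, candidates)

    def beats(a, b):
        return wins[a][b] > wins[b][a]

    # The candidate with the most pairwise victories is always in the Smith set.
    start = max(candidates,
                key=lambda a: sum(beats(a, b) for b in candidates if b != a))
    smith = {start}
    changed = True
    while changed:
        changed = False
        for c in candidates:
            # Pull in anybody that some current member fails to beat.
            if c not in smith and any(not beats(s, c) for s in smith):
                smith.add(c)
                changed = True
    return smith
```

With a three-way cycle among A, B and C and a candidate D that everybody 
ranks last, smith_set returns {A, B, C} whether or not D is included in 
the candidate list.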

But you have a point. In the practical view, these are only interesting 
insofar as they cover enough to make the method resistant to 
spoilers in general. That is, if an oracle told us that to get 
multiparty democracy where people don't think spoilers get in the way, 
all you need is ISDA and everything else is icing on the cake; then we 
wouldn't need to bother about anything more than ISDA. At least not 
unless the voters would find it unfair *on principle* to have something 
that didn't pass, say, independence from covered alternatives.

That's the division into three I've mentioned before: performance under 
honesty, things that deter strategy or make it unnecessary so we get to 
honesty in the first place, and consistency with itself (or, more 
broadly speaking, compliance with what the voters think should hold 
for the method to be fair).

In all three cases, we have approximations.

Bayesian regret is an approximation to performance under honesty. It 
holds if you assume certain things about what performance actually 
means: how to do interpersonal comparisons, and utilitarianism[2].
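
As a rough illustration of what that measures, here is a minimal Python 
sketch. It assumes the usual definition (the summed utility of the best 
candidate minus the summed utility of the elected one, averaged over many 
simulated elections) and, purely for simplicity, independent uniform 
random utilities and honest Plurality voting. The function names and 
default parameters are placeholders, not anybody's published simulator.

```python
import random


def plurality_winner(utilities):
    """Each voter honestly votes for the candidate they rate highest."""
    tally = [0] * len(utilities[0])
    for voter in utilities:
        tally[voter.index(max(voter))] += 1
    return tally.index(max(tally))  # ties go to the lowest index


def regret(utilities, winner):
    """Summed utility of the best candidate minus that of the winner."""
    totals = [sum(column) for column in zip(*utilities)]
    return max(totals) - totals[winner]


def bayesian_regret(method, n_voters=25, n_cands=4, trials=2000, seed=1):
    """Average regret of `method` over randomly generated electorates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: utilities are i.i.d. uniform on [0, 1).
        utilities = [[rng.random() for _ in range(n_cands)]
                     for _ in range(n_voters)]
        total += regret(utilities, method(utilities))
    return total / trials
```

Any other method can be plugged in as long as it maps the utility table 
to a winner's index. The number that comes out depends entirely on the 
utility model, which is the assumption referred to above.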

Criteria are approximations to the other two. The good thing about 
criteria is that they provide a bound. If I prove that a method passes 
independence from Smith-dominated alternatives (ISDA), then it passes 
ISDA outright. You don't have to worry that the method only passes 
ISDA in the cases that are irrelevant to a real election. If it passes 
ISDA, it passes ISDA *everywhere*[1]. And I think that's why I try to 
make methods that pass many criteria, because if they pass some 
criterion X, then I can say "done" and move on without having to 
quantify *where* they're passed. This saves a lot of detective work 
determining if the areas where they pass are the areas we care about.

But beyond that, you're right. The approximations are not the real 
things. They're proxies we use because they're easier to investigate. 
And a method might seem to have contradictions when you look at every 
possible ballot set, yet have none in the real world. For instance, 
if people voted exclusively on a left-right scale, then Condorcet always 
finds a CW and so passes later-no-harm, later-no-help, IIA and so on, in 
these cases. In that case, we could even use Borda IRV if that's what 
the people would prefer. The various monotonicity failures wouldn't be a 
problem because we'd never get to that domain. And if we had some way of 
knowing what level of spoiler resistance is enough (or conversely, what 
isn't), then we could exclude a lot of methods for either being too 
complex or for not passing the mark.
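
The left-right claim is easy to check numerically. The following Python 
sketch is only an illustration under assumptions of my choosing: voters 
and candidates are points on a line, everybody ranks candidates by 
distance, the number of voters is odd, and positions are picked so that 
exact ties can't happen. The names and parameters are hypothetical.

```python
import random


def condorcet_winner(voters, candidates):
    """Return the candidate position that beats every other one, or None."""
    for a in candidates:
        if all(
            # a beats b if a strict majority of voters is closer to a.
            2 * sum(abs(v - a) < abs(v - b) for v in voters) > len(voters)
            for b in candidates if b != a
        ):
            return a
    return None


def count_missing_cw(trials=500, n_voters=21, n_cands=5, seed=0):
    """Count simulated left-right elections that have no Condorcet winner."""
    rng = random.Random(seed)
    missing = 0
    for _ in range(trials):
        # Voters sit on odd integers and candidates on distinct multiples
        # of 4, so no voter is ever equidistant from two candidates.
        voters = [rng.randrange(1, 1000, 2) for _ in range(n_voters)]
        candidates = rng.sample(range(0, 1000, 4), n_cands)
        if condorcet_winner(voters, candidates) is None:
            missing += 1
    return missing
```

Under these assumptions the count should come out as zero, with the CW 
being the candidate closest to the median voter.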

>> It's like reinforcing a bridge that would collapse when a cat walks
>> across it, so that it no longer does so, but it still collapses when a
>> person walks across it. Cat resistance is not enough :-)
>
> Great analogy. We need to start assessing _how_ _resistant_ each method
> is to each "fairness" criteria.

Yes, and these fairness criteria might not even be of the same sort as 
the traditional criteria. They might be vaguer, like "spoiler 
resistance", which a method fails when the voters complain as they did 
in Burlington, and which would really be a meta-category including 
things like ISDA, mutual majority, and so on.

>> It would be really useful to know what level of resistance is enough,
>> but that data is going to be hard to gather.[...]
>
> Indeed, that is difficult.

Perhaps one could make mock elections in some way, or a game where 
candidates distribute benefits to certain groups of voters.

Polls are reasonably good at showing behavior under honesty, I think. 
But one may object that they don't show adaptation to the system in 
question. Both MO and David Wetzell have used arguments of this sort, 
and I think there's something to it. Consider the Range polls on 
rangevoting's site. These show great support and variety, and use of 
ratings values besides min and max. On the other hand, consider YouTube, 
which moved from Range-style voting to Approval style. They presumably 
did this because people voted min and max, although I don't know that 
for sure. If they did, then that shows that the YouTubers adapted to 
Range and started voting min and max.

A game or series of mock elections would have the advantage that it 
would include that adaptation element. However, the pressure might not 
be right. It could induce too much strategy (if the game is set up so 
candidates can only distribute power after each election, thus being 
maximally patronage-like). It could also induce too little. More 
generally, we wouldn't know if we hit the realistic spot. There would be 
no oracle we could ask that would say "yes, with these rules, the people 
will engage in just as much strategy as they would in a real election". 
Still, it would be better than nothing and we might be able to gain 
bounds from it. (If in the most patronage-based, most zero-sum variant, 
people still don't massively bury, then we know they won't in a real 
election, since the real election will be more "kind". Similarly, if the 
voters engaged in massive burial even in the most cooperative scenario, 
then we know that will be a problem in real elections too.)

>  > And beyond that we have even harder questions of how much resistance
>  > is needed to get a democratic system that works well. It seems
>  > reasonable to me that advanced Condorcet will do, but praxeology
>  > can only go so far. If only we had actual experimental data!
>
> My VoteFair site collects lots of data. I have used it to verify that
> VoteFair ranking accomplishes what it was designed to do. Not only has
> such testing been useful for refining the code for the single-winner
> portion (VoteFair popularity ranking, which is equivalent to the
> Condorcet-Kemeny method), but such testing has revealed that VoteFair
> representation ranking (which can be thought of as a two-seats-at-a-time
> PR method) also works as intended.
>
> As for praxeology ("the study of human conduct"), I also watch to see
> how people try to vote strategically. The attempts are interesting, but
> ineffective.
>
> I agree that using better ballots and better vote-counting methods in
> real situation -- using real data -- is essential for making real progress.

Could we use the polling data to get some information about, say, 
candidate variety? I think we could, at least to some extent. We could 
ask something like "how many elections with more than 20 voters have no 
CW?". I think you published stats like that once, but I don't remember 
what the values were.
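
Something like the following Python sketch would do for that question. I 
don't know what format the poll data is stored in, so this assumes each 
election is a list of ballots and each ballot is a list of candidate 
names in order of preference, with unranked candidates counted as below 
all ranked ones. The function names are made up for the example.

```python
def has_condorcet_winner(ballots):
    """True if some candidate beats every other candidate pairwise."""
    candidates = sorted({c for ballot in ballots for c in ballot})

    def prefers(ballot, a, b):
        # Assumption: unranked candidates sit below every ranked one.
        rank_a = ballot.index(a) if a in ballot else len(ballot)
        rank_b = ballot.index(b) if b in ballot else len(ballot)
        return rank_a < rank_b

    for a in candidates:
        if all(
            sum(prefers(bl, a, b) for bl in ballots)
            > sum(prefers(bl, b, a) for bl in ballots)
            for b in candidates if b != a
        ):
            return True
    return False


def share_without_cw(elections, min_voters=20):
    """Fraction of elections with more than min_voters ballots and no CW."""
    big = [e for e in elections if len(e) > min_voters]
    if not big:
        return None  # nothing large enough to say anything about
    return sum(not has_condorcet_winner(e) for e in big) / len(big)
```

The same loop could report how large the Smith set is in the elections 
that lack a CW, which would say a bit more about candidate variety.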

Perhaps you could also ask the voters some time later if they were 
satisfied with the choice. That kind of "later polling" could uncover 
Burlington-type breakdowns if there were any. If they could rank the 
options in retrospect, it would also be possible to determine whether 
they would have been satisfied with, say, IRV; but I imagine that's too 
much to ask.

-

[1] There are still assumptions about the input, of course. If everybody 
goes on a burial spree, the Smith set may not mean what we think it 
means anymore, and then ISDA would no longer hold. Same thing with 
Majority Judgement and IIA. If people vote in a comparative manner, IIA 
no longer holds for it.

[2] I have also suggested another approximation for performance, but I 
haven't written code to implement it. It's the "game AI" approximation: 
you take a bunch of different game AIs (say, chess programs) and run 
their suggestions through a voting method, creating a "collective AI". 
The better the performance of the collective AI, the better the method. 
This could even be done in a "world champion vs the world" type match, 
where the individual "AIs" are human players. This would be a better 
metric than just using AIs, since then the various advisors could make 
suggestions and thus influence the vote in a way that might improve play.