[EM] Some strategy simulation results and comments
Kevin Venzke
stepjak at yahoo.fr
Sun Mar 13 14:19:22 PDT 2011
Hi all,
Here are some results from my strategy simulation.
Let me preface by reiterating what the interesting thing about this simulation
is: The AI voters have to learn each method, with no preprogrammed
understanding of strategy or even what the ballot possibilities are supposed
to mean. Furthermore the pre-election polling that informs the voters about
each other is conducted using the method itself.
Compare this to the past, where the voters in my simulations have been:
1. completely sincere,
2. basing strategy on approval polling, or
3. using method-specific strategy hard-coded (perhaps unconvincingly) by me.
So, some intro notes on methods and names:
Methods like CWP, Range, and IBIFA use 3-slot ballots. Nothing finer is
supported.
AWP and CWP are the Approval- and Cardinal-Weighted Pairwise methods by JGA.
For AWP "ex" and "im" versions are included based on whether approval is
explicit or implied by ranking.
RB is Random Ballot.
RndP means Random Pair (pairwise comparison between two candidates picked
randomly).
CdlA is my Conditional Approval. QR (Quick Runoff) and KH (King of the Hill)
are also my methods.
VDP (Venzke Disq Plurality) is what I will call VFA, so that VFA can refer to
a ballot format.
VFAR3 is VFA ballot runoff type 3, the only one included in this batch. It's
like TTR but with the ability for "against" votes to block a candidate from
the final two.
C//A is Condorcet//Approval. There is C//Aex and C//Aim depending on whether
the approval cutoff is explicitly placed, or implied from the fact of being
ranked.
ICA is my Improved Condorcet Approval. Also included are the FBC/SDSC/SFC-
satisfying MDDA and MAMPO methods.
SMDTR and IBIFA are Chris Benham methods. There is "IBIFAcb" (equal-ranking
(ER) allowed and opposition is defined as Chris originally defined
it), "IBIFApw" (ER permitted but pairwise opposition is used), and "IBIFAst"
(ER not permitted; pairwise opposition is used).
DMC is Definite Majority Choice, which was popular here for a while.
TACC is Forest's Condorcet method. I wasn't sure, but approval is implicit.
VBV is "Venzke Bucklin variant." There is "VBVer" and "VBVst" based on whether
equal ranking is allowed or not.
QLTD is Woodall's Bucklin variant. Others are DAC, DHSC, and DSC. DSC also has
a "DSCer" version allowing equal ranking. I did allow equal ranking for DAC
and DHSC.
IRNR is Instant Runoff using Normalized Ratings.
For Borda, Black, and Baldwin, ER and truncation were not allowed.
C// anything means a Condorcet winner autowins before attempting the rest of
the method.
MAP is Majority Favorite//Antiplurality.
IFPP is Craig Carey's Improved FPP. It is IRV, except that each round
eliminates every candidate with a below-average first-preference count.
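The elimination rule can be sketched like this (the function name, the ballot
representation, and the generalization beyond three candidates are mine, not
the simulator's):

```python
from collections import Counter

def ifpp_round(ballots):
    """One elimination round of (generalized) IFPP: count first
    preferences, eliminate every candidate below the average count,
    and transfer ballots past the eliminated candidates as in IRV.
    `ballots` is a list of non-empty rankings, best candidate first."""
    counts = Counter(ballot[0] for ballot in ballots)
    candidates = {c for ballot in ballots for c in ballot}
    # Candidates with no first preferences count as zero.
    average = len(ballots) / len(candidates)
    survivors = {c for c in candidates if counts[c] >= average}
    return [tuple(c for c in ballot if c in survivors) for ballot in ballots]
```

With 11 voters and 3 candidates the average is about 3.67 first preferences,
so a candidate with only 2 is eliminated immediately.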
2sMMPO is MMPO on approval ballots. Also "MMPOFPP" is included which is MMPO
but breaking a tie with (original) FPs.
Basic Condorcet methods: WV and margins (which are Minmax or whatever you
have). Plus Raynaud(wv).
AER is Approval-Elim Runoff aka Approval AV.
Also included: Antip(lurality), FPP, IRV, TTR, Coombs, Appr(oval), Range3,
Bucklin, MCA. It is possible that Range fails to turn into Approval
strategically, not because of poor AI, but because with a small number of
voting blocs, the methods actually aren't the same strategically.
In this batch there were eight blocs allocated evenly across a 1D spectrum,
and for each "trial" the three candidates were placed randomly. The "base"
size of each bloc is the same, but only 50% of any is guaranteed to show up
for a given poll/election. There are only 1500 trials, which took about eight
hours on a single thread. Each trial is a scenario (candidate allocation) fed
to each of 52 included methods in turn, and then each method required about 3-
10 thousand polls (as needed to achieve a certain level of confidence of
understanding). In total there were 286 million polls.
Eight is not many, but there's no way around an unrealistically small number,
because the AI needs to be able to perceive that it can affect the outcome.
Simultaneously I ran batches for 1) the same scenario except one candidate is
guaranteed to be dead center on the spectrum, and 2) a completely random
scenario not based on a spectrum. But I won't discuss those at this time. We
have enough to go through! So let's start:
COMPETITIVENESS
A curse of many sims I've run is that most scenarios attempted aren't
competitive: One candidate will win virtually all the time. This suggests I am
mainly simulating poor nomination choices and it isn't very interesting to
study.
Fortunately, this "evenly spread voters" scenario, combined with more
intelligent voters, tended to be more competitive with around 40% of trials
having this property: "the candidate who won the most often, won fewer than
85% of the polls."
The most competitive: RndP (100%), RB (98.3%), QR (42.7%), AWP (implicit or
explicit cutoff) (42.4%). Skimming down the list other methods had: margins
41.9%, MCA 41.9%, WV 41.7%, TTR 41.1%, Range 40.9%, and then approaching the
bottom: Approval 39%, IRV 38.3%, FPP 37.2%, VDP 36.3%, 2-slot MMPO 24.9%,
Borda 22.1%, Antip 0.2%(!).
COMPROMISE
A voter used "compromise" strategy if they voted their second-favorite
candidate first, above the true favorite. Methods satisfying FBC are not
supposed to require this strategy, and we can check whether the AI perceived
this.
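As I read this definition, the classifier amounts to something like the
following (a sketch with my own names, assuming strict rankings with the best
candidate first, not the simulator's actual code):

```python
def used_compromise(cast_ranking, sincere_ranking):
    # 'Compromise': the voter put their sincere second favorite in
    # first place. With strict rankings this necessarily places it
    # above the true favorite.
    second_favorite = sincere_ranking[1]
    return len(cast_ranking) > 0 and cast_ranking[0] == second_favorite
```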
The "worst" methods for compromise were FPP (avg 20.7% of voters), then VDP
19.4%, Borda 17.4%, IFPP 17.1%, TTR 17.0%, RndP(!!) 16.0%, DSC 15.2%, IRV
15.0%, QR 13.4%, VFAR3 13.0%, C//IRV 11.9%, KH 10.6%, Black 10.2%, QLTD 9.9%,
VBVst 9.4%, C//KH 9.3%, Baldwin 9.2%, Bucklin 8.8%, MAP 8.5%, DAC 8.3%,
margins 7.6%, IBIFAs (all three; the ER scores may seem high) 6.3-6.6%, IRNR
6.1%, TACC 5.0%, C//Aex 4.5%, AER 3.7%, CdlA 3.7%, RB(!) 3.5%, Coombs 3.3%...
And now we are getting into some FBC methods. MDDA 3.3%, ICA 3.0%, Range 2.7%,
MCA 2.4%, MAMPO 2.0%, MMPO 2.0%, SMDTR 1.9%, Approval 1.2%, 2sMMPO 1.0%,
Antiplurality 0.1%.
Some FBC-failing methods also did as well as this group: DMC 3.1%, C//Aim
3.0%, WV 1.9%, AWPex 1.9%, VBVer 1.9%, AWPim 1.8%, CWP 1.5%, Raynaud 1.2%.
Congratulations to Raynaud.
Already we see an oddity here and there. I do think some of it can be blamed
on methods with more ballot types being harder for the AI to learn. It's
possible that sometimes the sincere vote has no detectable benefit over the
strategic one, so the choice is basically arbitrary. The random methods also
seem harder, although curiously the AI doesn't request very many polls before
it believes it has understood how they work.
COMPRESSION
A voter used "compression" strategy if they rated another candidate equal to
their favorite, tied at the top. Around half the methods don't even allow this
type of ballot.
The methods with the most are rather predictable. 2sMMPO 44.0%, Approval
33.2%, Range 28.3% (interesting drop), SMDTR 23.4%, MCA 20.3%, MMPO 15.8%,
MAMPO 14.6%, Raynaud 12.4%, VBVer 11.8%, TACC 10.9%, WV 10.7%, CWP 10.0%, MDDA
9.7%, IRNR 8.6%, ICA 8.3%, CdlA 6.9%, C//Aim 6.4%, AWPim 5.6%, margins 4.7%,
C//Aex 4.5%, IBIFApw & IBIFAcb 4.3%, AWPex 4.3%, DMC 4.1% (!), DAC 3.5%, DHSC
2.3%, DSCer (1.4% and predicted to be none, so I'm glad).
TRUNCATION
A voter used "truncation" strategy if they bullet-voted for their favorite
(and had the option to vote for more candidates).
Again, it starts predictably. Appr 65.6%, Range 57.5%, 2sMMPO 54.9%, Bucklin
54.5%, DAC 50.7%, QLTD 50.5%, MCA 43.6%, MDDA 43.1%, IBIFA (all!) 35.7-38.8%,
VBVst 30.0%, TACC 25.8%, KH 25.7%, ICA 25.6%, IRNR 25.5%, VBVer 25.0%, SMDTR
22.9%, C//KH 22.2%, IFPP 22.2% (??), VDP 21.7%, AER 20.8%, VFAR3 20.0%, C//Aim
19.7%, C//IRV 19.3%, CdlA 18.3%, IRV 18.1% (!), QR 17.1%, DSC 15.7%, MAMPO
14.7%, DMC 13.7%, CWP 13.1%, AWPim 13.0%, WV 11.1%, margins 10.7%, C//Aex
10.6%, AWPex 10.0%, DSCer 8.5% (?), Raynaud 8.0%, MMPO 7.8%.
There is some oddness with the LNHarm methods. In theory there is no need to
truncate, and only benefit in listing more options, but truncation is fairly
high under IFPP, IRV, QR, and DSC. Only much lower in the list do we see
MMPO. My theory is that these five methods are listed roughly in increasing
order of how likely a lower preference is to be useful. In IFPP lower
preferences are most frequently unused, so the AI (I theorize) doesn't learn
to provide the preferences. In IRV the preferences are used more often, but
they can't help a higher preference. QR, DSC, and MMPO all seem to provide
increasing benefit to listing additional preferences, not just insulation from
harm from those preferences.
BURIAL
A voter used "burial" strategy if they ranked their second choice strictly
last. A narrow definition, I know.
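Under this narrow reading, the check might look like the following sketch
(names mine; I'm assuming strict rankings, best candidate first, and that
leaving the second favorite off a truncated ballot counts as truncation
rather than burial):

```python
def used_burial(cast_ranking, sincere_ranking):
    # 'Burial' (narrow sense): the sincere second favorite is ranked
    # strictly last on a full cast ballot.
    second_favorite = sincere_ranking[1]
    return (len(cast_ranking) == len(sincere_ranking)
            and cast_ranking[-1] == second_favorite)
```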
This one's not as predictable. VFAR3 was worst with 33.9% voting "against"
the "wrong" candidate (which actually is how the method is meant to work).
Borda was next with 27.9%. VDP 25.8% (ditto comment on VFAR3). Baldwin 21.6%,
DSC 21.3% (surprised), MMPO 20.1%, C//Aex 20.0% (vs 7.7% for the implicit
version, rather confirming my suspicions), Black 18.8%, margins 18.6%, MAP
17.1%, DSCer(why better?) 16.5%, WV 16.3%, MAMPO 16.0%, AWPex 15.2%, Coombs
15.1%, SMDTR 15.1%, Antip 15.1%, IFPP (see paragraph above!) 14.9%, C//IRV
14.7%, Raynaud 14.4%, IRNR 13.6%, QR 12.4%, RndP(!!) 11.8%, C//KH 11.6%,
DMC 11.2%, CWP 10.6%, AER 10.5%, IRV 10.0%, KH 9.7%, CdlA 9.3%, AWPim 8.9%,
VBVst 8.4%, TACC 7.8%, ICA 7.8%, C//Aim 7.8%, VBVer 6.5%, IBIFAs 4.4-6.4%, MCA
6.3%, MDDA 5.2%, QLTD 4.5%, Range 3.8%, 2sMMPO 3.6%, Bucklin 3.1%, DAC 2.9%,
Approval 1.0%.
I expect most readers are skimming the above for the best Condorcet methods.
The best ones are apparently C//A implicit and TACC. C//KH did indeed see less
burial than C//IRV, and WV than margins. But congrats to C//A and TACC.
PUSH-OVER
I define this unusually. Push-over means you gave your least favorite
candidate a top ranking/rating. Normally push-over is defined in terms of what
it can accomplish in IRV, but since my voters don't have that kind of
mentality, it can't be defined that way.
This strategy was not very popular. In Antip it is simultaneously considered
burial and used by 15.2%. 2sMMPO 3.7%, RndP (!) 2.2%, TTR 1.5%, VFAR3 1.5%,
CdlA 1.1%, IRNR 1.1%, Appr (!) 1.0%, Raynaud 0.8%, CWP 0.6%, DMC 0.6%, margins
0.5%...
And we start to get into very small values. IRV was actually one of the "best"
methods with 0.07%. It is possible that IRV would do worse in a different
setup of the simulation. But push-over doesn't seem to be a big concern.
SINCERITY
I called a vote "sincere" if it didn't use any of the above strategies, *or*
the ballot used was the one I called "the" sincere one for the voter. I
wouldn't read too much into this figure.
The best method was Random Ballot, 96.5%. Antip 84.5%, TTR 81.6%, Coombs
81.6%, FPP 79.2%... Hm, do I need to stop once I say FPP was the fifth best?
I'll skip over some then. RndP was only 71.7%. AWPim 71.1%, AWPex 68.8%, DMC
68.3%, CWP 65.4%, C//Aim 63.6%, Approval 61.2%, WV 60.4%, margins 58.9%, QR
57.1%, C//KH 56.8%, IRV 56.8%, ICA 55.7%, VBVer 55.3%, MMPO 54.8%, Borda
54.7%, C//IRV 54.0%, KH 54.0%, TACC 51.0%, DSC 47.8%, IRNR 47.1%, IBIFA er
versions 46.3-46.7%, IFPP 45.8%, MDDA 39.1%, QLTD 35.1%, DAC 34.7%, SMDTR
34.2%, Bucklin 33.6%, VDP 33.0%, VFAR3 32.5%, MCA 27.9%, Range 8.0%. (Range
has it hard with that middle slot.)
SINCERE CONDORCET WINNERS
How often did the election method elect the sincere CW when there was one? The
figures here were quite good. Well done AI voters!
As the numbers were close I'll give brackets:
99+%: AWPim (99.87%), AWPex, CdlA, C//Aim, VFAR3, C//Aex, ICA, VBVer, DMC
98+%: margins, IBIFA (all 3 in a row), MDDA, MCA, AER, Black, MAP, CWP, DAC,
MAMPO, MMPO, WV
97+%: QR, Bucklin, QLTD, VBVst, Baldwin, IRNR, Range
96+%: Raynaud, TTR, SMDTR, DHSC, TACC, Coombs, KH
95+%: Approval, C//IRV
94+%: DSCer(??), C//KH, IRV
rest: IFPP 91.5%, DSC 90.3%, 2sMMPO 87.7%, VDP 82.5%, Borda 81.2%, FPP 80.8%,
RndP 66.2%, Antip 55.9%, RB 45.8%.
Before we throw AWP a parade I want to note that I discovered it can fail the
Plurality criterion. I'm not sure this will be acceptable to everyone.
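For reference, detecting the sincere CW comes down to a pairwise check like
this (my sketch, not the simulator's code, assuming strict complete
rankings):

```python
def condorcet_winner(ballots):
    """Return the candidate who beats every rival head-to-head on the
    given strict, complete rankings, or None if no such candidate
    exists."""
    candidates = {c for ballot in ballots for c in ballot}

    def beats(x, y):
        # x beats y if more than half the voters rank x above y.
        wins = sum(ballot.index(x) < ballot.index(y) for ballot in ballots)
        return wins > len(ballots) / 2

    for c in candidates:
        if all(beats(c, rival) for rival in candidates - {c}):
            return c
    return None
```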
SINCERE CONDORCET LOSERS
The results for this were probably even better. The only methods above 1% were
RB 27.9%, Borda 1.6%, and Antip 1.0%. The best methods were DMC and VFAR3 (I'm
seeing 0%), followed by WV, TTR, SMDTR, IRV, MAMPO, margins, DAC... Very tiny
numbers.
UTILITY MAXIMIZERS
How often did the method elect the utility maximizing candidate? This can be
easily bracketed also.
best: Borda 82.5%.
79+%: VFAR3, CdlA, MAP, C//Aim, AWPim, AWPex, 2sMMPO, VBVer, ICA
78+%: C//Aex, DMC, MAMPO, IBIFA (all), QR, MMPO, MDDA, CWP, margins, Black,
MCA, VBVst
77+%: SMDTR, AER, IRV, Range, DAC, IRNR, TACC, Bucklin, WV, KH, QLTD, TTR,
Baldwin, Raynaud, C//KH, C//IRV
76+%: Approval
75+%: Coombs, DSCer(? again unsure why better than DSC)
rest: Antip 74.3%, IFPP 72.4%, DSC 71.8%, VDP 64.8%, FPP 62.1%, RndP 59.5%, RB
39.7%.
UTILITY MINIMIZERS
How often did the method elect the worst candidate wrt utility? This is not
exactly the reverse of the previous list.
worst: RB 26.3%, RndP 4.2%, FPP 4.0%, VDP 3.3%, Borda 2.6%, IFPP 2.2%, DSC 2.0%
1.0+%: IRV 1.6%, C//KH, C//IRV, TTR, KH, Coombs, Antip, TACC
0.5+%: Raynaud, Approval, IRNR, DSCer, SMDTR, 2sMMPO, Baldwin, Range, QLTD
0.3+%: WV, MMPO, CWP, Bucklin, MAMPO, DAC, MCA
0.2+%: margins, AER, VBVst, MDDA, Black, DMC, IBIFA (er and cb)
rest: QR, IBIFAst, VFAR3, C//Aex, MAP, C//Aim, ICA, CdlA, VBVer, AWPex, AWPim.
No perfect scores.
UTILITY AND REGRET
I defined regret as the winner's absolute shortfall from the best available
option. Sorting by regret gives the same order as sorting by average utility,
so one list suffices. Results are the average utility of the winning
candidate.
best: Borda (65.2), MAP (65.1), CdlA, AWPex, AWPim, VBVer, VFAR3, C//Aim, ICA,
C//Aex, DMC (65.0), IBIFAcb, MDDA, MAMPO, IBIFAst, margins, CWP
64+: Black, IBIFApw, Range, 2sMMPO, MMPO, TACC, SMDTR, MCA, VBVst, DAC, AER,
QR, QLTD, IRNR, WV, Bucklin, IRV, Raynaud, Baldwin, KH, TTR, C//IRV, C//KH,
Approval, DHSC, DSCer, Coombs, IFPP, DSC
rest: Antip (63.2), VDP, FPP (62.5), RndP (61.0), RB (54.4).
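As a worked sketch of the regret definition above (the function name and the
utility mapping are mine; utilities are taken as totals over all voters):

```python
def regret(total_utilities, winner):
    """Regret: the winner's absolute shortfall from the best available
    candidate, measured in total voter utility. Illustrative only."""
    return max(total_utilities.values()) - total_utilities[winner]
```

A method that always elects the utility maximizer would score zero regret in
every trial.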
VOTE STABILITY
I tracked the avg percentage of voters who wanted to change their vote from
poll to poll. At the high end I think we have some garbage values due to there
being many ways to vote that don't really change anything. For example, IFPP
scored 45.1%, which is hard to understand. The voters presumably see a
difference between different lower preferences when actually they are not
affecting anything.
The low end may be more informative. RB had .04%, FPP .25%, Antip .28%,
RndP .29%, Borda .88%, Approval 3.1% (a bit surprisingly low, I thought),
Coombs 3.8%, MAP 5.2%, Black and Baldwin 6ish%, Range 9.5%, IBIFAs 12.3%, TACC
12.6%, DAC 12.7%, ICA 13.3%.
SPOILERS
I wanted to know what percentage of voters perceived that one of the losing
candidates had spoiled the outcome. The voter assumes that, without that
candidate, it would've been a two-candidate FPP race.
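One plausible reading of this check, sketched in code (the names and details
are my guesses at the semantics, not the simulator's actual logic; I assume
strict, complete rankings):

```python
def perceives_spoiler(voter_ranking, sincere_rankings, actual_winner, removed):
    """Remove one losing candidate, re-run the race as plain FPP on
    sincere favorites, and ask whether this voter prefers the
    hypothetical winner to the actual one."""
    counts = {}
    for ranking in sincere_rankings:
        # Each voter's favorite among the remaining candidates.
        top = next(c for c in ranking if c != removed)
        counts[top] = counts.get(top, 0) + 1
    new_winner = max(counts, key=counts.get)
    return (new_winner != actual_winner
            and voter_ranking.index(new_winner) < voter_ranking.index(actual_winner))
```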
RB 34.7%, Antip 26.0%, RndP 21.1%, FPP 11.8%, Borda 10.6%, VDP 10.6%,
2sMMPO 6.8%, DSC 5.9%, IFPP 5.2%, IRV 3.3%, C//KH 3.2%, C//IRV 2.9%, Approval
2.7%, KH 2.6%, Coombs 2.4%, TACC 2.2%, SMDTR 2.0%, TTR 2.0%, Raynaud 1.9%,
Range 1.7%, IRNR 1.7%, QLTD 1.4%, VBVst 1.4%, Bucklin 1.3%, QR 1.3%, WV 1.1%,
MMPO 1.1%, DAC 1.1%, MAMPO 1.0%, CWP .95%, MCA .79%, MDDA .77%, IBIFAs .6-
.69%, margins .61%, DMC .53%, VBVer .35%, C//Aex .27%, ICA .27%, C//Aim .22%,
VFAR3 .21%, CdlA .15%, AWPex .13%, AWPim .07%.
SPOILERS BY 3RD PLACE ONLY
I also wanted to see how many voters felt the race had been spoiled by the
candidate who placed last in top rankings/ratings (and who also did not win
himself).
Worst was Antip 20.4%, RB 19.5%, RndP 11.5%, Borda 6.4%, 2sMMPO 4.0%,
Approval .66%, SMDTR .64%, IRV .57%, Range .35%, C//KH .33%,
MAMPO .32%, ...skip a bit... MCA .096%, ICA, CWP, QLTD, DSC, Bucklin, QR, DAC,
VBVst, FPP (!), VFAR3, C//Aex, AER, C//Aim, CdlA, WV, DMC, IFPP (.029%), AWPex
(.0055%), AWPim, margins (!), TTR.
As far as IRV having *any*: This should be because the presence of the third
candidate affected others' strategy, so it wasn't as simple as eliminating the
3rd candidate and being done with him. As far as FPP doing really well: Surely
this is because nobody was foolish enough to try voting for 3rd place, so,
spoiler averted.
You may ask then: If voters are smart enough to try to avoid spoilers, are
these metrics any good? It's little consolation that last place isn't spoiling
the election when this occurs because he isn't getting any votes. So let's try
one final metric:
TOP RANKINGS/RATINGS OF THIRD PLACE
In other words, for the candidate who received the fewest "TR"s, how many did
he get on average?
Antip was the highest at 47.1%, which is not surprising since you aren't
allowed to vote for fewer than two candidates.
Next were 2sMMPO 33.2%, Approval 29.5%, Range 27.2%, SMDTR 26.2%, MCA 24.6%,
MMPO 23.3%, MAMPO 22.6%, Raynaud 21.9%, CWP 21.6%, WV 20.6%, VBVer 20.3%, MDDA
20.4%, ICA 19.1%, TACC 19.1%, AWPim 18.9%, IRNR 18.9%, CdlA 18.7%, AWPex
18.4%, C//Aim 18.2%, DMC 17.4%, C//Aex 16.9%, RB 16.9%, Coombs 15.8%, AER
15.3%, IBIFAcb/pw 15.1%, MAP 14.8%, DAC 14.5%, margins 14.3%, Bucklin 13.6%,
IBIFAst 13.5%, C//KH 13.5%, KH 12.9%, QLTD 12.4%, Black 12.3%, C//IRV 12.0%,
VBVst 11.8%, VFAR3 11.6%, QR 10.8%, IRV 10.4%, TTR 9.7%, Borda 9.2%, DSC 6.7%,
IFPP 6.8%, VDP 3.5%, FPP 2.3%.
Some of the methods allow compression, and multiple top rankings. It is a bit
unclear what this percentage "should" be. My hunch is that it should be at
least as high as the RB value of 16.9%. If it's below that then it must be
that some candidates are not getting the full top ratings/rankings they would
expect, most probably because their supporters view it as unhelpful to vote
for them.
----
Note that these figures are only gathered for the trials where the AI has
supposedly already learned how the method works. Practice polls and
hypothetical polls aren't included.
Also, I can already say that the rankings I got out of a random (non-spectrum-
based) scenario are not quite the same as the ones here. I'll have to talk
about that later.
Thanks for reading, and for any thoughts.
Kevin Venzke