[EM] Some new utility simulations
Kevin Venzke
stepjak at yahoo.fr
Sun Jun 6 21:08:59 PDT 2010
Hello,
I've been working on a 3-candidate simulation on 1D or 2D issue space
that uses pre-election approval polls to inform truncation strategy in
rank methods.
The basic setup is that the three candidates are bolted in place (i.e.
only one scenario is examined per run), and then there are 10k approval
polls followed by 10k elections. In every one of these trials, blocs of
voters are distributed randomly within a circle around the origin (in
the 2D case).
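To give an idea of the geometry, here is a stripped-down sketch of how
the voter blocs for one trial might be generated. The bloc count, weight
range, and helper names are only illustrative; they're not necessarily
what the sim actually uses.

    import math
    import random

    RADIUS = 100  # voters stay within Euclidean distance 100 of the origin

    def random_point(dims):
        # Uniform within +-100 on each axis, rejecting points farther than
        # RADIUS from the origin (so 2D points fall inside a disc).
        while True:
            p = [random.uniform(-RADIUS, RADIUS) for _ in range(dims)]
            if math.dist(p, [0.0] * dims) <= RADIUS:
                return p

    def make_blocs(n_blocs, dims):
        # One trial: each bloc gets a random position and a random weight.
        return [(random_point(dims), random.randint(1, 100))
                for _ in range(n_blocs)]

    def distance(voter, candidate):
        # Disutility of a candidate for a voter: Euclidean distance.
        return math.dist(voter, candidate)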
The approval polls are a tricky thing. The precision of the polls
probably has a significant effect on the eventual results of the
elections. Precision is affected by the number of voting blocs, the
number of trials, and the frequency of "reboots" in the trials. Reboots
proved necessary because the polling can fall into a rut that wasn't
inevitable given the setup, and I wanted to get the same outcome for the
same setup every time.
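To make the "reboot" idea concrete, here is a rough sketch of the kind
of polling loop I mean (it builds on the helpers sketched above). The
poll-informed approval rule shown here, "approve everyone you like at
least as much as the current poll leader," is just one plausible
strategy, and the reboot interval is arbitrary; the point is that the
perceived standings get reset periodically so an early rut can't persist.

    def approval_poll(candidates, dims, n_blocs=10, trials=10_000,
                      reboot_every=1_000):
        n = len(candidates)
        totals = [0.0] * n
        standings = None                       # None means "zero info"
        for t in range(trials):
            if t % reboot_every == 0:
                standings = None               # reboot: forget earlier polls
            blocs = make_blocs(n_blocs, dims)  # fresh random voters per trial
            counts = [0.0] * n
            for voter, weight in blocs:
                dists = [distance(voter, c) for c in candidates]
                if standings is None:
                    # Zero info: approve candidates at or better than the
                    # voter's mean distance.
                    threshold = sum(dists) / n
                else:
                    # Poll-informed: approve anyone liked at least as much
                    # as the current poll leader.
                    leader = max(range(n), key=lambda i: standings[i])
                    threshold = dists[leader]
                for i, d in enumerate(dists):
                    if d <= threshold:
                        counts[i] += weight
            standings = counts
            for i in range(n):
                totals[i] += counts[i]
        return [x / trials for x in totals]    # avg approval per candidate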
When the polls are done, the simulation moves on to the elections.
Here are the methods (roughly grouped) that are compared at the moment:
A. Approval: Using zero-info above-mean strategy (ApprZIS) and using the
results from the polling (ApprPoll).
B. Range: Normalized sincere (RangeNS).
C. Sincere, strictly ranked methods using no strategy:
FPP, MinMax, IRV, DSC, QR, VFA, SPST.
D. Methods using truncation strategy based on the approval polls (there's
a rough sketch of this ballot right after the list):
MinMax(wv), MinMax(margins), MMPO, C//A, Conditional Approval (CdlA),
DAC, Bucklin.
E. LNHarm methods also using truncation:
IRV-tr, DSC-tr, QR-tr.
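Here is a rough sketch of the two ballot types behind groups A and D.
ApprZIS is the usual "approve everything at or above your mean utility"
rule; for the truncation strategy I'm showing a simplified cutoff (rank
sincerely, but list only the candidates you would approve given the
polls), which is only an approximation of the actual rule. Again this
reuses the helpers from the earlier sketches.

    def appr_zis_ballot(voter, candidates):
        # Zero-info strategy (ApprZIS): approve every candidate whose
        # distance is at or below the voter's mean distance.
        dists = [distance(voter, c) for c in candidates]
        mean = sum(dists) / len(dists)
        return [d <= mean for d in dists]

    def truncated_ranking(voter, candidates, poll):
        # Poll-based truncation (sketch): rank sincerely, but list only
        # the candidates the voter would approve given the poll results
        # (here, anyone liked at least as much as the poll leader).
        dists = [distance(voter, c) for c in candidates]
        leader = max(range(len(candidates)), key=lambda i: poll[i])
        cutoff = dists[leader]
        ranked = sorted(range(len(candidates)), key=lambda i: dists[i])
        return [i for i in ranked if dists[i] <= cutoff]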
I have implemented a couple of other methods but omitted them, either
because they performed worse than expected (Raynaud, 2-slot MMPO) or
because it was too unclear that they would be voted sincerely
(Maj//Antiplurality).
I'm not exactly sure what results to gather from this sim. It's clear
that the choice of scenario has a great effect on the quality of the
various methods, but I can't think of a comprehensive way of evaluating
the situation in general.
Here's a basic 1D scenario with candidates at -50, 0, and 50. (Voters
are placed within +-100 on each available axis, without exceeding a
Euclidean distance of 100 from the origin.)
BEST 1 0 48.16706 0
Bucklin .8856 .0007 48.45712 1.2
MMstrict .8852 .0007 48.45847 1.2
CdlA .8852 .0007 48.45847 1.2
DAC .8851 .0007 48.45937 1.2
RangeNS .8285 0 48.73432 2.4
ApprZIS .7666 0 49.30137 4.9
ApprPoll .7787 0 49.31736 5
DSC .7036 .0034 49.48676 5.7
C//A .71 .0209 49.73658 6.8
MMWV .7097 .0209 49.7375 6.8
QR .6617 .0145 50.05157 8.2
MMmarg .6546 .0539 50.26441 9.1
SPST .5443 .0154 50.91098 11.9
IRV .5622 .0769 51.27729 13.5
IRV-tr .5231 .0515 51.33794 13.8
QR-tr .4975 .0547 51.59578 14.9
MMPO .5885 .1356 51.64151 15.1
VFA .4448 .0778 52.13669 17.3
DSC-tr .4198 .0571 52.22204 17.7
FPP .4192 .0579 52.23298 17.7
WORST 0 1 71.07445 100
"BEST" and "WORST" are what you get if you consistently elect the best
and worst candidate in terms of utility (average distance from voters,
lower being better).
So each method has four values after it: How often the method elected
the best candidate, how often the worst, the average distance of the
winner (the utility), and a normalization of this so that BEST is 0 and
WORST is 100.
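In other words, the far right value is just the third column rescaled so
that BEST maps to 0 and WORST to 100. For example, Bucklin's 1.2 in the
table above:

    best, worst = 48.16706, 71.07445   # BEST and WORST average distances
    bucklin = 48.45712
    normalized = 100 * (bucklin - best) / (worst - best)   # about 1.27

(The displayed values look like they're truncated to one decimal rather
than rounded.)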
In that scenario the candidates were spread fairly evenly. If we want
some center squeeze we could try -10, 0, 10:
BEST 1 0 48.37085 0
DAC .9499 .0072 48.38398 .3
Bucklin .9444 .007 48.3858 .3
MMstrict .9406 .0052 48.38713 .3
CdlA .9406 .0052 48.38713 .3
DSC .9292 .0068 48.39459 .5
QR .9079 .0086 48.41177 .9
SPST .9075 .0086 48.41229 .9
RangeNS .8834 0 48.41618 1
C//A .9021 .0178 48.41858 1.1
MMWV .8963 .0178 48.42112 1.2
QR-tr .8592 .0222 48.44625 1.8
FPP .8505 .0243 48.4507 1.9
IRV-tr .8507 .0308 48.45502 2
DSC-tr .8505 .0291 48.45574 2
MMmarg .8516 .0445 48.46732 2.3
MMPO .8672 .0493 48.47064 2.4
IRV .8512 .0496 48.47273 2.4
VFA .8508 .0496 48.47325 2.4
ApprZIS .7799 .0023 48.51882 3.5
ApprPoll .6096 .0025 49.0423 16.1
WORST 0 1 52.52236 100
One thing I noticed is that Approval and Range tended to be good at not
electing the worst candidate. But they weren't always very good at
electing the best candidate.
Also, there may be something to Abd's thought that first preferences
are a helpful indicator in finding the best winner. Notice that not
only did FPP beat IRV, but IRV with truncation beat full IRV, which runs
a bit contrary to the typical thought that more information is better.
You can see that of these three, FPP was the least likely to pick the
worst candidate, and fully-ranked IRV was the most likely. (They were
about the same at picking the best candidate.)
Something to note is that while you might think FPP must become an
increasingly bad method as the center squeeze gets worse, it's also true
that the more severe the center squeeze, the less certain it is that the
center candidate actually has the median voter's support.
Let's move the center candidate to 8, so that it's a near-clone of the
candidate at 10:
BEST 1 0 48.35805 0
MMstrict .9452 .0196 48.38671 .6
CdlA .9452 .0196 48.38671 .6
DSC .8889 .0204 48.3902 .7
QR .8842 .0224 48.39247 .8
SPST .8836 .0224 48.39254 .8
Bucklin .9041 .0308 48.3965 .9
DAC .872 .0308 48.40057 1
RangeNS .8365 .01 48.40567 1.1
DSC-tr .8322 .0448 48.41082 1.2
QR-tr .8348 .0455 48.41165 1.2
IRV-tr .8323 .046 48.41252 1.3
MMmarg .8298 .0463 48.4129 1.3
C//A .8304 .0423 48.41427 1.3
MMWV .8288 .0421 48.41595 1.4
IRV .8307 .0512 48.418 1.4
VFA .8301 .0512 48.41806 1.4
MMPO .8292 .0482 48.41813 1.4
FPP .8471 .0648 48.43528 1.8
ApprZIS .6941 .0242 48.46894 2.6
ApprPoll .6971 .0283 48.46909 2.6
WORST 0 1 52.4845 100
Happy to see QR doing pretty well here. Bucklin and DAC have been somewhat
dethroned. FPP is back to being bad (which makes sense due to the vote-
splitting) and Approval is still having problems electing the best
candidate.
If we try to confuse the methods by placing the candidates oddly
off-center we can get an interesting result. Place the candidates at
-50, -30, and -20:
BEST 1 0 51.8419 0
DAC .9679 .0041 51.85693 .1
Bucklin .9674 .004 51.8572 .1
MMstrict .9647 .0036 51.8577 .1
CdlA .9647 .0036 51.8577 .1
DSC .9344 .005 51.88032 .3
QR .9193 .007 51.89808 .5
SPST .9175 .007 51.90079 .5
C//A .9357 .023 51.9166 .6
MMWV .933 .023 51.91767 .6
ApprPoll .9407 .0016 51.93407 .8
QR-tr .8892 .0216 51.94456 .9
FPP .8864 .021 51.94644 .9
IRV-tr .8884 .0233 51.94926 .9
IRV .8884 .0243 51.95284 .9
DSC-tr .8864 .0245 51.95435 1
VFA .8866 .0243 51.95555 1
RangeNS .8401 .0002 52.00223 1.4
MMmarg .8921 .0486 52.00734 1.4
MMPO .8819 .0815 52.15816 2.8
ApprZIS .2495 .0007 54.21854 21.4
WORST 0 1 62.9444 100
ApprZIS is predictably much worse than ApprPoll here. The far-left
candidate is not very viable, but with zero information there is no way
to know this. RangeNS also suffers for the same reason. The other methods
cope about the same as before, since they are informed by the polls or
are not approval-based.
There are outcomes very different from these, especially once we move to
2D. Let's do that and try placing the candidates at (-10,0), (10,5), and
(10,-5), so that we have two near-clones on the right who differ slightly
on one axis.
BEST 1 0 65.42126 0
RangeNS .8353 .0276 65.60041 2.7
DSC .8014 .0512 65.70736 4.3
MMstrict .8014 .0441 65.72435 4.6
MMmarg .7879 .045 65.72974 4.7
C//A .7815 .0478 65.74102 4.9
ApprZIS .7527 .0423 65.75481 5.1
DSC-tr .7849 .0749 65.76045 5.2
QR .7912 .0556 65.76487 5.2
IRV-tr .7755 .0562 65.7662 5.2
MMWV .7739 .053 65.76801 5.3
DAC .758 .0597 65.77155 5.3
IRV .7733 .0604 65.79535 5.7
Bucklin .7381 .057 65.80159 5.8
QR-tr .7711 .0726 65.80724 5.9
SPST .782 .0875 65.81032 5.9
MMPO .7542 .058 65.81198 5.9
ApprPoll .7284 .0521 65.81401 6
VFA .7689 .0882 65.82866 6.2
CdlA .7612 .0649 65.83314 6.3
FPP .705 .188 66.30692 13.5
WORST 0 1 71.94286 100
RangeNS is now the best method and DSC is second! DAC, Bucklin, and
especially CdlA have fallen quite a long way in relative terms.
Here's another odd one: put the candidates in a large triangle (as though
they're trying to avoid any voter's actual position), at (-100,-100),
(100,-100), and (0,100):
BEST 1 0 112.2975 0
RangeNS .8921 .0084 113.244 2
ApprZIS .8772 .0135 113.5693 2.7
ApprPoll .8801 .0279 113.6294 2.9
Bucklin .8762 .0272 113.6771 3
DAC .8719 .0271 113.7457 3.1
DSC .8609 .0166 113.9418 3.6
SPST .8537 .0213 114.147 4
VFA .8538 .0213 114.1503 4
FPP .8537 .0216 114.1822 4.1
MMstrict .8499 .0201 114.1887 4.1
DSC-tr .8489 .0236 114.2393 4.2
IRV .8478 .0237 114.2898 4.3
QR .8459 .028 114.3784 4.5
QR-tr .831 .0264 114.6033 5
CdlA .8276 .036 114.7558 5.3
IRV-tr .8113 .031 114.9541 5.8
C//A .807 .0329 115.0151 5.9
MMmarg .8008 .0344 115.1501 6.2
MMWV .7893 .0439 115.4359 6.8
MMPO .778 .0644 115.921 7.9
WORST 0 1 157.8483 100
The Condorcet methods with truncation are the worst, and strictly-ranked
MinMax is actually beaten by FPP. The situation is odd enough that I
may look into it further.
One thing that strikes me so far is that none of the methods really seem
to be that bad! Of course that's a subjective judgment, and it depends a
bit on what scenarios you're interested in. But if one method has a
normalized score (the far right value) of 4% and another has 2%, it's
debatable whether you can call the former "twice as bad." The absolute
difference is probably very small if you look at the third column instead.
For the most part the methods seemed to be fairly sensible.
Kevin Venzke