[EM] Reduction for rated multiwinner methods
Kristofer Munsterhjelm
km_elmet at t-online.de
Fri Mar 27 06:16:29 PDT 2015
On 03/09/2015 10:33 PM, Alexander Praetorius wrote:
> I am not sure if I really ask for whatever you mean by "continuous
> election", but from what you write, I for sure might ask for it compared
> to the other option you give me, which would be a "candidate election".
>
> To express it again in my words. There are "a lot of voters" who can
> each vote on a "number" in the interval of ]-oo,+oo[ and they can change
> their vote in every "tick", where each "tick" might be a few nanoseconds
> after the former tick.
> The "result" of each "tick" is calculated from all the numbers voted on
> in each "tick".
> The "result" will be a number in the interval of [-B, A], where the
> numbers B and A are calculated from the "overall picture of votes" in
> each tick, maybe also taking into account former "ticks".
>
> I'm searching for a method that calculates a mean, where extreme votes
> w>A or w<-B are counted with the value of A or B respectively.
> If a voter does NOT CHANGE his vote in the next "tick" it will be
> counted with the value of the former "tick". (Think of a "tick" as a
> "micro period of a few nanoseconds")
>
> If a vote w>A is cast and a future "tick" changes the value of A, the
> value with which this vote w>A will be counted will adapt to the new A.
> Only if the new A will make w <= A, then w will NOT be affected by A,
> because it is well within the current Interval of [-B,A]
That does seem like what I'm calling a continuous election. It could be
discrete (if you want an integer result), but in that case, it would be
easy to convert to it by rounding.
The setting you give, where extreme votes are disregarded or clipped,
seems to be close to robust statistics: you want to find out some value
or estimate (in your case, of the wishes of the voters) without having
extreme ("outlier") votes affect the result too much. The statistical
perspective makes no assumptions on what the voters may decide to do, as
long as the votes are extreme. So methods taken from robust statistics
should resist extreme-value strategy as long as not too many employ the
strategy.
The two most common approaches to deal with extreme values are trimming
and Winsorizing; and of these, trimming is the more common. For both of
these, your A and B limits come from the value that the voter at the
x-th percentile from the bottom or top voted, for some x. This lets one
trade off the sensitivity of the method against its resilience to
strategy (I'll get back to the implications of that). Both of them also
reduce to methods equivalent to the median when you set x to the highest
level possible.
So in both methods, you have an A and B. Say A is the lower of the two,
i.e. the lower limit. Then if you use trimming, that will throw away
every vote with a rating less than A or greater than B. That doesn't
mean that these votes have no influence at all. If someone votes just
below the current A, and then two others come along and vote very low
values, then because A is based on the percentile (x% from the bottom),
it will shift and the first voter's vote will then be counted.
As an example, consider the votes [-10, -8, -5, 3, 6, 7, 10] and suppose
that A and B are set to the value of the lower and upper 36% of the
votes. This turns out to be the third from either end, so the trimmed
mean throws away the two most extreme votes on either end, while the
Winsorized method clips their values to A and B.
As it stands, the trimmed mean is the mean of [-5, 3, 6] = 1.33, and the
Winsorized mean is the mean of [-5, -5, -5, 3, 6, 6, 6] = 0.86. Now
suppose two voters introduce the extreme value of 1000 each. I've set
the 36% value so that this still is equivalent to removing the two
extremes on either end, even after adding two votes.
After the voters have done so, the full vote list is [-10, -8, -5, 3, 6,
7, 10, 1000, 1000]. The trimmed mean is the mean of [-5, 3, 6, 7, 10] =
4.2, and the Winsorized mean is the mean of [-5, -5, -5, 3, 6, 7, 10, 10,
10] = 3.44.
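
The arithmetic in this example can be checked with a short Python
sketch. The function names and the `pos` parameter (the 1-based
position, counted from each end, of the vote that supplies the limit)
are my own choices for illustration. Fixing `pos` at 3 is an assumption
that matches "third from either end"; a real method would derive it
from the percentile.

```python
def trimmed_mean(votes, pos):
    """Mean of the votes from the pos-th lowest to the pos-th highest."""
    s = sorted(votes)
    kept = s[pos - 1:len(s) - pos + 1]
    return sum(kept) / len(kept)


def winsorized_mean(votes, pos):
    """Mean after clipping every vote into [A, B], where A and B are the
    pos-th lowest and pos-th highest votes."""
    s = sorted(votes)
    a, b = s[pos - 1], s[-pos]
    clipped = [min(max(v, a), b) for v in s]
    return sum(clipped) / len(clipped)


before = [-10, -8, -5, 3, 6, 7, 10]
after = before + [1000, 1000]  # two voters add the extreme value 1000

for label, votes in (("before", before), ("after", after)):
    print(label,
          round(trimmed_mean(votes, 3), 2),
          round(winsorized_mean(votes, 3), 2))
# before 1.33 0.86
# after 4.2 3.44
```

The 1000-votes never enter either sum with their own value. They only
move which votes sit at the limit positions.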
Note that in the trimmed case, the 1000-votes never directly altered
the mean. However, they had an indirect effect by shifting the
window so that the votes with rating 10 were now included. If more
people contributed, voting for, say, 5 and 6, then the rating of 10
might again be pushed back behind the curtain, as it were.
-
You've mentioned that one of the things you find problematic with median
and trimmed/Winsorized methods is that they might lead to sudden
changes. Consider an example like:
[-100 -100 -50 50 100 200].
The median might swing from -100 to 100 if either of these gain a
majority, and this swing, you say, might be too extreme for the voters.
But there's an unavoidable tradeoff here. The method itself can't know if
[-100 -100 -50 50 100 200]
means that there are six honest voters and their real consensus is 16.7
(the total mean here), or if it means that there are two honest voters
and four strategizing extreme voters, and the most extreme voter on the
+ side just happened to write down a larger number than his counterpart
on the - side.
If it is the former, then the method should take all the votes into
account. If it is the latter, then it should modify the extreme votes so
the strategy does not pay off, i.e. it should be as unaffected by those
extreme votes as possible. In other words, it has to have a sudden
change because it treats the example above as [-50 50], and any ordinary
voting system using the mean will have a rather sudden change when you
add, say, a 100 to a list of [-50 50].
Hence, the more the method ignores extreme values, the more it is prone
to shift when additional ballots show that something that used to be
considered extreme no longer should be. Winsorized methods are a little
softer in that regard, but you can still contrive settings where the
jump is rather dramatic, e.g.
[-10^9 -10^9 -1 0 0 1 10^9 10^9]
where the limit is set so that the one-billion votes are clipped to -1
and 1 respectively. Add enough 10^9 votes on the right side and the
upper limit will suddenly shift from 1 to 10^9, pulling the result up to
something greater than a million.
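
To make the size of that jump concrete, here is a small Python sketch.
It assumes, purely for illustration, that the limits stay at the third
vote from each end (the hypothetical `pos` parameter) rather than being
recomputed from a percentile. Under that assumption a single extra 10^9
vote is already enough; with a percentile-based limit it would take
more.

```python
def winsorized_mean(votes, pos):
    """Mean after clipping every vote into [A, B], where A and B are the
    pos-th lowest and pos-th highest votes (1-based positions)."""
    s = sorted(votes)
    a, b = s[pos - 1], s[-pos]
    clipped = [min(max(v, a), b) for v in s]
    return sum(clipped) / len(clipped)


BIG = 10**9
votes = [-BIG, -BIG, -1, 0, 0, 1, BIG, BIG]

print(winsorized_mean(votes, 3))          # limits are -1 and 1
print(winsorized_mean(votes + [BIG], 3))  # upper limit is now 10^9
```

In the first call every billion-vote is clipped to -1 or 1. In the
second, the third vote from the top is itself 10^9, so the three votes
on the right are no longer clipped at all.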
-
Finally, I'd like to answer your Winsorizing question, and then argue in
favor of the median:
> This goes in the right direction.
> But: What if current votes would be [-100, -99,
> -10,-10,-10,-8,7,10,10,10,99,100] or in a more extreme version
> [-10,-10,-10,-10,-10,-8,7,10,10,10,10,10] ?
>
> What could be the "mean" in those two examples?
> How would that be affected, if the voter who chooses his vote to be
> weight w=-8 to switch to 100?
Let's take the first one first. That is,
[-100, -99, -10,-10,-10,-8,7,10,10,10,99,100].
And let's set A and B to the third from each end, in this case -10 and 10.
Then the Winsorized mean is the mean of
[-10, -10, -10,-10,-10,-8,7,10,10,10,10, 10] = -0.0833 as above. Note
that 99 and 100 were clipped to 10, and -99 and -100 were clipped to -10.
Now, suppose the -8 voter altered his vote to 100. Now the full thing is
[-100, -99, -10,-10,-10,7,10,10,10,99,100,100]. The votes at third from
each end are -10 and 99 respectively, so the Winsorized mean is the mean of
[-10, -10, -10,-10,-10,7,10,10,10,99,99,99]
which is 23.67. If the -8-voter altered his vote to a million, the
Winsorized mean would still be 23.67.
This might be what you desired, but suppose that the 99 vote was also
strategic. Then you'd want to have a less sensitive method, e.g. one
that sets A and B to the fourth from each end. If you did that, you'd get:
Winsorized mean first time around:
[-10, -10, -10,-10,-10,-8,7,10,10,10,10, 10] = -0.0833
Full thing after -8 becomes 100:
[-100, -99, -10,-10,-10,7,10,10,10,99,100,100], so A and B are -10 and
10 respectively, and the modified list is
[-10, -10, -10,-10,-10,7,10,10,10,10,10,10]
which gives a Winsorized mean of 1.417.
Here you can see that the further towards the center you set the
barriers, the more it takes to change the value. Yet, since these are
all responsive to the people, it's obvious that with enough added votes,
the result *will* change. That's true for the median as well.
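
The comparison between the two barrier settings can be reproduced with
a short Python sketch. The function name and the `pos` parameter
(1-based position from each end of the vote that supplies the limit)
are just illustrative names, not part of any standard library.

```python
def winsorized_mean(votes, pos):
    """Mean after clipping every vote into [A, B], where A and B are the
    pos-th lowest and pos-th highest votes (1-based positions)."""
    s = sorted(votes)
    a, b = s[pos - 1], s[-pos]
    clipped = [min(max(v, a), b) for v in s]
    return sum(clipped) / len(clipped)


first = [-100, -99, -10, -10, -10, -8, 7, 10, 10, 10, 99, 100]
switched = [100 if v == -8 else v for v in first]    # -8 voter moves to 100
million = [10**6 if v == -8 else v for v in first]   # ... or to a million

for pos in (3, 4):
    print(pos,
          round(winsorized_mean(first, pos), 4),
          round(winsorized_mean(switched, pos), 4),
          round(winsorized_mean(million, pos), 4))
# 3 -0.0833 23.6667 23.6667
# 4 -0.0833 1.4167 1.4167
```

With the limit at the third vote from each end, the switch lets the 99
vote become the upper limit. With the limit at the fourth, the upper
limit stays at 10 and the result moves much less. In both settings it
makes no difference whether the voter switches to 100 or to a million.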
And just for completion's sake, I'll do the other one as well.
[-10,-10,-10,-10,-10,-8,7,10,10,10,10,10]
Take the third from each end as A and B to give A = -10, B = 10, so the
Winsorized mean is here the exact same thing as the ordinary mean,
namely -0.0833.
Now suppose the -8 voter switches to 100:
[-10,-10,-10,-10,-10,7,10,10,10,10,10,100]
Again A and B are -10 and 10 respectively, so the mean is the mean of
[-10,-10,-10,-10,-10,7,10,10,10,10,10,10] = 1.417. It would have been so
even if the -8-voter switched to a million.
(Do note that as the number of votes increases, the A and B spots will
change from being "third from each end". But here we're dealing with the
same number of votes every time, so I've mentioned the cutoff in terms
of list index rather than percentile for simplicity's sake.)
-
The median is an extension of majority rule in this way: suppose you're
deciding on something like a tax rate. If you pick something that's
greater than the median, at least a majority would prefer a lower rate.
If you pick something that is less than the median, at least a majority
wouldn't mind paying more.
So if you want majority rule, where one man has one vote, i.e. that all
those who desire a lower rate than what you proposed pull equally hard,
and all those who desire a higher rate than what you proposed pull
equally hard in the other direction, then median is the way to go.
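
As a rough illustration of this "equal pull" view, the following Python
sketch counts how many voters would pull a proposal down and how many
would pull it up. The preferred rates are made-up numbers and `pulls`
is a hypothetical helper, not taken from any library.

```python
from statistics import median


def pulls(preferred_rates, proposal):
    """Return (voters wanting a lower rate, voters wanting a higher rate)."""
    lower = sum(1 for r in preferred_rates if r < proposal)
    higher = sum(1 for r in preferred_rates if r > proposal)
    return lower, higher


rates = [0, 10, 15, 20, 30]  # hypothetical preferred tax rates, in percent
m = median(rates)            # 15

for proposal in (10, m, 20):
    print(proposal, pulls(rates, proposal))
# 10 (1, 3)
# 15 (2, 2)
# 20 (3, 1)
```

Only at the median do the two sides balance. Any other proposal has a
majority pulling it back towards the median, no matter how far away the
individual voters' preferred rates are.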
If you want to incorporate the strength of preference and depart from
majority rule, e.g. if someone who desires a rate of 0% should pull
harder than someone who desires a rate of 10% when your proposal is 20%,
then the above is no longer applicable. But strategy means that you
might not be able to trust the voters' expressions: they might pick a
more extreme number just to pull more strongly. Thus you get into the
strategy concern above: the more attention you pay to strength, the more
vulnerable your method is to extreme positions. So there is a balance,
and that is particularly true when you don't set any limits on the
numbers that may be submitted as votes.