[EM] Reduction for rated multiwinner methods

Fri Mar 27 06:16:29 PDT 2015

On 03/09/2015 10:33 PM, Alexander Praetorius wrote:
> I am not sure if i really ask for whatever you mean by "continuous
> election", but from what you write, i for sure might ask for it compared
> to the other option you give me, which would be a "candidate election".
>
> To express it again in my words. There are "a lot of voters" who can
> each vote on a "number" in the interval of ]-oo,+oo[ and they can change
> their vote in every "tick", where each "tick" might be a few nanoseconds
> after the former tick.
> The "result" of each "tick" is calculated from all the numbers voted on
> in each "tick".
> The "result" will be a number in the interval of [-B, A], where the
> numbers B and A are calculated from the "overall picture of votes" in
> each tick, maybe also taking into account former "ticks".
>
> I'm searching for a method that calculates a mean, where extreme votes
> w>A or w<-B are counted with the value of A or B respectively.
> If a voter does NOT CHANGE his vote in the next "tick" it will be
> counted with the value of the former "tick". (Think of a "tick" as a
> "micro period of a few nanoseconds")
>
> If a vote w>A is casted and a future "tick" changes the value of A, the
> value with which this vote w>A will be counted will adapt to the new A.
> Only if the new A will make w <= A, then w will NOT be affected by A,
> because it is well within the current Interval of [-B,A]

That does seem like what I'm calling a continuous election. The result 
could also be discrete (if you want an integer), but in that case it 
would be easy to get there by rounding the continuous result.

The setting you give, where extreme votes are disregarded or clipped, 
seems to be close to robust statistics: you want to find out some value 
or estimate (in your case, of the wishes of the voters) without having 
extreme ("outlier") votes affect the result too much. The statistical 
perspective makes no assumptions on what the voters may decide to do, as 
long as the votes are extreme. So methods taken from robust statistics 
should resist extreme-value strategy as long as not too many employ the 
strategy.

The two most common approaches to deal with extreme values are trimming 
and Winsorizing; and of these, trimming is most common. For both of 
these, your A and B limits come from the number that the lowest or 
highest x percentile voted, for some x. This lets one adjust sensitivity 
of the method against resilience to strategy (I'll get back to the 
implications of that). Both of them also reduce to methods equivalent to 
the median when you set x to the highest level possible.

So in both methods, you have an A and B. Say A is the lower of the two, 
i.e. the lower limit. Then if you use trimming, that will throw away 
every vote with a rating less than A or greater than B. That doesn't 
mean that these are not taken into account. If someone votes just below 
the current A, and then two others come along and vote a very low 
value, then because A is based on the percentile (x% from the bottom), 
it will shift and the first voter's vote will then be counted.

As an example, consider the votes [-10, -8, -5, 3, 6, 7, 10] and suppose 
that A and B are set to the value of the lower and upper 36% of the 
votes. This turns out to be the third from either end, so the trimmed 
mean throws away the two most extreme votes on either end, while the 
Winsorized method clips their values to A and B.

As it stands, the trimmed mean is the mean of [-5, 3, 6] = 1.33, and the 
Winsorized mean is the mean of [-5, -5, -5, 3, 6, 6, 6] = 0.86. Now 
suppose two voters introduce the extreme value of 1000 each. I've set 
the 36% value so that this still is equivalent to removing the two 
extremes on either end, even after adding two votes.

After the voters have done so, the full vote list is [-10, -8, -5, 3, 6, 
7, 10, 1000, 1000]. The trimmed mean is the mean of [-5, 3, 6, 7, 10] = 
4.2, and the Winsorized mean is the mean of [-5, -5, -5, 3, 6, 7, 10, 10, 
10] = 3.44.
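
A minimal Python sketch of the two calculations, for anyone who wants to 
check the numbers. It assumes the cutoff is given as k, the number of 
votes discarded or clipped at each end (so A and B sit at position k+1 
from the bottom and top). The function names and the k parameter are 
mine, chosen for illustration, and the percentile-to-k conversion is left 
out.

```python
def trimmed_mean(votes, k):
    """Mean after discarding the k lowest and the k highest votes."""
    s = sorted(votes)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def winsorized_mean(votes, k):
    """Mean after clipping the k lowest and k highest votes to A and B."""
    s = sorted(votes)
    a, b = s[k], s[-k - 1]  # A: (k+1)th from the bottom, B: (k+1)th from the top
    clipped = [min(max(v, a), b) for v in s]
    return sum(clipped) / len(clipped)


votes = [-10, -8, -5, 3, 6, 7, 10]
print(round(trimmed_mean(votes, 2), 2))     # 1.33
print(round(winsorized_mean(votes, 2), 2))  # 0.86
print(trimmed_mean(votes, 3))               # 3.0, only the median vote is left
```

The same two functions can be run with k = 2 on the nine-vote list that 
includes the two 1000 votes. Both assume 2k is less than the number of 
votes.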

Note that in the trimmed case, the 1000-votes never directly altered 
the mean. However, they had an indirect effect by shifting the 
window so that the votes with rating 10 were now included. If more 
people contributed, voting for, say, 5 and 6, then the rating of 10 
might again be pushed back behind the curtain, as it were.

-

You've mentioned that one of the things you find problematic with median 
and trimmed/Winsorized methods is that they might lead to sudden 
changes. Consider an example like:

[-100 -100 -50 50 100 200].

The median might swing from -100 to 100 if either of these gains a 
majority, and this swing, you say, might be too extreme for the voters.

But there's an unavoidable tradeoff here. The method itself can't know if

[-100 -100 -50 50 100 200]

means that there are six honest voters and their real consensus is 16.7 
(the total mean here), or if it means that there are two honest voters 
and four strategizing extreme voters, and the most extreme voter on the 
+ side just happened to write down a larger number than his counterpart 
on the - side.

If it is the former, then the method should take all the votes into 
account. If it is the latter, then it should modify the extreme votes so 
the strategy does not pay off, and be as unaffected by those extreme 
votes as possible. In other words, it has to have a sudden change, 
because it treats the example above as [-50 50], and any ordinary voting 
system using the mean will have a rather sudden change when you add, 
say, a 100 to a list of [-50 50]: the mean goes from 0 to 33.3.
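
The arithmetic behind the two readings can be sketched in a few lines of 
Python. The variable names are mine, and "honest" just labels the 
assumption that only the two middle ballots are sincere.

```python
def mean(xs):
    return sum(xs) / len(xs)


ballots = [-100, -100, -50, 50, 100, 200]

# Reading 1: six honest voters, so every ballot counts.
print(round(mean(ballots), 1))         # 16.7

# Reading 2: only the two middle ballots are honest.
honest = [-50, 50]
print(mean(honest))                    # 0.0
print(round(mean(honest + [100]), 1))  # 33.3 once a single 100 is let in
```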

Hence, the more the method ignores extreme values, the more it is prone 
to shift when additional ballots show that something that used to be 
considered extreme no longer should be. Winsorized methods are a little 
softer in that regard, but you can still contrive settings where the 
jump is rather dramatic, e.g.

[-10^9 -10^9 -1 0 0 1 10^9 10^9]

where the limit is set so that the one-billion votes are clipped to -1 
and 1 respectively. Add enough 10^9 votes on the right side and the 
upper limit will suddenly shift from 1 to 10^9, taking the result from 
about zero to something greater than a million.
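
Here is a small Python sketch of that jump. It assumes the limits stay at 
the third vote from each end (k = 2 clipped per side) when a vote is 
added. With a percentile-based cutoff it may take more than one extra 
vote, but the effect is the same. The function name and k are my own 
choices.

```python
def winsorized(votes, k):
    """Return (A, B, mean) with the k lowest/highest votes clipped to A and B."""
    s = sorted(votes)
    a, b = s[k], s[-k - 1]
    clipped = [min(max(v, a), b) for v in s]
    return a, b, sum(clipped) / len(clipped)


votes = [-10**9, -10**9, -1, 0, 0, 1, 10**9, 10**9]

a, b, before = winsorized(votes, 2)
print(a, b, before)        # -1 1 0.0

# One more 10^9 vote on the right pushes a 10^9 vote into the B position.
a, b, after = winsorized(votes + [10**9], 2)
print(a, b, after > 10**6)  # -1 1000000000 True
```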

-

Finally, I'd like to answer your Winsorizing question, and then argue in 
favor of the median:

> This goes in the right direction.
> But: What if current votes would be [-100, -99,
> -10,-10,-10,-8,7,10,10,10,99,100] or in a more extreme version
> [-10,-10,-10,-10,-10,-8,7,10,10,10,10,10] ?
>
>     What could be the "mean" in those two examples?

> How would that be affected, if the voter who chooses his vote to be
> weight w=-8 to switch to 100?

Let's take the first one first. That is,
[-100, -99, -10,-10,-10,-8,7,10,10,10,99,100].

And let's set A and B to the third from each end, in this case -10 and 10.

Then the Winsorized mean is the mean of
[-10, -10, -10,-10,-10,-8,7,10,10,10,10, 10] = -0.0833 as above. Note 
that 99 and 100 were clipped to 10, and -99 and -100 were clipped to -10.

Now, suppose the -8 voter altered his vote to 100. Now the full thing is
[-100, -99, -10,-10,-10,7,10,10,10,99,100,100]. The votes at third from 
each end are -10 and 99 respectively, so the Winsorized mean is the mean of
[-10, -10, -10,-10,-10,7,10,10,10,99,99,99]

which is 23.67. If the -8-voter altered his vote to a million, the 
Winsorized mean would still be 23.67.

This might be what you desired, but suppose that the 99 vote was also 
strategic. Then you'd want to have a less sensitive method, e.g. one 
that sets A and B to the fourth from each end. If you did that, you'd get:

Winsorized mean first time around:
[-10, -10, -10,-10,-10,-8,7,10,10,10,10, 10] = -0.0833
Full thing after -8 becomes 100:
[-100, -99, -10,-10,-10,7,10,10,10,99,100,100], so A and B are -10 and 
10 respectively, and the modified list is
[-10, -10, -10,-10,-10,7,10,10,10,10,10, 10]
which gives a Winsorized mean of 1.417.

Here you can see that the further towards the center you set the 
barriers, the more it takes to change the value. Yet, since these are 
all responsive to the people, it's obvious that with enough added votes, 
the result *will* change. That's true for the median as well.
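
For those who want to try other cutoffs, this Python sketch reruns the 
first example with the limits at the third and then the fourth vote from 
each end. The function and the k parameter (number of votes clipped per 
side) are illustrative names of mine.

```python
def winsorized_mean(votes, k):
    """Mean after clipping the k lowest and k highest votes to A and B."""
    s = sorted(votes)
    a, b = s[k], s[-k - 1]
    return sum(min(max(v, a), b) for v in s) / len(s)


before = [-100, -99, -10, -10, -10, -8, 7, 10, 10, 10, 99, 100]
after = [100 if v == -8 else v for v in before]  # the -8 voter switches to 100

for k in (2, 3):  # limits at the third, then the fourth, vote from each end
    print(k, round(winsorized_mean(before, k), 4),
          round(winsorized_mean(after, k), 4))
# 2 -0.0833 23.6667
# 3 -0.0833 1.4167
```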

And just for completeness' sake, I'll do the other one as well.
[-10,-10,-10,-10,-10,-8,7,10,10,10,10,10]

Take the third from each end as A and B to give A = -10, B = 10, so the 
Winsorized mean is here the exact same thing as the ordinary mean, 
namely -0.0833.

Now suppose the -8 voter switches to 100:
[-10,-10,-10,-10,-10,7,10,10,10,10,10,100]

Again A and B are -10 and 10 respectively, so the mean is the mean of
[-10,-10,-10,-10,-10,7,10,10,10,10,10,10] = 1.417. It would have been so 
even if the -8-voter switched to a million.

(Do note that as the number of votes increases, the A and B spots will 
change from being "third from each end". But here we're dealing with the 
same number of votes every time, so I've mentioned the cutoff in terms 
of list index rather than percentile for simplicity's sake.)

-

The median is an extension of majority rule in this way: suppose you're 
deciding on something like a tax rate. If you pick something that's 
greater than the median, at least a majority would prefer a lower rate. 
If you pick something that is less than the median, at least a majority 
wouldn't mind paying more.

So if you want majority rule, where one man has one vote, i.e. that all 
those who desire a lower rate than what you proposed pull equally hard, 
and all those who desire a higher rate than what you proposed pull 
equally hard in the other direction, then median is the way to go.
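
To make the majority argument concrete, here is a small Python sketch. 
The list of preferred rates is made up for illustration, and it assumes 
each voter simply wants the proposal moved towards their own preferred 
rate.

```python
from statistics import median

# Hypothetical preferred tax rates (in percent), one per voter.
rates = [5, 10, 10, 15, 20, 25, 40]


def pull(rates, proposal):
    """Count the voters who want a lower rate and those who want a higher one."""
    lower = sum(1 for r in rates if r < proposal)
    higher = sum(1 for r in rates if r > proposal)
    return lower, higher


print(median(rates))    # 15
print(pull(rates, 20))  # (4, 2): a majority of the 7 wants a lower rate
print(pull(rates, 10))  # (1, 4): a majority of the 7 wants a higher rate
print(pull(rates, 15))  # (3, 3): at the median, neither side has a majority
```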

If you want to incorporate the strength of preference and depart from 
majority rule - e.g. if someone who desires a rate of 0% should pull 
harder than someone who desires a rate of 10% if your proposal is 20%, 
then the above is no longer applicable. But strategy means that you 
might not be able to trust the voters' expressions: they might pick a 
more extreme number just to pull more strongly. Thus you get into the 
strategy concern above: the more attention you pay to strength, the more 
vulnerable your method is to extreme positions. So there is a balance, 
and that is particularly true when you don't set any limits on the 
numbers that may be submitted as votes.