[EM] Dectecting Clone Sets
Kristofer Munsterhjelm
km-elmet at broadpark.no
Mon Nov 16 23:41:55 PST 2009
fsimmons at pcc.edu wrote:
> Here's a suggestion for detecting clone sets based on Range Ballots:
>
> Define the distance between two candidates as the square root of the
> sum (over the ballots) of the squared diffference of their respective
> ratings.
>
> If the ballots are approval style, this becomes the square root of
> the number of ballots on which just one of the two candidates is
> approved.
>
> Use these distances to do a cluster analysis of the candidates.
>
> The tree structure of the clusters gives the tree structure of the
> clone sets.
>
> If you have a clone dependent method, like Copeland, that you would
> like to "de-clone," you might find this clone set detectiion method
> to be useful.
I wonder if that could be used to fix the problem of my Kemeny-based
forced clustering method, which is that it degrades when the "clusters"
(supporters of various candidates) overlap. It still beats STV (and even
QPQ) according to my simulations, but improvement is improvement, no?
The problem is this: If you have 2 winners, and everybody votes bloc
style for one of two "virtual parties", then the clustering method
minimizes sum distance by allocating every voter to the right bloc.
However, if we add another seat and candidate, call the candidate A, and
A is unanimously voted top, then we'd ideally want the method to pick A,
and then pick the other seats as if A had not been in the running*.
Plain clustering doesn't do that: instead, one of the seats will have A
elected (as he deserves), but the minimization won't pick randomly from
later preferences, so the set of ballots not allocated to that seat
won't be representative of the ballots, had not A run.
One way to fix this is to use elimination, but that's hardly monotonic.
I've been considering adding a hierarchy to the clustering, which is
similar to what you're saying above, but I haven't quite found out how
to do that yet.
Since my multiwinner method uses rankings, I use Kemeny distance to
determine the distance between ballots and a "representative" ballot for
a cluster. Then I simply find the orderings that minimize the total
distance between ballots allocated to the clusters, and to the relevant
cluster's representative ballot, through brute force (which is why the
temporal complexity is so horrible).
In any case, even if your tree clustering idea can't be used for my
method, perhaps the idea of clustering by distance to remove clones
could be adapted to ranked methods by using the Kemeny distance between
ballot orders.
-
* This is true only if the distribution of additional preferences for
those who vote for A is equal to that of the voters in general, A
notwithstanding. There might be additional constraints as well, which is
why I'm using "unanimous A top" as an example - since removing A doesn't
alter the other candidate distributions in any way.
More information about the Election-Methods
mailing list