[EM] Another 3-slot method

Fri Feb 20 17:01:22 PST 2004

Each ballot in the ballot set B is converted into a vector of ones, zeros,
and negative ones in the most natural way that you can imagine.

Next, a special set S of vectors, each of which has all components set to
zero except two components, a one and a zero, is also created.

The "distance" from each ballot vector to each member of S is computed.

The member of S whose sum of distances from the ballot vectors is smallest
picks the winner (the candidate corresponding to the one) and the loser
(the candidate corresponding to the minus one).

Note that if there are N candidates, the set S has N*(N-1) vectors, so
this method is summable in a data structure of that size.

Different ways of gauging distance give rise to different variations of
the method.

The Hamming distance (i.e. sum of absolute differences in vector
components) is unacceptable because it gives too much weight to large
families of clones. [Each member of the family gets a full component all
to itself, and all of those components receive full weight in the Hamming
distance calculation.]

The "root mean square" or Euclidean distance suffers from the same
problem.

The "sup norm" or L_infinity norm distance (i.e. the max of the absolute
differences in components) solves the clone problem, but is too
insensitive, in my opinion.

Here's a "distance" that is (in my humble opinion) much more appropriate:

Let v and w be two vectors whose components are all members of the set

     {-1,0,1} .

The distance from v to w is a weighted sum of the absolute differences in
components, where the weight p_k for the k_th component is chosen as
follows:

Let x1 and y1 be the number of components equal to one in v and w,
respectively.  Let x2 and y2 be the number of minus ones in v and w
respectively.

If v_k is a one and w_k is a zero, then the weight p_k is 1/x1.

If v_k is a one and w_k is a minus one, then p_k is 1/(x1+y2).

If w_k is a one and v_k is zero, then the weight p_k is 1/y1.

If w_k is a one and v_k is a zero, then p_k is 1/(y1+x2).

If v_k is a minus one and w_k is zero, then p_k is 1/x2.

If w_k is a minus one and v_k is zero, then p_k is 1/y2.

In all other cases p_k is zero.

Note that the only way that the weighted sum can be zero is if the two
vectors v and w are identical.

The intent of this rather elaborate scheme of weights is to force a family
of clones to share their weights, so that they do not have undue influence
on the distance.  [Members of the same family tend to end up in the same
slot, so (the non-zero) slots have their weights divvied up among the
co-habitants of the slot.]

Perhaps somebody can take the idea for this distance and simplify it, or
at least simplify its description.

Generally speaking, methods that minimize sums of distances to the winning
vector tend to have high Condorcet efficiency.  When the ballot space is
"linear" (with respect to the distance) then the median ballot is the one
that minimizes the sum of distances, so minimizing sums of distances in
cases where there is no CW can be thought of as generalizing the one
dimensional CW in a different direction.

Kemeny's method works along these lines, but his "distance" is as
sensitive to clones as the Hamming distance, and for similar reasons.

In Kemeny's case the set S is replaced by the set P of all permutations of
the candidates, and the distances between permutations (i.e. between
ballot rankings and members of P) are calculated as the number of
transpositions required to convert one permutation into the other.

Imagine the number of transpositions required to move a candidate through
a large family of clones, and you can see the problem with this way of
measuring distance.

That's all for now.

Enjoy!

Forest