[EM] How close to IIAC?

Fri Apr 23 13:51:16 PDT 2010

Suppose that voters fill out questionnaires with twenty or more yes/no questions.  What is a good way to 
calculate the “distance” between questionnaires?  (Remember that this is the key to getting an 
IIAC-compliant voting system.)

The big problem is that some of the questions are apt to be clones of each other.  Suppose for example 
that of the twenty questions on a questionnaire, the first fifteen were basically the same question in 
disguise.  Then almost all of the voters who answered yes on the first question would also answer yes on 
the next fourteen.  Those fourteen questions would not only be redundant; they would also distort the 
perceived distance between questionnaires under any of the standard metrics (Hamming, Euclidean, etc.) 
on vectors of zeroes and ones.
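
To make the distortion concrete, here is a small sketch with made-up ballots (the three voters a, b, c and the helper name hamming are only for illustration).  Voter b differs from a on the one cloned issue, while voter c differs from a on all five of the other issues, yet plain Hamming distance puts b three times as far from a as c is.

```python
def hamming(a, b):
    """Number of questions on which two yes/no ballots disagree."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical ballots: questions 1-15 are clones, questions 16-20 are distinct issues.
a = [1] * 15 + [1, 1, 1, 1, 1]
b = [0] * 15 + [1, 1, 1, 1, 1]   # disagrees with a on the single cloned issue
c = [1] * 15 + [0, 0, 0, 0, 0]   # disagrees with a on five distinct issues

print(hamming(a, b), hamming(a, c))
```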

First suggestion:
Have each voter assign weights to the questions to reflect their relative importance to that voter.  Then 
normalize each voter's weights so that they add to 100.  Then given two questionnaires q1 and q2, the 
semi-metric rho(q1,q2) is the sum of q1’s weights on all of the questions on which q1 and q2 disagree.  
This is a measure of how far the q1 voter thinks that the q2 voter differs from her on important 
questions.  In this first suggestion the proposed metric is

   d(q1,q2)=rho(q1,q2)+rho(q2,q1).
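
A minimal sketch of this first suggestion follows.  It assumes each questionnaire is stored as a pair (answers, weights) of equal-length lists; that representation and the function names normalize, rho and distance are choices made here for illustration only.

```python
def normalize(weights, total=100.0):
    """Rescale one voter's raw weights so that they add up to `total`."""
    s = sum(weights)
    if s <= 0:
        raise ValueError("weights must have a positive sum")
    return [total * w / s for w in weights]

def rho(answers1, weights1, answers2):
    """Sum of voter 1's weights over the questions where the two voters disagree."""
    return sum(w for x, y, w in zip(answers1, answers2, weights1) if x != y)

def distance(q1, q2):
    """d(q1,q2) = rho(q1,q2) + rho(q2,q1), with q = (answers, normalized weights)."""
    (a1, w1), (a2, w2) = q1, q2
    return rho(a1, w1, a2) + rho(a2, w2, a1)

# Hypothetical questionnaires on four questions.
q1 = ([1, 1, 0, 1], normalize([1, 1, 1, 1]))   # weights 25, 25, 25, 25
q2 = ([1, 0, 0, 0], normalize([6, 2, 1, 1]))   # weights 60, 20, 10, 10
print(distance(q1, q2))
```

Here the two voters disagree on the second and fourth questions, so rho(q1,q2) = 25 + 25 = 50 and rho(q2,q1) = 20 + 10 = 30, giving d = 80.  Note that rho by itself is not symmetric; only the sum d is.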

Second suggestion:
1.	Create a binary tree with the questionnaires as the leaves and a subset of the questions as 
nodes as follows.  The root node is the question on which the voters are most evenly balanced (break 
ties randomly).  Each subsequent node X is the question on which the voters whose answers lead to 
that node are most evenly divided (breaking ties randomly).
2.	Once all of the questionnaires have been classified as leaves on this binary tree, assign to 
each question a weight equal to the probability that a random leaf has that question as an ancestor.
3.	The distance between two questionnaires is the sum of the weights of the questions on which 
they differ.
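
The three steps above can be sketched as follows.  Several details are assumptions made here and are not specified in the steps: every ballot counts as its own leaf (so identical ballots are counted separately), a branch stops when no question divides the remaining ballots, and ties are broken with a seeded random generator so that runs are repeatable.  The names question_weights and tree_distance are hypothetical.

```python
import random
from collections import Counter

def question_weights(ballots, rng=None):
    """Weight of each question = fraction of ballots with it on their root-to-leaf path."""
    rng = rng or random.Random(0)      # assumption: seeded tie-breaking
    n_questions = len(ballots[0])
    counts = Counter()                 # question -> number of ballots below a node using it

    def split(group):
        # Imbalance |yes - no| for every question that actually divides this group.
        imbalance = {}
        for q in range(n_questions):
            yes = sum(b[q] for b in group)
            if 0 < yes < len(group):
                imbalance[q] = abs(2 * yes - len(group))
        if not imbalance:
            return                     # nothing left to split on: these ballots are leaves
        best = min(imbalance.values())
        q = rng.choice(sorted(k for k, v in imbalance.items() if v == best))
        counts[q] += len(group)
        split([b for b in group if b[q]])
        split([b for b in group if not b[q]])

    split(list(ballots))
    return [counts[q] / len(ballots) for q in range(n_questions)]

def tree_distance(b1, b2, weights):
    """Sum of the weights of the questions on which two ballots differ."""
    return sum(w for x, y, w in zip(b1, b2, weights) if x != y)

# Toy profile: the first two questions are clones, the third is independent.
ballots = [(1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
w = question_weights(ballots)
print(w, tree_distance(ballots[0], ballots[2], w))
```

In this toy profile the two clone questions end up sharing a total weight of 1 however the ties fall, so ballots that differ only on the cloned issue are at distance 1 rather than the Hamming distance of 2.  This is only a check on one small example, not a claim about the method in general.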

Any other good ideas?