[EM] Summable opinion space discovery.

Kristofer Munsterhjelm km-elmet at broadpark.no
Wed Sep 23 08:56:00 PDT 2009


While trying to find a solution to another problem, I discovered 
something that might be used to opinion space from ballots (and the 
candidates' position in that space) in a summable manner.

Consider a rating- or approval matrix m, where m{voter_1, A} is 
voter_1's rating or approval (0 or 1) of candidate A. Say there are v 
voters and c candidates. Each candidate can now be assigned a 
v-dimensional point according to the ratings by every voter, and we can 
calculate ballot similarity between two candidates by determining the 
distance (according to some metric) between the two candidates' assigned 
points.

"m" itself is obviously not summable, because it depends on the number 
of voters v. However, we might build a distance matrix q where q{A, B} 
is the distance between A and B according to the metric in question. As 
long as the metric itself is summable, q will be.

For instance, using Euclidean distance, q could contain the sum of the 
squares of the coordinate differences, and there could be another array 
(call it w) listing the number of voters who had any opinion about each 
candidate. Then q and w are summable, and we can determine true 
Euclidean distance by doing sqrt(q{A, B}), or normalized Euclidean 
distance (RMSE) by sqrt(q{A,B}/max(w{A},w{B})).

Summing q will only give us the relative distances, however, and not the 
v-dimensional data; but if there is a fixed dimensional opinion space 
and voters adhere to it, then we don't need the full v-dimensional data 
- if there are far more voters than dimensions in opinion space, then 
only the latter is of interest, and if there are fewer, we don't have 
enough information to construct opinion space in any case.

But even if we know opinion space is d-dimensional, v >> d, we still 
only have the relative distances. What we want is to pick points in 
d-dimensional space so that the difference between the distances with 
respect to the "synthesized" points and the recorded distances is 
minimized. In other words, to find c "synthetic" d-dimensional points 
(P.cand_1 ... P.cand_v) so as to solve, least squares wise

   a = sum 1..v
    b = sum 1..v
     dist(P.cand_a, P.cand_b) = q'{a, b}

where q' is the matrix q, post-processed as required (as with the 
Euclidean example there - by applying square root for ordinary 
Euclidean, or division and then square root for normalized Euclidean). 
I'm not sure if this problem can be feasibly solved, but one can reach 
local optima by using algorithms originally designed to determine 
network host location based on latency (relative distance), such as the 
Vivaldi synthetic coordinate algorithm.

-

The intuitive concept is this: if voters consistently rate two 
candidates differently (but not necessarily different in the same way), 
then the two candidates are probably at different positions in opinion 
space. In essence, the voters are "judges" from which one can determine 
the position. The q matrix business is just a way to make it summable.

-

There are two problems with my idea as mentioned above.
First, this won't be summable with all metrics. Say there's a "metric" 
that simply uses extraordinary precision to store all the v coordinates. 
Then this metric imposes on q a limitation that each individual number 
must be represented with a number of digits proportional to v. Even 
though the size of q is independent on v, the capacity required to 
transfer q now depends on v, and thus the system is no longer summable.

Second, consider two candidates that occupy the same place in "true" 
opinion space, but with one (A) being markedly inferior to the other 
(B). Then, voters will consistently rate the two differently, which will 
mean that q{A, B} will be large. Yet, they both occupy the same place in 
true opinion space. One way to handle this would be to normalize the 
ratings. Another would be to simply have d+1 dimensional space, where 
the d+1-th dimension is "ability". There might be hybrid solutions, 
where one forces the first dimension to be related to the mean rating of 
the candidate in question, but I haven't considered that yet.


However, if these problems can be solved, this summable opinion space 
concept could possibly be used to make a summable method that is PR. It 
would probably not be Droop proportional, but if it directly detects 
opinion space and then allocates candidates to reproduce it, it would at 
least be proportional.

Perhaps there is a more direct way of building opinion space (by using 
SVD or PCA), but I never got that to work. In any case, this is a rough 
idea, so don't pick too much on the details :-)



More information about the Election-Methods mailing list