[EM] High precision Yee diagrams

Sat Jan 22 14:44:16 PST 2011

Ordinary Yee diagrams are constructed by, for each point, producing a 
Gaussian distribution centered on that point, and have each sample of 
the Gaussian vote on candidates in order of preferences. This approach 
will always contain some noise, since the sample size is finite.

Some times, having this noise is useful, since it lets us see how the 
voting method acts in the presence of actual noise. Other times, though, 
  it might be useful to see how the voting method acts when there is no 
  noise, to understand what instances of strange behavior, such as 
monotonicity oddities with IRV, don't disappear as you add more and 
more voters who vote in the same manner.

Thus, here's a suggestion for how to make high precision Yee diagrams. 
The primary idea is that instead of sampling from a Gaussian, we can 
simply determine the volume of the Gaussian within a certain region - 
the region of voters that all have the same ranking.

This method would have two steps. In the first step, we determine the 
regions (convex polygons) where every voter inside the region submits 
the same rating. This step has to be done only once per Yee diagram. In 
the second step, we center the Gaussian on a given point (the point 
corresponding to the pixel whose color is to be determined), calculate 
the volume of that Gaussian within each same-voter region, and use each 
region's volume as the weighting for the region's associated ranking.

The second step can be done quite simply. Solve the double integral for 
a Gaussian (with input variable center and standard deviation) with 
respect to a region of the shape of a right angle triangle where two of 
the legs are parallel to the axes. Then, to determine the volume over an 
arbitrary triangle, rotate the axes so that the longer line is parallel 
to one of the axes, and split the triangle into two right angle 
triangles of the above form. To determine the volume over an arbitrary 
convex polygon, just triangulate it and add up the triangle integrals.

This leaves the first step. At first glance, that seems to be 
prohibitive. If we have n candidates, there are n! possible ways to rank 
them, and so one would think that the combinatorial explosion would make 
the algorithm completely impractical. I thought so, too, but it seems 
that dimensionality significantly limits the explosion. For instance, if 
there are regions belonging to candidates A, B, and C, so that the 
distance from a given point in A to the closest point in C is always 
greater than the distance from that point to the closest point in B, 
then there can be no "A > C > B" voters in the A region (or at all, for 
that matter).
Though I think it's possible to contrive examples where the number of 
distinct ballots grow superpolynomially with respect to the number of 
candidates, I think such examples will be unlikely in practice. I have 
no proof either way, though, but if any of you have, go ahead and 
mention it :-)

The first step is recursive. Initially, the whole region is assigned to 
"*" (i.e. no preference), and is defined by a bounding box. For each 
region, we construct the Voronoi diagram of all candidates but the ones 
already specified (which in the initial case is nobody, so we build the 
whole Voronoi diagram). This divides the space into a number of smaller 
regions, to which the same division can be applied: e.g. we find the * 
region to contain an A region, and the A region to contain a B and C 
region (turning the A region into an "A > B" region and an "A > C" 
region), and so on. We end the recursion when all candidates have been 
added to every ranking, and are left with a data structure of the form:

  A > B > C: (vertex coordinates of the convex polygon for all voters 
voting this ranking)
  A > C > B: (vertex coords for "A > C > B")
  B > A > C: (vertex coords for "B > A > C")
  C > A > B: (vertex coords for "C > A > B")

so that actually determining how many (up to scaling) vote for each 
ranking is as simple as getting the integral of the polygon with respect 
to a Gaussian centered on the pixel in question.

-

That should work. It would take some time programming, but it should 
work. It wouldn't work for cardinal methods, though, since the rating 
should in that case somehow reflect the distance between each Gaussian 
sample and the candidates, and since this doesn't use sampling at all, 
there is no sample to get the distance of.

It might be possible to define mean distances and get ratings from that 
(e.g. center of gravity or somesuch) but only for cardinal systems where 
one ballot that gives rating x for candidate A along with another that 
gives rating y for candidate A is equal to one ballot that gives (x+y) to A.

If there's anything I've missed, or if you have any comments, go ahead!