[EM] Exact spatial model probabilities?

Wed Jan 26 04:53:44 PST 2022

On 26.01.2022 00:10, Daniel Carrera wrote:
> 
> 
> On Tue, Jan 25, 2022 at 3:26 PM Kristofer Munsterhjelm
> <km_elmet at t-online.de <mailto:km_elmet at t-online.de>> wrote:
> 
>> Perhaps the better approach is to use MC or quasi MC. Or meet in the
>> middle somehow, e.g. let f(A>B>C, A.x, A.y, B.x, B.y, C.x, C.y) be the
>> fraction of voters who vote A>B>C when the candidates are at the given
>> coordinates; then use billiard sampling to determine f, and sample using
>> MC or a perturbed over the coordinates. This could perhaps be more
>> accurate than using MC for both picking the candidate locations and
>> counting voter proportions.
> 
> I'm not familiar with billiard sampling. Is this it?
> 
> https://www.sciencedirect.com/science/article/pii/S037722171400280X
> <https://www.sciencedirect.com/science/article/pii/S037722171400280X>

Yes, that's the one. My implementation is here:
https://github.com/kristomu/quadelect/tree/master/src/linear_model

I used it to sample pairs of elections where e.g. in the second
election, A is raised by one step (to check monotonicity failures), or A
is cloned (to check clone dependence). My idea at the time, I think, was
that sampling this constrained election space would cover all of that
space more quickly, so there would be a higher chance of getting a
failure example from a method that's e.g. only nonmonotone in some very
particular case.

> I have an idea that might be very naive, but here it goes. The problem
> is hard in part because you want to compute f(A>B>C) for every
> permutation of candidates. You could make a quicksort-like algorithm to
> iteratively partition the space:
> 
> 1) Start with a population of V voters and C candidates spread in some
> N-dimensional space.
> 
> 2) Select any two candidates {A,B}. Partition the voters based on
> whether they are in the A>B camp or the A<B camp. Now you have two
> disjoint groups. Partition each group based on where they fall on the
> {A,C} contest. Partition those based on where they fall on the {B,C}
> debate, and so on.

That seems related to how you'd determine the Voronoi cells for a
continuous voter distribution. There's an algorithm to calculate a
Delaunay triangulation on c (number of candidates) points in R^d (number
of issues in issue space) in time O(c log c + c^floor(d/2)):
https://team.inria.fr/datashape/files/2017/01/alea.pdf p. 24. The
Delaunay triangulation is a dual of the Voronoi map, so I *think*
(although I'm not sure) that this means you can also get the latter in
the same time complexity.

So when figuring out the proportion who'd vote A>B>C in the continuous
(V->infty) case, you could first create a Voronoi map for all the
candidates. The volume of this region gives you how many voters would
vote A first. Then you'd take the region for A and calculate the Voronoi
map within it for every candidate but A excluded. The subregion that's
now closest to B gives you the number of voters who vote A>B>...; and so on.

For a finite number of voters, there's of course no need to deal with
continuous regions, just recursively partition the V voters. But in this
case, we can directly calculate the voters' rankings: a voter's ballot
is A>B>C if that voter is closest to A, next closest to B, and most
distant to C. This also takes V*C^2 time.

However, that doesn't solve the integral. My idea was to get the (as
close to exact as possible) probability that some random chosen voter
will vote A>B>C. Say that this is p(A>B>C).

Since the voters are independent, this then gives a probability that
some random election of V voters will be of a particular type; e.g. you
can figure out the probability that a three-candidate election will be
e_s = (A>B>C, A>B>C, B>C>A). Let the probability that we will encounter
election e be p(e). Then the probability of the example election is just
p(e_s) = p(A>B>C) * p(A>B>C) * p(B>C>A).

The exact manipulability of some method M is then

sum over all elections e of V voters: manip(M, e) * p(e)

where manip(M, e) is 1 if that election is manipulable, 0 otherwise.

If we set V finite before we calculate the probabilities p(A>B>C) etc.,
then I think the voters are no longer independent and the trick fails.
Either that or we get sampling error so that we could just as well just
directly infer what rank each voter votes.

The reason I'd like to get exact probabilities is that if I want to make
a method that minimizes manipulability for the spatial model, or
optimizes some linear combination of VSE and strategy resistance, then
I'd rather not want it to overfit. And if there's sampling noise, the
method is more likely to overfit.

Really, the fundamental problem is that I have no way to "regularize"
the manipulability optimization problem.

The most obvious measure of complexity is that the election method (for
a particular number of candidates and voters) is a union of convex
polytopes so that if an election falls inside this union, then A wins.
Then the more polytopes in that union, and the more vertices per
polytope, the more complex the method. (IRV would fare very poorly by
this metric.) But fitting a non-convex polytope (as the union would be)
to the smallest number of convex polytopes is NP-hard. So difficulty
abounds.

-km