[EM] A more Condorcetian Hare-based proportional method

Tue Jun 28 05:14:05 PDT 2016

(This is a sketch of the method that I mentioned in my post to Robert,
that I haven't been able to make into an actual method because of a
particular snag.)

Suppose that X and Y are two lists of pairwise comparisons (e.g. X might
be "A>B B>C" and Y might be "C>A C>B B>A") so that the pairwise
combinations can be combined into a linear order (i.e. there are no cycles).

Then let X be stronger than Y if: X's first comparison has a stronger
defeat strength than Y's first, or if they're both equal and X's second
comparison has a stronger defeat strength than Y's, and so on (leximax).
Missing entries count as 0, so a list of defeat strengths (20 10 5) is
stronger than one of (20 10) but not one of (20 11).

Let such a list be maximal if no other list is better than it.
Clearly, when not further constrained, the maximal list gives the
orderings to be reassembled to find the Ranked Pairs winner in a
single-winner election[1].

The idea for multiwinner is then pretty simple, on a high level:

Let a particular ballot subset of size k be maximal if its
maximal list (when only considering the votes in this subset) is no
weaker than any maximal list from any other ballot subset of size k.
(Here, comparing lists from different subsets implies comparing defeat
strengths calculated from the respective subsets.)

Then:

1. Randomly pick a maximal Hare quota-sized subset of the ballots.
2. Feed the associated ordering for the maximal Hare subset to
Ranked Pairs, conducting an RP election on that subset of ballots.
3. Elect the winner and eliminate him from every ballot.
4. If the council is full, we're done.
5. Otherwise remove the Hare subset in question from the total ballot
set and go to 1.

This is basically the same kind of logic as sequential PAV (or
approximation algorithm A for Monroe here:
http://arxiv.org/pdf/1312.4026.pdf ), but with scoring being based on
maximal lists rather than a simple reweighted sum of ratings.

Note that something alike the elimination in step 3 is forced. To see
why, consider an election where everybody agrees A is best. If we don't
eliminate A, every subset's MAM election will choose A.

The "ideal gerrymander" interpretation is also pretty easy to understand
for this method: step one finds the best district to give a
representative. Step two then proceeds to find who that representative
should be, and then all voters in that imaginary district are removed
from consideration and the process continues with what's left.

-

All the magic resides in step 1. We have to find some way of finding and
picking a maximal subset. This is easy if there are no ties at each
level. Suppose for instance that a Hare quota is 100, and A>B is the
highest defeat strength pair we know of with 80 voters saying A>B. Then
we automatically know that all those A>B voters are part of the maximal
subset, along with 20 more we don't know anything about. No matter what
the other pairs in the maximal subset is going to be, A>B has to be in
it for the subset to be maximal to begin with.

Having locked down the A>B voters, we could then look for which other
pairwise preferences are stronger than any others, subject to the
constraint that we can only pick 20 voters outside the already locked-in
subset. And so it continues until the 100 voters have all been locked in.

The snag lies in that I can't see a way of get rid of lookahead, and
with lookahead, having to consider every option may cause an exponential
blowup. Consider this scenario:

At stage 1, A>B and B>C are tied for strongest defeat; say with a Hare
quota's worth of 100 votes.
Suppose it turns out that if we choose B>C, then the next strongest
defeat is C>D with 60 votes, but the A>B majority and B>C majority is
not the same, so if we choose A>B, the next strongest defeat is D>C with 50.
So we should choose B>C rather than A>B because B>C starts a maximal
list while A>B doesn't.
But the method can't know that in advance! So it would presumably have
to consider both A>B and B>C and backtrack once it finds out which are
better. But then we can bury the crucial tie-breaking distinction
beneath enough ties and the cost to try every combination becomes
prohibitive.

Perhaps there is a set covering-esque problem hiding in there somewhere.
Or maybe memoization can somehow help for low numbers of seats by
showing that the Hare quotas are broad enough to exclude many voters in
one swoop, so the combinatorial explosion doesn't occur, or there's
optimal substructure somehow so dynamic programming works... but any
such tricks will make the method very complex.

I don't think that method would be monotone, either, but that's just a
hunch.

-

[1] A similar reduction is possible for River: just state that the lists
must follow River's constraint that no candidate can be on the losing
side of a pairwise comparison more than once.