[EM] Hare clustering MAM-based PR, and a tie problem.

Wed Feb 9 07:57:13 PST 2022

I was going through some old posts of mine and found this idea for a
MAM/Ranked Pairs-based PR method. It's based around the idea of making a
number of virtual constituencies (one for each seat), and then trying to
allocate voters to each so that the case for the winner (according to
the Ranked Pairs method run in that constituency) is as strong as
possible, i.e. the choice of winner or winning order is as justified as
it can be.

When I last had this thought, I stopped early because I couldn't make
sure that it would indeed choose the optimum in each constituency; but
recently, I realized that if it can't, then single-winner Ranked Pairs
can't, either.

But that doesn't make a lot of sense unless I describe just what
single-winner RP's score/metric is, and how it maximizes it, and then
how it (perhaps) doesn't. So I'll do that first, then get to the PR
method afterwards.

Suppose we have a social ordering, A>B>C. Then both Kemeny and RP assign
a score to this ordering, that is a function of the pairwise victories
consistent with this ordering (in this case A>B, B>C, and A>C). Kemeny's
score is the sum of the strength of victories, while RP's is the value
of the strongest victory, ties broken by the value of the next
strongest, ties broken by the value of the third strongest, and so on.

Then both Kemeny and RP, in the absence of ties, calls an election by
producing the social ordering that maximizes its respective score
function. That is, Kemeny chooses the one whose sum of consistent
pairwise victories is maximized, and RP the one whose largest pairwise
victory is maximized, and among these, the one whose next largest
pairwise victory is maximized, and so on.

But if there are ties, then we're in trouble. The Ranked Pairs algorithm
works by starting at the strongest pairwise victory, locking that in,
then proceeding to the next strongest. This is a greedy approach, and it
maximizes the RP score when there are no ties. But if there's a tie,
then it might be that earlier choices close off later choices in ways
that we can't determine from the start.

E.g. suppose the defeat strengths are as follows, in sorted order by
strength:

C>A: 100
D>A: 99
A>B: 80
B>C: 80
B>D: 75
(something else): 60
...

First we admit C>A and D>A. Then we have a choice of either admitting
A>B or B>C; we can't admit both.

So suppose we admit A>B. Then we have {C,D}>A>B; admitting B>C would
produce a cycle, and so would B>D. So we skip, go to the something else,
and the score vector is [100, 99, 80, 60, ...].

On the other hand, suppose we admit B>C. Then we have D>A and B>C>A. Now
we can admit B>D and get B>D>C>A with score vector [100, 99, 80, 75, 60,
...], which is better by the RP metric.

But the problem here is that when we were considering how to break the
strength-80 tie, we didn't know which would close off the ability to
admit B>D. So ordinary RP only maximizes the score vector in the absence
of ties.

So now, the MAM/RP-based method. The idea is this: through linear
programming we can find what's the greatest strength pairwise victory
supported by a Hare quota of the voters without having to determine who
the voters is.

To maximize the RP score in a constituency (subject to the caveats
above), we keep a list of pairwise victories that we've affirmed, just
like in ordinary Ranked Pairs. Then for each unaffirmed pairwise
victory, we use LP to determine which Hare quota consistent with what
we've already locked will maximize the strength of that victory; and
then we choose the one that doesn't produce a cycle, and that has
maximum strength.

For instance, suppose the Hare quota is 1000. When starting out, we find
that it's possible to choose a Hare quota of voters so that everybody
prefers A>B. So we lock A>B.
Then we try every other pairwise preference, and in essence for X>Y ask:
what's the largest support you can give X>Y by choosing a Hare quota of
voters all of whom rank A>B?
Suppose B>C has largest such support 850, while C>D has largest support
900. Okay, then we admit C>D.
Then we ask for all other pairwise preferences X>Y: what's the largest
support you can give X>Y by choosing a Hare quota of voters, all of whom
rank A>B, and 900 of whom rank C>D?

And so on. As we admit more and more pairwise preferences, the Hare
quota's worth of voters becomes increasingly more distinct until the
constituency is uniquely defined. At that point we assign the RP winner
to this constituency, remove these voters (and the winner) from the
election, and repeat with the next constituency.

But the tie problem shows up here, too. What should we do if the number
of seats is rather large so that (for the first step) it's easy to find
a Hare quota all of whom support X>Y, no matter who X and Y are? We have
to choose one of them, but we can't be sure we'll maximize the RP score
because of the path dependence problem I mentioned. So that's where I
got stuck.

But then I realized that single winner RP has this problem too! So it
shouldn't be a strike against the legitimacy of the RP procedure; we
just need to break the tie somehow. Choosing the pairwise preference
with most total support over the whole electorate is probably as good a
way as any.

So this multiwinner method should be implementable, although it would be
very slow: for each step, we'd have to go through every pairwise
preference to find the one with maximum attainable support; and worst
case, we have to go through n^2 steps. That makes for O(n^4) calls to
the LP solver, which, though it's polynomial in the number of voters,
may have a very high polynomial degree indeed.

Perhaps there is some dynamic programming trick that could be used
instead of linear programming, but I don't see it at the moment.

And perhaps there is a way to look ahead and actually maximize the RP
score, but I doubt it is. I suspect it's related to that determining if
candidate X can be made an RP winner by breaking ties in a particular
way -- is NP-complete.

-km