[EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

Warren Smith warren.wds at gmail.com
Sun Mar 4 14:44:32 PST 2012

```On Sun, Mar 4, 2012 at 3:44 PM, Richard Fobes
<ElectionMethods at votefair.org> wrote:
> Finally, after reading the articles cited by Warren Smith (listed at the
> bottom of this reply) plus some related articles, I can reply to his
> insistence that Condorcet-Kemeny calculations take too long to calculate.
>  Also, this reply addresses the same claim that appears in Wikipedia both in
> the "Kemeny-Young method" article and in the comparison table within the
> Wikipedia "Voting systems" article (in the "polynomial time" column that
>
> One source of confusion is that Warren, and perhaps others, regard the
> Condorcet-Kemeny problem as a "decision problem" that only has a "yes" or
> "no" answer.  This view is suggested by Warren's reference (below and in
> other messages) to the problem as being NP-complete, which only applies to
> decision problems.  Although it is possible to formulate a decision problem
> based on one or more specified characteristics of the Condorcet-Kemeny
> method, that is a different problem than the Condorcet-Kemeny problem.

--the optimization problem is at least as hard as the decision
problem.You are erroneously creating the impression I somehow
was unaware of this, or that you somehow have here got some new
insight.  Neither is true.

> In the real world of elections, the Condorcet-Kemeny problem is to calculate
> a ranking of all choices (e.g. candidates) that maximizes the sequence score
> (or minimizes the "Kemeny score").
>
> Clearly the Condorcet-Kemeny problem is an optimization problem, not a
> decision problem (and not a search problem).  It is an optimization problem
> because we have a way to measure how closely the solution reaches its goal.
>
> (For contrast, consider the NP-hard "subset sum problem" in which the goal
> is to determine whether a specified list of integers contains a subset that
> can be added and/or subtracted to yield zero.  Any subset either sums to
> zero or it doesn't sum to zero.  This makes it easy to formulate the related
> decision (yes/no) problem that asks whether such a subset exists for a given
> set of numbers.)

> Because the Condorcet-Kemeny problem is an optimization problem, the
> solution to the Condorcet-Kemeny problem can be an approximation.  If this
> approach is used, it becomes relevant to ask how closely the approximation
> reaches the ranking that has the highest sequence score. Yet even this
> question -- of "how close?" -- is not a decision problem (because it goes
> beyond a yes or no answer).
>
> Keeping in mind that VoteFair popularity ranking calculations are
> mathematically equivalent to the Condorcet-Kemeny method, my claim is that
> VoteFair popularity ranking calculations yield, at the least, the same
> top-ranked choice, and the same few top-ranked choices, as the solution
> produced by examining every sequence score -- except (and this is the
> important part) in cases where the voter preferences are so convoluted that
> any top-ranked choice and any few top-ranked choices would be controversial.
>  As one academic paper elegantly put it: "garbage in, garbage out".
>
> More specifically, here is a set of claims that more rigorously state the
> above ambiguous claim.
>
> Claim 1: For _some_ _instances_, a polynomial-time calculation can identify
> the full ranking that produces the highest Condorcet-Kemeny sequence score.

--oh whoo-whee.   Here's another claim: for SOME planets, I can
readily find a million dollars in gold piled up right next to me.

> Claim 2: For _some_ _instances_, a polynomial-time calculation can rank the
> top most-popular candidates/choices and this partial ranking will be the
> same as the top portion of the full ranking as determined by identifying the
> highest Condorcet-Kemeny sequence score.
>
> Claim 3: For the _remaining_ _instances_ (not covered in Claims 1 and 2), an
> approximation of the full Condorcet-Kemeny ranking can be calculated in
> polynomial time.

--what kind of "approximation"?  I can find an "approximation" to
a million dollars in gold, namely, 1 penny.

> Claim 4: For any cases in which the top-ranked candidate/choice according to
> the VoteFair popularity ranking algorithm differs from the top-ranked
> candidate/choice according to a full calculation of all sequence scores, the
> outcome of a runoff election between the two candidates/choices would be
> difficult to predict.
>
> As done in the academic literature, I am excluding the cases in which more
> than one sequence has the same highest sequence score.

--I'm not sure what that meant, but it sounds like garbage too.

> To help clarify the validity of these claims, I'll use an analogy.
>
> Consider a special case of the rigorously studied Traveling Salesman Problem
> (TSP), which is NP-hard to solve.  (The TSP also can be expressed as a
> decision problem, in which case the decision problem is NP-complete, but
> that variation is not the problem discussed here.)
>
> The special case -- which I will refer to as the non-returning Traveling
> Salesman Problem -- is that we want to know which city the salesman visits
> first, and we want to know, with successively less interest, which city the
> salesman visits second, third, and so on.  Additionally, for this special
> case, we specify that the cities to be visited are roughly located between a
> beginning point "B" and and ending point "E".
>
> To make this special case mathematically equivalent to the normal Traveling
> Salesman Problem in which the salesman returns to the starting city, we
> create a path of closely spaced cities (labeled "+" below) that lead back to
> the starting city "B".
>
> Here is a diagram of this problem.  Remember that the most important thing
> we want to know is which city ("*") the salesman visits first.
>
> B = Beginning city
> * = City to visit
> E = Ending city for main portion
> + = City on path back to beginning
> (periods = background; assumes monospace font)
>
> Instance 1:
> .................................................B.
> .....................................*............+
> ..................................................+
> .....................................*............+
> ...................................*..............+
> ..............................*...................+
> ..................................................+
> ................................*.................+
> .........................*........................+
> ......................*.....*.....................+
> ..................................................+
> ..................*..*.....*......................+
> ..........*....*..................................+
> .......*...............*..........................+
> ..........*......*................................+
> .....*...............*............................+
> .........*....*.........*.........................+
> ..........*........*..............................+
> .............*....................................+
> E.................................................+
> +.................................................+
> +.................................................+
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>
> In this case it is obvious which city is the first one on the path from B to
> E.  And it is obvious which are the next four cities on the path.
>
> What we do not know is the sequence of cities after that (for the path that
> is shortest).

--golly,we are taking up time on a red herring aren't we?
To be clear, let me state a few facts that may have escaped Fobes.

For traveling saleman problem (TSP),

1. decision problem - is there a tour shorter than X? - is NP-hard.

2. optimization problem - finding best tour, is NP-hard.

3. FInding even the first step in the best tour, is NP-hard.

4. Approximate optimization problem: finding an approximately best
tour (for a general distance matrix) to within a factor of 9999
billion, is NP-hard.

5. Finding just the first edge, on any tour (not necessarily the best
tour) whose total tour-cost is within a factor of 9999 billion of the
cost of the optimal tour... is NP-hard.

Are you getting the picture yet?  Don't be fooled by Fobes trying to
act as though I had somehow not realized this.  I knew all this ages
ago,
and tried (unsuccessfully) to impart some semblance of a clue to
Fobes.  OK, back to Fobesian essay now...

> Now let's consider a different instance of this non-returning Traveling
> Salesman Problem.
>
> Instance 2:
> .................................................B.
> ..........................*.......................+
> ........................*....*....................+
> ................*.........*...*...................+
> .............*.........*....*...*.*...............+
> ................*...*......*.....*...*............+
> .......................*......*...*......*........+
> ..........*......*.........*......*...*...........+
> .............*........*.........*......*..........+
> ..................*.........*......*..............+
> .........*.....*.......*..........................+
> .............*.....*..........*....*..............+
> ..................*..*.....*......................+
> ..........*....*..................................+
> .......*...............*..........................+
> ..........*......*................................+
> .....*...............*............................+
> .........*....*.........*.........................+
> ..........*........*..............................+
> .............*....................................+
> E.................................................+
> +.................................................+
> +.................................................+
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>
> In this instance we cannot know which city is the first city on the shortest
> path until we know the shortest path through all the cities.
>
> Calculating the absolute shortest path in a convoluted case like Instance 2
> might require a calculation time that is super-polynomial (more than what
> can be expressed as a polynomial function of the city count).
>
> However, we can estimate the shortest path.
>
> Such an approximation might identify a first city that is different from the
> first city on the absolute shortest path. If the "wrong" city is identified
> as the first-visited city, it is understandable that this occurs because
> there is not a clearly identifiable first-visit city in this instance.
>
> This analogy can be extended to the Condorcet-Kemeny problem.
>
> In normal election situations, the most important part of the solution is
> the first-ranked winner.  In fact, most voting methods are not _designed_ to
> identify more than the first-ranked winner.
>
> In contrast, the Condorcet-Kemeny problem is designed to identify a full
> ranking.  Accordingly, the second-most important part (of solving the
> Condorcet-Kemeny problem) is to identify the top few highest-ranked choices.
>
> Both of these important goals can be achieved without fully ranking all the
> choices.  This is analogous to solving Instance 1 of the non-returning
> Traveling Salesman Problem.

--In the TSP with general distance matrix, I repeat, even finding just
THE ONE FIRST STEP of the best tour, or any non-best but approximately
best tour, is NP-hard.  In the Kemeny problem, just finding the winner
alone, without trying to find the rest of the order, still is NP-hard.
I knew all this, and said all this, to Fobes, ages ago.  One
day maybe it will penetrate.

> The importance of calculating the few top-ranked choices, and the reduced
> importance of calculating the lower-ranked choices, is further demonstrated
> when the Condorcet-Kemeny method is used to aggregate (merge/join/etc.)
> separate rankings from different search engines (to yield "meta-search"
> results, which is the intended goal specified by IBM employees who authored
> one of the cited articles about Condorcet-Kemeny calculations).
>  Specifically, a search-engine user is unlikely to look at the search
> results beyond the first few pages, which means that carefully calculating
> the full meta-search ranking for thousands of search results is pointless,
> and therefore the calculation time for a full ranking is irrelevant.
>
> (As a further contrast, to clarify this point about a partial solution being
> useful, the subset-sum problem does not have a partial solution. All that
> matters is the existence of at least one solution, or the absence of any
> solution.)
>
> Therefore, in some instances we can solve the NP-hard Condorcet-Kemeny
> problem "quickly" (in polynomial time) in the same way that we can "quickly"
> (in polynomial time) solve some instances -- such as Instance 1 -- of the
> NP-hard non-returning Traveling Salesman Problem.

--and in some instances, there is a pile of gold right next to me.
This is laughable.  The statement "in some instances my algorithm can
work" is essentially equivalent to the statement "my algorithm does
not work."

It is NOT ACCEPTABLE to have a voting algorithm that works only "in
or they do not.  "work" means always.  Not sometimes.   If they even
fail
one time, then it was an invalid algorithm.

I'm really annoyed that I have to keep on doing this.   You need
to take computer science 101.

> In instances where we use an approximate solution for the Condorcet-Kemeny
> problem, the approximate solution can be calculated in polynomial time.

--again, the use of the catch-all, utterly meaningless, word
"approximate."  1 penny is an "approximation" to 1 million dollars. It
is not a very good approximation.  With no goodness guarantee, this is
all totally useless.

When Fobes says "I have an approximation" it is equivalent to "I am
dreaming, but I feel very good in my dream, so why doesn't the rest of
the world feel good?"  Because you have no guarantee, so you have
nothing.  That's why.  "Fobes feels good" is simply NOT ACCEPTABLE as
a justification for a voting algorithm.

>  Specifically, the algorithm used for VoteFair popularity ranking, which
> seeks to maximize the Condorcet-Kemeny sequence score, always can be solved
> in polynomial time (as evidenced by all the programming loops being
> bounded).

--And I can "seek to find a million dollars in gold" using an
algorithm guaranteed to stop in 1 minute.  I can absolutely guarantee
it.
So what?  Why should anybody care?

> To further clarify these points,

--by which Fobes means "to further try to obscure the truth at great length"...

>consider the following instance of the
> non-returning Traveling Salesman Problem.
>
> Instance 3:
> .................................................B.
> ..........................*.......................+
> ........................*....*....................+
> ................*.........*...*...................+
> .............*.........*....*...*.*...............+
> ................*...*......*.....*...*............+
> .......................*......*...*......*........+
> .................*.........*......*...*...........+
> .............*........*.........*......*..........+
> ..................*.........*......*..............+
> .......................*..........................+
> ...................*..............................+
> ..................*..*............................+
> ..........*....*..................................+
> .......*...............*..........................+
> ..........*......*................................+
> .....*...............*............................+
> .........*....*.........*.........................+
> ..........*........*..............................+
> .............*....................................+
> E.................................................+
> +.................................................+
> +.................................................+
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>
> For this instance, we can calculate the absolute shortest path through the
> group of cities closest to the starting point "B" without also calculating
> the absolute shortest path through the group of cities closest to the ending
> point "E".
>
> Similarly some instances of the Condorcet-Kemeny problem do not require
> calculating the exact order of lower-ranked choices (e.g. candidates) in
> order to exactly find the maximum-sequence-score ranking of the top-ranked
> choices.

> Now that the word "instance" and the concept of a partial order are clear, I
> will offer proofs for Claims 1, 2, and 3.

--great.  A ton of irrelevant diagrams about an unrelated problem are
offered as "clarification" and now for a ton of proofs of irrelevant
and useless claims, are offered.  Oh joy.

> Proof of Claim 1: If an instance has a Condorcet winner and each
> successively ranked choice is pairwise preferred over all the other
> remaining choices, this instance can be ranked in polynomial time.
>
> Proof of Claim 2: If an instance has a Condorcet winner and the next few
> successively ranked choices are each pairwise preferred over all the
> remaining choices, the top-ranked choices for this instance can be ranked in
> polynomial time.
>
> Proof of Claim 3: There are polynomial-time approximation methods that can
> efficiently find a sequence that has a Condorcet-Kemeny sequence score that
> is close to the largest sequence score.
>
> (Clarification: I am not claiming that a ranking result based on
> approximation will have the same fairness characteristics that are
> attributed to the "exact" Condorcet-Kemeny method.)
>
> Using lots of real-life data, plus data that has unusual calculation-related
> characteristics, I have tested the VoteFair ranking algorithm against the
> full approach that calculates all sequence scores for up to six choices.  In
> all these cases there are no differences in the top-ranked choice, nor are
> there any differences in the full ranking for the cases that have no ties.
>  (The cases that involve ties involve multiple sequences that have the same
> highest score, the resolution of which is not specified in the
> Condorcet-Kemeny method.)
>
> Of course Claim 4 would be difficult to prove. (This claim says that if the
> two methods do not identify the same winner, the outcome of a runoff
> election would be difficult to predict.)  The point of Claim 4 is to clarify
> the concept of "controversial" and state that if the two methods identify
> different winners, neither winner is uncontroversial.

--in other words, Fobes has a lot of dreams that his algorithm somehow
works well some of the time.  He has absolutely nothing to base this
on other than his own personal feelings.  We don't know when it'll
work well and when it'll work badly.  Sounds like a great voting
method.

> As a reminder (especially for anyone skimming), I am not saying that the
> Traveling Salesman Problem is mathematically related to the Condorcet-Kemeny
> problem (beyond both being categorized as NP-hard problems).  Instead I am
> using the well-studied traveling salesman problem as an analogy to clarify
> characteristics of the Condorcet-Kemeny problem that some election-method
> experts seem to misunderstand.

--well, YOU misunderstand.  Not necessarily anybody else.

> Perhaps the misunderstanding arises because the Condorcet-Kemeny method must
> fully rank all the choices in order to identify the top-ranked choice.  In
> contrast, other methods do the opposite, namely they identify the top-ranked
> choice and then, if a further ranking is needed, the process is repeated
> (although for instant-runoff voting and the Condorcet-Schulze method the
> process of calculating the winner yields information that can be used to
> determine some or all of a full ranking).
>
> If anyone has questions about the calculations done by the open-source
> VoteFair popularity ranking software, and especially about its ability to
> efficiently identify the highest sequence score based on meaningful voter
> preferences, I invite them to look at the clearly commented code.  The code
> is on GitHub (in the CPSolver account) and on the Perl CPAN archive (which
> is mirrored on more than two hundred servers around the world).

--normally, people would feel embarrassed about widely distributing
garbage.  To Fobes, the fact he has widely distributed it, seems in
his mind to constitute proof it is not garbage!  QED!

> In summary, although the Condorcet-Kemeny method is mathematically
> categorized as an NP-hard problem, the instances that are NP-hard to solve
> involve either the less-important lower-ranked choices (analogous to
> Instance 1 in the non-returning Traveling Salesman Problem),

--wrong. Complete and utter lie.  Determining just the single winner,
is NP-hard.

> or involve
> convoluted top-ranked voter preferences that yield controversial results
> (analogous to Instances 2 and 3), or both.

--oh golly.  My voting method might misbehave in a difficult-for-it
election.  But it works great in easy-for-it elections!

Gee Fobes.  Couldn't we always say that about ANYTHING?

So in other words your whole diatribe means NOTHING?

> For all other instances -- which
> include all meaningful election situations -- score-optimized top-ranking
> results can be calculated in polynomial time.

--oh I see. So the game is: "Fobes' method works great, except when it
doesn't.  But when it doesn't I hereby solve the problem by branding
that a 'non-meaningful election situation.'    The definition of "non
meaningful" hereby is "my method fails."

But golly, couldn't anybody always do that with any method at all?

THIS IS NOT ACCEPTABLE.

> Clearly, in contrast to what Warren Smith and Markus Schulze and some other
> election-method experts claim, the calculation time required by the
> Condorcet-Kemeny method is quite practical for use in real-life elections.

--you've proven the opposite.  This is one of the most laughable and

> I'll close with a quote from the article by (IBM researchers) Davenport and
> Kalananam that Warren cited: "NP-hardness is a only [sic] worst case
> complexity result which may not reflect the difficulty of solving problems
> which arise in practice."

--indeed, it may not.  NP-hard problems can often be solved quickly
just not always. Having an election method that sometimes succeeds, is
NOT ACCEPTABLE.

> About the citations below: I was not able to read the article by Bartholdi,
> Tovey, and Trick because it requires paying a \$35 fee.  Alas, it is the
> article that other articles refer to for the proof of NP-hardness.

--libraries tend to be free.  But you have to go to them.  Also
more than one NP-hardness proofs have been found.

```