[EM] Fast Condorcet-Kemeny calculation times, clarification of NP-hardness issue

Richard Fobes ElectionMethods at VoteFair.org
Fri Mar 30 23:24:50 PDT 2012


This is a continuation of the debate about the calculation time for the 
Condorcet-Kemeny method.

On 3/4/2012 2:44 PM, Warren Smith wrote:
...
 > ...  In the Kemeny problem, just finding the winner
 > alone, without trying to find the rest of the order, still is NP-hard.
...
 > --great.  A ton of irrelevant diagrams about an unrelated problem are
 > offered as "clarification" and now for a ton of proofs of irrelevant
 > and useless claims, are offered.  Oh joy.
...
 > --in other words, Fobes has a lot of dreams that his algorithm somehow
 > works well some of the time.  He has absolutely nothing to base this
 > on other than his own personal feelings.  We don't know when it'll
 > work well and when it'll work badly.  Sounds like a great voting
 > method.
...
(The full context for Warren's above quotations appears at the bottom of 
this message.)

Warren, as someone who is well-educated in mathematics, surely you 
understand the difference between "specific instances" of a problem and 
the generalized problem.  I know you're smart, and some other forum 
participants seem to share the same misunderstanding, so apparently I 
have not been clear enough.  I'll try to make this concept even clearer.

I agree that the _generalized_ Condorcet-Kemeny problem -- when there is 
_no_ specific _data_ available -- is correctly categorized as being 
NP-hard.  That's because there are _some_ instances in which it is 
impossible to know with certainty which sequence has the highest 
sequence score without calculating all, or at least most of, the 
sequence scores.

However, the moment that election data becomes available, valid results 
can be calculated quickly using the Condorcet-Kemeny method.

To understand this seeming contradiction, it is important to realize 
that the "NP-hard-to-calculate" _instances_ that involve the winning 
candidate only occur when the "fully calculated" highest-ranked 
(winning) candidate would not easily win a runoff election against the 
winner identified by a well-designed approximation method that 
optimizes the Condorcet-Kemeny sequence score.

More specifically, meaningful results can be _proven_ correct quickly.

To see how, please look at some specific instances of pairwise counts 
from actual surveys/elections.

Case 1:  Here are pairwise counts for choices A, B, C, D, E, F, G, H, I, 
arranged in the order in which the choices were listed on the ballot:

  [   ---       5      11       4       6       9      10       5       4  ]
  [    11     ---      13       5      12      15      14       8       9  ]
  [     4       2     ---       1       3       4       7       1       4  ]
  [    11      11      14     ---      13      14      13       8      10  ]
  [    10       4      13       3     ---       9      11       4       8  ]
  [     7       1      12       2       6     ---       9       5       3  ]
  [     6       1       8       3       4       6     ---       4       3  ]
  [     9       8      14       7      12      11      12     ---       8  ]
  [    11       7      12       6       7      12      12       7     ---  ]

(If line wrapping occurs, please view the attached text-only copy of 
this message.)

Layout explanation:  The dashes occupy the positions where a choice 
would be compared with itself.  The arrangement is the same as used in 
Wikipedia.  As a reminder of that convention, the top row could be 
labeled "Prefer A over ...", the second row could be labeled "Prefer B 
over ..." etc. down to the last row, which could be labeled "Prefer I 
over ...".  The left-most column could be labeled "... A", the second 
column could be labeled "... B", etc.  (In case they are useful, here 
are examples of specific counts: 5 voters pairwise prefer choice A over 
choice B, 11 voters pairwise prefer choice A over choice C, 4 voters 
pairwise prefer choice A over choice D, ... , 11 voters pairwise prefer 
choice B over choice A, ... 11 voters pairwise prefer choice I over 
choice A, ..., and 7 voters pairwise prefer choice I over choice H.)

Now, here are the same pairwise counts arranged in the sequence D, B, H, 
E, I, A, F, G, C, which is one of the two sequences that produces the 
highest Condorcet-Kemeny sequence score of 399:

  [   ---      11       8      13      10      11      14      13      14  ]
  [     5     ---       8      12       9      11      15      14      13  ]
  [     7       8     ---      12       8       9      11      12      14  ]
  [     3       4       4     ---       8      10       9      11      13  ]
  [     6       7       7       7     ---      11      12      12      12  ]
  [     4       5       5       6       4     ---       9      10      11  ]
  [     2       1       5       6       3       7     ---       9      12  ]
  [     3       1       4       4       3       6       6     ---       8  ]
  [     1       2       1       3       4       4       4       7     ---  ]

The other sequence that has the same highest sequence score is the same 
sequence except that choices B and H swap places.  This means that 
choices B and H are tied for second place.

Without doing any calculations, just by looking at the numbers, it's 
obvious that no other sequence can produce a higher score!

Keep in mind (as explained in an earlier message) that the pairwise 
counts in the upper-right triangular area are the ones that sum together 
to equal the sequence score.
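
For anyone who wants to experiment, here is a small Python sketch -- an 
illustration only, not the VoteFair code -- that computes a sequence 
score from a table of pairwise counts.  It assumes counts[x][y] holds 
the number of voters who pairwise prefer choice x over choice y:

def sequence_score(counts, order):
    # Sum the pairwise counts that land in the upper-right triangular
    # area when rows and columns are arranged in the given order.
    return sum(counts[a][b]
               for position, a in enumerate(order)
               for b in order[position + 1:])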

The lack of any other sequence yielding a higher sequence score is 
obvious because the smallest (pairwise) count in the upper right is 8, 
and there is only one count in the lower left that equals or exceeds 
that value, namely the 8 for the voters who prefer H over B (in the 
third row, second column).  All the other values in the lower-left 
triangular area are less than 8, so rearranging the sequence to move any 
combination of those counts into the upper right cannot increase the 
sequence score.  (As already pointed out, swapping choices B and H 
yields the same highest sequence score.)
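
The same check can be written out numerically.  This short Python 
sketch -- again just an illustration, with 0 standing in for the dashes 
on the diagonal -- uses the Case 1 counts already sorted into the order 
D, B, H, E, I, A, F, G, C, and it confirms both the score of 399 and 
the fact that no count below the diagonal exceeds any count above it:

sorted_counts = [
    [ 0, 11,  8, 13, 10, 11, 14, 13, 14],
    [ 5,  0,  8, 12,  9, 11, 15, 14, 13],
    [ 7,  8,  0, 12,  8,  9, 11, 12, 14],
    [ 3,  4,  4,  0,  8, 10,  9, 11, 13],
    [ 6,  7,  7,  7,  0, 11, 12, 12, 12],
    [ 4,  5,  5,  6,  4,  0,  9, 10, 11],
    [ 2,  1,  5,  6,  3,  7,  0,  9, 12],
    [ 3,  1,  4,  4,  3,  6,  6,  0,  8],
    [ 1,  2,  1,  3,  4,  4,  4,  7,  0],
]
n = len(sorted_counts)
upper = [sorted_counts[i][j] for i in range(n) for j in range(i + 1, n)]
lower = [sorted_counts[j][i] for i in range(n) for j in range(i + 1, n)]
print(sum(upper))                # 399, the sequence score
print(max(lower) <= min(upper))  # True, so no reordering can raise the score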

The VoteFair ranking calculation algorithm for finding these two 
sequences is much, much faster than the N-factorial approach, where N is 
the number of choices.

In other words, when the voters have a clear pattern of preferences, the 
results can be calculated -- and their correctness verified -- much, 
much faster than the long calculation time that is implied by the 
NP-hard categorization of the _generalized_ Condorcet-Kemeny problem. 
In fact, in this case, the correctness can be recognized -- and proven 
-- just by looking at the numbers, without the aid of a computer.

Case 2:  Here is a real-life example of unclear (muddled) voter 
preferences in which it is necessary to check almost all the sequence 
scores in order to determine which sequence score is the highest.  (As 
I've said before, muddled preferences more often occur when there are 
only a few voters.)

These are the pairwise counts in the order A, B, C, D, which is the 
order listed on the ballot:

  [   ---       5       4       8  ]
  [     5     ---       8       5  ]
  [     6       2     ---       5  ]
  [     2       5       5     ---  ]

The sequence with the highest score is the sequence B, C, A, D, which 
puts the pairwise counts into this arrangement, where the sum of the 
counts in the upper-right triangular area equals the sequence score of 37:

  [   ---       8       5       5  ]
  [     2     ---       6       5  ]
  [     5       4     ---       8  ]
  [     5       5       2     ---  ]

In this case it is not obvious that this sequence produces the highest 
sequence score.  Specifically, the 5's in the upper right (triangular 
area) and the 5's in the lower left (triangular area) suggest that other 
sequences that rearrange these counts onto opposite sides of the 
diagonal might produce a higher score.  If this kind of pattern appeared 
in a case with 50 candidates, lots and lots of sequences would need to 
be checked to be sure it's the highest possible score.
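
With only four choices there are just 4 x 3 x 2 x 1 = 24 sequences, so 
a brute-force check is easy.  Here is a short Python sketch -- an 
illustration only, not the VoteFair code -- that scores every sequence 
for these Case 2 pairwise counts and confirms that B, C, A, D is the 
only sequence that reaches 37:

from itertools import permutations

# Case 2 pairwise counts: counts[(x, y)] voters prefer choice x over y.
counts = {
    ("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 8,
    ("B", "A"): 5, ("B", "C"): 8, ("B", "D"): 5,
    ("C", "A"): 6, ("C", "B"): 2, ("C", "D"): 5,
    ("D", "A"): 2, ("D", "B"): 5, ("D", "C"): 5,
}

def sequence_score(order):
    return sum(counts[a, b]
               for position, a in enumerate(order)
               for b in order[position + 1:])

scores = {order: sequence_score(order) for order in permutations("ABCD")}
best = max(scores.values())
print(best)                                         # 37
print([o for o, s in scores.items() if s == best])  # [('B', 'C', 'A', 'D')]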

Notice that this case does not have a clear winner.

Specifically, choice B is the Condorcet-Kemeny winner, yet choice A 
would have a good chance of winning a runoff election against choice B. 
  In fact, the pairwise counts indicate that 5 voters prefer A over B, 
and the other 5 voters prefer B over A, so these pairwise counts suggest 
that A and B are essentially tied for first place.

Indeed, calculating all the sequence scores reveals that the following 
sequences have a sequence score of 35, which is close to the highest 
score of 37:

Sequence A, B, C, D:

  [   ---       5       4       8  ]
  [     5     ---       8       5  ]
  [     6       2     ---       5  ]
  [     2       5       5     ---  ]

Sequence A, B, D, C:

  [   ---       5       8       4  ]
  [     5     ---       5       8  ]
  [     2       5     ---       5  ]
  [     6       2       5     ---  ]

Sequence A, D, B, C:

  [   ---       8       5       4  ]
  [     2     ---       5       5  ]
  [     5       5     ---       8  ]
  [     6       5       2     ---  ]

Sequence B, A, C, D:

  [   ---       5       8       5  ]
  [     5     ---       4       8  ]
  [     2       6     ---       5  ]
  [     5       2       5     ---  ]

Sequence B, A, D, C:

  [   ---       5       5       8  ]
  [     5     ---       8       4  ]
  [     5       2     ---       5  ]
  [     2       6       5     ---  ]

These runner-up sequences (each with a score of 35) put choices A and B 
in either first or second place, which makes it clear that choices A 
and B are more popular than choices C and D.

(Choices C and D are the two least popular choices, but their relative 
ranking is not clear from just looking at the data, without calculating 
the sequence scores.)

Let's suppose that an optimization algorithm "got stuck" at the 
sequences that have a score of 35, and failed to find the sequence that 
has the higher score of 37, and consequently identified choice A as the 
winner.  That's the "wrong" winner compared to the "fully calculated" 
winner of choice B.

Yet, the outcome of a runoff election between choice A and choice B 
would be difficult to predict!  (As stated above, the pairwise counts 
for these two choices indicate an exact tie.)

This example demonstrates that when voter preferences are unclear, if an 
optimization technique identifies a top-ranked candidate who is 
different from the top-ranked candidate based on finding the highest 
sequence score, then the outcome of a runoff election between these two 
candidates would be difficult to predict.

Case 3:  To understand this point more broadly, consider a variation 
of Case 1, and suppose that an approximation algorithm yielded a wrong 
sequence in which the counts labeled "bb" below are big numbers and the 
counts labeled "ss" are small numbers.

  [   ---      11       8      13      10      11      14      13      14  ]
  [     5     ---       8      12       9      11      15      14      13  ]
  [     7       8     ---      12       8       9      11      12      14  ]
  [     3       4       4     ---       8      10       9      11      13  ]
  [     6       7       7       7     ---      ss      ss      ss      ss  ]
  [     4       5       5       6      bb     ---      ss      ss      ss  ]
  [     2       1       5       6      bb      bb     ---      ss      ss  ]
  [     3       1       4       4      bb      bb      bb     ---      ss  ]
  [     1       2       1       3      bb      bb      bb      bb     ---  ]

Even with the lowest-ranked choices being very wrongly ranked (according 
to the Condorcet-Kemeny criteria), the highest-ranked choices are still 
correctly ranked.  And it is easy to verify the correctness of the 
ranking of the higher-ranked choices.

In other words, if the pairwise counts that involve the more popular 
choices are clear and unambiguous, using an approximation and getting 
the wrong results in the lower-ranked choices does not lead to making 
any mistake about the ranking of the higher-ranked choices (and in 
particular the winning choice).

These same patterns apply even in cases involving one thousand or more 
choices.  This understanding explains the usefulness of this method in 
other (non-election) applications, such as the application indicated in 
one of Warren's citations, in which IBM researchers express interest in 
using the Condorcet-Kemeny method to meta-rank website search results.

Just in case anyone reading here doesn't yet see the ease with which a 
person -- without the aid of computer calculations -- can verify either 
the correctness of the results or the muddled preferences of the voters, 
here are additional examples:

Case 4:  Ballot-listed sequence:

  [   ---     156     170     179     149      86     114      62  ]
  [    78     ---     137     156     128      52      72      51  ]
  [    67      99     ---     143     112      46      55      30  ]
  [    57      77      92     ---      80      41      48      34  ]
  [    95     116     134     162     ---      82      80      64  ]
  [   153     187     192     198     166     ---     145      80  ]
  [   126     167     186     191     168      98     ---      42  ]
  [   198     211     232     228     207     180     217     ---  ]

The same pairwise counts sorted into the sequence that produces the 
highest sequence score:

  [   ---     180     217     198     211     207     232     228  ]
  [    80     ---     145     153     187     166     192     198  ]
  [    42      98     ---     126     167     168     186     191  ]
  [    62      86     114     ---     156     149     170     179  ]
  [    51      52      72      78     ---     128     137     156  ]
  [    64      82      80      95     116     ---     134     162  ]
  [    30      46      55      67      99     112     ---     143  ]
  [    34      41      48      57      77      80      92     ---  ]

Here again we can quickly verify, without the use of a computer, that no 
other sequence could produce a higher score.  That's because all the 
numbers in the lower-left triangular area are smaller than every number 
in the upper-right triangular area, which are the numbers that sum 
together to equal the sequence score.

If anyone thinks that having more choices makes this verification more 
difficult, it doesn't.

Case 5:  Ballot-listed sequence:

  [ ---   89   88  101   96   96   66  111   98  116   97   67 ]
  [  50  ---   59   79   72   63   48   83   70   82   69   48 ]
  [  51   67  ---   84   74   62   47   96   81   91   70   49 ]
  [  37   47   40  ---   48   38   32   61   52   62   41   31 ]
  [  40   54   49   73  ---   53   27   68   55   69   54   24 ]
  [  46   65   63   86   72  ---   47   91   76   95   75   47 ]
  [  76   82   81   95  100   83  ---  102   95  108   91   70 ]
  [  27   42   25   58   51   33   23  ---   37   56   35   27 ]
  [  38   55   40   68   65   48   31   78  ---   85   59   32 ]
  [  21   42   31   58   49   29   18   60   34  ---   36   19 ]
  [  43   58   55   80   70   51   36   86   64   85  ---   39 ]
  [  73   82   78   95  102   83   59   99   92  108   88  --- ]

Here are the same pairwise counts sorted into the sequence that produces 
the highest sequence score:

  [ ---   70   76   83   81   82   91   95  100   95  108  102 ]
  [  59  ---   73   83   78   82   88   92  102   95  108   99 ]
  [  66   67  ---   96   88   89   97   98   96  101  116  111 ]
  [  47   47   46  ---   63   65   75   76   72   86   95   91 ]
  [  47   49   51   62  ---   67   70   81   74   84   91   96 ]
  [  48   48   50   63   59  ---   69   70   72   79   82   83 ]
  [  36   39   43   51   55   58  ---   64   70   80   85   86 ]
  [  31   32   38   48   40   55   59  ---   65   68   85   78 ]
  [  27   24   40   53   49   54   54   55  ---   73   69   68 ]
  [  32   31   37   38   40   47   41   52   48  ---   62   61 ]
  [  18   19   21   29   31   42   36   34   49   58  ---   60 ]
  [  23   27   27   33   25   42   35   37   51   58   56  --- ]

Showing examples with 50 choices would lead to line-wrapping problems in 
a message, but such cases don't lead to any calculation problems.

Yes, doing the optimization calculations for 50 choices takes longer 
than for fewer choices, but the calculation time still is in minutes -- 
not the years or lifetimes that Warren claims.

Circular ambiguity is what increases the calculation time.  However, the 
increase is polynomial -- not N-factorial -- in the number of choices. 
Therefore it's worth looking at a revealing example of circular ambiguity.

Case 6:  Here are the pairwise counts arranged in the unsorted 
(ballot-listing) sequence:

  [   ---       5       6       7  ]
  [     7     ---       6       5  ]
  [     6       6     ---       9  ]
  [     5       7       3     ---  ]

In this example, these 6 sequences have the same highest score (of 40):

Sequence: A , C , D , B
Sequence: B , A , C , D
Sequence: B , C , A , D
Sequence: C , A , D , B
Sequence: C , B , A , D
Sequence: C , D , B , A

Here is the matrix for one of the highest-score sequences:

  [   ---       6       7       5  ]
  [     6     ---       6       9  ]
  [     5       6     ---       7  ]
  [     7       3       5     ---  ]

Notice that there are some relatively big numbers in the lower-left 
area, and some relatively small numbers in the upper-right area.  This 
means that we cannot visually (or quickly) verify that this sequence 
would be one of the sequences with the highest score.

Also notice that the voter preferences are so muddled that these are the 
only clear patterns that are easy to see in the highest-score sequences: 
(1) Choice D is the least popular;  (2) Choice A probably does not 
deserve to win;  (3) As a consequence, choices B and C are essentially 
tied for first place.

Also notice that the pairwise counts for choices B and C indicate that 
half the voters (six) prefer B over C, and the other half (six) prefer C 
over B.  (BTW, this pairwise-comparison cross-check method is available 
for all the Condorcet methods.)

If this high level of circular ambiguity were to occur in a case with 50 
candidates, an approximation would produce results that are as good as 
the "full-calculation" method.

If an election has 135 candidates -- as happened in the special recall 
election that Arnold Schwarzenegger won to become governor of California 
-- the lower-ranked choices can be dropped from the calculations, and 
the top few candidates can be carefully ranked -- using either the 
"full" method or the optimization method -- to ensure that the sequence 
with the highest score is correctly identified.

So, wrapping up this explanation:

If the Condorcet-Kemeny problem were in the field of encryption, then of 
course only an exact solution would be relevant.

But the Condorcet-Kemeny problem is an optimization problem -- or it can 
be regarded as a sorting problem -- where the goal is to check various 
sequences and find the one (or ones in the case of ties) that moves the 
biggest pairwise counts into the upper-right triangular area of a 
matrix, while moving the smallest pairwise counts into the lower-left 
triangular area.
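
One simple way to illustrate this kind of optimization -- this sketch is 
not the VoteFair algorithm, just an example of a local search -- is to 
repeatedly swap adjacent choices whenever the swap moves the larger of 
the two pairwise counts into the upper-right triangular area:

def improve_by_adjacent_swaps(counts, order):
    # counts[x][y] = number of voters who pairwise prefer choice x over y.
    # Each accepted swap strictly increases the sequence score, so the
    # loop always terminates, although possibly at a local optimum.
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            a, b = order[i], order[i + 1]
            if counts[b][a] > counts[a][b]:
                order[i], order[i + 1] = b, a
                improved = True
    return order

Started from the ballot order in Case 2, a search this simple stops at 
the 35-score sequence A, B, C, D, which is exactly the kind of "stuck" 
outcome discussed above.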

This optimization can be done fast, even when 50 (or more) candidates 
are in the race.  And the result is easy to verify visually -- without 
the aid of a computer -- to see whether the ranking involves muddled 
voter preferences at any ranking levels and, if so, which candidates 
are involved.

At the ranking levels where the voter preferences are not muddled, a 
well-designed approximation algorithm -- particularly the one in the 
recently released, open-source, VoteFair ranking software -- efficiently 
yields the same results as the full-calculation method.

I'm not the only person to recognize that Condorcet-Kemeny results are 
not really that time-consuming; here is a recent quote from Kristofer 
Munsterhjelm: "Kemeny isn't that unreasonable in practical use. My 
integer linear programming implementation even manages 20-30 candidates, 
though it does take quite a bit of time on the high end."
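
For reference, the kind of integer-linear-programming approach Kristofer 
describes can be sketched as follows.  This is my illustration of the 
standard "linear ordering" formulation, not his code, and it assumes the 
Python PuLP package (with its bundled CBC solver) is installed:

import itertools
import pulp

def kemeny_ranking(counts):
    # counts[a][b] = number of voters who pairwise prefer choice a over b.
    choices = range(len(counts))
    problem = pulp.LpProblem("kemeny", pulp.LpMaximize)
    x = {(a, b): pulp.LpVariable(f"x_{a}_{b}", cat="Binary")
         for a in choices for b in choices if a != b}
    # Each pair of choices is ordered exactly one way.
    for a, b in itertools.combinations(choices, 2):
        problem += x[a, b] + x[b, a] == 1
    # No three-way cycles, which forces a transitive (complete) ranking.
    for a, b, c in itertools.permutations(choices, 3):
        problem += x[a, b] + x[b, c] + x[c, a] <= 2
    # Maximize the sequence score.
    problem += pulp.lpSum(counts[a][b] * x[a, b] for (a, b) in x)
    problem.solve(pulp.PULP_CBC_CMD(msg=False))
    wins = {a: sum(int(x[a, b].value()) for b in choices if b != a)
            for a in choices}
    order = sorted(choices, key=lambda a: -wins[a])
    return order, int(pulp.value(problem.objective))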

The calculation algorithm in VoteFair ranking is highly efficient, and 
it does handle 50 choices within a few minutes.

Speaking of which, I'm still looking forward to Warren supplying a 
40-candidate or 50-candidate case (as ballot preferences, not pairwise 
counts, because arbitrary pairwise counts might not correspond to a real 
ranking scenario) that he thinks would take a long time to calculate; 
I'll be happy to measure the calculation time.  And I'll share the 
sorted pairwise counts in matrix form so that anyone can visually verify 
that the full ranking sequence is correct, and that if there is a 
deserving winner then that candidate is correctly ranked in first place.

Richard Fobes

-------- full reply from Warren is below -----------

On 3/4/2012 2:44 PM, Warren Smith wrote:
 > On Sun, Mar 4, 2012 at 3:44 PM, Richard Fobes
 > <ElectionMethods at votefair.org>  wrote:
 >> Finally, after reading the articles cited by Warren Smith (listed at the
 >> bottom of this reply) plus some related articles, I can reply to his
 >> insistence that Condorcet-Kemeny calculations take too long to calculate.
 >>   Also, this reply addresses the same claim that appears in Wikipedia both in
 >> the "Kemeny-Young method" article and in the comparison table within the
 >> Wikipedia "Voting systems" article (in the "polynomial time" column that
 >> Markus Schulze added).
 >>
 >> One source of confusion is that Warren, and perhaps others, regard the
 >> Condorcet-Kemeny problem as a "decision problem" that only has a "yes" or
 >> "no" answer.  This view is suggested by Warren's reference (below and in
 >> other messages) to the problem as being NP-complete, which only applies to
 >> decision problems.  Although it is possible to formulate a decision problem
 >> based on one or more specified characteristics of the Condorcet-Kemeny
 >> method, that is a different problem than the Condorcet-Kemeny problem.
 >
 > --the optimization problem is at least as hard as the decision
 > problem.You are erroneously creating the impression I somehow
 > was unaware of this, or that you somehow have here got some new
 > insight.  Neither is true.
 >
 >
 >
 >> In the real world of elections, the Condorcet-Kemeny problem is to calculate
 >> a ranking of all choices (e.g. candidates) that maximizes the sequence score
 >> (or minimizes the "Kemeny score").
 >>
 >> Clearly the Condorcet-Kemeny problem is an optimization problem, not a
 >> decision problem (and not a search problem).  It is an optimization problem
 >> because we have a way to measure how closely the solution reaches its goal.
 >>
 >> (For contrast, consider the NP-hard "subset sum problem" in which the goal
 >> is to determine whether a specified list of integers contains a subset that
 >> can be added and/or subtracted to yield zero.  Any subset either sums to
 >> zero or it doesn't sum to zero.  This makes it easy to formulate the related
 >> decision (yes/no) problem that asks whether such a subset exists for a given
 >> set of numbers.)
 >
 >
 >
 >
 >> Because the Condorcet-Kemeny problem is an optimization problem, the
 >> solution to the Condorcet-Kemeny problem can be an approximation. If this
 >> approach is used, it becomes relevant to ask how closely the approximation
 >> reaches the ranking that has the highest sequence score. Yet even this
 >> question -- of "how close?" -- is not a decision problem (because it goes
 >> beyond a yes or no answer).
 >>
 >> Keeping in mind that VoteFair popularity ranking calculations are
 >> mathematically equivalent to the Condorcet-Kemeny method, my claim is that
 >> VoteFair popularity ranking calculations yield, at the least, the same
 >> top-ranked choice, and the same few top-ranked choices, as the solution
 >> produced by examining every sequence score -- except (and this is the
 >> important part) in cases where the voter preferences are so convoluted that
 >> any top-ranked choice and any few top-ranked choices would be controversial.
 >>   As one academic paper elegantly put it: "garbage in, garbage out".
 >>
 >> More specifically, here is a set of claims that more rigorously state the
 >> above ambiguous claim.
 >>
 >> Claim 1: For _some_ _instances_, a polynomial-time calculation can identify
 >> the full ranking that produces the highest Condorcet-Kemeny sequence score.
 >
 > --oh whoo-whee.   Here's another claim: for SOME planets, I can
 > readily find a million dollars in gold piled up right next to me.
 >
 >> Claim 2: For _some_ _instances_, a polynomial-time calculation can rank the
 >> top most-popular candidates/choices and this partial ranking will be the
 >> same as the top portion of the full ranking as determined by identifying the
 >> highest Condorcet-Kemeny sequence score.
 >>
 >> Claim 3: For the _remaining_ _instances_ (not covered in Claims 1 and 2), an
 >> approximation of the full Condorcet-Kemeny ranking can be calculated in
 >> polynomial time.
 >
 > --what kind of "approximation"?  I can find an "approximation" to
 > a million dollars in gold, namely, 1 penny.
 >
 >> Claim 4: For any cases in which the top-ranked candidate/choice according to
 >> the VoteFair popularity ranking algorithm differs from the top-ranked
 >> candidate/choice according to a full calculation of all sequence scores, the
 >> outcome of a runoff election between the two candidates/choices would be
 >> difficult to predict.
 >>
 >> As done in the academic literature, I am excluding the cases in which more
 >> than one sequence has the same highest sequence score.
 >
 > --I'm not sure what that meant, but it sounds like garbage too.
 >
 >> To help clarify the validity of these claims, I'll use an analogy.
 >>
 >> Consider a special case of the rigorously studied Traveling Salesman Problem
 >> (TSP), which is NP-hard to solve.  (The TSP also can be expressed as a
 >> decision problem, in which case the decision problem is NP-complete, but
 >> that variation is not the problem discussed here.)
 >>
 >> The special case -- which I will refer to as the non-returning Traveling
 >> Salesman Problem -- is that we want to know which city the salesman visits
 >> first, and we want to know, with successively less interest, which city the
 >> salesman visits second, third, and so on.  Additionally, for this special
 >> case, we specify that the cities to be visited are roughly located between a
 >> beginning point "B" and an ending point "E".
 >>
 >> To make this special case mathematically equivalent to the normal Traveling
 >> Salesman Problem in which the salesman returns to the starting city, we
 >> create a path of closely spaced cities (labeled "+" below) that lead back to
 >> the starting city "B".
 >>
 >> Here is a diagram of this problem.  Remember that the most important thing
 >> we want to know is which city ("*") the salesman visits first.
 >>
 >> B = Beginning city
 >> * = City to visit
 >> E = Ending city for main portion
 >> + = City on path back to beginning
 >> (periods = background; assumes monospace font)
 >>
 >> Instance 1:
 >> .................................................B.
 >> .....................................*............+
 >> ..................................................+
 >> .....................................*............+
 >> ...................................*..............+
 >> ..............................*...................+
 >> ..................................................+
 >> ................................*.................+
 >> .........................*........................+
 >> ......................*.....*.....................+
 >> ..................................................+
 >> ..................*..*.....*......................+
 >> ..........*....*..................................+
 >> .......*...............*..........................+
 >> ..........*......*................................+
 >> .....*...............*............................+
 >> .........*....*.........*.........................+
 >> ..........*........*..............................+
 >> .............*....................................+
 >> E.................................................+
 >> +.................................................+
 >> +.................................................+
 >> +++++++++++++++++++++++++++++++++++++++++++++++++++
 >>
 >> In this case it is obvious which city is the first one on the path from B to
 >> E.  And it is obvious which are the next four cities on the path.
 >>
 >> What we do not know is the sequence of cities after that (for the path that
 >> is shortest).
 >
 > --golly,we are taking up time on a red herring aren't we?
 > To be clear, let me state a few facts that may have escaped Fobes.
 >
 > For traveling saleman problem (TSP),
 >
 > 1. decision problem - is there a tour shorter than X? - is NP-hard.
 >
 > 2. optimization problem - finding best tour, is NP-hard.
 >
 > 3. FInding even the first step in the best tour, is NP-hard.
 >
 > 4. Approximate optimization problem: finding an approximately best
 > tour (for a general distance matrix) to within a factor of 9999
 > billion, is NP-hard.
 >
 > 5. Finding just the first edge, on any tour (not necessarily the best
 > tour) whose total tour-cost is within a factor of 9999 billion of the
 > cost of the optimal tour... is NP-hard.
 >
 > Are you getting the picture yet?  Don't be fooled by Fobes trying to
 > act as though I had somehow not realized this.  I knew all this ages
 > ago,
 > and tried (unsuccessfully) to impart some semblance of a clue to
 > Fobes.  OK, back to Fobesian essay now...
 >
 >> Now let's consider a different instance of this non-returning Traveling
 >> Salesman Problem.
 >>
 >> Instance 2:
 >> .................................................B.
 >> ..........................*.......................+
 >> ........................*....*....................+
 >> ................*.........*...*...................+
 >> .............*.........*....*...*.*...............+
 >> ................*...*......*.....*...*............+
 >> .......................*......*...*......*........+
 >> ..........*......*.........*......*...*...........+
 >> .............*........*.........*......*..........+
 >> ..................*.........*......*..............+
 >> .........*.....*.......*..........................+
 >> .............*.....*..........*....*..............+
 >> ..................*..*.....*......................+
 >> ..........*....*..................................+
 >> .......*...............*..........................+
 >> ..........*......*................................+
 >> .....*...............*............................+
 >> .........*....*.........*.........................+
 >> ..........*........*..............................+
 >> .............*....................................+
 >> E.................................................+
 >> +.................................................+
 >> +.................................................+
 >> +++++++++++++++++++++++++++++++++++++++++++++++++++
 >>
 >> In this instance we cannot know which city is the first city on the shortest
 >> path until we know the shortest path through all the cities.
 >>
 >> Calculating the absolute shortest path in a convoluted case like Instance 2
 >> might require a calculation time that is super-polynomial (more than what
 >> can be expressed as a polynomial function of the city count).
 >>
 >> However, we can estimate the shortest path.
 >>
 >> Such an approximation might identify a first city that is different from the
 >> first city on the absolute shortest path. If the "wrong" city is identified
 >> as the first-visited city, it is understandable that this occurs because
 >> there is not a clearly identifiable first-visit city in this instance.
 >>
 >> This analogy can be extended to the Condorcet-Kemeny problem.
 >>
 >> In normal election situations, the most important part of the solution is
 >> the first-ranked winner.  In fact, most voting methods are not _designed_ to
 >> identify more than the first-ranked winner.
 >>
 >> In contrast, the Condorcet-Kemeny problem is designed to identify a full
 >> ranking.  Accordingly, the second-most important part (of solving the
 >> Condorcet-Kemeny problem) is to identify the top few highest-ranked choices.
 >>
 >> Both of these important goals can be achieved without fully ranking all the
 >> choices.  This is analogous to solving Instance 1 of the non-returning
 >> Traveling Salesman Problem.
 >
 > --In the TSP with general distance matrix, I repeat, even finding just
 > THE ONE FIRST STEP of the best tour, or any non-best but approximately
 > best tour, is NP-hard.  In the Kemeny problem, just finding the winner
 > alone, without trying to find the rest of the order, still is NP-hard.
 >   I knew all this, and said all this, to Fobes, ages ago.  One
 > day maybe it will penetrate.
 >
 >> The importance of calculating the few top-ranked choices, and the reduced
 >> importance of calculating the lower-ranked choices, is further demonstrated
 >> when the Condorcet-Kemeny method is used to aggregate (merge/join/etc.)
 >> separate rankings from different search engines (to yield "meta-search"
 >> results, which is the intended goal specified by IBM employees who authored
 >> one of the cited articles about Condorcet-Kemeny calculations).
 >>   Specifically, a search-engine user is unlikely to look at the search
 >> results beyond the first few pages, which means that carefully calculating
 >> the full meta-search ranking for thousands of search results is pointless,
 >> and therefore the calculation time for a full ranking is irrelevant.
 >>
 >> (As a further contrast, to clarify this point about a partial solution being
 >> useful, the subset-sum problem does not have a partial solution. All that
 >> matters is the existence of at least one solution, or the absence of any
 >> solution.)
 >>
 >> Therefore, in some instances we can solve the NP-hard Condorcet-Kemeny
 >> problem "quickly" (in polynomial time) in the same way that we can "quickly"
 >> (in polynomial time) solve some instances -- such as Instance 1 -- of the
 >> NP-hard non-returning Traveling Salesman Problem.
 >
 > --and in some instances, there is a pile of gold right next to me.
 > This is laughable.  The statement "in some instances my algorithm can
 > work" is essentially equivalent to the statement "my algorithm does
 > not work."
 >
 > It is NOT ACCEPTABLE to have a voting algorithm that works only "in
 > some instances."  Get it through your head.  Algorithms either work,
 > or they do not.  "work" means always.  Not sometimes.   If they even
 > fail
 > one time, then it was an invalid algorithm.
 >
 > I'm really annoyed that I have to keep on doing this.   You need
 > to take computer science 101.
 >
 >> In instances where we use an approximate solution for the Condorcet-Kemeny
 >> problem, the approximate solution can be calculated in polynomial time.
 >
 > --again, the use of the catch-all, utterly meaningless, word
 > "approximate."  1 penny is an "approximation" to 1 million dollars. It
 > is not a very good approximation.  With no goodness guarantee, this is
 > all totally useless.
 >
 > When Fobes says "I have an approximation" it is equivalent to "I am
 > dreaming, but I feel very good in my dream, so why doesn't the rest of
 > the world feel good?"  Because you have no guarantee, so you have
 > nothing.  That's why.  "Fobes feels good" is simply NOT ACCEPTABLE as
 > a justification for a voting algorithm.
 >
 >>   Specifically, the algorithm used for VoteFair popularity ranking, which
 >> seeks to maximize the Condorcet-Kemeny sequence score, always can be solved
 >> in polynomial time (as evidenced by all the programming loops being
 >> bounded).
 >
 > --And I can "seek to find a million dollars in gold" using an
 > algorithm guaranteed to stop in 1 minute.  I can absolutely guarantee
 > it.
 > So what?  Why should anybody care?
 >
 >> To further clarify these points,
 >
 > --by which Fobes means "to further try to obscure the truth at great length"...
 >
 >> consider the following instance of the
 >> non-returning Traveling Salesman Problem.
 >>
 >> Instance 3:
 >> .................................................B.
 >> ..........................*.......................+
 >> ........................*....*....................+
 >> ................*.........*...*...................+
 >> .............*.........*....*...*.*...............+
 >> ................*...*......*.....*...*............+
 >> .......................*......*...*......*........+
 >> .................*.........*......*...*...........+
 >> .............*........*.........*......*..........+
 >> ..................*.........*......*..............+
 >> .......................*..........................+
 >> ...................*..............................+
 >> ..................*..*............................+
 >> ..........*....*..................................+
 >> .......*...............*..........................+
 >> ..........*......*................................+
 >> .....*...............*............................+
 >> .........*....*.........*.........................+
 >> ..........*........*..............................+
 >> .............*....................................+
 >> E.................................................+
 >> +.................................................+
 >> +.................................................+
 >> +++++++++++++++++++++++++++++++++++++++++++++++++++
 >>
 >> For this instance, we can calculate the absolute shortest path through the
 >> group of cities closest to the starting point "B" without also calculating
 >> the absolute shortest path through the group of cities closest to the ending
 >> point "E".
 >>
 >> Similarly some instances of the Condorcet-Kemeny problem do not require
 >> calculating the exact order of lower-ranked choices (e.g. candidates) in
 >> order to exactly find the maximum-sequence-score ranking of the top-ranked
 >> choices.
 >
 >> Now that the word "instance" and the concept of a partial order are clear, I
 >> will offer proofs for Claims 1, 2, and 3.
 >
 > --great.  A ton of irrelevant diagrams about an unrelated problem are
 > offered as "clarification" and now for a ton of proofs of irrelevant
 > and useless claims, are offered.  Oh joy.
 >
 >> Proof of Claim 1: If an instance has a Condorcet winner and each
 >> successively ranked choice is pairwise preferred over all the other
 >> remaining choices, this instance can be ranked in polynomial time.
 >>
 >> Proof of Claim 2: If an instance has a Condorcet winner and the next few
 >> successively ranked choices are each pairwise preferred over all the
 >> remaining choices, the top-ranked choices for this instance can be ranked in
 >> polynomial time.
 >>
 >> Proof of Claim 3: There are polynomial-time approximation methods that can
 >> efficiently find a sequence that has a Condorcet-Kemeny sequence score that
 >> is close to the largest sequence score.
 >>
 >> (Clarification: I am not claiming that a ranking result based on
 >> approximation will have the same fairness characteristics that are
 >> attributed to the "exact" Condorcet-Kemeny method.)
 >>
 >> Using lots of real-life data, plus data that has unusual calculation-related
 >> characteristics, I have tested the VoteFair ranking algorithm against the
 >> full approach that calculates all sequence scores for up to six choices.  In
 >> all these cases there are no differences in the top-ranked choice, nor are
 >> there any differences in the full ranking for the cases that have no ties.
 >>   (The cases that involve ties involve multiple sequences that have the same
 >> highest score, the resolution of which is not specified in the
 >> Condorcet-Kemeny method.)
 >>
 >> Of course Claim 4 would be difficult to prove. (This claim says that if the
 >> two methods do not identify the same winner, the outcome of a runoff
 >> election would be difficult to predict.)  The point of Claim 4 is to clarify
 >> the concept of "controversial" and state that if the two methods identify
 >> different winners, neither winner is uncontroversial.
 >
 > --in other words, Fobes has a lot of dreams that his algorithm somehow
 > works well some of the time.  He has absolutely nothing to base this
 > on other than his own personal feelings.  We don't know when it'll
 > work well and when it'll work badly.  Sounds like a great voting
 > method.
 >
 >> As a reminder (especially for anyone skimming), I am not saying that the
 >> Traveling Salesman Problem is mathematically related to the Condorcet-Kemeny
 >> problem (beyond both being categorized as NP-hard problems).  Instead I am
 >> using the well-studied traveling salesman problem as an analogy to clarify
 >> characteristics of the Condorcet-Kemeny problem that some election-method
 >> experts seem to misunderstand.
 >
 > --well, YOU misunderstand.  Not necessarily anybody else.
 >
 >> Perhaps the misunderstanding arises because the Condorcet-Kemeny method must
 >> fully rank all the choices in order to identify the top-ranked choice.  In
 >> contrast, other methods do the opposite, namely they identify the top-ranked
 >> choice and then, if a further ranking is needed, the process is repeated
 >> (although for instant-runoff voting and the Condorcet-Schulze method the
 >> process of calculating the winner yields information that can be used to
 >> determine some or all of a full ranking).
 >>
 >> If anyone has questions about the calculations done by the open-source
 >> VoteFair popularity ranking software, and especially about its ability to
 >> efficiently identify the highest sequence score based on meaningful voter
 >> preferences, I invite them to look at the clearly commented code.  The code
 >> is on GitHub (in the CPSolver account) and on the Perl CPAN archive (which
 >> is mirrored on more than two hundred servers around the world).
 >
 > --normally, people would feel embarrassed about widely distributing
 > garbage.  To Fobes, the fact he has widely distributed it, seems in
 > his mind to constitute proof it is not garbage!  QED!
 >
 >> In summary, although the Condorcet-Kemeny method is mathematically
 >> categorized as an NP-hard problem, the instances that are NP-hard to solve
 >> involve either the less-important lower-ranked choices (analogous to
 >> Instance 1 in the non-returning Traveling Salesman Problem),
 >
 > --wrong. Complete and utter lie.  Determining just the single winner,
 > is NP-hard.
 >
 >> or involve
 >> convoluted top-ranked voter preferences that yield controversial results
 >> (analogous to Instances 2 and 3), or both.
 >
 > --oh golly.  My voting method might misbehave in a difficult-for-it
 > election.  But it works great in easy-for-it elections!
 >
 > Gee Fobes.  Couldn't we always say that about ANYTHING?
 >
 > So in other words your whole diatribe means NOTHING?
 >
 >>   For all other instances -- which
 >> include all meaningful election situations -- score-optimized top-ranking
 >> results can be calculated in polynomial time.
 >
 > --oh I see. So the game is: "Fobes' method works great, except when it
 > doesn't.  But when it doesn't I hereby solve the problem by branding
 > that a 'non-meaningful election situation.'    The definition of "non
 > meaningful" hereby is "my method fails."
 >
 > But golly, couldn't anybody always do that with any method at all?
 >
 > THIS IS NOT ACCEPTABLE.
 >
 >> Clearly, in contrast to what Warren Smith and Markus Schulze and some other
 >> election-method experts claim, the calculation time required by the
 >> Condorcet-Kemeny method is quite practical for use in real-life elections.
 >
 > --you've proven the opposite.  This is one of the most laughable and
 > pathetic screeds I ever read.
 >
 >> I'll close with a quote from the article by (IBM researchers) Davenport and
 >> Kalananam that Warren cited: "NP-hardness is a only [sic] worst case
 >> complexity result which may not reflect the difficulty of solving problems
 >> which arise in practice."
 >
 > --indeed, it may not.  NP-hard problems can often be solved quickly
 > just not always. Having an election method that sometimes succeeds, is
 > NOT ACCEPTABLE.
 >
 >> About the citations below: I was not able to read the article by Bartholdi,
 >> Tovey, and Trick because it requires paying a $35 fee.  Alas, it is the
 >> article that other articles refer to for the proof of NP-hardness.
 >
 > --libraries tend to be free.  But you have to go to them.  Also
 > more than one NP-hardness proofs have been found.
 >
