This algorithm, at first read, seems that it could still be NP-hard if all candidates were part of a (three-candidate???) cycle with all other candidates. Which is, of course, a case which would hardly ever arise in real life; but one which must be considered in giving a worst-case execution time order for the algorithm.<div>


<br></div><div>Jameson<br><br><div class="gmail_quote">2011/12/21 Richard Fobes <span dir="ltr"><<a href="mailto:ElectionMethods@votefair.org">ElectionMethods@votefair.org</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


As previously promised, I am revealing how Condorcet-Kemeny calculations can be done fast.<br>

<br>

How fast?  Calculating a "complex" case with 12 choices is so fast that I haven't bothered to time the calculations.  As an estimate, it is on the order of a couple of seconds or less.<br>

<br>

The estimated time for calculating results for 50 choices (!) is less than a minute -- which is far less than a "lifetime".<br>

<br>

The algorithm is described in the documentation that accompanies the just-announced VoteFair ranking software.<br>

<br>

The full explanation of the algorithm is described on these three web pages:<br>

<br>

    <a href="http://www.votefair.org/calculation_details_popularity.html" target="_blank">http://www.votefair.org/<u></u>calculation_details_<u></u>popularity.html</a><br>

<br>

    <a href="http://www.votefair.org/insertion_sort_ranking.html" target="_blank">http://www.votefair.org/<u></u>insertion_sort_ranking.html</a><br>

<br>

    <a href="http://www.votefair.org/choice_specific_pairwise_score.html" target="_blank">http://www.votefair.org/<u></u>choice_specific_pairwise_<u></u>score.html</a><br>

<br>

For convenience, below is the description for the "insertion-sort" part of the algorithm, which is the lowest-level part of the algorithm.  It is followed by a brief description of the full algorithm (excluding the initial estimation algorithm).<br>


<br>

As a reminder, the Condorcet-Kemeny method calculates a full ranking of all choices; it does not just find the single winner.<br>

<br>

Also consider that the Condorcet-Kemeny method does not specify how to resolve sequence-score ties.  The VoteFair ranking software does resolve such sequence-score ties by identifying which choices are tied at which ranking levels.<br>


<br>

Richard Fobes<br>

<br>

<br>

-------- Insertion-sort part of the algorithm ---------<br>

<br>

This explanation clarifies how the<br>

well-known insertion-sort algorithm is applied<br>

to VoteFair popularity ranking in a way that<br>

greatly reduces the number of calculations<br>

needed to find maximum sequence scores.<br>

(This  method is just part of the full<br>

algorithm, which is explained [elsewhere].)<br>

<br>

Consider an example in which there are five<br>

choices named A, B, C, D, and E, with a final<br>

sort order that matches this alphabetical<br>

order.<br>

<br>

Notation: The notation A>B refers to how<br>

many voters pairwise prefer choice A over<br>

choice B, and the notation B>A refers to how<br>

many voters pairwise prefer choice B over<br>

choice A.  This notation always uses the<br>

"greater-than" symbol ">", and never uses<br>

the "less-than" symbol "<".<br>

<br>

At an intermediate stage in this sorting<br>

example, suppose that the choices A, C, and E<br>

have been sorted -- into this correct<br>

order -- and choice B is about to be<br>

sorted, and choice D remains unsorted.<br>

The pairwise counts for this arrangement are<br>

shown below.  The asterisks show the<br>

separation between sorted choices and unsorted<br>

choices.<br>

<br>

      |       |       |       |       |       |<br>

      |   A   |   C   |   E   |   B   |   D   |<br>

      |       |       |       |       |       |<br>

 -----*************************<u></u>-------+-------+<br>

      * \     |       |       *       |       |<br>

  A   *   \   |  A>C  |  A>E  *  A>B  |  A>D  |<br>

      *     \ |       |       *       |       |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

      *       | \     |       *       |       |<br>

  C   *  C>A  |   \   |  C>E  *  C>B  |  C>D  |<br>

      *       |     \ |       *       |       |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

      *       |       | \     *       |       |<br>

  E   *  E>A  |  E>C  |   \   *  E>B  |  E>D  |<br>

      *       |       |     \ *       |       |<br>

 -----*************************<u></u>-------+-------+<br>

      |       |       |       | \     |       |<br>

  B   |  B>A  |  B>C  |  B>E  |   \   |  B>D  |<br>

      |       |       |       |     \ |       |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

      |       |       |       |       | \     |<br>

  D   |  D>A  |  D>C  |  D>E  |  D>B  |   \   |<br>

      |       |       |       |       |     \ |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

<br>

The diagonal line passes through empty cells<br>

-- that would otherwise represent a<br>

choice's comparison with itself, such as<br>

A>A.<br>

<br>

The diagonal line also is the border between<br>

the upper-right triangular area and the<br>

lower-left triangular area.  The sequence<br>

score for the current sequence is the sum of<br>

all the pairwise counts in the upper-right<br>

triangular area (currently A>C + A>E +<br>

A>B + A>D + C>E + C>B + C>D +<br>

E>B + E>D + B>D).<br>

<br>

The goal of these calculations is to find<br>

the maximum sequence score, which means the<br>

goal is to change the sequence so that the<br>

largest pairwise counts move into the<br>

upper-right triangular area, leaving the<br>

smallest pairwise counts in the lower-left<br>

triangular area.<br>

<br>

The first step in sorting choice B is to<br>

consider the possibility of moving it to the<br>

left of choice E, which would form the sequence<br>

A, C, B, E.  Here is the pairwise-count<br>

matrix for this sequence.  The asterisks<br>

now include choice B because this is a possible<br>

sort order.<br>

<br>

      |       |       |       |       |       |<br>

      |   A   |   C   |   B   |   E   |   D   |<br>

      |       |       |       |       |       |<br>

 -----*************************<u></u>********-------+<br>

      * \     |       |       |       *       |<br>

  A   *   \   |  A>C  |  A>B  |  A>E  *  A>D  |<br>

      *     \ |       |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       | \     |       |       *       |<br>

  C   *  C>A  |   \   |  C>B  |  C>E  *  C>D  |<br>

      *       |     \ |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       | \     |       *       |<br>

  B   *  B>A  |  B>C  |   \   |  B>E  *  B>D  |<br>

      *       |       |     \ |  ---  *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       |       | \     *       |<br>

  E   *  E>A  |  E>C  |  E>B  |   \   *  E>D  |<br>

      *       |       |  ---  |     \ *       |<br>

 -----*************************<u></u>********-------+<br>

      |       |       |       |       | \     |<br>

  D   |  D>A  |  D>C  |  D>B  |  D>E  |   \   |<br>

      |       |       |       |       |     \ |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

<br>

The only pairwise counts that crossed the<br>

diagonal line are the (underlined) counts<br>

B>E and E>B, which swapped places.<br>

All the other pairwise counts that move do not<br>

cross the diagonal line; they stay on the same<br>

side of the diagonal line.<br>

<br>

As a result, the score for this sequence,<br>

compared to the score for the previous<br>

sequence, increases (or decreases if negative)<br>

by the amount B>E minus E>B.  In<br>

this case (assuming there are no complications<br>

that are explained later) the sequence score<br>

has increased because in the final<br>

(alphabetical) sort order, choice B appears<br>

before choice E.<br>

<br>

The next step in sorting choice B is to<br>

consider the possibility of moving it to the<br>

left of choice C, which would form the sequence<br>

A, B, C, E.  Here is the pairwise-count<br>

matrix for this sequence.<br>

<br>

      |       |       |       |       |       |<br>

      |   A   |   B   |   C   |   E   |   D   |<br>

      |       |       |       |       |       |<br>

 -----*************************<u></u>********-------+<br>

      * \     |       |       |       *       |<br>

  A   *   \   |  A>B  |  A>C  |  A>E  *  A>D  |<br>

      *     \ |       |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       | \     |       |       *       |<br>

  B   *  B>A  |   \   |  B>C  |  B>E  *  B>D  |<br>

      *       |     \ |  ---  |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       | \     |       *       |<br>

  C   *  C>A  |  C>B  |   \   |  C>E  *  C>D  |<br>

      *       |  ---  |     \ |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       |       | \     *       |<br>

  E   *  E>A  |  E>B  |  E>C  |   \   *  E>D  |<br>

      *       |       |       |     \ *       |<br>

 -----*************************<u></u>********-------+<br>

      |       |       |       |       | \     |<br>

  D   |  D>A  |  D>B  |  D>C  |  D>E  |   \   |<br>

      |       |       |       |       |     \ |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

<br>

The only pairwise counts that crossed the<br>

diagonal line are the (underlined) counts<br>

B>C and C>B, which swapped places.<br>

The other pairwise counts that moved remained<br>

on the same side of the diagonal line.<br>

<br>

The score for this sequence increases (or<br>

decreases if negative) by the amount B>C<br>

minus C>B.  In this case the sequence<br>

score has increased because (in the final<br>

alphabetical order) choice B appears before<br>

choice C.<br>

<br>

The final step in sorting choice B is to<br>

consider the possibility of moving it to the<br>

left of choice A, which would form the sequence<br>

B, A, C, E.  Here is the matrix for this<br>

sequence.<br>

<br>

      |       |       |       |       |       |<br>

      |   B   |   A   |   C   |   E   |   D   |<br>

      |       |       |       |       |       |<br>

 -----*************************<u></u>********-------+<br>

      * \     |       |       |       *       |<br>

  B   *   \   |  B>A  |  B>C  |  B>E  *  B>D  |<br>

      *     \ |  ---  |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       | \     |       |       *       |<br>

  A   *  A>B  |   \   |  A>C  |  A>E  *  A>D  |<br>

      *  ---  |     \ |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       | \     |       *       |<br>

  C   *  C>B  |  C>A  |   \   |  C>E  *  C>D  |<br>

      *       |       |     \ |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       |       | \     *       |<br>

  E   *  E>B  |  E>A  |  E>C  |   \   *  E>D  |<br>

      *       |       |       |     \ *       |<br>

 -----*************************<u></u>********-------+<br>

      |       |       |       |       | \     |<br>

  D   |  D>B  |  D>A  |  D>C  |  D>E  |   \   |<br>

      |       |       |       |       |     \ |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

<br>

The only pairwise counts that crossed the<br>

diagonal line are the (underlined) counts<br>

B>A and A>B, which swapped places.<br>

The other pairwise counts that moved remained<br>

on the same side of the diagonal line.<br>

<br>

The score for this sequence increases (or<br>

decreases if negative) by the amount B>A<br>

minus A>B.  In this case the sequence<br>

score has decreased because (in the final<br>

alphabetical order) choice B appears after, not<br>

before, choice A.<br>

<br>

At this point choice B has been tested at<br>

each position within the sorted portion.<br>

The  maximum sequence score (for the sorted<br>

portion) occurred when it was between choices A<br>

and C.  As a result, choice B will be<br>

moved to the position between choices A and<br>

C.<br>

<br>

Notice that the full sequence score did not<br>

need to be calculated in order to find this<br>

"local" maximum.  These calculations only<br>

need to keep track of increases and decreases<br>

that occur as the being-sorted choice swaps<br>

places with successive already-sorted<br>

choices.<br>

<br>

The pairwise-count matrix with choice B in<br>

the second sort-order position (between A and<br>

C) is shown below.  Now choice D is the<br>

only unsorted choice.<br>

<br>

      |       |       |       |       |       |<br>

      |   A   |   B   |   C   |   E   |   D   |<br>

      |       |       |       |       |       |<br>

 -----*************************<u></u>********-------+<br>

      * \     |       |       |       *       |<br>

  A   *   \   |  A>B  |  A>C  |  A>E  *  A>D  |<br>

      *     \ |       |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       | \     |       |       *       |<br>

  B   *  B>A  |   \   |  B>C  |  B>E  *  B>D  |<br>

      *       |     \ |       |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       | \     |       *       |<br>

  C   *  C>A  |  C>B  |   \   |  C>E  *  C>D  |<br>

      *       |       |     \ |       *       |<br>

 -----*-------+-------+-------+<u></u>-------*-------+<br>

      *       |       |       | \     *       |<br>

  E   *  E>A  |  E>B  |  E>C  |   \   *  E>D  |<br>

      *       |       |       |     \ *       |<br>

 -----*************************<u></u>********-------+<br>

      |       |       |       |       | \     |<br>

  D   |  D>A  |  D>B  |  D>C  |  D>E  |   \   |<br>

      |       |       |       |       |     \ |<br>

 -----+-------+-------+-------+<u></u>-------+-------+<br>

<br>

Choice D would be sorted in the same<br>

way.  Of course the maximum sequence score<br>

would occur when choice D is between choices C<br>

and E, so D is moved there.<br>

<br>

      |       |       |       |       |       |<br>

      |   A   |   B   |   C   |   D   |   E   |<br>

      |       |       |       |       |       |<br>

 -----*************************<u></u>****************<br>

      * \     |       |       |       |       *<br>

  A   *   \   |  A>B  |  A>C  |  A>D  |  A>E  *<br>

      *     \ |       |       |       |       *<br>

 -----*-------+-------+-------+<u></u>-------+-------*<br>

      *       | \     |       |       |       *<br>

  B   *  B>A  |   \   |  B>C  |  B>D  |  B>E  *<br>

      *       |     \ |       |       |       *<br>

 -----*-------+-------+-------+<u></u>-------+-------*<br>

      *       |       | \     |       |       *<br>

  C   *  C>A  |  C>B  |   \   |  C>D  |  C>E  *<br>

      *       |       |     \ |       |       *<br>

 -----*-------+-------+-------+<u></u>-------+-------*<br>

      *       |       |       | \     |       *<br>

  D   *  D>A  |  D>B  |  D>C  |   \   |  D>E  *<br>

      *       |       |       |     \ |       *<br>

 -----*-------+-------+-------+<u></u>-------+-------*<br>

      *       |       |       |       | \     *<br>

  E   *  E>A  |  E>B  |  E>C  |  E>D  |   \   *<br>

      *       |       |       |       |     \ *<br>

 -----*************************<u></u>****************<br>

<br>

Now there are no more choices to sort, so<br>

the resulting sequence is A, B, C, D, E.<br>

In this sequence the full sequence score<br>

-- which equals A>B + A>C + A>D +<br>

A>E + B>C + B>D + B>E + C>D +<br>

C>E + D>E -- is likely to be the<br>

highest possible sequence score.<br>

<br>

Additional calculations, as described below,<br>

are needed because in rare cases it is possible<br>

that moving two or more choices at the same<br>

time could produce a higher sequence<br>

score.  This concept is analogous to<br>

climbing a mountain in foggy conditions by<br>

always heading in the locally higher direction<br>

and ending up at the top of a peak and then,<br>

when the fog clears, seeing a higher peak.<br>

<br>

------- Full calculation method for VoteFair popularity ranking -------<br>

<br>

This is a description of the full algorithm<br>

used to calculate VoteFair popularity ranking<br>

results.<br>

<br>

The algorithm begins by calculating the<br>

Choice-Specific Pairwise-Score ranking.<br>

This pre-sort is a required part of the<br>

process.  Without it, some unusual cases<br>

can cause the calculations to fail to find the<br>

sequence with the highest score.  This<br>

pre-sort is analogous to starting a search for<br>

the highest mountain peak within a mountain<br>

range instead of starting the search within a<br>

large valley.<br>

<br>

The next step is to apply the insertion-sort<br>

method as described in the section above,<br>

including starting at the left/highest end.<br>

<br>

To ensure that all possible moves of each<br>

choice are considered, the insertion-sort<br>

method is done in both directions.<br>

Sorting in both directions means that in some<br>

sorting passes sorting moves choices to the<br>

left, as explained in the above example.<br>

In other sorting passes sorting starts by<br>

considering the right-most choice as the first<br>

sorted choice, and choices move to the right,<br>

into the sorted portion.  This convention<br>

ensures movement for choices that need to move<br>

right, instead of left, in order to cause an<br>

increase in the score.<br>

<br>

Complications can arise when there is<br>

"circular ambiguity", so additional steps are<br>

used.  The most common cases of circular<br>

ambiguity involve several choices that are tied<br>

for the same sort-order position.<br>

<br>

A key part of dealing with circular<br>

ambiguity is to follow this convention:<br>

whenever a choice can produce the same,<br>

highest, sequence score at more than one<br>

position, the choice is moved to the farthest<br>

of those highest-sequence-score positions.<br>

<br>

Another part of dealing with these<br>

complications is to sort the sequence multiple<br>

times.<br>

<br>

During the second sorting pass, if there is<br>

no circular ambiguity, the sequence of the<br>

choices in the pairwise matrix remains the<br>

same.  This lack of movement (when there<br>

is no circular ambiguity) occurs because the<br>

sorted and unsorted portions are<br>

adjacent.  Specifically, each choice to be<br>

sorted is already at the top (for left-movement<br>

sorting) or bottom (for right-movement sorting)<br>

of the "unsorted" portion, and it is being<br>

moved to the bottom (for left-movement sorting)<br>

or top (for right-movement sorting) of<br>

the "sorted" portion.  In such cases<br>

the only thing that moves is the boundary<br>

between the sorted choices and unsorted<br>

choices.<br>

<br>

However, in cases that involve circular<br>

ambiguity, the positions of some choices will<br>

change during the second and later sorting<br>

passes.  This happens because the<br>

convention (as explained above) is to move each<br>

choice as far as it will go, within the limits<br>

of maximizing the sequence score.<br>

<br>

During the sorting passes the highest<br>

sort-order (sequence) position of each choice<br>

is tracked, and the lowest sort-order position<br>

of each choice is tracked.  These highest<br>

and lowest positions are reset (to current<br>

positions) whenever the sequence score<br>

increases to a higher score.  At the end<br>

of the sorting process the highest and lowest<br>

positions reveal which choices are tied at the<br>

same popularity ranking level.<br>

<br>

Using the insertion-sort example, if choices<br>

B, C, and D can be in any order and still<br>

produce the same highest sequence score, then<br>

each of these choices would move to the left of<br>

the other two each time it is sorted, and each<br>

of these choices would have the same<br>

highest-ranked position of second place, and<br>

each would have the same lowest-ranked position<br>

of fourth place. Because these three choices<br>

have the same highest and lowest positions,<br>

they are easily identified as tied (at the same<br>

popularity ranking).<br>

<br>

More complex cases of circular ambiguity can<br>

occur.  To deal with these cases, and to<br>

ensure the correctness of the "winner" (the<br>

most popular choice), the sorting process is<br>

repeated for the top half (plus one) of the<br>

highest-ranked choices, and this sub-set<br>

sorting is repeated until there are just three<br>

choices.  For example, if there are 12<br>

choices, the sorting process is done for 12<br>

choices, then the top 7 choices, then the top 4<br>

choices, and finally the top 3 choices.<br>

Then the highest-ranked choice (or the choices<br>

that are tied at the top) is kept at the<br>

highest rank while the other choices are sorted<br>

a final time.  (If, instead, the<br>

least-popular choice is the most important one<br>

to identify correctly, the data supplied to<br>

this algorithm can be inverted according to<br>

preference levels, and then the calculated<br>

ranking can be reversed.)<br>

<br>

As a clarification, the extra sub-set<br>

sorting is done only if more than one sequence<br>

has the same highest sequence score.  This<br>

point is significant if the distinction between<br>

VoteFair popularity ranking and the<br>

Condorcet-Kemeny method is relevant.<br>

Specifically, the Condorcet-Kemeny method does<br>

not indicate how such "tied" sequence scores<br>

should be resolved, whereas VoteFair popularity<br>

ranking resolves such "tied" sequence scores as<br>

part of its calculation process.<br>

<br>

After all the sorting has been done, the<br>

highest and lowest ranking levels are used to<br>

determine the results.  For each choice<br>

its highest and lowest ranking levels<br>

are added together (which equals twice their<br>

average) and then multiplied times a<br>

constant.  The constant equals 10 times<br>

the number of choices minus one.  These<br>

numbers are converted to integers, and then<br>

these "averaged scaled integerized" values are<br>

used as the non-normalized ranking<br>

levels.  Two or more choices are ranked at<br>

the same level if they have the same<br>

"averaged-scaled-integerized" ranking<br>

values.<br>

<br>

The final calculation step is to normalize<br>

the "averaged-scaled-integerized" ranking<br>

levels so that the normalized ranking levels<br>

are consecutive, namely 1, 2, 3, etc. (so that<br>

no ranking levels are skipped).<br>

<br>

The result is a ranking that identifies<br>

which choice is first-most popular, which<br>

choice is second-most popular, and so on down<br>

to which choice is least popular.  Ties<br>

can occur at any level.<br>

<br>

------ Calculation time ------<br>

<br>

The full algorithm used to calculate<br>

VoteFair popularity ranking results  has a<br>

calculation time that is less than or equal to<br>

the following polynomial function:<br>

<br>

  T = A + ( B * N ) + ( C * ( N * N ) )<br>

<br>

where T is the calculation time, N is the<br>

number of choices, and A and B and C are<br>

constants.  (In mathematical notation, N *<br>

N would be written as N squared.)  This<br>

function includes the calculation time required<br>

for the Choice-Specific Pairwise-Score (CSPS)<br>

pre-sort calculations.<br>

<br>

This time contrasts with the slow execution<br>

times  of the "NP-hard" approach, in which<br>

every sequence score is calculated in order to<br>

find the sequence with the highest score.<br>

If every sequence score were calculated (from<br>

scratch), the calculation time would be<br>

proportional to:<br>

<br>

  N! * N * N<br>

<br>

where N is the number of choices, N! is N<br>

factorial (2 * 3 * 4 * ... * N), and N * N<br>

equals N squared.  Note that N factorial<br>

equals the number of possible sequences, and N<br>

squared times one-half approximately equals the<br>

number of pairwise counts that are added to<br>

calculate each sequence score.<br>

<br>

This clarification about calculation time is<br>

included because there is an academically<br>

common -- yet mistaken -- belief that<br>

calculating the "Condorcet-Kemeny method" is<br>

"NP-hard" and cannot be calculated in a time<br>

that is proportional to a polynomial function<br>

of N (the number of choices).<br>

<br>

(c) Copyright 2011 Richard Fobes at VoteFair.org<br>

<br>

(This description copied from VoteFair.org<br>

with permission.)<br>

<br>

----<br>

Election-Methods mailing list - see <a href="http://electorama.com/em" target="_blank">http://electorama.com/em</a> for list info<br>

</blockquote></div><br></div>