This algorithm, at first read, seems that it could still be NP-hard if all candidates were part of a (three-candidate???) cycle with all other candidates. Which is, of course, a case which would hardly ever arise in real life; but one which must be considered in giving a worst-case execution time order for the algorithm.<div>
<br></div><div>Jameson<br><br><div class="gmail_quote">2011/12/21 Richard Fobes <span dir="ltr"><<a href="mailto:ElectionMethods@votefair.org">ElectionMethods@votefair.org</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
As previously promised, I am revealing how Condorcet-Kemeny calculations can be done fast.<br>
<br>
How fast? Calculating a "complex" case with 12 choices is so fast that I haven't bothered to time the calculations. As an estimate, it is on the order of a couple of seconds or less.<br>
<br>
The estimated time for calculating results for 50 choices (!) is less than a minute -- which is far less than a "lifetime".<br>
<br>
The algorithm is described in the documentation that accompanies the just-announced VoteFair ranking software.<br>
<br>
The full explanation of the algorithm is described on these three web pages:<br>
<br>
<a href="http://www.votefair.org/calculation_details_popularity.html" target="_blank">http://www.votefair.org/<u></u>calculation_details_<u></u>popularity.html</a><br>
<br>
<a href="http://www.votefair.org/insertion_sort_ranking.html" target="_blank">http://www.votefair.org/<u></u>insertion_sort_ranking.html</a><br>
<br>
<a href="http://www.votefair.org/choice_specific_pairwise_score.html" target="_blank">http://www.votefair.org/<u></u>choice_specific_pairwise_<u></u>score.html</a><br>
<br>
For convenience, below is the description for the "insertion-sort" part of the algorithm, which is the lowest-level part of the algorithm. It is followed by a brief description of the full algorithm (excluding the initial estimation algorithm).<br>
<br>
As a reminder, the Condorcet-Kemeny method calculates a full ranking of all choices; it does not just find the single winner.<br>
<br>
Also consider that the Condorcet-Kemeny method does not specify how to resolve sequence-score ties. The VoteFair ranking software does resolve such sequence-score ties by identifying which choices are tied at which ranking levels.<br>
<br>
Richard Fobes<br>
<br>
<br>
-------- Insertion-sort part of the algorithm ---------<br>
<br>
This explanation clarifies how the<br>
well-known insertion-sort algorithm is applied<br>
to VoteFair popularity ranking in a way that<br>
greatly reduces the number of calculations<br>
needed to find maximum sequence scores.<br>
(This method is just part of the full<br>
algorithm, which is explained [elsewhere].)<br>
<br>
Consider an example in which there are five<br>
choices named A, B, C, D, and E, with a final<br>
sort order that matches this alphabetical<br>
order.<br>
<br>
Notation: The notation A>B refers to how<br>
many voters pairwise prefer choice A over<br>
choice B, and the notation B>A refers to how<br>
many voters pairwise prefer choice B over<br>
choice A. This notation always uses the<br>
"greater-than" symbol ">", and never uses<br>
the "less-than" symbol "<".<br>
<br>
At an intermediate stage in this sorting<br>
example, suppose that the choices A, C, and E<br>
have been sorted -- into this correct<br>
order -- and choice B is about to be<br>
sorted, and choice D remains unsorted.<br>
The pairwise counts for this arrangement are<br>
shown below. The asterisks show the<br>
separation between sorted choices and unsorted<br>
choices.<br>
<br>
| | | | | |<br>
| A | C | E | B | D |<br>
| | | | | |<br>
-----*************************<u></u>-------+-------+<br>
* \ | | * | |<br>
A * \ | A>C | A>E * A>B | A>D |<br>
* \ | | * | |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
* | \ | * | |<br>
C * C>A | \ | C>E * C>B | C>D |<br>
* | \ | * | |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
* | | \ * | |<br>
E * E>A | E>C | \ * E>B | E>D |<br>
* | | \ * | |<br>
-----*************************<u></u>-------+-------+<br>
| | | | \ | |<br>
B | B>A | B>C | B>E | \ | B>D |<br>
| | | | \ | |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
| | | | | \ |<br>
D | D>A | D>C | D>E | D>B | \ |<br>
| | | | | \ |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
<br>
The diagonal line passes through empty cells<br>
-- that would otherwise represent a<br>
choice's comparison with itself, such as<br>
A>A.<br>
<br>
The diagonal line also is the border between<br>
the upper-right triangular area and the<br>
lower-left triangular area. The sequence<br>
score for the current sequence is the sum of<br>
all the pairwise counts in the upper-right<br>
triangular area (currently A>C + A>E +<br>
A>B + A>D + C>E + C>B + C>D +<br>
E>B + E>D + B>D).<br>
<br>
The goal of these calculations is to find<br>
the maximum sequence score, which means the<br>
goal is to change the sequence so that the<br>
largest pairwise counts move into the<br>
upper-right triangular area, leaving the<br>
smallest pairwise counts in the lower-left<br>
triangular area.<br>
<br>
The first step in sorting choice B is to<br>
consider the possibility of moving it to the<br>
left of choice E, which would form the sequence<br>
A, C, B, E. Here is the pairwise-count<br>
matrix for this sequence. The asterisks<br>
now include choice B because this is a possible<br>
sort order.<br>
<br>
| | | | | |<br>
| A | C | B | E | D |<br>
| | | | | |<br>
-----*************************<u></u>********-------+<br>
* \ | | | * |<br>
A * \ | A>C | A>B | A>E * A>D |<br>
* \ | | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | \ | | * |<br>
C * C>A | \ | C>B | C>E * C>D |<br>
* | \ | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | \ | * |<br>
B * B>A | B>C | \ | B>E * B>D |<br>
* | | \ | --- * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | | \ * |<br>
E * E>A | E>C | E>B | \ * E>D |<br>
* | | --- | \ * |<br>
-----*************************<u></u>********-------+<br>
| | | | | \ |<br>
D | D>A | D>C | D>B | D>E | \ |<br>
| | | | | \ |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
<br>
The only pairwise counts that crossed the<br>
diagonal line are the (underlined) counts<br>
B>E and E>B, which swapped places.<br>
All the other pairwise counts that move do not<br>
cross the diagonal line; they stay on the same<br>
side of the diagonal line.<br>
<br>
As a result, the score for this sequence,<br>
compared to the score for the previous<br>
sequence, increases (or decreases if negative)<br>
by the amount B>E minus E>B. In<br>
this case (assuming there are no complications<br>
that are explained later) the sequence score<br>
has increased because in the final<br>
(alphabetical) sort order, choice B appears<br>
before choice E.<br>
<br>
The next step in sorting choice B is to<br>
consider the possibility of moving it to the<br>
left of choice C, which would form the sequence<br>
A, B, C, E. Here is the pairwise-count<br>
matrix for this sequence.<br>
<br>
| | | | | |<br>
| A | B | C | E | D |<br>
| | | | | |<br>
-----*************************<u></u>********-------+<br>
* \ | | | * |<br>
A * \ | A>B | A>C | A>E * A>D |<br>
* \ | | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | \ | | * |<br>
B * B>A | \ | B>C | B>E * B>D |<br>
* | \ | --- | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | \ | * |<br>
C * C>A | C>B | \ | C>E * C>D |<br>
* | --- | \ | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | | \ * |<br>
E * E>A | E>B | E>C | \ * E>D |<br>
* | | | \ * |<br>
-----*************************<u></u>********-------+<br>
| | | | | \ |<br>
D | D>A | D>B | D>C | D>E | \ |<br>
| | | | | \ |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
<br>
The only pairwise counts that crossed the<br>
diagonal line are the (underlined) counts<br>
B>C and C>B, which swapped places.<br>
The other pairwise counts that moved remained<br>
on the same side of the diagonal line.<br>
<br>
The score for this sequence increases (or<br>
decreases if negative) by the amount B>C<br>
minus C>B. In this case the sequence<br>
score has increased because (in the final<br>
alphabetical order) choice B appears before<br>
choice C.<br>
<br>
The final step in sorting choice B is to<br>
consider the possibility of moving it to the<br>
left of choice A, which would form the sequence<br>
B, A, C, E. Here is the matrix for this<br>
sequence.<br>
<br>
| | | | | |<br>
| B | A | C | E | D |<br>
| | | | | |<br>
-----*************************<u></u>********-------+<br>
* \ | | | * |<br>
B * \ | B>A | B>C | B>E * B>D |<br>
* \ | --- | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | \ | | * |<br>
A * A>B | \ | A>C | A>E * A>D |<br>
* --- | \ | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | \ | * |<br>
C * C>B | C>A | \ | C>E * C>D |<br>
* | | \ | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | | \ * |<br>
E * E>B | E>A | E>C | \ * E>D |<br>
* | | | \ * |<br>
-----*************************<u></u>********-------+<br>
| | | | | \ |<br>
D | D>B | D>A | D>C | D>E | \ |<br>
| | | | | \ |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
<br>
The only pairwise counts that crossed the<br>
diagonal line are the (underlined) counts<br>
B>A and A>B, which swapped places.<br>
The other pairwise counts that moved remained<br>
on the same side of the diagonal line.<br>
<br>
The score for this sequence increases (or<br>
decreases if negative) by the amount B>A<br>
minus A>B. In this case the sequence<br>
score has decreased because (in the final<br>
alphabetical order) choice B appears after, not<br>
before, choice A.<br>
<br>
At this point choice B has been tested at<br>
each position within the sorted portion.<br>
The maximum sequence score (for the sorted<br>
portion) occurred when it was between choices A<br>
and C. As a result, choice B will be<br>
moved to the position between choices A and<br>
C.<br>
<br>
Notice that the full sequence score did not<br>
need to be calculated in order to find this<br>
"local" maximum. These calculations only<br>
need to keep track of increases and decreases<br>
that occur as the being-sorted choice swaps<br>
places with successive already-sorted<br>
choices.<br>
<br>
The pairwise-count matrix with choice B in<br>
the second sort-order position (between A and<br>
C) is shown below. Now choice D is the<br>
only unsorted choice.<br>
<br>
| | | | | |<br>
| A | B | C | E | D |<br>
| | | | | |<br>
-----*************************<u></u>********-------+<br>
* \ | | | * |<br>
A * \ | A>B | A>C | A>E * A>D |<br>
* \ | | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | \ | | * |<br>
B * B>A | \ | B>C | B>E * B>D |<br>
* | \ | | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | \ | * |<br>
C * C>A | C>B | \ | C>E * C>D |<br>
* | | \ | * |<br>
-----*-------+-------+-------+<u></u>-------*-------+<br>
* | | | \ * |<br>
E * E>A | E>B | E>C | \ * E>D |<br>
* | | | \ * |<br>
-----*************************<u></u>********-------+<br>
| | | | | \ |<br>
D | D>A | D>B | D>C | D>E | \ |<br>
| | | | | \ |<br>
-----+-------+-------+-------+<u></u>-------+-------+<br>
<br>
Choice D would be sorted in the same<br>
way. Of course the maximum sequence score<br>
would occur when choice D is between choices C<br>
and E, so D is moved there.<br>
<br>
| | | | | |<br>
| A | B | C | D | E |<br>
| | | | | |<br>
-----*************************<u></u>****************<br>
* \ | | | | *<br>
A * \ | A>B | A>C | A>D | A>E *<br>
* \ | | | | *<br>
-----*-------+-------+-------+<u></u>-------+-------*<br>
* | \ | | | *<br>
B * B>A | \ | B>C | B>D | B>E *<br>
* | \ | | | *<br>
-----*-------+-------+-------+<u></u>-------+-------*<br>
* | | \ | | *<br>
C * C>A | C>B | \ | C>D | C>E *<br>
* | | \ | | *<br>
-----*-------+-------+-------+<u></u>-------+-------*<br>
* | | | \ | *<br>
D * D>A | D>B | D>C | \ | D>E *<br>
* | | | \ | *<br>
-----*-------+-------+-------+<u></u>-------+-------*<br>
* | | | | \ *<br>
E * E>A | E>B | E>C | E>D | \ *<br>
* | | | | \ *<br>
-----*************************<u></u>****************<br>
<br>
Now there are no more choices to sort, so<br>
the resulting sequence is A, B, C, D, E.<br>
In this sequence the full sequence score<br>
-- which equals A>B + A>C + A>D +<br>
A>E + B>C + B>D + B>E + C>D +<br>
C>E + D>E -- is likely to be the<br>
highest possible sequence score.<br>
<br>
Additional calculations, as described below,<br>
are needed because in rare cases it is possible<br>
that moving two or more choices at the same<br>
time could produce a higher sequence<br>
score. This concept is analogous to<br>
climbing a mountain in foggy conditions by<br>
always heading in the locally higher direction<br>
and ending up at the top of a peak and then,<br>
when the fog clears, seeing a higher peak.<br>
<br>
------- Full calculation method for VoteFair popularity ranking -------<br>
<br>
This is a description of the full algorithm<br>
used to calculate VoteFair popularity ranking<br>
results.<br>
<br>
The algorithm begins by calculating the<br>
Choice-Specific Pairwise-Score ranking.<br>
This pre-sort is a required part of the<br>
process. Without it, some unusual cases<br>
can cause the calculations to fail to find the<br>
sequence with the highest score. This<br>
pre-sort is analogous to starting a search for<br>
the highest mountain peak within a mountain<br>
range instead of starting the search within a<br>
large valley.<br>
<br>
The next step is to apply the insertion-sort<br>
method as described in the section above,<br>
including starting at the left/highest end.<br>
<br>
To ensure that all possible moves of each<br>
choice are considered, the insertion-sort<br>
method is done in both directions.<br>
Sorting in both directions means that in some<br>
sorting passes sorting moves choices to the<br>
left, as explained in the above example.<br>
In other sorting passes sorting starts by<br>
considering the right-most choice as the first<br>
sorted choice, and choices move to the right,<br>
into the sorted portion. This convention<br>
ensures movement for choices that need to move<br>
right, instead of left, in order to cause an<br>
increase in the score.<br>
<br>
Complications can arise when there is<br>
"circular ambiguity", so additional steps are<br>
used. The most common cases of circular<br>
ambiguity involve several choices that are tied<br>
for the same sort-order position.<br>
<br>
A key part of dealing with circular<br>
ambiguity is to follow this convention:<br>
whenever a choice can produce the same,<br>
highest, sequence score at more than one<br>
position, the choice is moved to the farthest<br>
of those highest-sequence-score positions.<br>
<br>
Another part of dealing with these<br>
complications is to sort the sequence multiple<br>
times.<br>
<br>
During the second sorting pass, if there is<br>
no circular ambiguity, the sequence of the<br>
choices in the pairwise matrix remains the<br>
same. This lack of movement (when there<br>
is no circular ambiguity) occurs because the<br>
sorted and unsorted portions are<br>
adjacent. Specifically, each choice to be<br>
sorted is already at the top (for left-movement<br>
sorting) or bottom (for right-movement sorting)<br>
of the "unsorted" portion, and it is being<br>
moved to the bottom (for left-movement sorting)<br>
or top (for right-movement sorting) of<br>
the "sorted" portion. In such cases<br>
the only thing that moves is the boundary<br>
between the sorted choices and unsorted<br>
choices.<br>
<br>
However, in cases that involve circular<br>
ambiguity, the positions of some choices will<br>
change during the second and later sorting<br>
passes. This happens because the<br>
convention (as explained above) is to move each<br>
choice as far as it will go, within the limits<br>
of maximizing the sequence score.<br>
<br>
During the sorting passes the highest<br>
sort-order (sequence) position of each choice<br>
is tracked, and the lowest sort-order position<br>
of each choice is tracked. These highest<br>
and lowest positions are reset (to current<br>
positions) whenever the sequence score<br>
increases to a higher score. At the end<br>
of the sorting process the highest and lowest<br>
positions reveal which choices are tied at the<br>
same popularity ranking level.<br>
<br>
Using the insertion-sort example, if choices<br>
B, C, and D can be in any order and still<br>
produce the same highest sequence score, then<br>
each of these choices would move to the left of<br>
the other two each time it is sorted, and each<br>
of these choices would have the same<br>
highest-ranked position of second place, and<br>
each would have the same lowest-ranked position<br>
of fourth place. Because these three choices<br>
have the same highest and lowest positions,<br>
they are easily identified as tied (at the same<br>
popularity ranking).<br>
<br>
More complex cases of circular ambiguity can<br>
occur. To deal with these cases, and to<br>
ensure the correctness of the "winner" (the<br>
most popular choice), the sorting process is<br>
repeated for the top half (plus one) of the<br>
highest-ranked choices, and this sub-set<br>
sorting is repeated until there are just three<br>
choices. For example, if there are 12<br>
choices, the sorting process is done for 12<br>
choices, then the top 7 choices, then the top 4<br>
choices, and finally the top 3 choices.<br>
Then the highest-ranked choice (or the choices<br>
that are tied at the top) is kept at the<br>
highest rank while the other choices are sorted<br>
a final time. (If, instead, the<br>
least-popular choice is the most important one<br>
to identify correctly, the data supplied to<br>
this algorithm can be inverted according to<br>
preference levels, and then the calculated<br>
ranking can be reversed.)<br>
<br>
As a clarification, the extra sub-set<br>
sorting is done only if more than one sequence<br>
has the same highest sequence score. This<br>
point is significant if the distinction between<br>
VoteFair popularity ranking and the<br>
Condorcet-Kemeny method is relevant.<br>
Specifically, the Condorcet-Kemeny method does<br>
not indicate how such "tied" sequence scores<br>
should be resolved, whereas VoteFair popularity<br>
ranking resolves such "tied" sequence scores as<br>
part of its calculation process.<br>
<br>
After all the sorting has been done, the<br>
highest and lowest ranking levels are used to<br>
determine the results. For each choice<br>
its highest and lowest ranking levels<br>
are added together (which equals twice their<br>
average) and then multiplied times a<br>
constant. The constant equals 10 times<br>
the number of choices minus one. These<br>
numbers are converted to integers, and then<br>
these "averaged scaled integerized" values are<br>
used as the non-normalized ranking<br>
levels. Two or more choices are ranked at<br>
the same level if they have the same<br>
"averaged-scaled-integerized" ranking<br>
values.<br>
<br>
The final calculation step is to normalize<br>
the "averaged-scaled-integerized" ranking<br>
levels so that the normalized ranking levels<br>
are consecutive, namely 1, 2, 3, etc. (so that<br>
no ranking levels are skipped).<br>
<br>
The result is a ranking that identifies<br>
which choice is first-most popular, which<br>
choice is second-most popular, and so on down<br>
to which choice is least popular. Ties<br>
can occur at any level.<br>
<br>
------ Calculation time ------<br>
<br>
The full algorithm used to calculate<br>
VoteFair popularity ranking results has a<br>
calculation time that is less than or equal to<br>
the following polynomial function:<br>
<br>
T = A + ( B * N ) + ( C * ( N * N ) )<br>
<br>
where T is the calculation time, N is the<br>
number of choices, and A and B and C are<br>
constants. (In mathematical notation, N *<br>
N would be written as N squared.) This<br>
function includes the calculation time required<br>
for the Choice-Specific Pairwise-Score (CSPS)<br>
pre-sort calculations.<br>
<br>
This time contrasts with the slow execution<br>
times of the "NP-hard" approach, in which<br>
every sequence score is calculated in order to<br>
find the sequence with the highest score.<br>
If every sequence score were calculated (from<br>
scratch), the calculation time would be<br>
proportional to:<br>
<br>
N! * N * N<br>
<br>
where N is the number of choices, N! is N<br>
factorial (2 * 3 * 4 * ... * N), and N * N<br>
equals N squared. Note that N factorial<br>
equals the number of possible sequences, and N<br>
squared times one-half approximately equals the<br>
number of pairwise counts that are added to<br>
calculate each sequence score.<br>
<br>
This clarification about calculation time is<br>
included because there is an academically<br>
common -- yet mistaken -- belief that<br>
calculating the "Condorcet-Kemeny method" is<br>
"NP-hard" and cannot be calculated in a time<br>
that is proportional to a polynomial function<br>
of N (the number of choices).<br>
<br>
(c) Copyright 2011 Richard Fobes at VoteFair.org<br>
<br>
(This description copied from VoteFair.org<br>
with permission.)<br>
<br>
----<br>
Election-Methods mailing list - see <a href="http://electorama.com/em" target="_blank">http://electorama.com/em</a> for list info<br>
</blockquote></div><br></div>