[EM] Fast Condorcet-Kemeny calculations -- in polynomial time
Jameson Quinn
jameson.quinn at gmail.com
Wed Dec 21 13:39:19 PST 2011
This algorithm, at first read, seems that it could still be NP-hard if all
candidates were part of a (three-candidate???) cycle with all other
candidates. Which is, of course, a case which would hardly ever arise in
real life; but one which must be considered in giving a worst-case
execution time order for the algorithm.
Jameson
2011/12/21 Richard Fobes <ElectionMethods at votefair.org>
> As previously promised, I am revealing how Condorcet-Kemeny calculations
> can be done fast.
>
> How fast? Calculating a "complex" case with 12 choices is so fast that I
> haven't bothered to time the calculations. As an estimate, it is on the
> order of a couple of seconds or less.
>
> The estimated time for calculating results for 50 choices (!) is less than
> a minute -- which is far less than a "lifetime".
>
> The algorithm is described in the documentation that accompanies the
> just-announced VoteFair ranking software.
>
> The full explanation of the algorithm is described on these three web
> pages:
>
> http://www.votefair.org/**calculation_details_**popularity.html<http://www.votefair.org/calculation_details_popularity.html>
>
> http://www.votefair.org/**insertion_sort_ranking.html<http://www.votefair.org/insertion_sort_ranking.html>
>
> http://www.votefair.org/**choice_specific_pairwise_**score.html<http://www.votefair.org/choice_specific_pairwise_score.html>
>
> For convenience, below is the description for the "insertion-sort" part of
> the algorithm, which is the lowest-level part of the algorithm. It is
> followed by a brief description of the full algorithm (excluding the
> initial estimation algorithm).
>
> As a reminder, the Condorcet-Kemeny method calculates a full ranking of
> all choices; it does not just find the single winner.
>
> Also consider that the Condorcet-Kemeny method does not specify how to
> resolve sequence-score ties. The VoteFair ranking software does resolve
> such sequence-score ties by identifying which choices are tied at which
> ranking levels.
>
> Richard Fobes
>
>
> -------- Insertion-sort part of the algorithm ---------
>
> This explanation clarifies how the
> well-known insertion-sort algorithm is applied
> to VoteFair popularity ranking in a way that
> greatly reduces the number of calculations
> needed to find maximum sequence scores.
> (This method is just part of the full
> algorithm, which is explained [elsewhere].)
>
> Consider an example in which there are five
> choices named A, B, C, D, and E, with a final
> sort order that matches this alphabetical
> order.
>
> Notation: The notation A>B refers to how
> many voters pairwise prefer choice A over
> choice B, and the notation B>A refers to how
> many voters pairwise prefer choice B over
> choice A. This notation always uses the
> "greater-than" symbol ">", and never uses
> the "less-than" symbol "<".
>
> At an intermediate stage in this sorting
> example, suppose that the choices A, C, and E
> have been sorted -- into this correct
> order -- and choice B is about to be
> sorted, and choice D remains unsorted.
> The pairwise counts for this arrangement are
> shown below. The asterisks show the
> separation between sorted choices and unsorted
> choices.
>
> | | | | | |
> | A | C | E | B | D |
> | | | | | |
> -----***************************-------+-------+
> * \ | | * | |
> A * \ | A>C | A>E * A>B | A>D |
> * \ | | * | |
> -----+-------+-------+-------+**-------+-------+
> * | \ | * | |
> C * C>A | \ | C>E * C>B | C>D |
> * | \ | * | |
> -----+-------+-------+-------+**-------+-------+
> * | | \ * | |
> E * E>A | E>C | \ * E>B | E>D |
> * | | \ * | |
> -----***************************-------+-------+
> | | | | \ | |
> B | B>A | B>C | B>E | \ | B>D |
> | | | | \ | |
> -----+-------+-------+-------+**-------+-------+
> | | | | | \ |
> D | D>A | D>C | D>E | D>B | \ |
> | | | | | \ |
> -----+-------+-------+-------+**-------+-------+
>
> The diagonal line passes through empty cells
> -- that would otherwise represent a
> choice's comparison with itself, such as
> A>A.
>
> The diagonal line also is the border between
> the upper-right triangular area and the
> lower-left triangular area. The sequence
> score for the current sequence is the sum of
> all the pairwise counts in the upper-right
> triangular area (currently A>C + A>E +
> A>B + A>D + C>E + C>B + C>D +
> E>B + E>D + B>D).
>
> The goal of these calculations is to find
> the maximum sequence score, which means the
> goal is to change the sequence so that the
> largest pairwise counts move into the
> upper-right triangular area, leaving the
> smallest pairwise counts in the lower-left
> triangular area.
>
> The first step in sorting choice B is to
> consider the possibility of moving it to the
> left of choice E, which would form the sequence
> A, C, B, E. Here is the pairwise-count
> matrix for this sequence. The asterisks
> now include choice B because this is a possible
> sort order.
>
> | | | | | |
> | A | C | B | E | D |
> | | | | | |
> -----***********************************-------+
> * \ | | | * |
> A * \ | A>C | A>B | A>E * A>D |
> * \ | | | * |
> -----*-------+-------+-------+**-------*-------+
> * | \ | | * |
> C * C>A | \ | C>B | C>E * C>D |
> * | \ | | * |
> -----*-------+-------+-------+**-------*-------+
> * | | \ | * |
> B * B>A | B>C | \ | B>E * B>D |
> * | | \ | --- * |
> -----*-------+-------+-------+**-------*-------+
> * | | | \ * |
> E * E>A | E>C | E>B | \ * E>D |
> * | | --- | \ * |
> -----***********************************-------+
> | | | | | \ |
> D | D>A | D>C | D>B | D>E | \ |
> | | | | | \ |
> -----+-------+-------+-------+**-------+-------+
>
> The only pairwise counts that crossed the
> diagonal line are the (underlined) counts
> B>E and E>B, which swapped places.
> All the other pairwise counts that move do not
> cross the diagonal line; they stay on the same
> side of the diagonal line.
>
> As a result, the score for this sequence,
> compared to the score for the previous
> sequence, increases (or decreases if negative)
> by the amount B>E minus E>B. In
> this case (assuming there are no complications
> that are explained later) the sequence score
> has increased because in the final
> (alphabetical) sort order, choice B appears
> before choice E.
>
> The next step in sorting choice B is to
> consider the possibility of moving it to the
> left of choice C, which would form the sequence
> A, B, C, E. Here is the pairwise-count
> matrix for this sequence.
>
> | | | | | |
> | A | B | C | E | D |
> | | | | | |
> -----***********************************-------+
> * \ | | | * |
> A * \ | A>B | A>C | A>E * A>D |
> * \ | | | * |
> -----*-------+-------+-------+**-------*-------+
> * | \ | | * |
> B * B>A | \ | B>C | B>E * B>D |
> * | \ | --- | * |
> -----*-------+-------+-------+**-------*-------+
> * | | \ | * |
> C * C>A | C>B | \ | C>E * C>D |
> * | --- | \ | * |
> -----*-------+-------+-------+**-------*-------+
> * | | | \ * |
> E * E>A | E>B | E>C | \ * E>D |
> * | | | \ * |
> -----***********************************-------+
> | | | | | \ |
> D | D>A | D>B | D>C | D>E | \ |
> | | | | | \ |
> -----+-------+-------+-------+**-------+-------+
>
> The only pairwise counts that crossed the
> diagonal line are the (underlined) counts
> B>C and C>B, which swapped places.
> The other pairwise counts that moved remained
> on the same side of the diagonal line.
>
> The score for this sequence increases (or
> decreases if negative) by the amount B>C
> minus C>B. In this case the sequence
> score has increased because (in the final
> alphabetical order) choice B appears before
> choice C.
>
> The final step in sorting choice B is to
> consider the possibility of moving it to the
> left of choice A, which would form the sequence
> B, A, C, E. Here is the matrix for this
> sequence.
>
> | | | | | |
> | B | A | C | E | D |
> | | | | | |
> -----***********************************-------+
> * \ | | | * |
> B * \ | B>A | B>C | B>E * B>D |
> * \ | --- | | * |
> -----*-------+-------+-------+**-------*-------+
> * | \ | | * |
> A * A>B | \ | A>C | A>E * A>D |
> * --- | \ | | * |
> -----*-------+-------+-------+**-------*-------+
> * | | \ | * |
> C * C>B | C>A | \ | C>E * C>D |
> * | | \ | * |
> -----*-------+-------+-------+**-------*-------+
> * | | | \ * |
> E * E>B | E>A | E>C | \ * E>D |
> * | | | \ * |
> -----***********************************-------+
> | | | | | \ |
> D | D>B | D>A | D>C | D>E | \ |
> | | | | | \ |
> -----+-------+-------+-------+**-------+-------+
>
> The only pairwise counts that crossed the
> diagonal line are the (underlined) counts
> B>A and A>B, which swapped places.
> The other pairwise counts that moved remained
> on the same side of the diagonal line.
>
> The score for this sequence increases (or
> decreases if negative) by the amount B>A
> minus A>B. In this case the sequence
> score has decreased because (in the final
> alphabetical order) choice B appears after, not
> before, choice A.
>
> At this point choice B has been tested at
> each position within the sorted portion.
> The maximum sequence score (for the sorted
> portion) occurred when it was between choices A
> and C. As a result, choice B will be
> moved to the position between choices A and
> C.
>
> Notice that the full sequence score did not
> need to be calculated in order to find this
> "local" maximum. These calculations only
> need to keep track of increases and decreases
> that occur as the being-sorted choice swaps
> places with successive already-sorted
> choices.
>
> The pairwise-count matrix with choice B in
> the second sort-order position (between A and
> C) is shown below. Now choice D is the
> only unsorted choice.
>
> | | | | | |
> | A | B | C | E | D |
> | | | | | |
> -----***********************************-------+
> * \ | | | * |
> A * \ | A>B | A>C | A>E * A>D |
> * \ | | | * |
> -----*-------+-------+-------+**-------*-------+
> * | \ | | * |
> B * B>A | \ | B>C | B>E * B>D |
> * | \ | | * |
> -----*-------+-------+-------+**-------*-------+
> * | | \ | * |
> C * C>A | C>B | \ | C>E * C>D |
> * | | \ | * |
> -----*-------+-------+-------+**-------*-------+
> * | | | \ * |
> E * E>A | E>B | E>C | \ * E>D |
> * | | | \ * |
> -----***********************************-------+
> | | | | | \ |
> D | D>A | D>B | D>C | D>E | \ |
> | | | | | \ |
> -----+-------+-------+-------+**-------+-------+
>
> Choice D would be sorted in the same
> way. Of course the maximum sequence score
> would occur when choice D is between choices C
> and E, so D is moved there.
>
> | | | | | |
> | A | B | C | D | E |
> | | | | | |
> -----*******************************************
> * \ | | | | *
> A * \ | A>B | A>C | A>D | A>E *
> * \ | | | | *
> -----*-------+-------+-------+**-------+-------*
> * | \ | | | *
> B * B>A | \ | B>C | B>D | B>E *
> * | \ | | | *
> -----*-------+-------+-------+**-------+-------*
> * | | \ | | *
> C * C>A | C>B | \ | C>D | C>E *
> * | | \ | | *
> -----*-------+-------+-------+**-------+-------*
> * | | | \ | *
> D * D>A | D>B | D>C | \ | D>E *
> * | | | \ | *
> -----*-------+-------+-------+**-------+-------*
> * | | | | \ *
> E * E>A | E>B | E>C | E>D | \ *
> * | | | | \ *
> -----*******************************************
>
> Now there are no more choices to sort, so
> the resulting sequence is A, B, C, D, E.
> In this sequence the full sequence score
> -- which equals A>B + A>C + A>D +
> A>E + B>C + B>D + B>E + C>D +
> C>E + D>E -- is likely to be the
> highest possible sequence score.
>
> Additional calculations, as described below,
> are needed because in rare cases it is possible
> that moving two or more choices at the same
> time could produce a higher sequence
> score. This concept is analogous to
> climbing a mountain in foggy conditions by
> always heading in the locally higher direction
> and ending up at the top of a peak and then,
> when the fog clears, seeing a higher peak.
>
> ------- Full calculation method for VoteFair popularity ranking -------
>
> This is a description of the full algorithm
> used to calculate VoteFair popularity ranking
> results.
>
> The algorithm begins by calculating the
> Choice-Specific Pairwise-Score ranking.
> This pre-sort is a required part of the
> process. Without it, some unusual cases
> can cause the calculations to fail to find the
> sequence with the highest score. This
> pre-sort is analogous to starting a search for
> the highest mountain peak within a mountain
> range instead of starting the search within a
> large valley.
>
> The next step is to apply the insertion-sort
> method as described in the section above,
> including starting at the left/highest end.
>
> To ensure that all possible moves of each
> choice are considered, the insertion-sort
> method is done in both directions.
> Sorting in both directions means that in some
> sorting passes sorting moves choices to the
> left, as explained in the above example.
> In other sorting passes sorting starts by
> considering the right-most choice as the first
> sorted choice, and choices move to the right,
> into the sorted portion. This convention
> ensures movement for choices that need to move
> right, instead of left, in order to cause an
> increase in the score.
>
> Complications can arise when there is
> "circular ambiguity", so additional steps are
> used. The most common cases of circular
> ambiguity involve several choices that are tied
> for the same sort-order position.
>
> A key part of dealing with circular
> ambiguity is to follow this convention:
> whenever a choice can produce the same,
> highest, sequence score at more than one
> position, the choice is moved to the farthest
> of those highest-sequence-score positions.
>
> Another part of dealing with these
> complications is to sort the sequence multiple
> times.
>
> During the second sorting pass, if there is
> no circular ambiguity, the sequence of the
> choices in the pairwise matrix remains the
> same. This lack of movement (when there
> is no circular ambiguity) occurs because the
> sorted and unsorted portions are
> adjacent. Specifically, each choice to be
> sorted is already at the top (for left-movement
> sorting) or bottom (for right-movement sorting)
> of the "unsorted" portion, and it is being
> moved to the bottom (for left-movement sorting)
> or top (for right-movement sorting) of
> the "sorted" portion. In such cases
> the only thing that moves is the boundary
> between the sorted choices and unsorted
> choices.
>
> However, in cases that involve circular
> ambiguity, the positions of some choices will
> change during the second and later sorting
> passes. This happens because the
> convention (as explained above) is to move each
> choice as far as it will go, within the limits
> of maximizing the sequence score.
>
> During the sorting passes the highest
> sort-order (sequence) position of each choice
> is tracked, and the lowest sort-order position
> of each choice is tracked. These highest
> and lowest positions are reset (to current
> positions) whenever the sequence score
> increases to a higher score. At the end
> of the sorting process the highest and lowest
> positions reveal which choices are tied at the
> same popularity ranking level.
>
> Using the insertion-sort example, if choices
> B, C, and D can be in any order and still
> produce the same highest sequence score, then
> each of these choices would move to the left of
> the other two each time it is sorted, and each
> of these choices would have the same
> highest-ranked position of second place, and
> each would have the same lowest-ranked position
> of fourth place. Because these three choices
> have the same highest and lowest positions,
> they are easily identified as tied (at the same
> popularity ranking).
>
> More complex cases of circular ambiguity can
> occur. To deal with these cases, and to
> ensure the correctness of the "winner" (the
> most popular choice), the sorting process is
> repeated for the top half (plus one) of the
> highest-ranked choices, and this sub-set
> sorting is repeated until there are just three
> choices. For example, if there are 12
> choices, the sorting process is done for 12
> choices, then the top 7 choices, then the top 4
> choices, and finally the top 3 choices.
> Then the highest-ranked choice (or the choices
> that are tied at the top) is kept at the
> highest rank while the other choices are sorted
> a final time. (If, instead, the
> least-popular choice is the most important one
> to identify correctly, the data supplied to
> this algorithm can be inverted according to
> preference levels, and then the calculated
> ranking can be reversed.)
>
> As a clarification, the extra sub-set
> sorting is done only if more than one sequence
> has the same highest sequence score. This
> point is significant if the distinction between
> VoteFair popularity ranking and the
> Condorcet-Kemeny method is relevant.
> Specifically, the Condorcet-Kemeny method does
> not indicate how such "tied" sequence scores
> should be resolved, whereas VoteFair popularity
> ranking resolves such "tied" sequence scores as
> part of its calculation process.
>
> After all the sorting has been done, the
> highest and lowest ranking levels are used to
> determine the results. For each choice
> its highest and lowest ranking levels
> are added together (which equals twice their
> average) and then multiplied times a
> constant. The constant equals 10 times
> the number of choices minus one. These
> numbers are converted to integers, and then
> these "averaged scaled integerized" values are
> used as the non-normalized ranking
> levels. Two or more choices are ranked at
> the same level if they have the same
> "averaged-scaled-integerized" ranking
> values.
>
> The final calculation step is to normalize
> the "averaged-scaled-integerized" ranking
> levels so that the normalized ranking levels
> are consecutive, namely 1, 2, 3, etc. (so that
> no ranking levels are skipped).
>
> The result is a ranking that identifies
> which choice is first-most popular, which
> choice is second-most popular, and so on down
> to which choice is least popular. Ties
> can occur at any level.
>
> ------ Calculation time ------
>
> The full algorithm used to calculate
> VoteFair popularity ranking results has a
> calculation time that is less than or equal to
> the following polynomial function:
>
> T = A + ( B * N ) + ( C * ( N * N ) )
>
> where T is the calculation time, N is the
> number of choices, and A and B and C are
> constants. (In mathematical notation, N *
> N would be written as N squared.) This
> function includes the calculation time required
> for the Choice-Specific Pairwise-Score (CSPS)
> pre-sort calculations.
>
> This time contrasts with the slow execution
> times of the "NP-hard" approach, in which
> every sequence score is calculated in order to
> find the sequence with the highest score.
> If every sequence score were calculated (from
> scratch), the calculation time would be
> proportional to:
>
> N! * N * N
>
> where N is the number of choices, N! is N
> factorial (2 * 3 * 4 * ... * N), and N * N
> equals N squared. Note that N factorial
> equals the number of possible sequences, and N
> squared times one-half approximately equals the
> number of pairwise counts that are added to
> calculate each sequence score.
>
> This clarification about calculation time is
> included because there is an academically
> common -- yet mistaken -- belief that
> calculating the "Condorcet-Kemeny method" is
> "NP-hard" and cannot be calculated in a time
> that is proportional to a polynomial function
> of N (the number of choices).
>
> (c) Copyright 2011 Richard Fobes at VoteFair.org
>
> (This description copied from VoteFair.org
> with permission.)
>
> ----
> Election-Methods mailing list - see http://electorama.com/em for list info
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.electorama.com/pipermail/election-methods-electorama.com/attachments/20111221/20ae6424/attachment-0004.htm>
More information about the Election-Methods
mailing list