[EM] VoteFair Ranking software version 6.0 in C++ with MIT license

VoteFair electionmethods at votefair.org
Sat Mar 7 17:23:38 PST 2020


On 2/25/2020 3:17 PM, Kristofer Munsterhjelm wrote:
 > So it seems the situation is different than I thought. Instead of the
 > method not being the same as Kemeny, the case is more that the software
 > doesn't implement the method, but rather something that approximates
 > (and differs from) it.
 > ...
 > ... the software doesn't give any impression that this is what it does.
 > Instead, from the documentation, one would get the impression that it
 > implements *the* Kemeny method (for any number of candidates). ...
 > ...
 > The documentation could thus use being more clear about what the
 > software implements -- or the software could be rewritten so that it
 > implements the actual VoteFair (or Kemeny) method.

I agree that the software deserves better documentation, so I have added 
the following clarification to the VoteFair ranking software (both in 
README.txt and votefair_ranking.cpp).

//  Clarification: For reasons of computation time, there are some
//  very rare cases for which this software can produce results that
//  differ from full VoteFair popularity ranking results.
//  These exceptions involve more choices (candidates) than the
//  value of the constant named "global_check_all_scores_choice_limit",
//  which currently has a default value of 6.
//  In those rare cases, the software estimates which top (6) choices
//  are the most popular, and then does the full VoteFair popularity
//  ranking to determine which of those top choices is most popular.
//  The rare cases in which the estimation method produces different
//  results involve one or more rock-paper-scissors-like cycles
//  that include the top choices, which is a rare combination.
//  As an analogy, those cases are like finding the highest
//  sand dune in a desert, whereas most cases are like finding
//  the highest mountain peak in a mountain range.
//  As the number of ballots increases (such as beyond 50 ballots),
//  the likelihood of such cycles greatly decreases.
//  If this difference is important, the value of the constant
//  "global_check_all_scores_choice_limit" can be increased, but
//  that change will dramatically increase calculation time,
//  to the point of requiring years of computation time for values
//  even as small as 20.  Therefore, values larger than 12 are not
//  recommended.

Kristofer, thank you for taking the time to understand VoteFair Ranking 
and the VoteFair ranking software, and for calling attention to this 
need for better software documentation.

Richard Fobes


On 2/25/2020 3:17 PM, Kristofer Munsterhjelm wrote:
> I hope this reply isn't too late, so here we go:
>
> On 23/01/2020 18.53, VoteFair wrote:
>> On 1/22/2020 3:49 PM, Kristofer Munsterhjelm wrote:
>>> Given this definition, there are two possibilities. Either the VoteFair
>>> software implements the VoteFair method, or it does not. ...
>>
>> If the default settings are used, and if the number of candidates is no
>> more than the check-all-sequence-scores setting (which I think is 6),
>> then the software implements the VoteFair popularity ranking method,
>> which yields the same results as the Condorcet-Kemeny method.
>>
>> In other words you are not allowing for a third possibility, which -- of
>> course -- depends on the number of candidates.
>>
>>> ... It seems that you instead are saying that, as you
>>> define the VoteFair ranking, it does agree with Kemeny all the time, but
>>> the software doesn't.
>>
>> Yes, this is the third possibility. The software CAN always yield the
>> same results -- simply by limiting the number of candidates it can
>> handle (or, up to a limit based on computing power, increasing the
>> number of candidates checked by the full/slow method).
>>
>>> Could you give me a definition of the VoteFair method itself? Without
>>> knowing what bit is the software and what bit is the method, it's
>>> difficult to get further. For instance, it would be hard to know whether
>>> the results that the VoteFair software gives when Kemeny orderings are
>>> tied is a property of the method or of the software, because the
>>> definition of the Kemeny method itself doesn't say what should happen in
>>> such a case.
>>
>> Originally I wrote the definition of VoteFair popularity ranking (which
>> at that time I called VoteFair ranking) on Wikipedia, and Markus Schulze
>> re-named the article to the "Kemeny-Young method" claiming that the two
>> methods are the same. So the definition there matches VoteFair
>> popularity ranking because I wrote it.
>
> Alright, that enabled me to track down the original definition by
> looking at the Wikipedia history archive. I've quoted it below.
>
>> VoteFair ranking shall be done using software (such as accessible at
>> www.VoteFair.org) that performs the following calculations. The
>> preferences indicated in the ballots are counted to produce a tally
>> table in which all the possible pairs of candidates are listed, one
>> number for each pair indicates the number of voters who prefer one
>> candidate in the pair over the other candidate in the pair, another
>> number for each pair indicates the number of voters who have the
>> opposite preference for these two candidates, and a third number for
>> each pair indicates the number of voters who express no preference
>> between the two candidates. Using a computer, each possible sequence of
>> candidates is considered, where a sequence consists of one of the
>> candidates being regarded as the most popular candidate, another
>> candidate being regarded as the second-most popular candidate, and so
>> on. For each such sequence the numbers in the tally table that apply to
>> that sequence are added together to produce a sequence score for this
>> sequence. The sequence that has the highest sequence score indicates the
>> overall order of preference for the candidates. If there is more than
>> one sequence that has the same highest score, the sequences with this
>> score shall be analyzed to identify one or more ties at one or more
>> preference levels.
>
> I agree that the definition above describes the Kemeny-Young method. I'd
> even say it's the same (not just mathematically equivalent) even though
> the mechanics are different:
>
> There exist implementations of Kemeny-Young that don't rely on
> enumerating all combinations and then choosing the best one. For
> instance, integer programming implementations scale much better while
> producing the optimal result. If the implementation mattered as to
> whether one method is the same as another, then these implementations
> would not be of the Kemeny-Young method because they don't use
> exhaustive enumerations. But they are generally considered to be.
>
>> Yes, the issue of how to handle Condorcet-cycle ties is not stated by
>> John Kemeny (as far as I know), and I have not explicitly stated how
>> such ties are handled in VoteFair popularity ranking. If I did make such
>> a statement then you or others might argue that the two methods are not
>> the same.
>>
>> The software does attempt to handle ties, either to use a tie-breaking
>> criteria or to declare an official tie. In the latter case, the VoteFair
>> approach presumably matches the Kemeny approach. In the former case the
>> candidates in the cycle can be re-sequenced and the sequence sums yield
>> the same highest sum, yet SOME of those "tied" sequences can yield
>> different sums. If this seems complex it's because it IS complex. It
>> took lots of work to get the software to handle these and other "edge"
>> cases. This approach is unlike what many software developers do, which
>> is to dismiss edge cases with some kind of error message. In other
>> words, my software looks deeper into the "tie."
>>
>> So, keeping in mind that neither John Kemeny nor I have officially
>> stated how ties (in the sequence sums) are to be handled, both methods
>> are mathematically equivalent.
>
> So it seems the situation is different than I thought. Instead of the
> method not being the same as Kemeny, the case is more that the software
> doesn't implement the method, but rather something that approximates
> (and differs from) it.
>
> As I've said before, I consider a method to be algorithm-agnostic.
> Schulze implemented by the beatpath mechanism is the same method as
> Schulze implemented by the cloneproof Schwartz dropping mechanism.
> Furthermore, I think that a method is a function that takes an election
> as its input and then provides either a winner set or a ranking (order
> of finish).
>
> If a method uses some kind of auxiliary input parameter, then the method
> with one value of that auxiliary input parameter is not the same as the
> method with another value of that parameter, unless it happens to be
> inconsequential for the outcome. E.g. STV with a Droop quota is not the
> same method as STV with the Hare method.
>
> A software implementation is "of the method" or "implements the method"
> if it agrees with the method for every input that it returns an output for.
>
> Since the VoteFair method definition is the same (or equivalent) to the
> Kemeny-Young method definition, the methods are the same. But the
> software doesn't implement that method[1]. Instead, it implements a
> family of approximations determined by the parameter. Each approximation
> agrees with the full method if the number of candidates is below a
> certain number, and returns undefined results above this number.
>
> But the software doesn't give any impression that this is what it does.
> Instead, from the documentation, one would get the impression that it
> implements *the* Kemeny method (for any number of candidates). E.g. from
> the VoteFair::VoteFairRanking Perl module documentation:
>
>> This module calculates VoteFair Ranking results. The portions of
>> VoteFair Ranking implemented here are:
>>
>> - VoteFair popularity ranking. This voting method calculates the full
>> popularity ranking of all candidates (or choices in the case of a
>> survey) from most popular and second-most popular down to least
>> popular. It uses the preference information collected on 1-2-3
>> ballots (or any equivalent way of expressing "ranked" preferences).
>> When a single position is being filled, the most popular candidate is
>> declared the winner. This calculation method is mathematically
>> equivalent to the Condorcet-Kemeny election method.
>
> and further,
>
>> VoteFair popularity ranking is described in Wikipedia, and is
>> mathematically equivalent to the Condorcet-Kemeny method. The following
>> comments explain the algorithm used here, which quickly calculates the
>> ranking results. This explanation is important because some academic
>> sources claim that the computations cannot be done quickly if there is a
>> large number of choices being ranked.
>
>> Calculation time:
>>
>> The full algorithm used to calculate VoteFair popularity ranking
>> results has a calculation time that is less than or equal to the
>> following polynomial function:
>>
>> T = A + ( B * N ) + ( C * ( N * N ) )
>>
>> where T is the calculation time, N is the number of choices, and A
>> and B and C are constants. (In mathematical notation, N * N would be
>> written as N squared.) This function includes the calculation time
>> required for the Choice-Specific Pairwise-Score (CSPS) pre-sort
>> calculations.
>>
>> This time contrasts with the slow execution times of the "NP-hard"
>> approach, in which every sequence score is calculated in order to find
>> the sequence with the highest score. If every sequence score were
>> calculated (from scratch), the calculation time would be proportional to:
>>
>> N! * N * N
>>
>> where N is the number of choices, N! is N factorial (2 * 3 * 4 * ...
>> * N), and N * N equals N squared. Note that N factorial equals the number
>> of possible sequences, and N squared times one-half approximately equals
>> the number of pairwise counts that are added to calculate each sequence
>> score.
>>
>> This clarification about calculation time is included because there
>> is an academically common -- yet mistaken -- belief that calculating the
>> "Condorcet-Kemeny method" is "NP-hard" and cannot be calculated in a
>> time that is proportional to a polynomial function of N (the number of
>> choices).
>
> This description gives the impression that the software calculates the
> VoteFair ranking (period, i.e. no approximations), that it does so in
> polynomial time, and that it is possible to determine the Kemeny-Young
> winner in polynomial time. But what it does is calculate an
> approximation, which is how it can manage to do so in polytime without
> proving P=NP.
>
> Similarly, in the VoteFair C++ README.txt file:
>
>> //  * VoteFair popularity ranking.  This voting method calculates the
>> //  full popularity ranking of all candidates (or choices in the case
>> //  of a survey) from most popular and second-most popular down to
>> //  least popular.  It uses the preference information collected on
>> //  1-2-3 ballots (or any equivalent way of expressing "ranked"
>> //  preferences).  When a single position is being filled, the most
>> //  popular candidate is declared the winner.  This calculation method
>> //  is mathematically equivalent to the Condorcet-Kemeny election
>> //  method.
>
> which gives the impression that again, the software calculates the
> method (although at least it doesn't mention being able to circumvent
> NP-hardness :-).
>
> Now, if I'm generous, I could say that what I understand you to be
> saying is that when you say "this voting method calculates...", you're
> talking about the software, but when you're saying "this calculation
> method is mathematically equivalent...", you're talking about the
> method. But it's easy to draw the wrong conclusion, as I did earlier.
> The documentation could thus use being more clear about what the
> software implements -- or the software could be rewritten so that it
> implements the actual VoteFair (or Kemeny) method.
>
> And again, I would suggest the latter as a good strategy. Instead of
> producing correct results up to a certain number of candidates and then
> producing undefined results - that it produces the correct results in
> potentially exponential time, and that it just quits with a timeout if
> it takes too long a time.
>
> That way, it's immediately obvious to the user whether the method
> produces a correct result (namely, that it'd do so when it does produce
> a result at all). Instead of producing a result in managable time with
> some probability (which you argue is low) of producing something
> incorrect, it produces a result in managable time with similar (high if
> you're right) probability of not getting a timeout.
>
> In addition, if you use a reasonably good integer programming solver to
> do so, you can quickly solve cases that the current software can't, e.g.
> 1000 candidates but no cycles at all.
>
> [1] Strictly speaking, it implements a generalization of that method,
> but I don't want to make things more confusing than they need to be; I'd
> rather deal with one point at a time.
>


More information about the Election-Methods mailing list