[EM] VoteFair Ranking software version 6.0 in C++ with MIT license

Tue Feb 25 15:17:34 PST 2020

I hope this reply isn't too late, so here we go:

On 23/01/2020 18.53, VoteFair wrote:
> On 1/22/2020 3:49 PM, Kristofer Munsterhjelm wrote:
>> Given this definition, there are two possibilities. Either the VoteFair
>> software implements the VoteFair method, or it does not. ...
> 
> If the default settings are used, and if the number of candidates is no
> more than the check-all-sequence-scores setting (which I think is 6),
> then the software implements the VoteFair popularity ranking method,
> which yields the same results as the Condorcet-Kemeny method.
> 
> In other words you are not allowing for a third possibility, which -- of
> course -- depends on the number of candidates.
> 
>> ... It seems that you instead are saying that, as you
>> define the VoteFair ranking, it does agree with Kemeny all the time, but
>> the software doesn't.
> 
> Yes, this is the third possibility. The software CAN always yield the
> same results -- simply by limiting the number of candidates it can
> handle (or, up to a limit based on computing power, increasing the
> number of candidates checked by the full/slow method).
> 
>> Could you give me a definition of the VoteFair method itself? Without
>> knowing what bit is the software and what bit is the method, it's
>> difficult to get further. For instance, it would be hard to know whether
>> the results that the VoteFair software gives when Kemeny orderings are
>> tied is a property of the method or of the software, because the
>> definition of the Kemeny method itself doesn't say what should happen in
>> such a case.
> 
> Originally I wrote the definition of VoteFair popularity ranking (which
> at that time I called VoteFair ranking) on Wikipedia, and Markus Schulze
> re-named the article to the "Kemeny-Young method" claiming that the two
> methods are the same. So the definition there matches VoteFair
> popularity ranking because I wrote it.

Alright, that enabled me to track down the original definition by
looking at the Wikipedia history archive. I've quoted it below.

> VoteFair ranking shall be done using software (such as accessible at
> www.VoteFair.org) that performs the following calculations. The
> preferences indicated in the ballots are counted to produce a tally
> table in which all the possible pairs of candidates are listed, one
> number for each pair indicates the number of voters who prefer one
> candidate in the pair over the other candidate in the pair, another
> number for each pair indicates the number of voters who have the
> opposite preference for these two candidates, and a third number for
> each pair indicates the number of voters who express no preference
> between the two candidates. Using a computer, each possible sequence of
> candidates is considered, where a sequence consists of one of the
> candidates being regarded as the most popular candidate, another
> candidate being regarded as the second-most popular candidate, and so
> on. For each such sequence the numbers in the tally table that apply to
> that sequence are added together to produce a sequence score for this
> sequence. The sequence that has the highest sequence score indicates the
> overall order of preference for the candidates. If there is more than
> one sequence that has the same highest score, the sequences with this
> score shall be analyzed to identify one or more ties at one or more
> preference levels.

I agree that the definition above describes the Kemeny-Young method. I'd
even say it's the same (not just mathematically equivalent) even though
the mechanics are different:

There exist implementations of Kemeny-Young that don't rely on
enumerating all combinations and then choosing the best one. For
instance, integer programming implementations scale much better while
producing the optimal result. If the implementation mattered as to
whether one method is the same as another, then these implementations
would not be of the Kemeny-Young method because they don't use
exhaustive enumerations. But they are generally considered to be.

> Yes, the issue of how to handle Condorcet-cycle ties is not stated by
> John Kemeny (as far as I know), and I have not explicitly stated how
> such ties are handled in VoteFair popularity ranking. If I did make such
> a statement then you or others might argue that the two methods are not
> the same.
> 
> The software does attempt to handle ties, either to use a tie-breaking
> criteria or to declare an official tie. In the latter case, the VoteFair
> approach presumably matches the Kemeny approach. In the former case the
> candidates in the cycle can be re-sequenced and the sequence sums yield
> the same highest sum, yet SOME of those "tied" sequences can yield
> different sums. If this seems complex it's because it IS complex. It
> took lots of work to get the software to handle these and other "edge"
> cases. This approach is unlike what many software developers do, which
> is to dismiss edge cases with some kind of error message. In other
> words, my software looks deeper into the "tie."
> 
> So, keeping in mind that neither John Kemeny nor I have officially
> stated how ties (in the sequence sums) are to be handled, both methods
> are mathematically equivalent.

So it seems the situation is different than I thought. Instead of the
method not being the same as Kemeny, the case is more that the software
doesn't implement the method, but rather something that approximates
(and differs from) it.

As I've said before, I consider a method to be algorithm-agnostic.
Schulze implemented by the beatpath mechanism is the same method as
Schulze implemented by the cloneproof Schwartz dropping mechanism.
Furthermore, I think that a method is a function that takes an election
as its input and then provides either a winner set or a ranking (order
of finish).

If a method uses some kind of auxiliary input parameter, then the method
with one value of that auxiliary input parameter is not the same as the
method with another value of that parameter, unless it happens to be
inconsequential for the outcome. E.g. STV with a Droop quota is not the
same method as STV with the Hare method.

A software implementation is "of the method" or "implements the method"
if it agrees with the method for every input that it returns an output for.

Since the VoteFair method definition is the same (or equivalent) to the
Kemeny-Young method definition, the methods are the same. But the
software doesn't implement that method[1]. Instead, it implements a
family of approximations determined by the parameter. Each approximation
agrees with the full method if the number of candidates is below a
certain number, and returns undefined results above this number.

But the software doesn't give any impression that this is what it does.
Instead, from the documentation, one would get the impression that it
implements *the* Kemeny method (for any number of candidates). E.g. from
the VoteFair::VoteFairRanking Perl module documentation:

> This module calculates VoteFair Ranking results. The portions of
> VoteFair Ranking implemented here are:
> 
> - VoteFair popularity ranking. This voting method calculates the full
> popularity ranking of all candidates (or choices in the case of a
> survey) from most popular and second-most popular down to least
> popular. It uses the preference information collected on 1-2-3
> ballots (or any equivalent way of expressing "ranked" preferences).
> When a single position is being filled, the most popular candidate is
> declared the winner. This calculation method is mathematically
> equivalent to the Condorcet-Kemeny election method.

and further,

> VoteFair popularity ranking is described in Wikipedia, and is
> mathematically equivalent to the Condorcet-Kemeny method. The following
> comments explain the algorithm used here, which quickly calculates the
> ranking results. This explanation is important because some academic
> sources claim that the computations cannot be done quickly if there is a
> large number of choices being ranked.

> Calculation time:
> 
> The full algorithm used to calculate VoteFair popularity ranking
> results has a calculation time that is less than or equal to the
> following polynomial function:
> 
> T = A + ( B * N ) + ( C * ( N * N ) )
> 
> where T is the calculation time, N is the number of choices, and A
> and B and C are constants. (In mathematical notation, N * N would be
> written as N squared.) This function includes the calculation time
> required for the Choice-Specific Pairwise-Score (CSPS) pre-sort
> calculations.
> 
> This time contrasts with the slow execution times of the "NP-hard"
> approach, in which every sequence score is calculated in order to find
> the sequence with the highest score. If every sequence score were
> calculated (from scratch), the calculation time would be proportional to:
> 
> N! * N * N
> 
> where N is the number of choices, N! is N factorial (2 * 3 * 4 * ...
> * N), and N * N equals N squared. Note that N factorial equals the number
> of possible sequences, and N squared times one-half approximately equals
> the number of pairwise counts that are added to calculate each sequence
> score.
> 
> This clarification about calculation time is included because there
> is an academically common -- yet mistaken -- belief that calculating the
> "Condorcet-Kemeny method" is "NP-hard" and cannot be calculated in a
> time that is proportional to a polynomial function of N (the number of
> choices).

This description gives the impression that the software calculates the
VoteFair ranking (period, i.e. no approximations), that it does so in
polynomial time, and that it is possible to determine the Kemeny-Young
winner in polynomial time. But what it does is calculate an
approximation, which is how it can manage to do so in polytime without
proving P=NP.

Similarly, in the VoteFair C++ README.txt file:

> //  * VoteFair popularity ranking.  This voting method calculates the
> //  full popularity ranking of all candidates (or choices in the case
> //  of a survey) from most popular and second-most popular down to
> //  least popular.  It uses the preference information collected on
> //  1-2-3 ballots (or any equivalent way of expressing "ranked"
> //  preferences).  When a single position is being filled, the most
> //  popular candidate is declared the winner.  This calculation method
> //  is mathematically equivalent to the Condorcet-Kemeny election
> //  method.

which gives the impression that again, the software calculates the
method (although at least it doesn't mention being able to circumvent
NP-hardness :-).

Now, if I'm generous, I could say that what I understand you to be
saying is that when you say "this voting method calculates...", you're
talking about the software, but when you're saying "this calculation
method is mathematically equivalent...", you're talking about the
method. But it's easy to draw the wrong conclusion, as I did earlier.
The documentation could thus use being more clear about what the
software implements -- or the software could be rewritten so that it
implements the actual VoteFair (or Kemeny) method.

And again, I would suggest the latter as a good strategy. Instead of
producing correct results up to a certain number of candidates and then
producing undefined results - that it produces the correct results in
potentially exponential time, and that it just quits with a timeout if
it takes too long a time.

That way, it's immediately obvious to the user whether the method
produces a correct result (namely, that it'd do so when it does produce
a result at all). Instead of producing a result in managable time with
some probability (which you argue is low) of producing something
incorrect, it produces a result in managable time with similar (high if
you're right) probability of not getting a timeout.

In addition, if you use a reasonably good integer programming solver to
do so, you can quickly solve cases that the current software can't, e.g.
1000 candidates but no cycles at all.

[1] Strictly speaking, it implements a generalization of that method,
but I don't want to make things more confusing than they need to be; I'd
rather deal with one point at a time.