[EM] Ballot Data Format
Rob Lanphier
roblan at gmail.com
Fri May 28 02:31:25 PDT 2021
Hi everyone
John, thanks for putting the subject on this thread and for the
pointer to Vote::Count. Carl and James, thanks for the pointers.
I'll endeavor to keep all of you (and everyone on this mailing list)
in the loop on the progress we're making. The reddit thread
continued, so I'll repeat the important stuff that I said there.
The name (and file extension) for the format that I'm gravitating
toward is ABIF (".abif"), which stands for "aggregated ballot image
format". I'm using the term "ballot image" because that seems to be
the term of art for publishing real-world electoral results. Once
upon a time, "ballot image" meant "a picture of the ballot", but now it
just means a crude ASCII representation in a line of text.
I did some processing of the ballot images from San Francisco's 2018
mayoral election, which involved some coding and some manual shell
processing with grep and friends. My work was ugly the way that all
manual futzing in bash is ugly, but I got a few regexps and some test
data (and some experience) that I'm applying here. As I was
processing the results, I wished they had been aggregated in an
easier-to-process manner. I would love to finish my processing work
and publish it in a sane format that other programmers can use, which
I'm hoping ABIF can become.
I've been working for WAY TOO MANY years on text formats for
aggregated results. The format that I used in my Perl script
published in the Perl Journal in 1996 was a noble attempt, and was
better (in many ways) than the revised JSON-based format I published
in 2005 with electowidget. I think my obsession with JSON was an
unfortunate detour in coming up with a good format. I've been
studying text formats for structured data for a very long time, ever
since I learned Perl (in 1994 or so), and that study intensified with
my work on RTSP from 1996 through 1998. We briefly dabbled with making
RTSP a binary protocol using ASN.1, but Dr. Henning Schulzrinne from
Columbia University convinced us (by publishing his "RTSP prime"
draft) that a text-based HTTP knockoff (with MIME headers) was the way
to go. As we worked on RTSP (and SMIL), we saw the rise of XML, and
the slow, steady fading of XML as a data format with the advent of
JSON, and YAML, and TOML, and many other simpler formats.
That's my longwinded way of saying that the test cases that I've
published on the ABIF electowiki page are (I think) a respectable
start for a flexible text format that a wide variety of programmers
can get their heads around:
https://electowiki.org/wiki/ABIF
The thing that I love about ABIF (as it's shaping up) is that it
solves several big problems which my 2005 electowidget format didn't.
It goes back to the roots of the 1996 Perl script, back before I "knew
what I was doing", and seems simpler to work with for a reasonably new
programmer (as I was in '96). My electowidget format REQUIRED
ratings, which I would normalize to rankings as appropriate. But I
was often trying to express elections that only had ranked ballots
available (e.g. the 2003 Debian election, and the 2009 Burlington
mayoral race). Having a format that ALLOWS for ratings, but doesn't
require them, seems appropriate given that IRV/RCV is much more
common in municipal use right now than rating systems. I love that
the format is very similar to the ad hoc format many have used here on
the EM list for expressing rankings. I'm reasonably sure that writing
a parser in any language that has reasonable regular expression
support will be easy, and can probably be done with a single-pass
parser. I haven't really started it yet, but I know how to write a
spec that many programmers can look at, nod their heads, and say
"yeah, I can work with that". I think having a test suite with
well-specified expected output is going to be a key part of solving
the interoperability problem, and it will be helpful for others to
inspect in a piecemeal fashion rather than feeling obligated to read a
ponderously long specification.
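To make the "single-pass parser" idea concrete, here is a minimal Python sketch for ranked-only lines such as "26: B > A = C > D". It assumes a hypothetical grammar (a vote count, a colon, then ASCII-letter candidate names separated by ">" or "="), and the name parse_ranked_line is made up for illustration; it is not a spec.

```python
import re

# Hypothetical grammar for this sketch: "<count>: <cand> (> or =) <cand> ..."
# with candidate names restricted to ASCII letters.
LINE_RE = re.compile(r"^\s*(\d+)\s*:\s*(.+?)\s*$")
TOKEN_RE = re.compile(r"\s*([A-Za-z]+)\s*([>=]?)")


def parse_ranked_line(line):
    """Return (count, tiers): tiers is a list of candidate lists, best first."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a ballot line: {line!r}")
    count, body = int(m.group(1)), m.group(2)
    tiers = [[]]
    pos, delim = 0, ""
    # Single left-to-right pass: each step consumes one candidate and the
    # delimiter that follows it (if any).
    while pos < len(body):
        t = TOKEN_RE.match(body, pos)
        if t is None:
            raise ValueError(f"unexpected text at {pos} in {body!r}")
        cand, delim = t.group(1), t.group(2)
        if not delim and t.end() < len(body):
            raise ValueError(f"missing '>' or '=' after {cand!r}")
        tiers[-1].append(cand)
        if delim == ">":
            tiers.append([])  # ">" starts a new, lower-ranked tier
        pos = t.end()
    if delim:
        raise ValueError("ballot ends with a delimiter")
    return count, tiers


for text in ["27: A > B > C > D", "26: B > A = C > D"]:
    print(parse_ranked_line(text))
```

Candidates tied with "=" land in the same inner list, so a counting program can treat each inner list as one rank.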
In the next few days, I'll take a look at the implementations that
have been mentioned in this thread, such as the Pivot format that
Carl pointed to. It looks to me like the format that Pivot uses is
very similar to this proposed ABIF format, the only difference I
see (at first glance) being that this ABIF:
27: A > B > C > D
26: B > A = C > D
24: C > A = D > B
23: D > C > A > B
...would become this in Pivot:
27 * A > B > C > D
26 * B > A = C > D
24 * C > A = D > B
23 * D > C > A > B
Changing colon (":") to asterisk ("*") is an interesting change to
consider. I suspect that as we all look more closely at Pivot and
other formats, there are going to be other incompatibilities and mindset
differences to hash out. These all seem like easily solved problems,
because I get the sense that many programmers are hungry for
compatible solutions in this space, and are willing to write
converters to be part of the compatibility party.
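As a rough illustration of how small such a converter could be, here is a Python sketch that handles only the one difference noted above (":" versus "*" after the vote count). It assumes nothing else about Pivot's format, and the function names are hypothetical.

```python
import re

# Assumes the only difference is the count separator noted above:
# ABIF "27: A > B" versus Pivot "27 * A > B". Other lines pass through.
ABIF_PREFIX = re.compile(r"^(\s*\d+)\s*:\s*")
PIVOT_PREFIX = re.compile(r"^(\s*\d+)\s*\*\s*")


def abif_to_pivot(text):
    """Rewrite a leading "<count>:" as "<count> *" on each line."""
    return "\n".join(ABIF_PREFIX.sub(r"\1 * ", line, count=1)
                     for line in text.splitlines())


def pivot_to_abif(text):
    """Rewrite a leading "<count> *" as "<count>:" on each line."""
    return "\n".join(PIVOT_PREFIX.sub(r"\1: ", line, count=1)
                     for line in text.splitlines())


abif = "27: A > B > C > D\n26: B > A = C > D"
print(abif_to_pivot(abif))
assert pivot_to_abif(abif_to_pivot(abif)) == abif
```

Any further incompatibilities that turn up would need their own rules; this only covers the prefix.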
At any rate, I think (if others on the mailing list don't mind) that
we should just use this mailing list and electowiki as places to hash
out the format. If we do this right, it will be easy enough to use
that people on this mailing list (and over on reddit, and many other
places) will keep ABIF compatibility in mind when they write examples
of elections to consider. Hopefully having more software
compatibility in our ecosystem will make it easier for us to
collaborate on analysis and speed up reform efforts.
Rob
On Thu, May 27, 2021 at 1:34 PM John Karr <brainbuz at brainbuz.org> wrote:
>
> As the author of Vote::Count, I'd find a standardized format for
> ballots a big plus. When I've been able to collect sample data, the first thing
> I need to do is convert it to my format. Currently Vote::Count has two
> formats, a text one for ranked ballots and a json/yaml format for range
> ballots. The documentation on my formats is here:
> https://metacpan.org/pod/Vote::Count::ReadBallots
>
> I'm not on Reddit, but I think creating a working group of people with
> an interest in proposing a standard would be a great idea, and I'm
> interested in helping.
>
> A standard format would allow creation of a library of data for which
> electowiki would seem to be a natural home.
>
> On 5/27/21 4:02 PM, election-methods-request at lists.electorama.com wrote:
>
> >
> > Message: 1
> > Date: Wed, 26 May 2021 23:38:14 -0700
> > From: Rob Lanphier <roblan at gmail.com>
> > To: election-methods at lists.electorama.com
> > Subject: [EM] (no subject)
> > Message-ID:
> > <CAK9hOYn2T=ympC7gEd8wS_8S8yjzK==xsmEfNKWo99cBjaXDgA at mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi folks,
> >
> > There's an interesting discussion happening on reddit about ASCII
> > formats for aggregated ballot images. I'll provide a deep link to my
> > comment here:
> >
> > <https://www.reddit.com/r/EndFPTP/comments/nkm2cd/standardizing_cardinal_ballot_notation/gzls6pj/>
> >
> > What the original reddit poster (/user/jman722) made me realize is
> > that it's possible to come up with a format that works for both range
> > ballots and ranked ballots. The range ballots can be on a scale of
> > 0-5, where 5 is "awesome", and 0 is "awful". The ranked ballots can
> > be A>B>C.
> >
> > I'm going to use the example that the original reddit poster made:
> >
> > 12: Allie/5, Billy/5, Candace/4, Dennis/3, Edith/3, Frank/2, Georgie/1, Harold/0
> > 7: Allie/4, Billy/0, Candace/2, Dennis/3, Edith/1, Frank/0, Georgie/5, Harold/3
> > 5: Allie/0, Billy/3, Candace/2, Dennis/3, Edith/4, Frank/5, Georgie/3, Harold/4
> >
> > That format is good but not great. It takes a careful eye to see that
> > Allie, Billy, Frank, and Georgie are the passionate favorites (earning
> > a "5" score), and another close look to see that Allie, Billy, Frank,
> > and Harold are listed as completely unacceptable (earning a "0" score).
> >
> > My old format that I used for my 1996 Perl script that I wrote and
> > published in The Perl Journal would express those ballots this way:
> >
> > 12: Allie=Billy>Candace>Dennis=Edith>Frank>Georgie>Harold
> > 7: Georgie>Allie>Dennis=Harold>Candace>Edith>Billy=Frank
> > 5: Frank>Edith=Harold>Billy=Dennis=Georgie>Candace>Allie
> >
> > With this format, it becomes clear that 12 voters really like Allie
> > and Billy and really don't like Harold. The next 7 voters really like
> > Georgie, and really don't like Billy and Frank. The remaining 5
> > voters really like Frank, but really dislike Allie. One has to add up
> > 12+7+5 to realize there are 24 voters in this election.
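A rough Python sketch of the conversion just described, from the reddit-style rated lines to the 1996-style ranking: sort by score, join equal scores with "=", and separate the rest with ">". It assumes integer scores and ASCII-letter names; parse_rated_line and to_ranking are hypothetical names.

```python
import re
from itertools import groupby

# Assumes the reddit-style rated line shown above ("12: Allie/5, Billy/5, ...")
# with integer scores and ASCII-letter names.
PAIR_RE = re.compile(r"([A-Za-z]+)/(\d+)")


def parse_rated_line(line):
    count, body = line.split(":", 1)
    return int(count), {name: int(s) for name, s in PAIR_RE.findall(body)}


def to_ranking(scores):
    """Highest score first; equal scores joined by '=', tiers by '>'."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])  # stable sort
    tiers = ["=".join(name for name, _ in group)
             for _, group in groupby(ordered, key=lambda kv: kv[1])]
    return ">".join(tiers)


lines = [
    "12: Allie/5, Billy/5, Candace/4, Dennis/3, Edith/3, Frank/2, Georgie/1, Harold/0",
    "7: Allie/4, Billy/0, Candace/2, Dennis/3, Edith/1, Frank/0, Georgie/5, Harold/3",
    "5: Allie/0, Billy/3, Candace/2, Dennis/3, Edith/4, Frank/5, Georgie/3, Harold/4",
]
total = 0
for line in lines:
    count, scores = parse_rated_line(line)
    total += count
    print(f"{count}: {to_ranking(scores)}")
print("voters:", total)
```

Run on the three example lines, this prints the same three rankings as the 1996-style version above, followed by "voters: 24".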
> >
> > The ratings are stripped from my old 1996-ish format. It only
> > provides the following parse tokens:
> >
> > [quantity]: [cand5yay] [> or =] [cand4good] [> or =] ... [cand0boo]
> >
> > It seems as though it would be possible to come up with a merged
> > format that would express the range ballots above like this:
> >
> > 12: Allie/5 =Billy/5 >Candace/4 >Dennis/3 =Edith/3 >Frank/2 >Georgie/1 >Harold/0
> > 7: Georgie/5 >Allie/4 >Dennis/3 =Harold/3 >Candace/2 >Edith/1 >Billy/0 =Frank/0
> > 5: Frank/5 >Edith/4 =Harold/4 >Billy/3 =Dennis/3 =Georgie/3 >Candace/2 >Allie/0
> >
> > The ">", "=", and "," characters could all be optional delimiters
> > between the candidate/score tuples on each line (though at least one
> > of those three delimiters WOULD be required). If ">" or "=" is used as
> > a delimiter, then the candidates MUST be ordered by score (highest
> > score first). Candidate tokens can be one or more ASCII letters
> > ([A-Z] or [a-z]) OR the candidate token MUST start with a square
> > bracket ([) and end with the closing square bracket (]), and the
> > intervening text can be any Unicode characters (e.g. [Doña García
> > Márquez] or [Ximena Peña] or [???]). Whitespace can be discarded, but
> > SHOULD be included for legibility.
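A hedged Python sketch of a tokenizer for the rules in the paragraph above. It assumes scores are non-negative integers and that bracketed names contain no "]"; parse_merged_line is a hypothetical name, and checking only that given scores never increase is one reading of "MUST be ordered by score".

```python
import re

# Sketch of a tokenizer for the merged format described above. Assumptions:
# scores are non-negative integers, and bracketed names contain no "]".
TOKEN = re.compile(
    r"\s*(?P<cand>[A-Za-z]+|\[[^\]]*\])"  # bare ASCII name or [any text]
    r"(?:\s*/\s*(?P<score>\d+))?"          # optional /score
    r"\s*(?P<delim>[>=,]?)"                # delimiter; absent only at the end
)


def parse_merged_line(line):
    count_text, body = line.split(":", 1)
    body = body.strip()
    ballot, delims, pos = [], [], 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"bad token at {pos}: {body[pos:]!r}")
        cand = m.group("cand")
        if cand.startswith("["):
            cand = cand[1:-1]  # drop the brackets, keep the Unicode text
        score = m.group("score")
        ballot.append((cand, int(score) if score is not None else None))
        if m.group("delim"):
            delims.append(m.group("delim"))
        elif m.end() < len(body):
            raise ValueError(f"missing delimiter after {cand!r}")
        pos = m.end()
    if len(delims) != len(ballot) - 1:
        raise ValueError("ballot ends with a delimiter")
    # If ">" or "=" is used, the given scores must not increase left to right.
    if any(d != "," for d in delims):
        given = [s for _, s in ballot if s is not None]
        if any(a < b for a, b in zip(given, given[1:])):
            raise ValueError("candidates are not ordered by score")
    return int(count_text), ballot, delims
```

Ranked-only lines come back with None scores, and comma-delimited rated lines skip the ordering check, in the spirit of being liberal in what a parser accepts.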
> >
> > Linters could be created to deduplicate ballot lines, sort the
> > candidates by score on each line, convert commas to ">" and "=" (for
> > ranked ballot equivalents), and add whitespace for readability. They
> > could optionally normalize the candidates to a range of ASCII letters
> > (e.g. changing "Allie" to "A", "Billy" to "B", etc).
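One of those linter passes, deduplicating ballot lines, might look like this minimal Python sketch. It assumes every line has the form "<count>: <ballot>" and treats two ballots as identical when they differ only in whitespace around delimiters; dedupe is a hypothetical name.

```python
import re

# Sketch of one linter pass: merge lines whose ballots are identical once
# whitespace around the delimiters is ignored, summing their counts.
# Assumes every line looks like "<count>: <ballot>". It does not reorder
# tied candidates, so "A=B" and "B=A" are still treated as different.
LINE = re.compile(r"^\s*(\d+)\s*:\s*(.*?)\s*$")
AROUND_DELIM = re.compile(r"\s*([>=,/])\s*")


def dedupe(lines):
    totals, spelling = {}, {}  # dicts keep first-seen order (Python 3.7+)
    for line in lines:
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"not a ballot line: {line!r}")
        count, ballot = int(m.group(1)), m.group(2)
        key = AROUND_DELIM.sub(r"\1", ballot)
        totals[key] = totals.get(key, 0) + count
        spelling.setdefault(key, ballot)  # output the first spelling seen
    return [f"{n}: {spelling[k]}" for k, n in totals.items()]


print(dedupe(["12: A > B", "3: B > A", "5: A>B"]))
```

Only whitespace next to a delimiter is ignored, so spaces inside bracketed names like "[Ximena Peña]" still count.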
> >
> > The goal would be to make it useful for two people debating whether
> > the Condorcet criterion or the Monotonicity criterion is more
> > important. They could both easily crank out a set of ballots that
> > could be fed into either a ranked-ballot counter or a rated-ballot
> > counter. Having the candidate tuples sorted in each line makes it
> > clearer what the preferences were of the set of voters represented by
> > the given line.
> >
> > I think that parsers could be written for this format such that they
> > follow Postel's Law (a.k.a. the "robustness principle"):
> > https://en.wikipedia.org/wiki/Robustness_principle
> >
> > To quote that^: "be conservative in what you do, be liberal in what
> > you accept from others"
> >
> > People trying to express ranked ballots could drop the scores, and
> > ONLY include ">" and "=" as delimiters between candidates. People
> > trying to express rated ballots could use commas (",") instead of ">"
> > and "=". Programmers trying to parse handcrafted scenarios could
> > figure out how to fill in the blanks.
> >
> > I'm tempted to write a reference parser for this, but first, what do
> > you all think? Let the list know! Let me know! Let reddit know!
> > :-D
> >
> > Thanks
> > Rob
> >
> > p.s. I'm thinking of calling my version "ABIF", standing for
> > "Aggregated Ballot Image Format". I may just document it here:
> > https://electowiki.org/wiki/User:RobLa/ABIF
> >
>
> ----
> Election-Methods mailing list - see https://electorama.com/em for list info