[EM] Ballot Data Format
Jan Šimbera
simbera.jan at gmail.com
Sun Jun 6 04:29:46 PDT 2021
Hi Rob and all,
I like the simultaneous readability, versatility and robustness of ABIF.
If the multiplier separator is made variant (asterisk / colon / both),
I think the only difference between Pivot and ABIF would be the
significance of whitespace in candidate tokens, something that the
parser could also be liberal in accepting - I see the [] bracketed form
as more readable (and thus default) but no problem in making
the non-bracketed form acceptable.
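
A minimal sketch of what such a liberal parser might look like, in Python. The function name parse_ranked_line and the exact token rules are my own assumptions for illustration; they are not taken from the ABIF page or from Pivot. It accepts either ":" or "*" after the count, and either bare or [bracketed] candidate tokens:

```python
import re

# Count, then ":" or "*" as the multiplier separator, then the ranking.
LINE_RE = re.compile(r"^\s*(\d+)\s*[:*]\s*(.+?)\s*$")
# A candidate token: either [text in brackets] or a bare run of characters
# that contains no delimiter and no bracket (inner whitespace is tolerated).
TOKEN_RE = re.compile(r"\[([^\]]+)\]|([^>=\[\]]+)")


def parse_ranked_line(line):
    """Return (count, tiers); tiers is a list of lists of tied candidates.

    Assumption: candidate names never contain '>' or '=', even in brackets.
    """
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a ballot line: {line!r}")
    count = int(match.group(1))
    tiers = []
    for tier_text in match.group(2).split(">"):
        tier = []
        for part in tier_text.split("="):
            token = TOKEN_RE.fullmatch(part.strip())
            if token is None:
                raise ValueError(f"bad candidate token: {part!r}")
            # group(1) is the bracketed form, group(2) the bare form.
            name = token.group(1) if token.group(1) is not None else token.group(2)
            tier.append(name.strip())
        tiers.append(tier)
    return count, tiers
```

With this, "27: A > B > C > D" and "27 * A > B > C > D" parse to the same structure, so the choice of separator becomes a matter of what the writer emits rather than what the reader accepts.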
Also, if leaving out the scores for the unordered variant is allowed,
the format will be able to record approval votes in a very readable
form, which I would support.
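
As a rough illustration of the approval case, here is a Python sketch that assumes an approval ballot line is simply a count followed by a comma-separated list of approved candidates with no scores. That reading of the unordered variant, the function name tally_approvals, and the sample lines in the usage note are all assumptions on my part, not settled ABIF:

```python
from collections import Counter


def tally_approvals(lines):
    """Sum approvals from lines like '12: Allie, Billy' (no scores given)."""
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        count_text, _, names_text = line.partition(":")
        count = int(count_text)
        # Every candidate named on the line is treated as approved.
        approved = {name.strip() for name in names_text.split(",") if name.strip()}
        for name in approved:
            totals[name] += count
    return totals
```

For example, tally_approvals(["12: Allie, Billy", "7: Georgie", "5: Billy, Georgie"]) gives Allie 12, Billy 17 and Georgie 12 (made-up ballots, only to show the shape of the input).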
If the community eventually arrives at some sort of consensus,
I'll put the ABIF read/write implementation into the backlog of
votelib (which can already read BLT and STV formats) so that
it has an interchange format for ordinal ballots as well.
The implementation would benefit from a formal parsing definition
(BNF or similar); if not, I might create it in the process.
All the best,
Jan
On Fri, May 28, 2021 at 11:32 AM Rob Lanphier <roblan at gmail.com> wrote:
> Hi everyone
>
> John, thanks for putting the subject on this thread and for the
> pointer to Vote::Count. Carl and James, thanks for the pointers.
> I'll endeavor to keep all of you (and everyone on this mailing list)
> in the loop on the progress we're making. The reddit thread
> continued, and I'll repeat the important stuff that I said there.
>
> The name (and file extension) for the format that I'm gravitating
> toward is ABIF (".abif"), which stands for "aggregated ballot image
> format". I'm using the term "ballot image" because that seems to be
> the term of art for publishing real-world electoral results. Once
> upon a time, "ballot image" meant "a picture of the ballot", but now
> just means a crude ASCII representation in a line of text.
>
> I did some processing of the ballot images from San Francisco's 2018
> mayoral election, which involved some coding and some manual shell
> processing with grep and friends. My work was ugly the way that all
> manual futzing in bash is ugly, but I got a few regexps and some test
> data (and some experience) that I'm applying here. As I was
> processing the results, I had wished the results were aggregated in an
> easier to process manner. I would love to finish my processing work
> and publish it in a sane format that other programmers can use, which
> I'm hoping ABIF can become.
>
> I've been working for WAY TOO MANY years on text formats for
> aggregated results. The format that I used in my Perl script
> published in the Perl Journal in 1996 was a noble attempt, and was
> better (in many ways) than the revised JSON-based format I published
> in 2005 with electowidget. I think my obsession with JSON was an
> unfortunate detour in coming up with a good format. I've been
> studying text formats for structured data for a very long time, ever
> since I learned Perl (in 1994 or so) and the work intensified with the
> work on RTSP in 1996 through 1998. We briefly dabbled with making
> RTSP a binary protocol using ASN.1, but Dr. Henning Schulzrinne from
> Columbia University convinced us (by publishing his "RTSP prime"
> draft) that a text-based HTTP knockoff (with MIME headers) was the way
> to go. As we worked on RTSP (and SMIL), we saw the rise of XML, and
> the slow steady fading of XML as a data format with the advent of
> JSON, and YAML, and TOML, and many other simpler formats.
>
> That's my longwinded way of saying that the test cases that I've
> published on the ABIF electowiki page are (I think) a respectable
> start for a flexible text format that a wide variety of programmers
> can get their heads around:
> https://electowiki.org/wiki/ABIF
>
> The thing that I love about ABIF (as it's shaping up) is that it
> solves several big problems which my 2005 electowidget format didn't.
> It goes back to the roots of the 1996 Perl script, back before I "knew
> what I was doing", and seems simpler to work with for a reasonably new
> programmer (as I was in '96). My electowidget format REQUIRED
> ratings, which I would normalize to rankings as appropriate. But I
> was often trying to express elections that only had ranked ballots
> available (e.g. the 2003 Debian election, and the 2009 Burlington
> mayoral race). Having a format that ALLOWS for ratings, but doesn't
> require ratings seems appropriate given that IRV/RCV is much more
> common in municipal use right now than rating systems. I love that
> the format is very similar to the ad hoc format many have used here on
> the EM list for expressing rankings. I'm reasonably sure that writing
> a parser in any language that has reasonable regular expression
> support will be easy, and can probably be done with a single-pass
> parser. I haven't really started it yet, but I know how to write a
> spec that many programmers can look at, nod their heads, and say
> "yeah, I can work with that". I think having a test suite with
> well-specified expected output is going to be a key part of solving
> the interoperability problem, and it will be helpful for others to
> inspect in a piecemeal fashion rather than feeling obligated to read a
> ponderously long specification.
>
> In the next few days, I'll take a look at the implementations that
> have been mentioned in this thread. For example: Pivot's format that
> Carl pointed to. It looks to me that the format that Pivot uses is
> very similar to this proposed ABIF format; the only difference I
> see (at first glance) is that this ABIF:
>
> 27: A > B > C > D
> 26: B > A = C > D
> 24: C > A = D > B
> 23: D > C > A > B
>
> ...would become this in Pivot:
> 27 * A > B > C > D
> 26 * B > A = C > D
> 24 * C > A = D > B
> 23 * D > C > A > B
>
> Changing colon (":") to asterisk ("*") is an interesting change to
> consider. I suspect that as we all look more closely at Pivot and
> other formats, there are going to be other incompatibilities and mindset
> differences to hash out. These all seem like easily solved problems,
> because I get the sense that many programmers are hungry for
> compatible solutions in this space, and are willing to write
> converters to be part of the compatibility party.
>
> At any rate, I think (if others on the mailing list don't mind) that
> we should just use this mailing list and electowiki as places to hash
> out the format. If we do this right, it will be easy enough to use
> that people on this mailing list (and over on reddit, and many other
> places) will keep ABIF compatibility in mind when they write examples
> of elections to consider. Hopefully having more software
> compatibility in our ecosystem will make it easier for us to
> collaborate on analysis and speed up reform efforts.
>
> Rob
>
> On Thu, May 27, 2021 at 1:34 PM John Karr <brainbuz at brainbuz.org> wrote:
> >
> > As the author of Vote::Count, a standardized format for ballots would be
> > a big plus. When I've been able to collect sample data, the first thing
> > I need to do is convert it to my format. Currently Vote::Count has two
> > formats, a text one for ranked ballots and a json/yaml format for range
> > ballots. The documentation on my formats is here:
> > https://metacpan.org/pod/Vote::Count::ReadBallots
> >
> > I'm not on Reddit, but I think creating a working group of people with
> > an interest to propose a standard would be a great idea, and I'm
> > interested in helping.
> >
> > A standard format would allow creation of a library of data for which
> > electowiki would seem to be a natural home.
> >
> > On 5/27/21 4:02 PM, election-methods-request at lists.electorama.com wrote:
> >
> > > Today's Topics:
> > >
> > > 1. (no subject) (Rob Lanphier)
> > >
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Message: 1
> > > Date: Wed, 26 May 2021 23:38:14 -0700
> > > From: Rob Lanphier <roblan at gmail.com>
> > > To: election-methods at lists.electorama.com
> > > Subject: [EM] (no subject)
> > > Message-ID:
> > > <CAK9hOYn2T=ympC7gEd8wS_8S8yjzK==xsmEfNKWo99cBjaXDgA at mail.gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Hi folks,
> > >
> > > There's an interesting discussion happening on reddit about ASCII
> > > formats for aggregated ballot images. I'll provide a deep link to my
> > > comment here:
> > >
> > > <https://www.reddit.com/r/EndFPTP/comments/nkm2cd/standardizing_cardinal_ballot_notation/gzls6pj/>
> > >
> > > What the original reddit poster (/user/jman722) made me realize is
> > > that it's possible to come up with a format that works for both range
> > > ballots and ranked ballots. The range ballots can be on a scale of
> > > 0-5, where 5 is "awesome", and 0 is "awful". The ranked ballots can
> > > be A>B>C.
> > >
> > > I'm going to use the example that the original reddit poster made:
> > >
> > > 12: Allie/5, Billy/5, Candace/4, Dennis/3, Edith/3, Frank/2, Georgie/1, Harold/0
> > > 7: Allie/4, Billy/0, Candace/2, Dennis/3, Edith/1, Frank/0, Georgie/5, Harold/3
> > > 5: Allie/0, Billy/3, Candace/2, Dennis/3, Edith/4, Frank/5, Georgie/3, Harold/4
> > >
> > > That format is good but not great. It takes a careful eye to see that
> > > Allie, Billy, Frank, and Georgie are the passionate favorites (earning
> > > a "5" score), and another close look to see that Allie, Billy, Frank,
> > > and Harold are listed as completely unacceptable (earning a "0" score).
> > >
> > > My old format that I used for my 1996 Perl script that I wrote and
> > > published in The Perl Journal would express those ballots this way:
> > >
> > > 12: Allie=Billy>Candace>Dennis=Edith>Frank>Georgie>Harold
> > > 7: Georgie>Allie>Dennis=Harold>Candace>Edith>Billy=Frank
> > > 5: Frank>Edith=Harold>Billy=Dennis=Georgie>Candace>Allie
> > >
> > > With this format, it becomes clear that 12 voters really like Allie
> > > and Billy and really don't like Harold. The next 7 voters really like
> > > Georgie, and really don't like Billy and Frank. The remaining 5
> > > voters really like Frank, but really dislike Allie. One has to add up
> > > 12+7+5 to realize there are 24 voters in this election.
> > >
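
The step from the scored lines to the ranked lines above, and the voter total, can be sketched in Python as follows. The function name scores_to_ranking and the alphabetical ordering of tied names are assumptions made for illustration; the ballot data is the example quoted in this message:

```python
from itertools import groupby


def scores_to_ranking(scores):
    """Turn {'Allie': 5, ...} into 'Allie=Billy>Candace>...' (highest first)."""
    # Sort by descending score, then by name so ties print in a stable order.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    tiers = []
    # Consecutive entries with the same score form one tier joined by "=".
    for _score, group in groupby(ordered, key=lambda item: item[1]):
        tiers.append("=".join(name for name, _ in group))
    return ">".join(tiers)


ballots = [
    (12, {"Allie": 5, "Billy": 5, "Candace": 4, "Dennis": 3,
          "Edith": 3, "Frank": 2, "Georgie": 1, "Harold": 0}),
    (7, {"Allie": 4, "Billy": 0, "Candace": 2, "Dennis": 3,
         "Edith": 1, "Frank": 0, "Georgie": 5, "Harold": 3}),
    (5, {"Allie": 0, "Billy": 3, "Candace": 2, "Dennis": 3,
         "Edith": 4, "Frank": 5, "Georgie": 3, "Harold": 4}),
]
ranked_lines = [f"{count}: {scores_to_ranking(scores)}" for count, scores in ballots]
total_voters = sum(count for count, _ in ballots)  # 12 + 7 + 5
```

Here ranked_lines reproduces the three ranked lines shown above, and total_voters is 24.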
> > > The ratings are stripped from my old 1996-ish format. It only
> > > provides the following parse tokens:
> > >
> > > [quantity]: [cand5yay] [> or =] [cand4good] [> or =] ... [cand0boo]
> > >
> > > It seems as though it would be possible to come up with a merged
> > > format that would express the range ballots above like this:
> > >
> > > 12: Allie/5 =Billy/5 >Candace/4 >Dennis/3 =Edith/3 >Frank/2 >Georgie/1 >Harold/0
> > > 7: Georgie/5 >Allie/4 >Dennis/3 =Harold/3 >Candace/2 >Edith/1 >Billy/0 =Frank/0
> > > 5: Frank/5 >Edith/4 =Harold/4 >Billy/3 =Dennis/3 =Georgie/3 >Candace/2 >Allie/0
> > >
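
A single-pass reader for lines in this merged form might look like the Python sketch below. It is only an illustration under stated assumptions: the function name parse_scored_line is hypothetical, and reading ">" as "strictly lower score" and "=" as "same score" is one interpretation of the ordering requirement described in the next paragraph, not something the proposal spells out:

```python
import re

# One candidate/score tuple, optionally preceded by a delimiter.
# Bare names are ASCII letters; bracketed names may hold any text.
TUPLE_RE = re.compile(r"\s*([>=,])?\s*(\[[^\]]+\]|[A-Za-z]+)\s*/\s*(\d+)")


def parse_scored_line(line):
    """Return (count, [(delimiter, name, score), ...]) for one ballot line."""
    count_text, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("missing ':' after the ballot count")
    entries = []
    pos = 0
    while pos < len(rest) and rest[pos:].strip():
        match = TUPLE_RE.match(rest, pos)
        if match is None:
            raise ValueError(f"cannot parse at: {rest[pos:]!r}")
        delimiter, name = match.group(1), match.group(2)
        score = int(match.group(3))
        if entries:
            if delimiter is None:
                raise ValueError("a delimiter is required between tuples")
            previous = entries[-1][2]
            # Assumed rule: '>' must step down in score, '=' must keep it
            # equal, and ',' makes no ordering claim at all.
            if delimiter == ">" and not previous > score:
                raise ValueError(f"{name} follows '>' but is not lower")
            if delimiter == "=" and previous != score:
                raise ValueError(f"{name} follows '=' but differs in score")
        entries.append((delimiter, name.strip("[]"), score))
        pos = match.end()
    return int(count_text), entries
```

The same function reads the comma-separated form from the original reddit post, since "," is accepted as a delimiter that carries no ordering information.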
> > > The ">", "=", and "," characters could all be optional delimiters
> > > between the candidate/score tuples on each line (though at least one
> > > of those three delimiters WOULD be required). If ">" or "=" is used as
> > > a delimiter, then the candidates MUST be ordered by score (highest
> > > score first). Candidate tokens can be one or more ASCII characters
> > > ([A-Z] or [a-z]) OR the candidate token MUST start with a square
> > > bracket ([) and end with the closing square bracket (]), and the
> > > intervening text can be any unicode character (e.g. [Doña García
> > > Márquez] or [Ximena Peña] or [???]). Whitespace can be discarded, but
> > > SHOULD be included for legibility.
> > >
> > > Linters could be created to deduplicate ballot lines, sort the
> > > candidate by score on each line, convert commas to ">" and "=" (for
> > > ranked ballot equivalents), and add whitespace for readability. They
> > > could optionally normalize the candidates to a range of ASCII letters
> > > (e.g. changing "Allie" to "A", "Billy" to "B", etc).
> > >
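
One of the linter steps mentioned above, deduplicating ballot lines, could be sketched in Python like this. The function name dedupe_ballot_lines is hypothetical, and the sketch assumes score-free ranked lines with bare (non-bracketed) candidate tokens, so it is a starting point rather than a complete linter:

```python
def dedupe_ballot_lines(lines):
    """Merge lines whose ballots are identical, summing their counts."""
    merged = {}  # normalized ballot text -> total count (insertion-ordered)
    for line in lines:
        count_text, sep, ballot = line.partition(":")
        if not sep:
            continue  # not a ballot line; a fuller linter would report it
        # Put single spaces around each delimiter so "A>B" and "A > B"
        # normalize to the same key (this also adds the readability
        # whitespace to the output).
        spaced = ballot.replace(">", " > ").replace("=", " = ")
        key = " ".join(spaced.split())
        merged[key] = merged.get(key, 0) + int(count_text)
    return [f"{count}: {ballot}" for ballot, count in merged.items()]
```

Given ["3: A>B=C", "2: B > A", "4: A > B = C"], this returns ["7: A > B = C", "2: B > A"], keeping the order in which each distinct ballot first appeared.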
> > > The goal would be to make it useful for two people debating whether
> > > the Condorcet criterion or the Monotonicity criterion is more
> > > important. They could both easily crank out a set of ballots that
> > > could be fed into either a ranked-ballot counter or a rated-ballot
> > > counter. Having the candidate tuples sorted in each line makes it
> > > clearer what the preferences were of the set of voters represented by
> > > the given line.
> > >
> > > I think that parsers could be written for this format such that they
> > > follow Postel's Law (a.k.a the "robustness principle"):
> > > https://en.wikipedia.org/wiki/Robustness_principle
> > >
> > > To quote that^: "be conservative in what you do, be liberal in what
> > > you accept from others"
> > >
> > > People trying to express ranked ballots could drop the scores, and
> > > ONLY include ">" and "=" as a delimiter between candidates. People
> > > trying to express rated ballots could use commas (",") instead of ">"
> > > and "=". Programmers trying to parse handcrafted scenarios could
> > > figure out how to fill in the blanks.
> > >
> > > I'm tempted to write a reference parser for this, but first, what do
> > > you all think? Let the list know! Let me know! Let reddit know!
> > > :-D
> > >
> > > Thanks
> > > Rob
> > >
> > > p.s. I'm thinking of calling my version "ABIF", standing for
> > > "Aggregated Ballot Image Format". I may just document it here:
> > > https://electowiki.org/wiki/User:RobLa/ABIF
> > >
> > >
> > > ------------------------------
> > >
> > > End of Election-Methods Digest, Vol 202, Issue 7
> > > ************************************************
> >
> > ----
> > Election-Methods mailing list - see https://electorama.com/em for list info
>