# [EM] SARVO-Range as lottery utility (vNM utility) method?

Kristofer Munsterhjelm km_elmet at t-online.de
Tue Apr 26 02:13:26 PDT 2022

On 26.04.2022 00:51, Forest Simmons wrote:
>
>
> El lun., 25 de abr. de 2022 4:25 a. m., Kristofer Munsterhjelm
> <km_elmet at t-online.de <mailto:km_elmet at t-online.de>> escribió:
>>
>>     So the M-W strategy is: let
>>             v_i be the strategic rating we want to find
>>             u_i be the public utility of candidate i
>>             p_ij be the voter's perceived probability that i and j will
>>     be tied.
>>
>
> I could be wrong but I think it should be "tied for winning."

You're right. I was looking at the paper just now, and it says:

"For each pair of candidates i and j, the /pivot probability/ p is the
probability (perceived by a voter) of the event that candidates i and j
will be tied for first place in the election."

I imagine you could refine it a little by letting the pivot probability
be parameterized by the vote to be submitted. E.g. if it's Range voting
and i's score minus j's score is 1, then you could flip the win from i
to j by voting j 10 and i 0. But this would complicate the strategy a
lot, at (probably) only very slight benefit.

> It is interesting that this strategy can actually result in non-solid
> approval coalitions on ballots ... sometimes it requires you to approve
> X while leaving unapproved some candidate Y rated above X on the same
> ballot ... i.e. insincere strategy.
>
> Furthermore, if estimates of both the utilities u_i and u_j, as well as
> of the probabilities p_ij in question were known with a high degree of
> precision, you might get away with those insincere gaps in the approval
> order.
>
> These facts reflect the fragility (anti-robustness) of the winning tie
> probability based strategy.

Yes, I think Warren observed something similar: under imperfect
information, the optimal Range/Approval strategy might have you
approving X but not Y even though you rank Y ahead of X. Under
perfect information, there's always some kind of cutoff where you
approve everybody above it and disapprove everybody below it.

> Nevertheless, your result is highly relevant because it shows that on a
> fundamental level there is a meaningful, experimental way of defining
> individual utilities that are just as good as the theoretical utilities
> invoked as a basis for Approval strategy.

I keep harping on how Range and Approval fail "de facto IIA" despite
passing it de jure, and I suspect it's related to this. If we can't
standardize a and b, and the method behaves differently when given u_i
values than when given u'_i = a*u_i + b values, then you can get strange
behavior. So the guidelines about how to vote (approve above mean
utility, etc.) are just preprocessing steps that make your ballot
expression no longer depend on what a and b are. It would be much more
honest to attach these guidelines to the method itself, so that it does
the preprocessing for the voter: then voters don't have to care what
society's a and b values are supposed to be, and the method doesn't get
away with sweeping de facto failures under the rug.

At least MJ recognizes this and says "the only way we're going to get
IIA is if we have a and b values that are close enough to commensurable
that the problem doesn't occur". And then the point of using grades
instead of scores, and using order statistics, is to make the whole
process relatively insensitive to what a and b are, so that (hopefully)
a common grade standard can be established.

> It is equally true for the not as sensitive strategy of approving the
> candidates k with above expectation utilities:
> u_k >sum P_i u_i,
> based on estimates of (non tie based) winning probabilities P_i, which
> are still sketchy because of rampant misinformation, not to mention
> intentional disinformation.

Those are zero-info strategies, and indeed, they're also insensitive to
a and b.
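
To illustrate: the above-expectation rule only compares each utility to a
probability-weighted average of the same utilities, so an affine rescaling
u -> a*u + b (with a > 0) shifts both sides equally and leaves the approval
set unchanged. A minimal sketch (the utilities and winning probabilities
below are made up):

```python
def zero_info_approvals(u, P):
    # Approve every candidate whose utility beats the expected
    # utility of the winner under winning probabilities P.
    expected = sum(p_i * u_i for p_i, u_i in zip(P, u))
    return [i for i, u_i in enumerate(u) if u_i > expected]

u = [10.0, 6.0, 0.0]
P = [0.5, 0.3, 0.2]  # assumed (non-tie-based) winning probabilities
rescaled = [3.0 * u_i + 7.0 for u_i in u]  # a = 3, b = 7
# zero_info_approvals(u, P) and zero_info_approvals(rescaled, P)
# give the same approval set: the strategy is insensitive to a and b.
```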

SARVO tries to get around the fragility/chaos problem by averaging over
a lot of vote orders. But it's something of a hack; it's not
particularly elegant, and it fails scale invariance. Perhaps better
would be to find a voting equilibrium where the mixed strategies are
such that the distribution of M-W votes is stable, and then elect the
candidate with the highest expected score. I haven't read the M-W paper
in detail, though, so I don't know if finding this equilibrium is easy.

(Another possibility, inspired by counterfactual regret minimization, is
to have every voter apply the M-W strategy, and then, once everybody has
submitted a vote, pull one of the voters from the list and have him
readjust his strategic ballot. Keep doing so for long enough and the
average of the scores should converge to an equilibrium.)
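
That loop is easy to sketch. Below, a naive Approval best-response rule
stands in for full M-W strategy (approve everyone you prefer to the current
leader, plus the leader itself if you prefer it to the runner-up); the
example profile and all names are made up for illustration:

```python
import random

def best_response(u, rest):
    # rest = approval totals with this voter's own ballot removed.
    order = sorted(range(len(u)), key=lambda i: -rest[i])
    leader, runner = order[0], order[1]
    ballot = [0] * len(u)
    for i in range(len(u)):
        if u[i] > u[leader] or (i == leader and u[i] > u[runner]):
            ballot[i] = 1
    return ballot

def iterate(utilities, rounds=1000, seed=0):
    rng = random.Random(seed)
    n = len(utilities[0])
    ballots = [best_response(u, [0] * n) for u in utilities]
    avg = [0.0] * n
    for t in range(1, rounds + 1):
        v = rng.randrange(len(utilities))               # pull one voter...
        totals = [sum(b[i] for b in ballots) for i in range(n)]
        rest = [totals[i] - ballots[v][i] for i in range(n)]
        ballots[v] = best_response(utilities[v], rest)  # ...and readjust
        totals = [sum(b[i] for b in ballots) for i in range(n)]
        avg = [a + (s - a) / t for a, s in zip(avg, totals)]  # running mean
    return avg

# Candidate 1 is every voter's favorite here, so it ends up with the
# highest average approval count.
avg = iterate([[5, 10, 0], [0, 10, 5], [3, 10, 1]])
```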

For the zero-info strategies, I tried to figure out what the optimum
zero info strategy is for Lp cumulative voting. I didn't get all the way
there, but this is what I figured:

Under zero information, p_ij is equal for all pairs, and is (I think)
1/n^2. So the objective for a zero-info voter is to maximize
SUM i=1..n: v_i R_i
with R_i = SUM j != i: 1/(n^2) (u_i - u_j).

We also have the constraint that SUM i=1..n |v_i|^p = 1 (due to Lp
normalization).

So to use a Lagrangian:
max SUM i=1..n R_i v_i + lambda (1 - SUM i=1..n |v_i|^p)
i.e.
max SUM i=1..n (R_i v_i - lambda |v_i|^p) + lambda

Now do a hack and use v_i^p instead of |v_i|^p because it's easier to
differentiate (which might not be sound), and let's consider one
particular v, say v_1.

The derivative wrt v_1 gives
R_1 - lambda p v_1^(p-1) = 0, i.e. v_1 = ( R_1/(lambda*p) )^(1/(p-1))
and the derivative wrt lambda just gives back the constraint
SUM i=1..n: v_i^p = 1.

So what that means is that the optimum is at
v_i = (R_i/k)^(1/(p-1))
where k = lambda*p is a constant set so that the pth powers of the
voting variables sum to one. (The derivative wrt lambda itself places no
constraint on lambda, so lambda is free to be set this way.)

In particular, for the max norm (Range), p is infinite, so the exponent
1/(p-1) is zero and the scores depend only on the signs of the R
variables. I don't *quite* get the right result here (it seems to
indicate the optimum vote would be +1 or -1 for every candidate), which
I think is because I turned |v_i| into v_i above.

For ordinary cumulative voting (L1 normalization), the exponent 1/(p-1)
approaches infinity as p approaches 1, so all the R_i are raised to ever
higher powers. The k term grows along with this power to keep the pth
powers of the v_i summing to one. This means that every v_i except the
one corresponding to the greatest R_i approaches zero, while the
remaining one approaches one. So the best zero-info strategy is to give
max score to your favorite and nothing to anybody else.

For the quadratic norm, v_i = R_i/k, so only here is the zero info vote
directly proportional to R_i.
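
These limiting cases can be checked numerically. A rough sketch of the
closed form above, using sign(R_i) * |R_i/k|^(1/(p-1)) as a guessed patch
for negative R_i (mirroring the |v_i|-to-v_i caveat):

```python
from math import copysign

def R_values(u):
    # R_i = SUM j != i: 1/(n^2) (u_i - u_j)
    n = len(u)
    return [sum(u[i] - u[j] for j in range(n) if j != i) / n**2
            for i in range(n)]

def optimal_scores(u, p):
    # v_i = sign(R_i) * |R_i/k|^(1/(p-1)), with the normalization
    # constant chosen so that the pth powers of |v_i| sum to one.
    exponent = 1.0 / (p - 1.0)
    w = [copysign(abs(r) ** exponent, r) for r in R_values(u)]
    norm = sum(abs(x) ** p for x in w) ** (1.0 / p)
    return [x / norm for x in w]

u = [4.0, 2.0, 1.0]
quadratic = optimal_scores(u, 2.0)  # v_i proportional to R_i
near_l1 = optimal_scores(u, 1.01)   # nearly all weight on the top R_i
```

For p = 2 the score differences come out proportional to the utility
differences, and as p approaches 1 the vote degenerates into max score for
the favorite only, matching the two cases above.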

And R_i - R_j is proportional to u_i - u_j with the same constant of
proportionality throughout, because:
R_i - R_j = 1/(n^2) (SUM k!=i: (u_i - u_k) - SUM k!=j: (u_j - u_k))
= 1/(n^2) ((n-1) u_i - SUM k: u_k + u_i - (n-1) u_j + SUM k: u_k - u_j)
= 1/(n^2) * n * (u_i - u_j)
= 1/n (u_i - u_j)

Hence for quadratic voting, the optimal zero-info score differences
v_i - v_j are also proportional to the utility differences. Looking at
R_i - R_j removes the b term, which is probably why I can't show that
R_i is proportional to u_i directly.

Again, it's not entirely sound but it indicates the general direction.
Do improve my calculations if you can, as they're very rough.

(The problem with quadratic voting is that it isn't cloneproof. I
suspect that only Range itself is, because for every other Lp norm with
p >= 1, you can imagine a two-candidate election where A gets 1+epsilon
points and B gets 1; then cloning A makes A lose, if you just make
epsilon small enough.)
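
A toy version of that argument for p = 1 (ordinary cumulative voting), with
a 51-to-49 margin standing in for the epsilon (all numbers made up):

```python
def totals(ballots):
    # Sum each voter's budget-constrained scores per candidate.
    out = {}
    for ballot in ballots:
        for cand, score in ballot.items():
            out[cand] = out.get(cand, 0.0) + score
    return out

# 51 voters spend their whole one-point budget on A, 49 on B: A wins.
before = totals([{"A": 1.0}] * 51 + [{"B": 1.0}] * 49)

# Clone A into A1/A2. A's supporters must split the same budget, so
# each clone gets 25.5 and B now wins with 49.
after = totals([{"A1": 0.5, "A2": 0.5}] * 51 + [{"B": 1.0}] * 49)
```

Under the max norm (Range) each clone could keep the full score, which is
why Range escapes this failure.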

-km