[EM] IMDb top 250 list with PR
Kristofer Munsterhjelm
km_elmet at t-online.de
Mon Feb 13 05:25:09 PST 2023
On 11.02.2023 22:27, Toby Pereira wrote:
> I've long thought that the IMDb top 250 list
> https://www.imdb.com/search/title/?groups=top_250&sort=user_rating would
> be more interesting if it was done using a sequential PR method. That
> way you would likely get more of people's absolute favourite films
> rather than those with broad appeal, but which might not be right at the
> top of people's lists. You're more likely to see differents genres
> represented etc.
I think I wrote a post about more or less this use of proportional
representation, but I just can't seem to find it. I said that I was
looking for a method that would work on very large numbers of candidates
and voters, and a reply suggested the use of SPAV/SPRV.
In my case, I was thinking about recommender systems (which is another
subject of interest of mine); the simplest recommender systems take the
current user's preferences (watched movies/ratings) and determines a
similarity to other users', then does a weighted sum of those
preferences to create suggestions for the users in question.
In essence, the other users' preferences become weighted rated or
Approval votes (depending on whether the system gathers ratings or just
thumbs up/down). There are other tricky parts when the recommendations
are implicit (e.g. they consist of movies the user has watched rather
than explicit thumbs up/down) because it's much harder to determine
whether a user didn't watch because he didn't know about the movie, or
because he knew he wouldn't like it.
In any case, summing the ratings predictions by other users becomes a
kind of approval/range voting process. But the output is often "hey, you
liked Casino Royale; here I will suggest every other Bond movie". So
proportional representation makes a lot of sense. But since the
database could potentially be extremely large (both in terms of
users/voters and items/candidates), STV can't work, and methods that
aren't polytime (e.g. Schulze STV) are right out. So SPAV/SPRV does seem
the right type of voting method to use in such a context.
I implemented SPAV on a basic recommender system for another site, and I
think it provided more varied answers, but because "users who liked what
you liked also liked" type reasoning can't get more subtle relations,
the quality is still limited: I got a sort of "cycling behavior". E.g.
suppose a user had watched Casino Royale and Interstellar, SPRV/SPAV
would then produce something like [Spy movie] [Spy movie] [Spy movie]
[Sci-fi] [Sci-fi] [Sci-fi] [Spy movie] [Spy movie] [Sci-fi] [Sci-fi]...
Which is better than just all spy movies all the time, but could still
be improved upon :-)
Current recommender systems are considerably more advanced than the
nearest-neighbor type above, but I think they miss an active learning
component, the lack of which leads them to be either too uninteresting
(recommending the same thing over and over), or have too little
information to say anything at all (thus recommending mainly things that
appeals to everybody).
Xing et al.'s paper, "Enhancing collaborative filtering music
recommendation by balancing exploration and exploitation",
https://archives.ismir.net/ismir2014/paper/000140.pdf, gives one way to
do this. They focus on the problem that recommending the same thing over
and over leads to the user getting bored and suggest a Bayesian approach.
Another approach could be having an interactive system where the user
can choose between different groups or clusters of items depending on
his interest; and when one group is chosen, the system then produces
subgroups within that group. E.g. first having a choice between spy
movies and sci-fi, and then Carré-type vs Mission Impossible-type spy
movies, or between space fantasy and near-future sci-fi (e.g. Star Wars
vs Gravity).
It might even be possible to combine Bayesian low rank matrix completion
with Thompson sampling to get a more comprehensive sort of active
learning. But I'm not sure if BLRMC is just a neat trick to make low
rank matrix completion (relevant for latent factor type recommenders)
into a maximum posteriori estimation problem, or if the distribution
itself can be used. And it might be much too hard to sample anyway.
Whew, this got a bit off-topic. But I guess I'm saying... recommender
systems are not a trivial matter either :-)
-km
More information about the Election-Methods
mailing list