
I am looking for the best known way to correlate movie taste, i.e. to determine which people out of a large pool deviate the least from your own movie scores.

One site with this purpose is Criticker. As a longtime user, I have over the years read and participated in many discussions about its algorithm and its shortcomings in a quest to find a better one.

First, I'll briefly explain how the current algorithm works and what its drawbacks are. Then my question is: do you know a better one?

current algorithm:

  • Each user scores movies.
  • The user can freely choose a ranking scale anywhere between 0-100 (so 1-10 or 0-5 or 50-100 are all valid).
  • Movies are not compared to each other by their nominal score but by their percentile rank.
  • This is done in order to
    • normalize the ranking scales (which could also be done linearly instead)
    • be able to compare different voting behaviour, i.e. different use of the same ranking scale (see "advantages" below)
  • To compare the similarity of taste between 2 users, the absolute differences between the respective percentile ranks of the movies they have in common are averaged.

example:

movie 1 is in the 76th percentile for user A and the 52nd percentile for user B: difference = 24
movie 2 is in the 99th percentile for user A and the 90th percentile for user B: difference = 9
movie 3 is in the 89th percentile for user A and the 86th percentile for user B: difference = 3
average difference = (24+9+3)/3 = 12

In practice an average difference (called "taste compatibility index" or TCI) of 12 is considered quite a good match.
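
To make the mechanics concrete, here is a minimal sketch of the algorithm as described above (my own reading, not Criticker's actual code; the exact percentile definition, e.g. how ties are handled, is an assumption):

    # Percentile rank: share of a user's own scores strictly below a given score.
    def percentile_rank(score, all_scores):
        below = sum(s < score for s in all_scores)
        return 100.0 * below / len(all_scores)

    # TCI: average absolute percentile difference over the movies two users share.
    # ratings_a / ratings_b map movie title -> nominal score; lower TCI is better.
    def tci(ratings_a, ratings_b):
        shared = ratings_a.keys() & ratings_b.keys()
        diffs = []
        for movie in shared:
            pa = percentile_rank(ratings_a[movie], list(ratings_a.values()))
            pb = percentile_rank(ratings_b[movie], list(ratings_b.values()))
            diffs.append(abs(pa - pb))
        return sum(diffs) / len(diffs)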

advantages:

  • Users with different voting behaviour can be compared to each other. For example: a movie scored 70 is considered terrible by user A but excellent by user B (although they both use a 0-100 scale). In theory the use of percentiles allows for a more meaningful comparison, because scores are not compared in isolation; instead, each movie's relative place (upper end, lower end) in a user's voting spectrum is what gets compared.

drawbacks:

  • When users tend to watch only movies they like (which is the main purpose of the site), they end up with movies that have very high nominal scores sitting in very low percentiles.

  • As a consequence, two users with identical taste can get a bad TCI if one of them has watched more bad movies. Example: user A and user B have 100 films in common, with very similar (or even identical) nominal scores. But if, in addition, user B has watched 50 terrible movies (which user A would find just as terrible but has avoided for obvious reasons), this shifts user B's percentile distribution in such a way that the shared films sit in much higher percentiles for him than for user A (although they nominally have similar or even identical scores).

  • A special case of the above: you create user account A and duplicate it as user B (i.e. with the exact same movies and the exact same scores). The next day you score a couple of bad movies with account B only. This skews the percentile distribution and, as a result, you no longer match your own taste (user A), although your taste has not changed.
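
This duplicate-account case is easy to reproduce with the sketch above (hypothetical data; percentile_rank() and tci() as defined earlier):

    # User A rates 10 movies with scores 60, 64, ..., 96;
    # user B starts out as an exact duplicate.
    ratings_a = {"movie%d" % i: 60 + 4 * i for i in range(10)}
    ratings_b = dict(ratings_a)
    print(tci(ratings_a, ratings_b))    # 0.0 -- a perfect match

    # User B then scores five terrible movies. The shared movies move into
    # higher percentiles for B, and the match degrades even though the
    # shared scores are untouched.
    for i in range(5):
        ratings_b["bad%d" % i] = 10
    print(tci(ratings_a, ratings_b))    # ~18.3 -- no longer a good match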

The question is:
Is there a better method (than just comparing percentiles) that would still offer its advantages (comparing different voting behaviour) but doesn't suffer from the flaws listed above?

In the site's discussion forum, a user suggested using Spearman's rank correlation coefficient (SRCC) as an improvement (but without further explanation). I have found a specific example of how to use the SRCC for movie taste comparison, but given my humble maths skills I would like to ask: what improvements (if any) would the SRCC bring, and what disadvantages does it have?
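
For reference, here is one way that suggestion could be implemented; a hedged sketch, assuming the SRCC is computed over the shared movies only (scipy's spearmanr does the ranking and tie handling):

    from scipy.stats import spearmanr

    # Ranks are recomputed within the shared set, so unshared movies
    # (e.g. user B's 50 terrible ones) cannot shift the result.
    def taste_correlation(ratings_a, ratings_b):
        shared = sorted(ratings_a.keys() & ratings_b.keys())
        scores_a = [ratings_a[m] for m in shared]
        scores_b = [ratings_b[m] for m in shared]
        rho, p_value = spearmanr(scores_a, scores_b)
        return rho    # +1: identical ranking, 0: unrelated, -1: opposite

Note that rho is a similarity (higher is better), whereas the TCI is a distance (lower is better), so the two numbers are not directly comparable.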

And the SRCC aside, are there even better methods for the purpose at hand?

  • Not sure if this would work, but maybe it's worth a try: when a new user creates an account, I would ask him or her to name the worst movie he or she ever saw and also the best movie, and to rate both on the desired scale. From then on I would use these two values as "use of scale" parameters for the person and transform all follow-up ratings within this range (the rating given to the worst movie becomes 0, the rating given to the best movie becomes 100). Before transforming it might be a good idea to update the min/max, just in case the person found a new high-/lowlight (see the sketch after these comments) – TinglTanglBob Nov 27 '18 at 16:20
  • @TinglTanglBob: The question about the best/worst movie is unnecessary, because this is determined by the highest/lowest score. But other than that: what you describe is exactly what I meant by "which could also be done linearly instead". It's not really relevant to the question though, because normalizing/equalizing different voting scales is no problem at all and can easily be done linearly (e.g. a 0-5 voting scale becomes 0-100 by multiplication by 20), as you described yourself. The problem is not different voting scales. The problem is different use of the same voting scale (item 4b) – summerrain Nov 27 '18 at 16:28
  • Well, as you've said, a user could watch and rate only movies he/she likes. What if a user's lowest score is 70? Does that mean he only watched films he liked a lot, or does it mean he dishes out 70 as a minimum value, even for movies he really doesn't like? – TinglTanglBob Nov 27 '18 at 16:31
  • I don't trust movie scores; they have too much bias of all kinds in them. For instance, I'm lazy and don't score every movie that I watch, etc. It's more reliable to observe watching habits, in my opinion: starting from what you watch, but also accounting for whether you saw the whole thing, time of week and year, etc. Overall, I find recommendations from all streaming services pretty bad; despite the claim that they take my preferences into account, I don't see it in the results. Most of what they recommend is garbage to me, which I don't even bother trying. – Aksakal Nov 27 '18 at 16:40
  • @TinglTanglBob: Yes, this seems to be the conundrum. Your scenario 1 may be an avid genre fan only watching and ranking movies of this genre >70. In your scenario 2 the value of 70 just has a different (worse) meaning to the user. If the numbers had the same meaning for all users, it would be better to compare nominal values. The reason for comparing percentiles instead is that some users (e.g. myself) consider 10 terrible and 50 already quite interesting, while others deem everything below 60 terrible. The question is: is there another, better method than comparing nominal values or percentiles? – summerrain Nov 27 '18 at 17:04
  • @Aksakal: Of course numbers don't (always) perfectly reflect reality. But that's a little off-topic. The idea is to avoid unnecessarily introducing an additional bias on top of that by choosing improper algorithms. – summerrain Nov 27 '18 at 17:07
  • @rainbowtableturner: By collecting ratings for the best and worst movie, you get values whose meaning you know. So if you know a person rates the worst movie they've ever seen at 10 and rates Titanic at 70, it might be a pretty good movie. If another person rates the worst movie they've ever seen at 50, a rating of 70 for Titanic doesn't look so great any more. So the next step is transforming a given rating within the range of min and max. This transformation does not have to be linear at all. If you have any idea about the relation of true liking and score, you can use it here :P – TinglTanglBob Nov 27 '18 at 17:15
  • Other than that I have to agree with Aksakal: when it comes to movie recommendation, observable watching habits seem to be a good predictor. You could have a look at the WALS algorithm, for example. It could be used with directly observable data, but it should work with ratings too. So maybe you could compare the outcomes of using different predictors. – TinglTanglBob Nov 27 '18 at 17:21
  • @TinglTanglBob YouTube appears to give interesting recommendations. It shows me 20 or so clips to watch at the top, and I usually end up clicking on a few of them. YT has "like/dislike" and "add to list" buttons, no ratings. So, I'd say that a simple thumbs up/down is probably pretty good. I think "reliability" of this instrument is robust enough to justify its usage despite the biases I mentioned – Aksakal Nov 27 '18 at 19:16
  • Aksakal, please keep comments on topic. The question is about this specific scenario described in the question (input data consists of scores on a 0-100 scale). – summerrain Nov 27 '18 at 21:38
  • @TinglTanglBob: Would you want to elaborate on WALS in an answer, including an example of how to use it? It sounds interesting, but already the wiki article is over my head. Also, what do you mean by "observable watching habits"? All the input data available are the user-submitted movie scores. – summerrain Nov 27 '18 at 21:45
  • I think there is far better information on the web than I could ever give in an answer, but I'll try to give a source which I find helpful: http://dsnotes.com/post/2017-05-28-matrix-factorization-for-recommender-systems/ - by observable habits I mean data you can collect by just observing what a user does, without asking any further questions. Variables could be "did user click on video v1", "did user watch video v1 from beginning to end", "did user add a like to video v1" and so on. The benefit of this kind of data is that you get it from every user, not just from those who are willing to answer – TinglTanglBob Nov 28 '18 at 08:52
  • @TinglTanglBob: Thanks but the movie scores are the only user-submitted and collected data and the only data I am interested in. (There are no videos to click on.) – summerrain Nov 29 '18 at 01:01
  • I think ratings can tell you more than just the number. For example, I'd expect people who like horror movies to rate more horror movies than anything else. So for similar tastes it might be enough (or a first try) to compare which movies have been rated by each user. But this doesn't solve the actual problem of mapping the rating to true liking. Maybe for this kind of question it would be helpful to change the numerical scales to verbal ones (really bad movie - rather bad - rather good - really good movie)? – TinglTanglBob Nov 29 '18 at 09:01
  • @TinglTanglBob: The downside of verbal scores is that they are limited to 5-10 levels. Beyond that it's difficult to express nuances verbally. What would be the verbal difference between 36 and 37? The other problem is that people also use words differently. Also, people would never agree on a common verbalization. Granted, it's a good idea, but it's not how the site works. They will never delete the hundreds of thousands of numerical scores they already have. So the bottom line is, I am most interested in how to deal with the already existing numerical scores. – summerrain Nov 30 '18 at 22:12
  • That's of course true, but on the other hand one can question whether there is a meaningful difference between a rating of 36 and 37 (and by transforming the scores to personal percentiles you lose the numeric difference anyway). For a recommendation system I think a 5-point scale is enough (if it shouldn't be verbal, there are SAM scales: http://www.alliedacademies.org/articles/emotion-analysis-using-sam-selfassessment-manikin-scale.html). However, the site is how it is, so no need to talk about other scales any more since they won't happen :) – TinglTanglBob Dec 01 '18 at 10:08
  • For further thought, I think an important question is: do you need the correlation of movie taste for building a user recommendation or for some other purpose? If the first, we could think about not using correlation but only focusing on the movies a user liked most, since for a recommendation a medium and a bad movie are just the same -> not recommendable – TinglTanglBob Dec 01 '18 at 10:19
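
For what it's worth, the min/max rescaling TinglTanglBob describes in the first comment could look like the following (a sketch assuming a linear transform; the function names are mine):

    # Map each rating onto 0-100, anchored at the user's self-reported
    # worst/best ratings; the anchors are updated if a new extreme appears.
    def make_rescaler(worst, best):
        anchors = {"worst": worst, "best": best}
        def rescale(score):
            anchors["worst"] = min(anchors["worst"], score)
            anchors["best"] = max(anchors["best"], score)
            span = anchors["best"] - anchors["worst"]
            return 100.0 * (score - anchors["worst"]) / span if span else 50.0
        return rescale

    rescale = make_rescaler(worst=10, best=95)
    print(rescale(70))    # ~70.6 on the normalized 0-100 scale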
