I am looking for the best known way to correlate movie taste, i.e. to determine which people out of a large pool deviate the least from your own movie scores.
One site with this purpose is Criticker. As a longtime user, I have over the years read and participated in many discussions about its algorithm and its shortcomings in a quest to find a better one.
First, I'll briefly explain how the current algorithm works and what its drawbacks are. Then my question is: do you know a better one?
current algorithm:
- Each user scores movies.
- The user can freely choose a ranking scale anywhere between 0-100 (so 1-10 or 0-5 or 50-100 are all valid).
- Movies are not compared to each other by their nominal score but by their percentile rank.
- This is done in order to:
  - normalize the ranking scales (which could also be done linearly instead)
  - be able to compare different voting behaviour, i.e. different use of the same ranking scale (see "advantages" below)
- To compare the similarity of taste between 2 users, the absolute differences between the two users' percentile ranks are averaged over all movies they have both rated.
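The steps above can be sketched in Python roughly as follows. This is a hypothetical reconstruction, not Criticker's code: the function names are made up, and the percentile definition (share of the user's other ratings that are strictly lower, ties counted half) is an assumption, since the site may compute percentiles differently.

```python
from statistics import mean


def percentile_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Percentile rank (0-100) of each movie within ONE user's full rating list.

    Assumption: share of the user's other ratings that are strictly lower,
    with ties counted half. Criticker's exact formula may differ.
    """
    values = list(scores.values())
    n = len(values)
    if n < 2:
        return {movie: 50.0 for movie in scores}  # a single rating has no spread
    return {
        movie: 100.0 * (sum(v < s for v in values) + 0.5 * (values.count(s) - 1)) / (n - 1)
        for movie, s in scores.items()
    }


def tci(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Mean absolute percentile difference over the movies both users rated.

    Percentiles are taken over each user's WHOLE collection; only the
    comparison is restricted to shared movies. Lower means more similar.
    """
    pa, pb = percentile_ranks(scores_a), percentile_ranks(scores_b)
    common = pa.keys() & pb.keys()
    if not common:
        raise ValueError("the two users have no movies in common")
    return mean(abs(pa[m] - pb[m]) for m in common)


# Hypothetical users: one rates on 1-10, the other on 0-100, same ordering.
user_a = {"m1": 7, "m2": 9, "m3": 8, "m4": 3}
user_b = {"m1": 70, "m2": 95, "m3": 80, "m4": 40}
print(tci(user_a, user_b))  # 0.0
```

Because only each movie's position within a user's own list matters, two users on different scales who order the same movies identically get a TCI of 0.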
example:
movie 1 is in 76th percentile of user A and in 52nd percentile of user B: difference=24
movie 2 is in 99th percentile of user A and in 90th percentile of user B: difference=9
movie 3 is in 89th percentile of user A and in 86th percentile of user B: difference=3
average difference = (24+9+3)/3 = 12
In practice, an average difference (called the "taste compatibility index", or TCI) of 12 is considered quite a good match.
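Written as a formula, with C denoting the set of movies both users have rated and p_A(m), p_B(m) the percentile ranks of movie m in each user's collection (notation introduced here only for illustration), the example works out as:

```latex
\mathrm{TCI}(A,B) = \frac{1}{|C|} \sum_{m \in C} \lvert p_A(m) - p_B(m) \rvert
= \frac{|76-52| + |99-90| + |89-86|}{3} = \frac{24 + 9 + 3}{3} = 12
```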
advantages:
- Users with different voting behaviour can be compared to each other. For example, movies rated 70 may be considered terrible by user A but excellent by user B (although they both use a 0-100 scale). In theory, percentiles should allow for a more meaningful comparison: movies are not compared to each other in isolation; instead, their relative positions (upper end, lower end) on each user's voting spectrum are compared.
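A small sketch of this advantage, using hypothetical ratings: two users on the same 0-100 scale put five movies in the same order, but one rates harshly and the other generously. The percentile definition in the code is an assumption, not necessarily the one Criticker uses.

```python
def percentile_ranks(scores):
    """Percentile rank (0-100) within one user's ratings (assumed definition:
    share of the user's other ratings strictly lower, ties counted half)."""
    values = list(scores.values())
    n = len(values)
    if n < 2:
        return {m: 50.0 for m in scores}
    return {m: 100.0 * (sum(v < s for v in values) + 0.5 * (values.count(s) - 1)) / (n - 1)
            for m, s in scores.items()}


def mean_abs_diff(xa, xb):
    """Average absolute difference over the movies present in both dicts."""
    common = xa.keys() & xb.keys()
    return sum(abs(xa[m] - xb[m]) for m in common) / len(common)


# Hypothetical users on the same 0-100 scale who order five movies identically:
# for the harsh one 70 is excellent, for the generous one 70 is terrible.
harsh = {"m1": 20, "m2": 35, "m3": 50, "m4": 60, "m5": 70}
generous = {"m1": 70, "m2": 80, "m3": 88, "m4": 95, "m5": 100}

# Both already use 0-100, so linear rescaling changes nothing: the gap stays large.
print(mean_abs_diff(harsh, generous))  # 39.6
# Percentile ranks compare positions on each user's own spectrum instead.
print(mean_abs_diff(percentile_ranks(harsh), percentile_ranks(generous)))  # 0.0
```

This is why a purely linear normalization would not be enough here: it only maps one scale onto another, while percentiles also absorb how each user spreads scores over that scale.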
drawbacks:
When users tend to watch only movies they like (which is the main purpose of this site), they end up with movies that have very high nominal scores sitting in very low percentiles.
As a consequence, 2 users with identical taste get a bad TCI match if one of them has watched more bad movies. Example: users A and B have 100 films in common with very similar (or even identical) nominal scores. But if user B has additionally watched 50 terrible movies (which user A would find just as terrible, but has avoided for obvious reasons), this shifts user B's percentiles so that his shared films sit in much higher percentiles than user A's (although they nominally have similar or even identical scores).
A special case of the above: you create user account A and duplicate it as user B (i.e. with exactly the same movies and exactly the same scores). The next day you score a couple of bad movies with account B only. This skews B's percentile distribution and, as a result, you no longer match your own taste (user A), although your taste has not changed.
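The duplicate-account case can be reproduced with a short, self-contained sketch. The ratings are hypothetical (1-10 scale), and the percentile rank is assumed to be the share of the user's other ratings that are strictly lower, ties counted half.

```python
def percentile_ranks(scores):
    """Percentile rank (0-100) within one user's full rating list (assumed
    definition: share of other ratings strictly lower, ties counted half)."""
    values = list(scores.values())
    n = len(values)
    if n < 2:
        return {m: 50.0 for m in scores}
    return {m: 100.0 * (sum(v < s for v in values) + 0.5 * (values.count(s) - 1)) / (n - 1)
            for m, s in scores.items()}


def tci(a, b):
    """Mean absolute percentile difference over the movies both users rated."""
    pa, pb = percentile_ranks(a), percentile_ranks(b)
    common = pa.keys() & pb.keys()
    return sum(abs(pa[m] - pb[m]) for m in common) / len(common)


# Account A: five movies, all liked (hypothetical 1-10 scores).
account_a = {"m1": 6, "m2": 7, "m3": 8, "m4": 9, "m5": 10}
# Account B: an exact copy of A ...
account_b = dict(account_a)
print(tci(account_a, account_b))  # 0.0, a perfect match with yourself

# ... until B alone also rates two bad movies.
account_b.update({"bad1": 1, "bad2": 2})
print(round(tci(account_a, account_b), 1))  # 16.7
```

The five shared movies keep identical nominal scores, yet the index moves from 0 to about 16.7, which is worse than the value of 12 that is considered a quite good match.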
The question is:
Is there a better method (than just comparing percentiles) that keeps its advantage (comparing different voting behaviour) but doesn't suffer from the flaws listed above?
In the site's discussion forum, a user suggested using Spearman's rank correlation coefficient (SRCC) as an improvement, but without further explanation. I have found a specific example of how to use the SRCC for movie taste comparison, but given my humble maths skills I would like to ask what improvements (if any) the SRCC would achieve and what disadvantages it has.
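A minimal sketch of Spearman's rank correlation applied to two users, assuming it is computed only over the movies both users rated: each user's scores for those shared movies are ranked (ties get their average rank) and the Pearson correlation of the two rank lists is taken. The function names and ratings are hypothetical; if SciPy is available, `scipy.stats.spearmanr` computes the same coefficient.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(scores_a, scores_b):
    """Spearman's rho over the movies both users rated (-1 to 1, higher = more similar).

    Ranks are recomputed within the shared movies only, so ratings of
    movies the other user has not seen cannot shift them.
    """
    common = sorted(scores_a.keys() & scores_b.keys())
    if len(common) < 2:
        raise ValueError("need at least two movies in common")
    ra = average_ranks([scores_a[m] for m in common])
    rb = average_ranks([scores_b[m] for m in common])
    # Pearson correlation of the two rank vectors.
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("undefined: one user gave every shared movie the same score")
    return cov / sqrt(var_a * var_b)


# The duplicate-account scenario: B copies A, then rates two bad movies alone.
account_a = {"m1": 6, "m2": 7, "m3": 8, "m4": 9, "m5": 10}
account_b = dict(account_a, bad1=1, bad2=2)
print(spearman(account_a, account_b))  # 1.0
```

Because the ranks are recomputed within the shared set, the extra bad movies do not change the result, which is the main improvement over the percentile approach. Some likely trade-offs: with only a handful of movies in common the coefficient can swing widely; it uses only the order of the shared movies, so where they sit on each user's overall spectrum is ignored (which is exactly why the extra bad movies stop mattering); it is undefined when one user gives every shared movie the same score; and it runs in the opposite direction to the TCI (1 is a perfect match, -1 is opposite taste).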
And SRCC aside, are there even better methods for the purpose at hand?