I have a list of websites and a Facebook like count for each of them. The counts vary from 0 to millions.

For each website I want to compute a score from 0 to 10 that represents the site's social popularity.

Any thoughts on how to approach this problem would be much appreciated.

Update:
Data summary:
Min. : 1
1st Qu.: 2
Median : 8
Mean : 908
3rd Qu.: 28
Max. :10841643

mdewey

1 Answer

Maybe this would work: $$ \text{social popularity} = \frac{\text{count}}{\text{max count}} \cdot 10$$
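As a minimal sketch of the formula above, the following applies the linear scaling to the five quantiles from the question's data summary. The function name `linear_score` and the guard for a non-positive maximum are assumptions made for illustration, not part of the answer.

```python
def linear_score(count, max_count):
    """Scale a like count linearly onto the range [0, 10]."""
    if max_count <= 0:
        # Assumed convention: with no likes anywhere, every site scores 0.
        return 0.0
    return 10.0 * count / max_count


# Quantiles taken from the data summary in the question.
quantiles = {
    "min": 1,
    "1st quartile": 2,
    "median": 8,
    "3rd quartile": 28,
    "max": 10841643,
}
max_count = max(quantiles.values())

for label, count in quantiles.items():
    print(f"{label:>12}: {linear_score(count, max_count):.6f}")
```

With this data, every quantile below the maximum maps to a score far below 0.001, so at least three quarters of the sites would be indistinguishable from 0 on a 0 to 10 scale.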

  • +1 I would suggest perhaps taking the log of (count + 1), because this kind of data will have an extreme right tail. Unless you apply some kind of transform, you'll end up with the vast bulk of sites having a score of 0 or 1, and a handful with 9 or 10. – Hong Ooi Jul 24 '13 at 19:27
  • It is indeed extremely right-tailed: Min. 1, 1st Qu. 2, Median 8, Mean 908, 3rd Qu. 28, Max. 10841643. Could you please write out the exact formula, showing where that log(count) should go? Thanks.
    – Viacheslav Jul 25 '13 at 11:03
  • +1 for @HongOoi's suggestion. This type of web counter usually follows a Zipf curve, which looks linear when plotted with $x = \log(\text{rank}+1)$ against $y = \log(\text{count}+1)$. Also, it depends on what you want to achieve with that score. – rapaio Jul 25 '13 at 11:32
  • The skewness of the distribution is a fundamental feature of this kind of data, so I disagree with using any log transformation. (+1) to snegostup's comment and (+1) to PEV's answer. – stochazesthai Apr 25 '15 at 07:40
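
The comments above suggest a log transform but do not spell out a formula. One possible reading, offered here only as an assumption, is $$ \text{social popularity} = \frac{\log(\text{count} + 1)}{\log(\text{max count} + 1)} \cdot 10$$ which keeps the 0 to 10 range and maps a count of 0 to a score of 0. A minimal sketch follows; the name `log_score` is hypothetical.

```python
import math


def log_score(count, max_count):
    """Scale a like count onto [0, 10] using log(count + 1).

    This is one assumed interpretation of the log transform suggested
    in the comments, not a formula given in the thread.
    """
    if max_count <= 0:
        # Assumed convention: with no likes anywhere, every site scores 0.
        return 0.0
    # log1p(x) computes log(1 + x) accurately, including for small x.
    return 10.0 * math.log1p(count) / math.log1p(max_count)


# Quantiles taken from the data summary in the question.
quantiles = {
    "min": 1,
    "1st quartile": 2,
    "median": 8,
    "3rd quartile": 28,
    "max": 10841643,
}
max_count = max(quantiles.values())

for label, count in quantiles.items():
    print(f"{label:>12}: {log_score(count, max_count):.2f}")
```

On this data the median site lands between 1 and 2 rather than at effectively 0, so the lower scores are spread out. Whether that is desirable depends on what the score is for: as the last comment argues, the skew is a real feature of the data, and a log scale deliberately compresses it.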