I have a GitHub dataset with three variables (star_count, fork_count and watch_count) for each repository. I want to compute a popularity score from these three variables. For example, I could normalize the three variables and take their average as the popularity score for a repository.
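To make the normalize-and-average idea concrete, here is a minimal sketch. It assumes min-max scaling as the normalization (the question does not say which normalization is meant), and the counts and the helper names `min_max` and `popularity_scores` are made up for illustration.

```python
def min_max(values):
    """Rescale a list of numbers to the [0, 1] range (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no ranking information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def popularity_scores(star_count, fork_count, watch_count):
    """Average the three min-max-normalized counts for each repository."""
    columns = [min_max(star_count), min_max(fork_count), min_max(watch_count)]
    # zip(*columns) yields one (star, fork, watch) triple per repository.
    return [sum(row) / len(row) for row in zip(*columns)]


# Hypothetical counts for four repositories (not real data).
stars = [10, 200, 50, 1000]
forks = [1, 30, 5, 120]
watches = [3, 40, 10, 300]

scores = popularity_scores(stars, forks, watches)
for repo_index, score in enumerate(scores):
    print(repo_index, round(score, 3))
```

Min-max scaling is sensitive to outliers, so with heavy-tailed counts like these a log transform or rank-based scaling might be a reasonable alternative; which one is appropriate depends on what "popularity" should mean.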

However, I only want to average the three variables if they are rank correlated. To calculate the rank correlation between two variables, I can use Spearman’s rho. But how do I calculate the rank correlation among three variables?

One approach I thought of is to calculate the rank correlation of every pair of variables and check whether each pair is correlated. Would this be a good approach? Is there a statistical test for rank correlation among three variables?
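The pairwise approach could be sketched as follows. This is only an illustration under stated assumptions: Spearman's rho is computed as the Pearson correlation of average ranks (ties share the mean of their positions), the data are invented, and the helper names `average_ranks`, `pearson` and `spearman_rho` are hypothetical. It reports the coefficients only and does not compute p-values.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five repositories (not real data).
data = {
    "star_count": [10, 200, 50, 1000, 75],
    "fork_count": [1, 30, 5, 120, 4],
    "watch_count": [3, 40, 10, 300, 12],
}

# One coefficient per unordered pair of variables (three pairs in total).
pairwise = {
    (a, b): spearman_rho(data[a], data[b]) for a, b in combinations(data, 2)
}
for pair, rho in pairwise.items():
    print(pair, round(rho, 3))
```

With three variables this yields three coefficients, which can be read together as a small correlation matrix. If p-values are needed, `scipy.stats.spearmanr` returns one alongside each coefficient.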

  • Related: https://stats.stackexchange.com/questions/588968/why-is-correlation-only-defined-between-two-variables/589221#589221 – Galen Mar 30 '23 at 17:08
  • If your objective is to find a "popularity score," then ask about that instead of going down this rabbit hole of rank correlations and averages, which has no apparent connection to scoring. Could you explain how popularity might be related to your variables and how you know? – whuber Mar 30 '23 at 17:51
  • If a repository has a higher watch_count, fork_count and star_count, then it is more popular. But fork_count values span a smaller range than watch_count values, and the ranges of the other variables differ as well. I want statistical evidence that these variables are rank correlated, so that I can normalize the three variables and average them to calculate the popularity score for a repo. – Setu Kumar Basak Mar 30 '23 at 17:56

0 Answers