I tried to calculate Spearman's rank correlation coefficient by hand.

Data:

[Two images not available: the data table with fractional ranks, and the hand calculation]

But when I use Python (scipy's spearmanr), it returns a different value.

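import pandas as pd
from scipy.stats import spearmanr

# Six paired observations; both columns contain tied values.
rawdata = pd.DataFrame(
    [
        [3, 4],
        [5, 4],
        [6, 2],
        [6, 4],
        [8, 9],
        [11, 7]
    ],
    columns=['Set of A', 'Set of B'])
print(rawdata)
# With a two-column frame, spearmanr correlates the two columns.
correlation, pval = spearmanr(rawdata)
print(f'correlation={correlation:.6f}, p-value={pval:.6f}')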

It returns:

correlation=0.585239, p-value=0.222365

What am I doing wrong here?

shin
  • 113

1 Answer

The formula you're using is a simplified version that is exact only when there are no ties. Your data has ties in both columns, so try the full version of the formula:

$r_s=\frac{\sum_i(a_i-\bar{a})(b_i-\bar{b})}{\sqrt{\sum_i(a_i-\bar{a})^{2}\,\sum_i(b_i-\bar{b})^{2}}}$

where $a_i$ and $b_i$ are the fractional ranks you've already found in your table, and $\bar{a}$ and $\bar{b}$ are the means of those ranks.
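
As a quick check, here is a minimal sketch in plain Python that applies both formulas to the question's data. The helper `fractional_ranks` is a hypothetical name written for this illustration, and the "shortcut" is assumed to be the usual no-ties formula $1-\frac{6\sum_i d_i^2}{n(n^2-1)}$, since the hand calculation itself is not reproduced here.

```python
from math import sqrt


def fractional_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions.

    Hypothetical helper for illustration only.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


a = [3, 5, 6, 6, 8, 11]  # Set of A
b = [4, 4, 2, 4, 9, 7]   # Set of B
ra, rb = fractional_ranks(a), fractional_ranks(b)
n = len(a)

# Shortcut formula (assumed): exact only when no values are tied.
d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
r_shortcut = 1 - 6 * d2 / (n * (n ** 2 - 1))

# Full formula: covariance of the ranks over the product of their spreads.
ma, mb = sum(ra) / n, sum(rb) / n
num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
den = sqrt(sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb))
r_full = num / den

print(f'shortcut={r_shortcut:.6f}')  # shortcut=0.614286
print(f'full={r_full:.6f}')          # full=0.585239
```

The full formula reproduces the 0.585239 reported in the question, while the shortcut gives a different number because of the tied values.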

CFD
  • 416
  • You seem to be writing about the Pearson correlation coefficient. The calculations presented in the question clearly deal with ties (in a standard manner). – whuber Feb 14 '20 at 18:09
  • I believe that Spearman correlation is equal to the Pearson correlation of the fractional ranks – CFD Feb 14 '20 at 18:11
  • Thank you for modifying and explaining your notation. Doing the calculation confirms your answer is a correct explanation of the discrepancy (+1). – whuber Feb 14 '20 at 18:36
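
For reference, the arithmetic behind that check can be sketched as follows, assuming average ranks for ties. The ranks of Set of A are $1, 2, 3.5, 3.5, 5, 6$ and those of Set of B are $3, 3, 1, 3, 6, 5$, so $\bar{a}=\bar{b}=3.5$ and

$$\sum_i(a_i-\bar{a})(b_i-\bar{b})=9.5,\qquad \sum_i(a_i-\bar{a})^{2}=17,\qquad \sum_i(b_i-\bar{b})^{2}=15.5$$

$$r_s=\frac{9.5}{\sqrt{17\times 15.5}}\approx 0.5852$$

By comparison, the no-ties shortcut (assumed here to be $1-\frac{6\sum_i d_i^2}{n(n^2-1)}$ with $d_i=a_i-b_i$) gives $\sum_i d_i^2=13.5$ and $1-\frac{6\times 13.5}{6\times 35}\approx 0.6143$.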