I'm trying to calculate correlation using a formula in Statistics 4th Edition by Freedman:

r = average of (x in standard units) * (y in standard units)

If I try this out ...

x = 1:7
y = c(6,7,5,4,3,1,2)

x.z = scale(x)
y.z = scale(y)

prod = x.z * y.z
mean(prod)
[1] -0.7959184

However, if I use the built-in cor function, I get a different answer:

cor(x, y)
[1] -0.9285714

In the book's worked examples, the standard units for x and y seem to be rounded to the nearest 0.5. If I round my values the same way, I get the expected answer:

x.z.round = round(x.z/0.5)*0.5 
y.z.round = round(y.z/0.5)*0.5 

prod.round = x.z.round * y.z.round
mean(prod.round)
[1] -0.9285714

Why do the x and y scaled values seemingly need to be rounded to the nearest 0.5?

  • The answer is that cor does not implement the correlation coefficient as defined in your reference textbook. It's important to consult its documentation (type ?cor) and compare its definition with the one your book is using. – whuber Dec 20 '18 at 14:32

1 Answer

You made a mistake: when the standard units come from scale(), which uses the n-1 standard deviation, the standardized formula for Pearson's correlation coefficient divides by n-1, not n. So if you use sum(prod)/6, you get the correct result.

sum(prod)/6
[1] -0.9285714
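
To see where the missing factor comes from, here is a minimal sketch that compares the two divisor conventions on the data from the question. It assumes the book's standard units use the SD that divides by n. The helper pop_sd and the variable names are made up for illustration and are not base R functions.

```r
# Data from the question
x <- 1:7
y <- c(6, 7, 5, 4, 3, 1, 2)
n <- length(x)

# Hypothetical helper (not part of base R): SD that divides by n
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Convention A: standard units from the n-divisor SD, then a plain average
zx_pop <- (x - mean(x)) / pop_sd(x)
zy_pop <- (y - mean(y)) / pop_sd(y)
r_pop <- mean(zx_pop * zy_pop)

# Convention B: scale() standardizes with sd(), which divides by n - 1,
# so the products are summed and divided by n - 1 as well
zx_smp <- as.numeric(scale(x))
zy_smp <- as.numeric(scale(y))
r_smp <- sum(zx_smp * zy_smp) / (n - 1)

# Mixing the two (n - 1 SD with a plain average) shrinks r by (n - 1) / n
r_mixed <- mean(zx_smp * zy_smp)

print(c(r_pop = r_pop, r_smp = r_smp, r_mixed = r_mixed, cor = cor(x, y)))
```

Either convention reproduces cor(x, y) as long as the same divisor is used for the SD and for the average. Only the mixed version gives -0.7959184, which is 6/7 of -0.9285714.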
user2974951
  • Incredible! So was it just by chance that rounding came up with the right answer? – Chris Snow Dec 20 '18 at 10:43
  • @ChrisSnow Trying your code on a different sample produces incorrect results, so this may very well be just a coincidence. – user2974951 Dec 20 '18 at 10:53
  • The OP did not make a mistake. The textbook they cite always divides by $n,$ not $n-1.$ See https://stats.stackexchange.com/a/3932/919 for a fuller account of this. Thus, the correct answer is to multiply the result of cor by the square root of $n/(n-1)$ rather than to accept what the software tells you! – whuber Dec 20 '18 at 14:32
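
The point raised in the comments, that the rounding match may be a coincidence, can be checked with a short sketch. The sample below is made up for illustration (it does not come from the question or the book), and round_half is a hypothetical helper that applies the question's rounding recipe.

```r
# Made-up sample (an assumption for illustration, not from the question)
x2 <- c(1, 2, 3, 4, 10)
y2 <- c(2, 1, 4, 3, 9)

# Hypothetical helper: round to the nearest 0.5, as in the question
round_half <- function(z) round(z / 0.5) * 0.5

# Standard units from scale(), then rounded
x2.z.round <- round_half(as.numeric(scale(x2)))
y2.z.round <- round_half(as.numeric(scale(y2)))

r_rounded <- mean(x2.z.round * y2.z.round)
r_builtin <- cor(x2, y2)

# The two values no longer agree for this sample
print(c(rounded = r_rounded, cor = r_builtin))
```

For the question's data the match appears to be structural rather than general. Both x and y are permutations of 1:7, whose n-1 SD is about 2.16, so rounding to the nearest 0.5 turns each standard unit into exactly the deviation divided by 2. Because 2 * 2 * 7 = 28 happens to equal the sum of squared deviations, the rounded average lands on cor(x, y).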