
Parts of my question have been answered separately on this site; I was trying to find a way to reconcile the two. The two parts are:

  1. The correlation between $x$ and $x^{2}$ is not zero if, let's say, $x$ is distributed uniformly between $0$ and $1$. However, if we shift the $x$ values by $-0.5$, so that $x$ is distributed symmetrically around $0$, the correlation becomes close to $0$. This is covered in the answers here: Why are $x$ and $x^2$ correlated?
  2. Correlation is translation invariant. This is covered here: Pearson correlation - can negative values in your data artificially increase the size of the correlation? (A short simulation of both points follows this list.)
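As a quick check, here is a minimal simulation sketch of both points (the sample size, seed, and the shift constants $3$ and $-7$ are arbitrary illustration choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
x = rng.uniform(0.0, 1.0, size=100_000)   # x ~ Uniform(0, 1)

# Point 1: x and x^2 are strongly correlated on [0, 1] ...
print(np.corrcoef(x, x**2)[0, 1])               # roughly 0.97

# ... but after shifting x by -0.5 the sample correlation is essentially 0.
z = x - 0.5
print(np.corrcoef(z, z**2)[0, 1])               # close to 0

# Point 2: adding a constant to each variable leaves the correlation unchanged.
print(np.corrcoef(x + 3.0, x**2 - 7.0)[0, 1])   # same value as the first print
```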

My question is: if correlation is indeed translation invariant, why does the correlation between $x$ and $x^{2}$ change from clearly non-zero to close to $0$ after the shift by $-0.5$?

S. M.

1 Answer


The correlation between $x$ and $y$ is translation invariant:

$$ \text{cor}(x,y) = \text{cor}(x+a,y+b). $$

Thus,

$$ \text{cor}(x,x^2) = \text{cor}(x+a,x^2+b). $$

However, you are not comparing these two quantities. You are comparing $\text{cor}(x,x^2)$ and

$$ \text{cor}(x+a,(x+a)^2)=\text{cor}(x+a,x^2+2ax+a^2) \neq \text{cor}(x+a,x^2+b). $$

The important difference is the $2ax$ term in the middle expression: it is not a constant like $b$ but varies with $x$, so the shift turns $x^2$ into a different function of $x$ rather than merely translating it. In particular, for $x$ uniform on $[0,1]$ and $a=-0.5$, the shifted variable $x+a$ is symmetric about $0$, so $\text{cov}\big(x+a,(x+a)^2\big)=E\big[(x+a)^3\big]-E[x+a]\,E\big[(x+a)^2\big]=0$, which is exactly why the correlation drops to (essentially) zero.
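A quick numerical sketch of the distinction (the sample size, seed, and the constant $b$ are arbitrary illustration choices; $a=-0.5$ is taken from the question):

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
x = rng.uniform(0.0, 1.0, size=100_000)   # x ~ Uniform(0, 1), as in the question
a, b = -0.5, 2.0                          # a from the question; b is an arbitrary constant

print(np.corrcoef(x, x**2)[0, 1])             # cor(x, x^2): roughly 0.97
print(np.corrcoef(x + a, x**2 + b)[0, 1])     # cor(x+a, x^2+b): identical, pure shifts
print(np.corrcoef(x + a, (x + a)**2)[0, 1])   # cor(x+a, (x+a)^2): close to 0, the 2ax term matters
```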

Stephan Kolassa