I can sort of see that the covariance is related to the product of the standard deviations, but I can't figure out how. My problem is that I don't understand how this formula tells us how strong the connection between two variables is.
Also informative: https://stats.stackexchange.com/questions/83347 and https://stats.stackexchange.com/questions/235004. – whuber Aug 27 '21 at 13:48
Possible duplicate: https://stats.stackexchange.com/q/235004/110833 – Roger V. Jan 06 '23 at 08:30
4 Answers
Ok, I'm going to assume you recognise the formula for a covariance, but that saying "Cauchy-Schwarz inequality" will not be helpful. So $$\mathrm{cov}[X,Y]=\frac{1}{n-1} \sum_{i=1}^n (x_i-\bar x)(y_i-\bar y).$$
The most similar that $X$ and $Y$ could be would be that they were identical: $y_i=x_i$. In that case, the covariance is
$$\begin{align} \mathrm{cov}[X,Y] &=\frac{1}{n-1} \sum_{i=1}^n (x_i-\bar x)(x_i-\bar x) =\mathrm{var}[X] \\[6pt] &=\frac{1}{n-1} \sum_{i=1}^n (y_i-\bar y)(y_i-\bar y) =\mathrm{var}[Y]. \\[6pt] \end{align}$$ The correlation is then $$\rho=\frac{\mathrm{cov}[X,Y]}{\sqrt{\mathrm{var}[X]\mathrm{var}[Y]}}=\frac{\mathrm{var}[X]}{\sqrt{\mathrm{var}[X]\mathrm{var}[X]}}=1.$$
Ok, does scaling one of the variables matter? No: if you double $Y$ you end up with a 2 in the numerator and $2^2=4$ inside the square root for $\mathrm{var}[Y]$; the correlation is still 1.
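Making that scaling step explicit (a quick check using the definitions above, with $Y = 2X$):

$$\rho = \frac{\mathrm{cov}[X,2X]}{\sqrt{\mathrm{var}[X]\,\mathrm{var}[2X]}} = \frac{2\,\mathrm{var}[X]}{\sqrt{\mathrm{var}[X] \cdot 4\,\mathrm{var}[X]}} = \frac{2\,\mathrm{var}[X]}{2\,\mathrm{var}[X]} = 1.$$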
So there's no way you can get the correlation to be bigger than 1, and it's equal to 1 when the two variables are identical, when one is a positive multiple of the other, or (more generally) when one is a positive multiple of the other plus a constant, i.e., a straight-line relationship.
The most opposite that $X$ and $Y$ could be would be $Y=-X$: when $X$ goes up, $Y$ goes down by the same amount. The same sort of calculations show that the correlation is then $-1$; it's also $-1$ when one is a negative multiple of the other plus a constant, i.e., a straight-line relationship with negative slope.
Anything you do to make $(x_i-\bar x)(y_i-\bar y)$ bigger will also make at least one of $(x_i-\bar x)^2$ and $(y_i-\bar y)^2$ bigger. You can make the numerator of the correlation as large as you like, but the denominator will also be large, and the maximum will still be $1$.
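Here is a minimal numerical sketch of these cases (in Python with numpy; the script is my addition, not part of the argument above, but it checks each claim):

```python
# Correlation is 1 for any positive linear transform of X, -1 for a
# negative one, and near 0 for unrelated noise -- never outside [-1, 1].
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

def corr(a, b):
    """Pearson correlation: covariance over the product of standard deviations."""
    a_dev, b_dev = a - a.mean(), b - b.mean()
    return (a_dev @ b_dev) / np.sqrt((a_dev @ a_dev) * (b_dev @ b_dev))

print(corr(x, x))                      # identical variables -> 1.0
print(corr(x, 2 * x + 5))              # positive multiple plus a constant -> 1.0
print(corr(x, -3 * x + 1))             # negative multiple plus a constant -> -1.0
print(corr(x, rng.normal(size=1000)))  # unrelated noise -> near 0
```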
The answer by Thomas Lumley gives an intuitive view of the matter, so I'm going to give an alternative view using vector notation and the law of cosines. To do this, suppose we create vectors $\mathbf{u}=(u_1,...,u_n)$ and $\mathbf{v}=(v_1,...,v_n)$ whose elements are deviations from the means:
$$u_i = x_i-\bar{x} \quad \quad \quad \quad v_i = y_i-\bar{y}.$$
We can now write the sample covariance and the sample variances in vector notation as:
$$\begin{align} r_{X,Y} &= \frac{1}{n-1} \sum_{i=1}^n u_i v_i = \frac{1}{n-1} \mathbf{u} \cdot \mathbf{v} \\[6pt] s_{X}^2 &= \frac{1}{n-1} \sum_{i=1}^n u_i^2 = \frac{1}{n-1} ||\mathbf{u}||^2 \\[6pt] s_{Y}^2 &= \frac{1}{n-1} \sum_{i=1}^n v_i^2 = \frac{1}{n-1} ||\mathbf{v}||^2 \\[6pt] \end{align}$$
The law of cosines says that $\mathbf{u} \cdot \mathbf{v} = ||\mathbf{u}|| \ ||\mathbf{v}|| \cos \theta$ where $\theta$ is the angle between the vectors. This allows us to write the (Pearson) sample correlation coefficient as:
$$\rho_{X,Y} = \frac{r_{X,Y}}{s_X s_Y} = \frac{\mathbf{u} \cdot \mathbf{v}}{||\mathbf{u}|| \ ||\mathbf{v}||} = \cos \theta.$$
That is, the sample correlation is equivalent to the cosine of the angle between the vectors of deviations from the sample means for the two samples. Now, we know that $-1 \leqslant \cos \theta \leqslant 1$ for all $\theta \in \mathbb{R}$, so this establishes the range restriction.
Incidentally, this interesting mathematical rule is just a basic aspect of the geometric analysis of random variables. Some broader geometric properties of the linear relationships between multiple variables are examined in O'Neill (2019). In that paper you will see that both the Pearson sample correlation and the broader coefficient of determination are fully determined by the cosines of the angles between the vectors of deviations of the variables of interest. This means that regression analysis (which encompasses the Pearson correlation for two variables) can be conceived in relatively simple geometric terms, where the key goodness-of-fit measures are scale-free quantities based on the cosines of the angles between the deviation vectors.
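A short sketch (in Python with numpy; my addition, not from O'Neill 2019) checking that the cosine of the angle between the deviation vectors matches the Pearson correlation computed the usual way:

```python
# cos(theta) between the deviation vectors u and v equals the
# Pearson sample correlation of x and y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(scale=0.8, size=50)

u = x - x.mean()   # deviations from the sample means
v = y - y.mean()

cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos_theta)                # cosine of the angle between u and v
print(np.corrcoef(x, y)[0, 1])  # numpy's Pearson correlation -- identical
```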
In addition to the other excellent answers, I want to offer an explanation using the Cauchy-Schwarz inequality, which states that $$ |\langle \mathbf{u},\mathbf{v}\rangle|^2 \leq \langle \mathbf{u},\mathbf{u}\rangle \cdot \langle \mathbf{v},\mathbf{v}\rangle, $$ where $\langle\cdot,\cdot\rangle$ is an inner product. After defining an inner product on the set of random variables using the expected value of their product, $$ \langle X,Y\rangle := \operatorname{E}(XY), $$ the Cauchy-Schwarz inequality becomes $$ |\operatorname{E}(XY)|^{2} \leq \operatorname{E}(X^{2}) \operatorname{E}(Y^{2}). $$ Now we apply this to $(X-\mu_{x})$ and $(Y-\mu_{y})$: $$ |\underbrace{\operatorname{E}((X-\mu_{x})(Y-\mu_{y}))}_{= \operatorname{Cov}(X,Y)}|^{2} \leq \underbrace{\operatorname{E}((X-\mu_{x})^{2})}_{= \operatorname{Var}(X)} \underbrace{\operatorname{E}((Y-\mu_{y})^{2})}_{= \operatorname{Var}(Y)}. $$ Consequently, $$ |\operatorname{Cov}(X,Y)|^{2}\leq \operatorname{Var}(X)\operatorname{Var}(Y). $$ Taking square roots, we get $$ |\operatorname{Cov}(X,Y)|\leq \sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)} = \sigma_{x}\sigma_{y} \implies -\sigma_{x}\sigma_{y}\leq \operatorname{Cov}(X,Y)\leq +\sigma_{x}\sigma_{y}. $$ Dividing by $\sigma_{x}\sigma_{y}$ gives the correlation coefficient $\rho$: $$ -1\leq \underbrace{\frac{\operatorname{Cov}(X,Y)}{\sigma_{x}\sigma_{y}}}_{=\rho}\leq 1. $$
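A minimal numerical check of this bound (in Python with numpy; my addition, with sample analogues standing in for the expectations):

```python
# |Cov(X, Y)| never exceeds sigma_x * sigma_y, per the Cauchy-Schwarz bound.
import numpy as np

rng = np.random.default_rng(2)
for _ in range(5):
    x = rng.normal(size=10_000)
    y = rng.normal(size=10_000) + rng.uniform(-2, 2) * x
    cov = np.cov(x, y)[0, 1]                  # sample covariance (ddof=1)
    bound = x.std(ddof=1) * y.std(ddof=1)     # product of sample std deviations
    print(f"|cov| = {abs(cov):.4f} <= sigma_x * sigma_y = {bound:.4f}")
```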
Quick tip to commit to memory: $\text{Correlation}(X, Y)$ = the expected cosine angle between centered versions of $X$ and $Y$.
Here, centering means subtracting each random variable's expected value from it.
This characterization isn't quite correct, and might even be confusing, because the "expected cosine angle" (a) makes sense only for multivariate random variables and (b), because of its nonlinear nature, is unlikely to equal the correlation. When one views $X$ and $Y$ as vectors in a function space, such as $L^2(\mathbb R, \mathrm d\lambda),$ then it equals the cosine of their angle -- no expectation is involved. – whuber Sep 18 '23 at 14:35