
I know that the covariance of two random variables $X$ and $Y$ is calculated as follows:

$$ \operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n} $$

where $n$ is the sample size and $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$, respectively. What I do not understand is how this formula measures the dependence or correlation between the two variables. In other words, what does this formula have to do with the dependence between $X$ and $Y$?

  • You've missed out a summation sign. Otherwise, an answer is that covariance only indicates correlation when it is scaled by the product of the standard deviations. See any good introductory text, such as that by Freedman, Pisani and Purves. – Nick Cox Apr 06 '23 at 00:20
  • $\text{Cov}(X, Y) = \mathbb E[{(X - \bar{X})(Y-\bar{Y})}]$ while $\text{Cor}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$ to be scale free – Henry Apr 06 '23 at 00:35
  • https://stats.stackexchange.com/questions/229667/difference-between-correlation-and-covariance-is-covariance-only-useful-if-the and the links from it may be useful – Henry Apr 06 '23 at 00:38

1 Answer


The formula below is the standard way to write the covariance at the sample level.

$$ \operatorname{Cov}(X, Y) = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) $$

Each term in the summation records whether an observation of $X$ is above, below, or equal to the mean of $X$, and whether the corresponding observation of $Y$ is above, below, or equal to the mean of $Y$. These two deviations are then multiplied. If the product is positive, $X_i$ and $Y_i$ are both above or both below their respective means; if the product is negative, one is above its mean while the other is below its own mean. If either observation equals its mean, the product is zero.
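The individual products can be listed directly. In this made-up four-point example (hypothetical data chosen so that no observation sits exactly at its mean), two products are positive and two are negative; the positive ones are larger in magnitude, so their sum, and hence the covariance, is positive.

```python
# Made-up illustrative data.
xs = [1, 2, 4, 5]
ys = [1, 4, 5, 2]
n = len(xs)
x_bar = sum(xs) / n  # 3.0
y_bar = sum(ys) / n  # 3.0

# One product per observation: positive when X_i and Y_i lie on the same side
# of their means, negative when they lie on opposite sides.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
print(products)                 # [4.0, -1.0, 2.0, -2.0]
print(sum(products) / (n - 1))  # 1.0: the positive terms outweigh the negative ones
```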

If you end up with many positive products in the numerator, the $X$ and $Y$ variables tend to be above or below their respective means at the same time: when one variable is high, so is the other, and when one is low, so is the other. Conversely, if you end up with many negative products, then when one variable is low the other tends to be high, and when one is high the other tends to be low.

When the positive products in the numerator outweigh the negative ones, the sum is positive (positive covariance). This indicates a relationship between the variables: both tend to be high at the same time or low at the same time.

When the negative products outweigh the positive ones, the sum is negative (negative covariance). This indicates a relationship in which, when one variable is high, the other tends to be low.

Dave