
I am trying to derive the covariance of two sample means and get confused at one point. Given a sample of size $n$ with paired, dependent observations $x_i$ and $y_i$, realizations of the random variables $X$ and $Y$, with sample means $\bar{x}$ and $\bar{y}$, I want to derive $cov(\bar{x},\bar{y})$.

I am relatively sure the result should be

$$cov(\bar{x},\bar{y})=\frac{1}{n}cov(X,Y)$$

However, I arrive at

$$cov(\bar{x},\bar{y})=E(\bar{x}\bar{y})-\mu_x\mu_y = E\left(\frac{1}{n^2}\sum x_i \sum y_i\right) -\mu_x\mu_y =\frac{1}{n^2} n^2 E(x_i y_i) -\mu_x\mu_y=cov(X,Y)$$

I used

$$E\left(\frac{1}{n^2}\sum x_i \sum y_i\right)=\frac{1}{n^2} E\left(x_1y_1+x_2y_1+\cdots + x_ny_n\right)=\frac{1}{n^2} n^2 E(x_iy_i)$$

There must be a flaw in my thinking somewhere.
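As a sanity check, here is a quick simulation sketch (NumPy, with bivariate normal pairs purely as a stand-in for the paired data) that can be used to see which of the two results holds numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps = 20, 100_000
# Bivariate normal pairs with cov(X, Y) = 0.8 (an arbitrary illustrative choice).
mean = [0.0, 0.0]
cov = [[1.0, 0.8],
       [0.8, 1.0]]

# reps independent samples, each consisting of n paired observations (x_i, y_i).
samples = rng.multivariate_normal(mean, cov, size=(reps, n))  # shape (reps, n, 2)
xbar = samples[:, :, 0].mean(axis=1)  # sample mean of the x's in each replication
ybar = samples[:, :, 1].mean(axis=1)  # sample mean of the y's in each replication

# Empirical covariance of the two sample means across replications;
# compare it with cov(X, Y) = 0.8 and cov(X, Y)/n = 0.04.
print(np.cov(xbar, ybar)[0, 1])
```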

tomka
  • I think your reasoning is essentially correct: http://stats.stackexchange.com/questions/59546/estimating-the-covariance-of-the-means-from-two-samples, that is, $\mathrm{cov}(\bar{x},\bar{y}) = \mathrm{cov}(X,Y)$ – sandris Jul 28 '15 at 16:53
  • So the difference is the assumption about covariances in paired and independent samples. The upper result is that for paired samples, the lower that for independent samples, where $E(x_iy_j)=E(x_i)E(y_j)$ when $i \ne j$ – tomka Jul 28 '15 at 17:00
  • If you are comfortable with deriving the fact that the variance of the sample mean is $1/n$ times the variance, then the result is immediate because covariances are variances. As far as your mistake goes, note that $\text{cov}(x_i,y_j)=0$ for $i\ne j$. It also helps to know that whenever you are working with covariances or variances you may always assume the means are zero, because these are central moments that don't depend on the means at all. – whuber Jul 28 '15 at 17:31
  • What I do not yet fully understand is why it holds that $cov(x_i,y_j)=0$ for $i≠j$ when I have paired samples, but it does not hold when I have independent samples (?). Can you explain? – tomka Jul 28 '15 at 19:45
  • Your use of the term "sample" implicitly means $(x_i,y_i)$ is independent of $(x_j,y_j)$ for $i\ne j$. From this it is immediate that their covariances (if they exist) must be zero. – whuber Jul 28 '15 at 20:44

2 Answers


Covariance is a bilinear function meaning that $$ \operatorname{cov}\left(\sum_{i=1}^n a_iC_i, \sum_{j=1}^m b_jD_j\right) = \sum_{i=1}^n \sum_{j=1}^m a_i b_j\operatorname{cov}(C_i,D_j).$$ There is no need to mess with means etc.
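As an aside, the same identity holds for empirical (sample) covariances, which are bilinear in the data vectors in exactly the same way, so it can be spot-checked numerically. A minimal NumPy sketch (the data vectors and coefficients are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Four arbitrary data vectors and four arbitrary coefficients.
C1, C2, D1, D2 = rng.standard_normal((4, 1000))
a1, a2, b1, b2 = 2.0, -0.5, 1.5, 3.0

def scov(u, v):
    """Sample covariance of two data vectors."""
    return np.cov(u, v)[0, 1]

lhs = scov(a1 * C1 + a2 * C2, b1 * D1 + b2 * D2)
rhs = (a1 * b1 * scov(C1, D1) + a1 * b2 * scov(C1, D2)
       + a2 * b1 * scov(C2, D1) + a2 * b2 * scov(C2, D2))

print(lhs, rhs)  # the two numbers agree up to floating-point error
```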

Applying this to the question of the covariance of the sample means of $n$ independent paired samples $(X_i, Y_i)$ (note: the pairs are independent bivariate random variables; we are not claiming that $X_i$ and $Y_i$ are independent random variables), we have that \begin{align} \operatorname{cov}\left(\bar{X},\bar{Y}\right) &= \operatorname{cov}\left(\frac{1}{n}\sum_{i=1}^n X_i, \frac 1n\sum_{j=1}^n Y_j\right)\\ &= \frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n \operatorname{cov} (X_i, Y_j)\\ &= \frac{1}{n^2}\sum_{i=1}^n \operatorname{cov} (X_i, Y_i) &\scriptstyle{\text{since $X_i$ and $Y_j$ are independent, and thus uncorrelated, for $i \neq j$}}\\ &= \frac 1n\operatorname{cov} (X, Y) \end{align}


As noted below in a comment by flow2k, although $\operatorname{cov}(\bar{X},\bar{Y})$ is smaller than $\operatorname{cov}({X},{Y})$ by a factor of $n$, the (Pearson) correlation coefficients are the same: $\rho_{\bar{X},\bar{Y}} = \rho_{X,Y}$ !! Previously I had never given the correlation coefficients any thought at all.
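For completeness, here is why the correlation is unchanged: the extra factors of $1/n$ cancel, since $\operatorname{var}(\bar{X}) = \operatorname{var}(X)/n$ and $\operatorname{var}(\bar{Y}) = \operatorname{var}(Y)/n$, so that

$$\rho_{\bar{X},\bar{Y}} = \frac{\operatorname{cov}(\bar{X},\bar{Y})}{\sqrt{\operatorname{var}(\bar{X})\operatorname{var}(\bar{Y})}} = \frac{\frac{1}{n}\operatorname{cov}(X,Y)}{\sqrt{\frac{1}{n}\operatorname{var}(X)\cdot\frac{1}{n}\operatorname{var}(Y)}} = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\operatorname{var}(Y)}} = \rho_{X,Y}.$$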

Dilip Sarwate
  • I think there are $n^2$ terms, but $n(n-1)$ of them cancel with $\mu_x\mu_y$ due to independence. – tomka Jul 28 '15 at 18:56
  • The quoted section above "Covariance is a bilinear function..." - where is this quoted from? – flow2k Mar 12 '23 at 01:12
  • @flow2k The first "quoted" paragraph of my answer is not specifically a quotation in the sense that I wrote it myself without looking at a textbook or paper etc while doing so, but the first sentence (possibly in exactly the same words) can be found in many textbooks. The second sentence of the "quoted" paragraph is proudly my own words; textbook writers (or their copyeditors) and journal paper writers and journal editors don't use such informal language. – Dilip Sarwate Mar 12 '23 at 02:54
  • Thanks. I think it's interesting the correlation coefficient of the sample means remains unchanged. – flow2k Mar 12 '23 at 09:02
  • I had never thought about correlation coefficients at all! I will incorporate this information into my answer (with credit to you). – Dilip Sarwate Mar 17 '23 at 03:39

I think the algebra issue is resolved with the following:

\begin{align}{1 \over n^2}E\left(\sum_{i=1}^n x_i \sum_{i=1}^n y_i\right)&={1 \over n^2}E\left(\sum_{i=1}^n x_i y_i +\sum_{i\ne j}x_i y_j\right)\\&={1 \over n^2}(n(Cov(x_i,y_i)+\mu_X \mu_Y)+n(n-1)\mu_X \mu_Y)\\&={1 \over n^2}(n Cov(x_i,y_i)+n^2 \mu_X \mu_Y)\\&=Cov(x_i,y_i)/n+ \mu_X \mu_Y\end{align}
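Subtracting $\mu_X \mu_Y$ then gives the covariance of the sample means, matching the result in the other answer:

$$Cov(\bar{x},\bar{y}) = E(\bar{x}\bar{y}) - \mu_X \mu_Y = \frac{Cov(x_i,y_i)}{n}$$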

JimB