
While playing around with the formula for covariance, I discovered something I wasn't expecting. Replacing the $E[Y]$ in the definition of covariance with $E[X]$ appears to simplify back down to the covariance, via the alternative formula for covariance.

$$\begin{align} E[(X-E[X])(Y-E[X])] &= E[XY -E[X]Y -E[X]X + E[X]E[X]]\\ &= E[XY] - E[E[X]Y]-E[E[X]X] + E[X]^2\\ &= E[XY] -E[X]E[Y] -E[X]^2 + E[X]^2 \\ &= E[XY] - E[X]E[Y] \\ &= COV[X,Y] \\ &= E[(X-E[X])(Y-E[Y])] \end{align}$$

Is this a well known manipulation, and is there a more meaningful way to explain it than pure algebraic derivation?

One would think measuring the "center" of $Y$ "incorrectly" would impact the covariance, but it changes nothing when the center is artificially moved to $E[X]$. What gives?
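To make this concrete, here is a minimal simulation sketch of the identity (the seed, sample size, and the linear relationship between the variables are arbitrary choices of mine):

set.seed(42)
X = rnorm(1e6, 5, 1)
Y = 0.5 * X + rnorm(1e6)

# "mis-centered" version from the derivation: E[(X - E[X])(Y - E[X])]
mean((X - mean(X)) * (Y - mean(X)))

# properly centered covariance: E[(X - E[X])(Y - E[Y])]
mean((X - mean(X)) * (Y - mean(Y)))

Both lines print the same value up to floating point, matching the algebra above.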

  • What is going on is that for any mean $0$ random variable $Z$ (e.g., $Z = X - E(X)$), we have $E[Z (Y + a)] = E[Z (Y + b)]$, for any constants $a$ and $b$. In particular, $a = -E(Y)$ and $b = -E(X)$, so there isn't really anything special about those particular choices. Not sure if there is anything much deeper than that. – guy Sep 27 '23 at 00:55
  • By this logic, isn't all covariance zero? Sorry if this is a silly question. – amonaether Sep 27 '23 at 03:33
  • It is exceptionally well known and a FAQ, because it reduces to the fact that $E(X-E(X))=E(X)-E(X)=0.$ It's also an immediate consequence of the characterization of covariance at https://stats.stackexchange.com/a/18200/919, because there it's clear that the origin of your coordinate system doesn't matter -- and all you're doing is shifting that origin. – whuber Sep 27 '23 at 12:12
  • @amonaether I don't see how that logic implies all covariances are zero. It implies that if $Z$ is mean $0$ then the covariance between $Z$ and $Y$ is $E(ZY)$, which is true. – guy Sep 27 '23 at 14:13

2 Answers


Wow, this got me thinking for a while. So I tried some different manipulations to see where things go:

It seems you can get to the covariance in many ways, but I notice the pattern that at least one of the variables must be centered properly, while the other can be shifted by its own center, the other's center, an arbitrary constant, or nothing at all.
$$ \begin{align} E[(X-E(X))(Y-E(X))]&=E[XY]-E[X]E[Y]\\ E[(X-E(Y))(Y-E(Y))]&=E[XY]-E[X]E[Y]\\ E[(X-E(X))Y]&=E[XY]-E[X]E[Y]\\ E[X(Y-E(Y))]&=E[XY]-E[X]E[Y]\end{align} $$
I also tried what happens when there isn't any proper centering at all, and it doesn't work:
$$ E[(X-E(Y))(Y-E(X))]\neq E[XY]-E[X]E[Y] $$
Intuitively, covariance measures the direction of the relationship between two variables, hence the need for centering: it makes values below the mean negative. If below-mean values of one variable tend to be paired with below-mean values of the other, the products of deviations are positive and the overall sum comes out positive; if below-mean values of one are paired with above-mean values of the other, the products are negative and the sum comes out negative. And it seems the only requirement is centering at least one of them.

I tried it numerically as well, and it seems to hold:

X=rnorm(20,10,2)
Y=2*X+rnorm(20)

# Y: other's center
sum = 0
for (x in 1:20) {
  sum = sum + (X[x] - mean(X)) * (Y[x] - mean(X))
}
# sum = 160.7643

# Y: no centering
sum = 0
for (x in 1:20) {
  sum = sum + (X[x] - mean(X)) * Y[x]
}
# sum = 160.7643

# Y: arbitrary shifting
sum = 0
for (x in 1:20) {
  sum = sum + (X[x] - mean(X)) * (Y[x] - 10230)
}
# sum = 160.7643
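As a cross-check (my addition), all three loop sums equal the fully centered cross-product sum, which is $(n-1)$ times R's sample covariance, up to floating point:

sum((X - mean(X)) * (Y - mean(Y)))  # 160.7643
(length(X) - 1) * cov(X, Y)         # 160.7643; cov() uses the n-1 denominator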

Derf
  • Your last "arbitrary shifting" numerical example is off, but only because of the finite machine precision of double-precision floating-point numbers in R. Your two inequalities are also false. – Jarle Tufto Sep 27 '23 at 10:06
  • @JarleTufto Thanks for the comment. May I ask how come the first inequality is false? I guess I'm misunderstanding how to use inequalities or my algebra is bad. Do you mean that in the first inequality LHS=RHS? – Derf Sep 27 '23 at 12:16
  • @JarleTufto Oh I didn't notice that about the numerical example. Thank you for the clarification about the machine precision. Looks like the result is the same even after shifting if I use a smaller "arbitrary" number – Derf Sep 27 '23 at 12:19
  • $$\begin{align} E((X-EX)(Y-c))&=E((X-EX)(Y-EY+EY-c)) \\ &=E((X-EX)(Y-EY))+E((X-EX)(EY-c)) \\ &=\operatorname{Cov}(X,Y)+(EY-c)E(X-EX)\\ &=\operatorname{Cov}(X,Y) \end{align}$$ – Jarle Tufto Sep 27 '23 at 12:37
  • @JarleTufto wow, big thanks for the help with the solution. I didn't notice that $E(X-EX)=0$ would play such a huge part. Now I'm trying to see if the first inequality works out with the same approach as well – Derf Sep 27 '23 at 12:47
  • @Jarle Tufto's proof answered my question. Can I accept a comment as an answer? – amonaether Sep 27 '23 at 15:50
  • For future reference: should I delete my answer if a comment either proves it wrong or provides the correct answer, or should I leave it as-is for readers? – Derf Sep 27 '23 at 16:22

As @JarleTufto pointed out, adding zero lets us express the general form $E[(X-c_X)(Y-c_Y)]$ as the covariance plus some other terms. Add zero:
$$E[(X-E[X]+E[X]-c_X)(Y-E[Y]+E[Y]-c_Y)]$$
$$E[((X-E[X])+(E[X]-c_X))((Y-E[Y])+(E[Y]-c_Y))]$$
Expand:
$$E[(X-E[X])(Y-E[Y])+(E[X]-c_X)(Y-E[Y])+(E[Y]-c_Y)(X-E[X])+(E[X]-c_X)(E[Y]-c_Y)]$$
Apply linearity of expectation:
$$E[(X-E[X])(Y-E[Y])]+E[(E[X]-c_X)(Y-E[Y])+(E[Y]-c_Y)(X-E[X])]+E[(E[X]-c_X)(E[Y]-c_Y)]$$
Of course $Y-E[Y]$ and $X-E[X]$ have expected values of $0$, so the middle term vanishes, and our expression becomes:
$$COV[X,Y]+0+(E[X]-c_X)(E[Y]-c_Y)$$
If $c_X=E[X]$ (or $c_Y=E[Y]$), the whole thing simplifies back to the basic covariance.

Mis-measuring the centers of the distributions by $E[X]-c_X$ and $E[Y]-c_Y$ yields the covariance plus an error term: the product of the two mis-measurement errors. If either error is zero, you get the covariance back.
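Here is a quick numerical sketch of that error-term identity (the constants cX and cY below are arbitrary values I picked). Because $\sum_i (x_i-\bar{x}) = 0$ holds exactly in a sample, the identity holds in sample moments too, not just in expectation:

set.seed(1)
X = rnorm(1e5, 10, 2)
Y = 2 * X + rnorm(1e5)
cX = 3; cY = -5                               # deliberately wrong "centers"

lhs = mean((X - cX) * (Y - cY))
covhat = mean((X - mean(X)) * (Y - mean(Y)))  # population-style covariance estimate
rhs = covhat + (mean(X) - cX) * (mean(Y) - cY)
all.equal(lhs, rhs)                           # TRUE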