
So let's say I have two random variables, $X \sim (0,1)$ and $Y \sim (0,1)$, i.e. each with mean 0 and variance 1. I want to construct a random variable $Z$ as a linear combination $Z = aX + bY$ such that $\text{Corr}(Y,Z)$ is always equal to 0.20. Here was my intuition and approach:

$$\text{Corr}(Y,Z) = \frac{\text{Cov}(Y,Z)}{\sqrt{\text{Var}(Y)\,\text{Var}(Z)}}$$

where I took $\text{Var}(Y) = \text{Var}(Z) = 1$, so the denominator is 1 and $\text{Corr}(Y,Z) = \text{Cov}(Y,Z)$.

Then:

$$\begin{aligned}
\text{Cov}(Y,Z) &= \text{Cov}(Y, aX + bY) \\
&= \text{Cov}(Y, aX) + \text{Cov}(Y, bY) \\
&= a\,\text{Cov}(Y,X) + b\,\text{Cov}(Y,Y) \\
&= a\,\text{Cov}(Y,X) + b\,\text{Var}(Y) \\
&= a\,\text{Cov}(Y,X) + b = 0.20
\end{aligned}$$

First question: I know that $\text{Cov}(Y,X) = \text{Cov}(X,Y)$. Does that mean the covariance between these two variables is 0?

Assuming it's 0, the equation reduces to $b = 0.20$, so to answer the original question we would simply need to multiply $Y$ by 0.20 to get $\text{Corr}(Y,Z) = 0.20$.

I then went into Excel and generated 25k random numbers each for $X \sim (0,1)$ and $Y \sim (0,1)$, and set $Z = X + 0.2Y$. When I computed the correlations, the results were a bit puzzling:

$\text{Corr}(X,Y) = 0.7\ldots$?

$\text{Corr}(Y, X + 0.2Y) = 0.2$

I probably made a lot of wrong assumptions. Can someone help me pinpoint my misconception? Is it incorrect to say, "simply multiply $Y$ by 0.20 and the resulting $\text{Corr}(aX + bY, Y)$ will always be equal to 0.20"?

AdamO
Jimmy
  • Could you tell us what "X~(0,1)" means? Is that trying to say $X$ is uniformly distributed on the interval $[0,1]$? If so, note that its variance is much smaller than $1.$ BTW, https://stats.stackexchange.com/questions/15011 answers your underlying question of how to create correlated variables from linear combinations of variables to achieve specified correlations. – whuber Dec 12 '22 at 21:05
  • Thank you for your response and the reference; I'll check that one out. I just meant mean 0, variance 1. The problem doesn't specify whether it's a normal, uniform, or other specific distribution. I used a Gaussian RNG to generate the 25k samples. – Jimmy Dec 12 '22 at 21:12
  • I suspect the numbers you generated in Excel don't have zero mean and unit variance. – whuber Dec 12 '22 at 21:14

1 Answer


I will answer assuming $X$ and $Y$ are normally distributed with mean $0$ and variance $1$.

First, $Cov(Y,X) = Cov(X,Y)$ does not imply that the covariance is zero; it is just the general fact that covariance is symmetric.

Indeed, $Cov(X,Y) = E(XY) - E(X)E(Y)$, and since this expression is unchanged when $X$ and $Y$ are swapped, it equals $Cov(Y,X)$.

If $X$ and $Y$ were independent, that would be sufficient to get $Cov(X,Y) = 0$, because independence gives $E(XY) = E(X)E(Y)$.
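The moment formula can be written as a small Python helper (the name `cov_moments` is hypothetical, and the independent normal samples are an assumption for illustration). Swapping the arguments returns the same value, which is the symmetry described above; for independently generated samples the result is close to, but not exactly, 0, because it is a sample estimate.

```python
import random

def cov_moments(u, v):
    """Covariance as E(UV) - E(U)E(V), using sample means (divides by n)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    mean_uv = sum(ui * vi for ui, vi in zip(u, v)) / n
    return mean_uv - mean_u * mean_v

rng = random.Random(1)
# Independent standard normal draws (an assumption for this illustration).
x = [rng.gauss(0, 1) for _ in range(20_000)]
y = [rng.gauss(0, 1) for _ in range(20_000)]

# Symmetric: both calls return the same value.
print(cov_moments(x, y), cov_moments(y, x))
```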

Also, the variance of $Z$ is not $1$. Once again assuming independence for convenience, $Var(Z) = Var(aX+bY) = a^2 Var(X) + b^2 Var(Y) = a^2+b^2$.
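Putting the two corrections together under the same independence assumption: $Cov(Y,Z) = b\,Var(Y) = b$, so the target correlation fixes $b$ only relative to $a$. A short derivation, taking $b > 0$ since the target is positive:

$$Corr(Y,Z) = \frac{Cov(Y,Z)}{\sqrt{Var(Y)\,Var(Z)}} = \frac{b}{\sqrt{a^2+b^2}}$$

$$\frac{b}{\sqrt{a^2+b^2}} = 0.20 \iff 0.96\,b^2 = 0.04\,a^2 \iff b = \frac{0.20\,|a|}{\sqrt{0.96}} \approx 0.2041\,|a|$$

For example, with $a = 1$, choosing $b = 0.20$ gives $Corr(Y,Z) = 0.20/\sqrt{1.04} \approx 0.196$ rather than exactly $0.20$.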

I think those two were the big things to go over.

Phil
  • (+1) This analysis does not require the variables to have Normal distributions. – whuber Dec 12 '22 at 21:15
  • $E(XY) - E(X)E(Y)$ makes sense. As a follow-up question, how would we know from the problem itself whether $X$ and $Y$ are independent? I know that at a basic level independence means $P(X, Y) = P(X)P(Y)$, but here all that's specified is that $X$ and $Y$ are random variables. Is the default assumption that they are independent?

    Everything else makes a lot of sense now, thank you

    – Jimmy Dec 12 '22 at 21:50
  • @Jimmy No, we don't know that, but in general it's impossible to know what $Cov(X,Y)$ is without knowing more about $X$ and $Y$, so I just assumed they are independent. – Phil Dec 12 '22 at 21:54
  • Wrapping all of this up and assuming that $X$ and $Y$ are NOT independent, I get that the constraint comes out to $(aE(XY) + b)/\sqrt{a^2 + b^2 + 2abE(XY)} = 0.20$. How would I solve for $E(XY)$ in this context, when there's no PMF/PDF and all I have are arbitrary distributions for the variables? If $X$ and $Y$ are independent, it simply reduces to the constraint $b/\sqrt{a^2+b^2} = 0.20$ by my calculation. – Jimmy Dec 12 '22 at 22:37
  • @Jimmy If $X$ and $Y$ are not independent, just treat $Cov(X,Y)$ as a constant and solve for values of $a$ and $b$ that make sense. – Phil Dec 12 '22 at 22:40
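Following the suggestion to treat $Cov(X,Y)$ as a known constant $\rho$, here is a minimal Python sketch. It assumes unit variances, $|\rho| < 1$, $a \neq 0$ and a target $|r| < 1$; the function names are hypothetical. When $X$ and $Y$ are correlated, $Var(Z) = a^2 + b^2 + 2ab\rho$ picks up a cross term, and writing $u = b + a\rho$ turns the constraint into $u/\sqrt{u^2 + a^2(1-\rho^2)} = r$, which inverts in closed form.

```python
import math

def b_for_target_corr(a, rho, r):
    """Return b such that Corr(Y, aX + bY) = r.

    Assumes Var(X) = Var(Y) = 1, Cov(X, Y) = rho with |rho| < 1,
    a != 0 and |r| < 1.
    """
    if a == 0 or abs(rho) >= 1 or abs(r) >= 1:
        raise ValueError("need a != 0, |rho| < 1 and |r| < 1")
    # With u = b + a*rho: Corr(Y, Z) = u / sqrt(u^2 + a^2 * (1 - rho^2)).
    u = r * abs(a) * math.sqrt((1 - rho**2) / (1 - r**2))
    return u - a * rho

def corr_y_z(a, b, rho):
    """Population Corr(Y, aX + bY) under the same assumptions."""
    return (a * rho + b) / math.sqrt(a**2 + b**2 + 2 * a * b * rho)

b = b_for_target_corr(a=1.0, rho=0.0, r=0.2)
print(b, corr_y_z(1.0, b, 0.0))
```

With `rho=0.0` this reduces to the independent case, $b = 0.2\,|a|/\sqrt{0.96} \approx 0.204\,|a|$; `a` is a free scale choice, and any nonzero value yields a valid `b`.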