I am trying to understand how to arrive at $r = \dfrac{Cov(X,Y)}{\sigma_X\sigma_Y}$ through a logical narrative. This is, in fact, a kind of continuation of my earlier unanswered question.
I see that standardizing X and Y makes $r$ the slope of the resulting regression line. But I need to reason out why standardizing is the right step. This is my current narrative.
My narrative:
- Covariance is given by the equation below, whose form makes its symmetry evident.
$$ Cov(X,Y) = \sum_x\sum_y(x-\overline{x})(y - \overline{y})p(x,y) = Cov(Y,X) \tag{1} $$
So, by this measure, X covaries with Y exactly as much as Y covaries with X.
- However, the two simple regression lines are not symmetric:
$$ \hat{Y}|x = \hat{\beta_0} + \hat{\beta_1}x \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_1} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(x_i - \overline{x})^2} \ \ , \ \ \hat{\beta_0} = \overline{y} - \hat{\beta_1}\overline{x} \\ \hat{X}|y = \hat{\beta_2} + \hat{\beta_3}y \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_3} = \dfrac{\sum_i(y_i - \overline{y})(x_i - \overline{x}) }{\sum_i(y_i - \overline{y})^2} \ \ , \ \ \hat{\beta_2} = \overline{x} - \hat{\beta_3}\overline{y} \tag{2} $$
Thus, in general $\hat{\beta_1} \neq \hat{\beta_3}$, so the two fitted lines do not coincide.
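To convince myself of this asymmetry numerically, here is a short NumPy sketch on made-up data (the data-generating choices here are mine, purely for illustration):

```python
import numpy as np

# Hypothetical sample: y depends on x with slope 2 plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

xc, yc = x - x.mean(), y - y.mean()
beta1 = np.sum(yc * xc) / np.sum(xc**2)  # slope of the regression of Y on x
beta3 = np.sum(yc * xc) / np.sum(yc**2)  # slope of the regression of X on y

print(beta1, beta3)  # the two slopes clearly differ
```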
Since covariance depends on the units of measurement, it is unsuitable for comparing different pairs of RVs (or events); we therefore seek a measure like covariance but unit-free.
Now, standardizing X and Y yields new regression lines $\hat{Y_s}|x_s, \hat{X_s}|y_s$ whose intercepts are zero and whose slopes are equal and unit-free. That is, if I fully standardize the sample set,
$$ X_s = \dfrac{X - \overline{X}}{s_X} \ \ , \ \ Y_s = \dfrac{Y - \overline{Y}}{s_Y} $$
then, refitting on the standardized samples (i.e. $x_s, y_s$ now denote the standardized values),
$$ \hat{Y_s}|x_s = 0 + \hat{\beta_{1s}}x_s \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_{1s}} = \dfrac{\sum_i(y_{is} - \overline{y_s})(x_{is} - \overline{x_s}) }{\sum_i(x_{is} - \overline{x_s})^2} \ \ \ \ \\ \hat{X_s}|y_s = 0 + \hat{\beta_{3s}}y_s \ \ , \ \ \text{where} \ \ \ \ \hat{\beta_{3s}} = \dfrac{\sum_i(y_{is} - \overline{y_s})(x_{is} - \overline{x_s}) }{\sum_i(y_{is} - \overline{y_s})^2} \ \ \ \ \tag{3} $$
the result is
$$ r = \hat{\beta_{1s}} = \hat{\beta_{3s}} \tag{4} $$
that is, the two regression lines share the same slope and are symmetric to each other (mirror images across the line $y_s = x_s$).
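A quick numeric check of (3) and (4) with NumPy, again on hypothetical data of my own choosing, shows both standardized slopes collapsing to $r$:

```python
import numpy as np

# Hypothetical data with a negative linear trend plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = -0.5 * x + rng.normal(size=200)

# Standardize (subtract the mean, divide by the sample std).
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()

beta1s = np.sum(ys * xs) / np.sum(xs**2)  # slope of Y_s on x_s
beta3s = np.sum(ys * xs) / np.sum(ys**2)  # slope of X_s on y_s
r = np.corrcoef(x, y)[0, 1]               # sample correlation

print(beta1s, beta3s, r)  # all three agree
```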
- In fact, undoing the standardization and returning to the raw X and Y, we recover the relation between their regression slopes and the correlation:
$$ r = \hat{\beta_1}\dfrac{s_X}{s_Y} = \hat{\beta_3}\dfrac{s_Y}{s_X} \tag{5} $$
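Relation (5) can also be verified numerically; here is a minimal NumPy sketch on made-up data (the data generation is my own assumption):

```python
import numpy as np

# Hypothetical data: y roughly triples x, plus noise.
rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = 3 * x + rng.normal(size=150)

xc, yc = x - x.mean(), y - y.mean()
beta1 = np.sum(yc * xc) / np.sum(xc**2)  # raw-scale slope of Y on x
beta3 = np.sum(yc * xc) / np.sum(yc**2)  # raw-scale slope of X on y
r = np.corrcoef(x, y)[0, 1]

# Both rescalings of the raw slopes recover r, as in (5).
print(beta1 * x.std() / y.std(), beta3 * y.std() / x.std(), r)
```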
My questions:
1. Is my narrative above correct and minimally complete? What went wrong, what could be added, and how could I improve it?
2. I see that Galton discovered regression via the bivariate normal distribution (link). How was it then generalized to arbitrary distributions?
3. Also, would perfect linearity imply that the underlying distribution is bivariate normal?
4. After this narrative, how could I prove that the sample $r$ carries over to the population $\rho$ as well?
5. I also hope to see the final $r$ equal the cosine of the angle between the centered vectors, i.e. the normalized dot product:
$$ r = \cos\theta = \dfrac{(x - \overline{x})\cdot(y - \overline{y})}{\lvert x - \overline{x} \rvert \lvert y - \overline{y} \rvert} \tag{6} $$
Then, what would the uncentered dot product refer to, and how does it relate to the non-standardized equation set (2)? That is, $$ \cos\theta = \dfrac{x\cdot y}{\lvert x \rvert \lvert y \rvert} = \ ? \tag{7} $$
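For what it is worth, identity (6) does check out numerically; here is a small NumPy sketch on hypothetical data (my own choice of data-generating process):

```python
import numpy as np

# Hypothetical data with a positive linear trend plus noise.
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = x + rng.normal(size=50)

# Cosine of the angle between the mean-centered data vectors.
xc, yc = x - x.mean(), y - y.mean()
cos_theta = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
r = np.corrcoef(x, y)[0, 1]

print(cos_theta, r)  # agree up to floating-point error
```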