
In simple linear regression with one regressor, if you regress $y$ on $x$, i.e., $\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$, and $x$ on $y$, i.e., $\hat{x} = \hat{\alpha}_1 y + \hat{\alpha}_0$, you can show mathematically that

$$ R^2 = \hat{\beta}_1 \hat{\alpha}_1 $$

Is there an intuitive explanation for how the coefficient of determination is the product of slopes?
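
For reference, a quick numerical check of the identity (a sketch added for illustration; the simulated data and variable names are made up and not part of the derivation):

```python
# Simulate data, fit both simple OLS regressions, and compare the product of
# the two slopes with R^2 computed as the squared correlation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

beta1, beta0 = np.polyfit(x, y, 1)    # regress y on x: slope, intercept
alpha1, alpha0 = np.polyfit(y, x, 1)  # regress x on y: slope, intercept

r = np.corrcoef(x, y)[0, 1]
print(beta1 * alpha1)  # product of the two slopes
print(r ** 2)          # squared correlation; agrees with the product above
```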

2 Answers


One slope is $\text{corr}(y, x)\,\text{SD}(y)/\text{SD}(x)$ and the other is $\text{corr}(x, y)\,\text{SD}(x)/\text{SD}(y)$. But $\text{corr}(y, x) = \text{corr}(x, y)$, and the ratios of standard deviations cancel on multiplication. So the product of the slopes reduces to $\text{corr}^2(y, x)$, which is $R^2$.
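
Spelled out in symbols (an explicit restatement of the sentence above, writing $r$ for the common correlation and $s_x$, $s_y$ for the standard deviations):

$$\hat{\beta}_1 \hat{\alpha}_1 = \left(r\,\frac{s_y}{s_x}\right)\left(r\,\frac{s_x}{s_y}\right) = r^2 = R^2.$$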

That is only intuitive if you start knowing the ingredients already.

A really good explanation of correlation, which gives that knowledge, is in the text Statistics by Freedman, Pisani and Purves (one edition lists Adhikari as a co-author). Any edition from 1978 to 2007 will do.

Nick Cox
  • Yes, I understand how this is derived mathematically, but I don't understand it intuitively. – user5965026 Nov 18 '23 at 02:43
  • Sorry it doesn't help you. When I hear the word "intuitive" I translate it as "familiar", and that usually fits. "You will find the GUI intuitive" means that you've used essentially similar programs before, so it will not seem strange. Appeals to intuition in mathematics usually boil down to the same thing at some level or another, as in claims that something follows immediately from some inequality, theorem, whatever. – Nick Cox Nov 18 '23 at 02:48
  • Notice they do $x$ versus $y$ and $y$ versus $x$. I bet if you took the original data and minimized the SUM of the squared differences in each model, then the model fit from that minimization would be the same as that resulting from doing partial least squares (PLS). So, it's really somehow the partial least squares minimization that causes the strange slope relation. Note that it's not minimizing the squared $x$ distance or the squared $y$ distance but rather the Euclidean distance. That causes the strange slope relation because the two slopes are simultaneously related to model fit. – mlofton Nov 18 '23 at 05:08
  • Notice that the above of course does not give intuition, but maybe someone can go in that direction; it may be lying there somewhere. – mlofton Nov 18 '23 at 05:12
  • One more question, even though I'm not even certain about the above: how is $R^2$ defined? – mlofton Nov 18 '23 at 05:21
  • Just to clarify my attempt at a start to intuition: that slope relation arises because, without it, one slope wouldn't know what the other slope was doing. Therefore, one could achieve an equivalent model fit (without the constraint) by just minimizing the objective function $\sum_{i=1}^{n} (\hat{x}_{i} - x_{i})^2 + \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^2$ directly. But that's the same thing as doing PLS, I think. – mlofton Nov 18 '23 at 05:36
  • I called it partial least squares in my first comment but that's a mistake. What I was referring to is actually called total least squares. Let me see if I can find some document explaining it. This looks okay. https://igppweb.ucsd.edu/~cathy/Classes/SIO223A/sio223a.chap8.pdf – mlofton Nov 18 '23 at 06:27
  • @mlofton Usually, $R^2$ is defined as in formula (5) at https://stats.stackexchange.com/a/104577/919: namely, a ratio of "variance explained" to "total variance" (of the response). – whuber Dec 13 '23 at 17:52
  • @whuber: Yes, that's the standard formula. But, according to the OP, the regression is being done A) where $x$ is the response and $y$ is the independent variable and then B) vice versa. The two resulting $R^2$ values won't be the same in the two cases. So, I'm not sure what is done. Maybe the average is taken? It probably doesn't matter. My only point was that the product of the slopes might have something to do with total least squares. – mlofton Dec 14 '23 at 08:32
  • "Usually" depends on the eye of the beholder, and in any case I don't think that the merits of different definitions are best decided by number of references. The question presumes that the two different regressions yield the same $R^2$, which is true if $R^2$ is the square of the correlation between variables, or equivalently here between observed and predicted, i.e. $R = \text{corr}(x, y) = \text{corr}(\hat x, y) = \text{corr}(x, \hat y)$. – Nick Cox Dec 14 '23 at 09:38

The nice (and dangerous) thing about "gaining intuition" is that one does not have to be mathematically strict or even "correct" (that was a warning for what is to follow, in case it didn't register as such).

Part I.

We are looking at two series of numbers $\{y, x\}$. We have a computational algorithm called "ordinary least squares", and using this method we obtain an estimate of the one based on the other and vice versa:

$$\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0,\qquad \hat{x} = \hat{\alpha}_1 y + \hat{\alpha}_0.$$

We also define the "coefficient of determination", $\equiv R^2$, as $$R^2_{y|x} = \frac{n^{-1}\sum(\hat y-\bar {\hat y})^2}{n^{-1}\sum(y - \bar y)^2}.$$

The bar denotes the arithmetic average. In statistical terminology, this is a ratio of "sample variances". The variance is verbally described as the "average squared deviation from the mean" (the arithmetic average in our case), or the "average squared variation around the mean", and so on.
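
As a quick sanity check of this ratio (a sketch added here with simulated data; NumPy's default `var` uses the same $n^{-1}$ normalization as the formula above):

```python
# Compute R^2_{y|x} as var(fitted values) / var(y) and compare it with the
# product of the two OLS slopes from the regressions of y on x and x on y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

beta1, beta0 = np.polyfit(x, y, 1)    # y regressed on x
alpha1, alpha0 = np.polyfit(y, x, 1)  # x regressed on y
y_hat = beta1 * x + beta0             # fitted values

r2_ratio = y_hat.var() / y.var()      # the variance-ratio definition above
print(r2_ratio, beta1 * alpha1)       # the two numbers coincide
```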

How could we describe the "variance" without using statistical terminology?

The variance is a measure of how much we do NOT behave as a constant, the constant being our arithmetic average.

And the defining property of a constant is that it doesn't change. So we managed to connect the concept of "variance/variation around" with the concept of "change in relation to".

A mathematical concept and symbol related to change is the differential. And here we would want to think of the average differential. Conjuring the symbol $d_A$ to represent this concept of "average differential", we can then write the quotient

$$R^2_{y|x} = \frac{[d_A(\hat y - \bar {\hat y})]^2 }{[d_A(y - \bar y)]^2} = \frac{[d_A(\hat y)]^2 }{[d_A(y)]^2}.$$

The reason for the simplification is that the arithmetic average is a constant so it has zero average differential.

We can write the same for the other relation,

$$R^2_{x|y} = \frac{[d_A(\hat x)]^2 }{[d_A(x)]^2}.$$

A first hurdle to surpass intuitively is the known fact that $R^2_{y|x} = R^2_{x|y}$, i.e. that the two coefficients of determination are the same. But providing intuition for that equality is not my goal here. So we have

$$R^2_{y|x} = R^2_{x|y} = R^2_{y,x} \implies \frac{[d_A(\hat y)]^2 }{[d_A(y)]^2} = \frac{[d_A(\hat x)]^2 }{[d_A(x)]^2} \implies \frac{d_A(\hat y) }{d_A(y)} = \frac{d_A(\hat x) }{d_A(x)},$$

where the last step drops the squares because the two ratios share the same sign (that of the slopes, and hence of the correlation).

But this means that we can write $$R^2_{y,x} = \frac{d_A(\hat y) }{d_A(y)} \cdot \frac{d_A(\hat x) }{d_A(x)}.$$

Part II.

When we want to talk and write about the estimated coefficients $\hat \beta_1$ and $\hat \alpha_1$ in a fancy way, we say "the derivative of" and we often write $$\frac {d\hat y}{d x} = \hat \beta_1,\qquad \frac {d\hat x}{d y} = \hat \alpha_1.$$

But in reality, we know that $\hat \beta_1$ is some average measure of "change in $\hat y$ as $x$ changes" (and likewise for $\hat \alpha_1$). So perhaps we should use our symbol $d_A$ and write $$\frac {d_A(\hat y)}{d_A (x)} = \hat \beta_1,\qquad \frac {d_A(\hat x)}{d_A(y)} = \hat \alpha_1$$ $$\implies \hat \beta_1 \cdot \hat \alpha_1 = \frac {d_A(\hat y)}{d_A (x)} \cdot \frac {d_A(\hat x)}{d_A(y)}.$$

Now, we have just moved away from "infinitesimal changes", which makes it even easier to commit Leibniz's Cardinal Sin: treat these "average derivatives" as quotients of differentials, which allows you to swap the denominators and arrive at

$$\hat \beta_1 \cdot \hat \alpha_1 = \frac {d_A(\hat y)}{d_A (y)} \cdot \frac {d_A(\hat x)}{d_A (x)}.$$

The conclusion of Part II is identical to the conclusion of Part I.

Did I provide any kind of intuition here? That's for others to say. I just note that these conclusions can now be described in terms of "ratios of average changes" without using more technical mathematical or statistical terminology, like variance, standard deviation, correlation, etc.