Could you show me, with all the algebraic steps, how to derive the relationship $$F=\frac{R^2/(p-1)}{(1-R^2)/(n-p)}?$$
-
Isn't this just a definition? What kind of demonstration are you seeking? – whuber Dec 04 '22 at 19:31
-
@whuber, as the result only holds if we test whether all slope coefficients are zero, I would argue it is not a definition (at least, not a very general one). – Christoph Hanck Dec 05 '22 at 08:09
I will consider the $F$-test of a restriction of the type \begin{equation}\label{multrestriction} H_0:\beta_2=0 \end{equation} in \begin{equation}\label{FTestregfull} y=X_1\beta_1+X_2\beta_2+u. \end{equation} Further, I will start from the following expression for the $F$-statistic (which can itself be derived from other ways of writing the $F$-statistic; see, e.g., Proof that F-statistic follows F-distribution): \begin{equation}\label{FUSSRRSSR} F_{\beta_2}=\frac{(\text{RSSR}-\text{USSR})/r}{\text{USSR}/(n-p)} \end{equation} with
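- $r$ the number of columns of $X_2$ (i.e., the number of restrictions),
- $p$ the total number of regressors, i.e., the number of columns of $X=[X_1\; X_2]$,
- $\text{USSR}=y'M_{X}y$ the unrestricted sum of squared residuals, from the regression of $y$ on both $X_1$ and $X_2$,
- $\text{RSSR}=y'M_{X_1}y$ the restricted sum of squared residuals, from the regression of $y$ on $X_1$ only, i.e., with $H_0$ imposed,

where $M_A=I-A(A'A)^{-1}A'$ denotes the residual-maker matrix of a regressor matrix $A$.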
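To make these objects concrete, here is a minimal base-R sketch on simulated data. The seed, the data-generating process and the helper `resid_maker` are illustrative assumptions, not part of the question. It forms the residual-maker matrix $M_A=I-A(A'A)^{-1}A'$ explicitly, computes USSR and RSSR as the quadratic forms $y'M_Xy$ and $y'M_{X_1}y$, and checks the resulting $F_{\beta_2}$ against the partial $F$-test that `anova()` reports for two nested `lm` fits.

```r
# Sketch: F-statistic for H0: beta_2 = 0 from restricted and unrestricted SSRs.
# Simulated data; resid_maker and all variable names are illustrative.
set.seed(1)
n  <- 50
X1 <- cbind(1, rnorm(n))            # constant plus one regressor kept under H0
X2 <- cbind(rnorm(n), rnorm(n))     # r = 2 regressors set to zero under H0
X  <- cbind(X1, X2)                 # full regressor matrix, p = 4 columns
y  <- drop(X1 %*% c(1, 0.5)) + rnorm(n)

# Residual-maker matrix M_A = I - A (A'A)^{-1} A'
resid_maker <- function(A) diag(nrow(A)) - A %*% solve(crossprod(A), t(A))

r <- ncol(X2)
p <- ncol(X)
USSR <- drop(t(y) %*% resid_maker(X) %*% y)   # y' M_X y
RSSR <- drop(t(y) %*% resid_maker(X1) %*% y)  # y' M_{X_1} y
F_beta2 <- ((RSSR - USSR) / r) / (USSR / (n - p))

# Cross-check against the partial F-test for the nested lm fits
fit_r <- lm(y ~ X1 - 1)   # "- 1" because X1 already contains the constant column
fit_u <- lm(y ~ X - 1)
F_anova <- anova(fit_r, fit_u)$F[2]
all.equal(F_beta2, F_anova)   # TRUE
```

Building the $n\times n$ matrices is only for illustration; `lm()` itself works with a QR decomposition and never forms $M_X$.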
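If $X_1=\iota$ (a column of ones), i.e., if we test whether all slope coefficients are zero, so that the only regressor under the null is a constant, we have $r=p-1$ and \begin{equation}\label{R2RSSR} \text{RSSR}=y'M_{\iota}y=\sum_{i=1}^n(y_i-\bar y)^2, \end{equation} which is the total sum of squares (TSS). Since the $R^2$ of a regression with an intercept is defined as $1-\text{USSR}/\text{TSS}$, we may use this relationship to write $$ R^2=1-\frac{\text{USSR}}{\text{RSSR}} $$ or $$ \text{USSR}=(1-R^2)\text{RSSR}, $$ so that $\text{RSSR}-\text{USSR}=R^2\,\text{RSSR}$. Hence $$ \frac{\text{RSSR}-\text{USSR}}{\text{USSR}}=\frac{R^2\,\text{RSSR}}{(1-R^2)\,\text{RSSR}}=\frac{R^2}{1-R^2}. $$ Substituting this and $r=p-1$ into $F_{\beta_2}$ gives $$ F_{\beta_2}=\frac{\text{RSSR}-\text{USSR}}{\text{USSR}}\cdot\frac{n-p}{p-1}=\frac{R^2}{1-R^2}\cdot\frac{n-p}{p-1}=\frac{R^2/(p-1)}{(1-R^2)/(n-p)}. $$
A small numerical illustration (also relating to my comment below the OP's question):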
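library(lmtest)   # provides waldtest()
n <- 50
y <- rnorm(n)
X1 <- rnorm(n)
X2 <- rnorm(n)
reg <- lm(y~X1+X2)
Rsq <- summary(reg)$r.squared
USSR <- sum(resid(reg)^2)
RSSR <- sum((y-mean(y))^2)   # restricted SSR when both slopes are zero (= TSS)
# No seed was set, so the printed values below come from one particular draw.
# Testing whether all slopes are zero: all the same
(Fstat <- waldtest(reg, test="F")$F[2])
# [1] 2.169655
(Fstat.R2 <- Rsq/(1-Rsq)*(n-3)/2)
# [1] 2.169655
(F.stat.SSR <- (RSSR-USSR)/USSR*(n-3)/2)
# [1] 2.169655
# Testing only the coefficient on X2 (X1 kept under H0): not all the same
regX1 <- lm(y~X1)
(Fstat <- waldtest(reg, regX1, test="F")$F[2])
# [1] 0.4613734
RSSR <- sum(resid(regX1)^2)   # restricted SSR now comes from the regression on X1
(Fstat.R2 <- Rsq/(1-Rsq)*(n-3)/1)   # R^2 formula with r = 1: not the F-statistic
# [1] 4.33931
(F.stat.SSR <- (RSSR-USSR)/USSR*(n-3)/1)   # SSR formula still matches waldtest()
# [1] 0.4613734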
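The second part of the illustration shows that the $R^2$ formula breaks down when $X_1$ contains more than the constant. If both regressions include an intercept, they share the same $\text{TSS}=y'M_\iota y$, so $\text{RSSR}=(1-R_r^2)\,\text{TSS}$ and $\text{USSR}=(1-R_u^2)\,\text{TSS}$, where $R_r^2$ and $R_u^2$ are the $R^2$ of the restricted and unrestricted regressions. This gives $F_{\beta_2}=\frac{(R_u^2-R_r^2)/r}{(1-R_u^2)/(n-p)}$, which reduces to the formula in the question when $X_1=\iota$ (then $R_r^2=0$). A base-R sketch checking this on simulated data follows; the seed and the data-generating process are illustrative assumptions.

```r
# Sketch: partial F-test written in terms of restricted (R2_r) and
# unrestricted (R2_u) R-squared; both models include an intercept.
# Simulated data; seed and coefficients are illustrative.
set.seed(2)
n  <- 50
X1 <- rnorm(n)
X2 <- rnorm(n)
y  <- 1 + 0.5 * X1 + rnorm(n)

reg   <- lm(y ~ X1 + X2)   # unrestricted: p = 3 (constant, X1, X2)
regX1 <- lm(y ~ X1)        # restricted: coefficient on X2 set to zero, r = 1
p <- 3
r <- 1

R2_u <- summary(reg)$r.squared
R2_r <- summary(regX1)$r.squared

F_R2    <- ((R2_u - R2_r) / r) / ((1 - R2_u) / (n - p))
F_anova <- anova(regX1, reg)$F[2]   # partial F-test for the nested fits
all.equal(F_R2, F_anova)   # TRUE
```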