
Say I estimate the following multiple linear regression $$ y = \beta_0 +\beta_1 x_1 +\beta_2 x_2+\beta_3x_3+\beta_4x_4 + \epsilon$$ How can I test whether $\beta_1=\beta_2=\beta_3$?

I know that to test if $\beta_1=\beta_2$ you can simply construct a $Z$ test with $$ Z = \frac{\beta_1-\beta_2}{\sqrt{se_{\beta_1}^2+se_{\beta_2}^2}}$$

Is there an analogue for multiple coefficient estimates?

Daria
    The test for equality of $\beta_1$ and $\beta_2$ implicitly assumes the estimates of the $\beta_i$ are uncorrelated. In general it will be incorrect; the denominator needs to include a term for their covariance. – whuber Nov 16 '17 at 15:32
    If your X variables are in different units, then the beta coefficients are also in different units. In that case, I don’t see how it would make sense to compare them. – Harvey Motulsky Nov 21 '17 at 19:17
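As whuber's comment points out, when the estimates $\hat\beta_1$ and $\hat\beta_2$ are correlated, the variance of their difference picks up a covariance term, so the two-coefficient statistic becomes

$$ Z = \frac{\hat\beta_1-\hat\beta_2}{\sqrt{se_{\hat\beta_1}^2+se_{\hat\beta_2}^2-2\,\widehat{\operatorname{Cov}}(\hat\beta_1,\hat\beta_2)}} $$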

2 Answers


You can use an $F$ test to test any set of linear restrictions on your coefficients.

Let the null hypothesis be $H_0: L\beta = c$, where $L$ is a $q \times k$ matrix of restrictions, and let the design matrix $X$ have rank $k$. Then the $F$ statistic is:

$$ F = \frac{(L\hat{\beta}- c)'(\hat{\sigma}^2L(X'X)^{-1}L')^{-1}(L\hat{\beta} - c)}{q} $$

where $q$ is the number of restrictions you are testing. Under the null this will have an $F$ distribution with degrees of freedom $q$ and $n-k$.
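For the hypothesis in the question, $\beta_1=\beta_2=\beta_3$ amounts to $q=2$ restrictions, for example $\beta_1-\beta_2=0$ and $\beta_2-\beta_3=0$. With the coefficient vector ordered as $(\beta_0,\beta_1,\beta_2,\beta_3,\beta_4)'$, one choice of $L$ and $c$ is

$$ L = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \end{pmatrix}, \qquad c = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. $$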

In R you can do this easily with the linearHypothesis function from the car package. For example:

library(car)
lm.model <- lm(mpg ~ ., data = mtcars)                          # regress mpg on all other columns of mtcars
linearHypothesis(lm.model, c("cyl = 0", "disp = 0", "hp = 0"))  # H0: all three coefficients are zero
linearHypothesis(lm.model, c("cyl = disp", "disp = hp"))        # H0: all three coefficients are equal

Besides the nice matrix-algebra exposition Carlos presented and R's linearHypothesis, such a test can also be done in a more old-fashioned way.

First the F value must be calculated by hand. This can be done after fitting two models: model A, the "full" model, and model B, the "restricted" model.

model A: $y = b_0 + b_1x_1 + b_2x_2 + b_3x_3+b_4x_4$
model B: $y = b_0 + b_1x_1 + b_1x_2 + b_1x_3+b_4x_4$

In model B the regression coefficients of the first three predictors are equal, as the hypothesis requires. The model can also be written as:

model B: $y = b_0 + b_1(x_1 + x_2 + x_3) + b_4x_4$

having $\text{sum} = x_1 + x_2 + x_3$ as an independent variable. So model B can be estimated as follows:

model B: $y = b_0 + b_1\,\text{sum} + b_4x_4$
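In R, for instance, assuming a hypothetical data frame dat with columns y, x1, x2, x3 and x4, the two models could be fitted as:

dat$sum <- dat$x1 + dat$x2 + dat$x3               # combined predictor for the restricted model
modelA  <- lm(y ~ x1 + x2 + x3 + x4, data = dat)  # full model
modelB  <- lm(y ~ sum + x4, data = dat)           # restricted model: one common coefficient for x1, x2, x3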

The F value can now be calculated by plugging the R-squared values of models A and B into the formula below:

$$ F = \frac{(R^2_A - R^2_B)\,/\,df_1}{(1 - R^2_A)\,/\,df_2} $$

$df_1$ = difference in the number of regression coefficients between models A and B = 5 − 3 = 2
$df_2$ = number of cases − number of regression coefficients in model A = N − 5

Next, the F value can be looked up in a table with tail probabilities of F distributions. This is really old school, I admit, but it should be possible with any software package able to run linear regression.
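As a sketch, the same calculation in R (with the hypothetical modelA and modelB fits from above) would be:

RA  <- summary(modelA)$r.squared
RB  <- summary(modelB)$r.squared
df1 <- length(coef(modelA)) - length(coef(modelB))   # 5 - 3 = 2
df2 <- nobs(modelA) - length(coef(modelA))           # N - 5
Fval <- ((RA - RB) / df1) / ((1 - RA) / df2)
pf(Fval, df1, df2, lower.tail = FALSE)               # tail probability instead of a table lookup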

Of course, in R we would apply Carlos' method. One could also fit the two models above and then run anova(modelA, modelB).
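With the two fits from the sketch above, that is simply:

anova(modelA, modelB)   # same F test, computed from the residual sums of squares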

**SPSS regression method**

Procedure "regression" in SPSS only allows users to drop several predictors at once from a linear model equation, but not to equate regression coefficients. However, we can write model A in a different way to obtain model A_new:

Model A_new: $y = b_0 + b_1(x_1 + x_2 + x_3) + b_2x_2 + b_3x_3 + b_4x_4$

Model A_new is just a reparameterization of model A, so it has the same R square. Dropping the predictors $x_2$ and $x_3$ from model A_new yields model B. The R square change in going from model A_new to model B can be tested as follows:

regression  
   /dependent y
   /enter sum x4
   /test (x2 x3)
BenP