Preparation
Using R libraries: library(dplyr) and library(forcats) (the latter provides fct_rev, used below)
The situation
Data
Given the data
my_data <- mtcars |>
  mutate(vs = factor(vs,
                     levels = c(0, 1),
                     labels = c('V-shaped', 'straight'))) |>
  select(mpg, hp, vs)
which results in
> my_data
mpg hp vs
Mazda RX4 21.0 110 V-shaped
Mazda RX4 Wag 21.0 110 V-shaped
Datsun 710 22.8 93 straight
Hornet 4 Drive 21.4 110 straight
Hornet Sportabout 18.7 175 V-shaped
Valiant 18.1 105 straight
Duster 360 14.3 245 V-shaped
Merc 240D 24.4 62 straight
Merc 230 22.8 95 straight
Merc 280 19.2 123 straight
Merc 280C 17.8 123 straight
Merc 450SE 16.4 180 V-shaped
Merc 450SL 17.3 180 V-shaped
...
Note that vs has exactly two levels, V-shaped and straight.
Regressions
The estimate of hp can be calculated separately for the groups V-shaped and straight:
reg_v <- my_data |>
  filter(vs == 'V-shaped') |>
  lm(mpg ~ hp, data = _)
giving
> summary(reg_v)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.49637 2.42004 10.122 2.32e-08 ***
hp -0.04153 0.01219 -3.408 0.0036 **
and
reg_st <- my_data |>
  filter(vs == 'straight') |>
  lm(mpg ~ hp, data = _)
giving
> summary(reg_st)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.00055 4.17535 9.341 7.45e-07 ***
hp -0.15810 0.04426 -3.572 0.00384 **
We can also specify an interaction:
reg_inter1 <- my_data |>
  lm(mpg ~ hp*vs, data = _)
returning
> summary(reg_inter1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.49637 2.73893 8.944 1.07e-09 ***
hp -0.04153 0.01379 -3.011 0.00547 **
vsstraight 14.50418 4.58160 3.166 0.00371 **
hp:vsstraight -0.11657 0.04130 -2.822 0.00868 **
After reversing the factor levels (so that straight becomes the reference level)
reg_inter2 <- my_data |>
  mutate(vs = fct_rev(vs)) |>
  lm(mpg ~ hp*vs, data = _)
the following is returned
> summary(reg_inter2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.00055 3.67278 10.619 2.52e-11 ***
hp -0.15810 0.03893 -4.061 0.000357 ***
vsV-shaped -14.50418 4.58160 -3.166 0.003713 **
hp:vsV-shaped 0.11657 0.04130 2.822 0.008677 **
It is also possible to use sum (effect) coding instead of dummy coding, via
contrasts(my_data$vs) <- contr.sum(2)
to yield
> contrasts(my_data$vs)
[,1]
V-shaped 1
straight -1
Running the regression
reg_inter3 <- my_data |>
  lm(mpg ~ hp*vs, data = _)
then returns
> summary(reg_inter3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.74846 2.29080 13.859 4.64e-14 ***
hp -0.09982 0.02065 -4.833 4.37e-05 ***
vs1 -7.25209 2.29080 -3.166 0.00371 **
hp:vs1 0.05828 0.02065 2.822 0.00868 **
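As a quick sanity check that the dummy-coded and sum-coded fits above describe one and the same model, here is a minimal base-R sketch. It rebuilds my_data without dplyr so that it runs on its own; the object names fit_trt, fit_sum and my_sum are only illustrative and not part of the analysis above.

```r
# Base-R rebuild of my_data so the block is self-contained (no dplyr needed)
my_data <- mtcars[, c("mpg", "hp", "vs")]
my_data$vs <- factor(my_data$vs, levels = c(0, 1),
                     labels = c("V-shaped", "straight"))

# Treatment (dummy) coding, reference level V-shaped
fit_trt <- lm(mpg ~ hp * vs, data = my_data)

# Sum (effect) coding on a copy, so my_data itself stays untouched
my_sum <- my_data
contrasts(my_sum$vs) <- contr.sum(2)
fit_sum <- lm(mpg ~ hp * vs, data = my_sum)

# Identical fitted values: the codings only re-express the same model
same_fit <- isTRUE(all.equal(fitted(fit_trt), fitted(fit_sum)))

# Group-specific hp slopes recovered from each parametrization
b_trt <- coef(fit_trt)
b_sum <- coef(fit_sum)
slopes_trt <- c(`V-shaped` = unname(b_trt["hp"]),
                straight   = unname(b_trt["hp"] + b_trt["hp:vsstraight"]))
slopes_sum <- c(`V-shaped` = unname(b_sum["hp"] + b_sum["hp:vs1"]),
                straight   = unname(b_sum["hp"] - b_sum["hp:vs1"]))

print(same_fit)
print(rbind(treatment = slopes_trt, sum = slopes_sum))
```

Both rows of the printed matrix should agree, which is why the coefficient tables above look different while the underlying fit is the same.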
The questions
Question 1
It is correct that the estimate of hp for each group can be calculated by dropping the other group's data, as in reg_v and reg_st, isn't it? I mean mathematically correct.
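To make Question 1 concrete, the following base-R sketch (a check under the usual lm assumptions, not a definitive answer) compares the slopes from the two subset regressions with those implied by the interaction model. It also prints the residual standard errors and residual degrees of freedom, since the interaction model pools one residual variance over both groups, which would explain why the standard errors in the tables above differ even when the slopes agree.

```r
# Base-R rebuild of my_data so the block is self-contained
my_data <- mtcars[, c("mpg", "hp", "vs")]
my_data$vs <- factor(my_data$vs, levels = c(0, 1),
                     labels = c("V-shaped", "straight"))

# Separate regressions on each subset, and the pooled interaction model
reg_v      <- lm(mpg ~ hp, data = subset(my_data, vs == "V-shaped"))
reg_st     <- lm(mpg ~ hp, data = subset(my_data, vs == "straight"))
reg_inter1 <- lm(mpg ~ hp * vs, data = my_data)

# hp slope per group, from both approaches
b <- coef(reg_inter1)
slopes <- rbind(
  separate    = c(coef(reg_v)[["hp"]], coef(reg_st)[["hp"]]),
  interaction = c(b[["hp"]], b[["hp"]] + b[["hp:vsstraight"]])
)
colnames(slopes) <- c("V-shaped", "straight")

# Residual standard error and residual degrees of freedom of each fit
fits   <- list(reg_v = reg_v, reg_st = reg_st, reg_inter1 = reg_inter1)
sigmas <- sapply(fits, sigma)
dfs    <- sapply(fits, df.residual)

print(slopes)
print(sigmas)
print(dfs)
```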
Question 2
Suppose I only calculated the following regression:
reg_inter1 <- my_data |>
  lm(mpg ~ hp*vs, data = _)
returning
> summary(reg_inter1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.49637 2.73893 8.944 1.07e-09 ***
hp -0.04153 0.01379 -3.011 0.00547 **
vsstraight 14.50418 4.58160 3.166 0.00371 **
hp:vsstraight -0.11657 0.04130 -2.822 0.00868 **
Now, one can easily calculate the estimate of hp for the other group (straight, since V-shaped is the reference level here) by
-0.04153 + (-0.11657) = -0.15810
How can I calculate the std. error, t value, p value, and possibly a CI or other statistics for the straight group based on this regression alone?
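One common way to get these quantities, sketched here under the usual lm assumptions, is to treat the sum hp + hp:vsstraight as a linear combination of coefficients and take its variance from vcov(): Var(b1 + b3) = Var(b1) + Var(b3) + 2 Cov(b1, b3). The weight vector k and the variable names below are illustrative; the data are rebuilt in base R so the block runs on its own.

```r
# Base-R rebuild of my_data so the block is self-contained
my_data <- mtcars[, c("mpg", "hp", "vs")]
my_data$vs <- factor(my_data$vs, levels = c(0, 1),
                     labels = c("V-shaped", "straight"))
reg_inter1 <- lm(mpg ~ hp * vs, data = my_data)

b <- coef(reg_inter1)
V <- vcov(reg_inter1)   # estimated covariance matrix of the coefficients

# Weight vector k picking out hp + hp:vsstraight (slope in the straight group)
k <- setNames(rep(0, length(b)), names(b))
k[c("hp", "hp:vsstraight")] <- 1

est   <- sum(k * b)                        # point estimate of the combined slope
se    <- sqrt(drop(t(k) %*% V %*% k))      # sqrt of k' V k
df    <- df.residual(reg_inter1)           # residual degrees of freedom
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df)           # two-sided p value
ci    <- est + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval

print(c(estimate = est, se = se, t = t_val, p = p_val))
print(ci)
```

The result should reproduce the hp row of summary(reg_inter2) shown earlier (estimate -0.15810, std. error 0.03893), because reversing the factor levels only re-expresses the same model.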
Question 3 (Mainly a reformulation of question 2 in my eyes)
Given these results
> summary(reg_inter3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.74846 2.29080 13.859 4.64e-14 ***
hp -0.09982 0.02065 -4.833 4.37e-05 ***
vs1 -7.25209 2.29080 -3.166 0.00371 **
hp:vs1 0.05828 0.02065 2.822 0.00868 **
I can state that hp is a significant ($p = 4.37 \times 10^{-5}$) predictor with $\beta = -0.09982$ for all cars taken together.
I can also calculate
$$\beta_{\text{V-shaped}} = -0.09982 + 1 \cdot 0.05828 = -0.04154$$
and
$$\beta_{\text{straight}} = -0.09982 + (-1) \cdot 0.05828 = -0.1581$$
for each of the two subsamples.
How do I check whether these $\beta$ values are significant? Do I have to run the two additional regressions reg_v and reg_st presented at the very beginning, i.e., delete all cars that are built with the opposite engine type?
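For the sum-coded model, the same idea can be wrapped in a small helper. This is only a sketch: lin_comb is a hypothetical function name, not part of base R or any package, and the weight vectors assume the coefficient order (Intercept), hp, vs1, hp:vs1 shown in the summary above.

```r
# Base-R rebuild of my_data with sum coding, so the block is self-contained
my_data <- mtcars[, c("mpg", "hp", "vs")]
my_data$vs <- factor(my_data$vs, levels = c(0, 1),
                     labels = c("V-shaped", "straight"))
contrasts(my_data$vs) <- contr.sum(2)
reg_inter3 <- lm(mpg ~ hp * vs, data = my_data)

# Hypothetical helper: estimate, SE, t, p and CI for the combination k'b
lin_comb <- function(fit, k, level = 0.95) {
  est <- sum(k * coef(fit))
  se  <- sqrt(drop(t(k) %*% vcov(fit) %*% k))
  df  <- df.residual(fit)
  tq  <- qt(1 - (1 - level) / 2, df)
  c(estimate = est, se = se, t = est / se,
    p = 2 * pt(-abs(est / se), df),
    lower = est - tq * se, upper = est + tq * se)
}

# Coefficient order: (Intercept), hp, vs1, hp:vs1
res <- rbind(
  `V-shaped` = lin_comb(reg_inter3, c(0, 1, 0,  1)),  # hp + hp:vs1
  straight   = lin_comb(reg_inter3, c(0, 1, 0, -1))   # hp - hp:vs1
)
print(round(res, 5))
```

If this approach is valid, the two rows should match the hp rows of summary(reg_inter1) and summary(reg_inter2) respectively, without deleting any cars.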
summary, I am reluctant even to guess what you are trying to ask. Could you clarify? – whuber Nov 08 '23 at 19:17
V-shaped=-1, straight=1) instead of dummy coding (V-shaped=0, straight=1). Which would result in two interaction terms in the model? – user1 Nov 08 '23 at 19:50
vs has exactly two values, V-shaped and straight, which is kind of an edge case and not explicitly handled in my sources. Nevertheless, I was always able to compute the different $\beta$ values but not the $p$ values. And after reading everything I am neither. I would really appreciate any help. – user1 Nov 09 '23 at 11:27