
Suppose we have fitted a standard linear model with a categorical variable which has 3 levels A, B and C.

x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)
fit <- lm(y ~ x)

How can we use this model to test whether the coefficient for level $B$ equals the coefficient for level $C$?

Shana

1 Answer

You can fit two regressions: one with separate coefficients for $B$ and $C$, and another in which a single indicator records only whether $x$ is either $B$ or $C$, so that both levels share one coefficient.

# individual variables for B as well as C
reg1 = lm(y ~ I(x == "B") + I(x == "C")) 
# either B or C
reg2 = lm(y ~ I(x %in% c("B", "C"))) 
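Why this works: the first regression is the original model written with explicit dummies, and the second is the same model with the restriction $B = C$ imposed, so the two are nested. A minimal sketch to check this, assuming the data from the question (the seed and the names `same_fit` and `df_gain` are my own additions for illustration):

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)

fit  <- lm(y ~ x)                          # original model from the question
reg1 <- lm(y ~ I(x == "B") + I(x == "C"))  # same model, explicit dummies
reg2 <- lm(y ~ I(x %in% c("B", "C")))      # B and C share one coefficient

# reg1 is only a reparameterization of fit: the fitted values are identical
same_fit <- isTRUE(all.equal(unname(fitted(fit)), unname(fitted(reg1))))

# reg2 estimates one parameter fewer, so it has one more residual df
df_gain <- df.residual(reg2) - df.residual(reg1)
```

If `same_fit` is `TRUE` and `df_gain` is 1, comparing the two models tests exactly the single restriction $B = C$.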

Then you can compare the two nested models with `anova` (F-test or chi-square test). A small p-value would suggest that $B$ and $C$ need separate coefficients.

# F-Test
anova(reg1, reg2, test = "F")
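For the chi-square variant, only the `test` argument changes. The sketch below assumes the data from the question with a seed of my choosing, and passes the restricted model first so that the `Df` and `Sum of Sq` columns come out positive. The names `f_tab`, `chi_tab`, `f_stat`, `p_f` and `p_chi` are illustrative only.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)

reg1 <- lm(y ~ I(x == "B") + I(x == "C"))  # separate coefficients
reg2 <- lm(y ~ I(x %in% c("B", "C")))      # shared coefficient

# Restricted model first: Df = 1 and a positive Sum of Sq
f_tab   <- anova(reg2, reg1, test = "F")
chi_tab <- anova(reg2, reg1, test = "Chisq")

f_stat <- f_tab[["F"]][2]
p_f    <- f_tab[["Pr(>F)"]][2]
p_chi  <- chi_tab[["Pr(>Chi)"]][2]
```

With 200 observations the two p-values should be very close. The F-test accounts for the estimated residual variance, so it is the more usual choice for `lm` fits.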

In your example, the results look as follows:

set.seed(1)
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)

# individual variables for B as well as C
reg1 = lm(y ~ I(x == "B") + I(x == "C"))
# either B or C
reg2 = lm(y ~ I(x %in% c("B", "C")))
# F-Test
anova(reg1, reg2, test = "F")

Results

Analysis of Variance Table

Model 1: y ~ I(x == "B") + I(x == "C")
Model 2: y ~ I(x %in% c("B", "C"))
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    197 200.82
2    198 200.87 -1 -0.051686 0.0507 0.8221

The p-value (0.82) is far above any conventional significance level, so the data give no evidence that the coefficients for $B$ and $C$ differ. This is not proof that $B = C$, only a failure to reject it (and be aware that conclusions based on p-values are controversial).
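If you would rather work from the original `fit` directly, the same hypothesis can be tested as a linear contrast of its coefficients. The following is a sketch in base R, not the only way to do it. The seed and the names `k`, `est`, `se`, `t_stat`, `p_val` and `ci` are my own, and it assumes the default treatment coding with $A$ as the reference level.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)
fit <- lm(y ~ x)  # coefficients: (Intercept), xB, xC

# Contrast vector for coef(xB) - coef(xC), in the order of coef(fit)
k <- c(0, 1, -1)

est <- sum(k * coef(fit))                    # estimated difference B - C
se  <- sqrt(drop(t(k) %*% vcov(fit) %*% k))  # its standard error
t_stat <- est / se
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# 95% confidence interval for the difference
ci <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se
```

Because only one restriction is tested, `t_stat^2` equals the F statistic from the model comparison and `p_val` matches its p-value. The confidence interval also shows how large a difference between $B$ and $C$ the data can still accommodate.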