Suppose we take the InsectSprays data frame in R and fit the linear model lm(count ~ spray, data = InsectSprays). The regression table reports a standard error and a t-value for the intercept and for each non-baseline level of spray. I'm wondering how these are computed.
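For concreteness, the table in question can be reproduced with a minimal sketch like the one below. It assumes only base R and the built-in InsectSprays data; the object names `fit` and `coef_table` are just for illustration.

```r
# Fit the one-way model. spray is a factor with levels A-F, so with R's
# default treatment contrasts sprayA acts as the baseline level.
fit <- lm(count ~ spray, data = InsectSprays)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients
print(coef_table)
```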

I considered an $F$-test comparing a model fit without a specific level against the model with all levels, but I didn't get the same values.

Here is an example where I tried removing the sprayC level and comparing the models; sprayA is used as the baseline level for each.

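# Full model: one indicator per non-baseline spray level (sprayA is the baseline)
lm.full <- lm(count ~ I(1*(spray == 'B')) +
                I(1*(spray == 'C')) + I(1*(spray == 'D')) +
                I(1*(spray == 'E')) + I(1*(spray == 'F')),
              data = InsectSprays)

# Reduced model: the same indicators with the sprayC indicator dropped
lm.noC <- lm(count ~ I(1*(spray == 'B')) + I(1*(spray == 'D')) +
               I(1*(spray == 'E')) + I(1*(spray == 'F')),
             data = InsectSprays)

summary(lm.full)$coef
anova(lm.noC, lm.full)$F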

I also tried comparing the test-statistic values to a two-sample t-test of sprayC against sprayA, and this also didn't yield the same values:

sprayAcounts <- InsectSprays[InsectSprays$spray == "A", "count"]
sprayCcounts <- InsectSprays[InsectSprays$spray == "C", "count"]

t.test(sprayAcounts, sprayCcounts, var.equal = TRUE)
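To make the mismatch easy to reproduce, the two statistics can be pulled out side by side. This is only a sketch using base R; the names `fit`, `tt`, `a_counts`, `c_counts`, and `comparison` are illustrative and not part of the code above.

```r
# Regression with sprayA as the baseline (R's default treatment contrasts)
fit <- lm(count ~ spray, data = InsectSprays)

# Counts for the two groups being compared
a_counts <- InsectSprays$count[InsectSprays$spray == "A"]
c_counts <- InsectSprays$count[InsectSprays$spray == "C"]

# Pooled-variance two-sample t-test; C is passed first so that the sign
# matches the sprayC coefficient (mean of C minus mean of A)
tt <- t.test(c_counts, a_counts, var.equal = TRUE)

# t-value from the regression table next to the two-sample t statistic
comparison <- c(
  lm_t         = summary(fit)$coefficients["sprayC", "t value"],
  two_sample_t = unname(tt$statistic)
)
print(comparison)
```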

How are these test statistics and standard errors computed?
