When does the p-value (standard error) of a linear-model coefficient decrease as the number of levels of a categorical predictor increases, and why does this happen? I fail to see how collinearity and/or regressing residuals on one of the variables [1] is relevant to contrasts of categorical predictors.
Suppose we have:
dat <- data.frame(
  label = rep(LETTERS[1:3], each = 4),
  value = c(
    1.00, 0.96, 0.96, 1.03, # A
    0.74, 0.45, 0.01, 0.89, # B
    1.00, 1.02, 1.04, 1.06  # C
  )
)
Notice that one value in level B (0.01) is suspicious; in any case, the overall mean of group B is somewhat lower, too. Now, then:
round(coef(summary(lm(value ~ label, dat[1:8, ]))), 3) # levels A and B
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.988 0.137 7.182 0.000
# labelB -0.465 0.194 -2.391 0.054
round(coef(summary(lm(value ~ label, dat))), 3) # levels A, B, and C
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.988 0.113 8.777 0.000
# labelB -0.465 0.159 -2.922 0.017
# labelC 0.042 0.159 0.267 0.795
The p-value for level B has decreased roughly 3x in the second case (0.054 → 0.017). If anything, I would expect it to increase, e.g. to counteract the inflation of the family-wise error rate due to multiple testing.
I can't wrap my head around this: if A were my baseline condition (control), then, without going Bayesian, I could indefinitely increase my confidence that treatment B has an effect just by testing more treatments (C, D, E, ..., Z), without ever increasing the sample size in group A or B.
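The mechanics can be made explicit by rebuilding the `labelB` standard error by hand (a sketch not from the post itself): `lm()` pools the residual variance across *all* groups, and the SE of the B-vs-A contrast is `sigma_hat * sqrt(1/n_A + 1/n_B)`, so a tight third group shrinks `sigma_hat` and adds residual degrees of freedom.

```r
# Self-contained sketch: why adding group C shrinks the SE of labelB.
dat <- data.frame(
  label = rep(LETTERS[1:3], each = 4),
  value = c(1.00, 0.96, 0.96, 1.03,   # A
            0.74, 0.45, 0.01, 0.89,   # B
            1.00, 1.02, 1.04, 1.06)   # C
)
fit2 <- lm(value ~ label, dat[1:8, ])  # A and B only
fit3 <- lm(value ~ label, dat)         # A, B, and C

sigma(fit2)  # ~0.275: residual SD pooled over A and B (6 residual df)
sigma(fit3)  # ~0.225: tight group C pulls the pooled estimate down (9 df)

# Same contrast (B - A), but smaller sigma_hat and more residual df:
sigma(fit2) * sqrt(1/4 + 1/4)  # ~0.194, the labelB SE in the first fit
sigma(fit3) * sqrt(1/4 + 1/4)  # ~0.159, the labelB SE in the second fit
```

With an identical estimate of -0.465 divided by a smaller SE, and the t statistic referred to a t distribution with more degrees of freedom, the p-value drops on both counts.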
1) `lm` assumes homogeneity of variance. That seems to be the essence of it. 2) Perhaps also mention that if the homogeneity-of-variance assumption is not met, one could consult post1 and post2. – Vallo Varik Mar 10 '21 at 17:30
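Following up on the comment, a hedged sketch of the alternative: if the groups are not assumed to share a variance (and here B is visibly noisier than A or C), a Welch t-test compares A and B using only those two groups, so adding further treatments C..Z cannot move its p-value at all.

```r
# Welch t-test on A vs B only (var.equal = FALSE is the default,
# written out here for emphasis). Its p-value depends solely on
# groups A and B, so extra treatment groups can never change it.
dat <- data.frame(
  label = rep(LETTERS[1:3], each = 4),
  value = c(1.00, 0.96, 0.96, 1.03,   # A
            0.74, 0.45, 0.01, 0.89,   # B
            1.00, 1.02, 1.04, 1.06)   # C
)
t.test(value ~ label, dat[1:8, ], var.equal = FALSE)
```

Note that the Welch p-value comes out larger than either `lm()` p-value above, because group B's large variance is no longer diluted by pooling with quieter groups, and the Welch–Satterthwaite degrees of freedom are small.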