I need to compare results from datasets whose residual degrees of freedom (DF) differ because my $x$ variable has a different number of levels in each. The following is just an example (in R, for demonstration purposes, but this is not an R question):
# first case
set.seed(123)
data1 <- data.frame(y = rnorm(100, 5, 2),
                    x = sample(c("A", "B"), 100, replace = TRUE))
anova(lm(y ~ x, data = data1))
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x          1   2.07  2.0669  0.6177 0.4338
Residuals 98 327.89  3.3459
# second case:
set.seed(123)
data2 <- data.frame(y = rnorm(100, 5, 2),
                    x = sample(c("A", "B", "C", "D", "E"), 100, replace = TRUE))
anova(lm(y ~ x, data = data2))
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x          4   4.89  1.2224  0.3572 0.8384
Residuals 95 325.07  3.4218
Here the DF differ both for the residuals (98 vs. 95) and for $x$ (1 vs. 4). Is it valid to compare the p-values as they stand? I know that the F-test takes both the $x$ DF and the residual DF into account when calculating the p-value. Is any extra caution needed?
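
To make explicit what I mean by the F-test taking both DF into account: as far as I understand, each p-value is the upper-tail probability of an F distribution whose numerator and denominator DF are the $x$ and residual DF from the table. The symbols $\nu_1$, $\nu_2$ and $F_{\text{obs}}$ are my own notation, not something taken from the R output:

$$p = \Pr\left(F_{\nu_1,\nu_2} \ge F_{\text{obs}}\right)$$

So the first case refers $F_{\text{obs}} = 0.6177$ to $F_{1,98}$ (reported $p = 0.4338$), while the second refers $F_{\text{obs}} = 0.3572$ to $F_{4,95}$ (reported $p = 0.8384$). The two p-values therefore come from different reference distributions.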