Is it valid to compare p-values from test statistics with different DF?

Question

I need to compare results from data sets with different residual DF, because my $x$ variable has a different number of levels in each. The following is just an example (in R, for demonstration purposes, but this is not an R question):

# first case: x has two levels
set.seed(123)
data1 <- data.frame(y = rnorm(100, 5, 2),
                    x = sample(c("A", "B"), 100, replace = TRUE))
anova(lm(y ~ x, data = data1))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x          1   2.07  2.0669  0.6177 0.4338
Residuals 98 327.89  3.3459               

# second case: x has five levels
set.seed(123)
data2 <- data.frame(y = rnorm(100, 5, 2),
                    x = sample(c("A", "B", "C", "D", "E"), 100, replace = TRUE))
anova(lm(y ~ x, data = data2))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x          4   4.89  1.2224  0.3572 0.8384
Residuals 95 325.07  3.4218

Here I have two different DF for the residuals (98 vs. 95) and for $x$ (1 vs. 4). Is it valid to compare the p-values as such? I know that the F-test takes both the DF of $x$ and the residual DF into account when calculating the p-value. Is any extra caution needed?
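To make the role of the two DF concrete, here is a small sketch that recomputes the p-values from the F statistics printed in the tables above. It assumes only base R's `pf()`; the names `p_case1` and `p_case2` are introduced here for illustration and are not part of the original example.

```r
# F statistics and DF copied from the two ANOVA tables above
p_case1 <- pf(0.6177, df1 = 1, df2 = 98, lower.tail = FALSE)
p_case2 <- pf(0.3572, df1 = 4, df2 = 95, lower.tail = FALSE)

# These should agree with the Pr(>F) column up to rounding
round(c(case1 = p_case1, case2 = p_case2), 4)

# The same F value referred to a different F distribution
# (different DF) yields a different p-value
pf(0.6177, df1 = 4, df2 = 95, lower.tail = FALSE)
```

Each p-value is the upper-tail area of an F distribution whose shape is set by both DF, so the DF are already accounted for in the p-value itself.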

The predictors A, B, C, etc. have to mean something. What could you sensibly be comparing? You need to expand your question. What do you mean by comparing p-values? Do you mean just making an evaluative judgment on the similarity of findings based on the analysis? – John, Jul 11 '12 at 13:00
And what would you say about the comparison of an analysis of A, B, C, D, E to an analysis of A, B, even if you could say something? Let's say the p-value for A, B is lower than the one for all 5 conditions analyzed together. What would that tell you about anything? Are A and B even the same thing in those two cases? – John, Jul 11 '12 at 20:53
A and B can be different; however, in both ANOVAs we are testing the null hypothesis that the means of the classes are equal ... – Ram Sharma, Jul 11 '12 at 21:05
So, if I do an ANOVA of the influence of artichoke and bean consumption on BMI, and another of the presence of aardvarks, bears, coyotes, deer, and elephants on zoo attendance, what would it mean if the p-value in the BMI ANOVA is lower than in the zoo ANOVA? – John, Jul 11 '12 at 22:20

score 2 · Answer 1 · edited Apr 13 '17 at 12:44

You can never compare p-values (OK, you can, but it is wrong), regardless of the DF (even when they are equal!). The reason is that p-values are random variables, and you only get one realization of each of them from your data.

For more information about this issue, please refer to the following post. Check Greg Snow's answer, which contains references and a simulation.
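As a rough illustration of the "p-values are random variables" point, here is a minimal simulation sketch. It is not the simulation from the linked answer; the helper `sim_p`, the seed, and the number of replications are assumptions made here. It repeatedly generates pure-noise data like the example in the question (so the null hypothesis is true) and records the ANOVA p-value for a 2-level and a 5-level factor.

```r
# Hypothetical helper: one ANOVA p-value from pure-noise data
# with a k-level grouping factor (the null hypothesis is true)
sim_p <- function(k, n = 100) {
  y <- rnorm(n, mean = 5, sd = 2)
  x <- factor(sample(LETTERS[1:k], n, replace = TRUE))
  anova(lm(y ~ x))[["Pr(>F)"]][1]
}

set.seed(1)
p2 <- replicate(1000, sim_p(2))  # x DF = 1
p5 <- replicate(1000, sim_p(5))  # x DF = 4

# Under the null, both sets should look roughly Uniform(0, 1)
quantile(p2, c(0.25, 0.5, 0.75))
quantile(p5, c(0.25, 0.5, 0.75))

# Share of replications in which the 2-level p-value is the smaller one
mean(p2 < p5)
```

With no real effect in either design, the p-values scatter across the whole unit interval regardless of DF, so which of two single p-values happens to be smaller is largely down to chance.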

answered Jul 11 '12 at 18:01

Néstor

I think that this is a bit too emphatic. If one uses p values as indices of evidence against the null hypothesis, as one does in Fisher's significance testing, then it makes sense to ask about the level of evidence provided by the different p values. – Michael Lew Jul 11 '12 at 21:13
To say that one cannot compare p values because they are random variables makes no sense to me. Surely a sample mean is also a random variable, but we meaningfully compare them all the time. – Michael Lew Jul 11 '12 at 21:14
@MichaelLew you actually don't just compare means; you take other statistics into account as well :-). You perform significance testing in order to see how significant the difference between two means is; how would you go about doing that with p-values? It makes sense to say that you can't compare p-values, because you actually don't know their distributions ;-). – Néstor Jul 11 '12 at 23:25
